# Homework 2

## Draw Random Numbers from Uniform Distributions

### Write code to draw a set of 10,000 random numbers that are uniformly distributed in (-2,3).
 - Hint: Stretch $U(0,1)$ to fit the bound of $U(-2,3)$.
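
A minimal sketch of the hint using only the standard library `Random` (the seed is an arbitrary assumption, used only for reproducibility). Base `rand()` draws from $[0,1)$, so the stretched draws land in $[-2,3)$; hitting the endpoint $-2$ exactly has negligible probability.

```julia
using Random

# Sketch of the hint: if u ~ U(0,1), then a + (b - a) * u ~ U(a, b).
rng = MersenneTwister(2024)   # arbitrary seed, only for reproducibility
a, b = -2.0, 3.0
u = rand(rng, 10_000)         # 10,000 draws from U(0,1), each in [0, 1)
x = a .+ (b - a) .* u         # stretched by 5 and shifted by -2: each draw lies in [-2, 3)

println("min = ", minimum(x), ", max = ", maximum(x))
```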
 
### What are the theoretical mean and standard deviation of the distribution $U(-2,3)$?
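
For $X \sim U(a,b)$ the density is $1/(b-a)$ on $(a,b)$, and the standard moment formulas give the answer directly with $a=-2$, $b=3$:

$$
\begin{aligned}
E[X] &= \frac{a+b}{2} = \frac{-2+3}{2} = 0.5,\\
\operatorname{Var}(X) &= \frac{(b-a)^2}{12} = \frac{25}{12} \approx 2.083,\\
\operatorname{SD}(X) &= \frac{b-a}{\sqrt{12}} = \frac{5}{\sqrt{12}} \approx 1.443.
\end{aligned}
$$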
 
### Show the mean and the standard deviation of the set of random numbers you've drawn. 
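
A minimal sketch that compares the sample moments of stretched $U(0,1)$ draws with the formulas, using only the standard libraries `Random` and `Statistics`. The helper name `uniform_check` and the default seed are illustrative assumptions, not part of the assignment.

```julia
using Random, Statistics

# Hypothetical helper: draw n values from U(a, b) by stretching U(0,1),
# then return the theoretical and sample mean/std side by side.
function uniform_check(a, b, n; rng = MersenneTwister(1))
    x = a .+ (b - a) .* rand(rng, n)
    theory = (mean = (a + b) / 2, std = (b - a) / sqrt(12))
    sample = (mean = mean(x), std = std(x))   # std uses the n - 1 correction
    return theory, sample
end

theory, sample = uniform_check(-2, 3, 10_000)
println("theoretical: ", theory)
println("sample:      ", sample)
```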

```julia
using Random
using Distributions   # Uniform, Normal, and their mean/var/std methods
using Statistics      # sample mean and var
```

```julia
# Draw 10,000 numbers from U(-2, 3) directly via Distributions
data = rand(Uniform(-2, 3), 10000)
```

```
10000-element Vector{Float64}:
  2.842161048015461
  1.4020855784556616
  2.06629798712299
  2.271712367231606
  2.7038221457272105
  0.9871607327443108
  0.3967335212094443
 -1.0122977575651353
 -1.4733608256825164
  2.119946812351275
  1.5066784671479958
  1.7815342350933259
 -1.0477866595931455
  ⋮
 -0.32707053080953385
 -1.4540961437563409
 -0.0776013773925337
  0.9009145766040403
 -1.604540086742972
  1.9838427187762937
 -0.49791081675168924
  2.059220580348315
  0.4376259142763592
  2.8735997529701827
  2.692772989985067
 -1.3625861292027235
```

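```julia
# Theoretical moments of U(-2, 3) from Distributions
print("theoretical \nMean:", mean(Uniform(-2, 3)), "\n")
print("Var:", var(Uniform(-2, 3)))
```

```
theoretical 
Mean:0.5
Var:2.0833333333333335
```

The question asks for the standard deviation, which is the square root of this variance: $\sqrt{25/12} = 5/\sqrt{12} \approx 1.443$.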

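```julia
# Sample moments of the drawn numbers
print("Data Mean:", mean(data), "\n")
print("Data Var:", var(data))
```

```
Data Mean:0.5092348401137712
Data Var:2.094171927491028
```

The sample standard deviation is therefore $\sqrt{2.0942} \approx 1.447$, close to the theoretical $1.443$; the sample mean $0.509$ is likewise close to $0.5$.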

## Draw Random Numbers from Normal Distributions

### Use `randn()` to draw a $1000\times 2$ matrix of random numbers from $N(2,3)$ (a normal distribution with mean=2 and variance=3). Use `mean()` and `var()` to show that the empirical mean and variance are close to the theoretical values.
- Hint: `randn()` generates N(0,1) random variables; you have to scale it to the appropriate mean and variance.
- Hint: For constants `a` and `b`, if $x \sim N(\mu, \sigma^2)$, then $a x \sim N(a\mu,\ a^2 \sigma^2)$ and $x+b \sim N(\mu+b,\ \sigma^2)$.
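
To see why the hint matters, this sketch (standard libraries only; the seed and variable names are illustrative) scales the same `randn` draws two ways. Multiplying by the standard deviation $\sqrt{3}$ gives variance 3, while multiplying by the variance 3 would give variance 9.

```julia
using Random, Statistics

rng = MersenneTwister(7)
mu, sigma2 = 2.0, 3.0
z = randn(rng, 1000, 2)             # 1000×2 matrix of N(0,1) draws
good = mu .+ sqrt(sigma2) .* z      # N(2, 3): scale by the standard deviation
bad  = mu .+ sigma2 .* z            # common slip: scales by the variance, giving N(2, 9)

println("var(good) = ", var(good), ", var(bad) = ", var(bad))
```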

### Use `rand()` to draw a $1000\times 2$ matrix of random numbers from $N(2,3)$. Use `mean()` and `var()` to show that the empirical mean and variance are close to the theoretical values.
- Hint: `rand()` can take a distribution as its first argument, as shown in class.

```julia
# Theoretical values for N(2, 3)
print("theoretical Mean:", 2, "\n")
print("theoretical Var:", 3)
```

```
theoretical Mean:2
theoretical Var:3
```

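```julia
# Scale N(0,1) draws: multiply by the standard deviation sqrt(3), then add the mean 2
data = randn(1000, 2)
data = data * sqrt(3) .+ 2
print("Use randn\nMean:", mean(data), "\n")
print("Var:", var(data))
```

```
Use randn
Mean:1.9386496548428536
Var:3.0860121619593475
```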

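```julia
# Normal(μ, σ) is parameterized by the standard deviation, so pass sqrt(3) for variance 3
dist = Normal(2, sqrt(3))
d = rand(dist, (1000, 2))
print("Use rand \nMean:", mean(d), "\n")
print("Var:", var(d))
```

```
Use rand 
Mean:1.9875930382309133
Var:2.951457717813412
```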

## Draw Regression Data: Cross-Sectional Model

Suppose you write your own routine for fancy estimation on cross-sectional and panel data models. You want to run a Monte Carlo analysis to check that the routine works as expected and returns correct answers. The first step is to generate data with pre-specified parameter values, so that you can apply your estimation routine to the data and see whether the estimated parameter values match the pre-specified (*true*) values.
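
A minimal sketch of such a Monte Carlo check, assuming ordinary least squares (via Julia's `\` operator) stands in for the fancy estimation routine. The function name `mc_ols`, the sample size, the number of replications, and the seed are illustrative assumptions.

```julia
using Random, Statistics

# Hypothetical Monte Carlo check: simulate from known (alpha, beta, sigma2),
# estimate by OLS, and average the estimates over many replications.
function mc_ols(alpha, beta, sigma2, N, reps; rng = MersenneTwister(42))
    K = length(beta)
    est = zeros(reps, K + 1)
    for r in 1:reps
        x = randn(rng, N, K)
        y = alpha .+ x * beta .+ sqrt(sigma2) .* randn(rng, N)
        X = hcat(ones(N), x)          # add the intercept column
        est[r, :] = X \ y             # least-squares solve: [alpha_hat; beta_hat]
    end
    return vec(mean(est, dims = 1))   # average estimate across replications
end

avg = mc_ols(2.0, [1.0, 8.0, 2.0], 5.0, 200, 500)
println("average estimates: ", avg)   # should be close to the true [2, 1, 8, 2]
```

If the routine is correct, the averaged estimates should sit close to the true values, with the gap shrinking as `reps` grows.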

Let's start from the cross-sectional model. The model is:
$$
\begin{aligned}
  y_i & = \alpha + \beta' x_i + \epsilon_i,\qquad i=1,\ldots,N,\\
  \epsilon_i & \sim N(0, \sigma^2).
\end{aligned}
$$

There could be more than one $x_i$ variable in the model; let's denote the number of $x_i$ variables by $\textrm{nofX}$. Write a function to generate data $\{y_i, x_i\}$. The function should let users choose the values of $\{\alpha, \beta, \sigma^2, \textrm{nofX}, N\}$.
  - Hint: The $x_i$s are assumed (in econometrics) to be fixed and exogenous, and therefore the distribution from which they are generated is inconsequential. (Don't worry if the previous sentence doesn't make sense; what matters is the next one.) You may assume that they are generated from normal distributions.
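
One possible sketch of such a generator, using only `Random`. The name `simulate_cs`, the keyword `rng`, and the `NamedTuple` return value are illustrative choices, not a required interface; here `beta` must be a vector of length `nofX`.

```julia
using Random

# Hypothetical generator for y_i = alpha + x_i' beta + eps_i, eps_i ~ N(0, sigma2).
# The x_i are drawn from N(0, 1), as the hint allows.
function simulate_cs(alpha, beta::AbstractVector, sigma2, nofX, N; rng = Random.default_rng())
    length(beta) == nofX || throw(ArgumentError("length(beta) must equal nofX"))
    x = randn(rng, N, nofX)                    # N×nofX regressor matrix
    epsilon = sqrt(sigma2) .* randn(rng, N)    # N(0, sigma2) errors
    y = alpha .+ x * beta .+ epsilon           # N-element response vector
    return (y = y, x = x)                      # y and x from the same draw stay together
end

d = simulate_cs(2.0, [1.0, 8.0, 2.0], 5.0, 3, 10; rng = MersenneTwister(3))
```

Because `x` and `y` come back from a single call, `d.x` and `d.y` always belong to the same draw. Setting `sigma2 = 0` removes the noise, which allows an exact check: `d.y` then equals `alpha .+ d.x * beta`.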

```julia
# Simple container for a generated dataset: regressors x and response y
struct pair
    x
    y
end
```

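```julia
# Cross-sectional DGP: y = alpha + x*beta + epsilon, epsilon ~ N(0, variance)
function f(alpha, beta, variance, nofX, N)
    epsilon = rand(Normal(0, sqrt(variance)), (N, 1))  # N×1 error matrix
    x = randn(N, nofX)                                 # N×nofX regressors from N(0, 1)
    y = x * beta .+ alpha .+ epsilon                   # N×1 response
    return pair(x, y)
end
```

```
f (generic function with 1 method)
```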

```julia
f(2, [1; 8; 2], 5, 3, 10).x
```

```
10×3 Matrix{Float64}:
  0.246413  -0.299996  -0.602659
 -0.950908   1.40816    0.824723
 -1.83148   -0.563638   0.363264
  0.644738  -1.85396    1.38399
  0.723585  -0.959904  -0.431422
  0.46174    0.930508   1.05736
 -3.89429    0.969071  -0.0405705
 -0.579588   0.165256   1.47947
  0.877231  -0.622664   0.285446
 -0.165749  -2.42272    0.608176
```

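```julia
f(2, [1; 8; 2], 5, 3, 10).y
```

```
10×1 Matrix{Float64}:
  14.28390135492201
  12.79919331035803
 -17.352200423035505
  -2.43101010811044
  13.808679689272331
  -8.231582659190867
  -3.0419404933904697
   0.43049139049366447
   5.137050201028448
   2.944371630124516
```

Note that each call to `f` draws a new dataset, so the `x` and `y` shown here come from two separate draws and do not belong together. To keep them paired, call `f` once and store the result, e.g. `s = f(2, [1; 8; 2], 5, 3, 10)`, then read `s.x` and `s.y`.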

## Draw Regression Data: Panel Model

Suppose you also want to generate panel data to test your routine. The model is
$$
\begin{aligned}
    y_{it} & = \alpha_i + \beta x_{it} + \epsilon_{it},\qquad i=1,\ldots,N,\ t=1,\ldots,T,\\
    \epsilon_{it} & \sim N(0, \sigma^2).
\end{aligned}
$$

Here, $i$ is the individual index and $t$ is the time index. For instance, $w_{13}$ is the value of $w$ for the 1st individual in the 3rd time period. Assume the above model is the random-effects (RE) panel data model, where $\alpha_i \sim N(0,\sigma_a^2)$ is a random variable drawn independently of $x_{it}$. Write a function to generate data $\{y_{it}, x_{it}\}$ that lets users choose $\{\beta, \sigma^2, \sigma_a^2, \textrm{nofX}, N, T\}$.
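
Stacking the observations individual by individual, with $t$ running fastest inside each $i$, the model can be written in vector form. This is a sketch assuming $\boldsymbol{\beta}$ is a $\textrm{nofX}\times 1$ vector (with $\textrm{nofX}=1$ it reduces to the scalar $\beta x_{it}$ above):

$$
\begin{aligned}
\mathbf{y} &= \boldsymbol{\alpha}\otimes\boldsymbol{\iota}_T + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},\\
\boldsymbol{\alpha}\otimes\boldsymbol{\iota}_T &= (\underbrace{\alpha_1,\ldots,\alpha_1}_{T},\ \underbrace{\alpha_2,\ldots,\alpha_2}_{T},\ \ldots,\ \underbrace{\alpha_N,\ldots,\alpha_N}_{T})',
\end{aligned}
$$

where $\mathbf{y}$ and $\boldsymbol{\epsilon}$ are $NT\times 1$, $\mathbf{X}$ is $NT\times\textrm{nofX}$, and $\boldsymbol{\iota}_T$ is a $T\times 1$ vector of ones. The Kronecker product $\boldsymbol{\alpha}\otimes\boldsymbol{\iota}_T$ is exactly what `repeat(a, inner=T)` produces for an $N$-vector `a` in Julia.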

  - Hint: Draw $\alpha_i$ and expand it (`repeat()`) to fill the time periods. Generate $x_{it}$ and $\epsilon_{it}$. Then combine these elements according to the equation to create $y_{it}$.
  - Hint: You don't really need to understand what an RE model is to generate the data. Just follow the notation and it should be OK.
  - Hint: `repeat()` would be useful here.
  - Hint: The structure of the dataset should look like the following. Note that $\alpha_i$ is constant within a given $i$ but changes across different $i$'s.


| $i$ | $t$ | $y_{it}$ | $\alpha_i$ | $x_{it}$ |
| --- | --- | --- | --- | --- |
| 1 | 1 | 0.173 | 0.12 | 0.183 |
| 1 | 2 | 0.372 | 0.12 | 0.804 |
| 1 | 3 | 0.239 | 0.12 | 0.072 |
| 1 | 4 | 0.791 | 0.12 | 0.272 |
| 2 | 1 | 0.443 | -0.45 | 0.705 |
| 2 | 2 | 0.825 | -0.45 | 0.619 |
| 2 | 3 | 0.681 | -0.45 | 0.769 |
| 2 | 4 | 0.694 | -0.45 | 0.575 |
| 3 | 1 | 0.192 | 1.29 | 0.067 |
| 3 | 2 | 0.072 | 1.29 | 0.553 |
| 3 | 3 | 0.522 | 1.29 | 0.280 |
| 3 | 4 | 0.021 | 1.29 | 0.306 |
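
A sketch of a panel generator that also returns the index columns $i$ and $t$, so the output has the same layout as the table above. The function name `simulate_re_panel`, the keyword `rng`, the seed, and returning `alpha` alongside the data are illustrative assumptions; `beta` is taken to be a vector of length `nofX`.

```julia
using Random

# Hypothetical RE panel generator: y_it = alpha_i + x_it' beta + eps_it,
# alpha_i ~ N(0, sigma2_a), eps_it ~ N(0, sigma2), x_it ~ N(0, 1).
# Rows are ordered as in the table: t runs fastest within each i.
function simulate_re_panel(beta::AbstractVector, sigma2, sigma2_a, nofX, N, T;
                           rng = Random.default_rng())
    length(beta) == nofX || throw(ArgumentError("length(beta) must equal nofX"))
    i = repeat(1:N, inner = T)                  # 1,1,...,1,2,2,... (each i repeated T times)
    t = repeat(1:T, outer = N)                  # 1,2,...,T,1,2,...,T
    a = sqrt(sigma2_a) .* randn(rng, N)         # one alpha_i per individual
    alpha = repeat(a, inner = T)                # constant within i, varies across i
    x = randn(rng, N * T, nofX)                 # NT×nofX regressors
    epsilon = sqrt(sigma2) .* randn(rng, N * T) # idiosyncratic errors
    y = alpha .+ x * beta .+ epsilon
    return (i = i, t = t, y = y, alpha = alpha, x = x)
end

p = simulate_re_panel([2.0], 4.0, 5.0, 1, 3, 4; rng = MersenneTwister(11))
```

Returning `i` and `t` explicitly makes the row ordering visible, and returning `alpha` lets you confirm it is constant within each individual.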

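```julia
# RE panel DGP: y_it = alpha_i + x_it*beta + eps_it, with rows ordered by i, then t
function f(beta, sigma2, sigma2_a, nofX, N, T)
    epsilon = rand(Normal(0, sqrt(sigma2)), (N * T, 1))  # NT×1 idiosyncratic errors
    x = randn(N * T, nofX)                               # NT×nofX regressors
    a = rand(Normal(0, sqrt(sigma2_a)), N)               # one alpha_i per individual
    alpha = repeat(a, inner = T)                         # repeat each alpha_i for its T periods
    y = x * beta .+ alpha .+ epsilon
    return pair(x, y)
end
```

```
f (generic function with 2 methods)
```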

```julia
f(2, 4, 5, 1, 3, 4).x
```

```
12×1 Matrix{Float64}:
 -1.4367227682115067
  1.5975099156227763
  0.672705683624197
 -1.381674243527139
 -0.34475620522644296
 -0.8195145854269038
 -0.30986549580465095
 -1.1374134225599404
 -0.27260217487608296
  0.7133596431348757
 -0.5721970916796492
  0.552195307041656
```

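```julia
f(2, 4, 5, 1, 3, 4).y
```

```
12×1 Matrix{Float64}:
  0.8309733548560265
  6.1502209525307645
  2.7319957332695686
  2.4103983338095905
 -5.806836332044119
 -6.055480349850983
 -0.98779627894293
 -2.493226714961878
  2.956546073122863
  1.1378567685102543
  2.64004928931082
 -1.0356293662357081
```

As in the cross-sectional case, the two calls draw independent datasets, so this `y` was not generated from the `x` shown just before it. Store the result of a single call and read both fields from it to keep them paired.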
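
One way to sanity-check an RE generator is to compare sample moments with their theoretical values. Assuming $x_{it} \sim N(0,1)$ and a scalar $\beta$, the model implies $\operatorname{Var}(y_{it}) = \beta^2 + \sigma_a^2 + \sigma^2$ and, because two periods of the same individual share $\alpha_i$, $\operatorname{Cov}(y_{i1}, y_{i2}) = \sigma_a^2$. The self-contained sketch below (hypothetical helper `re_moments`, standard libraries only, arbitrary seed) checks both with $\beta = 2$, $\sigma^2 = 4$, $\sigma_a^2 = 5$, the values used in the calls above, so the targets are 13 and 5.

```julia
using Random, Statistics

# Hypothetical check: simulate a large RE panel and return
# (sample Var(y_it), sample Cov(y_i1, y_i2) across individuals).
function re_moments(beta, sigma2, sigma2_a, N, T; rng = MersenneTwister(5))
    alpha = repeat(sqrt(sigma2_a) .* randn(rng, N), inner = T)  # alpha_i repeated over t
    x = randn(rng, N * T)
    y = alpha .+ beta .* x .+ sqrt(sigma2) .* randn(rng, N * T)
    Y = reshape(y, T, N)          # column k holds individual k's T observations
    return var(y), cov(Y[1, :], Y[2, :])
end

v, c = re_moments(2.0, 4.0, 5.0, 20_000, 4)
println("Var(y) ≈ ", v, " (theory 13), Cov(y_i1, y_i2) ≈ ", c, " (theory 5)")
```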