# Homework 2

## Draw Random Numbers from Uniform Distributions

### Write code to draw a set of 10,000 random numbers that are uniformly distributed on $(-2,3)$.
 - Hint: Stretch and shift $U(0,1)$ to fit the bounds of $U(-2,3)$.
 
### What are the theoretical mean and standard deviation of the distribution $U(-2,3)$?
 
### Show the mean and the standard deviation of the set of random numbers you've drawn. 

In [2]:
using Distributions

d = Uniform(-2, 3);
println("theoretical mean = ", mean(d));
println("theoretical standard deviation = ", sqrt(var(d)));

a1 = rand(d, 10000);
println("mean = ", mean(a1));
println("standard deviation = ", sqrt(var(a1)));

theoretical mean = 0.5
theoretical standard deviation = 1.4433756729740645
mean = 0.489107366993748
standard deviation = 1.4415772168787697
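
The hint about stretching $U(0,1)$ can also be followed directly with base `rand()`. The block below is a minimal sketch under that reading, not the only way to do it: the variable names and the seed are assumptions made for illustration, and the closed-form moments used are the standard ones for $U(a,b)$, namely mean $(a+b)/2$ and standard deviation $(b-a)/\sqrt{12}$.

```julia
using Random, Statistics

Random.seed!(1234)            # arbitrary seed, only for reproducibility

a, b = -2.0, 3.0              # bounds of U(a, b)
n = 10_000

# rand(n) draws from U(0,1); scale by the width (b - a) and shift by a
u = a .+ (b - a) .* rand(n)

# closed-form moments of U(a, b)
theo_mean = (a + b) / 2
theo_std  = (b - a) / sqrt(12)

println("theoretical mean = ", theo_mean, ", sd = ", theo_std)
println("empirical mean   = ", mean(u), ", sd = ", std(u))
```

With 10,000 draws the empirical values should land close to the theoretical ones, as they do in the output above.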


## Draw Random Numbers from Normal Distributions

### Use `randn()` to draw a $1000\times 2$ matrix of random numbers from $N(2,3)$ (a normal distribution with mean=2 and variance=3). Use `mean()` and `var()` to show that the empirical mean and variance are close to the theoretical values.
- Hint: `randn()` generates $N(0,1)$ random variables; you have to scale and shift them to obtain the appropriate mean and variance.
- Hint: For constants `a` and `b`: if $x \sim N(\mu, \sigma^2)$, then $a x \sim N(a\mu, \ a^2 \sigma^2)$ and $x+b \sim N(\mu+b, \ \sigma^2)$.
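
Applying the two rules in the hint to a standard normal draw $z$ gives the transformation needed for $N(2,3)$. Note that the scale factor is the standard deviation $\sqrt{3}$, not the variance $3$:

\begin{aligned}
  z & \sim N(0, 1),\\
  \sqrt{3}\, z & \sim N(0, \ 3),\\
  \sqrt{3}\, z + 2 & \sim N(2, \ 3).
\end{aligned}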

### Use `rand()` to draw a $1000\times 2$ matrix of random numbers from $N(2,3)$. Use `mean()` and `var()` to show that the empirical mean and variance are close to the theoretical values.
- Hint: `rand()` can take a distribution as an argument, as shown in class.

In [3]:
m1 = randn(1000, 2) * sqrt(3) .+ 2;
println("m1 mean(empirical) = ", mean(m1));
println("m1 variance(empirical) = ", var(m1));

d = Normal(2, sqrt(3));
m2 = rand(d, (1000, 2));
println("m2 mean(empirical) = ", mean(m2));
println("m2 variance(empirical) = ", var(m2));

m1 mean(empirical) = 1.9330050569619814
m1 variance(empirical) = 3.0109499318740203
m2 mean(empirical) = 2.0044186938209108
m2 variance(empirical) = 2.9224489061537158


## Draw Regression Data: Cross-Sectional Model

Suppose you write your own routine to do fancy estimation on cross-sectional and panel data models. You want to conduct a Monte Carlo analysis to check that the routine works as expected and returns the correct answer. The first step is to generate data with pre-specified parameter values, so that you can apply your estimation routine to the data and see whether the estimated parameter values match the pre-specified (*true*) values.

Let's start with the cross-sectional model. The model is:
\begin{aligned} 
  y_i & = \alpha + \beta' x_i + \epsilon_i,\qquad i=1,\ldots,N,\\
  \epsilon_i & \sim N(0, \sigma^2).
\end{aligned}   

There could be more than one $x_i$ variable in the model; let's denote the number of $x_i$ variables as $\textrm{nofX}$. Write a function to generate data $\{y_i, x_i\}$. The function should allow users to choose the values of $\{\alpha, \beta, \sigma^2, \textrm{nofX}, N\}$.
  - Hint: The $x_i$s are assumed (in econometrics) to be fixed and exogenous, and therefore the distribution from which they are generated is inconsequential. (It is fine if you do not follow the preceding sentence; the important part is the next one.) You may assume that they are generated from normal distributions.

In [46]:
using DataFrames

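# Generate cross-sectional data y_i = α + β'x_i + ϵ_i with ϵ_i ~ N(0, σ²).
# Note: σ is the standard deviation of ϵ (pass sqrt(σ²) for a given variance);
# β is a 1×nofx row vector of slope coefficients.
function genCrossSectionalData(α, β, σ, nofx, N)
    x = rand(Normal(), (N, nofx));      # regressors drawn from N(0, 1)
    ϵ = rand(Normal(0, σ), N);          # error term
    y = α .+ vec(x * transpose(β)) + ϵ;
    
    # columns: individual index, y, then x1, ..., x_nofx
    m = hcat(1:N, y, x);
    df = DataFrame(m, :auto);
    col_names = vcat(["i", "y"], ["x$i" for i in 1:nofx]);
    rename!(df, col_names);
    
    return df
end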

# usage example 
genCrossSectionalData(2, [1 2 3 4 5], 4, 5, 7)

Output (all columns are `Float64`):

| Row | i | y | x1 | x2 | x3 | x4 | x5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1.0 | -1.95867 | 0.966174 | -0.286486 | -0.849059 | 0.144182 | -1.48386 |
| 2 | 2.0 | -1.2018 | -0.674603 | 1.67055 | 0.552223 | -0.426317 | -0.390374 |
| 3 | 3.0 | -2.35962 | -0.573593 | 0.172284 | 0.776012 | -1.41463 | -0.473687 |
| 4 | 4.0 | -6.2061 | 0.989915 | -2.67461 | 0.513699 | -0.0889016 | -1.33287 |
| 5 | 5.0 | 10.202 | 0.494001 | 1.06947 | 0.209045 | -1.3797 | 1.13499 |
| 6 | 6.0 | 14.9549 | -3.73795 | 1.5853 | -0.496657 | 1.80753 | 0.70596 |
| 7 | 7.0 | 19.4408 | 0.105969 | -0.431515 | 1.62142 | -0.0441126 | 1.71285 |
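
The Monte Carlo idea described at the start of this section can be sketched end to end with base Julia only. In the block below, ordinary least squares stands in for "your estimation routine". The helper name `simulate_cross_section`, the parameter values, and the seed are all assumptions chosen for illustration; they are not part of the assignment.

```julia
using Random, LinearAlgebra

Random.seed!(2024)                  # arbitrary seed, only for reproducibility

# hypothetical helper: returns (y, X) for y = α + X*β + ϵ, ϵ ~ N(0, σ²)
function simulate_cross_section(α, β::AbstractVector, σ, N)
    X = randn(N, length(β))         # regressors drawn from N(0, 1)
    ϵ = σ .* randn(N)               # σ is the standard deviation of the error
    y = α .+ X * β .+ ϵ
    return y, X
end

α_true, β_true, σ_true = 2.0, [1.0, 2.0, 3.0], 1.5
y, X = simulate_cross_section(α_true, β_true, σ_true, 5_000)

# OLS by least squares: prepend a column of ones for the intercept
Z = hcat(ones(length(y)), X)
coef = Z \ y                        # coef[1] estimates α, coef[2:end] estimates β

println("estimated α = ", coef[1])
println("estimated β = ", coef[2:end])
```

If the data generator is correct, the estimates should be close to `α_true` and `β_true`, and should get closer as `N` grows.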


## Draw Regression Data: Panel Model

Suppose you also want to generate panel data to test your routine. The model is
\begin{aligned}
    y_{it} & = \alpha_i + \beta x_{it} + \epsilon_{it},\qquad i=1,\ldots,N,\ t=1,\ldots,T,\\
   \epsilon_{it} & \sim N(0, \sigma^2).
\end{aligned}   

Here, $i$ is the individual index and $t$ is the time index. For instance, $w_{13}$ means the value of $w$ for the 1st individual in the 3rd time period. Assume the above model is the random-effects (RE) panel data model, where $\alpha_i \sim N(0,\sigma_a^2)$ is a random variable that is distributed independently of $x_{it}$. Write a function to generate data $\{y_{it}, x_{it}\}$ with the options $\{\beta, \sigma^2, \sigma_a^2, \textrm{nofX}, N, T\}$.

  - Hint: Draw $\alpha_i$ and expand it (`repeat()`) to fill the time periods. Generate $x_{it}$ and $\epsilon_{it}$. Then combine these elements according to the equation to create $y_{it}$.
  - Hint: You don't really need to understand what an RE model is to generate the data. Just follow the notation and it should be OK.
  - Hint: `repeat()` would be useful here.
  - Hint: The structure of the dataset should look like the following. Note that $\alpha_i$ is constant within a given $i$ but changes across different $i$'s.


|	i	|	t	|	y_it	|	alpha_i	|	x_it	|
| ---	| ---	|	--- 	| ---		|	---     |
|	1	|	1	|	0.173 	|	0.12	|	0.183 	|
|	1	|	2	|	0.372 	|	0.12	|	0.804 	|
|	1	|	3	|	0.239 	|	0.12	|	0.072 	|
|	1	|	4	|	0.791 	|	0.12	|	0.272 	|
|	2	|	1	|	0.443 	|	-0.45	|	0.705 	|
|	2	|	2	|	0.825 	|	-0.45	|	0.619 	|
|	2	|	3	|	0.681 	|	-0.45	|	0.769 	|
|	2	|	4	|	0.694 	|	-0.45	|	0.575 	|
|	3	|	1	|	0.192 	|	1.29	|	0.067 	|
|	3	|	2	|	0.072 	|	1.29	|	0.553 	|
|	3	|	3	|	0.522 	|	1.29	|	0.280 	|
|	3	|	4	|	0.021 	|	1.29	|	0.306 	|
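
The index layout in the table above can be produced with the `inner` and `outer` keywords of `repeat()`. This is a small sketch using the table's sizes ($N=3$, $T=4$); the variable names are chosen here for illustration, and $\sigma_a = 1$ is assumed for the $\alpha_i$ draw.

```julia
N, T = 3, 4                           # sizes matching the example table

i_index = repeat(1:N, inner = T)      # 1,1,1,1,2,2,2,2,3,3,3,3
t_index = repeat(1:T, outer = N)      # 1,2,3,4,1,2,3,4,1,2,3,4

α_i    = randn(N)                     # one draw per individual (σ_a = 1 assumed)
α_long = repeat(α_i, inner = T)       # each α_i repeated over its T periods
```

`inner` repeats each element in place, which keeps the rows of one individual together, while `outer` repeats the whole sequence, which restarts the time index for each individual.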







In [47]:
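# Generate random-effects panel data y_it = α_i + β'x_it + ϵ_it.
# Note: σ and σα are standard deviations (of ϵ_it and α_i), not variances;
# β is a 1×nofx row vector of slope coefficients.
function genPanelData(β, σ, σα, nofx::Integer, N::Integer, T::Integer)
    x = rand(Normal(), (N * T, nofx));
    ϵ = rand(Normal(0, σ), (N * T));
    
    # draw one α_i per individual and repeat it over that individual's T periods
    α = repeat(rand(Normal(0, σα), N), inner = T);
    
    y = α + vec(x * transpose(β)) + ϵ;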
    # rows are ordered by individual i, then by period t within each individual
    m = hcat([i for i in 1:N for _ in 1:T], [t for _ in 1:N for t in 1:T], y, α, x)
    
    # names must follow the column order used in hcat: i, t, y, α, x1, ...
    col_names = vcat(["i", "t", "y", "α"], ["x$i" for i in 1:nofx])
    df = DataFrame(m, :auto);
    rename!(df, col_names);
    
    return df
end

# usage example
genPanelData([1 2 3 4 5], 2, 3, 5, 4, 5) 

Output (first 10 of the 20 rows; all columns are `Float64`):

| Row | i | t | y | α | x1 | x2 | x3 | x4 | x5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1.0 | 1.0 | 16.7941 | 7.86697 | 1.40796 | 1.62817 | -0.100234 | 2.23697 | -1.39978 |
| 2 | 1.0 | 2.0 | -3.27877 | 7.86697 | 1.03417 | -0.86898 | -0.79571 | -0.682059 | -0.583259 |
| 3 | 1.0 | 3.0 | 4.34572 | 7.86697 | -1.47984 | 0.603314 | 0.905927 | 0.559968 | -1.09899 |
| 4 | 1.0 | 4.0 | 13.2054 | 7.86697 | 0.248917 | -0.367608 | -0.550322 | 1.27448 | 0.80534 |
| 5 | 1.0 | 5.0 | 7.85375 | 7.86697 | -0.415601 | -0.248869 | 0.823709 | -1.05817 | 0.952851 |
| 6 | 2.0 | 1.0 | 3.15606 | 4.07328 | -1.32431 | 1.04366 | -0.259475 | 0.961963 | -0.947292 |
| 7 | 2.0 | 2.0 | 18.9584 | 4.07328 | -0.631389 | 1.20318 | 1.54582 | 0.843967 | 0.741295 |
| 8 | 2.0 | 3.0 | 6.1052 | 4.07328 | -1.80559 | -0.738059 | 0.544639 | 0.0105297 | 0.441611 |
| 9 | 2.0 | 4.0 | -3.78242 | 4.07328 | -0.248691 | 0.025421 | -1.2689 | -0.190716 | -0.9009 |
| 10 | 2.0 | 5.0 | -11.2848 | 4.07328 | -1.82919 | -2.71745 | 0.821358 | -0.716592 | -1.26155 |
