# Regression Assignment for ECON 8877 in the OSU

1. First, we test OLS with homoskedasticity

(a) For each row in the data, generate an error $\epsilon_{it}$ drawn iid from $N(0,7.5)$ (where 7.5 is the standard deviation, not variance).

(b) Next, generate a column of data called "Choice" using the formula
\begin{align*}
y_{it}=\beta_{0}+\beta_{1}T_{i}+\beta_{2}S_{i}+\beta_{3}W_{i}+\beta_{4}G_{i}+\beta_{5}X_{t}+\epsilon_{it}.
\end{align*}
Use the following "true" values for $\beta$:
\begin{align*}
\beta = (\beta_{0},\dots,\beta_{5})=(1,0.75,0,0.10,0,-0.10)
\end{align*}

(c) Run a regression to get $\hat{\beta}$. Note the $p$-values as well.

(d) Repeat the previous three steps 100 times.

(e) For each $\beta_{j}$, what is the median value of $\hat{\beta}_{j}$ (out of 100)? How close it to the true value of $\beta_{j}$? This is an estimate of the bias in your regression.

(f) For each $\beta_{j}$, count the fraction of the 100 regressions for which the $p$-value was below 0.05. This is the estimated power of that test.

We can deal with the problems above ((a)$\sim$(f)) with the code below.

We first make an environment for the project and add packages that are relevant. Also, activate all the packages.

In [2]:
using Pkg
pkg"activate @reg_env" # we create a new shared environment 
pkg"add CSV";pkg"add DataFrames";pkg"add MLBase";pkg"add Random";pkg"add LinearAlgebra";pkg"add Distributions";pkg"add GLM"
pkg"add CovarianceMatrices";pkg"add BrowseTables";
using CSV, DataFrames, MLBase, Random, LinearAlgebra, Distributions, GLM, CovarianceMatrices, BrowseTables



└ @ Pkg.REPLMode C:\Users\LG\AppData\Local\Programs\julia-1.9.4\share\julia\stdlib\v1.9\Pkg\src\REPLMode\REPLMode.jl:382


[32m[1m  Activating[22m[39m project at `C:\Users\LG\.julia\environments\reg_env`


[32m[1m    Updating[22m[39m registry at `C:\Users\LG\.julia\registries\JuliaComputingRegistry`
[32m[1m   Resolving[22m[39m package versions...


[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m 

to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


[32m[1m   Resolving[22m[39m

 package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`




[32m[1m   Resolving[22m[39m

 package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m

 to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


[32m[1m   Resolving[22m[39m package versions...


[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


[32m[1m   Resolving[22m[39m package versions...


[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


[32m[1m   Resolving[22m[39m 

package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


[32m[1m   Resolving[22m[39m 

package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


[32m[1m   Resolving[22m[39m package versions...


[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


[32m[1m   Resolving[22m[39m

 package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


Then, make a function that make the csv file into a data frame and define the variables with names.

In [3]:
function environment()
    df = DataFrame(CSV.File("fakedata.csv"))
    SubjectID = df[:,1];
    DecisionID = df[:,2];
    Constant = df[:,3];
    Treatment = df[:,4];
    SessionID = df[:,5];
    SwitchPoint = df[:,6];
    Gender = df[:,7];
    Complexity = df[:,8];
    return SubjectID, DecisionID, Constant, Treatment, SessionID, SwitchPoint, Gender, Complexity
end

environment (generic function with 1 method)

Next, we make the function that conducts the main simulation.

In [18]:
function First(rep_time)
    SubjectID, DecisionID, Constant, Treatment, SessionID, SwitchPoint, Gender, Complexity = environment();
    n_i=length(unique(SubjectID));n_t=length(unique(DecisionID));
    X = [Constant Treatment SessionID SwitchPoint Gender Complexity];
    β = [1.0; 0.75; 0.0; 0.01; 0.0; -0.1]; n_β = length(β);
    β̂_Mat_builtin = zeros(n_β,rep_time);
    p_val_Mat_builtin = zeros(n_β,rep_time);
    for s = 1:rep_time
        ϵ = rand(Normal(0,7.5),n_i*n_t);
        Choice = X * β + ϵ; df=(;Choice,Treatment,SessionID, SwitchPoint, Gender, Complexity);
        lm_builtin = lm(@formula(Choice~Treatment+SessionID+SwitchPoint+Gender+Complexity),df);
        p_val_Mat_builtin[:,s] = coeftable(lm_builtin).cols[4];
        β̂_Mat_builtin[:,s] = coeftable(lm_builtin).cols[1];
    end # end from 1.(a) to 1.(d)
    median_β̂_builtin = median(β̂_Mat_builtin, dims=2);
    power_β̂_builtin = sum(p_val_Mat_builtin.<0.05, dims=2)./rep_time # 1.(f)    
    return β̂_Mat_builtin, p_val_Mat_builtin, median_β̂_builtin, power_β̂_builtin
end

β̂_Mat1, p_val_Mat1, median_β̂1, power_β̂1 = First(1000)

println(median_β̂1.-[1.0; 0.75; 0.0; 0.01; 0.0; -0.1])
println(power_β̂1)

[-0.022626215401090888; -0.01967510405241346; 0.0031795361697899644; -0.0008364421665377816; -0.004973402767749414; 0.0028496083330960265;;]
[0.328; 0.186; 0.056; 0.046; 0.049; 0.682;;]


We can see that the defined biases are quite small. However, the power of the test is very low. For example, the power of $β_{3}$ is almost similar to the rejection rate of $β_{2}$ and $β_{4}$, whose true values are 0. In return, Type I error is less occurring, i.e., almost exact.

Now let's move on to the second question.

2. Re-run that exercise, but with heterskedasticity. Specifically, let the standard deviation of $ϵ_{it}$ equal to the Decision ID number. So, for example, everyone whose choice on decision 8 has noise $ϵ_{it}\sim N(0,8)$. Re-generate these errors with that structure, and then re-generate the Choice data using the same linear equation as above.

(a) This time, run 100 regressions of OLS, then 100 each with HC0, HC1, HC2, and HC3.

In [4]:
function pnorm_std(x::Real) #Since I will make the manual code fot the test, we need to define this
    p = cdf(Normal(0,1),x)
    return p
end

function Second(rep_time)
    SubjectID, DecisionID, Constant, Treatment, SessionID, SwitchPoint, Gender, Complexity = environment();
    n_i=length(unique(SubjectID));n_t=length(unique(DecisionID));
    X = [Constant Treatment SessionID SwitchPoint Gender Complexity];
    β = [1.0; 0.75; 0.0; 0.01; 0.0; -0.1]; n_β = length(β);
    β̂_Mat = zeros(n_β,rep_time);
    ϵ = zeros(n_i*n_t);
    p_val_Mat_built = zeros(n_β,rep_time);
    p_val_Mat_HC0 = zeros(n_β,rep_time);
    p_val_Mat_HC1 = similar(p_val_Mat_HC0);
    p_val_Mat_HC2 = similar(p_val_Mat_HC0);
    p_val_Mat_HC3 = similar(p_val_Mat_HC0);
    for s = 1:rep_time
        for j = 1:(n_i*n_t)
            ϵ[j] = rand(Normal(0,DecisionID[j]))
        end
        Choice = X * β + ϵ; df=(;Choice,Treatment,SessionID, SwitchPoint, Gender, Complexity);
        β̂  = (X' * X) \ (X' * Choice); Choicê = X * β̂ ; ϵ̂  = Choice .- Choicê;
        Var_β̂_HC0 = inv(X' * X) * X' * Diagonal(diag(ϵ̂  * ϵ̂' )) * X * inv(X' * X);
        Var_β̂_HC1 = Var_β̂_HC0 .* (n_i*n_t) ./ (n_i*n_t - n_β);
        Var_β̂_HC2 = inv(X' * X) * X' * Diagonal(diag(ϵ̂  * ϵ̂') ./ (1 .- diag(X*inv(X'*X)*X'))) * X * inv(X' * X);
        Var_β̂_HC3 = inv(X' * X) * X' * Diagonal(diag(ϵ̂  * ϵ̂') ./ ((1 .- diag(X*inv(X'*X)*X')).^2)) * X * inv(X' * X);
        t_val_HC0 = β̂  ./ sqrt.(diag( Var_β̂_HC0 ));
        t_val_HC1 = β̂  ./ sqrt.(diag( Var_β̂_HC1 ));
        t_val_HC2 = β̂  ./ sqrt.(diag( Var_β̂_HC2 ));
        t_val_HC3 = β̂  ./ sqrt.(diag( Var_β̂_HC3 ));
        lm_builtin = lm(@formula(Choice~Treatment+SessionID+SwitchPoint+Gender+Complexity),df);
        for l = 1:n_β
            p_val_Mat_HC0[l,s] = ifelse(t_val_HC0[l]>0, 1-pnorm_std(t_val_HC0[l])+pnorm_std(-t_val_HC0[l]), pnorm_std(t_val_HC0[l])+1-pnorm_std(-t_val_HC0[l]));
            p_val_Mat_HC1[l,s] = ifelse(t_val_HC1[l]>0, 1-pnorm_std(t_val_HC1[l])+pnorm_std(-t_val_HC1[l]), pnorm_std(t_val_HC1[l])+1-pnorm_std(-t_val_HC1[l]));
            p_val_Mat_HC2[l,s] = ifelse(t_val_HC2[l]>0, 1-pnorm_std(t_val_HC2[l])+pnorm_std(-t_val_HC2[l]), pnorm_std(t_val_HC2[l])+1-pnorm_std(-t_val_HC2[l]));
            p_val_Mat_HC3[l,s] = ifelse(t_val_HC3[l]>0, 1-pnorm_std(t_val_HC3[l])+pnorm_std(-t_val_HC3[l]), pnorm_std(t_val_HC3[l])+1-pnorm_std(-t_val_HC3[l]));
        end
        p_val_Mat_built[:,s] = coeftable(lm_builtin).cols[4]
        β̂_Mat[:,s] = β̂ 
    end # end from 1.(a) to 1.(d)
    median_β̂  = median(β̂_Mat, dims=2); # 1.(e)
    power_β̂_builtin = sum(p_val_Mat_built.<0.05, dims=2)./rep_time
    power_β̂_HC0  = sum(p_val_Mat_HC0.<0.05, dims=2)./rep_time
    power_β̂_HC1  = sum(p_val_Mat_HC1.<0.05, dims=2)./rep_time
    power_β̂_HC2  = sum(p_val_Mat_HC2.<0.05, dims=2)./rep_time
    power_β̂_HC3  = sum(p_val_Mat_HC3.<0.05, dims=2)./rep_time
    p_val_Mat = [p_val_Mat_built;;;p_val_Mat_HC0;;;p_val_Mat_HC1;;;p_val_Mat_HC2;;;p_val_Mat_HC3]
    return β̂_Mat, p_val_Mat, median_β̂ , power_β̂_builtin, power_β̂_HC0, power_β̂_HC1, power_β̂_HC2, power_β̂_HC3
end

@time β̂_Mat2, p_val_Mat2, median_β̂2 , power_β̂_builtin2, power_β̂_HC02, power_β̂_HC12, power_β̂_HC22, power_β̂_HC32 = Second(100)

powerdata=DataFrame(nonrobust=vec(power_β̂_builtin2),HC0=vec(power_β̂_HC02),HC1=vec(power_β̂_HC12),HC2=vec(power_β̂_HC22),HC3=vec(power_β̂_HC32));
println(powerdata);
open_html_table(powerdata);

 16.094367 seconds (14.71 M allocations: 13.151 GiB, 6.86% gc time, 104.38% compilation time: 5% of which was recompilation)


[1m6×5 DataFrame[0m
[1m Row [0m│[1m nonrobust [0m[1m HC0     [0m[1m HC1     [0m[1m HC2     [0m[1m HC3     [0m
     │[90m Float64   [0m[90m Float64 [0m[90m Float64 [0m[90m Float64 [0m[90m Float64 [0m
─────┼───────────────────────────────────────────────
   1 │      0.2      0.31     0.31     0.31     0.3
   2 │      0.12     0.13     0.13     0.13     0.13
   3 │      0.08     0.08     0.07     0.07     0.07
   4 │      0.05     0.05     0.05     0.05     0.05
   5 │      0.08     0.08     0.08     0.08     0.08
   6 │      0.51     0.46     0.46     0.46     0.46




(b) Which methods give unbiased estimates?

Since regression is the same for all the methods, they are all unbiased estimates.

(c) Among methods with unbiased estimates, which has the most power (when $\beta_{j}\neq 0$).

I felt that it is a little vague to check this when there is only 100 repetition. So I have chosen to do this with 1000 repetition. As can be seen from the power table, within heteroskedasticity robust tests, HC0 has the largest power and power keeps go down from HC1 to HC3. When compared to non-robust one, for some $\beta$, non-robust reject more and for the others, robust tests reject more.

3. Now let's add session effects. The way to do this is first create a random column $ϵ_{it}\sim N(0,7.5)$ of uncorrelated errors. Then, for each session $S$, generate a \textit{single} random number $u_{S}\sim N(0,7.5)$. Finally, generate the Choice variable using the linear equation
\begin{align*}
y_{it}=\beta_{0}+\beta_{1}T_{i}+\beta_{2}S_{i}+\beta_{3}W_{i}+β_{4}G_{i}+\beta_{5}X_{t}+u_{S_{i}}+ϵ_{it}.
\end{align*}
Use the same values for $\beta$ as above.

(a) Run 100 regressions of OLS, ignoring the session effects. Are your estimates biased? Is your power affected? Are the tests still valid (rejecting 5% of the time when $\beta=0$)? Pay special attention to $\hat{\beta}_{2}$, which measures session effects directly.

In [6]:
function Third_a(rep_time)
    SubjectID, DecisionID, Constant, Treatment, SessionID, SwitchPoint, Gender, Complexity = environment();
    n_i=length(unique(SubjectID));n_t=length(unique(DecisionID));n_s=length(unique(SessionID));
    X = [Constant Treatment SessionID SwitchPoint Gender Complexity];
    β = [1.0; 0.75; 0.0; 0.01; 0.0; -0.1]; n_β = length(β);
    β̂_Mat = zeros(n_β,rep_time);
    p_val_Mat_builtin = zeros(n_β,rep_time);
    for s = 1:rep_time
        ϵ = rand(Normal(0,7.5),n_i*n_t);u = repeat(rand(Normal(0,7.5),n_s),inner=Int64(n_i*n_t/n_s));
        Choice = X * β + ϵ + u; df=(;Choice,Treatment,SessionID, SwitchPoint, Gender, Complexity);
        lm_builtin = lm(@formula(Choice~Treatment+SessionID+SwitchPoint+Gender+Complexity),df);
        p_val_Mat_builtin[:,s] = coeftable(lm_builtin).cols[4]
        β̂_Mat[:,s] = coeftable(lm_builtin).cols[1]
    end # end from 1.(a) to 1.(d)
    bias = median(β̂_Mat, dims = 2) .- β;
    power_β̂_builtin = sum(p_val_Mat_builtin.<0.05, dims=2)./rep_time
    return β̂_Mat, bias, power_β̂_builtin
end

@time β̂_Mat3a, bias3a, power3a =Third_a(1000)

println(DataFrame(bias = vec(bias3a), power = vec(power3a)))

  0.496358 seconds (712.35 k allocations: 583.866 MiB, 16.36% gc time)
[1m6×2 DataFrame[0m
[1m Row [0m│[1m bias         [0m[1m power   [0m
     │[90m Float64      [0m[90m Float64 [0m
─────┼───────────────────────
   1 │ -0.0268553      0.669
   2 │ -0.289288       0.771
   3 │  0.01295        0.791
   4 │ -0.00750561     0.489
   5 │ -0.00317886     0.009
   6 │ -0.000104357    0.434


The bias for $\beta_{1}$ keeps look large. But biases for other $β$ s are not very severe always, including $β_{2}$. The biggest problem seems to be the power of $β_{2}$. It should be close to 0.05 to be valid because the true value is 0, but the rejection rate is far more high.

(b) Run 100 regressions using with clustering at the session level. Does this fix any bias? Does it give better power? Are tests valid?

Let's describe how the clustered variance-covariance matrix will look like.
\begin{align*}
Cov(u_{S_{i}}+ϵ_{it},u_{S_{j}}+ϵ_{jt})=
\begin{cases}  
V(u_{S_{i}}),& i\neq j\:\&\:S_{i}=S_{j}\\
V(u_{S_{i}})+V(ϵ_{it}),& i=j\\
0, & o/w.
\end{cases}
\end{align*}
Then, in terms of estimator, we can use the method given in several textbooks and also used in Stata, which is 
\begin{align*}
    &\hat{V}_{\hat{\beta}}=a_{n}(X'X)^{-1}\hat{Ω}(X'X)^{-1}\\
    \text{where }&\hat{Ω}=\sum_{s=1}^{20}X_{s}'\hat{e}_{s}\hat{e}_{s}'X_{s}\text{ and }a_{n}=\frac{n-1}{n-k}\frac{S}{S-1}.
\end{align*}
Here, $S$ is the number of clusters.

In [15]:
function Third_b(rep_time)
    SubjectID, DecisionID, Constant, Treatment, SessionID, SwitchPoint, Gender, Complexity = environment();
    n_i=length(unique(SubjectID));n_t=length(unique(DecisionID));n_s=length(unique(SessionID));
    X = [Constant Treatment SessionID SwitchPoint Gender Complexity];
    β = [1.0; 0.75; 0.0; 0.01; 0.0; -0.1]; n_β = length(β);
    β̂_Mat = zeros(n_β,rep_time);
    p_val_Mat_cluster = zeros(n_β,rep_time);
    for s = 1:rep_time
        ϵ = rand(Normal(0,7.5),n_i*n_t);u = repeat(rand(Normal(0,7.5),n_s),inner=Int64(n_i*n_t/n_s));
        Choice = X * β + ϵ + u; df=(;Choice,Treatment,SessionID, SwitchPoint, Gender, Complexity);
        lm_builtin = lm(@formula(Choice~Treatment+SessionID+SwitchPoint+Gender+Complexity),df);
        β̂_Mat[:,s] = coeftable(lm_builtin).cols[1];
        resd = Choice .- fitted(lm_builtin);
        part_resd = collect(Iterators.partition(resd,Int64(n_i*n_t/n_s))); # partitioning the residuals wrt clusters
        part_X = [X[i:min(i+Int64(n_i*n_t/n_s)-1,Int64(n_i*n_t)),:] for i in 1:Int64(n_i*n_t/n_s):Int64(n_i*n_t)]; # partitioning the covariates wrt clusters
        Ω̂  = zeros(n_β,n_β);
        for j = 1:n_s
            X_g = part_X[j]
            resd_g = part_resd[j]
            Ω̂  = Ω̂  + X_g' * resd_g * resd_g' * X_g
        end
        a = (n_i*n_t-1)/(n_i*n_t-n_β)*n_s/(n_s-1);
        Var_β̂_cluster = a .* inv(X' * X) * Ω̂  * inv(X' * X);
        t_val_cluster = β̂_Mat[:,s] ./ sqrt.(diag( Var_β̂_cluster ));
        for l = 1:n_β
            p_val_Mat_cluster[l,s] = ifelse(t_val_cluster[l]>0, 1-pnorm_std(t_val_cluster[l])+pnorm_std(-t_val_cluster[l]), pnorm_std(t_val_cluster[l])+1-pnorm_std(-t_val_cluster[l]));
        end
    end # end from 1.(a) to 1.(d)
    bias = median(β̂_Mat, dims = 2) .- β;
    power_β̂_cluster = sum(p_val_Mat_cluster.<0.05, dims=2)./rep_time
    return β̂_Mat, bias, power_β̂_cluster
end

@time β̂_Mat3b, bias3b, power3b =Third_b(1000)

println(DataFrame(bias = vec(bias3b), power = vec(power3b)))

  1.115893 seconds (1.11 M allocations: 1.365 GiB, 19.94% gc time, 15.32% compilation time)
[1m6×2 DataFrame[0m
[1m Row [0m│[1m bias         [0m[1m power   [0m
     │[90m Float64      [0m[90m Float64 [0m
─────┼───────────────────────
   1 │  0.202582       0.103
   2 │ -0.09204        0.101
   3 │ -0.0082413      0.106
   4 │  0.0175407      0.073
   5 │  0.0242729      0.05
   6 │ -0.000971089    0.687


Since we don't change the way we estimate the $β$ s, there is no systematical reason that bias should change. However, it is quite prominent that the power of the most of the coefficients has dropped significantly. But still it is not valid as can be seen from the table. Being not sure about my manual code for clustered variance-covariance matrix, I used the package "FixedEffectModels" to double check. 

In [16]:
pkg"add FixedEffectModels"
using FixedEffectModels

function Third_b_ver2(rep_time)
    SubjectID, DecisionID, Constant, Treatment, SessionID, SwitchPoint, Gender, Complexity = environment();
    n_i=length(unique(SubjectID));n_t=length(unique(DecisionID));n_s=length(unique(SessionID));
    X = [Constant Treatment SessionID SwitchPoint Gender Complexity];
    β = [1.0; 0.75; 0.0; 0.01; 0.0; -0.1]; n_β = length(β);
    β̂_Mat = zeros(n_β,rep_time);
    p_val_Mat_cluster = zeros(n_β,rep_time);
    for s = 1:rep_time
        ϵ = rand(Normal(0,7.5),n_i*n_t);u = repeat(rand(Normal(0,7.5),n_s),inner=Int64(n_i*n_t/n_s));
        Choice = X * β + ϵ + u; df=(;Choice,Treatment,SessionID, SwitchPoint, Gender, Complexity);
        lm_builtin = reg(df, @formula(Choice~Treatment+SessionID+SwitchPoint+Gender+Complexity), Vcov.cluster(:SessionID));
        β̂_Mat[:,s] = coef(lm_builtin);
        p_val_Mat_cluster[:,s] = coeftable(lm_builtin).cols[4];
    end # end from 1.(a) to 1.(d)
    bias = median(β̂_Mat, dims = 2) .- β;
    power_β̂_cluster = sum(p_val_Mat_cluster.<0.05, dims=2)./rep_time
    return β̂_Mat, bias, power_β̂_cluster
end

@time β̂_Mat3b_ver2, bias3b_ver2, power3b_ver2 = Third_b_ver2(1000)
# We have to adjust the order of rows to match since FixedEffectModels give intercept at the last row.
β̂_Mat3b_ver2 = vcat(β̂_Mat3b_ver2[size(β̂_Mat3b_ver2,1),:]',β̂_Mat3b_ver2);
β̂_Mat3b_ver2 = β̂_Mat3b_ver2[1:size(β̂_Mat3b_ver2,1)-1]
bias3b_ver2 = vcat(bias3b_ver2[size(bias3b_ver2,1),:]',bias3b_ver2);
bias3b_ver2 = bias3b_ver2[1:size(bias3b_ver2,1)-1,:];
power3b_ver2 = vcat(power3b_ver2[size(power3b_ver2,1),:]',power3b_ver2);
power3b_ver2 = power3b_ver2[1:size(power3b_ver2,1)-1,:];

println(DataFrame(bias = vec(bias3b_ver2), power = vec(power3b_ver2)))

[32m[1m   Resolving[22m[39m package versions...


[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`


  0.562303 seconds (1.17 M allocations: 366.172 MiB, 8.66% gc time)
[1m6×2 DataFrame[0m
[1m Row [0m│[1m bias         [0m[1m power   [0m
     │[90m Float64      [0m[90m Float64 [0m
─────┼───────────────────────
   1 │  0.000472523    0.09
   2 │  0.0678176      0.073
   3 │  0.167526       0.066
   4 │ -0.0128377      0.053
   5 │  0.00037663     0.037
   6 │  0.0141433      0.629


Now the result shows that the variance-covariance may not be very different because the characteristics of power doesn't seem to be very much different.

(c) Run 100 regressions using random effects. Note that the $u_{S}$ is highly correlated with one of the $X$ columns, so we're told this is not valid. Analyze bias, power, and validity.

Now, to use the built-in package, I'll open "Econometrics" package and work with it.

In [17]:
pkg"add Econometrics"
using Econometrics



[32m[1m   Resolving[22m[39m package versions...


[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\LG\.julia\environments\reg_env\Manifest.toml`
