# Test of how Correlated Regressors (a "notion" of Collinearity) affect the F-test

We will design the following data-generating process (DGP):

\begin{align*}
y_{i} &= 2 + 3\cdot x_{i,1} + 4\cdot x_{i,2}+\varepsilon_{i} \\
x_{i,2}&=\rho\cdot x_{i,1}+\nu_{i} \\
x_{i,1}&\sim\text{i.i.d. }\mathcal{U}(0,1) \\
\nu_{i}&\mid x_{i,1}\sim\text{i.i.d. }\mathcal{N}(0,2^{2}) \\
\varepsilon_{i}&\mid x_{i,1},x_{i,2}\sim\text{i.i.d. }\mathcal{N}(0,1)
\end{align*}

In the above, $\rho$ is a parameter that controls the degree to which $x_{1}$ and $x_{2}$ are correlated. Note that $\rho$ is a slope coefficient, not a correlation coefficient, which is why it can exceed 1. This is one mechanism for affecting correlation. Another would be to introduce a parameter that controls the dispersion of the noise $\nu$ added to $x_2$: the more dispersed the noise, the less correlated the two random variables; the less dispersed, the more correlated. If we did not add this noise, $x_1$ and $x_2$ would be perfectly collinear. The added noise induces a correlation rather than exact collinearity, and this is what is meant by "degree of collinearity" among random variables.
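To make the link between $\rho$ and the actual correlation concrete, the following is a minimal sketch of the population correlation implied by the DGP. It assumes $\operatorname{Var}(x_1)=1/12$ (the variance of a $\mathcal{U}(0,1)$ variable) and that $\nu$ has standard deviation 2, as in the simulation code below; the names `var_x1`, `sd_nu`, `sd_grid`, `corr_pop` and `corr_by_noise` are introduced here for illustration only.

```matlab
% Population correlation between x_1 and x_2 implied by the DGP.
% Assumptions: Var(x_1) = 1/12 for U(0,1), and nu has standard
% deviation 2 (matching 2*randn in the simulation code).
rho    = [1, 5, 10, 100];
var_x1 = 1/12;
sd_nu  = 2;

% Corr(x_1,x_2) = rho*sd(x_1) / sqrt(rho^2*Var(x_1) + Var(nu))
corr_pop = rho .* sqrt(var_x1) ./ sqrt(rho.^2 .* var_x1 + sd_nu^2);

% Alternative mechanism: hold rho = 1 fixed and shrink the noise instead.
sd_grid       = [2, 1, 0.5, 0.1];   % hypothetical noise levels
corr_by_noise = sqrt(var_x1) ./ sqrt(var_x1 + sd_grid.^2);

disp(corr_pop)
disp(corr_by_noise)
```

Under these assumptions the four values of $\rho$ correspond to population correlations of roughly 0.14, 0.59, 0.82 and 0.998. The sample correlation in any one simulated data set will differ somewhat from these figures.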

Under this DGP the OLS estimates of the $\beta$s remain unbiased and consistent for any degree of correlation between the two regressors short of perfect collinearity, so we will not be looking at the point estimates. Rather, we will study the effect of correlation between regressors on the standard errors of the parameters and its effect on hypothesis testing.
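The channel through which correlation reaches the standard errors can be written down with the standard textbook variance formula for an OLS slope under homoskedastic errors. This is a general result and not something specific to this notebook; $SST_j$ and $R_j^2$ are notation introduced here.

\begin{align*}
\operatorname{Var}(\hat{\beta}_{j}\mid X) &= \frac{\sigma^{2}}{SST_{j}\,(1-R_{j}^{2})}, \qquad SST_{j}=\sum_{i=1}^{N}(x_{i,j}-\bar{x}_{j})^{2}
\end{align*}

Here $R_j^2$ is the $R^2$ from regressing $x_j$ on the other regressors (including the constant). With only two regressors, $R_1^2=R_2^2=r_{12}^2$, the squared sample correlation between $x_1$ and $x_2$. The factor $1/(1-R_j^2)$ is the variance inflation factor: as the correlation approaches 1, it grows without bound.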

We will use a sample size of $N=50$ and vary $\rho$ over $\{1,5,10,100\}$, running 1000 simulations for each value.

In [124]:
%This cell initializes the process and the variables

%Sample Size
N = 50;

%Number of simulations
T = 1000;

%True betas
beta = [2,3,4]';
K = size(beta,1);

%Correlation parameters
rho = [1,5,10,100];

%Initialize the simulation results
beta_sim = NaN(size(beta,1),T,size(rho,2));
t_sim = NaN(size(beta,1),T,size(rho,2));
F_sim = NaN(T,size(rho,2));

%F-test R and r matrices (global sig of x_1,x_2)
R = [0,1,0;0,0,1];
r = [0;0];
q = size(r,1);

%Set the value of x_1 and the constant (drawn once, held fixed across simulations)
x_1 = rand(N,1);
const = ones(N,1);

In [125]:
%Cell for the Monte-Carlo simulation

parfor p=1:size(rho,2)

    for t=1:T
    
        %generate the x_2
        x_2 = rho(p)*x_1 + 2*randn(N,1);
        
        %generate the design matrix
        X = [const x_1 x_2];
        
        %generate the Y vector from DGP equation
        Y = X*beta + randn(N,1);
        
        %estimate the betas by OLS
        beta_sim_1 = (X'*X)\(X'*Y);
        beta_sim(:,t,p) = beta_sim_1;
        
        %estimate the error variance from the residuals
        resid = Y - X*beta_sim_1;
        sig2 = (resid'*resid)/(N-K);
        
        %covariance matrix of the estimates and the standard errors
        V = sig2*((X'*X)\eye(K));
        SE = sqrt(diag(V));
        
        %generate the t values (H0: beta_k = 0):
        t_sim(:,t,p)=beta_sim_1./SE;
        
        %generate the F values (H0: R*beta = r):
        d = R*beta_sim_1 - r;
        F_sim(t,p) = (d'*((R*V*R')\d))/q;
    
    end
    
end
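As a sanity check on the F-statistic used in the loop, the sketch below compares the Wald form with the equivalent form based on restricted and unrestricted sums of squared residuals. For the null that both slopes are zero, the restricted model contains only the constant. The seed, the choice $\rho=5$, and the names `SSR_u`, `SSR_r`, `F_ssr` and `F_wald` are illustrative assumptions, not part of the original simulation.

```matlab
% Check that the Wald form of the F-statistic matches the SSR form
% for H0: beta_1 = beta_2 = 0. Illustrative values only.
rng(0);                              % arbitrary seed for reproducibility
N = 50; K = 3; q = 2;
x_1 = rand(N,1);
x_2 = 5*x_1 + 2*randn(N,1);          % rho = 5 chosen for illustration
X   = [ones(N,1) x_1 x_2];
Y   = X*[2;3;4] + randn(N,1);

b     = X\Y;                         % unrestricted OLS
SSR_u = sum((Y - X*b).^2);           % unrestricted sum of squared residuals
SSR_r = sum((Y - mean(Y)).^2);       % restricted model: constant only

% SSR form of the F-statistic
F_ssr = ((SSR_r - SSR_u)/q) / (SSR_u/(N-K));

% Wald form, as in the simulation loop (r = 0 here)
R      = [0,1,0;0,0,1];
sig2   = SSR_u/(N-K);
d      = R*b;
F_wald = (d'*((sig2*R*((X'*X)\R'))\d))/q;

fprintf('F (SSR form)  = %.6f\nF (Wald form) = %.6f\n', F_ssr, F_wald);
```

The two numbers should agree up to floating-point error, since the two forms are algebraically identical for linear restrictions.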

In [126]:
% percentage of simulations in which we fail to reject the null (5% level)
t_crit = tinv(1-0.05/2,N-K);
F_crit = finv(1-0.05,q,N-K);

Perc_accept_t = NaN(K,size(rho,2));
Perc_accept_F = NaN(1,size(rho,2));

for p=1:size(rho,2)

    Perc_accept_t(:,p) = sum(abs(t_sim(:,:,p))<t_crit,2)/T *100;
    Perc_accept_F(p) = sum(F_sim(:,p)<F_crit)/T*100;
end

In [128]:
Perc_accept_t
Perc_accept_F
mean_F = mean(F_sim)

Perc_accept_t =

         0         0         0         0
    0.3000    2.0000   13.2000   93.4000
         0         0         0         0


Perc_accept_F =

     0     0     0     0


mean_F =

   1.0e+05 *

    0.0175    0.0241    0.0434    2.3950


As we see from the above, as $\rho$ increases, and with it the correlation between $x_1$ and $x_2$, the share of simulations in which the t-test fails to reject the null hypothesis of insignificance for $x_1$ rises sharply: from 0.3% at $\rho=1$ to 93.4% at $\rho=100$. The t-tests for the constant and for $x_2$ reject the null in every run. The F-test likewise rejects the null hypothesis of joint insignificance of $x_1$ and $x_2$ in all runs, and the mean F-statistic grows with $\rho$, from about $1.75\times10^{3}$ at $\rho=1$ to about $2.4\times10^{5}$ at $\rho=100$.
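A rough back-of-the-envelope calculation suggests why only $x_1$ loses significance. It uses the standard homoskedastic OLS variance formula together with population approximations ($SST_1\approx N/12$ and $\nu$ with standard deviation 2, as in the code), so it should be read as an approximation rather than an exact account of the simulation.

\begin{align*}
\operatorname{Var}(\hat{\beta}_{1}\mid X) &= \frac{\sigma^{2}}{SST_{1}\,(1-r_{12}^{2})}, \qquad 1-r_{12}^{2}\approx\frac{48}{\rho^{2}+48} \\
\rho=100:\quad \operatorname{SE}(\hat{\beta}_{1}) &\approx \sqrt{\frac{1}{(50/12)\cdot(48/10048)}}\approx 7.1
\end{align*}

With a true $\beta_1=3$, a standard error near 7 gives a typical t-statistic well below the critical value, in line with the high non-rejection rate at $\rho=100$. For $x_2$ the situation is different: $SST_2\,(1-r_{12}^2)$ is the residual sum of squares from regressing $x_2$ on the constant and $x_1$, which depends only on $\nu$ and not on $\rho$, so the standard error of $\hat{\beta}_2$ is essentially unaffected.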