# **Data Analytics I**

# Consistency of OLS

**Author:** [Anthony Strittmatter](http://www.anthonystrittmatter.com)

We investigate the finite sample properties of the OLS estimator
\begin{equation*}
\hat{\beta} = (X'X)^{-1}X'Y,
\end{equation*}
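for increasing sample sizes. We conduct a Monte Carlo simulation study. The control variable $X$ follows a standard normal distribution. The error term $U$ is normally distributed with mean zero and, in the code below, a standard deviation of $X^2$. Hence, $U$ is mean independent of $X$ but heteroskedastic rather than fully independent. The outcome variable $Y$ follows the linear model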
\begin{equation*}
Y = X \beta + U,
\end{equation*}
for $\beta = 1$. We repeat the simulation 2,000 times. 
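
As a sketch of why the estimates should concentrate around $\beta$ as the sample size grows, substitute the linear model into the estimator. The argument below assumes i.i.d. observations, $E[X_i U_i] = 0$, and $0 < E[X_i^2] < \infty$; these conditions and the notation $N$ for the sample size are assumptions added here for illustration, applied to the scalar case of the simulation:
\begin{equation*}
\hat{\beta} = (X'X)^{-1}X'(X\beta + U) = \beta + \left(\frac{1}{N}X'X\right)^{-1}\frac{1}{N}X'U \overset{p}{\longrightarrow} \beta + E[X_i^2]^{-1}E[X_i U_i] = \beta.
\end{equation*}
The convergence step applies the law of large numbers to $\frac{1}{N}X'X$ and $\frac{1}{N}X'U$ separately.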

## Define the Input Factors

In [None]:
############## Define Input Factors ##############

# Define Sample Sizes
sample_size <- c(10, 50, 200, 800, 4000, 12000)
rep <- 2000 # Number of replications

print('Input factors defined.')

## Monte Carlo Simulation

In [None]:
############## Monte Carlo Simulation ##############

# Set the seed of the random number generator so that results can be replicated
set.seed(1001)

# Matrix to store the estimates: one row per replication, one column per sample size
beta <- matrix(NA, nrow = rep, ncol = length(sample_size))

# Loop over sample sizes
for (n in seq_along(sample_size)) {

    # Data generating process: each column holds one replication
    X <- matrix(rnorm(sample_size[n] * rep, mean = 0, sd = 1), nrow = sample_size[n], ncol = rep)
    # The error term has mean zero and standard deviation X^2 (heteroskedastic)
    U <- matrix(rnorm(sample_size[n] * rep, mean = 0, sd = X^2), nrow = sample_size[n], ncol = rep)
    Y <- X + U # True coefficient: beta = 1

    # Loop over replications
    for (i in seq_len(rep)) {

        # Estimate the OLS coefficient (X'X)^{-1} X'Y
        beta[i, n] <- solve(t(X[, i]) %*% X[, i]) %*% t(X[, i]) %*% Y[, i]

    }
}

print('Simulation executed.')
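
With a single regressor and no intercept, the matrix formula $(X'X)^{-1}X'Y$ reduces to a ratio of sums, so the inner loop over replications could in principle be replaced by column-wise operations. The following is a minimal sketch of that idea, not the approach used above; the names `n_obs`, `n_rep`, `beta_fast`, and `beta_check` and the chosen sizes are hypothetical and picked only to keep the example small.

```r
# Sketch: OLS slope for one regressor without intercept,
# computed for all replications at once.
set.seed(1001)
n_obs <- 200 # Illustrative sample size (assumption)
n_rep <- 500 # Illustrative number of replications (assumption)

# Same data generating process as in the simulation: one replication per column
X <- matrix(rnorm(n_obs * n_rep, mean = 0, sd = 1), nrow = n_obs, ncol = n_rep)
U <- matrix(rnorm(n_obs * n_rep, mean = 0, sd = X^2), nrow = n_obs, ncol = n_rep)
Y <- X + U # True coefficient: beta = 1

# sum(x * y) / sum(x^2) equals (x'x)^{-1} x'y for each column
beta_fast <- colSums(X * Y) / colSums(X^2)

# Cross-check against the matrix formula for the first replication
beta_check <- drop(solve(t(X[, 1]) %*% X[, 1]) %*% t(X[, 1]) %*% Y[, 1])
print(all.equal(beta_fast[1], beta_check))
```

The column-wise version avoids calling `solve` once per replication, which may matter for the larger sample sizes.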

## Consistency

We plot the distribution of the estimated OLS coefficients $\hat{\beta}$ for each sample size. The red vertical line marks the true value $\beta = 1$.

In [None]:
############## Consistency ##############

# Plot Panel
par(mfrow = c(2, 3))

# Histograms of the estimates, one panel per sample size
for (n in seq_along(sample_size)) {
    hist(beta[, n], xlim = c(-1, 3), freq = FALSE, main = paste("N =", sample_size[n]), xlab = "beta")
    abline(v = 1, col = "red") # True coefficient
}
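
The histograms can be complemented by a numerical summary. Consistency means that the probability of $\hat{\beta}$ lying within any fixed distance of $\beta$ approaches one as the sample size grows. The sketch below estimates this probability by simulation on a smaller, self-contained grid. The tolerance `eps`, the grid `sizes`, the replication count `n_rep`, and the table name `summary_table` are illustrative assumptions, and the slope is computed with the single-regressor shortcut `sum(x * y) / sum(x^2)`.

```r
# Sketch: numerical check of consistency on a reduced grid of sample sizes.
set.seed(1001)
sizes <- c(10, 50, 200, 800) # Illustrative sample sizes (assumption)
n_rep <- 500                 # Illustrative number of replications (assumption)
eps <- 0.25                  # Hypothetical tolerance around the true beta = 1

summary_table <- t(sapply(sizes, function(n_obs) {
    # Same data generating process as in the simulation above
    X <- matrix(rnorm(n_obs * n_rep, mean = 0, sd = 1), nrow = n_obs, ncol = n_rep)
    U <- matrix(rnorm(n_obs * n_rep, mean = 0, sd = X^2), nrow = n_obs, ncol = n_rep)
    Y <- X + U
    # OLS slope per replication (single regressor, no intercept)
    b <- colSums(X * Y) / colSums(X^2)
    c(N = n_obs,
      mean = mean(b),
      sd = sd(b),
      share_within_eps = mean(abs(b - 1) < eps))
}))

print(round(summary_table, 3))
```

If the estimator is consistent, the standard deviation column should shrink and the share of estimates within `eps` of one should rise toward one as `N` increases.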