---
title: "Topics in Econometrics and Data Science: Tutorial 8"

---

#### General Note

You will very likely find the solution to these exercises online. We, however, strongly encourage you to work on these exercises without doing so. Understanding someone else's solution is very different from coming up with your own. Use the lecture notes and try to solve the exercises independently.

# Section 2 cont'd: Linear Regression

## Exercise 3: Linear Regression: Inference II - Simulation for the Linear Regression Model (OLS)

Let us consider a univariate regression model 

$$Y = X\beta + \varepsilon,$$
where $Y$ is the outcome variable and $X$ is a regressor. The error term $\varepsilon$ is drawn from a normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$, independently of $X$. We are interested in inference on $\beta$.

In this exercise, we simulate data according to the regression model above in order to provide evidence in favor of the theoretical results we learned in the lecture.
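
For reference, here is a sketch of the algebra the simulation is meant to check. It assumes the model is estimated without an intercept (as written above) and conditions on $X$; the lecture may state the same results in matrix notation.

$$\hat{\beta} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2} = \beta + \frac{\sum_{i=1}^{n} X_i \varepsilon_i}{\sum_{i=1}^{n} X_i^2}, \qquad E[\hat{\beta} \mid X] = \beta, \qquad \mathrm{Var}(\hat{\beta} \mid X) = \frac{\sigma^2}{\sum_{i=1}^{n} X_i^2}.$$

With $\sigma = 1$ and $E[X^2] = 1$, the denominator grows like $n$, so the variance of $\hat{\beta}$ is roughly $1/n$ and its standard error shrinks at rate $1/\sqrt{n}$.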

### A)

Set up a data generating process (DGP) according to the regression model above. 

1. Write a function with inputs $\beta$ and $n$ and outputs $Y$ and $X$. Generate $X$ as $X \sim_{i.i.d.} N(0,1)$. \
\
**Hint:** You can use [`np.random.normal`](https://numpy.org/doc/2.0/reference/random/generated/numpy.random.normal.html).

In [72]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels.api as sm

In [None]:
# Set up a function for the DGP
def DGP(beta, n):

    
    return Y,X 

2. Use your function to generate $n=20$ observations for $Y$ and $X$ and use $\beta = 1$ in your example.

In [None]:
# Try if it works
n = 20
beta = 1


3. Run a linear regression on the data you generated in step 2. \
\
**Hint:** Use the package [`statsmodels`](https://www.statsmodels.org/dev/examples/notebooks/generated/ols.html).

In [None]:
#  Run a linear regression 
olssim = sm.OLS(...)

### B) 

Set up a simulation study to estimate the *Bias* and *Standard Error* of $\hat{\beta}$. How do the results change if $n$ increases, e.g. $n=10, 20, 30, 40, 50, 100, 200, 400$? Illustrate your results with an appropriate graphic. Do your results support the claim that the OLS estimator is an unbiased estimator? What can you say about estimation uncertainty?

In [None]:
# Estimation error (estimate minus true value) and standard error of the fit above
Bias = olssim.params[0] - beta
SE = olssim.bse[0]

print("Bias", Bias)
print("SE", SE)

In [None]:
# We start with a loop over a varying number of observations
beta = 1
nobs = [10, 20, 30, 40, 50, 100, 200, 400]

# We need to declare objects to save the results
Bias = np.zeros(len(nobs))
SE = np.zeros(len(nobs))
Estim = np.zeros(len(nobs))

np.random.seed(1234)

for i in range(0,len(nobs)):
    ...
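
One way the loop might be filled in is sketched below. To keep the sketch light on dependencies, it uses the closed-form no-intercept OLS formulas in a hypothetical helper `ols_no_intercept` (not part of the tutorial) instead of `statsmodels`; `sm.OLS(Y, X).fit()` should give the same `params[0]` and `bse[0]` up to floating-point error. Bias is taken here as estimate minus true value.

```python
import numpy as np


def DGP(beta, n):
    """Simulate n observations from Y = X * beta + eps (no intercept)."""
    X = np.random.normal(0.0, 1.0, size=n)
    eps = np.random.normal(0.0, 1.0, size=n)
    return X * beta + eps, X


def ols_no_intercept(Y, X):
    """Hypothetical helper: OLS slope and its standard error without intercept."""
    sxx = X @ X
    b_hat = (X @ Y) / sxx
    resid = Y - X * b_hat
    sigma2_hat = (resid @ resid) / (len(Y) - 1)  # one estimated parameter
    return b_hat, np.sqrt(sigma2_hat / sxx)


beta = 1
nobs = [10, 20, 30, 40, 50, 100, 200, 400]

Bias = np.zeros(len(nobs))
SE = np.zeros(len(nobs))
Estim = np.zeros(len(nobs))

np.random.seed(1234)

for i, n in enumerate(nobs):
    Y, X = DGP(beta, n)
    Estim[i], SE[i] = ols_no_intercept(Y, X)
    Bias[i] = Estim[i] - beta  # error of this single draw, not yet an average

for n, b, s in zip(nobs, Bias, SE):
    print(f"n={n:4d}  bias={b: .4f}  se={s:.4f}")
```

With only one draw per sample size, `Bias` is a single realization of the estimation error, so it will fluctuate around zero rather than shrink smoothly. A line plot of `SE` against `nobs`, for example `plt.plot(nobs, SE)`, is one option for the requested graphic.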

### C)

Repeat the simulation from before $R=100$ times and report the average results (i.e. the average *Bias*, *Standard Error* and $\hat{\beta}$ *estimate* over the $R$ repetitions). Illustrate your results with an appropriate graphic.

**Hint:** Write a function that executes the estimation from before automatically and that repeats the calculations $R$ times. The inputs of this function are $n$ and $R$ and the outputs are the average *Bias*, the average *Standard Error* and the average $\hat{\beta}$ *estimate*.
    

In [None]:
# Input: n (number of observations), R (number of repetitions)
# Given a number of observations, we want to repeat the simulation R times
# Output: mean Bias, mean SE and mean estimate

def SIM(n, R):
    
    
    return mBias, mSE, mEstim
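
A possible shape for `SIM` is sketched below, under the same assumptions as before (no intercept, closed-form OLS in a hypothetical helper `ols_no_intercept`). The extra keyword argument `beta=1` is an addition for this sketch; the hint only asks for `n` and `R` as inputs.

```python
import numpy as np


def DGP(beta, n):
    """Simulate n observations from Y = X * beta + eps (no intercept)."""
    X = np.random.normal(0.0, 1.0, size=n)
    eps = np.random.normal(0.0, 1.0, size=n)
    return X * beta + eps, X


def ols_no_intercept(Y, X):
    """Hypothetical helper: OLS slope and its standard error without intercept."""
    sxx = X @ X
    b_hat = (X @ Y) / sxx
    resid = Y - X * b_hat
    return b_hat, np.sqrt((resid @ resid) / (len(Y) - 1) / sxx)


def SIM(n, R, beta=1):
    """Repeat the simulation R times for sample size n and average the results."""
    estimates = np.zeros(R)
    ses = np.zeros(R)
    for r in range(R):
        Y, X = DGP(beta, n)
        estimates[r], ses[r] = ols_no_intercept(Y, X)
    mEstim = estimates.mean()
    mBias = mEstim - beta  # average estimate minus true value
    mSE = ses.mean()
    return mBias, mSE, mEstim


np.random.seed(1234)
nobs = [10, 20, 30, 40, 50, 100, 200, 400]

# One row per sample size; columns are mean bias, mean SE, mean estimate.
results = np.array([SIM(n, R=100) for n in nobs])

for n, (mb, ms, me) in zip(nobs, results):
    print(f"n={n:4d}  mean bias={mb: .4f}  mean se={ms:.4f}  mean estimate={me:.4f}")
```

Averaging over $R$ repetitions smooths out the noise of a single draw, so the mean bias should stay close to zero for every $n$ while the mean standard error falls as $n$ grows.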

### D) 

In the lecture, we learned that the t-statistic $t$ for the regression coefficient $\beta$ is asymptotically normal, i.e. approximately normally distributed when $n$ is large. Use the simulation study to provide evidence that in the setting above it holds that:
$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow[]{d} N(0,1)$$

**Hint:** Repeat the simulation from above. Now, we need to save the $\hat{\beta}$ so that we can compute $\sqrt{n}(\hat{\beta}-\beta)$. For this, write a new function similar to the one in part C. The inputs are $n$ and $R$ and the output is an array (of length $R$) that contains the $R$ estimates $\hat{\beta}$. Illustrate the results by generating a bar plot similar to that in Lecture 2 (slide 23).
   


In [None]:
# Modify the SIM Function

def SIM2(n, R):
    

            
    return Estim
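
A sketch of how `SIM2` and the normality check might look is given below. The choices $n = 400$ and $R = 1000$, the keyword argument `beta=1`, and the numerical summaries are assumptions made for illustration; the exercise itself does not fix them.

```python
import numpy as np


def DGP(beta, n):
    """Simulate n observations from Y = X * beta + eps (no intercept)."""
    X = np.random.normal(0.0, 1.0, size=n)
    eps = np.random.normal(0.0, 1.0, size=n)
    return X * beta + eps, X


def SIM2(n, R, beta=1):
    """Return the R OLS estimates from R independent samples of size n."""
    Estim = np.zeros(R)
    for r in range(R):
        Y, X = DGP(beta, n)
        Estim[r] = (X @ Y) / (X @ X)  # no-intercept OLS slope
    return Estim


np.random.seed(1234)
n, R, beta = 400, 1000, 1

# Centered and scaled estimates; these should look roughly N(0, 1).
z = np.sqrt(n) * (SIM2(n, R, beta) - beta)

print("mean:", z.mean())
print("standard deviation:", z.std(ddof=1))
print("share with |z| < 1.96:", np.mean(np.abs(z) < 1.96))
```

If the limit result holds, the mean should be near 0, the standard deviation near 1, and about 95% of the values should lie within $\pm 1.96$. For the plot, `plt.hist(z, bins=30, density=True)` overlaid with `stats.norm.pdf` on a grid of points gives a direct visual comparison with the standard normal density.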