# Linear Regression

In [1]:
# Import of packages
import numpy as np


# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

We're given the following linear equation: 

$$y_i = \beta_0 + \beta_1 x_{1,i}+\beta_2 x_{2,i}+\epsilon_i $$

Based on the information in the exam questions, we assume access to data on the independent variables $(x_{1,i},x_{2,i})$ and the dependent variable $(y_i)$ for $N$ individuals, where $i$ indexes individuals.

The variable $\epsilon_i$ is a mean-zero stochastic shock.
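
For reference, the model can be written in matrix form. This is a sketch in the standard textbook convention, which is an assumption here and not notation taken from the exam text: $\mathbf{y}$ is the $N \times 1$ vector of outcomes and $\mathbf{X}$ is the $N \times 3$ design matrix whose $i$-th row is $(1, x_{1,i}, x_{2,i})$. Ordinary least squares chooses the coefficients that minimize the sum of squared residuals:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \; (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$$

Setting the gradient to zero gives the normal equations and, provided $\mathbf{X}'\mathbf{X}$ is invertible, the closed-form estimator:

$$-2\,\mathbf{X}'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0} \quad\Rightarrow\quad \mathbf{X}'\mathbf{X}\,\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} \quad\Rightarrow\quad \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

Note that in the code for Question 1 the variable `X` holds the transpose of this design matrix (shape $3 \times N$), so the same estimator is computed there as `x_mul@y1` with `x_mul` equal to $(X x)^{-1} X$.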

## Data Generating Process 

In [2]:
def DGP(N):

    # a. independent variables
    x1 = np.random.normal(0,1,size=N)
    x2 = np.random.normal(0,1,size=N)

    # b. errors
    eps = np.random.normal(0,1,size=N)

    extreme = np.random.uniform(0,1,size=N)
    eps[extreme < 0.05] += np.random.normal(-5,1,size=N)[extreme < 0.05]
    eps[extreme > 0.95] += np.random.normal(5,1,size=N)[extreme > 0.95]

    # c. dependent variable
    y = 0.1 + 0.3*x1 + 0.5*x2 + eps
    return x1, x2, y

**Data accessible:**

In [3]:
np.random.seed(2020)
x1,x2,y = DGP(10000)
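
Since `DGP` does not return the shocks, the mean-zero claim cannot be checked directly on the data above. The sketch below repeats the error recipe from step b on its own to check it. The seed, the sample size and the use of `np.random.default_rng` are assumptions made for this check only, so the draws differ from those inside `DGP`.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility of this check
N = 100_000                     # assumed sample size, larger than in the notebook

# Baseline standard normal shocks, as in step b of DGP
eps = rng.normal(0, 1, size=N)

# Roughly 5% of shocks get a large negative shift and 5% a large positive one
extreme = rng.uniform(0, 1, size=N)
low = extreme < 0.05
high = extreme > 0.95
eps[low] += rng.normal(-5, 1, size=low.sum())
eps[high] += rng.normal(5, 1, size=high.sum())

eps_mean = eps.mean()                 # should be close to zero by symmetry
share_shifted = (low | high).mean()   # share of contaminated observations
print(f"mean of eps: {eps_mean:.4f}, shifted share: {share_shifted:.3f}")
```

The two shifts are symmetric around zero, so the shock stays mean-zero, but about one in ten observations is an outlier relative to the standard normal baseline.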

# Question 1

In [12]:
# Build the 3 x N matrix X with rows (1, x1, x2)
a = np.ones(len(x1))
X = np.vstack((a,x1,x2))
print('stacked:\n',X)

stacked:
 [[ 1.          1.          1.         ...  1.          1.
   1.        ]
 [-1.76884571  0.07555227 -1.1306297  ...  0.0370484   1.70892684
   2.06128052]
 [-0.18279442  0.78062368 -1.01220533 ... -1.44286811 -0.10668645
   0.55908184]]


In [13]:
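# Transpose and invert according to the beta_hat formula given
x = X.transpose()                 # N x 3 matrix, one row per individual
X_mul = X @ x                     # 3 x 3 cross-product matrix
X_mul_inv = np.linalg.inv(X_mul)
x_mul = X_mul_inv @ X             # 3 x N matrix that maps y to beta_hat
y1 = y[:,np.newaxis]              # y as an N x 1 column vector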

In [14]:
# Estimate the coefficient vector by least squares
# (rcond=None avoids the FutureWarning raised by older NumPy versions)
np.linalg.lstsq(x,y1,rcond=None)

(array([[0.0956821 ],
        [0.29294299],
        [0.50332771]]),
 array([37401.57637219]),
 3,
 array([100.44468527,  98.9919112 ,  98.41109419]))
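
The tuple returned by `np.linalg.lstsq` holds, in order, the coefficient estimates, the residual sum of squares, the rank of the design matrix and its singular values. As a minimal sketch of how this relates to the closed-form formula, the example below compares the two on a small synthetic sample. The seed, the sample size and the name `X_design` are assumptions for illustration and are not part of the exam data.

```python
import numpy as np

rng = np.random.default_rng(1)  # assumed seed for this illustration
N = 1_000                       # assumed small sample size

# Synthetic data with the same coefficients as the DGP, but plain normal errors
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
y = 0.1 + 0.3 * x1 + 0.5 * x2 + rng.normal(size=N)

# Design matrix with one row per observation: shape (N, 3)
X_design = np.column_stack((np.ones(N), x1, x2))

# Normal equations, solved without forming an explicit inverse
beta_normal = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

# lstsq returns (solution, residual sum of squares, rank, singular values)
beta_lstsq, rss, rank, sing_vals = np.linalg.lstsq(X_design, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
```

Both routes give the same estimates up to floating-point error. Solving the normal equations with `np.linalg.solve` is generally preferred over an explicit `np.linalg.inv` for numerical stability.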

# Question 2

In [20]:
# Closed-form beta_hat; it gives the same estimates as lstsq in Question 1
beta = x_mul@y1
beta0_hat = float(beta[0,0])
beta1_hat = float(beta[1,0])
beta2_hat = float(beta[2,0])

In [21]:
# Fitted values on the estimated plane (the scalar intercept broadcasts over x1 and x2)
y_predict = beta0_hat + beta1_hat*x1 + beta2_hat*x2
y_predict

array([-0.51449434,  0.51072415, -0.74499893, ..., -0.61970033,
        0.54260199,  0.98092116])
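
The fitted values above lie on the estimated plane only at the observed points. To draw the plane itself, one option is to evaluate it on a regular grid, sketched below. The coefficients are rounded from the `lstsq` output in Question 1, while the grid range and resolution are hypothetical choices.

```python
import numpy as np

# Coefficients rounded from the lstsq output in Question 1
beta0_hat, beta1_hat, beta2_hat = 0.0957, 0.2929, 0.5033

# Hypothetical grid covering most of the standard normal range of x1 and x2
grid = np.linspace(-3, 3, 25)
x1_grid, x2_grid = np.meshgrid(grid, grid)

# Height of the fitted plane at every grid point: shape (25, 25)
y_plane = beta0_hat + beta1_hat * x1_grid + beta2_hat * x2_grid

print(y_plane.shape)
```

The three arrays `x1_grid`, `x2_grid` and `y_plane` have matching shapes, so they could be passed to a 3D surface plot, for example matplotlib's `plot_surface`, with the observed data added as a scatter on top.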

# Conclusion