# Exercises For Session 9 (Numpy and Numba)

## Exercise 1: A primer on Numpy

Solve the exercises below to become acquainted with numpy.

### Exercise 1.1: Warm-up
Create a new function, call it "numpy_trial". Inside the function, do the following:

- Initialize a 4x4 matrix and fill it with random numbers using a loop. (Hint: consider using [numpy.zeros](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html))

- Draw a random array with the same shape as before, but without using any loop. (Hint: [numpy.random.rand](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.rand.html))

- Multiply the two matrices and compute the inverse of the result (if it exists).

- Reshape the final matrix. (Hint: [numpy.reshape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html))

- Run the function and display the final result. (Hint: use print())
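
To compare against your own attempt, here is one possible sketch of the steps above. It rests on a few assumptions not stated in the exercise: "multiply" is read as the matrix product, the final matrix is reshaped to 2 rows, and the `n` and `seed` arguments are added only for illustration.

```python
import numpy as np

def numpy_trial(n=4, seed=0):
    """Warm-up sketch: build, multiply, invert, and reshape random matrices."""
    np.random.seed(seed)  # fixed seed so repeated runs give the same numbers

    # Step 1: allocate an n x n matrix of zeros and fill it entry by entry.
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = np.random.rand()

    # Step 2: draw a second n x n matrix in one vectorized call.
    B = np.random.rand(n, n)

    # Step 3: matrix product (not the element-wise A * B), then its inverse.
    C = A @ B
    try:
        C_inv = np.linalg.inv(C)
    except np.linalg.LinAlgError:
        return None  # C is singular, so no inverse exists

    # Step 4: reshape the n x n inverse into a matrix with 2 rows.
    return np.reshape(C_inv, (2, -1))

result = numpy_trial()
print(result)
```

`np.linalg.inv` raises `LinAlgError` for an exactly singular matrix, which is why the inverse is wrapped in `try`/`except`.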

### Exercise 1.2: Numpy for econometrics
The script below generates a random panel data set.

```python
# Initialization
import numpy as np
import pandas as pd

# Setup
np.random.seed(208)
ID      = 20                                    # number of cross-sectional units
Periods = 5                                     # number of time periods per unit
beta    = np.array([1, 0.5, 1.4, 3, 0.2, 5])    # True values

# Define function
def create_data(ID, Periods, beta):

    # Mean vector and covariance matrix of the (log) regressors
    data_mu  = np.array([1, 0.7, -0.25, 0.6, 0.4, -0.1])
    data_var = [ [ 1.0000,  -0.2962,    0.3144,    0.5061,   -0.0014,   0.0077],
                 [-0.2962,   1.0000,    0.3082,    0.0301,   -0.0101,   0.5034],
                 [ 0.3144,   0.3082,    1.0000,    0.7012,    0.6674,   0.6345],
                 [ 0.5061,   0.0301,    0.7012,    1.0000,    0.1950,   0.2173],
                 [-0.0014,  -0.0101,    0.6674,    0.1950,    1.0000,   0.1860],
                 [ 0.0077,   0.5034,    0.6345,    0.2173,    0.1860,   1.0000] ]

    # Period index: each period label repeated ID times
    year = np.sum(np.kron(np.linspace(1,Periods,Periods),np.identity(ID)),0)
    # Unit index: the sequence 1,...,ID repeated Periods times
    idx  = np.sum(np.kron(np.identity(Periods),np.linspace(1,ID,ID)),0)

    # Regressors are log-normal: exponentiate multivariate normal draws
    X    = np.exp(np.random.multivariate_normal(data_mu, data_var, ID*Periods))
    # Outcome: linear model with standard normal errors
    y    = X @ beta + np.random.normal(0,1,ID*Periods)

    # Columns: year, idx, the six regressors, y
    data = np.c_[year, idx, X, y]

    return data

# Call function
data = create_data(ID, Periods, beta)
#print(pd.DataFrame(data))
```

Take a few moments to understand how the data are simulated. To create "year" and "idx", I use the Kronecker product ([np.kron](https://docs.scipy.org/doc/numpy/reference/generated/numpy.kron.html)) to build a matrix holding the year and ID indices. I then sum over its rows (axis 0) to collapse it into a vector with the desired indices.
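
To see the Kronecker trick at work, the sketch below repeats it on a deliberately small, hypothetical panel (3 units, 2 periods) so that the intermediate matrices can be printed and inspected. The `np.repeat` / `np.tile` lines are an alternative added for comparison, not part of the original script.

```python
import numpy as np

ID, Periods = 3, 2  # small hypothetical panel: 3 units observed for 2 periods

# Period labels times an identity block: shape (ID, ID*Periods).
year_block = np.kron(np.linspace(1, Periods, Periods), np.identity(ID))
year = np.sum(year_block, 0)   # collapse the rows -> [1. 1. 1. 2. 2. 2.]

# Identity over periods times the unit labels: shape (Periods, ID*Periods).
idx_block = np.kron(np.identity(Periods), np.linspace(1, ID, ID))
idx = np.sum(idx_block, 0)     # collapse the rows -> [1. 2. 3. 1. 2. 3.]

# The same index vectors without building the intermediate matrices.
year_alt = np.repeat(np.arange(1, Periods + 1), ID)
idx_alt = np.tile(np.arange(1, ID + 1), Periods)

print(year_block)
print(idx_block)
print(year, idx)
```

Each column of the intermediate matrices has exactly one nonzero entry, so summing over the rows simply picks that entry out.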

Create a new function, call it "Pooled_OLS", that takes (y, X) as inputs and returns the Pooled OLS estimator of $\beta$ as output. Inside the function, do the following:

- Using matrix multiplication, create your own Pooled OLS estimator of the vector beta. (Hint: use [np.linalg.inv](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.inv.html))

- Compute the Pooled OLS standard errors.
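
For reference, these are the textbook formulas the two steps above rely on. They assume homoskedastic, uncorrelated errors, which is the assumption behind the "nonrobust" covariance type in the statsmodels output further down. Here $N$ is the number of observations (ID times Periods), $K$ is the number of regressors, and $\hat{e} = y - X\hat{\beta}$ is the residual vector.

$$
\hat{\beta} = (X'X)^{-1} X'y, \qquad
\hat{\sigma}^2 = \frac{\hat{e}'\hat{e}}{N - K}, \qquad
\widehat{\mathrm{se}}(\hat{\beta}_k) = \sqrt{\hat{\sigma}^2 \left[ (X'X)^{-1} \right]_{kk}}
$$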

Run it with and without the [@njit](https://numba.pydata.org/numba-doc/dev/user/performance-tips.html) decorator. Do you notice any difference? Feel free to experiment a bit.
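
As a hedged illustration of how such a timing comparison can be set up, the sketch below applies `njit` to a hypothetical toy function, `sum_of_squares`, rather than to "Pooled_OLS" itself. The helper `best_time` and the `ImportError` fallback (which turns `njit` into a no-op if numba is not installed) are assumptions made for the example. Note that the first call to a jitted function includes compilation time, so it is made once before timing.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: numba may be missing; the decorator then does nothing.
    def njit(func):
        return func

def sum_of_squares(x):
    """Hypothetical toy kernel: an explicit Python loop over a 1-D array."""
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

# Compiled counterpart of the same function.
sum_of_squares_jit = njit(sum_of_squares)

def best_time(func, arg, repeats=3):
    """Smallest wall-clock time over a few repeated calls."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        times.append(time.perf_counter() - start)
    return min(times)

x = np.random.rand(200_000)
sum_of_squares_jit(x)  # warm-up call: triggers compilation, excluded from timing

print("pure Python:", best_time(sum_of_squares, x))
print("with njit:  ", best_time(sum_of_squares_jit, x))
```

Explicit loops like this one are where numba tends to help most; code that already spends its time inside vectorized NumPy calls may show a much smaller difference.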

**Note**. If you write the "Pooled_OLS" function correctly, you should get the same results (up to floating-point error) as Pooled OLS run with the Python package statsmodels. It is reassuring to see that the estimated values are very close to the true ones.

```python
import statsmodels.api as sm
model = sm.OLS(data[:,8], data[:,2:8])
results = model.fit()
print(results.summary())
```

```
                                 OLS Regression Results                                
Dep. Variable:                      y   R-squared (uncentered):                   0.999
Model:                            OLS   Adj. R-squared (uncentered):              0.999
Method:                 Least Squares   F-statistic:                          1.755e+04
Date:                Tue, 21 Jan 2020   Prob (F-statistic):                   5.45e-141
Time:                        08:41:35   Log-Likelihood:                         -137.48
No. Observations:                 100   AIC:                                      287.0
Df Residuals:                      94   BIC:                                      302.6
Df Model:                           6                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------
```

## Exercise 2: Getting Serious
Propose your own solutions to Exercises 1 and 2 of the NumPy lecture at [QuantEcon](https://python.quantecon.org/numpy.html).