# Second portion of the final exam
## Instructions

This second portion of the final exam has 2 components.
1. `Theoretical questions:` you will be asked to provide short derivations and/or a proof. For the `theoretical questions`, please write your answers on a sheet of paper. When you are done, scan your work with your phone and upload it to Canvas, as you did for the assignments.
2. `Coding in Python:` For the `coding questions`, you are given code (below) that has a few missing parts. Your job is to complete it. The data for this exercise are in `ccapmmonthlydata.xls`. When you are done, upload your Python code to Canvas.

In essence, you should upload `two files` to Canvas: (1) a `pdf of your handwritten answers` and (2) `your completed code`. 

Please `note`: You have a total of `2 hours` to finish both portions of the final exam (i.e., the multiple-choice questions on Canvas under 'Quizzes' and this theoretical+coding problem). Please make sure you allocate your time efficiently.

Good Luck!


# GMM estimation of the Consumption CAPM
## Description of the model
Consider a representative investor who lives for two periods ($t$ and $t+1$) and has income $e_t$ in period $t$ and $e_{t+1}$ in period $t+1$. 
The utility function of the representative investor is: 
\begin{equation*}
U(c_t, c_{t+1})=u(c_t) + \beta \mathbb{E}_t[u(c_{t+1})].
\end{equation*}
Assume that the utility function is logarithmic, that is
\begin{equation}
u(c_t) = \log(c_t),
\end{equation}
where $\log$ is, as always, the natural logarithm.

The investor can invest in an asset by buying $\vartheta$ shares at the unit price $p_t$. The asset's payoff $x_{t+1}$ in the second period is uncertain. The investor chooses how many units ($\vartheta$) of the asset to buy in order to maximize her/his utility function:
\begin{equation*}
\underset{\vartheta}{\max} \ \log(c_t) + \beta \mathbb{E}_t\left[\log(c_{t+1})\right],
\end{equation*}
subject to the income/wealth constraints
\begin{eqnarray*}
c_t &=& e_t - \vartheta p_{t},\\
c_{t+1} &=& e_{t+1} + \vartheta x_{t+1}.
\end{eqnarray*}
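
As a numerical sanity check on the setup above, the following minimal sketch solves the investor's problem for a hypothetical two-state payoff and then evaluates $\mathbb{E}_{t}\left[\beta\left(\frac{c_{t}}{c_{t+1}}\right)x_{t+1}\right]$ at the optimum. All numbers (`beta`, `e_t`, `e_next`, `p`, the payoffs `x` and the probabilities `prob`) are illustrative assumptions, not part of the exam data, and the check does not replace the derivation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical inputs (illustrative assumptions only)
beta = 0.95
e_t, e_next = 10.0, 10.0          # income in the two periods
p = 1.0                           # price of the asset at time t
x = np.array([0.8, 1.4])          # payoff in each of two states
prob = np.array([0.5, 0.5])       # probability of each state

def neg_utility(theta):
    """Negative of log(c_t) + beta * E[log(c_{t+1})] for a holding theta."""
    c_t = e_t - theta * p
    c_next = e_next + theta * x
    return -(np.log(c_t) + beta * prob @ np.log(c_next))

# The bounds keep consumption strictly positive in both periods and states
res = minimize_scalar(neg_utility, bounds=(-7.0, 9.9), method='bounded')
theta_star = res.x

# Evaluate E[beta * (c_t / c_{t+1}) * x_{t+1}] at the optimal holding
c_t = e_t - theta_star * p
c_next = e_next + theta_star * x
euler_rhs = prob @ (beta * (c_t / c_next) * x)

print(f'optimal theta: {theta_star:.4f}')
print(f'price p: {p:.4f}, E[beta (c_t/c_t+1) x]: {euler_rhs:.4f}')
```

Up to the optimizer's tolerance, the second printed number should coincide with the price `p`.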

# `Question 1 (theory question)`
## Solve the maximization problem of the investor and show that the price of the asset at time $t$ can be written as:
$$
p_{t}=\mathbb{E}_{t}\left[\beta\left(\frac{c_{t}}{c_{t+1}}\right)x_{t+1}\right].
$$




# `Question 2 (theory question)`
## Show that the pricing equation above can be written in terms of returns on the asset
$$
1=\mathbb{E}_{t}\left[\beta\left(\frac{c_{t}}{c_{t+1}}\right)\left(1+R_{t+1}\right)\right].
$$




# `Question 3 (theory question)`
## Show that the equation in the previous question can be written as an unconditional mean, rather than as a conditional mean. In other words, show the steps to get from 
$$
1=\mathbb{E}_{t}\left[\beta\left(\frac{c_{t}}{c_{t+1}}\right)\left(1+R_{t+1}\right)\right],
$$

## to 
$$
\mathbb{E}\left[\beta\left(\frac{c_{t}}{c_{t+1}}\right)\left(1+R_{t+1}\right)-1\right] = 0.
$$


# `CODE`

The theoretical moment conditions are
$$
\mathbb{E}\left[\beta\left(\frac{c_{t}}{c_{t+1}}\right)\left(1+R_{t+1}^{i}\right)-1\right]=0 
\label{eq:theoretical_moments} 
$$
for all assets $i=1,...,10$. The logic is simple: if the model is correct, expected pricing errors (written in terms of returns) should be zero for all assets.

In this model we estimate only one parameter: $\beta$ (which measures the impatience of the average investor in the market). The code below estimates the model's parameter $\beta$, and computes the corresponding standard error, using the data in `ccapmmonthlydata.xls`. This is the same dataset that we used in class and that you used in the homework. The code is `INCOMPLETE`. Answer `Questions 4 through 8 below` to complete it.

### Let us begin by importing the main libraries.

In [None]:
import os
import pandas as pd
import numpy as np
import scipy.optimize
from scipy.stats import t, norm, chi2
from platform import python_version

In [None]:
# The recommended python version is 3.8 or 3.9
print(python_version())
# Check current directory
os.getcwd()

# The data
To estimate the model we need data on consumption growth and returns on assets. We use data on 10 risky assets and consumption growth from the file `ccapmmonthlydata.xls`. The data are monthly observations.

The variable `CONS_GROWTH` is the value of $\frac{c_{t+1}}{c_t}$, and the asset columns contain the returns on each asset (i.e., $R^{1}, ..., R^{10}$) for each period $t$. 

In [None]:
data = pd.read_excel('ccapmmonthlydata.xls')
# data.describe()  # This command provides descriptive statistics for each column in the dataframe.

# The 10 columns of asset return data
ret = np.array(data.iloc[:, 2:])
# consumption growth data (c_{t+1}/c_{t}) is in the first column
cons = np.array(data.CONS_GROWTH)  
# The number of assets
number_assets = 10
# The number of observations
T = len(cons)

# `Question 4 (coding)`
## Complete the function `gmm` that computes the criterion function and the pricing errors for the GMM estimator in this model. This is the same GMM function that we have seen in class and you have already implemented ... `but the pricing errors are different`. 

The criterion is the following quadratic form with weights $W_T$:

\begin{eqnarray*}
Q_T(\beta) = \underbrace{g_{T}(\beta )^{\top }}_{1\times N}\underbrace{W_{T}}_{N\times N}\underbrace{g_{T}(\beta )}_{N\times1},
\end{eqnarray*}

where the vector $g_T(\beta)$ is the empirical moment vector

\begin{eqnarray*}
\underbrace{g_T(\beta)}_{N\times1} &=& \frac{1}{T}\sum_{t=1}^{T-1}\underbrace{g(X_{t+1},\beta)}_{N\times1} \\
&=&\frac{1}{T}\sum_{t=1}^{T-1}\begin{pmatrix} g^{1}(X_{t+1},\beta) \\ g^{2}(X_{t+1},\beta) \\ \vdots \\ g^{N}(X_{t+1},\beta)\end{pmatrix} \\
&=& \frac{1}{T}\sum_{t=1}^{T-1}\begin{pmatrix} \beta \left(\frac{c_{t}}{c_{t+1}}\right)(1+R^{1}_{t+1})-1 \\ \beta \left(\frac{c_{t}}{c_{t+1}}\right)(1+R^{2}_{t+1})-1 \\ \vdots \\ \beta \left(\frac{c_{t}}{c_{t+1}}\right)(1+R^{N}_{t+1})-1\end{pmatrix} 
\end{eqnarray*}

and $W_T$ is a square symmetric matrix of weights which we have to choose (see below).

The function `gmm` has several **inputs**:

1. `parameter` is the vector of parameters (here it contains only $\beta$)
2. `cons` is the vector of consumption growth
3. `ret` is the $T\times N$ matrix containing the returns on all assets, one per column
4. `W` is an $N\times N$ matrix of weights
5. `flag` is a string. If we use `criterion` the function `gmm` computes the criterion function $Q_T(\beta)$; if we use `pricing error`, it returns a $T\times N$ matrix containing the pricing errors for each time period (rows) and each asset (columns). 

The function `gmm` can, therefore, produce two **outputs**.
1. If `flag = 'criterion'`, it will output the value of $Q_T(\beta) = g_T(\beta)^{\top} W_T g_T(\beta)$
2. If `flag = 'pricing error'`, it will output the $T\times N$ matrix of pricing errors, where each row represents a time period $t$ and each column is a different asset.
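
The mapping from data to the moment vector and the criterion can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: the arrays `cons_demo` and `ret_demo`, the sizes `T_demo` and `N_demo`, and the value `beta_demo` are hypothetical and are not taken from `ccapmmonthlydata.xls`; the sketch uses NumPy broadcasting instead of a loop over assets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and synthetic data (assumptions for illustration only)
T_demo, N_demo = 200, 3
cons_demo = 1.0 + 0.01 * rng.standard_normal(T_demo)             # plays the role of c_{t+1}/c_t
ret_demo = 0.01 + 0.05 * rng.standard_normal((T_demo, N_demo))   # plays the role of R^i_{t+1}
beta_demo = 0.99

# T x N matrix with entries beta * (c_t / c_{t+1}) * (1 + R^i_{t+1}) - 1
errors_demo = beta_demo * (1.0 / cons_demo)[:, None] * (1.0 + ret_demo) - 1.0

# Empirical moment vector g_T(beta): the time average of each column
g_demo = errors_demo.mean(axis=0)

# Quadratic form Q_T(beta) = g' W g with identity weights
W_demo = np.eye(N_demo)
Q_demo = g_demo @ W_demo @ g_demo

print(errors_demo.shape, g_demo.shape, Q_demo)
```

With identity weights, the criterion reduces to the sum of squared average pricing errors.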

In [None]:
def gmm(parameter, cons, ret, W, flag):

    p_error = np.zeros([T, number_assets])          # The matrix in which we are going to store the pricing errors
                                                    # The rows are time periods, the columns are assets

    # The following loop creates the pricing errors for each period and each asset    
    
    ##################################################
    # COMPLETE THE CODE BELOW
    ##################################################
    
    for j in range(number_assets):      
        p_error[:,j] = 

    ##################################################
    # END OF CODE TO BE MODIFIED
    ##################################################    

    g = np.mean(p_error, axis=0)   # Sample moments: a 1-D array of length N
    g = g.T                        # No-op for a 1-D array; kept to mirror the column-vector notation in the slides
    
    if flag == 'criterion':
        f = g.T @ W @ g            # returns Q_T(beta)
    elif flag == 'pricing error':
        f = p_error                # returns the pricing errors matrix, T x N
    else:
        # Fail loudly: returning f here would raise an unrelated UnboundLocalError
        raise ValueError("flag must be either 'pricing error' or 'criterion'")
    return f

## First-stage estimation
Next, we find the parameter by **minimizing the GMM criterion**.
We use `scipy.optimize.fmin` with the following inputs:

1. `func`. The function to minimize - in our case `gmm` - as defined in the previous snippet.
2. `x0`. The initial guess of the parameter $\beta$: `initial_guess`. This is just our initial guess of the parameter for evaluating the function `gmm` at the beginning of the minimization.
3. `args`. The arguments of the `gmm` function that are not parameters. For our problem, these are the data `cons` and `ret`, respectively. We have to provide the weight matrix `W` and the flag for the function `gmm` as well.

Additional inputs that are optional:

4. `xtol` and `ftol`. The convergence tolerances on the minimizer and on the function value, respectively. The algorithm stops when, between successive iterations, the candidate minimizer changes by no more than `xtol` and the function value changes by no more than `ftol`.
5. `maxiter`. The maximum number of iterations to try. The algorithm stops if it reaches `maxiter` iterations, even if it did not find a minimum.
6. `disp`. A flag indicating whether convergence messages are printed. `disp=0` suppresses them, `disp=1` prints them.
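
To see how these inputs fit together, here is a minimal sketch that minimizes a toy quadratic. The function `toy_criterion` and its extra arguments `target` and `scale` are hypothetical and serve only to show how `args` passes non-parameter inputs, in order, after the parameter vector.

```python
import scipy.optimize

def toy_criterion(parameter, target, scale):
    """Toy objective: scale * (parameter - target)^2, minimized at parameter = target."""
    # fmin passes the parameter as a 1-D array, even when x0 is a scalar
    return scale * (parameter[0] - target) ** 2

toy_estimate = scipy.optimize.fmin(func=toy_criterion,
                                   x0=0.5,
                                   args=(2.0, 3.0),   # target=2.0, scale=3.0
                                   xtol=1e-5,
                                   ftol=1e-5,
                                   maxiter=1000,
                                   disp=0)

print(f'The toy minimizer is {toy_estimate[0]:.3f}')
```

The returned value is a one-element array, which is why the estimate is read with `[0]`.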

In [None]:
# First-stage weight matrix (the identity matrix)
W = np.eye(number_assets)
                         
# parameters used to initialize the optimization
initial_guess = 0.5

# minimize the gmm criterion to find the parameters estimates
estimates = scipy.optimize.fmin(func=gmm, 
                                  x0=initial_guess, 
                                  args=(cons, ret, W, 'criterion'), 
                                  xtol=1e-5, 
                                  ftol=1e-5,
                                  maxiter = 100000,
                                  disp=0)

# The first-stage parameter estimates
print(f'The first-stage estimate of the parameter is {estimates[0]:.3f}')

# `Question 5 (theory + coding)`
## Write (on your piece of paper) the formula for the matrix $\Phi_0$ and its estimator $\widehat{\Phi}_0$, so that you can compute the optimal weight matrix. Complete the code to compute the value of the matrix. 
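
As a generic illustration of accumulating an $N\times N$ matrix from a $T\times N$ matrix of errors, the sketch below computes an uncentered average of outer products on synthetic data, once with a loop and once in vectorized form. It assumes, for illustration only, that the errors are serially uncorrelated so that no autocovariance terms are needed; whether this matches the estimator from the slides is for you to confirm. The names `errors_demo`, `T_demo` and `N_demo` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic T x N matrix of errors (an assumption for illustration only)
errors_demo = 0.1 * rng.standard_normal((500, 3))
T_demo, N_demo = errors_demo.shape

# Loop version: add the outer product of each period's error vector
Phi_loop = np.zeros((N_demo, N_demo))
for t in range(T_demo):
    g_t = errors_demo[t, :].reshape(-1, 1)   # N x 1 column vector for period t
    Phi_loop += (g_t @ g_t.T) / T_demo

# Vectorized version of the same average of outer products
Phi_vec = errors_demo.T @ errors_demo / T_demo

print(np.allclose(Phi_loop, Phi_vec))
```

Both versions produce the same symmetric matrix; the loop form mirrors the structure of the cell below.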

In [None]:
####################################################################
# We compute the optimal weight matrix using the first-stage estimate
####################################################################

##################################################
# COMPLETE THE CODE BELOW 
##################################################

# The pricing errors evaluated at the first-stage estimate
g_opt = gmm(estimates[0], cons, ret, W, 'pricing error')

# Phi_hat.
Phi_hat = np.zeros([number_assets, number_assets])
for j in range(T):
    Phi_hat = 


##################################################
# END OF CODE TO BE MODIFIED
##################################################

# The optimal weight matrix is just the inverse of Phi_hat    
W_opt = np.linalg.inv(Phi_hat)

## Second-stage estimation (using optimal weight matrix `W_opt`)

In [None]:
# Second-stage estimation 
estimates_opt = scipy.optimize.fmin(func=gmm, 
                                    x0=initial_guess, 
                                    args=(cons, ret, W_opt, 'criterion'), 
                                    xtol=1e-5, 
                                    ftol=1e-5, 
                                    disp=0)

# The second-stage parameter estimates
print(f'The second-stage estimate of the parameter is {estimates_opt[0]:.3f}')

# `Question 6 (theory + coding)`
## Write (on your piece of paper) the formula for the matrix $\Gamma_0$ and its estimator $\widehat{\Gamma}_0$. Complete the code to compute the value of this matrix and that of $\widehat{\Phi}_0$. Both are needed for the computation of the standard error of the parameter $\beta$. 
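
One way to check a hand-derived derivative of the moment vector is to compare it with a finite difference. The sketch below does this on synthetic data, assuming that $\Gamma_0$ denotes the expected derivative of the moments with respect to $\beta$; the names `cons_demo`, `ret_demo`, `g_bar`, and the evaluation point `0.99` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (assumptions for illustration only)
T_demo, N_demo = 300, 3
cons_demo = 1.0 + 0.01 * rng.standard_normal(T_demo)             # plays the role of c_{t+1}/c_t
ret_demo = 0.01 + 0.05 * rng.standard_normal((T_demo, N_demo))   # plays the role of R^i_{t+1}

def g_bar(beta):
    """Sample moment vector for a given beta."""
    errors = beta * (1.0 / cons_demo)[:, None] * (1.0 + ret_demo) - 1.0
    return errors.mean(axis=0)

# Analytic derivative: the moments are linear in beta, so the derivative
# is the sample mean of (c_t / c_{t+1}) * (1 + R^i_{t+1}) for each asset
Gamma_analytic = ((1.0 / cons_demo)[:, None] * (1.0 + ret_demo)).mean(axis=0).reshape(-1, 1)

# Central finite difference around an arbitrary evaluation point
h = 1e-6
Gamma_numeric = ((g_bar(0.99 + h) - g_bar(0.99 - h)) / (2 * h)).reshape(-1, 1)

print(np.allclose(Gamma_analytic, Gamma_numeric, atol=1e-6))
```

Because the moments are linear in $\beta$, the derivative does not depend on the point at which it is evaluated.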

In [None]:
     ##################################################
     # COMPLETE THE CODE BELOW 
     ##################################################

#############################
# Estimate Phi_hat
##############################

# The pricing errors evaluated at the optimal second-stage estimates
g_opt2 = gmm(estimates_opt, cons, ret, W_opt, 'pricing error')

# Phi_hat 
Phi_hat2 = np.zeros([number_assets, number_assets])
for j in range(T):
    Phi_hat2 = 

invPhi_hat2 = np.linalg.inv(Phi_hat2)  # This is the inverse of Phi_hat

#############################
# Estimate Gamma_hat
##############################

# we use derivatives directly in the loop
Gamma_hat = np.zeros([number_assets, 1])
for i in range(number_assets):
    Gamma_hat[i, 0] = 
    
    ##################################################
    # END OF CODE TO BE MODIFIED
    ##################################################    

    
################################
# Putting everything together: The estimated variance
################################
                               
# compute the variance-covariance matrix of the parameter estimates
VarCov = (1 / T) * np.linalg.inv(Gamma_hat.T @ invPhi_hat2 @ Gamma_hat)


##################################
# Standard errors and t-statistics
###################################

# the variances are on the diagonal
var_diag = np.diag(VarCov)
std_error = np.sqrt(var_diag)

# t-statistics
t_stats = estimates_opt/std_error     

# table of parameters, standard errors and t statistics
table_estimates = pd.DataFrame({'Stage 1': estimates, 'Stage 2': estimates_opt, 'Std Errors': std_error,
                                't stats': t_stats }, index = ['beta'])
print(table_estimates)

# `Question 7 (theory + coding)`
## Test the null hypothesis $H_0: \beta = 1$. Write the test (on your piece of paper) and code it. Provide both the test statistic and the p-value of the test. Do you reject the null at the 0.05 significance level?
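
For reference, the mechanics of a two-sided test of a single parameter against a hypothesized value can be sketched as follows. The numbers `beta_hat_demo` and `se_demo` are hypothetical (they are not the estimates from this dataset), and the sketch assumes the asymptotic standard normal distribution for the statistic.

```python
from scipy.stats import norm

# Hypothetical estimate and standard error (assumptions for illustration only)
beta_hat_demo = 0.95
se_demo = 0.03
beta_null = 1.0

# Standardized distance between the estimate and the hypothesized value
z_demo = (beta_hat_demo - beta_null) / se_demo

# Two-sided p-value under the standard normal approximation
pvalue_demo = 2 * (1 - norm.cdf(abs(z_demo)))

print(f'test statistic: {z_demo:.3f}, p-value: {pvalue_demo:.3f}')
print('reject at 0.05' if pvalue_demo < 0.05 else 'do not reject at 0.05')
```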

In [None]:
#############################
# Test for H_0: beta = 1
#################################

    ##################################################
    # COMPLETE THE CODE BELOW 
    ##################################################


test_stat = 
pvalue = 


    ##################################################
    # END OF CODE TO BE MODIFIED
    ##################################################

# `Question 8 (theory + coding)`
## Are all of the pricing errors equal to zero? You need to test for over-identifying restrictions using Hansen's test. Write the test (on your piece of paper) and code it. Provide both the test statistic and the p-value of the test. Do you reject the null hypothesis at the 0.05 significance level?
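
The mechanics of a test of over-identifying restrictions can be sketched with made-up numbers. The sketch assumes the statistic is $T$ times the second-stage criterion evaluated with the optimal weight matrix, compared against a $\chi^2$ distribution with degrees of freedom equal to the number of moments minus the number of parameters; the values `T_demo`, `Q_demo`, `N_demo` and `K_demo` are hypothetical and not taken from the data.

```python
from scipy.stats import chi2

# Hypothetical inputs (assumptions for illustration only)
T_demo = 400      # number of observations
Q_demo = 0.03     # criterion value at the second-stage estimate
N_demo = 10       # number of moment conditions
K_demo = 1        # number of estimated parameters

# J statistic and its degrees of freedom
J_demo = T_demo * Q_demo
dof_demo = N_demo - K_demo

# p-value: probability that a chi-square(dof) variable exceeds J
pvalue_J_demo = 1 - chi2.cdf(J_demo, dof_demo)

print(f'J statistic: {J_demo:.3f}, degrees of freedom: {dof_demo}, p-value: {pvalue_J_demo:.3f}')
```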

In [None]:
#########################################
# Test for over-identifying restrictions
##########################################

    ##################################################
    # COMPLETE THE CODE BELOW
    ##################################################

test = 
Pvalue = 

    ##################################################
    # END OF CODE TO BE MODIFIED
    ##################################################    