#### Questions
In this exercise we study the gasoline market and look at the relation between consumption and price in the USA.
We will use yearly data on these variables from 1970 to 1999 (30 observations). Additionally we have data on disposable income, and
some price indices. More precisely, we have:
    • GC: log real gasoline consumption;        
    • PG: log real gasoline price index;     
    • RI: log real disposable income;       
    • RPT: log real price index of public transport;     
    • RPN: log real price index of new cars;     
    • RPU: log real price index of used cars.     
    
We consider the following model
$$ GC = \beta_1 + \beta_2 PG + \beta_3 RI + \varepsilon $$
(a) Give an argument why the gasoline price may be endogenous in this equation.       
(b) Use 2SLS to estimate the price elasticity (β2). Use a constant, RI, RPT, RPN, and RPU as instruments.        
(c) Perform a Sargan test to test whether the five instruments are correlated with ε. What do you conclude?        

(a) Give an argument why the gasoline price may be endogenous in this equation.    

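Price and consumption are determined simultaneously by supply and demand. A positive demand shock (a large ε) raises consumption and, through the market, also pushes the price up.
PG is therefore correlated with ε, which makes it endogenous in this equation.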
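
The simultaneity argument can be illustrated with a small simulation. This is only a sketch on synthetic data: the supply and demand slopes and the shock distributions are hypothetical and are not estimated from the gasoline data. It shows that the equilibrium price is correlated with the demand shock, so OLS of quantity on price does not recover the demand slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical structural parameters (not taken from the gasoline data).
demand_slope = -0.5   # true price effect in the demand equation
supply_slope = 1.0    # slope of the supply equation

u = rng.normal(size=n)  # demand shock (plays the role of epsilon)
v = rng.normal(size=n)  # supply shock

# Demand: q = demand_slope * p + u,  Supply: q = supply_slope * p + v.
# Solving the two equations gives the equilibrium price and quantity.
p = (u - v) / (supply_slope - demand_slope)
q = demand_slope * p + u

# The price moves with the demand shock, so it is endogenous.
cov_pu = np.cov(p, u)[0, 1]

# OLS slope of q on p: cov(q, p) / var(p).
ols_slope = np.cov(q, p)[0, 1] / np.var(p, ddof=1)

print(f"cov(p, u) = {cov_pu:.3f}")
print(f"OLS slope = {ols_slope:.3f}  (true demand slope = {demand_slope})")
```

With these assumed parameter values the OLS slope converges to about 0.25 rather than -0.5, so in this toy setup the bias is large enough to flip the sign of the estimated price effect.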


(b) Use 2SLS to estimate the price elasticity (β2). 
Use a constant, RI, RPT, RPN, and RPU as instruments.     

In [3]:
%matplotlib inline
import sys
sys.path.append('/Users/CJ/Documents/bitbucket/xforex_v1/xforex_v3')
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from xforex.BackTesting.econometrics_tools import Econometrics_Tool
import numpy as np

dat = pd.read_csv(
        '/Users/CJ/Documents/bitbucket/xforex_v1/xforex_v3/training/econometrics/week4-endogeneity/TrainExer44.txt')
dat.describe()

,OBS,GC,PG,RI,RPN,RPT,RPU
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,1984.5,6.921244,-0.048086,3.331839,0.032685,-0.010379,-0.103642
std,8.803408,0.203019,0.200619,0.215418,0.137276,0.136789,0.099774
min,1970.0,6.634181,-0.379726,2.943489,-0.176815,-0.279452,-0.295147
25%,1977.25,6.74158,-0.190877,3.173783,-0.085284,-0.10826,-0.182283
50%,1984.5,6.85088,-0.126932,3.340562,0.005886,0.021885,-0.097073
75%,1991.75,7.053717,0.069089,3.515337,0.150454,0.088648,-0.014173
max,1999.0,7.301263,0.397735,3.685151,0.311872,0.171155,0.079524


In [14]:
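# Stage 1: regress the endogenous PG on all instruments.
# The summary below contains a const row, so linear_fit evidently adds the intercept.
model_stage1 = Econometrics_Tool().linear_fit(dat[['RI', 'RPN', 'RPT', 'RPU']],
                                              dat['PG'])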
model_stage1.summary()

Dep. Variable:,PG,R-squared:,0.887
Model:,OLS,Adj. R-squared:,0.869
Method:,Least Squares,F-statistic:,48.97
Date:,"Wed, 14 Sep 2016",Prob (F-statistic):,1.8e-11
Time:,10:54:53,Log-Likelihood:,38.812
No. Observations:,30,AIC:,-67.62
Df Residuals:,25,BIC:,-60.62
Df Model:,4,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[95.0% Conf. Int.]
const,7.7410,0.834,9.285,0.000,6.024 9.458
RI,-2.2984,0.247,-9.303,0.000,-2.807 -1.790
RPN,-3.5279,0.352,-10.023,0.000,-4.253 -2.803
RPT,-0.8080,0.191,-4.225,0.000,-1.202 -0.414
RPU,0.2331,0.183,1.273,0.215,-0.144 0.610

Omnibus:,2.308,Durbin-Watson:,0.905
Prob(Omnibus):,0.315,Jarque-Bera (JB):,1.09
Skew:,0.209,Prob(JB):,0.58
Kurtosis:,3.835,Cond. No.,244.0


In [12]:
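# PG_FIT holds the fitted values of PG from stage 1.
dat['PG_FIT'] = model_stage1.fittedvalues

# Stage 2: regress GC on RI and the fitted price. The PG_FIT coefficient is the
# 2SLS estimate of beta_2. The standard errors printed by this second OLS are
# not the correct 2SLS standard errors, because they are based on residuals
# computed with PG_FIT instead of the observed PG.
model_stage2 = Econometrics_Tool().linear_fit(dat[['RI', 'PG_FIT']],
                                              dat['GC'])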
model_stage2.summary()

Dep. Variable:,GC,R-squared:,0.967
Model:,OLS,Adj. R-squared:,0.964
Method:,Least Squares,F-statistic:,393.3
Date:,"Wed, 14 Sep 2016",Prob (F-statistic):,1.08e-20
Time:,10:46:39,Log-Likelihood:,56.859
No. Observations:,30,AIC:,-107.7
Df Residuals:,27,BIC:,-103.5
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[95.0% Conf. Int.]
const,5.0137,0.134,37.456,0.000,4.739 5.288
RI,0.5647,0.041,13.942,0.000,0.482 0.648
PG_FIT,-0.5444,0.046,-11.789,0.000,-0.639 -0.450

Omnibus:,2.965,Durbin-Watson:,0.663
Prob(Omnibus):,0.227,Jarque-Bera (JB):,2.503
Skew:,0.694,Prob(JB):,0.286
Kurtosis:,2.73,Cond. No.,70.9
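
The two manual OLS steps above can also be written in closed form. The sketch below uses NumPy on synthetic data (all coefficients and variable names are hypothetical stand-ins, not the gasoline data or the `Econometrics_Tool` helper). It checks that the manual second stage and the closed-form estimator give the same coefficients, and it computes standard errors from residuals based on the observed regressors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Synthetic data; all coefficients below are hypothetical.
w = rng.normal(size=n)    # exogenous regressor (role of RI)
z1 = rng.normal(size=n)   # excluded instrument
z2 = rng.normal(size=n)   # excluded instrument
u = rng.normal(size=n)    # structural error
x = 0.8 * z1 + 0.5 * z2 + 0.3 * w + 0.7 * u + rng.normal(size=n)  # endogenous
y = 1.0 - 0.5 * x + 0.6 * w + u

const = np.ones(n)
X = np.column_stack([const, x, w])        # regressors
Z = np.column_stack([const, w, z1, z2])   # instruments

# Stage 1: project every regressor on the instruments.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Manual stage 2: OLS of y on the fitted regressors.
beta_manual = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Closed form: (X_hat' X)^{-1} X_hat' y gives the same coefficients.
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# 2SLS residuals use the observed X, not X_hat.
resid = y - X @ beta_2sls
sigma2 = resid @ resid / (n - X.shape[1])
se_2sls = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_hat.T @ X_hat)))

# Naive standard errors from the manual second stage, for comparison.
resid_naive = y - X_hat @ beta_manual
sigma2_naive = resid_naive @ resid_naive / (n - X.shape[1])
se_naive = np.sqrt(np.diag(sigma2_naive * np.linalg.inv(X_hat.T @ X_hat)))

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("2SLS:", beta_2sls, "se:", se_2sls)
print("naive se:", se_naive)
print("OLS :", beta_ols)
```

The coefficients agree between the two routes, but the naive standard errors differ from the 2SLS ones. This is why the standard errors and t-values printed by a hand-rolled second stage should be read with care.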


(c) Perform a Sargan test to test whether the five instruments are correlated with ε. What do you conclude?   

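H0: the instruments are uncorrelated with ε (valid). If the instruments are correlated with ε, they are not valid.
Procedure: regress the 2SLS residuals on all instruments. Under H0, n·R² of this regression is approximately χ²(m − k) distributed, with m = 5 instruments and k = 3 regressors.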

In [20]:
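# Regress the stage-2 residuals on all instruments.
# Caveat: model_stage2.resid is computed with PG_FIT. The Sargan test is defined
# on the 2SLS residuals GC - b1 - b2*PG - b3*RI, which use the observed PG,
# so the statistic below may differ from the one based on those residuals.
model_sargan = Econometrics_Tool().linear_fit(dat[['RI', 'RPN', 'RPT', 'RPU']],
                                              model_stage2.resid)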
model_sargan.summary()

Dep. Variable:,y,R-squared:,0.041
Model:,OLS,Adj. R-squared:,-0.113
Method:,Least Squares,F-statistic:,0.2667
Date:,"Wed, 14 Sep 2016",Prob (F-statistic):,0.897
Time:,11:03:43,Log-Likelihood:,57.486
No. Observations:,30,AIC:,-105.0
Df Residuals:,25,BIC:,-97.97
Df Model:,4,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[95.0% Conf. Int.]
const,-0.2098,0.447,-0.469,0.643,-1.131 0.712
RI,0.0604,0.133,0.456,0.653,-0.213 0.333
RPN,0.0204,0.189,0.108,0.915,-0.369 0.409
RPT,-0.0512,0.103,-0.499,0.622,-0.263 0.160
RPU,-0.0702,0.098,-0.715,0.481,-0.273 0.132

Omnibus:,0.911,Durbin-Watson:,0.739
Prob(Omnibus):,0.634,Jarque-Bera (JB):,0.807
Skew:,0.371,Prob(JB):,0.668
Kurtosis:,2.69,Cond. No.,244.0


In [30]:
sargan_stat = model_sargan.nobs * model_sargan.rsquared
sargan_stat

1.2279491529004505
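
For reference, the Sargan procedure can be sketched end to end in NumPy. This is an illustration on synthetic data with hypothetical coefficients; the helper names `two_sls` and `sargan` are made up for this example. The residuals are computed with the observed regressors, and the statistic is compared for a valid instrument set and for one where an instrument is deliberately correlated with the error.

```python
import numpy as np

def two_sls(y, X, Z):
    """Return 2SLS coefficients for regressors X and instruments Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def sargan(y, X, Z):
    """Return (n * R^2, degrees of freedom) for the Sargan test."""
    resid = y - X @ two_sls(y, X, Z)   # residuals with the observed X
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    # R^2 of the residuals regressed on the instruments (Z has a constant).
    ss_res = np.sum((resid - fitted) ** 2)
    ss_tot = np.sum((resid - resid.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return len(y) * r2, Z.shape[1] - X.shape[1]

rng = np.random.default_rng(2)
n = 5_000
w, z1, z2, u = (rng.normal(size=n) for _ in range(4))
x = 0.8 * z1 + 0.5 * z2 + 0.3 * w + 0.7 * u + rng.normal(size=n)
y = 1.0 - 0.5 * x + 0.6 * w + u
const = np.ones(n)
X = np.column_stack([const, x, w])

Z_valid = np.column_stack([const, w, z1, z2])
Z_invalid = np.column_stack([const, w, z1, z2 + 0.5 * u])  # correlated with u

stat_valid, df = sargan(y, X, Z_valid)
stat_invalid, _ = sargan(y, X, Z_invalid)
print("df:", df)
print("Sargan, valid instruments  :", stat_valid)
print("Sargan, invalid instrument :", stat_invalid)
```

In this setup there is one overidentifying restriction (4 instruments, 3 regressors), and the statistic for the contaminated instrument set is far larger than for the valid set.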

In [40]:
import scipy.stats as stats
# Degrees of freedom: number of instruments (5) minus number of regressors (3) = 2
degree_freedom = model_stage1.df_model + 1 - (model_stage2.df_model + 1)
crit = stats.chi2.ppf(q=0.95,  # critical value at the 5% significance level
                      df=degree_freedom)

print("Critical value")
print(crit)

p_value = 1 - stats.chi2.cdf(sargan_stat,  # Find the p-value
                             df=degree_freedom)
print("P value")
print(p_value)

Critical value
5.99146454711
P value
0.541195565553


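Not significant at the 5% level: the statistic (1.23) is well below the critical value (5.99), with a p-value of 0.54.
We cannot reject H0, so there is no evidence that the instruments are correlated with ε. The instruments appear to be valid.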
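
As a quick sanity check on the numbers above, the χ² distribution with 2 degrees of freedom has the closed-form survival function exp(−x/2), so the p-value and the 5% critical value can be recomputed without SciPy. The sketch below only re-uses the statistic reported earlier; the variable names are illustrative.

```python
import math

sargan_stat = 1.2279491529004505   # n * R^2 reported above
n_instruments = 5                  # constant, RI, RPT, RPN, RPU
n_regressors = 3                   # constant, PG, RI
df = n_instruments - n_regressors  # overidentifying restrictions

# The closed forms below hold only for df = 2:
# P(chi2_2 > x) = exp(-x / 2), so the 5% critical value is -2 * ln(0.05).
assert df == 2
p_value = math.exp(-sargan_stat / 2)
crit_5pct = -2 * math.log(0.05)

print("Critical value:", crit_5pct)
print("P value:", p_value)
```

These values should agree with the `scipy.stats.chi2` output shown above (about 5.99 and 0.54).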