## Questions
This test exercise is of an applied nature and uses data that are available in the data file TestExer3. We consider the so-called Taylor rule for setting the (nominal) interest rate. This model describes the level of the nominal interest rate that the central bank sets as a function of the equilibrium real interest rate and the inflation target, taking into account the current levels of inflation and production. Taylor (1993) considers the model:

$$i_t = r^* + \pi_t +0.5(\pi_t - \pi^*) + 0.5g_t$$

with $i_t$ the Federal funds target interest rate at time $t$, $r^*$ the equilibrium real federal funds rate, $\pi_t$ a measure of inflation, $\pi^*$ the target inflation rate and $g_t$ the output gap (how much actual output deviates from potential output). We simplify the Taylor rule in two ways. First, we avoid determining $r^*$ and $\pi^*$ and simply add an intercept to the model to capture these two variables (and any other deviations in the means). Second, we consider production $y_t$ rather than the output gap. In this form the Taylor rule is

$$i_t = \beta_1 + \beta_2 \pi_t + \beta_3 y_t + \varepsilon_t \qquad (1)$$
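
Collecting terms in the original rule shows why one intercept can stand in for $r^*$ and $\pi^*$:

$$i_t = (r^* - 0.5\pi^*) + 1.5\pi_t + 0.5 g_t,$$

so $\beta_1$ plays the role of $r^* - 0.5\pi^*$ and the implied inflation coefficient is 1.5 (with production $y_t$ in place of the output gap $g_t$, $\beta_3$ need not equal 0.5). The short check below evaluates both forms; the function names are hypothetical and the defaults $r^* = \pi^* = 2$ are only illustrative values (the ones Taylor (1993) used).

```python
def taylor_rate(pi, g, r_star=2.0, pi_star=2.0):
    """Original Taylor (1993) rule: i = r* + pi + 0.5*(pi - pi*) + 0.5*g (all in %)."""
    return r_star + pi + 0.5 * (pi - pi_star) + 0.5 * g


def taylor_rate_intercept_form(pi, g, r_star=2.0, pi_star=2.0):
    """Same rule after collecting constants: (r* - 0.5*pi*) + 1.5*pi + 0.5*g."""
    beta1 = r_star - 0.5 * pi_star  # the intercept absorbs r* and pi*
    return beta1 + 1.5 * pi + 0.5 * g


# Illustrative inputs: 3% inflation and output 1% above potential.
print(taylor_rate(3.0, 1.0))                 # 6.0
print(taylor_rate_intercept_form(3.0, 1.0))  # 6.0
```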

Monthly data are available for the USA over the period 1960 through 2014 for the following variables:

• INTRATE: Federal funds interest rate      
• INFL: Inflation     
• PROD: Production
• UNEMPL: Unemployment    
• COMMPRI: Commodity prices       
• PCE: Personal consumption expenditure     
• PERSINC: Personal income   
• HOUST: Housing starts   

(a) Use general-to-specific to come to a model. Start by regressing the federal funds rate on the other 7 variables and eliminate 1 variable at a time.

In [1]:
%matplotlib inline
import sys
# make the private xforex package importable (local path on the author's machine)
sys.path.append('/Users/CJ/Documents/bitbucket/xforex_v1/xforex_v3')
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from xforex.BackTesting.econometrics_tools import Econometrics_Tool
import numpy as np

# load the tab-separated data and drop the unnamed trailing columns
dat = pd.read_csv(
    '/Users/CJ/Documents/bitbucket/xforex_v1/xforex_v3/training/econometrics/week3-model-specifiction/TestExer3-TaylorRule-round1.txt',
    sep='\t').drop(['Unnamed: 9', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13'], axis=1)
dat.describe()

# general-to-specific: regress INTRATE on all 7 variables, then eliminate one variable at a time
g2s_model = Econometrics_Tool().iter_linear_fit(
    dat[['PROD', 'INFL', 'UNEMPL', 'COMMPRI', 'PCE', 'PERSINC', 'HOUST']],
    dat['INTRATE'])
print(g2s_model.summary())

eliminate column: UNEMPL
eliminate column: PROD
                            OLS Regression Results                            
Dep. Variable:                INTRATE   R-squared:                       0.637
Model:                            OLS   Adj. R-squared:                  0.635
Method:                 Least Squares   F-statistic:                     229.9
Date:                Tue, 13 Sep 2016   Prob (F-statistic):          2.03e-141
Time:                        15:45:59   Log-Likelihood:                -1450.2
No. Observations:                 660   AIC:                             2912.
Df Residuals:                     654   BIC:                             2939.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
cons

(b) Use specific-to-general to come to a model. Start by regressing the federal funds rate on only a constant and add 1 variable at a time. Is the model the same as in (a)?

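**ans:** The `In [2]` cell below adds one variable at a time and searches for the best combination of variables. The resulting model is the same as in (a): both procedures drop PROD and UNEMPL, keep INFL, COMMPRI, PCE, PERSINC and HOUST, and give identical coefficients (see the `In [3]` tables).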

In [2]:
# specific-to-general selection (private helper; source not shown)
s2g_model = Econometrics_Tool().global_iter_linear_fit_aic(
    dat[['PROD', 'INFL', 'UNEMPL', 'COMMPRI', 'PCE', 'PERSINC', 'HOUST']],
    dat['INTRATE'])

****************
Final model:
                            OLS Regression Results                            
Dep. Variable:                INTRATE   R-squared:                       0.637
Model:                            OLS   Adj. R-squared:                  0.635
Method:                 Least Squares   F-statistic:                     229.9
Date:                Tue, 13 Sep 2016   Prob (F-statistic):          2.03e-141
Time:                        15:45:59   Log-Likelihood:                -1450.2
No. Observations:                 660   AIC:                             2912.
Df Residuals:                     654   BIC:                             2939.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         -0.2401 

In [3]:
from tabulate import tabulate
# print the estimated coefficients of both selected models side by side
print('|specific to general model:')
print(tabulate(pd.DataFrame(s2g_model.params), headers='keys', tablefmt='psql'))

print('\n|general to specific model:')
print(tabulate(pd.DataFrame(g2s_model.params), headers='keys', tablefmt='psql'))

|specific to general model:
+---------+-------------+
|         |           0 |
|---------+-------------|
| const   | -0.240119   |
| INFL    |  0.717527   |
| COMMPRI | -0.00750067 |
| PCE     |  0.340525   |
| PERSINC |  0.240242   |
| HOUST   | -0.0205297  |
+---------+-------------+

|general to specific model:
+---------+-------------+
|         |           0 |
|---------+-------------|
| const   | -0.240119   |
| INFL    |  0.717527   |
| COMMPRI | -0.00750067 |
| PCE     |  0.340525   |
| PERSINC |  0.240242   |
| HOUST   | -0.0205297  |
+---------+-------------+
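
Rather than comparing the two tables by eye, the check can be made explicit. The sketch below (the function `same_specification` is hypothetical) treats two fits as the same model when they keep the same set of regressors and their coefficients agree to numerical precision; in the notebook it would be called as `same_specification(s2g_model.params, g2s_model.params)`. The demo reuses the coefficients printed above.

```python
import numpy as np
import pandas as pd


def same_specification(params_a, params_b, tol=1e-8):
    """Hypothetical check: same regressors and numerically equal coefficients,
    ignoring the order in which the terms are listed."""
    if set(params_a.index) != set(params_b.index):
        return False
    a, b = params_a.sort_index(), params_b.sort_index()
    return bool(np.allclose(a.values, b.values, rtol=0.0, atol=tol))


# Coefficients as printed in the two tables above.
coefs = pd.Series({'const': -0.240119, 'INFL': 0.717527, 'COMMPRI': -0.00750067,
                   'PCE': 0.340525, 'PERSINC': 0.240242, 'HOUST': -0.0205297})
print(same_specification(coefs, coefs[::-1]))  # True
```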


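(c) Compare your model from (a) and the Taylor rule of equation (1). Consider $R^2$, AIC and BIC. Which of the models do you prefer?

**ans:** The Taylor rule (1), written with the variables in the data set, is

$$ \text{INTRATE}_t = \beta_1 + \beta_2\,\text{INFL}_t + \beta_3\,\text{PROD}_t + \varepsilon_t $$

The comparison is shown in the table below. The model from (a) is better on all three criteria: it has the higher $R^2$ (0.637 vs 0.575) and the lower AIC (2912 vs 3012) and BIC (2939 vs 3025). A larger model always has at least as high an $R^2$, but AIC and BIC also penalize the three extra parameters of the model from (a) and still prefer it, so on these criteria I prefer the model from (a).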

In [4]:
# Taylor rule (1): regress INTRATE on a constant, PROD and INFL
taylor_model = Econometrics_Tool().linear_fit(dat[['PROD', 'INFL']], dat['INTRATE'])

# collect R^2, AIC and BIC of both models side by side
df_compare = pd.DataFrame(index=['r-square', 'aic', 'bic'])
df_compare['taylor'] = [taylor_model.rsquared, taylor_model.aic, taylor_model.bic]
df_compare['general_to_specific'] = [g2s_model.rsquared, g2s_model.aic, g2s_model.bic]
print(tabulate(df_compare, headers='keys', tablefmt='psql'))

+----------+-------------+-----------------------+
|          |      taylor |   general_to_specific |
|----------+-------------+-----------------------|
| r-square |    0.574701 |              0.637361 |
| aic      | 3011.62     |           2912.42     |
| bic      | 3025.09     |           2939.38     |
+----------+-------------+-----------------------+
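
The AIC and BIC values can be reproduced from the log-likelihood. statsmodels computes them for OLS as $\text{AIC} = -2\log L + 2k$ and $\text{BIC} = -2\log L + k\ln n$, where $k$ counts the estimated coefficients including the constant. The sketch below plugs in the rounded log-likelihood of the model from (a) ($\log L = -1450.2$, $n = 660$, $k = 6$); the function name `aic_bic` is illustrative.

```python
import math


def aic_bic(llf, n_obs, n_params):
    """AIC = -2 log L + 2k and BIC = -2 log L + k ln(n); k includes the constant."""
    aic = -2.0 * llf + 2.0 * n_params
    bic = -2.0 * llf + n_params * math.log(n_obs)
    return aic, bic


# Model from (a): log L = -1450.2 (rounded), n = 660, k = 5 regressors + constant.
aic, bic = aic_bic(-1450.2, 660, 6)
print(round(aic, 2), round(bic, 2))
```

Each extra parameter costs 2 under AIC but $\ln 660 \approx 6.49$ under BIC, so BIC penalizes the larger model from (a) more heavily; its gain in log-likelihood is still large enough for both criteria to prefer it.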


(d) Test the Taylor rule of equation (1) using the RESET test, the Chow break and forecast tests (with January 1980 as the break date in both tests), and a Jarque-Bera test. What do you conclude?

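**ans:**

(1) RESET test: the p-value is 0.11, so the null hypothesis of a correctly specified functional form cannot be rejected at the 5% level.

(2) Chow break and forecast tests: the p-values are 7.62693e-74 and 0.0647282, respectively. The break test clearly rejects equal coefficients before and after January 1980; the forecast test does not reject at the 5% level, only at the 10% level. Together they suggest that the Taylor rule is not stable over the sample. Note that in `In [5]` the forecast test is called with `0.2` rather than with the January 1980 date, and the regression summary printed there has only two parameters (Df Residuals 658 for 660 observations, so no intercept); it is worth checking that the helpers test the same specification as in (c).

(3) Jarque-Bera test: the p-value is 0.00198523, so the null hypothesis of normally distributed errors is rejected at the 5% level.

Overall, the functional form looks acceptable, but the parameters do not appear constant across the sample and the errors are not normal, so the Taylor rule (1) is not a fully satisfactory description of the federal funds rate over 1960-2014.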
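
The Chow helpers in `In [5]` are private, so here is a minimal sketch of the textbook F-versions of both tests, assuming homoskedastic errors and a regressor matrix that already contains the constant. With $k$ coefficients, $n$ observations, $n_1$ observations before the break and $n_2 = n - n_1$ after it, and with $S_0$ the full-sample RSS and $S_1, S_2$ the subsample RSS: the break test uses $F = \frac{(S_0 - S_1 - S_2)/k}{(S_1 + S_2)/(n - 2k)} \sim F(k, n - 2k)$, and the forecast test, which asks whether the fit on the first $n_1$ observations also explains the rest, uses $F = \frac{(S_0 - S_1)/n_2}{S_1/(n_1 - k)} \sim F(n_2, n_1 - k)$. The function names and the simulated data (an intercept shift at observation 240) are hypothetical.

```python
import numpy as np
from scipy import stats


def _rss(X, y):
    """Residual sum of squares of the OLS fit of y on X (X contains the constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X.dot(beta)
    return float(resid.dot(resid))


def chow_break_test(X, y, n1):
    """Chow break test for equal coefficients in rows [:n1] and [n1:]."""
    n, k = X.shape
    s0 = _rss(X, y)                                      # restricted: one regression
    s1, s2 = _rss(X[:n1], y[:n1]), _rss(X[n1:], y[n1:])  # unrestricted: two regressions
    f = ((s0 - s1 - s2) / k) / ((s1 + s2) / (n - 2 * k))
    return f, stats.f.sf(f, k, n - 2 * k)


def chow_forecast_test(X, y, n1):
    """Chow forecast test: does the fit on rows [:n1] also explain rows [n1:]?"""
    n, k = X.shape
    n2 = n - n1
    s0, s1 = _rss(X, y), _rss(X[:n1], y[:n1])
    f = ((s0 - s1) / n2) / (s1 / (n1 - k))
    return f, stats.f.sf(f, n2, n1 - k)


# Simulated monthly sample: 660 observations, intercept shifts at row 240.
rng = np.random.default_rng(2)
n, n1 = 660, 240
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = np.where(np.arange(n) < n1, 1.0, 4.0) + 0.5 * x + rng.normal(size=n)
print(chow_break_test(X, y, n1))
print(chow_forecast_test(X, y, n1))
```

To mirror (d), one would set `X = np.column_stack([np.ones(len(dat)), dat['INFL'], dat['PROD']])`, `y = dat['INTRATE'].values` and, after `In [5]` has set the date index, `n1 = int((dat.index < datetime(1980, 1, 1)).sum())`; with monthly data starting in January 1960 this should be 240, so both tests use the same break date.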

In [5]:
# diagnostic tests of the Taylor rule (1): Jarque-Bera, RESET, Chow break and Chow forecast
import numpy as np
from scipy import stats
from statsmodels.stats.outliers_influence import reset_ramsey
from datetime import datetime

# index the data by month (OBS has the form YYYY:MM) so the sample can be split at a date
dat.index = dat.OBS.map(lambda x: datetime.strptime(x, '%Y:%m'))

df_stat = pd.DataFrame(index=['jarque_bera test', 'RESET', 'Chow-break', 'Chow-forecast'])
model = taylor_model

# Jarque-Bera: are the residuals normally distributed?
jb_stat, jb_pvalue = stats.jarque_bera(model.resid)

# RESET: does the squared fitted value add explanatory power (misspecified functional form)?
reset = reset_ramsey(model, degree=2)
RESET_test = [float(np.squeeze(reset.fvalue)), float(np.squeeze(reset.pvalue))]

# Chow break test: are the coefficients the same before and after January 1980?
chow_break = Econometrics_Tool().chow_break(dat[['PROD', 'INFL']], dat['INTRATE'], datetime(1980, 1, 1))

# Chow forecast test
# NOTE: called with 0.2 instead of the January 1980 break date asked for in (d);
# how 0.2 is interpreted depends on the private helper, whose source is not shown
chow_forcast = Econometrics_Tool().chow_forcast(dat[['PROD', 'INFL']], dat['INTRATE'], 0.2)

df_stat['stat'] = [jb_stat, RESET_test[0], chow_break[0], chow_forcast[0]]
df_stat['p-value'] = [jb_pvalue, RESET_test[1], chow_break[1], chow_forcast[1]]

print(tabulate(df_stat, headers='keys', tablefmt='psql'))

                            OLS Regression Results                            
Dep. Variable:                INTRATE   R-squared:                       0.856
Model:                            OLS   Adj. R-squared:                  0.856
Method:                 Least Squares   F-statistic:                     1961.
Date:                Tue, 13 Sep 2016   Prob (F-statistic):          6.08e-278
Time:                        15:45:59   Log-Likelihood:                -1527.1
No. Observations:                 660   AIC:                             3058.
Df Residuals:                     658   BIC:                             3067.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
PROD           0.1596      0.018      8.819      0.0