# Regressions

## Linear regressions

__Import the Duncan/carData dataset__

In [2]:
import statsmodels.api as sm
dataset = sm.datasets.get_rdataset("Duncan", "carData")
df = dataset.data
df.head()

Unnamed: 0,type,income,education,prestige
accountant,prof,62,86,82
pilot,prof,72,76,83
architect,prof,75,92,90
author,prof,55,90,76
chemist,prof,64,86,90


__Estimate by hand the model $\text{income} = \alpha + \beta  \times \text{education}$ . Plot.__
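
A minimal sketch of the by-hand estimate, assuming only NumPy. To stay self-contained it uses the five rows shown in the `df.head()` output above rather than the full dataset, so its numbers will not match a full-sample regression; with the real data, replace the two arrays by `df["education"]` and `df["income"]`. The slope is the covariance of the two variables divided by the variance of the regressor, and the intercept makes the line pass through the point of means.

```python
import numpy as np

# First five rows of the Duncan data, copied from the df.head() output.
# Illustration only: the exercise itself should use all observations.
education = np.array([86, 76, 92, 90, 86], dtype=float)
income = np.array([62, 72, 75, 55, 64], dtype=float)

# Deviations from the sample means.
x_dev = education - education.mean()
y_dev = income - income.mean()

# OLS slope: sum of cross-products over sum of squared deviations of x.
beta = (x_dev * y_dev).sum() / (x_dev ** 2).sum()

# OLS intercept: the fitted line goes through (mean(x), mean(y)).
alpha = income.mean() - beta * education.mean()

fitted = alpha + beta * education
residuals = income - fitted
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Plotting the observations together with `fitted` (for instance with matplotlib's `plt.scatter` and `plt.plot`) completes the exercise.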

__Compute the total, explained, and unexplained variance. Compute the $R^2$ statistic.__
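
The variance decomposition can be sketched as follows, again on the five rows from the `df.head()` output (an illustration, not the full-sample answer). For a least-squares line with an intercept, the variance of the dependent variable splits exactly into the variance of the fitted values plus the variance of the residuals, and $R^2$ is the explained share.

```python
import numpy as np

# Five illustrative rows copied from the df.head() output.
education = np.array([86, 76, 92, 90, 86], dtype=float)
income = np.array([62, 72, 75, 55, 64], dtype=float)

# np.polyfit with degree 1 returns the slope first, then the intercept.
beta, alpha = np.polyfit(education, income, 1)
fitted = alpha + beta * education
residuals = income - fitted

# Population variances (np.var divides by n by default).
total_var = income.var()
explained_var = fitted.var()
unexplained_var = residuals.var()

# Share of the total variance captured by the regression line.
r_squared = explained_var / total_var
print(f"total = {total_var:.3f}, explained = {explained_var:.3f}, "
      f"unexplained = {unexplained_var:.3f}, R^2 = {r_squared:.3f}")
```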

__Use statsmodels (formula API) to estimate $\text{income} = \alpha + \beta  \times \text{education}$. Comment on the regression statistics.__

In [9]:
# Documentation: https://www.statsmodels.org/stable/generated/statsmodels.formula.api.ols.html

import statsmodels.formula.api as smf

# Regress income on education (the formula adds an intercept by default)
model_1 = smf.ols("income ~ education", df)
res_1 = model_1.fit()

<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffad5b135e0>

In [10]:
res_1.summary()

0,1,2,3
Dep. Variable:,income,R-squared:,0.525
Model:,OLS,Adj. R-squared:,0.514
Method:,Least Squares,F-statistic:,47.51
Date:,"Tue, 02 Feb 2021",Prob (F-statistic):,1.84e-08
Time:,11:29:50,Log-Likelihood:,-190.42
No. Observations:,45,AIC:,384.8
Df Residuals:,43,BIC:,388.5
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,10.6035,5.198,2.040,0.048,0.120,21.087
education,0.5949,0.086,6.893,0.000,0.421,0.769

0,1,2,3
Omnibus:,9.841,Durbin-Watson:,1.736
Prob(Omnibus):,0.007,Jarque-Bera (JB):,10.609
Skew:,0.776,Prob(JB):,0.00497
Kurtosis:,4.802,Cond. No.,123.0


The estimated regression is `income = 10.6 + 0.59 education`. At the 5% significance level, both the intercept (p = 0.048) and the education coefficient (p < 0.001) are significantly different from zero.
R-squared is 0.525: the model explains about half of the variance of income.
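
Several entries of the summary table are linked by simple formulas, which makes them easier to comment on. The sketch below uses only numbers copied from the output above and recomputes the adjusted R-squared, the F-statistic (with a single regressor it equals the squared t-statistic of the slope), and the standard error of the slope.

```python
# Numbers copied from the res_1.summary() output.
n_obs, df_model = 45, 1
r_squared = 0.525
coef_education, t_education = 0.5949, 6.893

# Adjusted R-squared penalizes the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - df_model - 1)

# With one regressor, the F-statistic is the square of the slope's t-statistic.
f_from_t = t_education ** 2

# The t-statistic is coef / std err, so the standard error can be backed out.
std_err = coef_education / t_education

print(round(adj_r_squared, 3), round(f_from_t, 2), round(std_err, 3))
# prints: 0.514 47.51 0.086
```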

__Use statsmodels to estimate $\text{income} = \alpha + \beta  \times \text{prestige}$. Comment on the regression statistics.__

In [None]:
# 

__Use statsmodels to estimate $\text{income} = \alpha + \beta_1 \times \text{education} + \beta_2 \times \text{prestige}$. Comment on the regression statistics.__

__Which model would you recommend? For which purpose?__

__Plot the regression with prestige__

__Visually check the normality of the residuals__

## Finding the right model

__Import the dataset from `data.dta`. Explore it (summary statistics, plots).__

__Our goal is to explain `z` by `x` and `y`. Run a regression.__

__Examine the residuals of the regression. What's wrong? Remedy?__

## Taylor Rule

In 1993, John Taylor estimated, using US data, the regression $i_t = i^{\star} + \alpha_{\pi} \pi_t + \alpha_{y} y_t$, where $i_t$ is the nominal interest rate, $\pi_t$ is inflation, and $y_t$ is the output gap (say, the deviation of real GDP from its trend).
He found that both coefficients were not significantly different from $0.5$.
Our goal is to replicate this analysis.

### Importing the Data

__Import `macrodata` dataset from statsmodels (https://www.statsmodels.org/devel/datasets/generated/macrodata.html). Describe briefly its content using the metadata.__

In [1]:
import statsmodels.api as sm
dataset = sm.datasets.macrodata.load_pandas()

In [32]:
# the dataset object contains some data on the dataset: explore them (dataset.+Tab)

__Extract the dataframe from the dataset object. Print first lines and summary statistics.__

In [3]:
df = dataset.data
df

Unnamed: 0,year,quarter,realgdp,realcons,realinv,realgovt,realdpi,cpi,m1,tbilrate,unemp,pop,infl,realint
0,1959.0,1.0,2710.349,1707.4,286.898,470.045,1886.9,28.980,139.7,2.82,5.8,177.146,0.00,0.00
1,1959.0,2.0,2778.801,1733.7,310.859,481.301,1919.7,29.150,141.7,3.08,5.1,177.830,2.34,0.74
2,1959.0,3.0,2775.488,1751.8,289.226,491.260,1916.4,29.350,140.5,3.82,5.3,178.657,2.74,1.09
3,1959.0,4.0,2785.204,1753.7,299.356,484.052,1931.3,29.370,140.0,4.33,5.6,179.386,0.27,4.06
4,1960.0,1.0,2847.699,1770.5,331.722,462.199,1955.5,29.540,139.6,3.50,5.2,180.007,2.31,1.19
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
198,2008.0,3.0,13324.600,9267.7,1990.693,991.551,9838.3,216.889,1474.7,1.17,6.0,305.270,-3.16,4.33
199,2008.0,4.0,13141.920,9195.3,1857.661,1007.273,9920.4,212.174,1576.5,0.12,6.9,305.952,-8.79,8.91
200,2009.0,1.0,12925.410,9209.2,1558.494,996.287,9926.4,212.671,1592.8,0.22,8.1,306.547,0.94,-0.71
201,2009.0,2.0,12901.504,9189.0,1456.678,1023.528,10077.5,214.469,1653.6,0.18,9.2,307.226,3.37,-3.19


In [34]:
df.head()

Unnamed: 0,year,quarter,realgdp,realcons,realinv,realgovt,realdpi,cpi,m1,tbilrate,unemp,pop,infl,realint
0,1959.0,1.0,2710.349,1707.4,286.898,470.045,1886.9,28.98,139.7,2.82,5.8,177.146,0.0,0.0
1,1959.0,2.0,2778.801,1733.7,310.859,481.301,1919.7,29.15,141.7,3.08,5.1,177.83,2.34,0.74
2,1959.0,3.0,2775.488,1751.8,289.226,491.26,1916.4,29.35,140.5,3.82,5.3,178.657,2.74,1.09
3,1959.0,4.0,2785.204,1753.7,299.356,484.052,1931.3,29.37,140.0,4.33,5.6,179.386,0.27,4.06
4,1960.0,1.0,2847.699,1770.5,331.722,462.199,1955.5,29.54,139.6,3.5,5.2,180.007,2.31,1.19


### Preparing the Data

__Compute inflation as the growth rate of the price index (`cpi`) and store it in the dataframe as variable `π`.__
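
One possible way to do this, sketched on the five `cpi` values from the `df.head()` output above so that the block runs on its own. It assumes inflation is measured as annualized quarterly log growth in percent, a choice that happens to reproduce the `infl` column of the dataset up to rounding; a simple growth rate via `pct_change()` would be an equally valid reading of the exercise.

```python
import numpy as np
import pandas as pd

# cpi and infl for the first five quarters, copied from the df.head() output.
sample = pd.DataFrame({
    "cpi": [28.98, 29.15, 29.35, 29.37, 29.54],
    "infl": [0.00, 2.34, 2.74, 0.27, 2.31],
})

# Assumed definition: 400 * log(cpi_t / cpi_{t-1}), in percent per year.
# The first quarter has no predecessor, so its value is NaN.
sample["π"] = 400 * np.log(sample["cpi"] / sample["cpi"].shift(1))

print(sample.round(2))
```

On the full dataframe the same line applies with `df` in place of `sample`.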

__Add the nominal interest rate to the dataframe (use the Fisher relation).__
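
The approximate Fisher relation states that the nominal rate equals the real rate plus inflation. The sketch below applies it to rows 1 to 4 of the `df.head()` output above and compares the result with `tbilrate`; the column name `i` is an assumption made for illustration.

```python
import pandas as pd

# Rows 1-4 of the df.head() output. Row 0 is skipped because its infl and
# realint are both 0.00 and do not satisfy the relation.
sample = pd.DataFrame({
    "tbilrate": [3.08, 3.82, 4.33, 3.50],
    "infl": [2.34, 2.74, 0.27, 2.31],
    "realint": [0.74, 1.09, 4.06, 1.19],
})

# Approximate Fisher relation: nominal rate = real rate + inflation.
sample["i"] = sample["realint"] + sample["infl"]

# Difference between the reconstructed nominal rate and the T-bill rate.
sample["diff_vs_tbill"] = (sample["i"] - sample["tbilrate"]).round(2)
print(sample)
```

In these rows the reconstructed rate matches `tbilrate` up to rounding, which suggests that `tbilrate` can serve as the nominal rate in this dataset.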

__*Detrend* GDP using the Hodrick-Prescott filter. If needed, check Wikipedia and the [documentation](https://www.statsmodels.org/dev/generated/statsmodels.tsa.filters.hp_filter.hpfilter.html). The result is a trend `tau` and a residual `epsilon`. Store `log(tau/residual)` as `y`.__

### Running the Regression

__Run the basic regression. Interpret the results.__

__Which control variables would you propose to add? Do they increase predictive power? How do you interpret that?__