# Regression Homework
In this homework we will review the process of generating an Ordinary Least Squares regression model. We will review the information that it can tell us about the relationship between variables.

As always, we need to load a few libraries first:

In [None]:
import statsmodels.formula.api as smf
import statsmodels.api as sm
from datascience import Table
from questioner import question, multiple_choice

Next, we need to load in our election data. This table represents presidential election outcomes from 1880 to now. In each row, we have collected information about different features during that year, such as inflation or the presence of a war.

In [None]:
elections = Table.read_table('jason_data/data/fair.csv')
elections

If we want to see the relationship between vote share and another variable, such as economic growth, we fit a model with <code>smf.ols('y_variable ~ x_variable', data=data_table).fit()</code>. To see the result, we call <code>.summary()</code> on the fitted model. Below, we produce the regression results for the relationship between vote share and inflation.

**NOTE:** Most of the results in this table are outside the scope of this course. The important values for you to consider are the number of observations, the adjusted R-squared value, and the coefficients, standard errors, t-statistics, and p-values associated with the independent variables.
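
As a rough sketch of where those values live, the fitted model also exposes them as attributes, so they can be read without scanning the whole summary. The data below is synthetic and the names `toy`, `x`, and `y` are made up for illustration; they are not part of the election data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (an assumption for illustration only): y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)
toy = pd.DataFrame({"x": x, "y": y})

toy_ols = smf.ols("y ~ x", data=toy).fit()

n_obs = int(toy_ols.nobs)        # number of observations
adj_r2 = toy_ols.rsquared_adj    # adjusted R-squared
coef = toy_ols.params["x"]       # estimated coefficient on x
se = toy_ols.bse["x"]            # standard error of that coefficient
t_stat = toy_ols.tvalues["x"]    # t-statistic (coefficient / standard error)
p_value = toy_ols.pvalues["x"]   # two-sided p-value

print(n_obs, adj_r2, coef, se, t_stat, p_value)
```

The same attribute names should work on the election models in this notebook, with the column name (for example `"INFLATION"`) in place of `"x"`.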

In [None]:
vote_inflation_ols = smf.ols('VOTE ~ INFLATION', data=elections).fit()
vote_inflation_ols.summary()

In the table that is produced by the <code>.summary()</code> function, there is a row labeled "INFLATION". What does it tell us about the coefficient for the linear relationship between inflation and presidential vote share?

*YOUR ANSWER HERE*

Now, let's produce the OLS regression results for vote share and economic growth (the "GROWTH" column): 

In [None]:
vote_growth_ols = ...
vote_growth_ols.summary()

Using the "GROWTH" row, what can we determine about the relationship between growth and presidential vote share? Is the relationship statistically significant?

*YOUR ANSWER HERE*

To use multiple independent variables, we extend the formula like so: <code>smf.ols('dependent_variable ~ independent_var_1 + independent_var_2 + ...', data=data_table).fit()</code>. Below, we regress vote share on two independent variables, economic growth and monetary inflation.

In [None]:
vote_inflation_growth_ols = smf.ols('VOTE ~ INFLATION + GROWTH', data=elections).fit()
vote_inflation_growth_ols.summary()

Compare the coefficients and p-values for the two independent variables with the values from the bivariate regressions we ran on each of them individually. How do these values change?

*YOUR ANSWER HERE*

Now, run the multivariate regression for the relationship between vote share and "GOODNEWS" and "WAR":

In [None]:
vote_goodnews_war_ols = ...
vote_goodnews_war_ols.summary()

**Coefficient Review:**
Using the coefficients for the intercept, GOODNEWS, and WAR variables, during peacetime, how many months of good economic news are necessary for the incumbent to win?
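
The arithmetic behind this kind of question can be sketched as follows. The coefficients here are hypothetical placeholders, not the values from the regression above, and the sketch assumes that winning means a predicted vote share above 50 and that WAR is coded 0 in peacetime.

```python
import math

# Hypothetical coefficients for illustration only (NOT the homework results)
intercept = 44.5
b_goodnews = 1.2
b_war = -3.0

def predicted_vote(goodnews, war):
    """Predicted vote share from the fitted linear equation."""
    return intercept + b_goodnews * goodnews + b_war * war

war = 0            # assumption: peacetime is coded as WAR = 0
threshold = 50.0   # assumption: winning means a vote share above 50

# Solve intercept + b_goodnews * g + b_war * war = threshold for g
breakeven = (threshold - intercept - b_war * war) / b_goodnews

# Smallest whole number of good-news periods that strictly exceeds the threshold
min_goodnews = math.floor(breakeven) + 1

print(breakeven, min_goodnews, predicted_vote(min_goodnews, war))
```

Substituting the intercept and coefficients from your own summary table gives the answer for the election data.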

*YOUR ANSWER HERE*

Is GOODNEWS statistically significant at the .05 level? What about at .01? What does this imply about positive economic news and incumbent vote share?
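
As a reminder of the decision rule, a coefficient is significant at a given level when its p-value falls below that level. A minimal sketch with a made-up p-value (not the one from the summary table):

```python
def is_significant(p_value, alpha):
    """Return True when the p-value is below the significance level alpha."""
    return p_value < alpha

# Hypothetical p-value for illustration only
example_p = 0.03

sig_05 = is_significant(example_p, 0.05)  # significant at the .05 level
sig_01 = is_significant(example_p, 0.01)  # not significant at the stricter .01 level

print(sig_05, sig_01)
```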

*YOUR ANSWER HERE*

Let's practice generating confidence intervals. As we have seen in past lectures, the 95% confidence interval is calculated as $\hat{\beta} \pm t_{critical} \times se(\hat{\beta})$. Let's find the 95% confidence interval for the GOODNEWS coefficient.

Using the number of observations in the summary and the t-table in the back of your textbook, find the critical value of t, and store it in the variable below.
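
The calculation can be sketched with made-up numbers. The sketch assumes the degrees of freedom equal the number of observations minus the number of estimated parameters (the intercept plus each slope); every value below is hypothetical and differs from the homework regression.

```python
# Hypothetical bivariate example (NOT the homework values)
n_obs = 32
n_params = 2              # intercept + one slope
df = n_obs - n_params     # degrees of freedom used to look up t

# Two-tailed 95% critical value for 30 degrees of freedom, from a standard t-table
t_crit = 2.042

beta_hat = 2.0            # hypothetical coefficient estimate
se_beta = 0.5             # hypothetical standard error

lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta

# If the interval excludes zero, the coefficient is significant at the .05 level
excludes_zero = lower > 0 or upper < 0

print(df, lower, upper, excludes_zero)
```

For a fitted statsmodels result, calling `.conf_int()` returns the intervals directly, which is a convenient way to check a hand calculation.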

In [None]:
t_critical = ...
t_critical

Next, use the summary table to store the standard error for the GOODNEWS coefficient.

In [None]:
goodnews_se = ...
goodnews_se

Using the standard error, calculate the 95% confidence interval. In the cell below, fill out the values for the lower and upper bound of the interval. Does it match what the <code>.summary()</code> function returns?

In [None]:
# 0.9843 is the GOODNEWS coefficient used for this exercise; compare it with the summary table
goodnews_lower = 0.9843 - t_critical * goodnews_se
goodnews_upper = ...
goodnews_lower, goodnews_upper

Interpret this confidence interval: what can we say about the effect of good news on incumbent vote share?

*YOUR ANSWER HERE*

## OLS Review: Population and Sample Models
In the following questions, the models in focus are bivariate, using the population model $Y_i = \alpha + \beta X_i + u_i$ and the sample model $Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{u}_i$.
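
To make the distinction concrete, here is a small simulation sketch. The population parameters `alpha` and `beta` are chosen arbitrarily for illustration; in real data they are unknown, and only the hatted quantities can be computed from a sample.

```python
import random

random.seed(1)

# Population model (parameters chosen arbitrarily for this illustration)
alpha, beta = 1.0, 2.0
n = 200
xs = [random.uniform(0, 10) for _ in range(n)]
us = [random.gauss(0, 1) for _ in range(n)]            # unobserved errors u_i
ys = [alpha + beta * x + u for x, u in zip(xs, us)]    # observed outcomes Y_i

# Sample model: bivariate OLS estimates computed from the observed data only
x_bar = sum(xs) / n
y_bar = sum(ys) / n
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
alpha_hat = y_bar - beta_hat * x_bar

# Residuals u_hat_i are computed from the estimates, not from the true parameters
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]

print(alpha_hat, beta_hat, sum(residuals))
```

The estimates land near, but not exactly on, the population values, and the residuals sum to zero (up to rounding) by construction, while the true errors generally do not.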

**NOTE: You need to run the cells for the questions to properly render. Once your answer has been selected, tap on the next cell if you want to use SHIFT-ENTER functionality.**

Which of the following statements are accurate about the population regression model?

In [None]:
question('$u_i$ is the stochastic component of $Y_i$.')
question(r'$\hat{\alpha}+\hat{\beta}X_i$ is the systematic component of $Y_i$')
question('Both (a) and (b) are correct')
question('Neither (a) nor (b) are correct')

Which of the following statements are accurate about the population regression model?

In [None]:
question(r'$\hat{u}_i$ is an estimate of $u_i$')
question('$X_i$ is assumed to be measured without error')
question('Both (a) and (b) are correct')
question('Neither (a) nor (b) are correct')

Which of the following statements are accurate?

In [None]:
question(r'By specifying a bivariate regression model we are assuming that the impact of a one-unit increase in $X_i$ will always be $\beta$.')
question('By specifying a bivariate regression model we are assuming that there are no other variables that cause $Y_i$.')
question('Both (a) and (b) are correct')
question('Neither (a) nor (b) are correct')

## Saving Your Notebook
Now that you've finished the homework, we need to save it! To do this, click <code>File</code> $\rightarrow$ <code>Download as</code> $\rightarrow$ <code>PDF via Chrome</code>.