# Regression Homework
In this homework, we will review how to fit an Ordinary Least Squares (OLS) regression model and what it can tell us about the relationship between variables.

First, we need to load our election data. This table represents presidential election outcomes from 1880 to 2016. Each row records information about that election year, such as inflation or the presence of a war.

In [2]:
elections <- read.csv('FairFPSR3.csv')
head(elections)

Here is the **codebook** that tells you what each variable means.

`inc_vote`: % of major party presidential vote won by incumbent party

`year`: Year of the presidential election

`inflation`: Inflation rate

`goodnews`: Number of quarters in the first 15 quarters of the administration in which economic growth exceeded 3.2%

`growth`: % change in real GDP per capita

If we want to see the relationship between vote share and a variable, such as economic growth, we use the <code>lm(data$y_variable ~ data$x_variable)</code> function. To see the result, we call <code>summary()</code>. Below, we produce the regression results for the relationship between incumbent vote share and inflation.  

**NOTE:** The important values for you to consider are the number of observations, the adjusted R-squared value, and the coefficients, standard errors, t-statistics, and p-values associated with each independent variable.

In [4]:
reg1 <- lm(elections$inc_vote ~ elections$inflation) 
summary(reg1)
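
As a minimal sketch of where those values live in a fitted model, the example below uses simulated toy data, not the elections table. The names `toy`, `x`, and `y` are made up for illustration. It also uses the `data =` argument of `lm()`, which fits the same model as the `data$variable` form but gives shorter row labels in the output.

```r
# Toy data for illustration only; not the elections data.
set.seed(1)
toy <- data.frame(x = rnorm(30))
toy$y <- 2 + 0.5 * toy$x + rnorm(30)

fit <- lm(y ~ x, data = toy)
s <- summary(fit)

coef_table <- coef(s)       # matrix: Estimate, Std. Error, t value, Pr(>|t|)
n_obs <- nobs(fit)          # number of observations used in the fit
adj_r2 <- s$adj.r.squared   # adjusted R-squared

# The t value is the estimate divided by its standard error ...
t_x <- coef_table["x", "Estimate"] / coef_table["x", "Std. Error"]
# ... and the two-tailed p-value comes from the t distribution with the
# residual degrees of freedom (n - 2 in a bivariate model).
p_x <- 2 * pt(-abs(t_x), df = df.residual(fit))

print(coef_table)
```

Recomputing the t value and p-value by hand, as done for `t_x` and `p_x`, is only a check on how the columns of the table relate to one another; `summary()` already reports both.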

<!-- BEGIN QUESTION -->

### Question 1:
In the output that is produced by the <code>summary()</code> function, there is a row labeled *elections$inflation*. How do we interpret the coefficient for the linear relationship between inflation and presidential vote share?

<!--
BEGIN QUESTION
name: q1
manual: true
points: 1
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Question 2:
Now, let's produce the OLS regression results for incumbent vote share and economic growth. Replace the "..." with the correct variables:

<!--
BEGIN QUESTION
name: q2
manual: false
points: 1
-->

In [5]:
## YOUR ANSWER HERE
vote_inflation_ols <- lm(elections$... ~ elections$...)
summary(vote_inflation_ols)

In [None]:
. = ottr::check("tests/q2.R")

<!-- BEGIN QUESTION -->

### Question 3:
Using the *elections$growth* row, we can review the relationship between growth and presidential vote share. Is the relationship statistically significant?

<!--
BEGIN QUESTION
name: q3
manual: true
points: 1
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



To use multiple independent variables, we extend the formula like so: <code>lm(data$dependent_variable ~ data$independent_var_1 + data$independent_var_2 + ...)</code>. Below, we regress incumbent vote share on two independent variables: economic growth and inflation.

In [7]:
vote_inflation_growth_ols <- lm(elections$inc_vote ~ elections$inflation + elections$growth)
summary(vote_inflation_growth_ols)
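
One way to line up bivariate and multiple-regression results is to collect the estimates and p-values into a single table. The sketch below does this on simulated toy data, not the elections table. The variables `x1`, `x2`, and `y` and the coefficients used to generate them are arbitrary assumptions chosen for illustration.

```r
# Toy data for illustration only: x1 and x2 are built to be correlated.
set.seed(2)
n <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
y <- 1 + 2 * x1 - 1 * x2 + rnorm(n)
toy <- data.frame(y, x1, x2)

fit_x1 <- lm(y ~ x1, data = toy)         # bivariate: x1 only
fit_x2 <- lm(y ~ x2, data = toy)         # bivariate: x2 only
fit_both <- lm(y ~ x1 + x2, data = toy)  # multiple regression

# Collect estimates and p-values side by side for comparison.
comparison <- data.frame(
  term = c("x1", "x2"),
  est_bivariate = c(coef(fit_x1)[["x1"]], coef(fit_x2)[["x2"]]),
  est_multiple = unname(coef(fit_both)[c("x1", "x2")]),
  p_bivariate = c(coef(summary(fit_x1))["x1", "Pr(>|t|)"],
                  coef(summary(fit_x2))["x2", "Pr(>|t|)"]),
  p_multiple = unname(coef(summary(fit_both))[c("x1", "x2"), "Pr(>|t|)"])
)
print(comparison)
```

Because `x1` and `x2` are correlated in this toy setup, each bivariate estimate picks up part of the other variable's effect, so the two columns of estimates generally differ.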

<!-- BEGIN QUESTION -->

### Question 4:
Compare the coefficients and p-values for the two independent variables with those from the bivariate regressions we ran on each of them individually. How do these values change?

**Note:** Make sure you run the cell above to generate the multivariate regression. 

<!--
BEGIN QUESTION
name: q4
manual: true
points: 1
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Question 5:
Now, run the multivariate regression for the relationship between vote share and "goodnews" and "growth". Replace the "..." with the correct columns:

<!--
BEGIN QUESTION
name: q5
manual: false
points: 1
-->

In [6]:
## YOUR ANSWER HERE
vote_goodnews_war_ols <- lm(elections$... ~ elections$... + elections$...)
summary(vote_goodnews_war_ols)

In [None]:
. = ottr::check("tests/q5.R")

<!-- BEGIN QUESTION -->

### Question 6:
**Coefficient Review:**

(a) Using the coefficients for the intercept, goodnews, and growth variables, and holding growth at its mean, how many quarters of good economic news are necessary for the incumbent party to win (more than 50% of the major-party vote)?
**Hint:** The mean of economic growth is 0.7635.

<!--
BEGIN QUESTION
name: q6a
manual: true
points: 1
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

(b) Is goodnews statistically significant at the .05 level? What about at the .01 level? What does this imply about positive economic news and incumbent vote share?

**Hint:** Use a two-tailed t-test.

<!--
BEGIN QUESTION
name: q6b
manual: true
points: 1
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



### Question 7:
Let's practice generating confidence intervals. As we have seen in past lectures, the 95% confidence interval is calculated with $\hat{\beta} \pm t_{critical} \times se(\hat{\beta})$. Let's find the 95% confidence interval for the `goodnews` coefficient.
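
The formula can be checked in R. The sketch below runs on simulated toy data, not the elections table, and the names `toy`, `x1`, `x2`, and `y` are made up for illustration. It uses `qt()` in place of a printed t-table to look up the critical value, then compares the hand-computed interval with the one returned by base R's `confint()`.

```r
# Toy data for illustration only; not the elections data.
set.seed(3)
toy <- data.frame(x1 = rnorm(25), x2 = rnorm(25))
toy$y <- 1 + 0.8 * toy$x1 + 0.3 * toy$x2 + rnorm(25)

fit <- lm(y ~ x1 + x2, data = toy)
coefs <- coef(summary(fit))

beta_hat <- coefs["x1", "Estimate"]
se_beta <- coefs["x1", "Std. Error"]

# Degrees of freedom: observations minus estimated parameters (25 - 3 = 22).
dof <- df.residual(fit)
# A two-tailed 95% interval leaves 2.5% in each tail.
t_crit <- qt(0.975, df = dof)

ci_manual <- c(beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
ci_builtin <- confint(fit, "x1", level = 0.95)

print(ci_manual)
print(ci_builtin)
```

The two intervals should agree, since `confint()` applies the same formula to a fitted `lm` object. The degrees of freedom here belong to the toy model only; they depend on the number of observations and on how many coefficients are estimated.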

(a) Using the number of observations in the summary and the t-table in the back of your textbook, find the critical value of t, and store it in the variable below.

<!--
BEGIN QUESTION
name: q7a
manual: false
points: 1
-->

In [8]:
t_critical <- ...
t_critical

In [None]:
. = ottr::check("tests/q7a.R")

(b) Next, use the summary output to store the standard error for the goodnews coefficient.

<!--
BEGIN QUESTION
name: q7b
manual: false
points: 1
-->

In [10]:
goodnews_se <- ...
goodnews_se

In [None]:
. = ottr::check("tests/q7b.R")

(c) Using the standard error, calculate the 95% confidence interval. In the cell below, fill in the values for the lower and upper bounds of the interval.

<!--
BEGIN QUESTION
name: q7c
manual: false
points: 1
-->

In [12]:
## YOUR ANSWER HERE
goodnews_lower <-  ... - ...*goodnews_se
goodnews_upper <- ... + ...*goodnews_se
q7c.answer <- c(goodnews_lower, goodnews_upper)
q7c.answer

In [None]:
. = ottr::check("tests/q7c.R")

<!-- BEGIN QUESTION -->

(d) Interpret this confidence interval: what can we say about the effect of good news on incumbent vote share?

<!--
BEGIN QUESTION
name: q7d
manual: true
points: 1
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



## OLS Review: Population and Sample Models:

### Question 8
In the following questions, the models in focus are bivariate, using the population model $Y_i = \alpha + \beta X_i + u_i$ and the sample model $Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{u}_i$.
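
To make the two models concrete, the sketch below simulates data from a population model with known parameters and then fits the sample model by OLS. Everything in it is an assumption made for illustration: the values `alpha = 3` and `beta = 1.5`, the sample size, and the normal draws for `x` and `u`.

```r
# Simulated data for illustration only; alpha and beta are arbitrary values.
set.seed(4)
n <- 100
alpha <- 3
beta <- 1.5
x <- rnorm(n)
u <- rnorm(n)               # error term, drawn at random
y <- alpha + beta * x + u   # population model

fit <- lm(y ~ x)            # sample model fitted by OLS
alpha_hat <- coef(fit)[["(Intercept)"]]
beta_hat <- coef(fit)[["x"]]
u_hat <- resid(fit)         # residuals from the fitted model

# Fitted values plus residuals reproduce y exactly.
reconstructed <- alpha_hat + beta_hat * x + u_hat

print(c(alpha_hat = alpha_hat, beta_hat = beta_hat))
print(cor(u, u_hat))
```

In real data only `x` and `y` are observed. The simulation simply makes it possible to put `alpha`, `beta`, and `u` next to their sample counterparts.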


(a) Which of the following statements are accurate about the population regression model?

a) ${u}_i$ is the stochastic component of $Y_i$  
b) $\hat{\alpha} + \hat{\beta} X_i$ is the systematic component of $Y_i$  
c) Both (a) and (b) are correct  
d) Neither (a) nor (b) is correct

<!--
BEGIN QUESTION
name: q8a
manual: false
points: 2
-->

In [14]:
q8a.answer <- ...

In [None]:
. = ottr::check("tests/q8a.R")

(b) Which of the following statements are accurate about the population regression model?

a) $\hat{u}_i$ is an estimate of $u_i$  
b) $X_i$ is assumed to be measured without error  
c) Both (a) and (b) are correct  
d) Neither (a) nor (b) is correct

<!--
BEGIN QUESTION
name: q8b
manual: false
points: 2
-->

In [16]:
q8b.answer <- ...

In [None]:
. = ottr::check("tests/q8b.R")

(c) Which of the following statements are accurate?

a) By specifying a bivariate regression model we are assuming that the impact of a one unit increase in $X_i$ will always be $\beta$.  
b) By specifying a bivariate regression model we are assuming that there are no other variables that cause $Y_i$.  
c) Both (a) and (b) are correct    
d) Neither (a) nor (b) is correct

<!--
BEGIN QUESTION
name: q8c
manual: false
points: 2
-->

In [18]:
q8c.answer <- ...

In [None]:
. = ottr::check("tests/q8c.R")

<!-- END QUESTION -->



-----

## Submitting Your Notebook

Congratulations! You have now finished your final coding assignment. Well done!

Before you head off to celebrate your work, please remember to save your work and submit it before the deadline!

To submit your notebook...

### 1. Select the cell below and hit run. Then wait 10 seconds.

In [None]:
# Don't worry about what is in this cell, just run it!
nb_name <- "Homework7.ipynb"
system(paste0("python3 -c 'import otter; otter.Notebook(\"", nb_name, "\").export()'"))
fp = tail(sort(Sys.glob("*.zip")), n=1)
IRdisplay::display_html(paste0("<a style='font-size: 36px;' href='", fp, "' download='", fp, "'>Click here to download your submission</a>"))

After you hit "Run" on the cell above, wait for a moment (about 10 seconds), then click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to run the cell above again before you submit to get a new version of it.)

### 2. Submit the .zip file you just downloaded <a href="https://www.gradescope.com/courses/402785" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try the "old way": hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.