You'll create a formula to define a one-variable modeling task, and then fit a linear model to the data. You are given the rates of male and female unemployment in the United States over several years ([Source](http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/slr02.html)).

The task is to predict the rate of female unemployment from the observed rate of male unemployment. The outcome is `female_unemployment`, and the input is `male_unemployment`.

The sign of the variable coefficient tells you whether the outcome increases (+) or decreases (-) as the variable increases.

Recall the calling interface for `lm()` is:

`lm(formula, data = ___)`
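
As a minimal sketch of this interface, the block below fits a one-variable model to a small made-up data frame. The names `toy`, `x`, `y`, `toy_fmla`, and `toy_model` are hypothetical and not part of the unemployment data. Because `y` is constructed as exactly `2 + 3 * x`, the fitted coefficients should recover those two values, and the positive slope indicates that `y` rises as `x` rises.

```r
# Hypothetical toy data: y is exactly 2 + 3 * x (not the unemployment data)
toy <- data.frame(x = c(1, 2, 3, 4, 5))
toy$y <- 2 + 3 * toy$x

# Define the formula, then pass it to lm() together with the data frame
toy_fmla <- y ~ x
toy_model <- lm(toy_fmla, data = toy)

# coef() returns a named vector with entries "(Intercept)" and "x"
toy_coefs <- coef(toy_model)
print(toy_coefs)

# sign() gives +1 for a positive slope and -1 for a negative one
slope_sign <- sign(toy_coefs[["x"]])
```

The same pattern applies to any outcome and input pair: the outcome goes on the left of `~` and the input on the right.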

In [2]:
unemployment <- readRDS('data/unemployment.rds')

# unemployment is loaded in the workspace
summary(unemployment)

# Define a formula to express female_unemployment as a function of male_unemployment
fmla <- female_unemployment ~ male_unemployment

# Print it
print(fmla)

# Use the formula to fit a model: unemployment_model
unemployment_model <- lm(fmla, data = unemployment)

# Print it
print(unemployment_model)

 male_unemployment female_unemployment
 Min.   :2.900     Min.   :4.000      
 1st Qu.:4.900     1st Qu.:4.400      
 Median :6.000     Median :5.200      
 Mean   :5.954     Mean   :5.569      
 3rd Qu.:6.700     3rd Qu.:6.100      
 Max.   :9.800     Max.   :7.900      

female_unemployment ~ male_unemployment

Call:
lm(formula = fmla, data = unemployment)

Coefficients:
      (Intercept)  male_unemployment  
           1.4341             0.6945  



The coefficient for male unemployment is positive, so female unemployment increases as male unemployment does. Linear regression is the most basic of regression approaches.
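
Written out with the rounded coefficients printed above, and taking $y$ to be `female_unemployment` and $x$ to be `male_unemployment`, the fitted line is approximately:

$$
\hat{y} = 1.4341 + 0.6945\,x
$$

Under this fit, each additional percentage point of male unemployment is associated with roughly 0.69 more percentage points of predicted female unemployment.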

## Examining a model

Let's look at the model `unemployment_model` that you have just created. There are several ways to examine a model, and each provides different information. We will use `summary()`, `broom::glance()`, and `sigr::wrapFTest()`.

In [3]:
# Install additional packages

install.packages(c('broom', 'sigr'))




In [13]:
# Load packages
library(broom)
library(sigr)

# Print unemployment_model
print(unemployment_model)

# Call summary() on unemployment_model to get more details
summary(unemployment_model)

# Call glance() on unemployment_model to see the details in a tidier form
glance(unemployment_model)

# Call wrapFTest() on unemployment_model to see the most relevant details
wrapFTest(unemployment_model)


Call:
lm(formula = fmla, data = unemployment)

Coefficients:
      (Intercept)  male_unemployment  
           1.4341             0.6945  




Call:
lm(formula = fmla, data = unemployment)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.77621 -0.34050 -0.09004  0.27911  1.31254 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        1.43411    0.60340   2.377   0.0367 *  
male_unemployment  0.69453    0.09767   7.111 1.97e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5803 on 11 degrees of freedom
Multiple R-squared:  0.8213,	Adjusted R-squared:  0.8051 
F-statistic: 50.56 on 1 and 11 DF,  p-value: 1.966e-05


| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.8213157 | 0.8050716 | 0.5802596 | 50.56108 | 1.965985e-05 | 2 | -10.28471 | 26.56943 | 28.26428 | 3.703714 | 11 |


[1] "F Test summary: (R2=0.82, F(1,11)=51, p=2e-05)."

There are several different ways to get diagnostics for your model. Use the one that suits your needs or preferences the best.
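
The same diagnostics can also be pulled out of the `summary()` object with base R alone. The sketch below uses a small made-up data frame (`toy`, `x`, and `y` are hypothetical, not the unemployment data) and checks a few standard relationships: R-squared equals one minus the ratio of residual to total sum of squares, and for a single-input model the F statistic is the square of the slope's t value. The output above is consistent with the latter up to rounding: the t value 7.111 squared is close to the F statistic 50.56.

```r
# Hypothetical data, made up for illustration (not the unemployment data)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)
)
toy_model <- lm(y ~ x, data = toy)
toy_summary <- summary(toy_model)

# R-squared as stored in the summary object
r2 <- toy_summary$r.squared

# The same quantity by hand: 1 - (residual sum of squares / total sum of squares)
rss <- sum(residuals(toy_model)^2)
tss <- sum((toy$y - mean(toy$y))^2)
r2_by_hand <- 1 - rss / tss

# With a single input, the F statistic equals the squared t value of the slope
f_stat <- toy_summary$fstatistic[["value"]]
t_slope <- toy_summary$coefficients["x", "t value"]

# Residual degrees of freedom: observations minus estimated coefficients
df_resid <- df.residual(toy_model)

# Residual standard error (reported as sigma by glance())
sigma_hat <- toy_summary$sigma
```

Reading values off the summary object this way is handy when you need a single number, such as R-squared, inside a script rather than a printed report.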

## Predicting from the unemployment model

In this exercise, you will use `unemployment_model` to make predictions from the unemployment data and compare the predicted female unemployment rates to the observed rates on the training data, `unemployment`. You will also use the model to predict on the new data in `newrates`, which consists of a single observation where male unemployment is 5%.

The `predict()` interface for `lm` models takes the form

    predict(model, newdata)
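
To make the interface concrete, here is a minimal sketch on made-up data (`toy`, `toy_new`, `x`, and `y` are hypothetical names, not the unemployment data). It illustrates two points: `newdata` needs a column named like the model's input variable, and for a linear model the prediction is simply the intercept plus the slope times the input.

```r
# Hypothetical data, made up for illustration (not the unemployment data)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)
)
toy_model <- lm(y ~ x, data = toy)

# newdata must contain a column named like the model's input variable
toy_new <- data.frame(x = 7)
pred_new <- predict(toy_model, newdata = toy_new)

# For a linear model, the prediction is intercept + slope * input
coefs <- coef(toy_model)
pred_by_hand <- coefs[["(Intercept)"]] + coefs[["x"]] * toy_new$x

# Omitting newdata returns the fitted values on the training data
pred_train <- predict(toy_model)
```

If the column in `newdata` has a different name than the variable in the formula, `predict()` cannot match it to the model, so naming the column explicitly is the safer habit.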

You will use the `ggplot2` package to make the plots, so you will add the prediction column to the `unemployment` data frame. You will plot outcome versus prediction and compare the points to the line that represents perfect predictions (that is, the line where the outcome equals the predicted value).

The `ggplot2` command to plot a scatterplot of `dframe$outcome` versus `dframe$pred` (`pred` on the x axis, `outcome` on the y axis), along with a blue line where `outcome == pred` is as follows:

    ggplot(dframe, aes(x = pred, y = outcome)) + 
           geom_point() +  
           geom_abline(color = "blue")

In [4]:
# One new observation: male unemployment of 5%
newrates <- data.frame(male_unemployment = 5)

In [5]:
newrates

male_unemployment
5


In [8]:
# unemployment is in your workspace
summary(unemployment)

# newrates is in your workspace
newrates

# Predict female unemployment in the unemployment data set
unemployment$prediction <-  predict(unemployment_model, unemployment)

# load the ggplot2 package
library(ggplot2)

# Make a plot to compare predictions to actual (prediction on x axis). 

options(repr.plot.width = 6, repr.plot.height = 6)
options(jupyter.plot_mimetypes = "image/svg+xml") 

p <- ggplot(unemployment, aes(x = prediction, y = female_unemployment)) + 
  geom_point() +
  geom_abline(color = "blue")
p + ggtitle(sprintf(
    "Plot width = %s, plot height = %s", 
    getOption("repr.plot.width"),
    getOption("repr.plot.height")
))

# Predict female unemployment rate when male unemployment is 5%
pred <- predict(unemployment_model, newrates)

# Print it
pred

 male_unemployment female_unemployment   prediction   
 Min.   :2.900     Min.   :4.000       Min.   :3.448  
 1st Qu.:4.900     1st Qu.:4.400       1st Qu.:4.837  
 Median :6.000     Median :5.200       Median :5.601  
 Mean   :5.954     Mean   :5.569       Mean   :5.569  
 3rd Qu.:6.700     3rd Qu.:6.100       3rd Qu.:6.087  
 Max.   :9.800     Max.   :7.900       Max.   :8.240  

male_unemployment
5


ERROR: Error in file(con, "rb"): cannot open the connection
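
The prediction for `newrates` can also be checked by hand from the coefficients reported by `summary(unemployment_model)`. The sketch below copies those two printed values (so it inherits their rounding) and applies intercept plus slope times input. The variable names `intercept`, `slope`, `male_rate`, and `pred_by_hand` are introduced here for illustration only.

```r
# Coefficients copied from the summary() output of unemployment_model
intercept <- 1.43411
slope <- 0.69453

# Male unemployment rate from newrates
male_rate <- 5

# Prediction by hand: intercept + slope * input
pred_by_hand <- intercept + slope * male_rate
print(pred_by_hand)
```

This gives a predicted female unemployment rate of roughly 4.9% when male unemployment is 5%, which is what `predict(unemployment_model, newrates)` should return up to rounding of the coefficients.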
