- Fit a linear regression model with 2+ features of your choice. Get and plot the coefficients.
- Use train-test split or leave-one-out cross-validation to get regression metrics: MSE, RMSE, MAE, R^2.
- Visualize the plane of best fit in 3D, with 2 features.
- Write a short, simple blog post about your elections model. (Or about your pageviews model, whichever you prefer!)
- AWESOME BUT DIFFICULT STRETCH GOAL: In your 3D visualization, can you include the actual data points, like in this notebook? https://nbviewer.jupyter.org/urls/s3.amazonaws.com/datarobotblog/notebooks/multiple_regression_in_python.ipynb Can you also include the residual lines from the data points to the plane of best fit, like in _An Introduction to Statistical Learning_?
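A minimal sketch of the first two tasks (coefficients, then LOOCV metrics). It assumes the two features are the `Gross domestic product` and `pct_change` (non-farm payroll) columns built below; any two numeric columns would work. So that it runs on its own, the block hard-codes only the ten rows visible in the printed `df` table (1952 to 1988). In the notebook you would use `df[features]` and `df[target]` instead. Ten rows is far too few for trustworthy metrics, so treat the numbers as a demonstration of the workflow, not a result.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Ten rows copied from the printed `df` table (1952-1988 only); assumption:
# in the real notebook, replace `sample` with the full `df`.
sample = pd.DataFrame(
    {
        "Gross domestic product": [4.1, 2.1, 2.6, 5.8, 4.9, 5.3, 5.4, -0.3, 7.2, 4.2],
        "pct_change": [2.0384, 3.41271, 1.72702, 2.86729, 3.17166,
                       3.44428, 3.1607, 0.66279, 4.71146, 3.19339],
        "Incumbent Party Vote Share": [44.6, 57.76, 49.91, 61.34, 49.6,
                                       61.79, 48.95, 44.7, 59.17, 53.94],
    },
    index=pd.Index(range(1952, 1992, 4), name="Year"),
)

features = ["Gross domestic product", "pct_change"]
target = "Incumbent Party Vote Share"
X = sample[features]
y = sample[target]

# Fit on every row to read off the coefficients.
model = LinearRegression().fit(X, y)
coefficients = pd.Series(model.coef_, index=features)
print("Intercept:", round(model.intercept_, 3))
print(coefficients.round(3))

# Leave-one-out: each election is predicted by a model fit on all the others.
loo_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

mse = mean_squared_error(y, loo_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y, loo_pred)
r2 = r2_score(y, loo_pred)
print(f"MSE {mse:.2f}  RMSE {rmse:.2f}  MAE {mae:.2f}  R^2 {r2:.2f}")
```

With so few elections, LOOCV tests on every row exactly once, which is why it suits this data better than a single train-test split. Because the R^2 here is computed on out-of-sample predictions, it can come out negative when the model predicts worse than the mean would.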

In [10]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [6]:
columns = ['Year','Incumbent Party Candidate','Other Candidate','Incumbent Party Vote Share']

data = [[1952,"Stevenson","Eisenhower",44.6],
        [1956,"Eisenhower","Stevenson",57.76],
        [1960,"Nixon","Kennedy",49.91],
        [1964,"Johnson","Goldwater",61.34],
        [1968,"Humphrey","Nixon",49.60],
        [1972,"Nixon","McGovern",61.79],
        [1976,"Ford","Carter",48.95],
        [1980,"Carter","Reagan",44.70],
        [1984,"Reagan","Mondale",59.17],
        [1988,"Bush, Sr.","Dukakis",53.94],
        [1992,"Bush, Sr.","Clinton",46.55],
        [1996,"Clinton","Dole",54.74],
        [2000,"Gore","Bush, Jr.",50.27],
        [2004,"Bush, Jr.","Kerry",51.24],
        [2008,"McCain","Obama",46.32],
        [2012,"Obama","Romney",52.00], 
        [2016,"Clinton","Trump",48.2]]
        
votes = pd.DataFrame(data=data, columns=columns)
votes = votes.set_index('Year')

gdp = pd.read_csv('https://raw.githubusercontent.com/WillHK/DS-Unit-2-Regression-1/master/module3-doing-linear-regression/gdp.csv')
gdp = gdp.drop('Line', axis=1)
gdp = gdp.rename({'Unnamed: 1': 'Year'}, axis=1)
gdp = gdp.T
gdp.columns = gdp.iloc[0]
gdp = gdp.drop(['Year', '1947', '1948', '2017', '2018'], axis=0)
gdp.index = gdp.index.astype('int64')
election_year_gdp = gdp.loc[votes.index.values]
df = pd.concat([votes, election_year_gdp], axis=1, join='outer')
df.columns = df.columns.str.strip()

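target = 'Incumbent Party Vote Share'
# Mean baseline: predict the average incumbent vote share for every election
mean_baseline = [df[target].mean()] * len(df)
df['Mean Baseline'] = df[target].mean()
# Error is baseline minus actual; Absolute Error is its magnitude
df['Error'] = df['Mean Baseline'] - df[target]
df['Absolute Error'] = df['Error'].abs()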

df = df.drop(['Change in private inventories', 'Net exports of goods and services', 'Addendum:'], axis=1)
# The GDP columns come out of the transpose as object dtype; convert them to numbers.
cols = ['Incumbent Party Vote Share', 'Gross domestic product', 'Personal consumption expenditures', 'Goods', 'Durable goods', 'Nondurable goods', 'Services', 'Gross private domestic investment', 'Fixed investment', 'Nonresidential', 'Structures', 'Equipment', 'Intellectual property products', 'Residential', 'Exports', 'Goods', 'Services', 'Imports', 'Goods', 'Services', 'Government consumption expenditures and gross investment', 'Federal', 'National defense', 'Nondefense', 'State and local', 'Mean Baseline', 'Error', 'Absolute Error']
# 'Goods' and 'Services' each appear three times; df[name] already selects every
# column with that name, so dict.fromkeys() converts each name only once.
for col in dict.fromkeys(cols):
    df[col] = df[col].apply(pd.to_numeric)

# Annual percent change in non-farm payrolls, indexed by year
nonfarm_payroll = pd.read_csv('https://raw.githubusercontent.com/WillHK/DS-Unit-2-Regression-1/master/module3-doing-linear-regression/non-farm-payroll-annual-change.csv', parse_dates=['DATE'])
nonfarm_payroll = nonfarm_payroll.drop(nonfarm_payroll.tail(1).index)
nonfarm_payroll = nonfarm_payroll.rename({"DATE": "Year", 'PAYEMS_PC1': 'pct_change'}, axis=1)
nonfarm_payroll['pct_change'] = pd.to_numeric(nonfarm_payroll['pct_change'])
years = nonfarm_payroll['Year'].dt.year
nonfarm_payroll = nonfarm_payroll.set_index(years)
nonfarm_payroll = nonfarm_payroll.drop('Year', axis=1)
election_year_payroll = nonfarm_payroll.loc[votes.index.values]
df = pd.concat([df, election_year_payroll], axis=1, join='outer')
df.head()

     Incumbent Party Candidate Other Candidate  Incumbent Party Vote Share
Year                                                                      
1952                 Stevenson      Eisenhower                       44.60
1956                Eisenhower       Stevenson                       57.76
1960                     Nixon         Kennedy                       49.91
1964                   Johnson       Goldwater                       61.34
1968                  Humphrey           Nixon                       49.60


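The `Mean Baseline`, `Error`, and `Absolute Error` columns built in the cell above amount to a "predict the average" baseline. A short sketch, using plain NumPy and the 17 vote shares from `data`, that summarizes that baseline with the same four metrics the model will be judged on. Note this mean is computed over all 17 elections, including the one being "predicted", so it is an in-sample baseline; a strictly out-of-sample baseline would leave each year out of the mean.

```python
import numpy as np

# Incumbent-party vote shares, 1952-2016, copied from `data` above.
vote_share = np.array([44.6, 57.76, 49.91, 61.34, 49.60, 61.79, 48.95, 44.70,
                       59.17, 53.94, 46.55, 54.74, 50.27, 51.24, 46.32, 52.00, 48.2])

# The baseline predicts the same value, the mean, for every election.
baseline = np.full_like(vote_share, vote_share.mean())
errors = baseline - vote_share   # same sign convention as df['Error']

mae = np.abs(errors).mean()      # equals the mean of df['Absolute Error']
mse = (errors ** 2).mean()
rmse = np.sqrt(mse)
# R^2 of the mean itself is 0 by construction: SS_res equals SS_tot.
r2 = 1 - (errors ** 2).sum() / ((vote_share - vote_share.mean()) ** 2).sum()

print(f"baseline {vote_share.mean():.6f}")
print(f"MAE {mae:.3f}  MSE {mse:.3f}  RMSE {rmse:.3f}  R^2 {r2:.1f}")
```

Any regression model worth reporting should beat this MAE and RMSE; its R^2 is, by definition, measured relative to this baseline.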
In [7]:
df

Year,Incumbent Party Candidate,Other Candidate,Incumbent Party Vote Share,Gross domestic product,Personal consumption expenditures,Goods,Durable goods,Nondurable goods,Services,Gross private domestic investment,Fixed investment,Nonresidential,Structures,Equipment,Intellectual property products,Residential,Exports,Goods.1,Services.1,Imports,Goods.2,Services.2,Government consumption expenditures and gross investment,Federal,National defense,Nondefense,State and local,"Gross domestic product, current dollars",Mean Baseline,Error,Absolute Error,pct_change
1952,Stevenson,Eisenhower,44.6,4.1,3.2,2.3,-2.2,3.8,4.5,-8.2,-0.6,-0.1,-0.4,-2.9,24.8,-1.6,-4.3,-5.1,-0.3,8.8,2.0,30.4,19.8,28.4,30.7,12.3,1.6,5.9,51.828235,7.228235,7.228235,2.0384
1956,Eisenhower,Stevenson,57.76,2.1,2.9,1.5,-3.7,3.5,4.8,-0.3,1.4,6.8,10.5,2.6,16.6,-7.9,16.5,17.3,13.1,8.1,8.9,6.5,0.2,-1.3,-0.5,-6.7,3.2,5.6,51.828235,-5.931765,5.931765,3.41271
1960,Nixon,Kennedy,49.91,2.6,2.7,1.8,2.0,1.7,3.9,0.4,1.2,5.5,7.9,3.8,6.1,-6.8,17.4,23.4,1.6,1.3,-1.6,7.8,0.6,-1.6,-0.8,-5.9,4.3,4.0,51.828235,1.918235,1.918235,1.72702
1964,Johnson,Goldwater,61.34,5.8,6.0,6.0,9.3,4.7,5.9,7.9,9.1,10.7,10.4,12.6,4.5,6.0,11.8,13.9,5.5,5.3,6.6,2.6,2.4,-0.4,-3.1,10.4,6.7,7.4,51.828235,-9.511765,9.511765,2.86729
1968,Humphrey,Nixon,49.6,4.9,5.8,6.2,11.1,4.2,5.3,6.0,7.0,4.8,1.4,6.1,7.5,13.5,7.9,8.1,7.4,14.9,20.7,1.8,3.4,1.5,1.6,1.3,6.0,9.4,51.828235,2.228235,2.228235,3.17166
1972,Nixon,McGovern,61.79,5.3,6.1,6.5,12.4,4.0,5.7,11.3,11.4,8.7,3.1,12.7,7.0,17.4,7.8,10.9,-0.4,11.3,13.6,4.2,-0.5,-3.2,-6.9,7.2,2.2,9.8,51.828235,-9.961765,9.961765,3.44428
1976,Ford,Carter,48.95,5.4,5.6,7.0,12.5,4.8,4.3,19.1,9.8,5.7,2.4,6.1,10.8,22.1,4.4,5.1,1.1,19.6,22.6,6.9,0.5,0.1,-0.6,1.7,0.8,11.2,51.828235,2.878235,2.878235,3.1607
1980,Carter,Reagan,44.7,-0.3,-0.3,-2.5,-8.0,-0.2,1.6,-10.1,-5.9,0.0,5.9,-4.4,5.0,-20.9,10.8,12.3,4.2,-6.7,-7.4,-2.2,1.8,4.2,3.6,5.4,-0.2,8.8,51.828235,7.128235,7.128235,0.66279
1984,Reagan,Mondale,59.17,7.2,5.3,7.2,14.3,4.1,3.8,27.3,16.2,16.7,13.9,19.4,13.7,14.8,8.2,7.1,11.8,24.3,24.2,25.1,3.5,3.2,5.0,-1.4,3.9,11.1,51.828235,-7.341765,7.341765,4.71146
1988,"Bush, Sr.",Dukakis,53.94,4.2,4.2,3.7,5.7,2.6,4.5,2.5,3.3,5.0,0.7,6.6,7.1,-0.9,16.2,17.8,11.9,3.9,4.1,3.4,1.2,-1.5,-0.5,-4.3,3.8,7.9,51.828235,-2.111765,2.111765,3.19339


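For the 3D plane of best fit, plus the stretch goal's residual lines, a sketch using matplotlib's 3D axes. matplotlib is an assumption (the notebook does not import it yet), as are the 20x20 grid and the blue/red coloring for points above/below the plane. It again uses `Gross domestic product` and `pct_change` from the ten rows printed in the table above. Each residual is drawn as a vertical segment from the observed vote share down (or up) to the fitted plane, in the spirit of the ISLR figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Rows 1952-1988 from the printed table: GDP growth, payroll growth, vote share.
gdp = np.array([4.1, 2.1, 2.6, 5.8, 4.9, 5.3, 5.4, -0.3, 7.2, 4.2])
payroll = np.array([2.0384, 3.41271, 1.72702, 2.86729, 3.17166,
                    3.44428, 3.1607, 0.66279, 4.71146, 3.19339])
vote = np.array([44.6, 57.76, 49.91, 61.34, 49.6,
                 61.79, 48.95, 44.7, 59.17, 53.94])

X = np.column_stack([gdp, payroll])
model = LinearRegression().fit(X, vote)
fitted = model.predict(X)

# Evaluate the fitted plane on a grid spanning the range of both features.
g, p = np.meshgrid(np.linspace(gdp.min(), gdp.max(), 20),
                   np.linspace(payroll.min(), payroll.max(), 20))
plane = model.intercept_ + model.coef_[0] * g + model.coef_[1] * p

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection="3d")
ax.plot_surface(g, p, plane, alpha=0.3)
ax.scatter(gdp, payroll, vote, color="black")

# Residual lines: a vertical segment from each observation to the plane.
for xi, yi, zi, zhat in zip(gdp, payroll, vote, fitted):
    color = "tab:blue" if zi >= zhat else "tab:red"
    ax.plot([xi, xi], [yi, yi], [zi, zhat], color=color, linewidth=1)

ax.set_xlabel("Gross domestic product")
ax.set_ylabel("pct_change (non-farm payroll)")
ax.set_zlabel("Incumbent Party Vote Share")
plt.show()
```

Drawing the segments from `model.predict(X)` rather than from the grid keeps each line ending exactly on the plane, whatever grid resolution is chosen.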
In [11]:
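# Hold out the most recent election. Note: df is overwritten, so re-running
# this cell drops one more row each time.
one_off = df.iloc[-1]
df = df.drop(df.tail(1).index)
one_off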

Incumbent Party Candidate                                      Obama
Other Candidate                                               Romney
Incumbent Party Vote Share                                        52
Gross domestic product                                           2.2
Personal consumption expenditures                                1.5
Goods                                                            2.1
Durable goods                                                      6
Nondurable goods                                                 0.4
Services                                                         1.2
Gross private domestic investment                                 11
Fixed investment                                                  10
Nonresidential                                                   9.5
Structures                                                        13
Equipment                                                         11
Intellectual property products    