# Presidential Approval Ratings

* I want to experiment with slope graphs
* Approval ratings at the start and end of each president's term seem like a good place to start

## Source
* [The American Presidency Project at UCSB](http://www.presidency.ucsb.edu/data.php) by John Woolley and Gerhard Peters for the approval rating data
    * Data is originally from Gallup Poll, compiled by Gerhard Peters
    * I went through the individual tables and aggregated them
* Wikipedia article on [List of Presidents of the United States](https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States#Presidents) for term dates

In [1]:
import math
import codecs
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Read the tab-separated text file (UTF-8 encoded)
df_ratings = pd.read_csv('president_approval_ratings.txt', sep='\t', encoding='utf-8')
df_ratings.head(5)

Unnamed: 0,President,Start Date,End Date,% Approval,% Disapproval,% No opinion/data
0,Harry S. Truman,5/29/1945,5/29/1945,86,2,10
1,Harry S. Truman,8/22/1945,8/22/1945,91,2,5
2,Harry S. Truman,10/3/1945,10/3/1945,80,8,10
3,Harry S. Truman,10/31/1945,10/31/1945,75,13,11
4,Harry S. Truman,1/3/1946,1/3/1946,63,22,14


* Great, but the column names need to be changed to make indexing easy
* Also probably a good idea to get those dates into datetime format
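
As a small aside on the date conversion, here is a minimal sketch that parses a few of the `M/D/YYYY` strings shown above with an explicit format. Passing `format='%m/%d/%Y'` is my own assumption about how the file is laid out; it simply rules out a day-first reading of ambiguous dates. The names `raw` and `parsed` are illustrative only.

```python
import pandas as pd

# Three poll dates copied from the output above
raw = pd.Series(['5/29/1945', '8/22/1945', '10/3/1945'])

# An explicit format (assumed here) avoids guessing month/day order
parsed = pd.to_datetime(raw, format='%m/%d/%Y')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
```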

In [3]:
df_ratings.columns = ['pres',
                      'poll_start',
                      'poll_end',
                      'pct_app',
                      'pct_dis',
                      'pct_none']

# Convert M/D/YYYY to pandas datetime YYYY-MM-DD
df_ratings.poll_start = pd.to_datetime(df_ratings.poll_start)
df_ratings.poll_end = pd.to_datetime(df_ratings.poll_end)

df_ratings.head(5)

Unnamed: 0,pres,poll_start,poll_end,pct_app,pct_dis,pct_none
0,Harry S. Truman,1945-05-29,1945-05-29,86,2,10
1,Harry S. Truman,1945-08-22,1945-08-22,91,2,5
2,Harry S. Truman,1945-10-03,1945-10-03,80,8,10
3,Harry S. Truman,1945-10-31,1945-10-31,75,13,11
4,Harry S. Truman,1946-01-03,1946-01-03,63,22,14


* Now I want to find the list of Presidents that are represented here

In [4]:
print('\n'.join(df_ratings.pres.unique()))

Harry S. Truman
Dwight D. Eisenhower
John F. Kennedy
Lyndon B. Johnson
Richard Nixon
Gerald R. Ford
Jimmy Carter
Ronald Reagan
George Bush
William J. Clinton
George W. Bush
Barack Obama
Donald J. Trump


* How many presidents is that?

In [5]:
len(df_ratings.pres.unique())

13

* I would like to draw a slope graph of the presidents' approval ratings at the start and end of their terms, so I need a list of those term dates.
* The term dates could also be used to calculate monthly ratings, or ratings at intervals through each term.
* Thanks to [the List of US Presidents Wikipedia article](https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States#Presidents) I made a tab-separated text file of the term dates
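
One possible way to get the two endpoints for a slope graph is to take the first and last poll per president. The sketch below uses only the five Truman rows shown earlier, so its "end" value is just the last of those sample rows, not his final rating. The function name `first_last_approval` and the columns `app_start`, `app_end` and `change` are hypothetical names of mine, and the approach assumes each name in `pres` covers one continuous stretch in office.

```python
import pandas as pd

# Five rows copied from the Truman output above (sample only)
polls = pd.DataFrame({
    'pres': ['Harry S. Truman'] * 5,
    'poll_start': pd.to_datetime(['1945-05-29', '1945-08-22', '1945-10-03',
                                  '1945-10-31', '1946-01-03']),
    'pct_app': [86, 91, 80, 75, 63],
})


def first_last_approval(df):
    """Return the earliest and latest approval rating for each president."""
    ordered = df.sort_values('poll_start')
    summary = ordered.groupby('pres', sort=False)['pct_app'].agg(['first', 'last'])
    summary.columns = ['app_start', 'app_end']
    # Change in approval between the two endpoints (the slope of the line)
    summary['change'] = summary['app_end'] - summary['app_start']
    return summary


endpoints = first_last_approval(polls)
print(endpoints)
```

Each row of the result holds the two values a slope graph would connect with a single line.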

In [6]:
df_terms = pd.read_csv('president_terms.txt', sep='\t', encoding='utf-8')
df_terms.columns = ['pres', 'term_start', 'term_end']

# Convert M/D/YYYY to pandas datetime YYYY-MM-DD
df_terms.term_start = pd.to_datetime(df_terms.term_start)
df_terms.term_end = pd.to_datetime(df_terms.term_end)
df_terms

Unnamed: 0,pres,term_start,term_end
0,Harry S. Truman,1945-04-12,1953-01-20
1,Dwight D. Eisenhower,1953-01-20,1961-01-20
2,John F. Kennedy,1961-01-20,1963-11-22
3,Lyndon B. Johnson,1963-11-22,1969-01-20
4,Richard Nixon,1969-01-20,1974-08-09
5,Gerald R. Ford,1974-08-09,1977-01-20
6,Jimmy Carter,1977-01-20,1981-01-20
7,Ronald Reagan,1981-01-20,1989-01-20
8,George Bush,1989-01-20,1993-01-20
9,William J. Clinton,1993-01-20,2001-01-20


In [7]:
# Length of each term in whole days
df_terms['days_total'] = (df_terms.term_end - df_terms.term_start).dt.days
df_terms

Unnamed: 0,pres,term_start,term_end,days_total
0,Harry S. Truman,1945-04-12,1953-01-20,2840
1,Dwight D. Eisenhower,1953-01-20,1961-01-20,2922
2,John F. Kennedy,1961-01-20,1963-11-22,1036
3,Lyndon B. Johnson,1963-11-22,1969-01-20,1886
4,Richard Nixon,1969-01-20,1974-08-09,2027
5,Gerald R. Ford,1974-08-09,1977-01-20,895
6,Jimmy Carter,1977-01-20,1981-01-20,1461
7,Ronald Reagan,1981-01-20,1989-01-20,2922
8,George Bush,1989-01-20,1993-01-20,1461
9,William J. Clinton,1993-01-20,2001-01-20,2922


In [8]:
# Print the day offsets at 0%, 20%, ..., 100% of each term
for i in df_terms.days_total:
    for j in np.linspace(0, 1, 6):
        print(int(i * j))

0
568
1136
1704
2272
2840
0
584
1168
1753
2337
2922
0
207
414
621
828
1036
0
377
754
1131
1508
1886
0
405
810
1216
1621
2027
0
179
358
537
716
895
0
292
584
876
1168
1461
0
584
1168
1753
2337
2922
0
292
584
876
1168
1461
0
584
1168
1753
2337
2922
0
584
1168
1753
2337
2922
0
584
1168
1753
2337
2922
0
58
117
176
235
294


In [9]:
# Date 25% of the way through each president's term
df_terms['term_25'] = df_terms['term_start'] + pd.to_timedelta((df_terms['days_total'] * 0.25).astype(int), unit='D')
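
The same idea could be extended to several points in a term at once. The following is a rough sketch, not a finished step of this notebook: it rebuilds two of the term rows shown above and adds one date column per fraction, using the same `np.linspace(0, 1, 6)` fractions as the loop earlier. The helper `add_term_fractions` and the `term_<percent>` column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Two term rows copied from the table above
terms = pd.DataFrame({
    'pres': ['Harry S. Truman', 'John F. Kennedy'],
    'term_start': pd.to_datetime(['1945-04-12', '1961-01-20']),
    'term_end': pd.to_datetime(['1953-01-20', '1963-11-22']),
})
terms['days_total'] = (terms['term_end'] - terms['term_start']).dt.days


def add_term_fractions(df, fractions):
    """Add a date column for each fraction of the way through a term."""
    out = df.copy()
    for frac in fractions:
        # Whole days elapsed at this fraction of the term (truncated)
        offset_days = (out['days_total'] * frac).astype(int)
        col = 'term_%d' % round(frac * 100)
        out[col] = out['term_start'] + pd.to_timedelta(offset_days, unit='D')
    return out


terms_frac = add_term_fractions(terms, np.linspace(0, 1, 6))
print(terms_frac.filter(like='term_'))
```

With these fractions, `term_0` coincides with `term_start` and `term_100` with `term_end`, which gives a quick sanity check on the offsets.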
