# US Election 2012 Polls Dataset

1. Who was being polled and what was their party affiliation?
2. Did the poll results favor Romney or Obama?
3. How do undecided voters affect the poll?
4. How did voter sentiment change over time?
5. Can we see an effect in the polls from the debates?


In [1]:
# __future__ imports must be the first statement in the cell (a no-op on Python 3)
from __future__ import division

import pandas as pd
from pandas import Series, DataFrame
import numpy as np

# For visualization
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
plt.style.use("ggplot")
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
from datetime import datetime

In [2]:
import requests

In [3]:
from io import StringIO

In [4]:
url = "http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv"

In [None]:
source = requests.get(url).text

In [None]:
# Use StringIO to avoid an IO error with pandas
poll_data = StringIO(source) 

In [None]:
poll_df = pd.read_csv(poll_data)
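
As a minimal sketch of why the `StringIO` step is there: `pd.read_csv` expects a path or a file-like object, and `StringIO` turns text that is already in memory into such an object. The CSV text and the names `sample_text` and `sample_df` below are made up for illustration and are not part of the real dataset.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row CSV standing in for the downloaded text
sample_text = "Pollster,Obama,Romney\nPoll A,47,45\nPoll B,49,48\n"

# StringIO wraps the string in a file-like object that read_csv can consume
sample_df = pd.read_csv(StringIO(sample_text))

print(sample_df.shape)
```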

In [None]:
poll_df.info()

In [None]:
poll_df.head()

In [None]:
poll_df.shape

In [None]:
# Checking Null Values
poll_df.isnull().sum()

In [None]:
poll_df.Obama.unique()

In [None]:
poll_df.Romney.unique()

In [None]:
poll_df.Population.unique()

In [None]:
poll_df.Undecided.unique()

In [None]:
poll_df.Other.unique()

In [None]:
poll_df.Partisan.unique()

In [None]:
poll_df.Affiliation.unique()

Let's delete the Question Text Column.

In [None]:
del poll_df['Question Text']

#### Exploratory Data Analysis

In [None]:
plt.figure(figsize=(9,9))
poll_df['Affiliation'].value_counts(normalize=True).plot(kind = 'bar')
plt.title('Affiliation distribution', fontsize=20)
plt.xlabel('Affiliation',fontsize=15)
plt.ylabel('Proportion',fontsize=15)
plt.show()

### Looks like most polls are relatively neutral overall, but the affiliated ones lean towards the Democratic side.

In [None]:
plt.figure(figsize=(9,9))
sns.countplot(x = "Affiliation", hue = "Population", data = poll_df)

In [None]:
# Looks like we have a strong showing of likely voters and registered voters,
# so the poll data should hopefully be a good reflection of the populations polled.

Let's go ahead and take a look at the averages for Obama, Romney, and the polled people who remained undecided.
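
A small sketch of what "taking the averages" amounts to, using made-up numbers rather than the real poll data. The frame `toy_df` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical poll results (percentages), not taken from the real dataset
toy_df = pd.DataFrame({
    "Obama": [46.0, 48.0, 47.0],
    "Romney": [44.0, 45.0, 46.0],
    "Undecided": [6.0, 4.0, 5.0],
})

# Column-wise mean and sample standard deviation (ddof=1, the pandas default)
avg = toy_df.mean()
std = toy_df.std()

print(avg)
print(std)
```

The standard deviation gives a rough sense of how much individual polls scatter around each average.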

In [None]:
poll_df.head()

In [None]:
stats_var=["Obama","Romney","Undecided"]

In [None]:
poll_df[stats_var].describe()

In [None]:
plt.figure(figsize=(9,9))
sns.histplot(poll_df["Obama"], kde=True)  # distplot is deprecated in recent seaborn releases
plt.xlabel("Obama Supporters Percentage")
plt.ylabel("Frequency")
plt.show()

In [None]:
plt.figure(figsize=(9,9))
sns.histplot(poll_df["Romney"], kde=True)  # distplot is deprecated in recent seaborn releases
plt.xlabel("Romney Supporters Percentage")
plt.ylabel("Frequency")
plt.show()

Let's do a quick time series analysis of voter sentiment by plotting the support for Obama and Romney against the poll end dates.

In [None]:
poll_df.plot(x='End Date',y=['Obama','Romney','Undecided'],marker='o',linestyle='', figsize=(9,9))

In [None]:
poll_df['Difference'] = (poll_df.Obama - poll_df.Romney)/100

poll_df.head()

The Difference column is Obama minus Romney, thus a positive difference indicates a leaning towards Obama in the polls.
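
To make the sign convention concrete, here is a minimal sketch on made-up numbers. The frame `toy_df` and the helper column `Obama_leads` are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical polls (percentages), not taken from the real dataset
toy_df = pd.DataFrame({
    "Obama": [48.0, 46.0, 45.0],
    "Romney": [45.0, 47.0, 47.0],
})

# Same formula as above: Obama minus Romney, expressed as a fraction
toy_df["Difference"] = (toy_df.Obama - toy_df.Romney) / 100

# Positive difference means the poll leans towards Obama
toy_df["Obama_leads"] = toy_df["Difference"] > 0

print(toy_df)
```

Only the first toy poll has a positive difference, so it is the only one flagged as leaning towards Obama.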

In [None]:
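# Average the numeric columns per start date; numeric_only avoids errors on text columns in recent pandas
poll_df = poll_df.groupby(['Start Date'],as_index=False).mean(numeric_only=True)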

In [None]:
poll_df.head()

In [None]:
poll_df.shape

In [None]:
poll_df.plot('Start Date','Difference',figsize=(12,4),marker='o',linestyle='-',color='purple')

The debate dates were Oct 3rd, Oct 11th, and Oct 22nd (2012). Let's plot some lines as markers and then zoom in on the month of October. To set the x limits for the figure, we need to find the row indices that correspond to October 2012.
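
One possible way to find those row positions without an explicit loop is sketched below on a made-up list of dates. It assumes the dates are stored as `YYYY-MM-DD` strings; the names `toy_dates`, `october_rows`, and `debate_row` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical start dates stored as strings, as in the CSV
toy_dates = pd.Series(
    ["2012-09-29", "2012-09-30", "2012-10-01",
     "2012-10-03", "2012-10-31", "2012-11-01"],
    name="Start Date",
)

# Boolean mask for October 2012, then the integer positions where it is True
is_october = toy_dates.str.startswith("2012-10").to_numpy()
october_rows = np.flatnonzero(is_october)

# First and last positions give the x-axis limits for a zoomed plot
x_min, x_max = int(october_rows.min()), int(october_rows.max())

# Position of one specific date, e.g. to place a vertical marker line
debate_row = int(np.flatnonzero((toy_dates == "2012-10-03").to_numpy())[0])

print(x_min, x_max, debate_row)
```

Looking up a marker position by date, rather than adding a fixed offset, keeps the marker on the right row even when some days have no polls.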


In [None]:
# Collect the row positions of every start date in October 2012
xlimit = []

# enumerate tracks the row position, so no manual counter is needed
for row_in, date in enumerate(poll_df['Start Date']):
    if date[0:7] == '2012-10':
        xlimit.append(row_in)

# The first and last positions give the x limits for the zoomed plot
print(min(xlimit))
print(max(xlimit))

In [None]:
# Start with original figure
fig = poll_df.plot('Start Date','Difference',figsize=(12,4),marker='o',linestyle='-',color='purple',xlim=(325,352))

# Now add the debate markers
plt.axvline(x=325+2, linewidth=4, color='grey')
plt.axvline(x=325+8, linewidth=4, color='grey')
plt.axvline(x=325+18, linewidth=4, color='grey')

### These polls reflect a dip for Obama for a while after the second debate against Romney, although the first and third ones gave him a boost.