## This notebook demonstrates a few pandas features and shows a simple visualization.

Try creating your own notebook with variations of the features demonstrated here.

First, load the NumPy and pandas libraries.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


# How Many People Approve of Donald Trump?

## Why this data
This data interests me because I have always wondered whether Americans, when polled, actually say they approve or disapprove of Donald Trump, and if so, how many. The result of the 2016 election was astonishing, and so was the result of 2020: although Joe Biden won, the race was still very close.

### Goal:
My plan is to generate a graph that shows, in chronological order from 2017 to 2020, the percentage of people who approve or disapprove of Donald Trump. I will use the `adjusted_approve` and `adjusted_disapprove` columns because they are expected to be more accurate than the raw numbers. They are weighted according to FiveThirtyEight's pollster ratings, which are based on each pollster's historical accuracy in forecasting elections since 1998.

[Data Source](https://projects.fivethirtyeight.com/trump-approval-ratings/)
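The notebook does not compute these adjustments itself. It relies on FiveThirtyEight's columns. To show the general idea of weighting polls by pollster quality, here is a minimal sketch of a weighted mean. The approval numbers and weights are made-up placeholders, and this is not FiveThirtyEight's actual adjustment model.

```python
import numpy as np

# Hypothetical approval readings (percent) from three polls
approve = np.array([40.0, 44.0, 38.0])
# Hypothetical quality weights: a larger weight means a more trusted pollster
weights = np.array([1.0, 0.5, 2.0])

# Weighted mean = sum(w_i * x_i) / sum(w_i)
weighted = np.average(approve, weights=weights)
unweighted = approve.mean()

print(round(weighted, 2))    # 39.43
print(round(unweighted, 2))  # 40.67
```

The most heavily weighted poll (38.0) pulls the weighted mean below the plain average.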

In [None]:
# Read csv file and create a dataframe
trump = pd.read_csv('../input/trumpapprovalpoll/approval_polllist.csv')
trump.head()
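`pd.read_csv` can also select columns and parse dates while it loads the file. The sketch below uses a tiny in-memory stand-in for `approval_polllist.csv`. The rows are invented placeholders, and `other_column` is a hypothetical extra column that gets dropped.

```python
import io

import pandas as pd

# Invented stand-in for the real CSV; values are placeholders, not poll results
csv_text = """enddate,pollster,other_column,adjusted_approve,adjusted_disapprove
1/22/2017,Gallup,0,45.0,45.0
1/22/2017,Other Pollster,0,46.0,40.0
1/29/2017,Gallup,0,43.0,50.0
"""

# usecols keeps only the listed columns; parse_dates converts enddate to datetimes on load
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['enddate', 'pollster', 'adjusted_approve', 'adjusted_disapprove'],
    parse_dates=['enddate'],
)
print(df.dtypes)
```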

In [None]:
# Preview only the columns we need
trump[['enddate', 'pollster', 'adjusted_approve', 'adjusted_disapprove']].head()

In [None]:
#Create a new dataframe with only the relevant data
trumpsum = trump[['enddate', 'pollster', 'adjusted_approve', 'adjusted_disapprove']]
trumpsum.head()
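The double brackets above matter. A list of column names returns a DataFrame, while a single name returns a Series. Here is a small illustration on a one-row placeholder frame, where `other_column` is hypothetical:

```python
import pandas as pd

# One-row placeholder frame; 'other_column' is hypothetical
df = pd.DataFrame({'enddate': ['1/22/2017'], 'pollster': ['Gallup'], 'other_column': [0]})

subset = df[['enddate', 'pollster']]  # list of names -> DataFrame
single = df['pollster']               # single name -> Series

print(type(subset).__name__, type(single).__name__)  # DataFrame Series
```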

In [None]:
# Keep only Gallup polls, so that polls from different pollsters do not produce several values for the same date
gallup = trumpsum[trumpsum['pollster'] == 'Gallup']
gallup.tail()
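The filter above works in two steps. The comparison builds a boolean Series, and indexing with that Series keeps only the `True` rows. Below is a small sketch on invented placeholder rows. It also shows `DataFrame.query` as an equivalent spelling.

```python
import pandas as pd

# Invented placeholder rows standing in for trumpsum
polls = pd.DataFrame({
    'enddate': ['1/22/2017', '1/22/2017', '1/29/2017'],
    'pollster': ['Gallup', 'Other Pollster', 'Gallup'],
    'adjusted_approve': [45.0, 46.0, 43.0],
    'adjusted_disapprove': [45.0, 40.0, 50.0],
})

# Step 1: element-wise comparison -> boolean Series
mask = polls['pollster'] == 'Gallup'
# Step 2: boolean indexing keeps the True rows; .copy() makes it independent of polls
gallup_only = polls[mask].copy()

# The same filter written as a query string
same = polls.query("pollster == 'Gallup'")

print(mask.tolist())               # [True, False, True]
print(gallup_only.index.tolist())  # [0, 2]
```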

In [None]:
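# Drop rows that repeat an end date, keeping the first row for each date.
# .copy() makes newgallup an independent DataFrame, so assigning to its
# columns later does not trigger a SettingWithCopyWarning.
newgallup = gallup.drop_duplicates(subset='enddate').copy()

newgallup.head()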
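When an end date appears more than once, the `keep` argument of `drop_duplicates` decides which row survives. The default is `'first'`. The sketch below uses placeholder rows to show the three options.

```python
import pandas as pd

# Placeholder rows: two entries share the same end date
rows = pd.DataFrame({
    'enddate': ['1/22/2017', '1/22/2017', '1/29/2017'],
    'adjusted_approve': [45.0, 44.0, 43.0],
})

# keep='first' (default): the first row for each enddate survives
first = rows.drop_duplicates(subset='enddate')
# keep='last': the last row for each enddate survives
last = rows.drop_duplicates(subset='enddate', keep='last')
# keep=False: every row whose enddate is duplicated is dropped
dropped_all = rows.drop_duplicates(subset='enddate', keep=False)

print(first['adjusted_approve'].tolist())        # [45.0, 43.0]
print(last['adjusted_approve'].tolist())         # [44.0, 43.0]
print(dropped_all['adjusted_approve'].tolist())  # [43.0]
```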

In [None]:
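# Convert the 'enddate' column to datetime objects
newgallup['enddate'] = pd.to_datetime(newgallup['enddate'])

# Sort by 'enddate' in ascending order; sort_values returns a new DataFrame,
# so assign the result back (otherwise the line plot connects points out of order)
newgallup = newgallup.sort_values(by='enddate')
newgallup.head()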
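Two details of that cell deserve a closer look. First, converting to datetime before sorting is necessary because date strings sort character by character rather than chronologically. Second, `sort_values` leaves the original object unchanged unless you assign the result. The sketch below uses placeholder dates in month/day/year form and assumes that format for the parse.

```python
import pandas as pd

# Placeholder dates, already in chronological order, in M/D/YYYY form
dates = pd.Series(['3/15/2017', '12/1/2017', '1/2/2018'])

# Sorting raw strings compares characters, so '1/2/2018' comes first
print(dates.sort_values().tolist())  # ['1/2/2018', '12/1/2017', '3/15/2017']

# Parsing first (assuming month/day/year) gives true chronological order
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
print(parsed.sort_values().dt.strftime('%Y-%m-%d').tolist())
# ['2017-03-15', '2017-12-01', '2018-01-02']

# sort_values returned new Series; the original is untouched
print(dates.tolist())  # ['3/15/2017', '12/1/2017', '1/2/2018']
```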

In [None]:
import plotly.graph_objects as go

# Create an empty figure, then add one line trace per series
fig = go.Figure()
fig.add_trace(go.Scatter(x=newgallup['enddate'], y=newgallup['adjusted_approve'],
                         mode='lines', name='approve'))
fig.add_trace(go.Scatter(x=newgallup['enddate'], y=newgallup['adjusted_disapprove'],
                         mode='lines', name='disapprove'))

# Add a title and axis labels so the chart reads on its own
fig.update_layout(title='Gallup: adjusted approval and disapproval of Donald Trump',
                  xaxis_title='Poll end date', yaxis_title='Percent')
fig.show()