## Visualizations - Practice Challenge

Is Charles Minard's chart of [Napoleon's retreat from Moscow](http://www.masswerk.at/minard/) the best infographic ever? You're going to practice using a number of visualization libraries. Make sure you've installed [Seaborn](https://seaborn.pydata.org/), [ggplot](http://ggplot.yhathq.com/), [Folium](https://folium.readthedocs.io/en/latest/) and even the dreaded [matplotlib](https://matplotlib.org/)!

In [None]:
from datetime import datetime
import re

from bs4 import BeautifulSoup
import folium
from ggplot import *
import matplotlib.dates as md
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import seaborn as sns

First we'll get some of the General Election data we've played with before back into Pandas:

In [None]:
# pd.read_json can fetch a URL directly; passing it literal JSON text is deprecated in recent pandas.
ge_2010_url = 'https://raw.githubusercontent.com/augeas/undergrad-python-exercises/master/jsondata/ge_2010.json'
ge_2010 = pd.read_json(ge_2010_url)

ge_2015_url = 'https://raw.githubusercontent.com/augeas/undergrad-python-exercises/master/jsondata/ge_2015.json'
ge_2015 = pd.read_json(ge_2015_url)

ge_2017_url = 'https://raw.githubusercontent.com/augeas/undergrad-python-exercises/master/jsondata/ge_2017.json'
ge_2017 = pd.read_json(ge_2017_url)

ge_all_url = 'https://raw.githubusercontent.com/augeas/undergrad-python-exercises/master/jsondata/pandas_ge_data.json'
ge_all = pd.read_json(ge_all_url)

Here we build a dictionary that maps party abbreviations to full party names. We don't use the 2015 data for this, as it was a bit odd.

In [None]:
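def party_abbrev(df):
    # In a held seat, the winner's abbreviated party and the incumbent's full party name are the same party.
    holds = df['outcome'].str.lower() == 'seat held'
    # Keep each (abbreviation, full name) pair once, so the two columns stay aligned row by row.
    pairs = df.loc[holds, ['member_party', 'incumbent_party']].drop_duplicates()
    return dict(zip(pairs['member_party'], pairs['incumbent_party']))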

p_2010 = party_abbrev(ge_2010)
p_2017 = party_abbrev(ge_2017)

parties = p_2010.copy()
parties.update(p_2017)

parties

This function returns a dictionary whose keys are party abbreviations and whose values are the net change in seats for each party; parties whose seat total didn't change are left out. It takes one dataframe: *ge_2010*, *ge_2015* or *ge_2017*.

In [None]:
def net_seats(df):
    # Gains are labelled either 'seat gain' or 'gain' in the outcome column.
    outcome = df['outcome'].str.lower()
    gains = (outcome == 'seat gain') | (outcome == 'gain')
    # Seats won, counted by the winning party's abbreviation.
    agg_gains = df.loc[gains, 'member_party'].value_counts()
    # Seats lost, counted by the losing party's full name.
    agg_losses = df.loc[gains, 'incumbent_party'].value_counts()
    changes = {}
    for k in parties.keys():
        # Series.get falls back to 0 for parties with no gains or no losses.
        gain = agg_gains.get(k, 0)
        loss = agg_losses.get(parties[k], 0)
        change = gain - loss
        if change:
            changes[k] = change
    # Merge the Labour and Labour Co-op parties (either may be absent).
    labour = changes.pop('L', 0) + changes.pop('L Co-op', 0)
    if labour:
        changes['L'] = labour
    return changes
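To see what the counting inside `net_seats` does, here's a minimal sketch on a few made-up rows (not real results). A gain is credited to the winner's abbreviation and debited from the incumbent's full party name, which is why the function needs the `parties` dictionary to line the two up. The abbreviations and the mapping below are hypothetical.

```python
import pandas as pd

# Hypothetical abbreviation -> full-name mapping, standing in for `parties`.
toy_parties = {'A': 'Alpha Party', 'B': 'Beta Party', 'C': 'Gamma Party'}

# Made-up results: only rows whose outcome is a gain change the seat totals.
toy_results = pd.DataFrame({
    'outcome': ['Seat gain', 'gain', 'Seat held', 'Seat gain'],
    'member_party': ['A', 'A', 'B', 'B'],
    'incumbent_party': ['Beta Party', 'Gamma Party', 'Beta Party', 'Alpha Party'],
})

toy_outcome = toy_results['outcome'].str.lower()
toy_gain_rows = toy_results[(toy_outcome == 'seat gain') | (toy_outcome == 'gain')]
won = toy_gain_rows['member_party'].value_counts()       # indexed by abbreviation
lost = toy_gain_rows['incumbent_party'].value_counts()   # indexed by full name

# Net change per party: seats won minus seats lost, defaulting to 0.
net = {abbrev: int(won.get(abbrev, 0) - lost.get(full, 0))
       for abbrev, full in toy_parties.items()}
print(net)
```

Every seat that changes hands adds one to a winner and takes one from an incumbent, so, as long as every winning and losing party appears in the mapping, the net changes sum to zero. That makes a handy sanity check on the real output.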

## Part 1: A Simple Bar Chart

Take a look at this [Seaborn bar-chart example](https://seaborn.pydata.org/examples/color_palettes.html). Plot the net change in seats for each party in the 2017 election as a bar chart. The party abbreviations should be on the x-axis. You'll need to create a list for the y-axis data. 

In [None]:
%matplotlib inline
net_gains_2017 = net_seats(ge_2017)
party_list = list(net_gains_2017.keys())

# Your code here:
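The y-axis list just has to follow the same order as the x-axis labels. A minimal sketch with a made-up dictionary standing in for `net_gains_2017` (the numbers are not real results); sorting by change first is an optional choice that makes the bars easier to read. Two lists built this way could then be passed as the `x` and `y` arguments of `sns.barplot`.

```python
# Made-up net changes standing in for net_gains_2017.
toy_gains = {'A': 12, 'B': -5, 'C': 3}

# Optionally sort parties from biggest gain to biggest loss.
toy_party_list = sorted(toy_gains, key=toy_gains.get, reverse=True)
# Build the y-axis values in exactly the same order as the x-axis labels.
toy_changes = [toy_gains[p] for p in toy_party_list]

print(toy_party_list, toy_changes)
```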

Yes, it's the Titanic data set again.

In [None]:
titanic_url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
titanic = pd.read_csv(titanic_url)

## Part 2: Faceted Histograms

Here's a slightly dull use of Seaborn's [faceted histograms](https://seaborn.pydata.org/examples/faceted_histogram.html): four histograms of the Titanic passengers' fares, split by whether they were male or female and whether they survived. Produce a set of six faceted histograms for the passengers who survived. The three passenger classes should vary across the columns (the x-axis of the grid), with the passengers' gender varying down the rows (the y-axis).

In [None]:
%matplotlib inline
fares = sns.FacetGrid(titanic, row="Survived", col="Sex", margin_titles=True)
bins = np.linspace(0, 200, 10)
fares.map(plt.hist, "Fare", bins=bins, lw=0)
survivors = titanic['Survived'] == 1

In [None]:
%matplotlib inline
bins = np.linspace(0, 80, 10)

# Your code here:
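A `FacetGrid` whose rows are set by one column and whose columns are set by another draws one panel per combination, so gender (2 values) by class (3 values) gives the six panels asked for. A quick pandas check of that count on a few made-up rows that use the Titanic column names (the values are illustrative only):

```python
import pandas as pd

# Illustrative rows using the Titanic dataset's column names.
toy_passengers = pd.DataFrame({
    'Survived': [1, 1, 1, 1, 1, 1, 0],
    'Sex': ['male', 'female', 'male', 'female', 'male', 'female', 'male'],
    'Pclass': [1, 1, 2, 2, 3, 3, 3],
    'Fare': [71.3, 53.1, 13.0, 26.0, 7.9, 8.1, 7.2],
})

# Keep only survivors with a boolean mask, like `survivors` above.
toy_survivors = toy_passengers[toy_passengers['Survived'] == 1]
# One facet per (Sex, Pclass) pair: rows vary by gender, columns by class.
panels = toy_survivors.groupby(['Sex', 'Pclass']).size()
print(len(panels))
```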

For this next bit, we'll use the [Wayback Machine API](https://archive.org/help/wayback_api.php) to grab data from the [2016 Jill Stein recount crowd-funder](https://www.jill2016.com/recount). The next two cells will take a while to run.

In [None]:
rec_req = requests.get("https://web.archive.org/cdx/search/cdx?url=jillstein.nationbuilder.com/recount&output=json")
timestamps = [snapshot[1] for snapshot in rec_req.json()[1:]]
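With `output=json`, the CDX server returns a list of rows whose first row holds the field names, which is why the cell above skips it with `[1:]` and takes index 1 from each remaining row. A minimal sketch on a hand-written response in that shape (the rows are illustrative, not real captures), looking the timestamp column up by name instead of by position:

```python
import json

# Illustrative CDX-style JSON: a header row of field names, then one row per capture.
sample = json.dumps([
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,nationbuilder,jillstein)/recount", "20161124013500",
     "https://jillstein.nationbuilder.com/recount", "text/html", "200", "DIGEST1", "1234"],
    ["com,nationbuilder,jillstein)/recount", "20161125120000",
     "https://jillstein.nationbuilder.com/recount", "text/html", "200", "DIGEST2", "1250"],
])

rows = json.loads(sample)
header, captures = rows[0], rows[1:]
# Find the timestamp column by name rather than assuming it is index 1.
ts_col = header.index("timestamp")
cdx_timestamps = [row[ts_col] for row in captures]
print(cdx_timestamps)
```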

In [None]:
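def parse_val(soup, goal=False):
    # https://twitter.com/thepracticaldev/status/710156980535558144
    try:
        # Indexing the tuple with a bool picks 'bar-goal' when goal is True, else 'bar-text'.
        txt = soup.find(class_=('bar-text', 'bar-goal')[goal]).text
        # Take the first number in the text, drop thousands separators, convert to millions.
        return float(re.search(r'[0-9.,]+', txt).group().replace(',', '')) / 1.0E6
    except (AttributeError, ValueError):
        # No matching element, or no parsable number, in this snapshot.
        return False

def get_vals(timestamp):
    # Fetch the archived copy of the recount page as it was at this timestamp.
    req = requests.get("http://web.archive.org/web/" + timestamp + "/https://jillstein.nationbuilder.com/recount")
    soup = BeautifulSoup(req.content, 'html.parser')
    return (parse_val(soup), parse_val(soup, goal=True))
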
rec_data = [get_vals(t) for t in timestamps if t[0:4] == '2016']

"rec_data" is a list of tuples: the first item is the amount raised in millions of dollars, the second is the goal amount. Either is `False` if it couldn't be parsed from that snapshot.

In [None]:
rec_data[0:5]
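The heart of `parse_val` is turning a label such as "$6,245,113 raised" into millions. A minimal, standard-library-only sketch of that step on made-up strings (the exact wording of the archived page's text is an assumption here). It uses a slightly stricter pattern than `parse_val`, requiring the match to start with a digit so that stray dots or commas alone can't be picked up:

```python
import re

def to_millions(txt):
    # First number in the text: digits with optional commas and an optional decimal part.
    match = re.search(r'\d[\d,]*(?:\.\d+)?', txt)
    if match is None:
        return False  # same "failed" marker as parse_val
    # Drop thousands separators before converting, then scale to millions.
    return float(match.group().replace(',', '')) / 1.0E6

print(to_millions('$6,245,113 raised'))
print(to_millions('Goal: $7,000,000.'))
print(to_millions('no numbers here'))
```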

In [None]:
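# Pair each 2016 snapshot time with its values, dropping snapshots where either value failed to parse.
stamps_2016 = [t for t in timestamps if t[0:4] == '2016']
valid = [(datetime.strptime(t, '%Y%m%d%H%M%S'), r, g) for t, (r, g) in zip(stamps_2016, rec_data) if r is not False and g is not False]
rec_times, raised, goal = zip(*valid)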

We split the data into separate sequences for the raised and goal amounts; "rec_times" holds Python [Datetime](https://docs.python.org/3/library/datetime.html) objects.

In [None]:
print(rec_times[0:5], '\n', raised[0:5], '\n', goal[0:5])
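Two details matter when cleaning these snapshots. A failed parse returns `False`, but a tuple such as `(False, False)` is non-empty and therefore truthy, so a filter like `filter(lambda x: x, rec_data)` keeps it. And any snapshot dropped from the values must also be dropped from the times, or the sequences fall out of step. A minimal sketch on made-up values that filters times and values together and parses the Wayback Machine's 14-digit `YYYYMMDDhhmmss` timestamps:

```python
from datetime import datetime

# Made-up snapshot times and (raised, goal) pairs; False marks a failed parse.
toy_stamps = ['20161124013500', '20161124090000', '20161125120000']
toy_data = [(2.5, 2.5), (False, False), (5.1, 7.0)]

print(bool((False, False)))  # True: a non-empty tuple is truthy even if it holds False

# Keep each time with its values; "is not False" avoids also dropping a genuine 0.0.
kept = [(datetime.strptime(t, '%Y%m%d%H%M%S'), r, g)
        for t, (r, g) in zip(toy_stamps, toy_data)
        if r is not False and g is not False]
toy_times, toy_raised, toy_goal = zip(*kept)
print(toy_times[0], toy_raised, toy_goal)
```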

We *can* use Matplotlib to plot this, but as you can see, it's a bit of a production.

In [None]:
%matplotlib inline
fig, ax = plt.subplots()
fmt = md.DateFormatter('%m/%d %H:%M')
ax.xaxis.set_major_formatter(fmt)
plt.xticks(rotation=25)
raised_line, = plt.plot(rec_times, raised, marker='*')
ax.set_xlabel('Time Retrieved')
ax.set_ylabel('Amount in Millions of $')
goal_line, = plt.plot(rec_times, goal)
plt.legend([raised_line, goal_line], ['raised','goal'], loc='lower right')
plt.show()

## Part 3: Time Series Using ggplot

The last figure can be done in three lines with ggplot.

- Use the Pandas [from_items](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_items.html) function to build a dataframe from the three lists with columns called "time", "raised" and "goal". (`from_items` was removed in pandas 1.0; with newer versions, pass a dictionary of the lists to `pd.DataFrame` instead.)
- Read the "trends over time" section of this [ggplot blog post](http://blog.yhat.com/posts/aggregating-and-plotting-time-series-in-python.html). You'll need the Pandas [melt function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html).
- You'll also need the "ylab" function mentioned in [this post](http://blog.yhat.com/posts/ggplot-for-python.html) to reproduce the figure.
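The data-preparation half of this can be sketched in pandas alone. A minimal sketch on short made-up lists standing in for `rec_times`, `raised` and `goal`: a dictionary passed to `pd.DataFrame` gives the wide table, and `pd.melt` turns the two value columns into the long format (one row per time and series) that a line plot coloured by series expects. The plotting call itself is left to the exercise.

```python
from datetime import datetime
import pandas as pd

# Made-up stand-ins for rec_times, raised and goal.
toy_times = [datetime(2016, 11, 24, 1, 35), datetime(2016, 11, 25, 12, 0)]
toy_raised = [2.5, 5.1]
toy_goal = [2.5, 7.0]

# A dict of equal-length lists becomes one column per key, in insertion order.
toy_wide = pd.DataFrame({'time': toy_times, 'raised': toy_raised, 'goal': toy_goal})

# Long form: the series name goes in 'variable' and its amount in 'value'.
toy_long = pd.melt(toy_wide, id_vars=['time'])
print(toy_long)
```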

In [None]:
# Your code here

Back to UK politics. Here's a handy [list of constituency locations](https://www.doogal.co.uk/ElectoralConstituencies.php). We can use the Pandas [merge function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) to get a dataframe of locations and winning parties.

In [None]:
constituency_url = 'https://www.doogal.co.uk/ElectoralConstituenciesCSV.ashx'
constituencies = pd.read_csv(constituency_url)

def seat_winners(yr):
    # Winning candidates are the rows with a 'Majority Party' entry for that year.
    wins = (ge_all['Year']==yr) & (ge_all['Majority Party'].notnull())
    return ge_all[wins]

def seat_locations(yr):
    # Both tables call the key column 'Constituency', so a single "on" is enough.
    return pd.merge(seat_winners(yr), constituencies, on='Constituency')
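`pd.merge` defaults to an inner join, so a constituency only ends up in the result if its name is spelled identically in both tables; any mismatch silently drops that seat from the map. A minimal sketch on made-up tables (names and coordinates are illustrative) that uses an outer join with `indicator=True` to list the names that failed to match:

```python
import pandas as pd

# Illustrative stand-ins for the election results and the constituency locations.
results = pd.DataFrame({'Constituency': ['Alpha North', 'Beta South', 'Gamma & Delta'],
                        'Party': ['Labour', 'Conservative', 'Green']})
places = pd.DataFrame({'Constituency': ['Alpha North', 'Beta South', 'Gamma and Delta'],
                       'Latitude': [52.1, 51.5, 53.0],
                       'Longitude': [-1.9, -0.1, -2.2]})

# Default inner join: only names present in both tables survive.
matched = pd.merge(results, places, on='Constituency')
print(len(matched))

# An outer join with indicator=True labels rows 'both', 'left_only' or 'right_only'.
check = pd.merge(results, places, on='Constituency', how='outer', indicator=True)
print(check.loc[check['_merge'] != 'both', ['Constituency', '_merge']])
```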

In [None]:
party_colours = {'Conservative':'blue', 'Labour':'red','Lib Dem':'beige', 'Democratic Unionist Party':'pink',
    'Sinn Fein':'darkgreen', 'Independent':'gray', 'Scottish National Party':'orange', 'Plaid Cymru':'purple',
    'Green':'green', 'Speaker':'lightgray'}

seats_2017 = seat_locations(2017)[['Party','Latitude','Longitude']]

seats_2017[0:5]

## Part 4: UK Election Map Using Folium
Complete the for-loop over the rows of the "seats_2017" dataframe, produced by its [itertuples method](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.itertuples.html). Call "help" on Folium's "folium.Marker" and "folium.Icon" classes, then add a suitably coloured marker for each seat to "seat_map". There's an example in the SQL challenge notebook. Let's hope the DUP don't mind being coloured [pink](https://www.pinknews.co.uk/2017/06/12/meet-the-dup-homophobes-who-now-hold-the-keys-to-power-in-the-uk/)...

In [None]:
map_centre = [seats_2017['Latitude'].mean(), seats_2017['Longitude'].mean()]

seat_map = folium.Map(location=map_centre, zoom_start=5)

for row in seats_2017.itertuples():
    # Your code here.
    pass
    
seat_map
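`itertuples` yields one namedtuple per row, so the columns can be read as attributes such as `row.Party`. A minimal sketch of the colour lookup on a made-up frame, with a fallback colour (black, an arbitrary choice here) for any party missing from the colour dictionary; each `(latitude, longitude, colour)` triple carries what a marker needs.

```python
import pandas as pd

# A cut-down colour map and made-up seats; 'Unlisted Party' has no entry on purpose.
colours = {'Conservative': 'blue', 'Labour': 'red'}
toy_seats = pd.DataFrame({'Party': ['Labour', 'Conservative', 'Unlisted Party'],
                          'Latitude': [52.45, 51.50, 54.60],
                          'Longitude': [-1.93, -0.12, -5.93]})

markers = []
for row in toy_seats.itertuples():
    # Fields are named after the columns; row.Index holds the row label.
    colour = colours.get(row.Party, 'black')  # fallback for unlisted parties
    markers.append((row.Latitude, row.Longitude, colour))
print(markers)
```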

Let's look at how vote share changes in constituencies.

In [None]:
def constituency_df(con):
    # All rows in ge_all for one constituency, across every year.
    cons = ge_all['Constituency'] == con
    return ge_all[cons]

We'll need quite a bit of [Pandas magick](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) to get this dataframe into a usable form:

In [None]:
edgbaston = constituency_df('Birmingham, Edgbaston')[['Party','Share of Vote','Year']]
edgbaston[0:10]

In [None]:
flat_edgbaston = edgbaston.groupby(['Year','Party']).sum().unstack(level=1)['Share of Vote']
flat_edgbaston
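The groupby/unstack step turns one row per (year, party) into one row per year with a column per party. A minimal sketch on made-up vote shares shows the shape change; note that a party which didn't stand in a given year comes out as NaN.

```python
import pandas as pd

# Made-up long-format vote shares: one row per party per year.
toy_votes = pd.DataFrame({'Year': [2010, 2010, 2015, 2015, 2015],
                          'Party': ['Labour', 'Conservative', 'Labour', 'Conservative', 'Green'],
                          'Share of Vote': [40.0, 38.0, 44.0, 38.5, 3.0]})

# Sum any duplicate (Year, Party) rows, move the Party level into the columns,
# then select the 'Share of Vote' block so the columns are just party names.
flat_toy = toy_votes.groupby(['Year', 'Party']).sum().unstack(level=1)['Share of Vote']
print(flat_toy)
```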

## Part 5: Stacked Bars With Pandas and Matplotlib

Use the "flat_edgbaston" dataframe to plot the evolving vote share in Edgbaston, Birmingham as a stacked bar chart. The [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html) will help. Use the "figsize" parameter to ensure the graph is a decent size. The [legend and key](https://matplotlib.org/users/legend_guide.html) should be in a sensible place. Give your chart a [title](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.title.html) too.

In [None]:
fig = plt.figure()

# Your code here.

plt.show()
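Pandas draws a stacked bar chart with one bar per row (here, per year) and one segment per column (per party). A minimal sketch on made-up shares in the same wide shape as "flat_edgbaston"; filling NaN with 0 is a choice made here so that parties absent in a year simply add no height, and the figure size and legend position are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up vote shares, years by parties, like flat_edgbaston.
shares = pd.DataFrame({'Labour': [40.0, 44.0, 55.0],
                       'Conservative': [38.0, 38.5, 39.0],
                       'Green': [None, 3.0, 1.5]},
                      index=pd.Index([2010, 2015, 2017], name='Year'))

# NaN (party did not stand) becomes 0 so each bar stacks cleanly.
ax = shares.fillna(0).plot(kind='bar', stacked=True, figsize=(10, 6))
ax.set_ylabel('Share of Vote')
ax.set_title('Vote share by year (illustrative data)')
# Put the legend just outside the axes so it doesn't cover the bars.
ax.legend(title='Party', loc='upper left', bbox_to_anchor=(1.0, 1.0))
plt.tight_layout()
plt.show()
```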