# Pandas and Illustrator

**Do these in whatever order you'd like.** Feel free to do a bar graph, then skip ahead to some Buzzfeed line graphs, etc. Start in pandas, then once you've saved your chart, move over to Illustrator.

Do each group (NYT, Buzzfeed, FiveThirtyEight, Economist, Guardian) in a separate notebook. I'm only leaving space between them here so you can scroll through them more easily.

Be sure to check out the other notebook for **tips and hints**.

In [2]:
!pip install seaborn

Collecting seaborn
  Downloading https://files.pythonhosted.org/packages/a8/76/220ba4420459d9c4c9c9587c6ce607bf56c25b3d3d2de62056efe482dadc/seaborn-0.9.0-py3-none-any.whl (208kB)
Collecting scipy>=0.14.0 (from seaborn)
  Downloading https://files.pythonhosted.org/packages/04/66/ec5f1283d6a290a9153881a896837487338c44639c1305cc59e1c7b69cc9/scipy-1.3.0-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (27.7MB)
sqlalchemy-migrate 0.12.0 has requirement SQLAlchemy>=0.9.6, but you'll have sqlalchemy 0.7.10 which is incompatible.
Installing collected packages: scipy, seaborn
Successfully installed scipy-1.3.0 seaborn-0.9.0
You are using pip version 10.0.1, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' co

In [60]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Save PDF text as Type 42 (TrueType) fonts so it stays editable in Illustrator.
matplotlib.rcParams['pdf.fonttype'] = 42

%matplotlib inline


# NYT: Bar graphs

Recreate the bar charts from [this piece](https://www.nytimes.com/2017/12/20/upshot/democrats-2018-congressional-elections-polling.html) and [this piece](https://www.nytimes.com/2017/12/27/business/the-robots-are-coming-and-sweden-is-fine.html) and [this piece](https://www.nytimes.com/2017/09/29/upshot/dont-forget-the-republicans-incumbency-advantage-in-2018.html).

![](images/sample-nyt.png)

**Data:** 
   
* `generic_poll_lead.csv`
* `social-spending.csv`
* `cook_pvi.csv`

In [45]:
df = pd.read_csv("generic_poll_lead.csv")

In [46]:
df.head()

Unnamed: 0  year  lead  in_power
         0  2018  13.4        no
         1  2016   1.2       yes
         2  2014   3.0       yes
         3  2012   1.5        no
         4  2010   0.4        no


In [47]:
df['in_power']

0      no
1     yes
2     yes
3      no
4      no
5      no
6      no
7      no
8     yes
9     yes
10    yes
11    yes
12    yes
13     no
14     no
Name: in_power, dtype: object

In [48]:
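colors = []
df2 = df.sort_values(by='year')
# Build one color per bar, in the same year order the chart is drawn in.
for row in df2['in_power']:
    if row == 'no':
        color = '#F4C351'
    elif row == 'yes':
        color = '#C9C7C3'
    else:
        # Fail loudly instead of silently reusing the previous row's color.
        raise ValueError(f"Unexpected in_power value: {row!r}")
    colors.append(color)
print(colors)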

['#F4C351', '#F4C351', '#C9C7C3', '#C9C7C3', '#C9C7C3', '#C9C7C3', '#C9C7C3', '#F4C351', '#F4C351', '#F4C351', '#F4C351', '#F4C351', '#C9C7C3', '#C9C7C3', '#F4C351']
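
The loop above can also be written as a lookup. Here is a minimal sketch using `Series.map`, with the five rows from `df.head()` standing in for the full file. The `palette` name is only an illustration, not something the assignment requires.

```python
import pandas as pd

# The five rows shown by df.head(), standing in for generic_poll_lead.csv.
df = pd.DataFrame({
    'year': [2018, 2016, 2014, 2012, 2010],
    'lead': [13.4, 1.2, 3.0, 1.5, 0.4],
    'in_power': ['no', 'yes', 'yes', 'no', 'no'],
})

# One color per value of in_power (illustrative name).
palette = {'no': '#F4C351', 'yes': '#C9C7C3'}

# Sort first so the colors line up with the bars, then look each value up.
colors = df.sort_values(by='year')['in_power'].map(palette).tolist()
print(colors)
```

Any value missing from `palette` comes back as `NaN`, so it is worth checking the result before handing it to the plot.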


In [1]:
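# Needs `df` and `colors` from the cells above; in a fresh kernel this cell fails with a NameError.
ax = df.sort_values(by='year').plot(x='year',
                                    y='lead',
                                    kind='barh',
                                    color=colors,
                                    legend=False,
                                    figsize=(6, 8),
                                    width=.8)
# Style the axes pandas returned instead of creating new ones with plt.axes().
# `frameon` is not a bar-plot option, so hide the frame through the spines.
for spine in ax.spines.values():
    spine.set_visible(False)
ax.xaxis.set_visible(False)
ax.yaxis.label.set_visible(False)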


NameError: name 'df' is not defined

In [90]:
ax

<matplotlib.axes._subplots.AxesSubplot at 0x116fb1588>

# Buzzfeed: Diversity in the Agriculture Department

You are going to recreate the two line-graph visualizations in [this piece](https://www.buzzfeed.com/jsvine/agriculture-department-political-appointee-diversity) by the super-famous [Jeremy Singer-Vine](https://twitter.com/jsvine). If he can do it, you can too!

![](images/buzzfeed.png)

**I've included the 100% beautiful, cleaned up data:** `gender-by-quarter.csv` and `diversity-by-quarter.csv`.

Since there isn't much cleaning to do, this work is mostly going to be about **how to move between pandas/matplotlib and finish things up in Illustrator.**
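
One possible shape for that workflow, sketched with made-up numbers. The column names (`quarter`, `women`, `men`), the values, and the output file name are all placeholders rather than the contents of `gender-by-quarter.csv`, so check the real file with `df.head()` first.

```python
import os
import tempfile

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

# Type 42 fonts keep text editable once the PDF is opened in Illustrator.
matplotlib.rcParams['pdf.fonttype'] = 42

# Placeholder data; the real CSV's columns and values will differ.
df = pd.DataFrame({
    'quarter': ['2015-01-01', '2015-04-01', '2015-07-01', '2015-10-01'],
    'women': [40, 42, 45, 47],
    'men': [60, 58, 55, 53],
})
df['quarter'] = pd.to_datetime(df['quarter'])

# One line per column, with quarters along the x axis.
ax = df.plot(x='quarter', y=['women', 'men'], figsize=(8, 4), legend=False)

# Strip the pieces that are easier to rebuild by hand in Illustrator.
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel('')

# Hypothetical output name, written to a temp directory for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), 'line-sketch.pdf')
ax.figure.savefig(out_path, bbox_inches='tight')
```

The idea is to get the geometry right in matplotlib and leave labels, annotations, and fine typography for Illustrator.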


# FiveThirtyEight: What makes Nigel Richards the best at Scrabble?

You are going to recreate the visualizations in [this piece](https://fivethirtyeight.com/features/what-makes-nigel-richards-the-best-scrabble-player-on-earth/) by [Oliver Roeder](https://twitter.com/ollie). This is one of my favorite series of charts in all of history! 

(When you break them down by divisions, though, don't draw the circles.)

![](images/sample-538.png)

**I've included the data, but there's a little work to be done:** 

* `scrabble-point-spread.csv` - the points for and against each player at Nationals from one year (2013)
* `scrabble-tournament.csv` - rating and division data for each player at Nationals from one year (2013)
* `ranked-players-with-scores.csv` - the top 200 ranked players and their average points per game (2018)

Note that the ranking data is from 2018 so the graph that uses it will look a bit different.
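
Part of that work is probably combining files. Below is a hedged sketch of joining per-player points to per-player division with `merge`, then drawing one scatter panel per division. The column names (`name`, `points_for`, `points_against`, `division`) and the rows are invented for illustration, and the sketch assumes the chart plots points for against points against, so swap in whatever the real CSVs use.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows and column names; inspect the real files before relying on these.
spread = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C'],
    'points_for': [450, 410, 390],
    'points_against': [380, 400, 420],
})
tournament = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C'],
    'division': [1, 1, 2],
})

# Join on the shared player column; validate raises if a name is duplicated.
merged = spread.merge(tournament, on='name', how='left', validate='one_to_one')

# One panel per division, sharing axes so the panels are comparable.
divisions = sorted(merged['division'].unique())
fig, axes = plt.subplots(1, len(divisions), sharex=True, sharey=True,
                         figsize=(8, 4), squeeze=False)
for ax, division in zip(axes[0], divisions):
    group = merged[merged['division'] == division]
    ax.scatter(group['points_for'], group['points_against'])
    ax.set_title(f'Division {division}')
```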


# The Economist

Recreate the donut chart from [this piece](https://www.economist.com/blogs/graphicdetail/2012/07/daily-chart-0) even though [they're terrible](https://peltiertech.com/chart-busters-the-economist-doesnt-read-forbes/).

![](images/economist.png)

In fact, the pies are _so terrible_ we're going to **recreate the [revised chart](https://peltiertech.com/chart-busters-the-economist-doesnt-read-forbes/), too**. The important thing about it is that while the original chart focuses on the actual values in 2007 and 2011, the arrow chart reduces those values to **just the change**, which is (arguably) the important part.

![](images/econ-revised.png)

The revised chart is ugly, though, so we're going to make it look nicer.

1. Slightly change the chart type (See Homework Hints file)
2. Change the colors (Make it match the Economist)
3. Change any other styles/font/etc (Make it match the Economist)
4. Order the bars (Order by what? Up to you.)
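
As a rough sketch of the "just the change" idea and of step 4: the bank names, the `profit_2007` and `profit_2011` columns, and the numbers are placeholders rather than the contents of `bank-profits.csv`, and arrows drawn with `ax.annotate` are only one way to show direction.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows; the real file's columns and values will differ.
df = pd.DataFrame({
    'bank': ['Bank A', 'Bank B', 'Bank C'],
    'profit_2007': [10.0, 4.0, 7.0],
    'profit_2011': [6.0, 9.0, 7.5],
})

# Reduce the two values to the change between them, then order by it.
df['change'] = df['profit_2011'] - df['profit_2007']
df = df.sort_values(by='change').reset_index(drop=True)

fig, ax = plt.subplots(figsize=(6, 3))
for i, row in df.iterrows():
    # Arrow from the 2007 value (tail) to the 2011 value (head).
    ax.annotate('',
                xy=(row['profit_2011'], i),
                xytext=(row['profit_2007'], i),
                arrowprops=dict(arrowstyle='->'))

# annotate does not update the data limits, so set them by hand.
values = df[['profit_2007', 'profit_2011']]
ax.set_xlim(values.min().min() - 1, values.max().max() + 1)
ax.set_ylim(-0.5, len(df) - 0.5)
ax.set_yticks(range(len(df)))
ax.set_yticklabels(df['bank'])
```

Sorting by `change` is one answer to "order by what?"; sorting by the 2011 value or alphabetically are equally valid choices.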

**Data:** 
   
* `bank-profits.csv`

# The Guardian

We'll be recreating a single graph from each of these pieces:

1. [this piece](https://www.theguardian.com/news/datablog/2018/jan/26/no-equality-in-the-honours-two-thirds-of-australia-day-awards-go-to-men)
2. [this piece](https://www.theguardian.com/inequality/datablog/2017/jul/17/which-countries-most-and-least-committed-to-reducing-inequality-oxfam-dfi)
3. [this piece](https://www.theguardian.com/money/datablog/2017/jan/06/tracking-the-cost-uk-and-european-commuter-rail-fares-compared-in-data)

![](images/guardian.png)

We'll be using skills from the Economist revision, so **please do that one first!**

**Data:**

* `order-of-australia.csv`
* `oxfam.csv`
* `commute.csv`