## Causes of Death by Year

In [56]:
# Run this cell to set up the notebook, but please don't change it.

# These lines import the NumPy and datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)

# These lines load the tests.
from client.api.assignment import load_assignment 
tests = load_assignment('causes_of_death_by_year.ok')

This exercise is designed to give you practice using the Table method `pivot`.

We'll be looking at a dataset from the California Department of Public Health (available [here](http://www.healthdata.gov/dataset/leading-causes-death-zip-code-1999-2013) and described [here](http://www.cdph.ca.gov/data/statistics/Pages/DeathProfilesbyZIPCode.aspx)) that lists the cause of death, as recorded on the death certificate, for everyone who died in California from 1999 to 2013. The data are in the file `causes_of_death.csv.zip`. Each row records the number of deaths from one cause in one year in one ZIP code.

To make the file smaller, we've compressed it; run the next cell to unzip and load it.

In [57]:
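# -o extracts the CSV and overwrites any existing copy without prompting
# (-f would only refresh files that already exist on disk, so it extracts nothing the first time).
!unzip -o causes_of_death.csv.zip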
causes = Table.read_table('causes_of_death.csv')
causes
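
If the shell `unzip` command is not available on your machine, Python's standard-library `zipfile` module can extract the archive instead. The sketch below is an optional alternative to the `!unzip` line, not part of the original assignment; the helper name `extract_all` is hypothetical, and the demonstration builds a throwaway archive in a temporary directory so it runs anywhere.

```python
import os
import tempfile
import zipfile

def extract_all(zip_path, dest='.'):
    """Extract every member of zip_path into dest (overwriting existing
    files) and return the list of member names in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()

# Demonstration on a hypothetical archive; in the notebook you would call
# extract_all('causes_of_death.csv.zip') instead.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, 'demo.zip')
    with zipfile.ZipFile(archive, 'w') as zf:
        zf.writestr('demo.csv', 'Year,Count\n1999,5\n')
    names = extract_all(archive, tmp)
    with open(os.path.join(tmp, 'demo.csv')) as f:
        content = f.read()

print(names)  # ['demo.csv']
```

After extracting, `Table.read_table('causes_of_death.csv')` loads the file exactly as before.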

The causes of death in the data are abbreviated. We've provided a file called `abbreviations.csv` that translates each abbreviation into the full name of the cause.

In [58]:
abbreviations = Table.read_table('abbreviations.csv')
abbreviations.show()

We're going to examine how the causes of death change over time. To plot those numbers, we need a table with one row per year that holds the counts for every cause of death in that year.
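To make that target shape concrete, here is a plain-Python sketch of the reshaping involved, run on a tiny made-up table. The causes `'ALZ'` and `'CAN'` and all the counts are hypothetical, not taken from the dataset, and `pivot_sum` is an illustrative helper, not the `datascience` implementation. Each input record is one (year, cause, count) entry, as if from different ZIP codes; the sketch sums the counts for each (year, cause) pair and lays the sums out with one row per year and one column per cause, filling in 0 where a pair never occurs.

```python
from collections import defaultdict

# Hypothetical long-format records: (year, cause, count), one per ZIP code.
records = [
    (1999, 'ALZ', 3), (1999, 'ALZ', 2), (1999, 'CAN', 10),
    (2000, 'CAN', 7), (2000, 'CAN', 4),
]

def pivot_sum(records):
    """Sum the counts for each (year, cause) pair and return (header, rows):
    one row per year, one column per cause, 0 for pairs that never occur."""
    totals = defaultdict(int)
    for year, cause, count in records:
        totals[(year, cause)] += count
    years = sorted({year for year, _, _ in records})
    causes = sorted({cause for _, cause, _ in records})
    header = ['Year'] + causes
    rows = [[year] + [totals.get((year, cause), 0) for cause in causes]
            for year in years]
    return header, rows

header, rows = pivot_sum(records)
print(header)  # ['Year', 'ALZ', 'CAN']
print(rows)    # [[1999, 5, 10], [2000, 0, 11]]
```

The `datascience` Table method `pivot` exists for this kind of reshaping: it takes the column whose values become the new column labels, the column whose values become the rows, and optionally a values column plus a `collect` function that combines the values in each cell. Check its documentation for the exact behavior.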

**Question 1.** Create a table with one row for each year and one column for each cause of death, containing the number of deaths from that cause in that year. The column names should be the abbreviated causes of death. Call the table `causes_by_year`.

In [59]:
causes_by_year = ...
causes_by_year.show()

In [61]:
_ = tests.grade('q1')

As you can see from the table you created, the dataset is missing data on certain causes of death for certain years.  It looks like those causes of death are relatively rare, so for some purposes it makes sense to drop them from consideration.  Of course, we'll have to keep in mind that we're no longer looking at a comprehensive report on all deaths in California.

**Question 2.** Make a new version of `causes_by_year` that includes only columns that don't appear to have missing data.  Call it `cleaned_causes_by_year`.
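One way to spot the incomplete columns is to check each cause for zero entries. The sketch below works on a small hypothetical pivoted table stored as a header list plus row lists (the cause `'XYZ'` and all counts are made up), and it assumes that a 0 means "no data recorded" for that year; a cause with a genuine count of zero would also be dropped. The helpers `complete_columns` and `select_columns` are illustrative names, not library functions.

```python
# Hypothetical pivoted table: one row per year, one column per cause.
header = ['Year', 'ALZ', 'CAN', 'XYZ']
rows = [
    [1999, 5, 10, 0],
    [2000, 4, 11, 2],
    [2001, 6, 12, 0],
]

def complete_columns(header, rows, missing=0):
    """Return the first column's label plus every other label whose
    column never contains the `missing` marker."""
    keep = [header[0]]
    for j, label in enumerate(header[1:], start=1):
        if all(row[j] != missing for row in rows):
            keep.append(label)
    return keep

def select_columns(header, rows, labels):
    """Return (labels, rows) restricted to the given column labels."""
    idx = [header.index(label) for label in labels]
    return labels, [[row[i] for i in idx] for row in rows]

keep = complete_columns(header, rows)
print(keep)  # ['Year', 'ALZ', 'CAN']
clean_header, clean_rows = select_columns(header, rows, keep)
print(clean_rows)  # [[1999, 5, 10], [2000, 4, 11], [2001, 6, 12]]
```

With a `datascience` Table, the Table methods `select` and `drop` keep or remove columns by label, and `labels` lists the current column names.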

In [None]:
cleaned_causes_by_year = ...
cleaned_causes_by_year.show()

In [63]:
_ = tests.grade('q2')

**Question 3.** Make a plot of all the causes of death by year, using your cleaned-up version of the dataset.  There should be a single plot with one line per cause of death.

*Hint:* Use the Table method `plot`. If you pass it only one argument, the label of the column to put on the horizontal axis, it draws one line for each of the other columns.
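As an illustration of what "one line per other column" means, here is a minimal matplotlib sketch that does the same thing by hand for a small hypothetical table stored as a dict of columns (all names and counts are made up, and `plot_against` is an illustrative helper, not part of `datascience`). It selects the off-screen `Agg` backend so it can run as a plain script; in the notebook, `%matplotlib inline` already handles display.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering for running outside a notebook
import matplotlib.pyplot as plt

# Hypothetical cleaned table: a dict mapping column labels to values.
table = {
    'Year': [1999, 2000, 2001],
    'ALZ': [5, 4, 6],
    'CAN': [10, 11, 12],
}

def plot_against(table, x_label):
    """Draw one line per column other than x_label, all on one set of axes."""
    fig, ax = plt.subplots()
    for label, values in table.items():
        if label != x_label:
            ax.plot(table[x_label], values, label=label)
    ax.set_xlabel(x_label)
    ax.legend()
    return fig, ax

fig, ax = plot_against(table, 'Year')
print(len(ax.get_lines()))  # 2
```

Putting every line on a single set of axes with a legend is what makes the causes comparable at a glance.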

In [44]:
...

**Question 4.** It's probably hard to read that plot.  One problem is that the name of each cause of death is abbreviated.  Using the `abbreviations` table, redo your work in the previous questions to make a plot whose labels are the unabbreviated causes of death.
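The core of this step is a lookup from each abbreviation to its full name. The sketch below shows that lookup in plain Python with hypothetical abbreviation pairs (the real pairs are in `abbreviations.csv`, and the full names here are only illustrative); `unabbreviate` is an illustrative helper that leaves any label without an entry, such as `'Year'`, unchanged.

```python
# Hypothetical (abbreviation, full name) pairs; the real ones are in abbreviations.csv.
abbreviation_pairs = [
    ('ALZ', "Alzheimer's disease"),
    ('CAN', 'Cancer'),
]

def unabbreviate(labels, pairs):
    """Replace each label that appears in pairs with its full name,
    leaving any other label unchanged."""
    full_names = dict(pairs)
    return [full_names.get(label, label) for label in labels]

print(unabbreviate(['Year', 'ALZ', 'CAN'], abbreviation_pairs))
# ['Year', "Alzheimer's disease", 'Cancer']
```

In `datascience`, `Table.relabeled(label, new_label)` returns a copy of a table with a column renamed, so the full names can be applied to the column labels before plotting.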

In [48]:
# Use this cell to make the new plot.

**Question 5.** It looks like there was a miraculous drop in deaths from Alzheimer's disease in 2011 and 2012, and then a dramatic rise in 2013.  Using the plot you made (or some other plot, if you prefer), can you explain this?

*Write your answer here, replacing this text.*

In [None]:
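# For your convenience, you can run this cell to run all the tests at once!
import os
# Test files are named like "q1.py"; q[:-3] strips the ".py" to get the question name.
# sorted() runs them in order, since os.listdir returns files in arbitrary order.
_ = [tests.grade(q[:-3]) for q in sorted(os.listdir("tests")) if q.startswith('q') and q.endswith('.py')]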

In [None]:
# Run this cell to submit your work *after* you have passed all of the test cells.
# It's ok to run this cell multiple times. Only your final submission will be scored.

!TZ=America/Los_Angeles jupyter nbconvert --to html --output=".causes_of_death_by_year_$(date +%m%d_%H%M)_submission.html" causes_of_death_by_year.ipynb