In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')



### Table of Contents

1.  <a href='#section 1'>Line graphs</a>

    a. <a href='#subsection 1a'> Comparing Years </a> <br><br>
    b. <a href='#subsection 1b'> Comparing Sex </a> <br><br>
    
2. <a href='#section 2'>Scatter plots</a>

    


---

## The Data <a id='data'></a>

Today, we will continue working with the US census data, looking only at the years 2010 and 2014.
As a reminder, the `SEX` column contains numeric codes: `0` stands for the total, `1` for male, and `2` for female. The `AGE` column contains ages in completed years, but the special value `999` marks the row that holds the total population summed over all ages.
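
To make the coding scheme concrete, here is a minimal sketch in plain NumPy on a toy table. The counts are invented for illustration and are not census figures, and the variable names are hypothetical. It shows why the `999` rows have to be dropped before working with ages: each one repeats the sum of the individual-age rows for its `SEX` code.

```python
import numpy as np

# Toy rows (invented numbers, NOT census data): one row per (SEX, AGE) pair.
sex = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
age = np.array([0, 1, 999, 0, 1, 999, 0, 1, 999])
pop = np.array([10, 12, 22, 5, 6, 11, 5, 6, 11])

# AGE == 999 marks a total over all ages; SEX == 0 marks a total over both sexes.
is_total_age = age == 999
is_total_sex = sex == 0

# For each SEX code, the AGE 999 row equals the sum of the individual-age rows.
for code in (0, 1, 2):
    summed_over_ages = pop[(sex == code) & ~is_total_age].sum()
    reported_total = pop[(sex == code) & is_total_age][0]
    print(code, summed_over_ages, reported_total)

# Per-age counts for everyone: keep SEX == 0 and drop the AGE 999 row.
everyone_by_age = pop[is_total_sex & ~is_total_age]
print(everyone_by_age)
```

Leaving the `999` rows in would count every person twice when summing over ages, and would add a stray point far to the right of any chart with age on the x-axis.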



---


In [None]:
data = 'http://www2.census.gov/programs-surveys/popest/datasets/2010-2015/national/asrh/nc-est2015-agesex-res.csv'

# A local copy can be accessed here in case census.gov moves the file:
# data = path_data + 'nc-est2015-agesex-res.csv'

full_census_table = Table.read_table(data)
partial_census_table = full_census_table.select('SEX', 'AGE', 'POPESTIMATE2010', 'POPESTIMATE2014')
us_pop = partial_census_table.relabeled('POPESTIMATE2010', '2010').relabeled('POPESTIMATE2014', '2014')
us_pop.sort('AGE')

## 1. Line Graphs <a id='section 1'></a>

Let's visualize our data!

<div class="alert alert-warning">

<b>Question 1:</b> Let's start by dropping the rows where `AGE` is `999` and keeping only the rows that give totals across both sexes. Fill in the blanks below:



In [None]:
no_999 = us_pop.where('AGE', are.below(999))
everyone = no_999.where(...).drop('SEX') ## YOUR CODE HERE
everyone

In [None]:
# ANSWER KEY
no_999 = us_pop.where('AGE', are.below(999))
everyone = no_999.where('SEX', 0).drop('SEX')
everyone

Now, let's plot population counts in 2010, with age on the x-axis.

In [None]:
everyone.plot('AGE', '2010')

<div class="alert alert-warning">
<b> Question 2</b>: Discuss: What are some interesting trends you notice from this line graph?

<b> YOUR ANSWER HERE: </b> 

Let's take a look at our table again. What if we wanted to compare population counts in 2010 and 2014?

In [None]:
everyone

In the previous example, we passed `'2010'` as the column to plot on the y-axis. If we want every other column in the table to be plotted instead, we pass only the x-axis column to `.plot`.
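
The rule can be sketched in plain Python. The helper `columns_drawn` below is hypothetical: it is not part of the `datascience` library and only mimics the column-selection behavior described above, using the labels of the `everyone` table.

```python
def columns_drawn(labels, x_label, y_label=None):
    """Hypothetical helper: return the columns that end up as lines on the chart."""
    if y_label is not None:
        # An explicit y-axis column was given, so only that column is drawn.
        return [y_label]
    # No y-axis column given: every column except the x-axis one is drawn.
    return [label for label in labels if label != x_label]


labels = ['AGE', '2010', '2014']  # column labels of the `everyone` table
print(columns_drawn(labels, 'AGE', '2010'))  # ['2010']
print(columns_drawn(labels, 'AGE'))          # ['2010', '2014']
```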

<div class="alert alert-warning">
<b> Question 3: </b> Fill in the code to plot population counts in both 2010 and 2014.
    <a id='subsection 1a'></a>

In [None]:
....plot(...) # YOUR CODE HERE

In [None]:
#ANSWER KEY:
everyone.plot("AGE")

<div class="alert alert-warning">
<b> Question 4</b>: Discuss: What do you notice when comparing the plots for 2010 and 2014?

<b> YOUR ANSWER HERE: </b>

ANSWER KEY: The gold line (2014) looks a lot like the 2010 line shifted right by about 4 years: people who were 20 in 2010 are 24 in 2014. There are slightly more 24-year-olds in 2014 than there were 20-year-olds in 2010 because of immigration. At older ages, the counts dwindle a little because there are more deaths and less immigration. Discuss immigration and how it affects the census.
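
The "shifted right by 4 years" idea can be sketched with NumPy. This is a toy model with invented counts (not census figures) that ignores deaths and migration entirely; the names `pop_2010` and `projected_2014` are hypothetical.

```python
import numpy as np

ages = np.arange(0, 10)
# Invented 2010 counts by age (NOT census data), with a peak at age 4.
pop_2010 = np.array([50, 52, 54, 60, 70, 65, 55, 50, 45, 40])

shift = 4  # years between the two estimates

# Ignoring deaths and migration, whoever was age a in 2010 is age a + 4 in 2014.
# Ages 0 to 3 in 2014 were not yet born in 2010, so they stay unknown (NaN).
projected_2014 = np.full(len(ages), np.nan)
projected_2014[shift:] = pop_2010[:-shift]

print(projected_2014)
# The peak moves from age 4 in 2010 to age 8 in the projection.
print(ages[np.argmax(pop_2010)], ages[np.nanargmax(projected_2014)])
```

The gap between a projection like this and the actual 2014 column is where immigration and deaths show up.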

We just compared population counts between different years. What if we wanted to compare population counts between the sexes? <a id='subsection 1b'></a>

<div class="alert alert-warning">
<b> Question 5: </b> Fill in the code to construct a table of male and female population counts in 2014.

In [None]:
both_sexes = us_pop.where('AGE', are.below(999)).where('SEX', are.above(0))
males = both_sexes.where('SEX', 1).column("2014")
females = both_sexes.where(...).column(...) ## YOUR CODE HERE
by_sex = Table().with_columns("Age", np.arange(0, 101), "Males", ..., "Females", ...)
by_sex

In [None]:
# ANSWER KEY
both_sexes = us_pop.where('AGE', are.below(999)).where('SEX', are.above(0))
males = both_sexes.where('SEX', 1).column("2014")
females = both_sexes.where('SEX', 2).column("2014")
by_sex = Table().with_columns("Age", np.arange(0, 101), "Males", males, "Females", females)
by_sex

<div class="alert alert-warning">
<b> Question 6: </b> Fill in the code to plot male and female population counts in 2014, per age.

In [None]:
by_sex.plot(...) ##YOUR CODE HERE

In [None]:
# ANSWER KEY
by_sex.plot("Age")

<div class="alert alert-warning">
<b> Question 7: </b> What do you notice about the above plot?

<b> YOUR ANSWER HERE: </b>

ANSWER KEY: Discuss the ages at which the male population count is higher than the female count, and vice versa.
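
One way to find where the two lines cross is to subtract the arrays and look at the sign of the difference. The sketch below uses invented counts for six ages (not census figures), and its variable names are hypothetical.

```python
import numpy as np

ages = np.arange(0, 6)
# Invented counts (NOT census data) in which the lines cross at age 3.
males = np.array([105, 104, 103, 100, 96, 90])
females = np.array([100, 100, 100, 100, 99, 97])

difference = males - females

# Ages where one line sits above the other.
more_males = ages[difference > 0]
more_females = ages[difference < 0]

print(more_males)    # [0 1 2]
print(more_females)  # [4 5]
```

On the real table, `by_sex.column("Males")` and `by_sex.column("Females")` would supply the two arrays.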

## 2. Scatter Plots <a id='section 2'></a>

Let's go back to the actors table from Week 1.

In [None]:
actors = Table.read_table("actors.csv").relabeled("Gross", "#1 Movie Gross")
actors

Let's start by looking at the relationship between `Number of Movies` and `Average per Movie`. Again, we specify our label for the x-axis first, followed by the label for the y-axis.

In [None]:
actors.scatter("Number of Movies", "Average per Movie") ## RUN THIS CELL

<div class="alert alert-warning">
<b> Question 1: </b> What does each point in this plot represent? What are some general patterns you notice?

<b> YOUR ANSWER HERE: </b>

ANSWER KEY: Each point represents an actor. There is a general negative association: actors with a higher number of movies tend to have a lower average gross per movie. Why might this be the case?

As with line graphs, we can overlay multiple scatter plots on one graph by specifying only the label for the x-axis.

<div class="alert alert-warning">
<b> Question 2: </b> Fill in the code to plot "Number of Movies" against "Total Gross" and "#1 Movie Gross". What do you notice?

In [None]:
actors.select("Number of Movies", ... , ...).scatter(...) ## YOUR CODE HERE

<b> YOUR ANSWER HERE: </b> 

In [None]:
#ANSWER KEY
actors.select("Number of Movies", "Total Gross", "#1 Movie Gross").scatter("Number of Movies")

ANSWER KEY: Total gross is clearly higher than #1 movie gross. There is a general positive association between number of movies and total gross (a higher number of movies is associated with a higher total gross), with exceptions.

---

## Bibliography

- John DeNero - Data 8X, Census: Males and Females. https://www.youtube.com/watch?v=SAJavz58uHk&feature=youtu.be
- Ani Adhikari - Data 8X, Charts: Scatter Plots. https://www.youtube.com/watch?v=6mPOvbubJSM&feature=youtu.be
- Ani Adhikari - Data 8X, Charts: Line Graphs. https://www.youtube.com/watch?v=pcEadlLnFBw&feature=youtu.be

---
Notebook developed by: X, X, X

Data Science Modules: http://data.berkeley.edu/education/modules
