<img src="./ccsf.png" alt="CCSF Logo" width=200px style="margin:0px -5px">

# Lecture 08: Census

Associated Textbook Sections: [6.3, 6.4, 7.0](https://ccsf-math-108.github.io/textbook/chapters/06/3/Example_Population_Trends.html)

---

## Overview

* [Exploring Census Data](#Exploring-Census-Data)
* [Visualizing Trends](#Visualizing-Trends)

---

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

---

## Exploring Census Data

---

### The Decennial Census

* Every ten years, the Census Bureau counts how many people there are in the U.S.
* In between censuses, the Bureau estimates how many people there are each year.
* Article 1, Section 2 of the Constitution: 
> "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers ..."


---

### Census Table Description

* Values have column-dependent interpretations
    * The `SEX` column: `1` is Male, `2` is Female
    * The `POPESTIMATE2010` column: population estimate for July 1, 2010
* In this table, some rows are sums of other rows
    * The `SEX` column: `0` is Total (of Male + Female)
    * The `AGE` column: `999` is Total of all ages
* Numeric codes are often used for storage efficiency
    * Values in a column have the same type, but are not necessarily comparable (`AGE 12` vs `AGE 999`)
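
To see why a coded total such as `AGE 999` cannot be treated like an ordinary age, here is a minimal sketch on made-up rows. The counts are invented for illustration and are not taken from `census.csv`. Summing every row counts the population twice, because the `999` row already holds the total.

```python
# Toy rows in the shape (AGE, population); the numbers are invented.
rows = [(0, 100), (1, 110), (2, 120), (999, 330)]

# Naive sum counts everyone twice: once by age and once in the 999 total row.
naive_total = sum(pop for age, pop in rows)

# Filtering out the sentinel code leaves only the per-age rows.
real_total = sum(pop for age, pop in rows if age != 999)

print(naive_total, real_total)
```

This is why the total rows are usually filtered out before computing or plotting anything by age.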

---

### Demo: Census

Explore the US Census data in `census.csv` from the [Annual Estimates of the Resident Population by Single Year of Age and Sex for the United States](https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2020/cc-est2020-agesex.pdf). 

(Release date: June 2021, Updated January 2022 to include April 1, 2020 estimates)

In [None]:
census = Table.read_table('census.csv')
census

---

Clean up the table by completing the following:
1. Select the `SEX`, `AGE`, `CENSUS2010POP`, and `POPESTIMATE2019` columns.
2. Relabel the 2010 and 2019 columns.
3. Remove the rows where `AGE` is 999 and keep only the combined rows where `SEX` is 0. Then drop the `SEX` column, since it holds only one value after filtering.
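
The three steps above can be sketched in plain Python on a handful of invented rows. This is only an illustration of the logic, not the notebook's solution: the counts are made up, the extra `POPESTIMATE2015` column is included just to show the selection step, and the new labels `'2010'` and `'2019'` are an assumption, since the exercise does not fix them.

```python
# Toy rows mimicking the census columns; all counts are invented.
rows = [
    {'SEX': 0, 'AGE': 0, 'CENSUS2010POP': 200, 'POPESTIMATE2015': 195, 'POPESTIMATE2019': 190},
    {'SEX': 0, 'AGE': 1, 'CENSUS2010POP': 210, 'POPESTIMATE2015': 208, 'POPESTIMATE2019': 205},
    {'SEX': 0, 'AGE': 999, 'CENSUS2010POP': 410, 'POPESTIMATE2015': 403, 'POPESTIMATE2019': 395},
    {'SEX': 1, 'AGE': 0, 'CENSUS2010POP': 102, 'POPESTIMATE2015': 99, 'POPESTIMATE2019': 97},
    {'SEX': 2, 'AGE': 0, 'CENSUS2010POP': 98, 'POPESTIMATE2015': 96, 'POPESTIMATE2019': 93},
]

# Step 1: keep only the four columns of interest.
keep = ['SEX', 'AGE', 'CENSUS2010POP', 'POPESTIMATE2019']
reduced = [{k: r[k] for k in keep} for r in rows]

# Step 2: relabel the two population columns (label choice is hypothetical).
new_names = {'CENSUS2010POP': '2010', 'POPESTIMATE2019': '2019'}
relabeled = [{new_names.get(k, k): v for k, v in r.items()} for r in reduced]

# Step 3: drop the AGE 999 total rows, keep SEX == 0, then remove the SEX column.
no_999 = [r for r in relabeled if r['AGE'] != 999]
everyone_toy = [
    {k: v for k, v in r.items() if k != 'SEX'}
    for r in no_999
    if r['SEX'] == 0
]

for r in everyone_toy:
    print(r)
```

In the notebook itself, the same steps would typically be written with the `datascience` `Table` methods `select`, `relabeled`, `where`, and `drop`.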

In [None]:
census_reduced = ...
census_relabeled = ...
census_no_999 = ...
everyone = ...
everyone

---

## Visualizing Trends

---

### Visualizing Numerical Trends

<img src="./Dwiggins_graph.jpg" alt="Published in 1919, Dwiggins used this parody graph to express his opinion of standards in printing" width=300px>

* You will soon learn about basic data visualizations.
* For example, what is a fundamental way to visualize how population sizes change over the range of ages?
* In this case, the fundamental visualization is a line plot.

---

### Demo: Line Plots

Visualize the relationship between age and population size in 2010.

In [None]:
...

# Some extra graph formatting you are not responsible for
plt.title('US Population Size') 
plt.show()

---

Include lines for both 2010 and the estimated 2019 population sizes.

In [None]:
...

# Some extra graph formatting you are not responsible for
plt.title('US Population Size') 
plt.show()

---

### Demo: Male and Female 2019 Estimates

Create a table with `Age`, `Males`, `Females` columns showing the population estimates in 2019 for males and females by age.
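
As a rough sketch of the idea, the plain-Python example below splits invented rows by their `SEX` code and lines the two groups up by age. The tuple layout `(SEX, AGE, estimate)`, the counts, and the name `pop_2019_toy` are all assumptions made for illustration.

```python
# Toy rows in the shape (SEX, AGE, 2019 estimate); the numbers are invented.
rows = [
    (0, 0, 190), (1, 0, 97), (2, 0, 93),
    (0, 1, 205), (1, 1, 104), (2, 1, 101),
    (1, 999, 201), (2, 999, 194),
]

# SEX code 1 is Male and 2 is Female; AGE 999 rows are totals, so skip them.
males_by_age = {age: pop for sex, age, pop in rows if sex == 1 and age != 999}
females_by_age = {age: pop for sex, age, pop in rows if sex == 2 and age != 999}

# Line the two groups up by age: one (Age, Males, Females) row per age.
pop_2019_toy = [
    (age, males_by_age[age], females_by_age[age])
    for age in sorted(males_by_age)
]

print(pop_2019_toy)
```

Matching the rows by age, rather than relying on row order, keeps the two columns aligned even if the input is not sorted.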

In [None]:
males = ...
females = ...
pop_2019 = ...
pop_2019

---

Visualize the distribution of population size for both males and females.

In [None]:
...

# Some extra graph formatting you are not responsible for
plt.title('2019 Population Size Estimates')
plt.show()

---

Calculate the percentage of the population that is female at each age.
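
The arithmetic is 100 times the female count divided by the combined male and female count, applied age by age. A minimal sketch with invented counts (the lists and their values are assumptions, not census data):

```python
# Invented counts for three ages, listed in the same age order.
males_toy = [97, 104, 50]
females_toy = [93, 101, 150]

# Percent female at each age: 100 * females / (males + females).
pct_female_toy = [
    100 * f / (m + f)
    for m, f in zip(males_toy, females_toy)
]

# Rounding to one decimal place makes the values easier to read.
pct_female_rounded = [round(p, 1) for p in pct_female_toy]

print(pct_female_rounded)
```

With NumPy arrays, as used elsewhere in the notebook, the same formula can be written without a loop because array arithmetic is element-wise.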

In [None]:
total = ...
pct_female = ...
pct_female

In [None]:
pct_female = ...
pct_female

---

Add the female percentage to the table as a new column.

In [None]:
pop_2019 = ...
pop_2019

---

Visualize the relationship between age and the percent of the population that is female.

In [None]:
...

# Some extra graph formatting you are not responsible for
plt.title('Female Population Percentage over Age')
plt.show()

In [None]:
...

# Some extra graph formatting you are not responsible for
plt.ylim(0, 100)
plt.title('Female Population Percentage over Age')
plt.show()

---

## Attribution

This content is licensed under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)</a> and is derived from the <a href="https://www.data8.org/">Data 8: The Foundations of Data Science</a> course offered by the University of California, Berkeley.

<img src="./by-nc-sa.png" width=100px>