<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 06: Census

Associated Textbook Sections: [6.3, 6.4](https://inferentialthinking.com/chapters/06/3/Example_Population_Trends.html)

<h2>Set Up the Notebook</h2>

In [2]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Table Review

### Some of the Table Methods

* Creating and extending tables: `Table().with_column` and `Table.read_table`
* Finding the size: `num_rows` and `num_columns`
* Referring to columns: labels and indices --- column indices start at 0
* Accessing data in a column: `column` takes a label or index and returns an array
* Using array methods to work with data in columns: `item`, `sum`, `min`, `max`, and so on
* Creating new tables containing some of the original columns: `select`, `drop`

### Manipulating Rows

* `tbl.sort(column_name_or_index)` --- sorts the rows in increasing order
* `tbl.sort(column_name_or_index, descending=True)` --- sorts the rows in decreasing order
* `tbl.take(row_indices)` --- keeps only the rows at the given indices; row indices start at 0
* `tbl.where(column, predicate)` --- keeps all rows for which a column's value satisfies a condition
* `tbl.where(column, value)` --- keeps all rows for which a column's value equals some particular value

Note that `tbl.where(column, value)` is the same as `tbl.where(column, are.equal_to(value))`.

### Demo: Exploring our Welcome Survey

Students in both Fall 2022 MATH 108 sections were surveyed. Load that data and explore it.

In [None]:
...

## Attribute Types


### Types of Attributes

All values in a column of a table should be of the same type and comparable to each other in some way.
* **Numerical** --- Each value is from a numerical scale
    * Numerical measurements are ordered
    * Differences are meaningful
* **Categorical** --- Each value is from a fixed inventory
    * May or may not have an ordering
    * Categories are the same or different


### “Numerical” Attributes

Just because the values are numbers doesn't mean the variable is numerical.
* Census example has numerical `SEX` code (`0`, `1`, and `2`)
* It doesn't make sense to perform arithmetic on these "numbers", e.g. `1 - 0` or `(0+1+2)/3` are meaningless
* The variable `SEX` is still categorical, even though numbers were used for the categories
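
A short sketch of the point above, using only NumPy. The array of codes is made up for illustration; the code meanings (0 = Total, 1 = Male, 2 = Female) follow the census table description in this lecture.

```python
import numpy as np

# A few SEX codes (illustrative values only).
sex_codes = np.array([0, 1, 2, 0, 1, 2])

# NumPy will average the codes without complaint, but the result
# says nothing about the population.
meaningless_mean = np.mean(sex_codes)

# Operations that do make sense for a categorical variable:
# checking whether values are the same, and counting each category.
is_total = sex_codes == 0
values, counts = np.unique(sex_codes, return_counts=True)
```

The software cannot tell that the average is meaningless; deciding whether a column is numerical or categorical is up to the analyst.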

## Census Data

### The Decennial Census

* Every ten years, the Census Bureau counts how many people there are in the U.S.
* In between censuses, the Bureau estimates how many people there are each year.
* Article 1, Section 2 of the Constitution: 
> "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers ..."


### Census Table Description

* Values have column-dependent interpretations
    * The `SEX` column: `1` is Male, `2` is Female
    * The `POPESTIMATE2010` column: population estimate for July 1, 2010
* In this table, some rows are sums of other rows
    * The `SEX` column: `0` is Total (of Male + Female)
    * The `AGE` column: `999` is Total of all ages
* Numeric codes are often used for storage efficiency
    * Values in a column have the same type, but are not necessarily comparable (`AGE 12` vs `AGE 999`)

### Analyzing Census Data

Analyzing census data leads to the discovery of interesting features and trends in the population.

### Demo: Census

Explore the US Census data from the [Annual Estimates of the Resident Population by Single Year of Age and Sex for the United States](https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2020/cc-est2020-agesex.pdf). 

(Release date: June 2021, Updated January 2022 to include April 1, 2020 estimates)

In [7]:
full = Table.read_table('https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/national/asrh/nc-est2020-agesex-res.csv')
full

SEX,AGE,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015,POPESTIMATE2016,POPESTIMATE2017,POPESTIMATE2018,POPESTIMATE2019
0,0,3944153,3944160,3951430,3963092,3926570,3931258,3954787,3983981,3954773,3893990,3815343,3783052
0,1,3978070,3978090,3957730,3966225,3977549,3942698,3948891,3973133,4002903,3972711,3908830,3829599
0,2,4096929,4096939,4090621,3970654,3978925,3991740,3958711,3966321,3991349,4020045,3987032,3922044
0,3,4119040,4119051,4111688,4101644,3981531,3991017,4005928,3974351,3982984,4006946,4033038,3998665
0,4,4063170,4063186,4077346,4121488,4111490,3992502,4004032,4020292,3989750,3997280,4018719,4043323
0,5,4056858,4056872,4064521,4087054,4131049,4121876,4004576,4017589,4035033,4003452,4008443,4028281
0,6,4066381,4066412,4072904,4074531,4096631,4141126,4133372,4017388,4031568,4048018,4014057,4017227
0,7,4030579,4030594,4042990,4082821,4084175,4106756,4152666,4145872,4030888,4044139,4058370,4022319
0,8,4046486,4046497,4025501,4052773,4092559,4094513,4118349,4165033,4158848,4042924,4054236,4066194
0,9,4148353,4148369,4125312,4035319,4062726,4103052,4106068,4130887,4177895,4170813,4053179,4061874


Select the `SEX`, `AGE`, `CENSUS2010POP`, and `POPESTIMATE2019` columns.

In [11]:
...

SEX,AGE,CENSUS2010POP,POPESTIMATE2019
0,0,3944153,3783052
0,1,3978070,3829599
0,2,4096929,3922044
0,3,4119040,3998665


Relabel the 2010 and 2019 columns.

In [12]:
...

SEX,AGE,2010,2019
0,0,3944153,3783052
0,1,3978070,3829599
0,2,4096929,3922044
0,3,4119040,3998665


Sort by `AGE`.

In [13]:
...

SEX,AGE,2010,2019
0,0,3944153,3783052
1,0,2014276,1935117
2,0,1929877,1847935
0,1,3978070,3829599
1,1,2030853,1958585
2,1,1947217,1871014
0,2,4096929,3922044
1,2,2092198,2005544
2,2,2004731,1916500
0,3,4119040,3998665


Remove the rows where `AGE` is 999 (the totals over all ages) and keep only the combined data where `SEX` is 0. Drop the `SEX` column, since only one value remains there.

In [15]:
...

<footer>
    <hr>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>