In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("hw03.ipynb")

<img style="display: block; margin-left: auto; margin-right: auto" src="./ccsf-logo.png" width="250rem;" alt="The CCSF black and white logo">

# Homework 03: Tables

**Recommended Reading**: 
* [Introduction to Tables](https://inferentialthinking.com/chapters/03/4/Introduction_to_Tables.html)
* [Tables](https://inferentialthinking.com/chapters/06/Tables.html)

## Assignment Reminders

- Make sure to run the code cell at the top of this notebook that starts with `# Initialize Otter` to load the auto-grader.
- For all tasks marked with a 🔎, which ask you to write explanations or sentences, provide your answer in the designated space.
- Throughout this assignment and all future ones, be sure not to reassign variables anywhere in the notebook! _For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously!_
- We encourage you to discuss this assignment with others, but make sure to write and submit your own code. Refer to the syllabus for guidelines on learning cooperatively.

*View the related <a href="https://ccsf.instructure.com" target="_blank">Canvas</a> Assignment page for additional details.*

Run the following code cell to import the tools for this assignment.

In [None]:
import numpy as np
from datascience import *
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Unemployment


The [Federal Reserve Bank of St. Louis](https://fred.stlouisfed.org/categories/33509) publishes data about jobs in the US. Below, we've loaded data on unemployment in the United States. There are many ways of defining unemployment, and our dataset includes two notions of the unemployment rate:

1. Among people who are able to work and are looking for a full-time job, the percentage who can't find a job.  This is called the Non-Employment Index, or NEI.
2. Among people who are able to work and are looking for a full-time job, the percentage who can't find any job *or* are only working at a part-time job.  The latter group is called "Part-Time for Economic Reasons", so the acronym for this index is NEI-PTER.  (Economists are great at marketing.)

*In this assignment, you will do some preliminary analysis of these unemployment metrics. In a future assignment, you will work on creating visualizations and summarizing your analysis.*


### Task 01 📍

The data are in a CSV file called `unemployment.csv`.  Load that file into a table called `unemployment`.


_Points:_ 1

In [None]:
unemployment = ...
unemployment

In [None]:
grader.check("task_01")

### Task 02 📍

Sort the data in descending order by NEI, naming the sorted table `by_nei`.  Create another table called `by_nei_pter` that's sorted in descending order by NEI-PTER instead.
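In the `datascience` library, `Table.sort` accepts a `descending=True` argument. The sketch below uses plain NumPy on made-up values (not from `unemployment.csv`) to show the idea behind a table sort: compute one ordering of row positions from the sort column, then apply that same ordering to every column so each row stays intact. The names `quarters` and `nei_values` are hypothetical.

```python
import numpy as np

# Made-up illustrative values, NOT taken from unemployment.csv
quarters = np.array(["Q1", "Q2", "Q3", "Q4"])
nei_values = np.array([10.2, 12.5, 9.8, 11.1])

# argsort returns row positions in ascending order of NEI;
# reversing with [::-1] turns that into descending order
order = np.argsort(nei_values)[::-1]

# Apply the same ordering to every column so rows stay together
sorted_quarters = quarters[order]  # Q2, Q4, Q1, Q3
sorted_nei = nei_values[order]     # 12.5, 11.1, 10.2, 9.8
```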


_Points:_ 1

In [None]:
by_nei = ...
by_nei_pter = ...

In [None]:
grader.check("task_02")

### Task 03 📍

Use `take` to make a table containing the data for the 10 quarters when NEI was greatest.  Call that table `greatest_nei`.

`greatest_nei` should be sorted in descending order of `NEI`. Note that each row of `unemployment` represents a quarter.
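`Table.take` selects rows by their positions, and an index array such as `np.arange(k)` selects the first `k` rows. The NumPy sketch below applies the same idea to a made-up array that is already sorted in descending order (hypothetical values, not from the dataset): indexing with `np.arange(3)` keeps the three largest values, in order.

```python
import numpy as np

# Hypothetical values, already sorted in descending order
sorted_values = np.array([12.5, 11.1, 10.2, 9.8, 8.7])

# np.arange(3) is array([0, 1, 2]): the positions of the first three rows
top_positions = np.arange(3)
top_three = sorted_values[top_positions]  # same result as sorted_values[:3]
```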


_Points:_ 2

In [None]:
greatest_nei = ...
greatest_nei

In [None]:
grader.check("task_03")

### Task 04 📍

It's believed that many people became PTER (recall: "Part-Time for Economic Reasons") in the "Great Recession" of 2008-2009.  NEI-PTER is the percentage of people who are unemployed (and counted in the NEI) plus the percentage of people who are PTER.  Compute an array containing the percentage of people who were PTER in each quarter.  (The first element of the array should correspond to the first row of `unemployment`, and so on.)

*Note:* Use the original `unemployment` table for this.
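Because NEI-PTER = NEI + PTER for each quarter, PTER can be recovered by element-wise subtraction of two equal-length arrays. Here is a minimal NumPy sketch with made-up percentages (not from `unemployment.csv`); the names `nei`, `nei_pter`, and `pter_example` are hypothetical and chosen so they don't clash with the notebook's own variables.

```python
import numpy as np

# Made-up percentages for three quarters (illustrative only)
nei = np.array([10.0, 9.5, 11.2])
nei_pter = np.array([11.5, 10.9, 13.0])

# Subtraction is element-wise: the i-th result uses the i-th entry of each array,
# so the output stays aligned with the original row order
pter_example = nei_pter - nei  # approximately 1.5, 1.4, 1.8
```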


_Points:_ 2

In [None]:
pter = ...
pter

In [None]:
grader.check("task_04")

### Task 05 📍

Add `pter` as a column to `unemployment` (named "PTER") and sort the resulting table by that column in descending order.  Call the table `by_pter`.

* You do not need to do this in one line of code.
* You are welcome to add and use extra variable names.
    * Make sure that you don't use a name that conflicts with the rest of the notebook.
    * Make sure that you do use the name `by_pter` for the table since we will be checking that name with the auto-grader.

_Points:_ 2

In [None]:
by_pter = ...
by_pter

In [None]:
grader.check("task_05")

### Task 06 📍

Create a new table called `pter_over_time` that adds the `year` array and the `pter` array to the `unemployment` table. Label these columns `Year` and `PTER`.
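The starter code in the cell below builds `year` with `1994 + np.arange(n)/4`. This assumes the rows are consecutive quarters beginning in 1994, so each row adds a quarter of a year: 1994.0, 1994.25, 1994.5, and so on. A quick check with a hypothetical `n = 6`:

```python
import numpy as np

n = 6  # hypothetical number of rows, for illustration only
year_example = 1994 + np.arange(n) / 4
# year_example: 1994.0, 1994.25, 1994.5, 1994.75, 1995.0, 1995.25
```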


_Points:_ 2

In [None]:
year = 1994 + np.arange(by_pter.num_rows)/4
pter_over_time = ...

In [None]:
grader.check("task_06")

## Birth Rates


The following table gives census-based population estimates for each state on both July 1, 2015 and July 1, 2016. The last four columns describe the components of the estimated change in population during this time interval. **For all questions below, assume that the word "states" refers to all 52 rows, including Puerto Rico and the District of Columbia.**

* The data was taken from [the US Census 2010-2016 national totals data set](http://www2.census.gov/programs-surveys/popest/datasets/2010-2016/national/totals/nst-est2016-alldata.csv).
* If you want to read more about the different column descriptions, review [the census documentation](http://www2.census.gov/programs-surveys/popest/datasets/2010-2015/national/totals/nst-est2015-alldata.pdf)!
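One way to read "components of the estimated change": assuming the four component columns account for the whole change (the census `RESIDUAL` column, relabeled `OTHER` in the cleaning cell below, holds whatever change is not attributed to births, deaths, or migration), each state's change satisfies `2016 - 2015 = BIRTHS - DEATHS + MIGRATION + OTHER`. A sketch with made-up numbers for one hypothetical state:

```python
# Made-up counts for one hypothetical state (not census data)
pop_2015 = 1_000_000
births = 12_000
deaths = 9_000
migration = 4_500
other = -200

# Assumed identity: population change = births - deaths + net migration + residual
pop_2016 = pop_2015 + births - deaths + migration + other
change = pop_2016 - pop_2015  # 7,300 people
```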

*In this assignment, you will do some preliminary analysis of birth rates during this time period. In a future assignment, you will work on creating visualizations and summarizing your analysis.*

The raw data is a bit messy; run the cell below to clean the table and make it easier to work with.

In [None]:
# Don't change this cell; just run it.
pop = Table.read_table('nst-est2016-alldata.csv').where('SUMLEV', 40).select([1, 4, 12, 13, 27, 34, 62, 69])
pop = pop.relabeled('POPESTIMATE2015', '2015').relabeled('POPESTIMATE2016', '2016')
pop = pop.relabeled('BIRTHS2016', 'BIRTHS').relabeled('DEATHS2016', 'DEATHS')
pop = pop.relabeled('NETMIG2016', 'MIGRATION').relabeled('RESIDUAL2016', 'OTHER')
pop = pop.with_columns("REGION", np.array([int(region) if region != "X" else 0 for region in pop.column("REGION")]))
pop.set_format([2, 3, 4, 5, 6, 7], NumberFormatter(decimals=0)).show(5)

### Task 07 📍

Assign `us_birth_rate` to the total US annual birth rate during this time interval. The annual birth rate for a year-long period is the total number of births in that period as a proportion of the population size at the start of the time period.

**Hint:** Which year corresponds to the start of the time period?
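The definition above is a single division: births during the period divided by the population at the start of the period. A sketch with made-up national totals (not the census figures); in the task, the numerator and denominator would each be the sum of a `pop` column.

```python
# Made-up totals for illustration (not census figures)
total_births = 4_000_000
start_population = 320_000_000  # population at the START of the period

birth_rate_example = total_births / start_population
# 0.0125, i.e. births equal 1.25% of the starting population
```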


_Points:_ 2

In [None]:
us_birth_rate = ...
us_birth_rate

In [None]:
grader.check("task_07")

### Task 08 📍

Assign `movers` to the number of states for which the **absolute value** of the **annual rate of migration** was higher than 1%. The annual rate of migration for a year-long period is the net number of migrations (in and out) as a proportion of the population size at the start of the period. The `MIGRATION` column contains estimated annual net migration counts by state.
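A per-state rate comes from dividing two aligned arrays, and "absolute value higher than 1%" becomes `np.abs(rates) > 0.01`. Counting the `True` entries of the resulting boolean array gives the number of qualifying states. A sketch with made-up values for four hypothetical states:

```python
import numpy as np

# Made-up values for four hypothetical states (not census data)
net_migration = np.array([15_000, -3_000, 500, -40_000])
start_population = np.array([1_000_000, 2_000_000, 800_000, 3_000_000])

# Rates: 0.015, -0.0015, 0.000625, about -0.0133
rates_example = net_migration / start_population

# abs() treats large outflows (negative rates) the same as large inflows
big_movers = np.abs(rates_example) > 0.01

# count_nonzero counts the True entries (big_movers.sum() gives the same number)
count_example = np.count_nonzero(big_movers)
```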


_Points:_ 2

In [None]:
migration_rates = ...
movers = ...
movers

In [None]:
grader.check("task_08")

### Task 09 📍

Assign `west_births` to the total number of births that occurred in region 4 (the Western US). 

**Hint:** Double-check the type of the values in the `REGION` column, and filter using a value of that same type (the types must match!).
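In Python, an integer and a string that look alike are not equal, so comparing an integer column to `"4"` matches nothing. The cleaning cell above converts `REGION` to integers (mapping `"X"` to `0`); the sketch below repeats that conversion on made-up labels and then sums births for region 4 with a boolean mask. All names and values here are hypothetical.

```python
import numpy as np

int_equals_str = (4 == "4")  # False: an int never equals a str

# Made-up raw region labels, stored as strings like the raw CSV column
raw_regions = ["1", "4", "X", "4", "3"]
births_example = np.array([100, 250, 30, 175, 90])  # hypothetical birth counts

# Same conversion as the cleaning cell: "X" becomes 0, everything else an int
regions = np.array([int(r) if r != "X" else 0 for r in raw_regions])

# Compare int to int; the mask keeps only rows in region 4
in_region_4 = regions == 4
region_4_births = births_example[in_region_4].sum()  # 250 + 175 = 425
```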


_Points:_ 2

In [None]:
west_births = ...
west_births

In [None]:
grader.check("task_09")

### Task 10 📍

Assign `less_than_west_births` to the number of states that had a total population in 2016 that was smaller than the *total number of births in region 4 (the Western US)* during this time interval.
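This task has two steps: compute a single threshold number, then count how many entries of an array fall strictly below it. A sketch with made-up populations; the threshold here is arbitrary and is not the region 4 birth total.

```python
import numpy as np

# Made-up 2016 populations for five hypothetical states
populations = np.array([580_000, 39_000_000, 700_000, 1_300_000, 620_000])
threshold = 1_000_000  # arbitrary illustrative threshold

# "Smaller than" means strictly less, so use < rather than <=
smaller_count = np.count_nonzero(populations < threshold)  # 3 states
```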


_Points:_ 2

In [None]:
less_than_west_births = ...
less_than_west_births

In [None]:
grader.check("task_10")

## Submit your Homework to Canvas

Once you have finished working on the homework tasks, prepare to submit your work in Canvas by completing the following steps.

1. In the related Canvas Assignment page, check the rubric to know how you will be scored for this assignment.
2. Double-check that you have run the code cell near the end of the notebook that contains the command `grader.check_all()`. This command runs all of the tests on your responses to the auto-graded tasks marked with 📍.
3. Double-check your responses to the manually graded tasks marked with 📍🔎.
4. Select the menu item "File" > "Save Notebook" in the notebook's toolbar to save your work and create a checkpoint in the notebook's work history.
5. Select the menu items "File" > "Download" in the notebook's toolbar to download the notebook (.ipynb) file.
6. In the related Canvas Assignment page, click Start Assignment or New Attempt to upload the downloaded .ipynb file.

**Keep in mind that the autograder does not always check for correctness. Sometimes it just checks for the format of your answer, so passing the autograder for a question does not mean you got the answer correct for that question.**

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()