## Colleges and Universities in the Bay Area

In [51]:
# Run this cell to set up the notebook, but please don't change it.

# These lines import the NumPy and datascience modules.
import numpy as np
from datascience import *

# These lines load the tests.
from client.api.assignment import load_assignment 
tests = load_assignment('bay_area_schools.ok')

The US National Center for Education Statistics compiles information about US colleges and universities in the Integrated Postsecondary Education Data System (IPEDS).  Here's a [spreadsheet](http://nces.ed.gov/ipeds/tablefiles/tableDocs/IPEDS201314Tablesdoc.xlsx) describing the tables in IPEDS.  The full datasets are available [here](http://nces.ed.gov/ipeds/datacenter/DataFiles.aspx).

In this assignment, we'll answer the question: In 2013, how many full-time students attended colleges and universities in the various counties of the San Francisco Bay Area?  (For brevity, we'll call them "schools," but keep in mind that this term covers only post-secondary schools like colleges and universities, not other kinds of schools.)

The data we need are spread across two IPEDS tables, so we'll have to use `join` to bring them together.

First, run the cell below to load the IPEDS data.  (We've pared down the data to just a few columns for this exercise, but the original datasets are quite rich.)

In [52]:
efia = Table.read_table("efia.csv")
hd = Table.read_table("hd.csv")

The table `hd` contains information about the location of each school.

**Question 1.** The Census Bureau identifies the San Francisco Bay Area as Combined Statistical Area (CSA) number 488.  Create a table called `sfba_schools` that's a copy of `hd`, but containing only schools in the San Francisco Bay Area.

In [53]:
sfba_schools = ...
sfba_schools

In [54]:
_ = tests.grade('q1')

**Question 2.** Which SFBA cities have at least one school in them, and how many schools does each have?  Make a table called `sfba_cities` with a column called "City" and a column called "Number of schools", with a row for each SFBA city.

In [55]:
sfba_cities = ...
sfba_cities.sort('Number of schools', descending=True)

In [56]:
_ = tests.grade('q2')

This doesn't tell us how many *students* go to school in each city, though.  For that, we need to know how many students attend each school.  The `efia` table has that information.  Both the `efia` and `sfba_schools` tables identify schools by their IDs in the column "Institution ID".
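
To make the idea of matching rows by a shared ID concrete, here is a minimal sketch of an inner join in plain Python rather than the `datascience` API.  The rows are invented toy data, not real IPEDS records, and the "Students" column and the `inner_join` helper are hypothetical names used only for illustration.

```python
# Toy rows, invented for illustration; these are not real IPEDS records.
schools = [
    {"Institution ID": 1, "City": "Oakland"},
    {"Institution ID": 2, "City": "Berkeley"},
    {"Institution ID": 3, "City": "San Jose"},
]
attendance = [
    {"Institution ID": 2, "Students": 500},  # "Students" is a hypothetical column
    {"Institution ID": 3, "Students": 800},
    {"Institution ID": 4, "Students": 300},
]

def inner_join(left, right, key):
    """Combine rows from left and right whose values for key match."""
    # Index the right-hand rows by their key value for quick lookup.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Rows with no partner on the other side are dropped.
        for match in right_by_key.get(row[key], []):
            joined.append({**row, **match})
    return joined

joined = inner_join(schools, attendance, "Institution ID")
for row in joined:
    print(row)
```

Only IDs 2 and 3 appear in both toy lists, so only those two rows survive, each carrying the columns from both sides.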

**Question 3.** Create a table called `schools_with_attendance` that has a row for each school that's in both `sfba_schools` and `efia`.  It should have all the columns that are in either of those tables.

In [57]:
schools_with_attendance = ...
schools_with_attendance

In [58]:
_ = tests.grade('q3')

**Question 4.** Now compute the number of full-time undergraduate students in each SFBA city.  Create a table called `students_by_city` with 1 row per city, and columns for the city's name ("City") and the number of students in that city ("Number of full-time undergraduates").
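
As a rough illustration of what "one row per city, with a total" means, the sketch below sums a numeric column within each group using plain Python.  The rows, the "Students" column, and the `sum_by_group` helper are all hypothetical; in the notebook you would express the same idea with the `datascience` table methods.

```python
from collections import defaultdict

# Toy rows, invented for illustration; these are not real IPEDS records.
rows = [
    {"City": "Oakland", "Students": 500},
    {"City": "Berkeley", "Students": 800},
    {"City": "Oakland", "Students": 300},
]

def sum_by_group(rows, group_key, value_key):
    """Add up value_key across all rows that share the same group_key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

totals = sum_by_group(rows, "City", "Students")
print(totals)
```

The two toy Oakland rows collapse into a single entry whose value is their combined total.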

In [59]:
students_by_city = ...
students_by_city.sort(1, descending=True)

In [60]:
_ = tests.grade('q4')

In [62]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [tests.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]

In [63]:
# Run this cell to submit your work *after* you have passed all of the test cells.
# It's ok to run this cell multiple times. Only your final submission will be scored.

!TZ=America/Los_Angeles ipython nbconvert --output=".bay_area_schools_$(date +%m%d_%H%M)_submission.html" bay_area_schools.ipynb