## Selecting and Ingesting Tables

My interest is in measuring the impact of two waves of immigration on my hometown:

1. Immigration from Iran that began during the Iranian Revolution in 1978
2. A subsequent wave of immigration from Asia

Now that I have a Census geography that approximates my hometown, I can start pulling Census data for it. The key data structure in Census data is the "table". I know of two places to browse tables:

1. [data.census.gov](http://data.census.gov). This is the official website for Census data. It has a search box where you can type a query and get a list of relevant tables. I began my search here.
2. [CensusReporter.org](http://censusreporter.org). Surprisingly, I found this website more helpful than data.census.gov. For example, [this page](https://censusreporter.org/topics/citizenship/) lists all tables related to citizenship. The context it provides beyond the raw data was especially useful.

Based on this research, I expect the following tables to be useful for this project:

 * [B05012: Nativity in the United States](https://censusreporter.org/data/table/?table=B05012&geo_ids=97000US3612510&primary_geo_id=97000US3612510)
 * [B05006: Place of Birth for the Foreign-born Population](https://censusreporter.org/data/table/?table=B05006&geo_ids=97000US3612510&primary_geo_id=97000US3612510)
 * [B02001: Race](https://censusreporter.org/data/table/?table=B02001&geo_ids=97000US3612510&primary_geo_id=97000US3612510)
 * [B04006: People Reporting Ancestry](https://censusreporter.org/data/table/?table=B04006&geo_ids=97000US3612510&primary_geo_id=97000US3612510) 
 * [B16001: Language Spoken at Home by Ability to Speak English](https://censusreporter.org/tables/B16001/)


### Downloading Tables

You can download any of these tables with the `censusdis` package using the following template. Note that `censusdis` uses the word "group" instead of "table". (The geographic parameters used here were covered in the notebook `01-geographic-choice.ipynb`.)

In [5]:
import censusdis.data as ced

from censusdis.datasets import ACS5
from censusdis.states import NY

df = ced.download(
    dataset=ACS5,
    vintage=2020,
    group="B05012",
    state=NY,
    school_district_unified="12510",
)
df

| | STATE | SCHOOL_DISTRICT_UNIFIED | B05012_001E | B05012_002E | B05012_003E | GEO_ID | NAME |
|---|---|---|---|---|---|---|---|
| 0 | 36 | 12510 | 46046 | 31638 | 14408 | 9700000US3612510 | Great Neck Union Free School District, New York |
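The template above fetches one group at a time. As a rough sketch of how the same call could be repeated for every table in the list, the helper below loops over the group IDs and collects the results in a dictionary. `download_groups` and its `fetch` parameter are hypothetical names introduced here for illustration; they are not part of `censusdis`. The sketch also does not check whether every group is actually published for this geography and vintage.

```python
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")

# Table (group) IDs from the list earlier in this section.
GROUPS = ["B05012", "B05006", "B02001", "B04006", "B16001"]


def download_groups(groups: Iterable[str], fetch: Callable[[str], T]) -> Dict[str, T]:
    """Call fetch(group) once per group and collect the results by group ID.

    `fetch` is a hypothetical callable supplied by the caller. In the notebook
    it would wrap ced.download with the fixed dataset, vintage, and geography.
    """
    return {group: fetch(group) for group in groups}


# Intended use in the notebook (commented out because it needs censusdis
# and network access):
# tables = download_groups(
#     GROUPS,
#     lambda group: ced.download(
#         dataset=ACS5,
#         vintage=2020,
#         group=group,
#         state=NY,
#         school_district_unified="12510",
#     ),
# )
```

Passing the download call in as a function keeps the loop independent of the geography, so the same helper could be reused with a different `state` or school district.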


### Working with Column Names

The columns that contain data are named `<table>_<integer>E`, where the "E" stands for "Estimate". You can look up the label for each of these variables with the function `ced.variables.group_tree`:

In [3]:
ced.variables.group_tree(ACS5, 2020, "B05012")["Estimate"]

+ Total: (B05012_001E)
    + Native (B05012_002E)
    + Foreign-Born (B05012_003E)
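To make the `<table>_<integer>E` convention concrete, here is a minimal sketch that separates estimate columns from the geographic and descriptive columns using a regular expression. The pattern assumes table IDs consist only of uppercase letters and digits, which holds for the tables used here but is an assumption rather than something taken from the Census documentation. `parse_estimate_column` is a hypothetical helper, not a `censusdis` function.

```python
import re
from typing import Optional, Tuple

# Matches <table>_<integer>E, e.g. "B05012_001E".
# Assumption: table IDs contain only uppercase letters and digits.
ESTIMATE_COLUMN = re.compile(r"^(?P<table>[A-Z0-9]+)_(?P<index>\d+)E$")


def parse_estimate_column(name: str) -> Optional[Tuple[str, int]]:
    """Return (table, index) for an estimate column, or None for any other column."""
    match = ESTIMATE_COLUMN.match(name)
    if match is None:
        return None
    return match.group("table"), int(match.group("index"))


# Column names from the DataFrame downloaded above.
columns = [
    "STATE",
    "SCHOOL_DISTRICT_UNIFIED",
    "B05012_001E",
    "B05012_002E",
    "B05012_003E",
    "GEO_ID",
    "NAME",
]

# Keep only the columns that follow the estimate naming convention.
estimate_columns = [c for c in columns if parse_estimate_column(c) is not None]
```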

Note that `censusdis` does not natively support renaming columns from variable names to labels. However, [this notebook](https://github.com/censusdis/censusdis/blob/main/notebooks/Column%20Labels.ipynb) contains code that does. I copied that code into `utils.py` in this repo. Here is an example of using it:

In [4]:
from utils import name_mapper

# Note that both a `group` and `vintage` are required here, as the labels can change over time
df.rename(columns=name_mapper(group="B05012", vintage=2020))

| | STATE | SCHOOL_DISTRICT_UNIFIED | Total | Native | Foreign-Born | GEO_ID | NAME |
|---|---|---|---|---|---|---|---|
| 0 | 36 | 12510 | 46046 | 31638 | 14408 | 9700000US3612510 | Great Neck Union Free School District, New York |
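The contents of `utils.py` are not reproduced here, so the following is only an illustration of the general idea behind such a mapper, not the code from that notebook. It hard-codes the three variable labels shown in the `group_tree` output for B05012 and returns a function that leaves every other column name unchanged. `build_name_mapper` and `B05012_LABELS` are hypothetical names.

```python
from typing import Callable, Dict

# Variable -> label pairs copied by hand from the group_tree output for
# B05012 (vintage 2020). A real mapper would look these up per group and vintage.
B05012_LABELS: Dict[str, str] = {
    "B05012_001E": "Total",
    "B05012_002E": "Native",
    "B05012_003E": "Foreign-Born",
}


def build_name_mapper(labels: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that maps known variable names to labels.

    Column names that are not in `labels` (STATE, GEO_ID, ...) pass through unchanged.
    """

    def mapper(column: str) -> str:
        return labels.get(column, column)

    return mapper


mapper = build_name_mapper(B05012_LABELS)

columns = [
    "STATE",
    "SCHOOL_DISTRICT_UNIFIED",
    "B05012_001E",
    "B05012_002E",
    "B05012_003E",
    "GEO_ID",
    "NAME",
]
renamed = [mapper(c) for c in columns]
```

Because pandas `DataFrame.rename(columns=...)` accepts either a dictionary or a function, a callable like `mapper` could be passed to it directly.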
