# Language spoken at home data downloader

This notebook downloads data on language spoken at home at the census tract level from the Census Bureau's 2017 five-year American Community Survey, then joins it to tract boundaries for Los Angeles County. It draws from table C16001, the so-called "short form," which is less detailed than other language tables but has the benefit of being available for smaller geographies.

## Import Python tools

In [34]:
import os
import pandas as pd
import geopandas as gpd
import census_map_downloader
from census_data_downloader.tables import LanguageShortFormDownloader

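In [2]:
# `!export` runs in a throwaway subshell, so the variable never reaches this kernel.
# The %env magic sets it for the running notebook instead. Substitute your own key
# and keep real keys out of notebooks you share.
%env CENSUS_API_KEY=YOUR_CENSUS_API_KEY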

Read a Census API key from the local environment. This assumes you've visited [the Census site](https://api.census.gov/data/key_signup.html) and signed up for a key.

In [3]:
CENSUS_API_KEY = os.getenv("CENSUS_API_KEY")
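`os.getenv` quietly returns `None` when the variable is missing, and the failure then surfaces later as a confusing API error. A minimal sketch of a stricter lookup follows; the helper name `get_census_api_key` is hypothetical and not part of either census library.

```python
import os


def get_census_api_key(env_var: str = "CENSUS_API_KEY") -> str:
    """Return the Census API key from the environment, or fail with a clear message."""
    key = os.getenv(env_var, "").strip()
    if not key:
        # Fail early instead of letting the downloader send unauthenticated requests.
        raise RuntimeError(
            f"{env_var} is not set. Request a key at "
            "https://api.census.gov/data/key_signup.html and export it "
            "before starting Jupyter."
        )
    return key
```

In the notebook this would replace the bare lookup: `CENSUS_API_KEY = get_census_api_key()`.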

Download tract-level language data

In [10]:
downloader = LanguageShortFormDownloader(CENSUS_API_KEY, data_dir='./')

In [23]:
downloader.download_tracts()
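The downloader handles the HTTP requests itself, and its internals may differ from what follows. As an illustration of the kind of Census Data API request involved, this sketch builds a URL that asks for every variable in one ACS table (`group(...)`) for all tracts in a single county. The function name `acs5_group_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def acs5_group_url(year, table, state, county, api_key):
    """Sketch of a Census Data API request: one ACS 5-year table, every tract in one county."""
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    params = {
        "get": f"group({table})",                # every variable in the table
        "for": "tract:*",                        # all tracts...
        "in": f"state:{state} county:{county}",  # ...inside this state and county
        "key": api_key,
    }
    # quote (rather than quote_plus) encodes the space as %20; keep : * ( ) readable.
    return f"{base}?{urlencode(params, safe=':*()', quote_via=quote)}"
```

For example, `acs5_group_url(2017, "C16001", "06", "037", CENSUS_API_KEY)` targets the same table, year and county this notebook ends up with, though `download_tracts()` is not limited to one county.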

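Trim the tract-level census data down to Los Angeles County. Tract GEOIDs begin with the two-digit state FIPS code (06 for California) followed by the three-digit county code (037 for Los Angeles), so every LA County tract starts with "06037".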

In [26]:
df = pd.read_csv('processed/acs5_2017_languageshortform_tracts.csv', dtype={"geoid": str})

In [33]:
la_data = df[df.geoid.str.startswith("06037")]
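The prefix filter relies on the GEOID layout, and on reading `geoid` as a string: parsed as a number, California's leading zero disappears and `startswith("06037")` matches nothing. A small sketch of that layout, assuming the `geoid` column holds standard 11-digit tract GEOIDs (state 2 + county 3 + tract 6); both helper names and the example GEOIDs are hypothetical.

```python
def split_tract_geoid(geoid: str) -> dict:
    """Split an 11-digit census tract GEOID into its FIPS components."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return {
        "state": geoid[:2],    # e.g. "06" = California
        "county": geoid[2:5],  # e.g. "037" = Los Angeles County
        "tract": geoid[5:],    # six-digit tract code
    }


def in_county(geoid: str, state: str = "06", county: str = "037") -> bool:
    """Equivalent to geoid.startswith(state + county), written out by component."""
    parts = split_tract_geoid(geoid)
    return parts["state"] == state and parts["county"] == county


# Why dtype={"geoid": str} matters: a numeric parse drops the leading zero.
lost_zero = str(int("06037101110"))  # "6037101110", only 10 digits
```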

Download tract shapefiles

In [27]:
census_map_downloader.TractsDownloader2010(data_dir="./").run()

Read in California tracts

In [49]:
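gdf = gpd.read_file("processed/tracts_2010_ca.geojson")
# read_file has no dtype option, so cast the join key explicitly.
# zfill(11) restores a leading zero in case the GEOIDs were stored as numbers.
gdf["geoid"] = gdf["geoid"].astype(str).str.zfill(11)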

Trim tract shapefiles down to LA County

In [50]:
la_shapes = gdf[gdf.county_fips == '037']

Verify the data and shapes have the same number of rows

In [51]:
assert len(la_shapes) == len(la_data)
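Equal row counts do not guarantee that the two tables describe the same tracts: one tract missing on each side, or a duplicated GEOID, still passes the check above. A stricter sketch compares the GEOID sets directly; `compare_geoids` is a hypothetical helper that accepts any iterable, including a pandas Series.

```python
from collections import Counter


def compare_geoids(data_geoids, shape_geoids) -> dict:
    """Report GEOIDs that would fail to join or would join more than once."""
    data_list, shape_list = list(data_geoids), list(shape_geoids)
    data_set, shape_set = set(data_list), set(shape_list)
    return {
        "only_in_data": sorted(data_set - shape_set),
        "only_in_shapes": sorted(shape_set - data_set),
        "dup_in_data": sorted(g for g, n in Counter(data_list).items() if n > 1),
        "dup_in_shapes": sorted(g for g, n in Counter(shape_list).items() if n > 1),
    }
```

In the notebook: `report = compare_geoids(la_data.geoid, la_shapes.geoid)` followed by `assert not any(report.values()), report`, which prints exactly which tracts disagree when the check fails.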

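Join the shapes and the data. Calling `merge` on the GeoDataFrame (shapes on the left) returns a GeoDataFrame with a geometry column that can be written to GeoJSON; calling it on the plain DataFrame would return an ordinary pandas DataFrame.

In [52]:
merged_df = la_shapes.merge(la_data, on="geoid")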
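pandas can also enforce the join's assumptions at merge time. This sketch uses small made-up DataFrames as stand-ins for `la_shapes` and `la_data` (the `total` column and the GEOID values are hypothetical); `validate="one_to_one"` raises if either side repeats a key, and `indicator=True` adds a `_merge` column recording whether each row matched.

```python
import pandas as pd

# Hypothetical stand-ins for la_shapes and la_data.
shapes = pd.DataFrame({"geoid": ["06037101110", "06037101122"], "county_fips": ["037", "037"]})
data = pd.DataFrame({"geoid": ["06037101110", "06037101122"], "total": [100, 250]})

merged = shapes.merge(
    data,
    on="geoid",
    how="left",             # keep every shape, even one without a data row
    validate="one_to_one",  # raise pandas.errors.MergeError on duplicate geoids
    indicator=True,         # add _merge: "both", "left_only" or "right_only"
)
unmatched = merged[merged["_merge"] != "both"]
```

With geopandas the same keyword arguments work on `la_shapes.merge(...)`; drop the `_merge` column before writing the result to a file.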

Write out a GeoJSON file
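With geopandas, a GeoDataFrame is written with `to_file(path, driver="GeoJSON")`, and since the GeoJSON spec (RFC 7946) expects longitude/latitude coordinates, reprojecting with `to_crs(epsg=4326)` first may be needed. To show what ends up in such a file, here is a standard-library sketch that wraps rows holding GeoJSON geometry dicts into a FeatureCollection; the function names, file path and example polygon are hypothetical.

```python
import json


def rows_to_feature_collection(rows, geometry_key="geometry"):
    """Wrap dict rows, each holding a GeoJSON geometry dict, into a FeatureCollection."""
    features = []
    for row in rows:
        # Every non-geometry field becomes a property of the feature.
        props = {k: v for k, v in row.items() if k != geometry_key}
        features.append({"type": "Feature", "geometry": row[geometry_key], "properties": props})
    return {"type": "FeatureCollection", "features": features}


def write_geojson(rows, path):
    """Serialize the FeatureCollection to a UTF-8 .geojson file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows_to_feature_collection(rows), f)


# A made-up tract with a closed polygon ring (first point repeated at the end).
example_rows = [{
    "geoid": "06037101110",
    "total": 100,
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-118.3, 34.0], [-118.2, 34.0], [-118.2, 34.1], [-118.3, 34.0]]],
    },
}]
```

In the notebook itself, the one-liner `merged_df.to_file("la_language_tracts.geojson", driver="GeoJSON")` (file name hypothetical) does the same job once `merged_df` is a GeoDataFrame.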