# Census Demographic Features

As **input**, this notebook requires you to provide the path to the census files.

As **output**, this notebook requires you to define the path where the output DataFrame should be saved.

The most recent decennial Census was in 2020, so the files collected for the original Markup study can be reused here. Any additional files will be scraped.

This notebook also takes **input** from a GeoJSON file\* containing at least the following hierarchy:


{...

For each address:
- `state`
- `county`

Within the ACS dictionary:
- `geography = 'block_group'`
- Each key:
  - `column`

...}


## ACS Tables
Here you set the tables and columns to fetch for each state/county pair.

In [2]:
# ACS Dictionary - taken right from the original Markup Study 
acs_tables = [
    {
        "display_name": "race_ethnicity",
        "table_name": "B03002",
        "url": "https://api.census.gov/data/2018/acs/acs5/groups/B03002.html",
        "columns": {
            "block group": "block_group",
            "B03002_001E": "race_total_estimate",
            "B03002_003E": "race_white_alone",
            "B03002_004E": "race_black_alone",
            "B03002_005E": "race_aindian_alone",
            "B03002_006E": "race_asian_alone",
            "B03002_007E": "race_pacific_islander_native_hawaiian_alone",
            "B03002_008E": "race_some_other_alone",
            "B03002_009E": "race_two_or_more_alone",
            "B03002_012E": "race_latino_alone",
        },
    },
    {
        "display_name": "household_income_2",
        "table_name": "B19013",
        "url": "https://api.census.gov/data/2018/acs/acs5/groups/B19013.html",
        "columns": {
            "block group": "block_group",
            "B19013_001E": "median_household_income",
        },
    },
    {
        "display_name": "internet_subscription",
        "table_name": "B28002",
        "url": "https://api.census.gov/data/2018/acs/acs5/groups/B28002.html",
        "columns": {
            "block group": "block_group",
            "B28002_001E": "internet_total_estimate",
            "B28002_002E": "internet_subscriptions_any",
            "B28002_004E": "internet_broadband",
            "B28002_005E": "internet_mobile",
            "B28002_006E": "internet_mobile_only",
            "B28002_013E": "internet_no_access",
        },
    },
]
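
To make the dictionary's role concrete, here is a minimal sketch of how the `columns` mappings could be flattened into a list of variable codes to request plus a rename map for the results. The helper name `flatten_columns` is hypothetical and not part of the notebook, and the `acs_tables` literal is a trimmed copy used only so the example runs on its own.

```python
# Trimmed, illustrative copy of the acs_tables structure defined above.
acs_tables = [
    {
        "table_name": "B19013",
        "columns": {
            "block group": "block_group",
            "B19013_001E": "median_household_income",
        },
    },
    {
        "table_name": "B28002",
        "columns": {
            "block group": "block_group",
            "B28002_001E": "internet_total_estimate",
            "B28002_013E": "internet_no_access",
        },
    },
]


def flatten_columns(tables):
    """Hypothetical helper: return (request_columns, rename_map).

    request_columns: ACS variable codes to request, in definition order.
    rename_map: every raw name (including geography keys) -> readable name.
    """
    request_columns = []
    rename_map = {}
    for table in tables:
        for raw_name, readable_name in table["columns"].items():
            rename_map[raw_name] = readable_name
            # Variable codes start with their table ID (e.g. "B19013_001E");
            # keys such as "block group" are geography fields, not variables.
            if raw_name.startswith(table["table_name"] + "_"):
                request_columns.append(raw_name)
    return request_columns, rename_map


request_columns, rename_map = flatten_columns(acs_tables)
print(request_columns)
```

Keeping the geography key in the rename map but out of the request list reflects the assumption that `block group` is returned as a geography field rather than requested as a variable.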

## Collect State, County Pairs
From the relevant address files, collect all the state/county pairs we want to cover.

In [3]:
import glob
import geopandas
import os
import pandas as pd
import requests

In [15]:
# Set Variables
address_directory = glob.glob(
    "/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/*.geojson"
)
year = 2019
api_key = ""

# Create the list of api calls
#   For each: (state, county), (geography, table_name, column name)
calls = []

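for f in address_directory:
    print(f)
    # Load the address file; extracting its state/county pairs into `calls` is still to be done
    gdf = geopandas.read_file(f)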

/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/philadelphia-addresses-county.geojson
/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/city_of_boston-addresses-city.geojson
/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/city_of_new_york-addresses-city.geojson
/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/me_statewide-addresses-state.geojson
/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/wv_statewide-addresses-state.geojson
/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/city_of_nashua-addresses-city.geojson
/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/city_of_dover-addresses-city.geojson
/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/city_of_baltimore-addresses-city.geojson
/Users/abbycarr/Places_Things/investigate_NE_isp/data/open_address/original/city_of_bu
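
The `calls` list described in the cell above pairs each (state, county) with each (geography, table name, column name). A minimal sketch of that step follows, using only the standard library. It assumes each address is a GeoJSON feature carrying `state` and `county` under `properties`; the real files may name or nest these fields differently. The helpers `collect_pairs` and `build_calls` and the sample features are hypothetical.

```python
from itertools import product

# Hypothetical GeoJSON-like features with sample values; the real files
# may name or nest these properties differently.
features = [
    {"type": "Feature", "properties": {"state": "42", "county": "101"}},
    {"type": "Feature", "properties": {"state": "42", "county": "101"}},
    {"type": "Feature", "properties": {"state": "25", "county": "025"}},
    {"type": "Feature", "properties": {"state": "25", "county": None}},
]


def collect_pairs(features):
    """Return the sorted, de-duplicated (state, county) pairs found in features."""
    pairs = set()
    for feature in features:
        props = feature.get("properties") or {}
        state, county = props.get("state"), props.get("county")
        if state is None or county is None:
            continue  # skip addresses missing either identifier
        pairs.add((str(state), str(county)))
    return sorted(pairs)


def build_calls(pairs, columns, geography="block group"):
    """Cross every (state, county) pair with every (geography, table, column)."""
    # The table name is the part of the variable code before the underscore.
    specs = [(geography, column.split("_")[0], column) for column in columns]
    return list(product(pairs, specs))


pairs = collect_pairs(features)
calls = build_calls(pairs, ["B03002_001E", "B19013_001E"])
for call in calls:
    print(call)
```

Collecting the pairs into a set first keeps the number of API requests proportional to the number of counties rather than the number of addresses.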

In [None]:
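from typing import Optional, Union


def download_acs(
    column: str = "B03002_001E",
    geography: str = "block group",
    year: int = 2019,
    state: Union[int, str] = 55,
    county: Union[int, str] = 1,
    data_dir: str = "data/",
    api_key: Optional[str] = None,
    debug: bool = False,
) -> int:
    """Download one ACS 5-year column for a state/county and cache it as a gzipped CSV.

    Returns 1 if the file already exists or was downloaded, 0 if the request failed.
    """

    def make_acs_request(year, column, geography, state, county, api_key, debug=False):
        # FIPS codes are zero-padded: 2 digits for states, 3 for counties
        state = str(state).zfill(2)
        county = str(county).zfill(3)
        url = (
            f"https://api.census.gov/data/{year}/acs/acs5?get={column}&"
            f"for={geography}:*&in=state:{state}%20county:{county}"
        )
        # only send a key when one was provided (avoids sending "key=None")
        if api_key:
            url += f"&key={api_key}"
        if debug:
            print(url)
        return requests.get(url)

    # skip the download if the file is already cached
    table = column.split("_")[0]
    geography_ = geography.replace(" ", "_")
    fn_out = os.path.join(
        data_dir, geography_, str(year), str(state), str(county), table,
        f"{column}.csv.gz",
    )
    if os.path.exists(fn_out):
        return 1
    os.makedirs(os.path.dirname(fn_out), exist_ok=True)
    print(f"collecting {fn_out}")
    # make the request
    resp = make_acs_request(year, column, geography, state, county, api_key, debug)

    # validate the response and save it
    if resp.status_code != 200:
        # an empty placeholder is written, so this column is not retried on later runs
        pd.DataFrame([]).to_csv(fn_out, index=False, compression="gzip")
        return 0
    # the first row of the JSON payload holds the column names
    _data = resp.json()
    _df = pd.DataFrame(_data[1:], columns=_data[0])
    _df.to_csv(fn_out, index=False, compression="gzip")
    return 1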
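
The function above turns `resp.json()` into a DataFrame by treating the first row as the header. The sketch below illustrates that payload shape without pandas or a network call. The payload values are made up, and the helpers `payload_to_records` and `block_group_geoid` are hypothetical additions, not part of the notebook. It assumes a block-group request returns `state`, `county`, `tract`, and `block group` alongside the requested variable.

```python
# Made-up payload in the list-of-rows shape that download_acs parses:
# the first row holds the column names. All values are placeholders.
payload = [
    ["B19013_001E", "state", "county", "tract", "block group"],
    ["52000", "55", "001", "950100", "1"],
    ["61000", "55", "001", "950100", "2"],
]


def payload_to_records(payload):
    """Split the header from the data rows and return one dict per row."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def block_group_geoid(record):
    """Concatenate the FIPS parts into a 12-digit block group GEOID."""
    return (
        record["state"]
        + record["county"]
        + record["tract"]
        + record["block group"]
    )


records = payload_to_records(payload)
geoids = [block_group_geoid(r) for r in records]
print(geoids)
```

The values in this sample are strings, so estimates such as `median_household_income` would need an explicit numeric conversion before any arithmetic.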