# Overview of Data Variables and Sources

In [1]:
# Loading libraries
import os
import requests
import pandas as pd

## US Census Variables

These variables were taken from the US Census API, specifically from the [American Community Survey](https://www.census.gov/data/developers/data-sets/acs-5year.html). Where possible, all variables are collected at the census tract level.

In [2]:
# List of variables taken from US CENSUS bureau
census_vars = {
    "total_pop": "B01001_001E",  # Total population
    "white_no_hispanic": "B01001H_001E",  # White Alone, not Hispanic or Latino
    "year_built": "B25035_001E",  # Median year structure was built
    "internet_access": "B28002_013E",  # Number of households with no Internet access
    "total_households": "B09019_003E",  # Total number of households
    "labor_force_rate": "S2301_C02_001E",  # Labor force participation rate for population 16 years or older
    # "pop_16_older": "S2301_C01_001E",  # Population 16 years or older
    # "less_than_hs": "S2301_C01_032E",  # Population 25 to 64 years that are less than high school graduate
    "less_than_hs": "B06009_002E",  # Population age 25+ with no HS diploma
    "insured": "S2701_C02_001E",  # Insured population
    "senior_living_alone": "S1101_C01_013E",  # Number of households - householder 65+ years old living alone
    "median_income": "B07011_001E",  # Median income for population in the last 12 months
    "occupied_housing_units": "S2501_C01_001E",  # Total number of occupied housing units
    "owner_occupied": "S2501_C03_001E",  # Number of owner-occupied housing units
    "renter_occupied": "S2501_C05_001E",  # Number of renter-occupied housing units
    "disabled": "B18140_002E",  # Population 16+ years old with any disability with earnings
    "pop_16_plus": "B18140_001E",  # Population 16+ with earnings
}

## Building US Census Dataset
The cells below show one way to build a tract-level dataset from the US Census API: one request per variable and state, with the results merged on `GEOID`.

In [3]:
# Function that queries US Census Bureau for a specific variable and state
def query_acs(var_name, var_code, state_fips, api_key):
    
    # General format for ACS data retrieval for all census tracts in a state
    ACS_BASE_URL = 'https://api.census.gov/data/2020/acs/acs5?get=NAME,'
    ACS_SUBJECT_URL = 'https://api.census.gov/data/2020/acs/acs5/subject?get=NAME,'
    ACS_TRACTS = '&for=tract:*&in=state:'
    ACS_KEY = '&key=' + api_key
    
    # Check if this is a detailed table variable or subject table variable
    base_url = ""
    if (var_code[0] == "S"):
        base_url = ACS_SUBJECT_URL
    elif(var_code[0] == "B"):
        base_url = ACS_BASE_URL
    else:
        print(f"Error Unknown type of variable {var_code}")
        return None
    
    # Builds request for US Census API
    req_url = base_url + var_code + ACS_TRACTS + state_fips + ACS_KEY
    
    # Receive and process response from US Census API
    resp = requests.get(req_url, timeout=60)
    if resp.status_code != 200:
        print(f"Error response code {resp.status_code} for request url\n{req_url}")
        return None
    
    dat = resp.json()
    df = pd.DataFrame(dat[1:], columns=dat[0])
    df["GEOID"] = df['state'].astype(str) + df['county'].astype(str) + df['tract'].astype(str)
    
    # Rename variable with a name instead of code
    df = df.rename(columns={var_code: var_name})
    
    return df

In [None]:
# Read in US Census API Key (the context manager closes the file even on error)
with open("api_keys/us_census_api_key.txt", "r") as f:
    census_api_key = f.read().strip()

# Directory structure
outdir = os.path.join("data", "acs")
states_outdir = os.path.join(outdir, "states")

# Create directories if needed
os.makedirs(outdir, exist_ok=True)
os.makedirs(states_outdir, exist_ok=True)

# Get list of State FIPS codes
states = pd.read_csv("data/support/state_fips.csv", dtype={"STATE": str})
states_fips = list(states["STATE"])

results = None

# Iterate through all states
for fip in states_fips:
    
    state_abbr = states[states["STATE"] == fip]["STUSAB"].values[0]
    print(f"Current State: {state_abbr}")
    
    state_results = None 
    # Iterate through all variables
    for var_name, var_code in census_vars.items():
        print(f"Current Variable: {var_name}")
        
        curr_df = query_acs(var_name, var_code, fip, census_api_key)
        
        # Collect all variables for a given state in the results df
        if not (curr_df is None):
            if not (state_results is None):
                state_results = pd.merge(state_results, curr_df[["GEOID", var_name]], on="GEOID")
            else:
                state_results = curr_df
                
    # Collect state results into overall result dataframe
    if not (state_results is None):
        
        # Write state results to file
        state_filename = f"{state_abbr}.csv"
        state_results.to_csv(os.path.join(states_outdir, state_filename), index=False)
        print(f"Wrote {state_abbr} results")
        
        if not (results is None):
            results = pd.concat([results, state_results])
        else:
            results = state_results
    
print("Done getting data")
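# Guard against the case where every request failed and no data was collected
if results is not None:
    results.to_csv(os.path.join(outdir, "acs_data.csv"), index=False)
    print("Done writing data")
else:
    print("No data retrieved; nothing written")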
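
The Census API returns every value as a string, and ACS estimates that could not be computed come back as large negative sentinel codes such as `-666666666`. A minimal sketch of a cleanup step follows, assuming that none of the variables listed above can legitimately be negative. The helper name `clean_acs_values` and the sample rows are hypothetical and purely illustrative; they are not real ACS estimates.

```python
import pandas as pd

# Illustrative rows only; the values are made up, not real ACS estimates.
sample = pd.DataFrame({
    "GEOID": ["01001020100", "01001020200"],
    "total_pop": ["1775", "2055"],
    "median_income": ["31250", "-666666666"],  # sentinel for "not available"
})


def clean_acs_values(df, value_cols):
    """Convert ACS columns to numbers and blank out negative sentinel codes."""
    out = df.copy()
    for col in value_cols:
        # Strings that are not numbers become NaN instead of raising
        out[col] = pd.to_numeric(out[col], errors="coerce")
        # Assumption: no variable used here can be legitimately negative
        out[col] = out[col].mask(out[col] < 0)
    return out


cleaned = clean_acs_values(sample, ["total_pop", "median_income"])
print(cleaned)
```

`GEOID` is left as a string on purpose so that leading zeros in state FIPS codes are preserved.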

## Climate Variables

### National Risk Index
Climate variables were taken from the [National Risk Index](https://hazards.fema.gov/nri/data-resources) created by FEMA. Specifically, we downloaded the census tract level tables for the US.

## Social Variables

### Social Vulnerability Index
Social vulnerability data comes from the [Social Vulnerability Index](https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html) created by the CDC. Download the 2018 data formatted as a CSV table at the census tract level.

### Eviction Data

Eviction data comes from the [Eviction Lab](https://data-downloads.evictionlab.org/#data-for-analysis/). Download the county-level proprietary data. The method used to calculate this data is described [here](https://evictionlab.org/docs/Eviction_Lab_Methodology_Report_2022.pdf). Note that this data does not offer full coverage of the US.
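
Because this data is county level while most other variables are tract level, one way to combine them is to derive the county FIPS code from the first five digits of the tract `GEOID` (two state digits plus three county digits) and left-join on it. The sketch below is only an illustration: the column names `county_fips` and `eviction_rate` and all values are placeholders, not the Eviction Lab schema.

```python
import pandas as pd

# Hypothetical tract table keyed by the 11-digit GEOID
tracts = pd.DataFrame({"GEOID": ["01001020100", "01001020200", "02013000100"]})

# Hypothetical county table; names and values are placeholders
county_evictions = pd.DataFrame({
    "county_fips": ["01001"],
    "eviction_rate": [1.5],
})

# Tract GEOID = 2-digit state + 3-digit county + 6-digit tract
tracts["county_fips"] = tracts["GEOID"].str[:5]

# A left join keeps every tract; tracts in uncovered counties get NaN
merged = tracts.merge(county_evictions, on="county_fips", how="left")
has_data = merged["eviction_rate"].notna()
print(f"Tracts with eviction data: {has_data.sum()} of {len(merged)}")
```

Counting the missing values after the join gives a quick check of how much of the country is not covered.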

## Energy Variables

### Energy Burden

Energy Burden was taken from the [DOE LEAD tool](https://www.energy.gov/eere/slsc/maps/lead-tool). Choose Federal Poverty Level (FPL) as the income model.

For each state:
1. Click on State
2. Click "View Census Tracts"
3. Select "None" under show borders (default is Tribal Areas)
4. Click the data download button at the bottom of the state map

All these files were then concatenated together to create a full US dataset.
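
The concatenation step can be sketched as follows, assuming the per-state downloads were saved as CSV files with a shared column layout in a single directory. The helper name `concat_state_files`, the `source_file` column, and the directory in the example call are hypothetical.

```python
import glob
import os

import pandas as pd


def concat_state_files(indir):
    """Stack every CSV in `indir` into one DataFrame, tagging rows by source file."""
    frames = []
    for path in sorted(glob.glob(os.path.join(indir, "*.csv"))):
        # Read everything as text so FIPS codes keep their leading zeros
        df = pd.read_csv(path, dtype=str)
        df["source_file"] = os.path.basename(path)
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"No CSV files found in {indir}")
    return pd.concat(frames, ignore_index=True)


# Example call (hypothetical directory):
# lead_us = concat_state_files(os.path.join("data", "lead", "states"))
```

Reading all columns as strings is a deliberate choice here; numeric columns can be converted afterwards with `pd.to_numeric`.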

### ACEEE

An overview of the ACEEE scorecard is provided [here](https://www.aceee.org/state-policy/scorecard). The full methodology for how gas and electric spending are calculated is found in this [report](https://www.aceee.org/research-report/u2201); the values were taken specifically from Tables 1 and 2.

### Community Power Scorecard

The [2022 version](https://cdn.ilsr.org/wp-content/uploads/2022/02/2022-Scorecard-Methodology-Full-Scores.pdf?_ga=2.53695528.1839514613.1661261912-1838658277.1661261912&_gl=1*1jo79qe*_ga*MTgzODY1ODI3Ny4xNjYxMjYxOTEy*_ga_M3134750WM*MTY2MTI2MTkxMS4xLjEuMTY2MTI2MTkxMS4wLjAuMA) provided by the Institute for Local Self-Reliance (ILSR) was used. Each state was assessed on 11 indicators, with a maximum of 44 points. The raw point score provided by ILSR for each state was divided by 44 to determine the percentage of points earned (out of 100%).
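
The conversion from raw points to a percentage can be sketched as below. The function name `percent_of_points` is hypothetical and the inputs are illustrative, not actual state scores.

```python
MAX_POINTS = 44  # maximum raw score across the 11 indicators


def percent_of_points(raw_score, max_points=MAX_POINTS):
    """Return the share of available points earned, as a percentage."""
    return raw_score / max_points * 100


# Illustrative inputs only
print(percent_of_points(22))  # 50.0
print(percent_of_points(44))  # 100.0
```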

### Reliability data

Reliability data was taken from [EIA Form 861](https://www.eia.gov/electricity/data/eia861/). Note that this data exists at the service territory level and needs to be transformed to census tract level data. The main variables to look at are CAIDI, SAIDI, and SAIFI.
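
The three indices are related: SAIDI is the average outage duration per customer served, SAIFI is the average number of interruptions per customer served, and CAIDI is the average duration per interruption, so CAIDI = SAIDI / SAIFI. The sketch below can serve as a consistency check on the downloaded values. The function name `caidi` is hypothetical, and the inputs are illustrative rather than taken from Form 861.

```python
def caidi(saidi_minutes, saifi):
    """Average restoration time per interruption (CAIDI = SAIDI / SAIFI)."""
    if saifi == 0:
        # No interruptions recorded, so an average duration is undefined
        return float("nan")
    return saidi_minutes / saifi


# Illustrative inputs: 120 outage minutes and 1.5 interruptions per customer
print(caidi(120.0, 1.5))  # 80.0
```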

### Median income of solar installers by tract

Data on the median income of rooftop solar installers comes from [Lawrence Berkeley National Lab](https://www.lbl.gov/). Note that this data was collected by contacting the lab directly.

### Residential rates as percentage of commercial and industrial rates

Data was pulled from [EIA Electric Power Monthly](https://www.eia.gov/electricity/monthly/epm_table_grapher.php?t=epmt_5_6_a). We used December 2021 data (although May 2022 data is now available). In Excel, we calculated the residential rate per kWh (in cents) as a percentage of the commercial rate per kWh (that is, residential divided by commercial) and as a percentage of the industrial rate per kWh. These two figures (residential/commercial and residential/industrial) were then averaged and reported as a single percentage: 100% means equal rates, <100% means residential rates are lower than the average of C&I rates, and >100% means residential rates are higher than the average of C&I rates.
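
The same calculation can be sketched in Python. The function name `residential_rate_ratio` is hypothetical and the rates in the example are made up for illustration; they are not EIA figures.

```python
def residential_rate_ratio(residential, commercial, industrial):
    """Average of residential/commercial and residential/industrial, in percent.

    All three rates must be in the same unit (for example cents per kWh).
    """
    res_vs_com = residential / commercial * 100
    res_vs_ind = residential / industrial * 100
    return (res_vs_com + res_vs_ind) / 2


# Made-up rates in cents per kWh
ratio = residential_rate_ratio(residential=12.0, commercial=8.0, industrial=6.0)
print(f"{ratio:.1f}%")  # 175.0%
```

Here the residential rate is 150% of the commercial rate and 200% of the industrial rate, so the reported figure is their average, 175%.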

### Home Energy Affordability Gap

The Home Energy Affordability Gap's site is [here](http://www.homeenergyaffordabilitygap.com/index.html). Note the data was collected by directly contacting the firm.