## Section 1. Introduction (Under Construction)

This notebook gathers state-level versus national-level information on four indicators: Gini Index, Educational Attainment, Poverty Rate, and Hate Crime Rate. The data come from the U.S. Census Bureau and FBI.gov.

**API pattern (per Census guide):**
1) Host + year + dataset: `https://api.census.gov/data/{year}/acs/{dataset}`
2) Variables: `?get=NAME,{variable},{variable}` (variable IDs such as `B19083_001E`, not table names)
3) Geography: `&for=state:*`
4) Key: `&key=YOUR_KEY`
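
The four parts above can be assembled into one request URL. The sketch below is illustrative only: `build_census_url` is a hypothetical helper name, `YOUR_KEY` is a placeholder, and no network call is made.

```python
from urllib.parse import urlencode

def build_census_url(year, dataset, variables, geography="state:*", key=None):
    """Assemble a Census API URL from host/year/dataset, variables, geography, and key."""
    base = f"https://api.census.gov/data/{year}/acs/{dataset}"
    params = {"get": ",".join(variables), "for": geography}
    if key:
        params["key"] = key
    # safe=",:*" keeps commas, colons, and asterisks readable instead of percent-encoding them
    return f"{base}?{urlencode(params, safe=',:*')}"

url = build_census_url(2022, "acs1", ["NAME", "B19083_001E"], key="YOUR_KEY")
print(url)
```

In the cells below, the same parts are passed to `requests` as a `params` dict, which handles the encoding.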


## Section 2. Setup
**Import Libraries, Verify API Key, env, Session Retries**
- Load the Census API key from .env (CENSUS_API_KEY).
- Create project data folders: data/raw and data/processed.
- Create a requests session with retries + exponential backoff (to handle timeouts / 429s).
- Choose dataset and year range (ACS 1-year).

In [None]:
# Standard library imports
import os
from pathlib import Path
import sys
import time # to pause between API calls

# Third party imports
from dotenv import load_dotenv
import pandas as pd
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# --- Setup ---
# Load API key from .env
load_dotenv()
KEY = os.getenv("CENSUS_API_KEY")
if not KEY:
    raise RuntimeError("Missing CENSUS_API_KEY. Add it to .env")
print("Key loaded:", bool(KEY))  # confirm the key was found without printing its value

# Path to data directory folders (raw and processed)
DATA_DIR = Path("data")
RAW_DIR = DATA_DIR / "raw"
PROCESSED_DIR = DATA_DIR / "processed"
DATA_DIR.mkdir(exist_ok=True)
RAW_DIR.mkdir(exist_ok=True)
PROCESSED_DIR.mkdir(exist_ok=True)

# Using ACS1 (1-year estimates) for state-level data
DATASET = "acs1"
YEARS = range(2008, 2024 + 1)
SLEEP_BETWEEN_CALLS = 0.35

# Session with retries + exponential backoff
session = requests.Session()
try:
    retries = Retry(
        total=5,
        backoff_factor=1.2,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"]
    )
except TypeError:
    # for older urllib3 versions
    retries = Retry(
        total=5,
        backoff_factor=1.2,
        status_forcelist=[429, 500, 502, 503, 504],
        method_whitelist=["GET"]
    )
session.mount("https://", HTTPAdapter(max_retries=retries))

# Wrapper around session.get that applies a default timeout;
# retries and backoff come from the adapter mounted on the session above
def safe_get(url, params, timeout=60):
    return session.get(url, params=params, timeout=timeout)

Key loaded: True
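
To get a feel for what `backoff_factor=1.2` with `total=5` means in practice, the sketch below estimates the wait before each retry. It is an approximation under stated assumptions, not urllib3's own code: it assumes the first retry is immediate, later retries sleep `backoff_factor * 2 ** (attempt - 1)` seconds, and sleeps are capped at 120 seconds. Exact behavior can differ between urllib3 versions, and a `Retry-After` header on a 429 response may override these values. `approx_backoff_delays` is a hypothetical name.

```python
def approx_backoff_delays(backoff_factor, total_retries, backoff_max=120.0):
    """Approximate sleep in seconds before each retry (assumed formula, see lead-in)."""
    delays = []
    for attempt in range(1, total_retries + 1):
        if attempt == 1:
            delays.append(0.0)  # assumption: the first retry happens without a delay
        else:
            delays.append(min(backoff_max, backoff_factor * 2 ** (attempt - 1)))
    return delays

delays = approx_backoff_delays(1.2, 5)
print(delays, "total:", sum(delays))
```

Under these assumptions, a request that fails five times in a row would spend about 36 seconds sleeping before the session gives up.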


## Section 3. Make API Calls to Fetch Data -> State & Year
- 3.1 Gini Index
- 3.2 Education Attainment
- 3.3 Poverty Rate
- 3.4 Hate Crime Incidents
  

----


### 3.1 Gini Index (Income Inequality)

**B19083** (Gini Index of Income Inequality)
- `B19083_001E` = estimate; `B19083_001M` = margin of error.
- One API call per year, returns all states.
- Saved raw data to `data/raw/raw_gini_state_year.csv`.
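
Each call returns a JSON array whose first element is the header row and whose remaining elements are one row per state. As a minimal sketch of how such a response could be tagged with its year, the example below uses a made-up sample (the values are illustrative, not real estimates) and a hypothetical helper, `census_rows_to_records`.

```python
def census_rows_to_records(data, year):
    """Convert a header-row-plus-data-rows response into dicts tagged with a year."""
    header, *rows = data
    return [dict(zip(header, row), year=year) for row in rows]

# Made-up sample in the response shape described above; not real estimates
sample = [
    ["NAME", "B19083_001E", "B19083_001M", "state"],
    ["Alabama", "0.4700", "0.0050", "01"],
    ["Alaska", "0.4200", "0.0100", "02"],
]
records = census_rows_to_records(sample, 2022)
print(records[0])
```

Note that the values arrive as strings, so numeric columns need converting before any arithmetic.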


In [196]:
print("Fetching Gini Index data...")
rows_gini, failed_years_gini, cols_gini = [], [], None

for y in YEARS:
    url = f"https://api.census.gov/data/{y}/acs/{DATASET}"
    params = {"get": "NAME,B19083_001E,B19083_001M", "for": "state:*"}
    if KEY: params["key"] = KEY
    try:
        r = safe_get(url, params)
        if r.status_code == 404:
            print(f"⚠️ {y} not available for {DATASET}; skipping.")
            continue
        r.raise_for_status()
        data = r.json()
        if cols_gini is None:
            cols_gini = data[0] + ["year"]
        for rec in data[1:]:
            rows_gini.append(rec + [y])
        print(f"✅ {y}: {len(data)-1} rows")
    except requests.RequestException as e:
        print(f"❌ {y}: {e}")
        failed_years_gini.append(y)
    finally:
        time.sleep(SLEEP_BETWEEN_CALLS)

gini_df = pd.DataFrame(rows_gini, columns=cols_gini) if rows_gini else pd.DataFrame()
if not gini_df.empty:
    gini_df.to_csv(RAW_DIR / "raw_gini_state_year.csv", index=False)
    print(f"Saved: {RAW_DIR / 'raw_gini_state_year.csv'}")

Fetching Gini Index data...
✅ 2008: 52 rows
✅ 2009: 52 rows
✅ 2010: 52 rows
✅ 2011: 52 rows
✅ 2012: 52 rows
✅ 2013: 52 rows
✅ 2014: 52 rows
✅ 2015: 52 rows
✅ 2016: 52 rows
✅ 2017: 52 rows
✅ 2018: 52 rows
✅ 2019: 52 rows
⚠️ 2020 not available for acs1; skipping.
✅ 2021: 52 rows
✅ 2022: 52 rows
✅ 2023: 52 rows
✅ 2024: 52 rows
Saved: data/raw/raw_gini_state_year.csv


### 3.2 Education Attainment (B15003)
- B15003 (Educational Attainment for the Population 25+)
- B15003_001E = total adults 25+
- B15003_022E…025E = Bachelor’s, Master’s, Professional, Doctorate
- % Bachelor’s+ = 100 × (022E+023E+024E+025E) / 001E
- Saved raw counts to `data/raw/raw_education_counts_state_year.csv`; `pct_bach_plus` is computed later.
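
The percentage formula above can be sketched for a single row of raw counts as follows. This is only an illustration of the arithmetic, not necessarily how the later processing step is written. The function name mirrors the `pct_bach_plus` column mentioned above, and the counts are made up.

```python
BACH_PLUS_VARS = [f"B15003_{i:03d}E" for i in range(22, 26)]  # 022E..025E

def pct_bach_plus(row):
    """Percent of adults 25+ with a bachelor's degree or higher, from one row of counts."""
    total = float(row["B15003_001E"])
    if total <= 0:
        return None  # avoid dividing by zero on an empty universe
    bach_plus = sum(float(row[var]) for var in BACH_PLUS_VARS)
    return 100 * bach_plus / total

# Made-up counts, stored as strings the way the API returns them
example = {
    "B15003_001E": "1000",
    "B15003_022E": "200",
    "B15003_023E": "80",
    "B15003_024E": "15",
    "B15003_025E": "5",
}
pct = pct_bach_plus(example)
print(pct)
```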

In [197]:
print("\nFetching Education Attainment data...")
rows_edu, failed_years_edu, cols_edu = [], [], None

bachelors_or_higher_vars = [f"B15003_{i:03d}E" for i in range(22, 26)]  # 22..25
get_vars = ["NAME", "B15003_001E"] + bachelors_or_higher_vars
get_vars_str = ",".join(get_vars)

for y in YEARS:
    url = f"https://api.census.gov/data/{y}/acs/{DATASET}"
    params = {"get": get_vars_str, "for": "state:*"}
    if KEY: params["key"] = KEY
    try:
        r = safe_get(url, params)
        if r.status_code == 404:
            print(f"⚠️ {y} not available for {DATASET}; skipping.")
            continue
        r.raise_for_status()
        data = r.json()
        if cols_edu is None:
            cols_edu = data[0] + ["year"]
        for rec in data[1:]:
            rows_edu.append(rec + [y])
        print(f"✅ {y}: {len(data)-1} rows")
    except requests.RequestException as e:
        print(f"❌ {y}: {e}")
        failed_years_edu.append(y)
    finally:
        time.sleep(SLEEP_BETWEEN_CALLS)

edu_df = pd.DataFrame(rows_edu, columns=cols_edu) if rows_edu else pd.DataFrame()
if not edu_df.empty:
    edu_df.to_csv(RAW_DIR / "raw_education_counts_state_year.csv", index=False)
    print(f"Saved: {RAW_DIR / 'raw_education_counts_state_year.csv'}")


Fetching Education Attainment data...
✅ 2008: 52 rows
✅ 2009: 52 rows
✅ 2010: 52 rows
✅ 2011: 52 rows
✅ 2012: 52 rows
✅ 2013: 52 rows
✅ 2014: 52 rows
✅ 2015: 52 rows
✅ 2016: 52 rows
✅ 2017: 52 rows
✅ 2018: 52 rows
✅ 2019: 52 rows
⚠️ 2020 not available for acs1; skipping.
✅ 2021: 52 rows
✅ 2022: 52 rows
✅ 2023: 52 rows
✅ 2024: 52 rows
Saved: data/raw/raw_education_counts_state_year.csv


### 3.3 Poverty Rate (B17001 -> computed percent)
- Using B17001 detailed counts (available in ACS 1-year from 2008 onward, including 2008–2009):
- B17001_001E = universe (population for whom poverty status is determined)
- B17001_002E = count below poverty level
- % in poverty = 100 × (002E / 001E)
- Saved raw counts to `data/raw/raw_poverty_counts_B17001_state_year.csv`; `pct_poverty` is computed later.

*Note: S1701 percent starts ~2010 in ACS1; B17001 is used for a consistent 2008–present series.*
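
As a small worked example of the ratio above, the sketch below computes the percentage from two counts and returns `None` when a count is missing or the universe is zero. The function name mirrors the `pct_poverty` column mentioned above, the two-decimal rounding is an arbitrary choice, and the numbers are made up.

```python
def pct_poverty(universe, below):
    """100 * (002E / 001E); None when a count is missing or the universe is zero."""
    if universe is None or below is None:
        return None
    universe, below = float(universe), float(below)
    if universe <= 0:
        return None
    return round(100 * below / universe, 2)

# Made-up counts: 720,000 below the poverty level out of a universe of 4,800,000
rate = pct_poverty("4800000", "720000")
print(rate)
```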

In [198]:
print("\nFetching Poverty (compute later from B17001 counts)...")
rows_pov, failed_years_pov, cols_pov = [], [], None

for y in YEARS:  # e.g., range(2008, 2024+1)
    url = f"https://api.census.gov/data/{y}/acs/{DATASET}"
    params = {"get": "NAME,B17001_001E,B17001_002E", "for": "state:*"}
    if KEY: params["key"] = KEY

    try:
        r = safe_get(url, params, timeout=60)
        if r.status_code == 404:
            print(f"⚠️ {y} not available for {DATASET}; skipping.")
            continue
        r.raise_for_status()
        data = r.json()
        if cols_pov is None:
            cols_pov = data[0] + ["year"]
        for rec in data[1:]:
            rows_pov.append(rec + [y])
        print(f"✅ {y}: {len(data)-1} rows")
    except requests.RequestException as e:
        print(f"❌ {y}: {e}")
        failed_years_pov.append(y)
    finally:
        time.sleep(SLEEP_BETWEEN_CALLS)

pov_counts = pd.DataFrame(rows_pov, columns=cols_pov) if rows_pov else pd.DataFrame()
if not pov_counts.empty:
    raw_path = RAW_DIR / "raw_poverty_counts_B17001_state_year.csv"
    pov_counts.to_csv(raw_path, index=False)
    print("Saved raw:", raw_path)
else:
    print("No poverty rows downloaded; check years/dataset/key.")


Fetching Poverty (compute later from B17001 counts)...
✅ 2008: 52 rows
✅ 2009: 52 rows
✅ 2010: 52 rows
✅ 2011: 52 rows
✅ 2012: 52 rows
✅ 2013: 52 rows
✅ 2014: 52 rows
✅ 2015: 52 rows
✅ 2016: 52 rows
✅ 2017: 52 rows
✅ 2018: 52 rows
✅ 2019: 52 rows
⚠️ 2020 not available for acs1; skipping.
✅ 2021: 52 rows
✅ 2022: 52 rows
✅ 2023: 52 rows
✅ 2024: 52 rows
Saved raw: data/raw/raw_poverty_counts_B17001_state_year.csv


### 3.4 Hate Crime
- Loaded from a local file, `data/raw/hate_crime.csv`, rather than fetched through an API call.

In [200]:
RAW_DIR = Path("data/raw")
PROCESSED_DIR = Path("data/processed")
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)

raw_path = RAW_DIR / "hate_crime.csv" 
print("Loading raw hate crime data from:", raw_path)

raw_hate_crime_df = pd.read_csv(raw_path)
print("Raw shape:", raw_hate_crime_df.shape)

# quick review
display(raw_hate_crime_df.head(10))

Loading raw hate crime data from: data/raw/hate_crime.csv
Raw shape: (265834, 28)


| | incident_id | data_year | ori | pug_agency_name | pub_agency_unit | agency_type_name | state_abbr | state_name | division_name | region_name | ... | offender_race | offender_ethnicity | victim_count | offense_name | total_individual_victims | location_name | bias_desc | victim_types | multiple_offense | multiple_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43 | 1991 | AR0350100 | Pine Bluff | | City | AR | Arkansas | West South Central | South | ... | Black or African American | Not Specified | 1 | Aggravated Assault | 1.0 | Residence/Home | Anti-Black or African American | Individual | S | S |
| 1 | 44 | 1991 | AR0350100 | Pine Bluff | | City | AR | Arkansas | West South Central | South | ... | Black or African American | Not Specified | 2 | Aggravated Assault;Destruction/Damage/Vandalis... | 1.0 | Highway/Road/Alley/Street/Sidewalk | Anti-White | Individual | M | S |
| 2 | 45 | 1991 | AR0600300 | North Little Rock | | City | AR | Arkansas | West South Central | South | ... | Black or African American | Not Specified | 2 | Aggravated Assault;Murder and Nonnegligent Man... | 2.0 | Residence/Home | Anti-White | Individual | M | S |
| 3 | 46 | 1991 | AR0600300 | North Little Rock | | City | AR | Arkansas | West South Central | South | ... | Black or African American | Not Specified | 1 | Intimidation | 1.0 | Residence/Home | Anti-White | Individual | S | S |
| 4 | 47 | 1991 | AR0670000 | Sevier | | County | AR | Arkansas | West South Central | South | ... | White | Not Specified | 1 | Intimidation | 1.0 | School/College | Anti-Black or African American | Individual | S | S |
| 5 | 3015 | 1991 | AR0040200 | Rogers | | City | AR | Arkansas | West South Central | South | ... | White | Not Specified | 1 | Intimidation | 1.0 | Highway/Road/Alley/Street/Sidewalk | Anti-Black or African American | Individual | S | S |
| 6 | 3016 | 1991 | AR0290100 | Hope | | City | AR | Arkansas | West South Central | South | ... | Black or African American | Not Specified | 1 | Simple Assault | 1.0 | Highway/Road/Alley/Street/Sidewalk | Anti-White | Individual | S | S |
| 7 | 3017 | 1991 | AR0350100 | Pine Bluff | | City | AR | Arkansas | West South Central | South | ... | Black or African American | Not Specified | 1 | Aggravated Assault | 1.0 | Service/Gas Station | Anti-White | Individual | S | S |
| 8 | 3018 | 1991 | AR0350100 | Pine Bluff | | City | AR | Arkansas | West South Central | South | ... | Black or African American | Not Specified | 1 | Robbery | 1.0 | Grocery/Supermarket | Anti-White | Individual | S | S |
| 9 | 3019 | 1991 | AR0600200 | Little Rock | | City | AR | Arkansas | West South Central | South | ... | Black or African American | Not Specified | 1 | Aggravated Assault | 1.0 | School/College | Anti-White | Individual | S | S |
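
Since the other indicators in this section are organized by state and year, one possible next step is to count incidents per (`state_name`, `data_year`) pair. The sketch below is an assumption about how that could be done, not the notebook's actual processing step. It assumes one row per incident, which the `incident_id` column suggests but the preview does not confirm, and `incidents_by_state_year` is a hypothetical name.

```python
from collections import Counter

def incidents_by_state_year(records):
    """Count rows per (state_name, data_year) pair."""
    return Counter((rec["state_name"], int(rec["data_year"])) for rec in records)

# Three rows reduced to the columns needed, taken from the preview above
rows = [
    {"incident_id": 43, "data_year": 1991, "state_name": "Arkansas"},
    {"incident_id": 44, "data_year": 1991, "state_name": "Arkansas"},
    {"incident_id": 45, "data_year": 1991, "state_name": "Arkansas"},
]
counts = incidents_by_state_year(rows)
print(counts)
```

With the full DataFrame, the pandas equivalent would be `raw_hate_crime_df.groupby(["state_name", "data_year"]).size()`.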


## Sources

**1. Gini Index of Income Inequality Data**
- Source: U.S. Census Bureau, American Community Survey (ACS) 1-Year Estimates
- ACS Table ID: **B19083**
- [Census Data](https://data.census.gov/table/ACSDT1Y2024.B19083?g=040XX00US01_010XX00US&q=Gini)
  
**2. Educational Attainment Data** 
- Source: U.S. Census Bureau, ACS 1-Year Estimates
- ACS Table ID: **B15003** (Educational Attainment for 25+).  
- [Census Data](https://data.census.gov/table/ACSDT1Y2022.B15003)
  
**3. Poverty Data**
- Source: U.S. Census Bureau, ACS 1-Year Estimates
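- ACS Table ID: **B17001** (Poverty Status in the Past 12 Months by Sex by Age), used for the counts in Section 3.3; **S1701** (Poverty Status in the Past 12 Months) is the related subject table linked below.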
- [Census Data](https://data.census.gov/table/ACSST1Y2024.S1701?t=&text=poverty)


**4. Census Data API**
- [Census Data API Link](https://www.census.gov/data/developers/guidance/api-user-guide.Example_API_Queries.html)

**5. Census Data ACS Variable Types**
- [ACS Variable Types Link](https://www.census.gov/data/developers/data-sets/acs-1year/notes-on-acs-api-variable-types.html)
