# Healthcare Desert Analysis - Talladega County, Alabama

## Project Focus: Intra-County Healthcare Deserts

This notebook analyzes healthcare access disparities **within Talladega County, Alabama** using:
- **IGS Data**: Inclusive Growth Score (IGS) data for 8 census tracts, with scores ranging from 21 to 56 (a 35-point spread)
- **Healthcare Facilities**: Hospitals, clinics, pharmacies in Talladega County
- **Accessibility Analysis**: Distance/drive-time to nearest facilities
- **Desert Index**: Composite score combining IGS + accessibility metrics

**Key Finding**: Tract 1121010500 (IGS: 24.0) vs Tract 1121010400 (IGS: 50.6), a **26.6-point difference within the same county**.

**Outputs:** 
- Healthcare facilities data for Talladega County
- Tract boundaries for mapping
- Accessibility metrics (distance to nearest facility)
- Final dataset ready for interactive mapping


In [None]:
import os
import pandas as pd
import geopandas as gpd
import numpy as np
from pathlib import Path
import requests
import zipfile
import json

# Configuration for Talladega County, Alabama
DATA_RAW = Path('data/raw')
DATA_PROCESSED = Path('data/processed')
DATA_RAW.mkdir(parents=True, exist_ok=True)
DATA_PROCESSED.mkdir(parents=True, exist_ok=True)

# Talladega County FIPS: 01121
TARGET_COUNTY = 'Talladega County, Alabama'
COUNTY_FIPS = '01121'
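# Tract IDs in 10-digit form: the 11-digit Census GEOID with the leading
# zero of the state code (01) dropped
TARGET_TRACTS = [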
    '1121010301', '1121010400', '1121010500', '1121010600',
    '1121010700', '1121010900', '1121011000', '1121011100'
]

print(f'Target: {TARGET_COUNTY}')
print(f'County FIPS: {COUNTY_FIPS}')
print(f'Tracts: {len(TARGET_TRACTS)}')
print(f'Data directories: {DATA_RAW}, {DATA_PROCESSED}')


## 1. Load IGS Data (Already Collected)

We already have IGS data for all 8 tracts in Talladega County; the scores span 21 to 56, a 35-point spread.
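
The tract identifiers in `TARGET_TRACTS` are 10 digits long, while TIGER/Line `GEOID` values are 11-character strings that keep the leading zero of the state code. A helper along these lines could bring both to the same form before any join. The name `normalize_tract_fips` is hypothetical, and the sketch assumes the IGS export stores the code either as a number or as a string with the zero dropped.

```python
def normalize_tract_fips(value) -> str:
    """Return an 11-character tract GEOID, restoring a dropped leading zero."""
    text = str(value).strip()
    # Numeric columns read as floats can look like '1121010500.0'
    if text.endswith('.0'):
        text = text[:-2]
    if not text.isdigit() or len(text) > 11:
        raise ValueError(f'Not a tract FIPS code: {value!r}')
    return text.zfill(11)


print(normalize_tract_fips(1121010500))     # 01121010500
print(normalize_tract_fips('01121010400'))  # 01121010400
```

Applying it to both the `Census Tract FIPS code` column and the TIGER `GEOID` column would make the two keys directly comparable.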


In [None]:
# Load the IGS data we already processed
igs_df = pd.read_csv(DATA_RAW / 'igs_talladega_tracts.csv')

print('=== IGS Data Summary ===')
print(f'Shape: {igs_df.shape}')
print(f'Columns: {list(igs_df.columns)[:10]}...')

# Show tract-level analysis (one row per tract, lowest mean score first)
tract_summary = igs_df.groupby('Census Tract FIPS code')['Inclusive Growth Score'].agg(['count', 'min', 'max', 'mean']).round(1)
tract_summary = tract_summary.sort_values('mean')

def igs_status(mean_score):
    """Provisional label based on IGS alone; accessibility is added in section 4."""
    if mean_score < 30:
        return "🔴 SEVERE DESERT"
    if mean_score < 45:
        return "🟡 MODERATE DESERT"
    return "🟢 ADEQUATE ACCESS"

print('\n=== Healthcare Desert Rankings (IGS only) ===')
for tract, row in tract_summary.iterrows():
    print(f'Tract {tract}: {row["mean"]:.1f} avg IGS - {igs_status(row["mean"])}')

print(f'\n=== Key Statistics ===')
igs_scores = igs_df['Inclusive Growth Score'].dropna()
print(f'Score range: {igs_scores.min()} - {igs_scores.max()} ({igs_scores.max() - igs_scores.min()} point difference)')
print(f'Most vulnerable: Tract {tract_summary.index[0]} (avg: {tract_summary.iloc[0]["mean"]})')
print(f'Least vulnerable: Tract {tract_summary.index[-1]} (avg: {tract_summary.iloc[-1]["mean"]})')


## 2. Extract Tract Boundaries

We need to extract the tract boundaries from the TIGER/Line ZIP files for mapping.
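
If the ZIP file is not in `data/raw` yet, it could be fetched once and cached. The sketch below is one way to do that with the standard library. The helper names `tiger_tract_url` and `download_tiger_tracts` are hypothetical, and the URL pattern is taken from the Census directory and file name used in this notebook.

```python
from pathlib import Path
from urllib.request import urlretrieve


def tiger_tract_url(state_fips: str, year: int = 2023) -> str:
    """Build the TIGER/Line tract ZIP URL for one state."""
    return (
        f'https://www2.census.gov/geo/tiger/TIGER{year}/TRACT/'
        f'tl_{year}_{state_fips}_tract.zip'
    )


def download_tiger_tracts(dest_dir: Path, state_fips: str = '01', year: int = 2023) -> Path:
    """Download the tract ZIP into dest_dir unless it is already there."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f'tl_{year}_{state_fips}_tract.zip'
    if not dest.exists():
        # Network call; only happens when the file is missing
        urlretrieve(tiger_tract_url(state_fips, year), dest)
    return dest


print(tiger_tract_url('01'))
```

Calling `download_tiger_tracts(Path('data/raw'))` would leave the file at the path the extraction cell expects.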


In [None]:
# Extract tract boundaries from TIGER/Line ZIP files
def extract_tract_boundaries():
    """Extract tract boundaries for Talladega County from TIGER/Line data."""
    
    # Alabama TIGER/Line file (FIPS 01)
    al_zip = DATA_RAW / 'tl_2023_01_tract.zip'
    
    if not al_zip.exists():
        print(f"❌ TIGER/Line file not found: {al_zip}")
        print("Please download Alabama tract boundaries from: https://www2.census.gov/geo/tiger/TIGER2023/TRACT/")
        return None
    
    # Extract the ZIP file
    extract_dir = DATA_RAW / 'tiger_extract'
    extract_dir.mkdir(exist_ok=True)
    
    with zipfile.ZipFile(al_zip, 'r') as zip_ref:
        zip_ref.extractall(extract_dir)
    
    # Find the shapefile
    shp_files = list(extract_dir.glob('*.shp'))
    if not shp_files:
        print("❌ No shapefile found in extracted data")
        return None
    
    # Load the shapefile
    tracts_gdf = gpd.read_file(shp_files[0])
    
    # Filter for Talladega County (FIPS: 01121). The file is Alabama-only,
    # so the 3-digit county code is enough.
    talladega_tracts = tracts_gdf[tracts_gdf['COUNTYFP'] == '121'].copy()
    
    print(f"✅ Found {len(talladega_tracts)} tracts in Talladega County")
    print(f"Tract FIPS codes: {sorted(talladega_tracts['TRACTCE'].tolist())}")
    
    # Save the filtered tracts
    output_file = DATA_PROCESSED / 'talladega_tract_boundaries.geojson'
    talladega_tracts.to_file(output_file, driver='GeoJSON')
    print(f"✅ Saved tract boundaries to: {output_file}")
    
    return talladega_tracts

# Extract the boundaries
tract_boundaries = extract_tract_boundaries()


## 3. Healthcare Facilities Data Collection

**MANUAL DATA COLLECTION REQUIRED**

We need to collect healthcare facilities data for Talladega County. Here are the specific data sources and instructions:


### 3.1 Hospitals (CMS Hospital General Information)

**Website:** https://data.cms.gov/provider-data/dataset/mj5m-pzi6

**Instructions:**
1. Go to the CMS Hospital General Information dataset
2. Filter by State: Alabama
3. Filter by County: Talladega
4. Download as CSV
5. Save as: `data/raw/hospitals_talladega.csv`

**Required columns:** Facility Name, Address, City, State, ZIP, Latitude, Longitude, Facility Type

### 3.2 Clinics (HRSA Health Center Service Delivery Sites)

**Website:** https://data.hrsa.gov/data/download

**Instructions:**
1. Go to HRSA Data Warehouse
2. Select "Health Center Service Delivery Sites"
3. Filter by State: Alabama
4. Filter by County: Talladega
5. Download as CSV
6. Save as: `data/raw/clinics_talladega.csv`

**Required columns:** Site Name, Address, City, State, ZIP, Latitude, Longitude, Service Type

### 3.3 Pharmacies (CMS Part D Pharmacy Directory)

**Website:** https://data.cms.gov/provider-data/dataset/6jpm-sxkc

**Instructions:**
1. Go to CMS Part D Pharmacy Directory
2. Filter by State: Alabama
3. Filter by County: Talladega
4. Download as CSV
5. Save as: `data/raw/pharmacies_talladega.csv`

**Required columns:** Pharmacy Name, Address, City, State, ZIP, Latitude, Longitude
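
The three downloads use different name columns (`Facility Name`, `Site Name`, `Pharmacy Name`), so it may help to check each file against the required columns and bring them to one schema before combining. The following is a minimal sketch under that assumption. `standardize_facilities` and the output column `facility_name` are hypothetical names, and the sample row is made up purely to show the call.

```python
import pandas as pd

# Name column per source, as listed in the "Required columns" notes
NAME_COLUMNS = {
    'hospitals': 'Facility Name',
    'clinics': 'Site Name',
    'pharmacies': 'Pharmacy Name',
}
SHARED_COLUMNS = ['Address', 'City', 'State', 'ZIP', 'Latitude', 'Longitude']


def standardize_facilities(df: pd.DataFrame, facility_type: str) -> pd.DataFrame:
    """Validate required columns and return a table with a common schema."""
    name_col = NAME_COLUMNS[facility_type]
    required = [name_col, *SHARED_COLUMNS]
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f'{facility_type}: missing columns {missing}')

    out = df[required].rename(columns={name_col: 'facility_name'})
    out['facility_type'] = facility_type
    # Coordinates sometimes arrive as text; unparseable values become NaN
    out['Latitude'] = pd.to_numeric(out['Latitude'], errors='coerce')
    out['Longitude'] = pd.to_numeric(out['Longitude'], errors='coerce')
    return out.dropna(subset=['Latitude', 'Longitude']).reset_index(drop=True)


# Made-up row for illustration only, not real facility data
sample = pd.DataFrame([{
    'Pharmacy Name': 'Example Pharmacy', 'Address': '1 Example St',
    'City': 'Talladega', 'State': 'AL', 'ZIP': '35160',
    'Latitude': '33.43', 'Longitude': '-86.10',
}])
print(standardize_facilities(sample, 'pharmacies'))
```

Rows without usable coordinates are dropped here because the accessibility step cannot place them on the map.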


In [None]:
# Function to load and validate healthcare facilities data
def load_healthcare_facilities():
    """Load and validate healthcare facilities data for Talladega County."""
    
    facilities = {}
    
    # Check for each type of facility
    facility_types = {
        'hospitals': 'hospitals_talladega.csv',
        'clinics': 'clinics_talladega.csv', 
        'pharmacies': 'pharmacies_talladega.csv'
    }
    
    for facility_type, filename in facility_types.items():
        file_path = DATA_RAW / filename
        
        if file_path.exists():
            df = pd.read_csv(file_path)
            facilities[facility_type] = df
            print(f"✅ Loaded {len(df)} {facility_type} from {filename}")
        else:
            print(f"❌ Missing {facility_type} data: {filename}")
            print(f"   Please download from the instructions above")
    
    return facilities

# Load facilities data (will show missing files until you download them)
facilities_data = load_healthcare_facilities()


## 4. Accessibility Analysis

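Once the healthcare facilities data is available, we'll calculate:
- Distance from each tract centroid to the nearest facility
- Drive-time accessibility scores (not implemented yet; the code below uses straight-line distance as a stand-in)
- Healthcare desert classification (nearest facility more than 30 km away)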
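
For the distance step, a great-circle (haversine) calculation on latitude/longitude pairs is one option that needs no projection. A flat "degrees times 111" conversion overstates east-west distances at this latitude, where one degree of longitude is only about 93 km. The sketch below assumes plain `(lat, lon)` tuples; the function names and the sample coordinates are illustrative placeholders, not real facilities.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_facility(centroid, facilities):
    """Return (name, distance_km) of the closest facility.

    centroid: (lat, lon); facilities: iterable of (name, lat, lon).
    """
    candidates = (
        (name, haversine_km(centroid[0], centroid[1], lat, lon))
        for name, lat, lon in facilities
    )
    return min(candidates, key=lambda pair: pair[1])


# Placeholder coordinates for illustration only
centroid = (33.40, -86.10)
facilities = [('Facility A', 33.43, -86.10), ('Facility B', 33.40, -85.90)]
name, dist_km = nearest_facility(centroid, facilities)
print(f'{name}: {dist_km:.1f} km')
```

This is still straight-line distance; drive times would need a road network and a routing engine.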


In [None]:
# Function to calculate accessibility metrics
def calculate_accessibility_metrics(tract_boundaries, facilities_data):
    """Calculate distance and accessibility metrics for each tract."""
    
    if tract_boundaries is None:
        print("❌ No tract boundaries available")
        return None
    
    if not facilities_data:
        print("❌ No facilities data available")
        return None
    
    # Measure in a projected CRS so centroids and distances are in meters.
    # EPSG:26916 (NAD83 / UTM zone 16N) covers Talladega County.
    metric_crs = 'EPSG:26916'
    centroids_m = tract_boundaries.to_crs(metric_crs).geometry.centroid

    # Keep lat/lon centroids on the tract table for mapping
    centroids_ll = centroids_m.to_crs('EPSG:4326')
    tract_boundaries['centroid_lat'] = centroids_ll.y
    tract_boundaries['centroid_lon'] = centroids_ll.x

    # Each source names its facility column differently
    name_columns = {
        'hospitals': 'Facility Name',
        'clinics': 'Site Name',
        'pharmacies': 'Pharmacy Name'
    }

    # Combine all facilities that carry coordinates
    all_facilities = []
    for facility_type, df in facilities_data.items():
        if 'Latitude' in df.columns and 'Longitude' in df.columns:
            facilities_gdf = gpd.GeoDataFrame(
                df,
                geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']),
                crs='EPSG:4326'
            )
            facilities_gdf['facility_type'] = facility_type
            name_col = name_columns.get(facility_type)
            if name_col in facilities_gdf.columns:
                facilities_gdf['facility_name'] = facilities_gdf[name_col]
            else:
                facilities_gdf['facility_name'] = 'Unknown'
            all_facilities.append(facilities_gdf)

    if not all_facilities:
        print("❌ No facilities with coordinates found")
        return None

    # Stack the facility tables and reproject them to match the centroids
    combined_facilities = pd.concat(all_facilities, ignore_index=True).to_crs(metric_crs)

    print(f"✅ Found {len(combined_facilities)} total facilities")
    print(f"   - Hospitals: {len(combined_facilities[combined_facilities['facility_type'] == 'hospitals'])}")
    print(f"   - Clinics: {len(combined_facilities[combined_facilities['facility_type'] == 'clinics'])}")
    print(f"   - Pharmacies: {len(combined_facilities[combined_facilities['facility_type'] == 'pharmacies'])}")

    # Straight-line distance from each tract centroid to every facility
    # (a proxy only; drive times would need proper routing)
    accessibility_results = []

    for tract_idx, tract in tract_boundaries.iterrows():
        distances_km = combined_facilities.geometry.distance(centroids_m.loc[tract_idx]) / 1000

        # Find nearest facility
        nearest_idx = distances_km.idxmin()
        nearest_facility = combined_facilities.loc[nearest_idx]
        min_distance = float(distances_km.loc[nearest_idx])

        accessibility_results.append({
            'tract_fips': tract['GEOID'],
            'min_distance_km': min_distance,
            'nearest_facility_type': nearest_facility['facility_type'],
            'nearest_facility_name': nearest_facility['facility_name'],
            'is_desert': min_distance > 30  # 30km threshold
        })
    
    return pd.DataFrame(accessibility_results)

# Calculate accessibility (will work once facilities data is available)
if tract_boundaries is not None and facilities_data:
    accessibility_df = calculate_accessibility_metrics(tract_boundaries, facilities_data)
    if accessibility_df is not None:
        print(f"\n=== Accessibility Results ===")
        print(accessibility_df)
else:
    print("⏳ Waiting for facilities data to calculate accessibility metrics")


## 5. Data Collection Status & Next Steps

**Current Status:**
- ✅ **IGS Data**: Complete (8 tracts, 35-point disparity)
- ✅ **Tract Boundaries**: Ready to extract from TIGER/Line
- ❌ **Healthcare Facilities**: Need manual download
- ❌ **Accessibility Analysis**: Waiting for facilities data

**Immediate Next Steps:**
1. **Download healthcare facilities** using the instructions above
2. **Run the notebook** to extract tract boundaries and calculate accessibility
3. **Create final dataset** combining IGS + accessibility metrics
4. **Build interactive map** showing healthcare deserts
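
For step 3, one possible way to combine IGS with the distance metric into a single desert index is a weighted average of two "need" components. Everything about the formula below is an assumption made for illustration: the equal weights, the 0-100 IGS scale, the reuse of the 30 km threshold as a distance cap, and the column names `igs_score` and `min_distance_km`. The IGS values in the sample come from the key finding above; the distances are placeholders, not results.

```python
import pandas as pd


def desert_index(df, igs_col='igs_score', dist_col='min_distance_km',
                 igs_weight=0.5, max_distance_km=30.0):
    """Composite score in [0, 1]; higher means a more severe desert."""
    # Assumes IGS runs 0-100 with higher = better, so invert it
    igs_need = (100 - df[igs_col]) / 100
    # Cap distance at the threshold so one remote tract does not dominate
    distance_need = (df[dist_col] / max_distance_km).clip(upper=1.0)
    return (igs_weight * igs_need + (1 - igs_weight) * distance_need).round(3)


tracts = pd.DataFrame({
    'tract_fips': ['01121010500', '01121010400'],
    'igs_score': [24.0, 50.6],          # from the key finding
    'min_distance_km': [12.0, 6.0],     # placeholder distances
})
tracts['desert_index'] = desert_index(tracts)
print(tracts)
```

The weights are the main design choice here and would need to be justified, or tested for sensitivity, before the index is used for ranking.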

**Files to Download:**
- `hospitals_talladega.csv` (CMS Hospital General Information)
- `clinics_talladega.csv` (HRSA Health Center Service Delivery Sites)  
- `pharmacies_talladega.csv` (CMS Part D Pharmacy Directory)

Once you have these files, run the notebook to complete the data collection phase!


## Placeholders for data loaders

- IGS: manual CSV upload for now (tract FIPS, metrics)
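- ACS: Census API fetch function for tables B27020 (health insurance coverage), B19083 (Gini index), B19013 (median household income), and B17001 (poverty status); the cell below currently requests only B19013 and B19083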
- Facilities: CMS/HRSA CSVs with lat/long
- Boundaries: TIGER tract GeoJSON for MS/AL
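
For the ACS loader, the request behind a tract-level pull can be sketched as follows. The project's own `fetch_acs_tract_data` is not shown here and may work differently; `acs_tract_url` and `rows_to_records` are hypothetical helpers. The sketch only builds the Census Data API URL and reshapes a response, which the API returns as a JSON array whose first row is the header. The sample payload uses made-up values.

```python
from urllib.parse import urlencode


def acs_tract_url(state_fips, county_fips, variables, year=2023, dataset='acs/acs5'):
    """Build a Census Data API URL for all tracts in one county."""
    query = urlencode(
        [
            ('get', ','.join(['NAME', *variables])),
            ('for', 'tract:*'),
            ('in', f'state:{state_fips}'),
            ('in', f'county:{county_fips}'),
        ],
        safe=':*,',  # keep the Census-style separators readable
    )
    return f'https://api.census.gov/data/{year}/{dataset}?{query}'


def rows_to_records(payload):
    """Turn the header-plus-rows response into dicts with an 11-digit GEOID."""
    header, *rows = payload
    records = [dict(zip(header, row)) for row in rows]
    for record in records:
        record['GEOID'] = record['state'] + record['county'] + record['tract']
    return records


print(acs_tract_url('01', '121', ['B19013_001E', 'B19083_001E']))

# Made-up response for illustration only
payload = [
    ['NAME', 'B19013_001E', 'B19083_001E', 'state', 'county', 'tract'],
    ['Example tract', '12345', '0.45', '01', '121', '010500'],
]
print(rows_to_records(payload))
```

Values come back as strings, so numeric columns need converting before analysis.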


In [None]:
from src.data.fetch_census_acs import fetch_acs_tract_data, normalize_acs_columns
from pathlib import Path
import pandas as pd

DATA_RAW = Path('data/raw')
DATA_RAW.mkdir(parents=True, exist_ok=True)

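# State FIPS codes: Alabama = 01, Mississippi = 28
STATE_FIPS = {"AL": "01", "MS": "28"}
# ACS variable code -> readable column name
ACS_VARS = {
    "B19013_001E": "median_income",  # median household income
    "B19083_001E": "gini",           # Gini index of income inequality
}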

frames = []
for abbr, fips in STATE_FIPS.items():
    df = fetch_acs_tract_data(state_fips=fips, variables=list(ACS_VARS.keys()), dataset="acs/acs5", year=2023)
    df = normalize_acs_columns(df, ACS_VARS)
    df["state_abbr"] = abbr
    frames.append(df)

acs = pd.concat(frames, ignore_index=True)
acs.to_csv(DATA_RAW / 'acs_ms_al_2023.csv', index=False)
acs.head()
