# HRSA Health Professional Shortage Area (HPSA) Connector - Quickstart Guide

This notebook demonstrates how to use the **HRSAConnector** to analyze health professional shortage areas, medically underserved areas, and health center data from the Health Resources and Services Administration (HRSA).

**Data Source:** HRSA Data Warehouse (https://data.hrsa.gov/)

**Coverage:**
- Health Professional Shortage Areas (HPSA)
- Medically Underserved Areas/Populations (MUA/MUP)
- Federally Qualified Health Centers (FQHC)

© 2025 KR-Labs. All rights reserved.

## 1. Setup and Installation

First, ensure the KRL Data Connectors package is installed.

In [None]:
# Install the package (uncomment if needed)
# !pip install krl-data-connectors

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

from krl_data_connectors.health import HRSAConnector

# Configure plotting
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Imports successful!")

## 2. Initialize Connector

The HRSA connector works with CSV files downloaded from the HRSA Data Warehouse.

In [None]:
# Initialize the connector
connector = HRSAConnector()

print("HRSA Connector initialized!")
print(f"\nAvailable HPSA Disciplines: {connector.HPSA_DISCIPLINES}")
print(f"Available HPSA Types: {connector.HPSA_TYPES}")
print(f"Available MUA Types: {connector.MUA_TYPES}")

## 3. Download Data

**Note:** You need to download HPSA, MUA, and Health Center data from https://data.hrsa.gov/data/download

For this example, we create a small synthetic dataset to demonstrate the functionality. The values are illustrative only and do not describe real HRSA designations.

In [None]:
# Create sample HPSA data for demonstration
sample_hpsa = pd.DataFrame({
    'State Abbreviation': ['CA', 'CA', 'CA', 'TX', 'TX', 'NY', 'NY', 'FL', 'FL', 'IL'],
    'County Equivalent Name': ['Los Angeles', 'Fresno', 'Riverside', 'Harris', 'Dallas', 
                                'Bronx', 'Kings', 'Miami-Dade', 'Broward', 'Cook'],
    'HPSA Discipline Class': ['Primary Care', 'Dental Health', 'Mental Health', 'Primary Care', 
                               'Mental Health', 'Primary Care', 'Dental Health', 'Primary Care', 
                               'Mental Health', 'Primary Care'],
    'Designation Type': ['Geographic', 'Population', 'Geographic', 'Geographic', 'Population',
                         'Geographic', 'Geographic', 'Population', 'Geographic', 'Geographic'],
    'HPSA Score': [18, 22, 15, 20, 17, 25, 19, 14, 16, 21],
    'Rural Status': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban'],
    'HPSA Name': ['LA County Area 1', 'Fresno County Rural', 'Riverside Mental Health', 
                  'Harris County HPSA', 'Dallas Mental Health', 'Bronx HPSA', 'Brooklyn Dental',
                  'Miami-Dade Primary', 'Broward Mental Health', 'Cook County HPSA'],
    'Population': [125000, 45000, 89000, 156000, 78000, 198000, 112000, 67000, 92000, 175000]
})

# Save to a CSV file in the system temp directory (portable across operating systems)
import tempfile

temp_dir = Path(tempfile.gettempdir()) / 'hrsa_demo'
temp_dir.mkdir(parents=True, exist_ok=True)
hpsa_file = temp_dir / 'sample_hpsa.csv'
sample_hpsa.to_csv(hpsa_file, index=False)

print(f"Sample HPSA data created ({len(sample_hpsa)} records)")
print(f"Saved to: {hpsa_file}")

## 4. Load HPSA Data

Load Health Professional Shortage Area data to identify areas with healthcare provider shortages.

In [None]:
# Load HPSA data
hpsa_data = connector.load_hpsa_data(hpsa_file)

print(f"Loaded {len(hpsa_data)} HPSA designations\n")
print("First few records:")
hpsa_data.head()
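The sample CSV uses title-case headers such as `State Abbreviation` and `HPSA Score`, while the cells below refer to `state_abbreviation` and `hpsa_score`, so `load_hpsa_data` appears to standardize column names. The sketch below shows one way to do that kind of normalization in plain pandas. `normalize_columns` is a hypothetical helper, and its lowercase-and-underscore rule is an assumption, not necessarily the connector's exact behavior.

```python
import re

import pandas as pd


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert headers like 'HPSA Score' to 'hpsa_score'."""
    def to_snake(name: str) -> str:
        # Lowercase, replace each run of non-alphanumeric characters with '_', trim edges
        return re.sub(r'[^0-9a-z]+', '_', name.strip().lower()).strip('_')
    return df.rename(columns=to_snake)


raw = pd.DataFrame({'State Abbreviation': ['CA'], 'HPSA Score': [18],
                    'County Equivalent Name': ['Los Angeles']})
clean = normalize_columns(raw)
print(list(clean.columns))
```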

## 5. Filter by State

Analyze health shortage areas for a specific state.

In [None]:
# Get California HPSA data
ca_hpsa = connector.get_state_data(hpsa_data, 'CA')

print(f"California has {len(ca_hpsa)} HPSA designations\n")
print("California HPSAs by discipline:")
print(ca_hpsa['hpsa_discipline_class'].value_counts())
print(f"\nAverage HPSA Score: {ca_hpsa['hpsa_score'].mean():.2f}")

## 6. Filter by Discipline

Focus on specific healthcare disciplines (Primary Care, Dental Health, or Mental Health).

In [None]:
# Get Primary Care shortages
primary_care = connector.filter_by_discipline(hpsa_data, 'Primary Care')

print(f"Primary Care HPSAs: {len(primary_care)}")
print("\nStates with Primary Care shortages:")
print(primary_care['state_abbreviation'].value_counts())

# Get Mental Health shortages
mental_health = connector.filter_by_discipline(hpsa_data, 'Mental Health')
print(f"\nMental Health HPSAs: {len(mental_health)}")
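Filtering one discipline at a time answers one question per call. To see every state and discipline together, a cross-tabulation of the two columns is a compact alternative. A minimal pandas sketch on a small illustrative stand-in frame, assuming the normalized column names used in this notebook:

```python
import pandas as pd

# Small stand-in for hpsa_data, using the normalized column names from this notebook
hpsa = pd.DataFrame({
    'state_abbreviation': ['CA', 'CA', 'CA', 'TX', 'TX'],
    'hpsa_discipline_class': ['Primary Care', 'Dental Health', 'Mental Health',
                              'Primary Care', 'Mental Health'],
})

# Rows are states, columns are disciplines, cells are designation counts (0 if none)
by_state_discipline = pd.crosstab(hpsa['state_abbreviation'], hpsa['hpsa_discipline_class'])
print(by_state_discipline)
```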

## 7. Identify High-Need Areas

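HPSA scores range from 0 to 25 for Primary Care and Mental Health and from 0 to 26 for Dental Health; higher scores indicate greater need. This guide groups scores into three severity bands: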
- 0-14: Moderate shortage
- 15-19: High shortage
- 20-26: Critical shortage
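The same bands can be attached to every row with `pandas.cut`. A minimal sketch using the sample scores from section 3; the edges and labels simply encode the list above, and `right=False` keeps the band edges consistent with the `>= 15` and `>= 20` thresholds used later in this notebook.

```python
import pandas as pd

scores = pd.Series([18, 22, 15, 20, 17, 25, 19, 14, 16, 21], name='hpsa_score')

# right=False makes the bins [0, 15), [15, 20), [20, 27), so a score of exactly
# 15 is "High" and exactly 20 is "Critical", matching the >= thresholds
bands = pd.cut(scores, bins=[0, 15, 20, 27], right=False,
               labels=['Moderate', 'High', 'Critical'])
print(bands.value_counts())
```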

In [None]:
# Get critical shortage areas (score >= 20)
critical_areas = connector.get_high_need_areas(hpsa_data, score_threshold=20)

print(f"Critical shortage areas (score ≥ 20): {len(critical_areas)}")
print("\nCritical areas by state:")
print(critical_areas['state_abbreviation'].value_counts())

print("\nCritical shortage details:")
print(critical_areas[['hpsa_name', 'state_abbreviation', 'hpsa_discipline_class', 
                       'hpsa_score', 'population']].sort_values('hpsa_score', ascending=False))

## 8. Rural vs Urban Analysis

Compare healthcare shortages in rural and urban areas. Here, any designation whose `rural_status` is not `Rural` is counted as urban.

In [None]:
# Get rural HPSAs
rural_hpsa = connector.get_rural_areas(hpsa_data)

print(f"Rural HPSAs: {len(rural_hpsa)}")
print(f"Urban HPSAs: {len(hpsa_data) - len(rural_hpsa)}")

# Compare average scores
rural_avg = rural_hpsa['hpsa_score'].mean()
urban_hpsa = hpsa_data[hpsa_data['rural_status'] != 'Rural']
urban_avg = urban_hpsa['hpsa_score'].mean()

print(f"\nAverage HPSA Score - Rural: {rural_avg:.2f}")
print(f"Average HPSA Score - Urban: {urban_avg:.2f}")
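Both averages can also come from a single `groupby`, which reports the designation counts alongside the means. This sketch collapses every label other than `Rural` into `Urban`, the same rule as the comparison above; if your download uses additional rural-status labels, decide explicitly how to group them before comparing. The frame is a small illustrative stand-in.

```python
import pandas as pd

hpsa = pd.DataFrame({
    'rural_status': ['Urban', 'Rural', 'Urban', 'Urban'],
    'hpsa_score': [18, 22, 15, 20],
})

# Keep 'Rural' as-is and relabel everything else as 'Urban'
area_type = hpsa['rural_status'].where(hpsa['rural_status'] == 'Rural', 'Urban')

# One row per area type with the number of designations and the mean score
comparison = hpsa.groupby(area_type)['hpsa_score'].agg(['count', 'mean'])
print(comparison)
```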

## 9. State-Level Summary

Compute summary statistics for `hpsa_score` and `population` in each state.

In [None]:
# Summarize by state
state_summary = connector.summarize_by_state(
    hpsa_data,
    metrics=['hpsa_score', 'population']
)

print("State-level HPSA Summary:")
print(state_summary.sort_values('hpsa_score_mean', ascending=False))
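The `hpsa_score_mean` column suggests that `summarize_by_state` returns per-state aggregates named `<metric>_<statistic>`. A rough pandas equivalent using named aggregation is sketched below; the particular statistics (mean, max, sum, count) and the `designations` column are assumptions, not a description of the connector's actual output.

```python
import pandas as pd

hpsa = pd.DataFrame({
    'state_abbreviation': ['CA', 'CA', 'TX', 'TX', 'NY'],
    'hpsa_score': [18, 22, 20, 17, 25],
    'population': [125000, 45000, 156000, 78000, 198000],
})

# Named aggregation: each keyword becomes an output column built from (column, function)
summary = hpsa.groupby('state_abbreviation').agg(
    hpsa_score_mean=('hpsa_score', 'mean'),
    hpsa_score_max=('hpsa_score', 'max'),
    population_sum=('population', 'sum'),
    designations=('hpsa_score', 'count'),
)
ranked = summary.sort_values('hpsa_score_mean', ascending=False)
print(ranked)
```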

## 10. Visualization: HPSA Scores by State

In [None]:
# Bar chart of average HPSA scores by state
fig, ax = plt.subplots(figsize=(12, 6))

state_avg = hpsa_data.groupby('state_abbreviation')['hpsa_score'].mean().sort_values(ascending=False)

# Color code bars by severity: red = critical (≥20), orange = high need (15-19), yellow = moderate (<15)
colors = ['red' if x >= 20 else 'orange' if x >= 15 else 'yellow' for x in state_avg.values]
ax.bar(state_avg.index, state_avg.values, color=colors)

ax.axhline(y=20, color='red', linestyle='--', label='Critical (≥20)', alpha=0.7)
ax.axhline(y=15, color='orange', linestyle='--', label='High Need (≥15)', alpha=0.7)

ax.set_xlabel('State', fontsize=12)
ax.set_ylabel('Average HPSA Score', fontsize=12)
ax.set_title('Average HPSA Scores by State', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print("Legend:")
print("  Red: Critical shortage (≥20)")
print("  Orange: High need (15-19)")
print("  Yellow: Moderate shortage (<15)")

## 11. Visualization: Discipline Distribution

In [None]:
# Pie chart of HPSA disciplines
fig, ax = plt.subplots(figsize=(10, 8))

discipline_counts = hpsa_data['hpsa_discipline_class'].value_counts()

ax.pie(discipline_counts.values, labels=discipline_counts.index, autopct='%1.1f%%',
       startangle=90, colors=['#FF6B6B', '#4ECDC4', '#45B7D1'])
ax.set_title('HPSA Designations by Discipline', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("Discipline breakdown:")
for discipline, count in discipline_counts.items():
    print(f"  {discipline}: {count} ({count/len(hpsa_data)*100:.1f}%)")
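pandas can compute these shares directly: `value_counts(normalize=True)` returns fractions that sum to 1, which avoids dividing by `len(hpsa_data)` by hand. A short sketch using the discipline values from the sample data:

```python
import pandas as pd

disciplines = pd.Series(['Primary Care', 'Dental Health', 'Mental Health', 'Primary Care',
                         'Mental Health', 'Primary Care', 'Dental Health', 'Primary Care',
                         'Mental Health', 'Primary Care'])

# normalize=True returns fractions of the total instead of raw counts
shares = disciplines.value_counts(normalize=True)
for discipline, share in shares.items():
    print(f"  {discipline}: {share:.1%}")
```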

## 12. County-Level Analysis

Examine specific counties for detailed shortage information.

In [None]:
# Get data for Los Angeles County
la_county = connector.get_county_data(hpsa_data, 'Los Angeles', 'CA')

if len(la_county) > 0:
    print(f"Los Angeles County, CA - HPSA Designations: {len(la_county)}\n")
    print("Shortage areas:")
    print(la_county[['hpsa_name', 'hpsa_discipline_class', 'hpsa_score', 
                     'designation_type', 'population']])
else:
    print("No HPSA data found for Los Angeles County, CA")
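An exact name match returns nothing if the county names in your download are spelled differently, for example with a trailing "County" (check your file). A tolerant lookup is sketched below. `find_county` is a hypothetical helper, and because it uses substring matching it can also return counties whose names merely contain the search text, so review the results.

```python
import pandas as pd

hpsa = pd.DataFrame({
    'state_abbreviation': ['CA', 'CA', 'TX'],
    'county_equivalent_name': ['Los Angeles County', 'Fresno', 'Los Angeles'],
    'hpsa_score': [18, 22, 20],
})


def find_county(df, county, state):
    """Hypothetical helper: case-insensitive substring match on county within one state."""
    in_state = df['state_abbreviation'].str.upper() == state.upper()
    # regex=False treats the search text literally; na=False skips missing names
    name_match = df['county_equivalent_name'].str.contains(county, case=False,
                                                           regex=False, na=False)
    return df[in_state & name_match]


la = find_county(hpsa, 'los angeles', 'CA')
print(la)
```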

## 13. Population Impact Analysis

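Analyze the total population affected by health professional shortages. Because one area can hold separate designations for more than one discipline, summing `population` across designations may count the same residents more than once, so treat these totals as upper bounds.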

In [None]:
# Total population in shortage areas
total_pop = hpsa_data['population'].sum()
print(f"Total population in HPSA-designated areas: {total_pop:,}")

# By discipline
print("\nPopulation by discipline:")
for discipline in connector.HPSA_DISCIPLINES:
    disc_data = connector.filter_by_discipline(hpsa_data, discipline)
    disc_pop = disc_data['population'].sum()
    print(f"  {discipline}: {disc_pop:,}")

# Critical shortage areas
critical_pop = critical_areas['population'].sum()
print(f"\nPopulation in critical shortage areas (score ≥20): {critical_pop:,}")
print(f"Percentage of total: {critical_pop/total_pop*100:.1f}%")
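The per-discipline loop and the critical-area share can also be written with `groupby` and a boolean mask, without calling the connector. A minimal sketch on an illustrative stand-in frame, using the same `>= 20` threshold as above:

```python
import pandas as pd

hpsa = pd.DataFrame({
    'hpsa_discipline_class': ['Primary Care', 'Dental Health', 'Mental Health', 'Primary Care'],
    'hpsa_score': [18, 22, 15, 20],
    'population': [125000, 45000, 89000, 156000],
})

# Total designated population per discipline
pop_by_discipline = hpsa.groupby('hpsa_discipline_class')['population'].sum()

# Share of designated population living in critical areas (score >= 20)
critical_pop = hpsa.loc[hpsa['hpsa_score'] >= 20, 'population'].sum()
critical_share = critical_pop / hpsa['population'].sum()

print(pop_by_discipline)
print(f"Critical share: {critical_share:.1%}")
```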

## 14. Key Findings Summary

In [None]:
print("="*60)
print("HRSA HEALTH PROFESSIONAL SHORTAGE AREAS - KEY FINDINGS")
print("="*60)

print(f"\nTotal HPSA Designations: {len(hpsa_data)}")
print(f"Total Population Affected: {hpsa_data['population'].sum():,}")

print("\nBy Discipline:")
for discipline in connector.HPSA_DISCIPLINES:
    count = len(connector.filter_by_discipline(hpsa_data, discipline))
    print(f"   {discipline}: {count} designations")

print("\nSeverity Breakdown:")
critical = len(connector.get_high_need_areas(hpsa_data, 20))
high = len(connector.get_high_need_areas(hpsa_data, 15)) - critical
moderate = len(hpsa_data) - critical - high
print(f"   Critical (≥20): {critical} ({critical/len(hpsa_data)*100:.1f}%)")
print(f"   High Need (15-19): {high} ({high/len(hpsa_data)*100:.1f}%)")
print(f"   Moderate (<15): {moderate} ({moderate/len(hpsa_data)*100:.1f}%)")

print("\nRural vs Urban:")
rural_count = len(connector.get_rural_areas(hpsa_data))
urban_count = len(hpsa_data) - rural_count
print(f"   Rural: {rural_count} ({rural_count/len(hpsa_data)*100:.1f}%)")
print(f"   Urban: {urban_count} ({urban_count/len(hpsa_data)*100:.1f}%)")

print(f"\nAverage HPSA Score: {hpsa_data['hpsa_score'].mean():.2f}")
print(f"   Highest Score: {hpsa_data['hpsa_score'].max()}")
print(f"   Lowest Score: {hpsa_data['hpsa_score'].min()}")

print("\n" + "="*60)

## 15. Next Steps

**Further Analysis:**
- Load MUA/MUP data using `connector.load_mua_data()`
- Load Health Center data using `connector.load_health_center_data()`
- Combine HPSA, MUA, and Health Center data for comprehensive analysis
- Create geographic visualizations using geographic coordinates
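As a starting point for combining datasets, the sketch below counts health center sites per county and attaches the counts to each HPSA row with a left merge. The health center column names (`site_name` in particular) are hypothetical; check the headers of your downloaded files. Joining on county names is fragile, so a shared code column, such as a county FIPS code if both files include one, would be a more robust key.

```python
import pandas as pd

# Illustrative inputs; the health center columns are assumed, not HRSA's actual headers
hpsa = pd.DataFrame({
    'state_abbreviation': ['CA', 'CA', 'TX'],
    'county_equivalent_name': ['Los Angeles', 'Fresno', 'Harris'],
    'hpsa_score': [18, 22, 20],
})
centers = pd.DataFrame({
    'state_abbreviation': ['CA', 'CA', 'TX'],
    'county_equivalent_name': ['Los Angeles', 'Los Angeles', 'Dallas'],
    'site_name': ['Site A', 'Site B', 'Site C'],  # hypothetical column
})

# Count sites per (state, county), then attach the counts to every HPSA row
keys = ['state_abbreviation', 'county_equivalent_name']
site_counts = centers.groupby(keys).size().rename('health_center_sites').reset_index()
combined = hpsa.merge(site_counts, on=keys, how='left')

# Counties with no matching sites get NaN from the left merge; report them as 0
combined['health_center_sites'] = combined['health_center_sites'].fillna(0).astype(int)
print(combined)
```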

**Data Download:**
Visit https://data.hrsa.gov/data/download to download:
- HPSA data (all disciplines)
- MUA/MUP designations
- Health Center locations and services

**Documentation:**
For more information on HPSA scoring and methodology:
https://bhw.hrsa.gov/workforce-shortage-areas/hpsa-criteria

---

KR-Labs™ is a trademark of Quipu Research Labs, LLC.