# County Health Rankings & Roadmaps Connector - Quickstart Guide

This notebook demonstrates how to use the **CountyHealthRankingsConnector** to analyze county-level health outcomes, health factors, and rankings data from the County Health Rankings & Roadmaps program.

**Data Source:** County Health Rankings & Roadmaps (https://www.countyhealthrankings.org/)

**Rankings Methodology:** counties are ranked separately on two composites, and only against other counties in the same state.
- Health Outcomes: Length of Life + Quality of Life
- Health Factors: Health Behaviors + Clinical Care + Social & Economic Factors + Physical Environment

© 2025 KR-Labs. All rights reserved.

## 1. Setup and Installation

First, ensure the KRL Data Connectors package is installed.

In [None]:
# Install the package (uncomment if needed)
# !pip install krl-data-connectors

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from pathlib import Path

from krl_data_connectors.health import CountyHealthRankingsConnector

# Configure plotting
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

print("✅ Imports successful!")

## 2. Initialize Connector

The County Health Rankings connector works with annual CSV releases from the CHR&R program.

In [None]:
# Initialize the connector
connector = CountyHealthRankingsConnector()

print("County Health Rankings Connector initialized!")
print(f"\nRanking categories: {len(connector.RANKING_COLUMNS)} total")
print(f"Health outcome measures: {len(connector.HEALTH_OUTCOME_MEASURES)}")
print(f"Health factor measures: {len(connector.HEALTH_FACTOR_MEASURES)}")

## 3. Create Sample Data

**Note:** Download actual data from https://www.countyhealthrankings.org/health-data

For this example, we'll create sample data to demonstrate functionality.

In [None]:
# Create sample County Health Rankings data
np.random.seed(42)

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
counties_per_state = 6

data_list = []
for state in states:
    for i in range(1, counties_per_state + 1):
        data_list.append({
            'state': state,
            'county': f'{state[:3]} County {i}',
            'health_outcomes_rank': i,
            'health_factors_rank': np.random.randint(1, counties_per_state + 1),
            'length_of_life_rank': np.random.randint(1, counties_per_state + 1),
            'quality_of_life_rank': np.random.randint(1, counties_per_state + 1),
            'health_behaviors_rank': np.random.randint(1, counties_per_state + 1),
            'clinical_care_rank': np.random.randint(1, counties_per_state + 1),
            'social_economic_factors_rank': np.random.randint(1, counties_per_state + 1),
            'physical_environment_rank': np.random.randint(1, counties_per_state + 1),
            'premature_death': 5000 + i * 500 + np.random.randint(-200, 200),
            'poor_or_fair_health': 0.10 + i * 0.02 + np.random.uniform(-0.01, 0.01),
            'adult_smoking': 0.12 + i * 0.015 + np.random.uniform(-0.01, 0.01),
            'adult_obesity': 0.25 + i * 0.02 + np.random.uniform(-0.02, 0.02),
            'uninsured': 0.08 + i * 0.01 + np.random.uniform(-0.005, 0.005),
            'primary_care_physicians': 1500 - i * 100 + np.random.randint(-50, 50),
            'unemployment': 0.04 + i * 0.005 + np.random.uniform(-0.002, 0.002),
            'children_in_poverty': 0.15 + i * 0.02 + np.random.uniform(-0.01, 0.01),
            'high_school_graduation': 0.90 - i * 0.02 + np.random.uniform(-0.01, 0.01),
            'air_pollution_particulate_matter': 8.0 + i * 0.5 + np.random.uniform(-0.5, 0.5)
        })

sample_chr = pd.DataFrame(data_list)

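# Save to a temporary CSV (tempfile keeps the path portable across operating systems)
import tempfile
temp_dir = Path(tempfile.gettempdir()) / 'chr_demo'
temp_dir.mkdir(parents=True, exist_ok=True)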
chr_file = temp_dir / 'sample_chr_2025.csv'
sample_chr.to_csv(chr_file, index=False)

print(f"✅ Sample CHR data created ({len(sample_chr)} counties)")
print(f"📁 Saved to: {chr_file}")
print(f"\nStates included: {', '.join(states)}")
print(f"Counties per state: {counties_per_state}")
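The cell above draws the secondary ranks with `np.random.randint`, so two counties in the same state can share a rank and some ranks may never appear. If you want the synthetic ranks to behave like real within-state rankings, one option is a random permutation of 1..N per state. The helper `assign_within_state_ranks` below is a hypothetical sketch, not part of the connector.

```python
import numpy as np
import pandas as pd


def assign_within_state_ranks(df, rank_cols, seed=42):
    """Hypothetical helper: give each county a unique rank 1..N within its state."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in rank_cols:
        # For every state, draw a random permutation of 1..N (no duplicate ranks)
        out[col] = out.groupby('state')['county'].transform(
            lambda s: pd.Series(rng.permutation(len(s)) + 1, index=s.index)
        )
    return out


demo = pd.DataFrame({'state': ['A'] * 4 + ['B'] * 3, 'county': list('abcdefg')})
ranked = assign_within_state_ranks(demo, ['clinical_care_rank'])
print(ranked)
```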

## 4. Load Rankings Data

Load the annual County Health Rankings data.
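The connector's loader is not shown in this guide, so the following is only a sketch of the kind of header cleanup a CSV loader might perform, assuming raw headers need converting to the snake_case names used throughout this notebook. `normalize_columns` and the CSV text are hypothetical; real CHR&R files define their own headers (see the data dictionary).

```python
import io
import re

import pandas as pd


def normalize_columns(df):
    """Hypothetical helper: lower-case headers and replace non-alphanumerics with '_'."""
    cleaned = []
    for col in df.columns:
        name = re.sub(r'[^0-9a-zA-Z]+', '_', str(col).strip().lower())
        cleaned.append(name.strip('_'))
    out = df.copy()
    out.columns = cleaned
    return out


# Illustrative CSV text only
csv_text = "State,County,Health Outcomes Rank,Adult Obesity\nCalifornia,Cal County 1,1,0.22\n"
raw = pd.read_csv(io.StringIO(csv_text))
chr_demo = normalize_columns(raw)
print(list(chr_demo.columns))
```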

In [None]:
# Load CHR data
chr_data = connector.load_rankings_data(chr_file)

print(f"Loaded {len(chr_data)} county records\n")
print("First few records:")
chr_data.head()

## 5. State-Level Analysis

Analyze health rankings for a specific state.
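Conceptually, a state-level view is a row filter on the `state` column. The hypothetical `state_subset` below sketches that with a case-insensitive match and sorts by outcome rank; it is an illustration, not the connector's `get_state_data()` implementation.

```python
import pandas as pd


def state_subset(df, state):
    """Hypothetical state filter: case-insensitive match, best outcome rank first."""
    mask = df['state'].str.casefold() == state.casefold()
    return df.loc[mask].sort_values('health_outcomes_rank').reset_index(drop=True)


demo = pd.DataFrame({
    'state': ['California', 'California', 'Texas'],
    'county': ['Cal County 2', 'Cal County 1', 'Tex County 1'],
    'health_outcomes_rank': [2, 1, 1],
})
print(state_subset(demo, 'california'))
```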

In [None]:
# Get California data
ca_data = connector.get_state_data(chr_data, 'California')

print(f"California: {len(ca_data)} counties\n")
print("Health Outcomes Rankings:")
print(ca_data[['county', 'health_outcomes_rank', 'health_factors_rank']].head(10))

## 6. Top Performers

Identify counties with the best health outcomes. CHR&R ranks counties within each state, so rank 1 means best in that county's state.
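Because every state has its own rank 1, a pooled "top n" across states mostly picks out ties among state leaders rather than the healthiest counties overall. A short pandas sketch on a toy frame contrasts a pooled `nsmallest` with a per-state leader lookup via `groupby(...).idxmin()`; it does not reproduce `get_top_performers()`.

```python
import pandas as pd

demo = pd.DataFrame({
    'state': ['California', 'California', 'Texas', 'Texas'],
    'county': ['Cal County 1', 'Cal County 2', 'Tex County 1', 'Tex County 2'],
    'health_outcomes_rank': [1, 2, 1, 2],
})

# Pooled "top n": ranks tie across states, so this just collects each state's leader
pooled_top = demo.nsmallest(2, 'health_outcomes_rank')

# Per-state leader: the row with the smallest rank in each state
state_leaders = demo.loc[demo.groupby('state')['health_outcomes_rank'].idxmin()]
print(state_leaders[['state', 'county']])
```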

In [None]:
# Get top 5 performers in health outcomes
top_outcomes = connector.get_top_performers(chr_data, n=5, rank_column='health_outcomes_rank')

print("Top 5 Counties - Health Outcomes:")
print(top_outcomes[['state', 'county', 'health_outcomes_rank', 'premature_death', 
                     'poor_or_fair_health']].to_string(index=False))

# Get top performers in health factors
top_factors = connector.get_top_performers(chr_data, n=5, rank_column='health_factors_rank')

print("\nTop 5 Counties - Health Factors:")
print(top_factors[['state', 'county', 'health_factors_rank', 'adult_smoking', 
                    'adult_obesity']].to_string(index=False))

## 7. Poor Performers

Identify counties in the bottom 25% (poorest health outcomes).
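What `percentile=75` means inside `get_poor_performers()` is not documented here. One plausible reading, assumed for this sketch, is "rank at or above the 75th percentile of ranks within the county's state". The hypothetical `bottom_quartile` helper implements that reading with a per-state quantile cutoff.

```python
import pandas as pd


def bottom_quartile(df, rank_column='health_outcomes_rank'):
    """Hypothetical sketch: rows at or above the within-state 75th percentile of rank."""
    # transform broadcasts each state's cutoff back onto that state's rows
    cutoff = df.groupby('state')[rank_column].transform(lambda s: s.quantile(0.75))
    return df[df[rank_column] >= cutoff]


demo = pd.DataFrame({
    'state': ['A'] * 4 + ['B'] * 4,
    'health_outcomes_rank': [1, 2, 3, 4, 1, 2, 3, 4],
})
print(bottom_quartile(demo))
```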

In [None]:
# Get poor performers (bottom 25%)
poor_outcomes = connector.get_poor_performers(chr_data, percentile=75, 
                                               rank_column='health_outcomes_rank')

print(f"Counties in bottom 25% for health outcomes: {len(poor_outcomes)}\n")
print("Worst performing counties:")
print(poor_outcomes[['state', 'county', 'health_outcomes_rank', 'premature_death']]
      .sort_values('health_outcomes_rank', ascending=False)
      .head(10)
      .to_string(index=False))

## 8. Filter by Health Measure

Find counties with high adult obesity rates.
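A threshold filter like this reduces to a boolean mask. The hypothetical `filter_measure` below mirrors the `above` flag. Whether the connector uses a strict (`>`) or inclusive (`>=`) comparison is an assumption here, and it matters for counties sitting exactly on the threshold.

```python
import pandas as pd


def filter_measure(df, measure, threshold, above=True):
    """Hypothetical sketch of a threshold filter; strict inequality is an assumption."""
    mask = df[measure] > threshold if above else df[measure] < threshold
    return df[mask]


demo = pd.DataFrame({'county': ['A', 'B', 'C'], 'adult_obesity': [0.28, 0.30, 0.35]})
print(filter_measure(demo, 'adult_obesity', 0.30))  # county B (exactly 0.30) is excluded
```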

In [None]:
# Get counties with adult obesity > 30%
high_obesity = connector.filter_by_measure(chr_data, 'adult_obesity', 0.30, above=True)

print(f"Counties with adult obesity > 30%: {len(high_obesity)}\n")
print("High obesity counties:")
print(high_obesity[['state', 'county', 'adult_obesity', 'health_outcomes_rank']]
      .sort_values('adult_obesity', ascending=False).head(10).to_string(index=False))

## 9. Compare to State Average

Compare county values to their state averages.
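A county-versus-state comparison can be sketched with `groupby(...).transform('mean')`, which broadcasts each state's mean back onto its rows. This is a hypothetical stand-in for `compare_to_state()`, assuming `diff_from_avg` is the county value minus the state mean. An unweighted mean of county values generally differs from a population-level state figure.

```python
import pandas as pd


def compare_to_state_mean(df, measure):
    """Hypothetical sketch: unweighted state mean of county values and county minus mean."""
    out = df.copy()
    out['state_avg'] = out.groupby('state')[measure].transform('mean')
    out['diff_from_avg'] = out[measure] - out['state_avg']
    return out


demo = pd.DataFrame({
    'state': ['A', 'A', 'B', 'B'],
    'county': ['A1', 'A2', 'B1', 'B2'],
    'premature_death': [6000.0, 8000.0, 5000.0, 7000.0],
})
print(compare_to_state_mean(demo, 'premature_death'))
```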

In [None]:
# Compare premature death to state average
comparison = connector.compare_to_state(chr_data, 'premature_death')

print("Counties vs State Average (Premature Death):")
print(comparison[['state', 'county', 'premature_death', 'state_avg', 'diff_from_avg']]
      .sort_values('diff_from_avg', ascending=False).head(10).to_string(index=False))

print("\nNote: For premature death (higher is worse), a positive diff_from_avg means the county is worse than its state average")

## 10. Available Measures Discovery

Explore all available health measures in the dataset.
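If you only have the DataFrame, a rough measure inventory can come from column names alone. The hypothetical `group_columns` sketch below assumes rank columns end in `_rank` (as in the sample data) and treats everything else except the identifier columns as a measure; `get_available_measures()` may categorize more finely.

```python
import pandas as pd


def group_columns(df, id_columns=('state', 'county')):
    """Hypothetical sketch: split columns into rank columns and measure columns by name."""
    ranks = [c for c in df.columns if c.endswith('_rank')]
    measures = [c for c in df.columns if c not in ranks and c not in id_columns]
    return {'Rankings': ranks, 'Measures': measures}


demo = pd.DataFrame(columns=['state', 'county', 'health_outcomes_rank',
                             'clinical_care_rank', 'adult_obesity', 'uninsured'])
print(group_columns(demo))
```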

In [None]:
# Get available measures
measures = connector.get_available_measures(chr_data)

print("Available Health Measures:\n")
for category, measure_list in measures.items():
    print(f"\n{category}:")
    for measure in measure_list:
        print(f"  - {measure}")

## 11. State Summary Statistics

Generate comprehensive statistics by state.
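Per-state statistics of this kind map onto a plain pandas `groupby().agg()`. The sketch below, on a toy frame, produces one row per state with `(measure, statistic)` column pairs; the choice of statistics is illustrative, and `summarize_by_state()` may return a different layout.

```python
import pandas as pd

demo = pd.DataFrame({
    'state': ['A', 'A', 'B', 'B'],
    'adult_obesity': [0.25, 0.35, 0.30, 0.32],
    'uninsured': [0.08, 0.12, 0.10, 0.10],
})

# One row per state; columns are (measure, statistic) pairs
summary = (demo.groupby('state')[['adult_obesity', 'uninsured']]
               .agg(['mean', 'min', 'max', 'count']))
print(summary.round(3))
```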

In [None]:
# Summarize key measures by state
state_summary = connector.summarize_by_state(
    chr_data,
    measures=['premature_death', 'adult_obesity', 'uninsured', 'unemployment']
)

print("State-Level Health Summary:")
print(state_summary.round(2))

## 12. Visualization: Health Outcomes by State

In [None]:
# Box plot of health outcomes rank by state
fig, ax = plt.subplots(figsize=(14, 6))

chr_data.boxplot(column='health_outcomes_rank', by='state', ax=ax)
ax.set_title('Health Outcomes Rankings Distribution by State', fontsize=14, fontweight='bold')
ax.set_xlabel('State', fontsize=12)
ax.set_ylabel('Health Outcomes Rank (1=Best)', fontsize=12)
plt.suptitle('')  # Remove default title

plt.tight_layout()
plt.show()

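print("Note: Lower rank = better health outcomes (1 is best). Ranks are assigned within each state, "
      "so each state's ranks run from 1 to its number of counties; compare underlying measures to contrast states.")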
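A quick arithmetic check shows why within-state ranks carry no information about differences between states: when a state's n counties hold untied ranks 1..n, their mean rank is always (n + 1) / 2. The toy example below confirms this for two states of different sizes.

```python
import pandas as pd

demo = pd.DataFrame({
    'state': ['A'] * 6 + ['B'] * 4,
    'health_outcomes_rank': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4],
})

state_mean_rank = demo.groupby('state')['health_outcomes_rank'].mean()
county_counts = demo.groupby('state').size()

# With untied ranks 1..n, the mean rank is (n + 1) / 2 regardless of how healthy the state is
expected = (county_counts + 1) / 2
print(pd.DataFrame({'mean_rank': state_mean_rank, 'expected': expected}))
```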

## 13. Visualization: Health Behaviors Comparison

In [None]:
# Compare smoking and obesity rates by state
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Adult smoking
state_smoking = chr_data.groupby('state')['adult_smoking'].mean().sort_values()
ax1.barh(state_smoking.index, state_smoking.values * 100, color='#E74C3C')
ax1.set_xlabel('Adult Smoking Rate (%)', fontsize=12)
ax1.set_title('Average Adult Smoking Rate by State', fontsize=13, fontweight='bold')
ax1.grid(axis='x', alpha=0.3)

# Adult obesity
state_obesity = chr_data.groupby('state')['adult_obesity'].mean().sort_values()
ax2.barh(state_obesity.index, state_obesity.values * 100, color='#F39C12')
ax2.set_xlabel('Adult Obesity Rate (%)', fontsize=12)
ax2.set_title('Average Adult Obesity Rate by State', fontsize=13, fontweight='bold')
ax2.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

## 14. Visualization: Social Determinants

In [None]:
# Scatter plot: Poverty vs Health Outcomes
fig, ax = plt.subplots(figsize=(12, 8))

for state in chr_data['state'].unique():
    state_data = chr_data[chr_data['state'] == state]
    ax.scatter(state_data['children_in_poverty'] * 100, 
               state_data['health_outcomes_rank'],
               label=state, alpha=0.7, s=100)

ax.set_xlabel('Children in Poverty (%)', fontsize=12)
ax.set_ylabel('Health Outcomes Rank (1=Best)', fontsize=12)
ax.set_title('Child Poverty vs Health Outcomes', fontsize=14, fontweight='bold')
ax.legend(loc='best', framealpha=0.9)
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate correlation
corr = chr_data[['children_in_poverty', 'health_outcomes_rank']].corr().iloc[0, 1]
print(f"\nCorrelation: {corr:.3f}")
print("Note: A positive correlation means higher child poverty goes with a larger (worse) rank number; "
      "in this synthetic data that link is built in by construction")
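Ranks are ordinal, so a rank-based (Spearman) correlation is often a better fit than Pearson's coefficient. `DataFrame.corr(method='spearman')` computes it directly. The toy data below is hypothetical and chosen so the two variables are perfectly monotonic but not linear.

```python
import pandas as pd

demo = pd.DataFrame({
    'children_in_poverty': [0.10, 0.15, 0.30, 0.20],
    'health_outcomes_rank': [1, 2, 4, 3],
})

pearson = demo.corr().loc['children_in_poverty', 'health_outcomes_rank']
spearman = demo.corr(method='spearman').loc['children_in_poverty', 'health_outcomes_rank']
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```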

## 15. Heatmap: State Health Metrics

In [None]:
# Create heatmap of average values by state
metrics = ['adult_smoking', 'adult_obesity', 'uninsured', 'unemployment', 'children_in_poverty']
state_metrics = chr_data.groupby('state')[metrics].mean()

# Convert proportions to percentages
state_metrics_pct = state_metrics * 100

fig, ax = plt.subplots(figsize=(12, 6))
sns.heatmap(state_metrics_pct.T, annot=True, fmt='.1f', cmap='RdYlGn_r', 
            cbar_kws={'label': 'Percentage (%)'}, ax=ax)
ax.set_title('Health Metrics Heatmap by State', fontsize=14, fontweight='bold')
ax.set_xlabel('State', fontsize=12)
ax.set_ylabel('Health Metric', fontsize=12)

plt.tight_layout()
plt.show()

print("Note: Red = higher (worse), Green = lower (better)")

## 16. County-Specific Deep Dive

Examine a specific county in detail.
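County names repeat across states (many states have a Washington County, for example), which is why a lookup needs both county and state. The hypothetical `find_county` helper sketches such a lookup with case-insensitive matching; it is not the connector's `get_county_data()`.

```python
import pandas as pd


def find_county(df, county, state):
    """Hypothetical lookup: case-insensitive match on both county and state names."""
    mask = ((df['county'].str.casefold() == county.casefold())
            & (df['state'].str.casefold() == state.casefold()))
    return df.loc[mask]


demo = pd.DataFrame({
    'state': ['California', 'Texas'],
    'county': ['Cal County 1', 'Tex County 1'],
})
match = find_county(demo, 'cal county 1', 'California')
print(match if not match.empty else "County not found in dataset")
```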

In [None]:
# Get specific county data
county_data = connector.get_county_data(chr_data, 'Cal County 1', 'California')

if len(county_data) > 0:
    county_row = county_data.iloc[0]
    
    print(f"\n{'='*60}")
    print(f"COUNTY HEALTH PROFILE: {county_row['county']}, {county_row['state']}")
    print(f"{'='*60}\n")
    
    print("RANKINGS:")
    print(f"  Health Outcomes Rank: #{county_row['health_outcomes_rank']}")
    print(f"  Health Factors Rank: #{county_row['health_factors_rank']}")
    
    print("\nHEALTH OUTCOMES:")
    print(f"  Premature Death: {county_row['premature_death']:.0f} years of potential life lost per 100,000")
    print(f"  Poor/Fair Health: {county_row['poor_or_fair_health']:.1%}")
    
    print("\nHEALTH BEHAVIORS:")
    print(f"  Adult Smoking: {county_row['adult_smoking']:.1%}")
    print(f"  Adult Obesity: {county_row['adult_obesity']:.1%}")
    
    print("\nCLINICAL CARE:")
    print(f"  Uninsured: {county_row['uninsured']:.1%}")
    print(f"  Primary Care Physicians: {county_row['primary_care_physicians']:.0f} per 100,000")
    
    print("\nSOCIAL & ECONOMIC:")
    print(f"  Unemployment: {county_row['unemployment']:.1%}")
    print(f"  Children in Poverty: {county_row['children_in_poverty']:.1%}")
    print(f"  HS Graduation: {county_row['high_school_graduation']:.1%}")
    
    print(f"\n{'='*60}")
else:
    print("County not found in dataset")

## 17. Trends Analysis (Multi-Year)

If you have multi-year data, use `load_trends_data()` to analyze changes over time.
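Without `load_trends_data()`, separate annual files can still be stacked by hand. The sketch below assumes each year has already been loaded into a DataFrame with the same columns (the frames and years here are hypothetical), adds a `year` column, and pivots to one row per county. Check the CHR&R methodology notes before reading much into year-over-year rank changes, since measures and methods change between releases.

```python
import pandas as pd

# Hypothetical per-year extracts; in practice each would come from a separate annual CSV
frames = {
    2024: pd.DataFrame({'state': ['A', 'A'], 'county': ['A1', 'A2'],
                        'health_outcomes_rank': [2, 1]}),
    2025: pd.DataFrame({'state': ['A', 'A'], 'county': ['A1', 'A2'],
                        'health_outcomes_rank': [1, 2]}),
}

# Stack the years with an explicit year column
stacked = pd.concat([df.assign(year=yr) for yr, df in frames.items()], ignore_index=True)

# One row per county, one column per year
wide = stacked.pivot_table(index=['state', 'county'], columns='year',
                           values='health_outcomes_rank')

# Negative change = moved toward rank 1 (improved)
wide['rank_change'] = wide[2025] - wide[2024]
print(wide)
```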

In [None]:
# Note: This requires multi-year CHR data files
# Example usage:
# trends_data = connector.load_trends_data('chr_trends_2015_2025.csv')
# 
# Analyze trends:
# - Adult smoking rates over time
# - Obesity trends
# - Uninsured rate changes
# - Health outcome improvements/declines

print("Multi-year trends analysis:")
print("- Download annual CHR data for multiple years")
print("- Load each year separately or use a combined trends file")
print("- Track county rankings changes")
print("- Identify improving/declining counties")
print("- Analyze policy impact over time")

## 18. Key Findings Summary

In [None]:
print("="*70)
print("COUNTY HEALTH RANKINGS - KEY FINDINGS SUMMARY")
print("="*70)

print(f"\n📊 Total Counties Analyzed: {len(chr_data)}")
print(f"📍 States Covered: {chr_data['state'].nunique()}")

print("\n🏥 Sample Averages (unweighted mean across sample counties):")
print(f"   Premature Death: {chr_data['premature_death'].mean():.0f} years of potential life lost per 100,000")
print(f"   Poor/Fair Health: {chr_data['poor_or_fair_health'].mean():.1%}")
print(f"   Adult Smoking: {chr_data['adult_smoking'].mean():.1%}")
print(f"   Adult Obesity: {chr_data['adult_obesity'].mean():.1%}")
print(f"   Uninsured Rate: {chr_data['uninsured'].mean():.1%}")
print(f"   Unemployment: {chr_data['unemployment'].mean():.1%}")
print(f"   Children in Poverty: {chr_data['children_in_poverty'].mean():.1%}")

print("\n🎯 Best Performing State (Lowest Avg Premature Death):")
# Within-state ranks average (n + 1) / 2 in every state, so compare a measure instead of ranks
state_premature = chr_data.groupby('state')['premature_death'].mean()
best_state = state_premature.idxmin()
best_value = state_premature.min()
print(f"   {best_state} (Avg: {best_value:.0f} years of potential life lost per 100,000)")

print("\n⚠️ Areas Needing Improvement:")
high_obesity_pct = (chr_data['adult_obesity'] > 0.30).sum() / len(chr_data) * 100
high_uninsured_pct = (chr_data['uninsured'] > 0.10).sum() / len(chr_data) * 100
print(f"   Counties with >30% obesity: {high_obesity_pct:.1f}%")
print(f"   Counties with >10% uninsured: {high_uninsured_pct:.1f}%")

print("\n📈 Correlation Insights:")
corr_poverty_health = chr_data[['children_in_poverty', 'health_outcomes_rank']].corr().iloc[0,1]
corr_obesity_health = chr_data[['adult_obesity', 'health_outcomes_rank']].corr().iloc[0,1]
print(f"   Poverty ↔ Health Outcomes: {corr_poverty_health:.3f}")
print(f"   Obesity ↔ Health Outcomes: {corr_obesity_health:.3f}")

print("\n" + "="*70)

## 19. Next Steps

**Further Analysis:**
- Download actual CHR data from https://www.countyhealthrankings.org/health-data
- Load multi-year data to analyze trends
- Compare urban vs rural counties
- Analyze specific health measures in detail
- Create geographic visualizations (choropleth maps)
- Correlate with other data sources (HRSA, EPA, Census)

**Data Documentation:**
- Methodology: https://www.countyhealthrankings.org/health-data/methodology
- Data Dictionary: https://www.countyhealthrankings.org/health-data/data-dictionary
- Reports: https://www.countyhealthrankings.org/reports

**Ranking Weights** (Health Outcomes and Health Factors are ranked separately, within each state):
- Health Outcomes: Length of Life (50%) + Quality of Life (50%)
- Health Factors:
  - Health Behaviors: 30%
  - Clinical Care: 20%
  - Social & Economic Factors: 40%
  - Physical Environment: 10%
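The weights above can be illustrated with a simplified composite score. This is a sketch, not CHR&R's published procedure: it assumes each Health Factors component has already been summarized as a z-score where higher means worse (the `_z` column names are hypothetical), takes the weighted sum, and ranks counties within each state so the lowest score gets rank 1.

```python
import pandas as pd

# Health Factors weights as listed above; column names are hypothetical
WEIGHTS = {
    'health_behaviors_z': 0.30,
    'clinical_care_z': 0.20,
    'social_economic_z': 0.40,
    'physical_environment_z': 0.10,
}


def rank_health_factors(df):
    """Simplified sketch: weighted sum of component z-scores (higher = worse), ranked within state."""
    out = df.copy()
    out['health_factors_score'] = sum(out[col] * w for col, w in WEIGHTS.items())
    # Lowest (best) score gets rank 1 within each state
    out['health_factors_rank'] = (
        out.groupby('state')['health_factors_score'].rank(method='min').astype(int)
    )
    return out


demo = pd.DataFrame({
    'state': ['A', 'A', 'A', 'B'],
    'county': ['c1', 'c2', 'c3', 'c4'],
    'health_behaviors_z': [0.0, 1.0, -1.0, 0.0],
    'clinical_care_z': [0.0, 0.0, 0.0, 0.0],
    'social_economic_z': [0.0, 0.0, -1.0, 1.0],
    'physical_environment_z': [0.0, 0.0, 0.0, 0.0],
})
print(rank_health_factors(demo)[['state', 'county', 'health_factors_score', 'health_factors_rank']])
```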

---

KR-Labs™ is a trademark of Quipu Research Labs, LLC.