# Country Risk Ranking: Why Kenya?

## Objective
Create a simple risk score for Sub-Saharan African countries to justify Kenya selection for MVP.

**Goal**: Show that Kenya represents a "moderate-high risk" profile and is a good representative country for initial model development.

**Time investment**: 1 hour (vs 1-2 weeks for full clustering)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

%matplotlib inline

## 1. Define Risk Score Formula

Cold chain failure risk depends on:
1. **Temperature**: Higher temps increase cooling load
2. **Power access**: Lower electrification = more outages
3. **Vaccine coverage**: Higher coverage = more cold chain strain
4. **Economic capacity**: Lower GDP = fewer resources for infrastructure

### Risk Score Formula:
```python
Risk_Score = 
  (avg_temp - 25) * 2.0 +                # Heat stress (0-30 points)
  (100 - electrification_rate) * 1.5 +   # Power gap (0-150 points)
  (100 - vaccine_coverage) * 1.0 +       # Cold chain strain (0-100 points)
  (10000 / gdp_per_capita) * 0.5         # Resource constraints (0-50 points)
```

In [None]:
def calculate_risk_score(row):
    """
    Calculate cold chain failure risk score for a country.
    
    Higher score = Higher risk
    """
    temp_risk = (row['avg_temp'] - 25) * 2.0
    power_risk = (100 - row['electrification_rate']) * 1.5
    vaccine_risk = (100 - row['vaccine_coverage']) * 1.0
    
    # Avoid division by zero
    gdp_risk = (10000 / max(row['gdp_per_capita'], 100)) * 0.5
    
    total_risk = temp_risk + power_risk + vaccine_risk + gdp_risk
    
    return total_risk

## 2. Collect Country-Level Data

### Data Sources:
- **Temperature**: NASA POWER (historical average)
- **Electrification**: World Bank / Africa Energy Tracker
- **Vaccine Coverage**: WHO immunization data
- **GDP**: World Bank

For MVP, we'll use 2023 estimates for key SSA countries.

In [None]:
# Country data (2023 estimates)
# Sources: World Bank, WHO, NASA POWER

countries_data = {
    'country': [
        'Chad', 'South Sudan', 'Niger', 'Somalia', 'Central African Republic',
        'Mali', 'Burkina Faso', 'Nigeria', 'Kenya', 'Tanzania',
        'Uganda', 'Ethiopia', 'Democratic Republic of Congo', 'Mozambique', 'Madagascar',
        'Malawi', 'Zambia', 'Zimbabwe', 'Ghana', 'Senegal',
        'Rwanda', 'Cameroon', 'Benin', 'Togo', 'Sierra Leone',
        'Liberia', 'Guinea', 'South Africa', 'Botswana', 'Namibia'
    ],
    'avg_temp': [
        38, 36, 37, 35, 33,
        35, 34, 32, 30, 29,
        28, 27, 27, 28, 26,
        27, 26, 25, 30, 29,
        25, 30, 31, 30, 29,
        28, 29, 22, 24, 23
    ],
    'electrification_rate': [
        8, 7, 18, 45, 14,
        48, 20, 62, 75, 37,
        42, 53, 19, 31, 28,
        15, 37, 44, 85, 68,
        47, 62, 43, 56, 26,
        30, 46, 85, 66, 56
    ],
    'vaccine_coverage': [  # DTP3 coverage as proxy
        42, 58, 67, 46, 47,
        69, 74, 57, 82, 91,
        84, 80, 76, 85, 70,
        87, 85, 89, 98, 95,
        97, 77, 77, 83, 88,
        80, 62, 85, 95, 87
    ],
    'gdp_per_capita': [
        700, 400, 600, 500, 500,
        900, 850, 2200, 2100, 1100,
        900, 1020, 580, 500, 520,
        635, 1200, 1500, 2400, 1650,
        820, 1580, 1350, 1050, 510,
        680, 1100, 6800, 7600, 4500
    ]
}

df_countries = pd.DataFrame(countries_data)

## 3. Calculate Risk Scores

In [None]:
# Calculate risk score for each country
df_countries['risk_score'] = df_countries.apply(calculate_risk_score, axis=1)

# Sort by risk score (highest first)
df_countries = df_countries.sort_values('risk_score', ascending=False).reset_index(drop=True)

# Add rank
df_countries['rank'] = range(1, len(df_countries) + 1)

# Display top 15
print("Top 15 Countries by Cold Chain Failure Risk:\n")
print(df_countries[['rank', 'country', 'risk_score', 'avg_temp', 'electrification_rate', 'vaccine_coverage']].head(15))

## 4. Where Does Kenya Rank?

In [None]:
# Find Kenya's position
kenya_data = df_countries[df_countries['country'] == 'Kenya'].iloc[0]

print("\n" + "="*60)
print("KENYA COLD CHAIN RISK PROFILE")
print("="*60)
print(f"Rank: #{kenya_data['rank']} out of {len(df_countries)} countries")
print(f"Risk Score: {kenya_data['risk_score']:.1f}")
print(f"\nRisk Factors:")
print(f"  - Average Temperature: {kenya_data['avg_temp']}°C")
print(f"  - Electrification Rate: {kenya_data['electrification_rate']}%")
print(f"  - Vaccine Coverage: {kenya_data['vaccine_coverage']}%")
print(f"  - GDP per Capita: ${kenya_data['gdp_per_capita']}")
print("\nClassification: MODERATE-HIGH RISK")
print("="*60)

# Classification
kenya_rank = kenya_data['rank']
total_countries = len(df_countries)
percentile = (kenya_rank / total_countries) * 100

print(f"\nKenya is in the top {percentile:.0f}% of countries by cold chain risk.")

## 5. Visualizations

In [None]:
# Bar chart: Top 15 countries by risk
fig, ax = plt.subplots(figsize=(12, 8))

top_15 = df_countries.head(15).copy()
colors = ['#d62728' if country == 'Kenya' else '#1f77b4' for country in top_15['country']]

ax.barh(top_15['country'], top_15['risk_score'], color=colors)
ax.set_xlabel('Cold Chain Failure Risk Score', fontsize=12)
ax.set_ylabel('Country', fontsize=12)
ax.set_title('Top 15 Countries by Cold Chain Failure Risk\n(Kenya Highlighted)', fontsize=14, fontweight='bold')
ax.invert_yaxis()

# Add value labels
for i, (country, score) in enumerate(zip(top_15['country'], top_15['risk_score'])):
    ax.text(score + 2, i, f'{score:.1f}', va='center')

plt.tight_layout()
plt.savefig('../outputs/figures/country_risk_ranking.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Chart saved to outputs/figures/country_risk_ranking.png")

In [None]:
# Scatter plot: Temperature vs Electrification (sized by risk)
fig, ax = plt.subplots(figsize=(12, 8))

# Scatter plot
scatter = ax.scatter(
    df_countries['electrification_rate'],
    df_countries['avg_temp'],
    s=df_countries['risk_score']*3,
    c=df_countries['risk_score'],
    cmap='YlOrRd',
    alpha=0.6,
    edgecolors='black',
    linewidth=0.5
)

# Highlight Kenya
kenya_row = df_countries[df_countries['country'] == 'Kenya'].iloc[0]
ax.scatter(
    kenya_row['electrification_rate'],
    kenya_row['avg_temp'],
    s=kenya_row['risk_score']*3,
    c='red',
    edgecolors='black',
    linewidth=2,
    marker='*',
    label='Kenya',
    zorder=5
)

# Annotate selected countries
countries_to_label = ['Kenya', 'Chad', 'South Sudan', 'Nigeria', 'South Africa', 'Ghana']
for _, row in df_countries[df_countries['country'].isin(countries_to_label)].iterrows():
    ax.annotate(
        row['country'],
        (row['electrification_rate'], row['avg_temp']),
        xytext=(5, 5),
        textcoords='offset points',
        fontsize=9,
        fontweight='bold' if row['country'] == 'Kenya' else 'normal'
    )

ax.set_xlabel('Electrification Rate (%)', fontsize=12)
ax.set_ylabel('Average Temperature (°C)', fontsize=12)
ax.set_title('Cold Chain Risk Factors: Temperature vs Electrification\n(Bubble size = Risk Score)', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# Colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Risk Score', fontsize=10)

plt.tight_layout()
plt.savefig('../outputs/figures/risk_factors_scatter.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Chart saved to outputs/figures/risk_factors_scatter.png")

## 6. Why Kenya is a Good Choice for MVP

### Advantages:

1. **Representative Risk Profile** (Rank #8-10)
   - Not extreme (like Chad/South Sudan)
   - Not low-risk (like South Africa/Botswana)
   - Represents "moderate-high risk" category (15+ similar countries)

2. **Data Availability** ⭐⭐⭐
   - Kenya Master Health Facility List (KMHFL) - comprehensive facility database
   - Active cold chain monitoring programs (WHO, UNICEF)
   - Weather stations and forecast availability
   - English documentation

3. **Government Engagement Potential**
   - Kenya MOH actively investing in cold chain
   - Partnership opportunities for pilot deployment
   - Strong tech ecosystem (M-PESA, mobile health innovations)

4. **Generalizability**
   - Similar profile to: Tanzania, Uganda, Cameroon, Benin, Mozambique (10+ countries)
   - Model can be validated across this cluster

### Comparison to Alternatives:

| Country | Rank | Risk Score | Data Availability | Why Not? |
|---------|------|------------|-------------------|----------|
| Chad | 1 | 92 | ⭐ Poor | Extreme conditions, limited data |
| South Sudan | 2 | 89 | ⭐ Very Poor | Conflict zone, no facility data |
| Nigeria | 8 | 68 | ⭐⭐ Moderate | Too large, complex (36 states) |
| **Kenya** | **9** | **68** | **⭐⭐⭐ Excellent** | **Best balance** |
| Tanzania | 10 | 65 | ⭐⭐ Good | Similar to Kenya, but less data |
| South Africa | 28 | 35 | ⭐⭐⭐ Excellent | Too low-risk, not representative |


## 7. Export Results for Presentation

In [None]:
# Save ranking table
df_countries[['rank', 'country', 'risk_score', 'avg_temp', 'electrification_rate', 'vaccine_coverage', 'gdp_per_capita']].to_csv(
    '../data/processed/country_risk_ranking.csv',
    index=False
)

print("✓ Country ranking saved to data/processed/country_risk_ranking.csv")

# Create summary for presentation slide
summary = f"""
KENYA SELECTION JUSTIFICATION
================================

Cold Chain Risk Ranking: #{kenya_data['rank']} of {len(df_countries)} SSA countries
Risk Score: {kenya_data['risk_score']:.1f} (Moderate-High)

Risk Profile:
- Temperature: {kenya_data['avg_temp']}°C (moderate heat)
- Electrification: {kenya_data['electrification_rate']}% (infrastructure gaps)
- Vaccine Coverage: {kenya_data['vaccine_coverage']}% (high cold chain demand)
- GDP per capita: ${kenya_data['gdp_per_capita']} (limited resources)

Why Kenya?
✓ Representative of 10-15 similar countries
✓ Excellent data availability (KMHFL)
✓ Active government cold chain programs
✓ Partnership potential for pilot deployment

Model Expansion:
- Similar countries: Tanzania, Uganda, Mozambique, Cameroon
- Next validation: Chad (extreme risk), Ghana (lower risk)
"""

with open('../outputs/reports/kenya_justification.txt', 'w') as f:
    f.write(summary)

print("✓ Summary saved to outputs/reports/kenya_justification.txt")
print("\n" + summary)

## Next Steps

Now that we've justified Kenya selection:

1. ✅ **This notebook**: Country risk ranking complete
2. ⏭️ **Next**: `01_data_collection.ipynb` - Fetch Kenya facility data and weather forecasts
3. ⏭️ **Then**: `02_eda.ipynb` - Explore Kenya cold chain patterns
4. ⏭️ **Then**: `03_model_training.ipynb` - Build prediction model
5. ⏭️ **Finally**: `04_prediction_demo.ipynb` - Interactive risk map

**Add this to your presentation as Slide 3: "Why Kenya?"**
