# COVID-19 Data Analysis: Insights & Reporting

## Executive Summary

This analysis examines COVID-19 data for **Kenya**, the **United States**, and **India** and summarizes the key findings. It covers:

- 📊 **Peak infection periods** and case trends
- 💉 **Vaccination rollout performance** and coverage
- 💔 **Mortality rate patterns** and anomalies  
- 🔬 **Data quality assessment** and outlier detection
- 🌍 **Comparative performance** across countries

---

### Key Objectives:
1. **Identify critical patterns** in pandemic progression
2. **Evaluate vaccination effectiveness** and rollout speed
3. **Highlight data anomalies** and interesting trends
4. **Generate actionable insights** for public health decision-making
5. **Create comprehensive reporting** for stakeholders

---

*Analysis Period: January 2020 - Present*  
*Countries: Kenya 🇰🇪 | United States 🇺🇸 | India 🇮🇳*

In [None]:
# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
from datetime import datetime
import json

# Configure plotting settings
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
warnings.filterwarnings('ignore')

# Set pandas display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("✅ All libraries imported successfully!")
print("📊 Ready for COVID-19 insights analysis")

## 1. Load and Prepare Analysis Data

This step loads the COVID-19 dataset and prepares it for analysis. We focus on three countries representing different continents and healthcare systems:

- **Kenya** 🇰🇪 - Representing Africa and developing healthcare infrastructure
- **United States** 🇺🇸 - Representing North America and advanced healthcare systems  
- **India** 🇮🇳 - Representing Asia and the world's largest democracy

### Data Preparation Steps:
1. Load the Our World in Data COVID-19 dataset
2. Filter for our three target countries
3. Clean and validate the data
4. Create derived metrics for analysis
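
Step 3 needs care when several countries share one DataFrame: a plain forward-fill carries the last value of one country into the first rows of the next. The sketch below shows a per-country fill on a toy frame. The values are hypothetical, and treating a not-yet-reported cumulative counter as zero is an assumption, not something the dataset guarantees.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; column names mirror the OWID dataset.
toy = pd.DataFrame({
    "location": ["India", "India", "India", "Kenya", "Kenya", "Kenya"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "total_vaccinations": [np.nan, 100.0, np.nan, np.nan, np.nan, 5.0],
})

# Sort first so the fill follows each country's own timeline.
toy = toy.sort_values(["location", "date"])

# Forward-fill inside each country so India's last value never leaks into Kenya.
toy["filled"] = toy.groupby("location")["total_vaccinations"].ffill()

# Assumption: a cumulative counter with no report yet is treated as zero.
toy["filled"] = toy["filled"].fillna(0)

print(toy)
```

Filling with zero instead of back-filling keeps the first non-zero row meaningful, which matters later when the first vaccination date is derived from it.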

In [None]:
# Load the COVID-19 dataset
print("📁 Loading COVID-19 dataset...")
df = pd.read_csv(r"C:\Users\patrick.muthomi\OneDrive - IFRC\Documents\HTML\Week 8\owid-covid-data.csv")

# Define our countries of interest
countries = ["Kenya", "United States", "India"]
print(f"🎯 Analyzing: {', '.join(countries)}")

# Filter for our target countries
df_filtered = df[df["location"].isin(countries)].copy()

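# Data cleaning and preparation
df_filtered = df_filtered.dropna(subset=["date", "total_cases"])
df_filtered["date"] = pd.to_datetime(df_filtered["date"])

# Sort before filling so that forward-fill follows each country's timeline
df_filtered = df_filtered.sort_values(['location', 'date'])

# Handle missing values per country so values never leak from one country into another.
# Cumulative columns carry the last known total forward; anything before the first report is 0.
cumulative_cols = ["total_cases", "total_deaths", "total_vaccinations"]
daily_cols = ["new_cases", "new_deaths"]
df_filtered[cumulative_cols] = df_filtered.groupby("location")[cumulative_cols].ffill().fillna(0)
# Missing daily counts are treated as "nothing reported" rather than copied from the previous day
df_filtered[daily_cols] = df_filtered[daily_cols].fillna(0)

# Create derived metrics
df_filtered['death_rate'] = (df_filtered['total_deaths'] / df_filtered['total_cases']) * 100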

print(f"✅ Data loaded successfully!")
print(f"📊 Dataset shape: {df_filtered.shape}")
print(f"📅 Date range: {df_filtered['date'].min().strftime('%B %d, %Y')} to {df_filtered['date'].max().strftime('%B %d, %Y')}")

# Display summary
summary = df_filtered.groupby('location').agg({
    'total_cases': 'max',
    'total_deaths': 'max',
    'total_vaccinations': 'max'
}).round(0)

print("\n📈 Latest totals by country:")
print(summary)

## 2. Calculate Key Performance Metrics

Before diving into specific insights, let's establish the fundamental metrics that will guide our analysis:

### Core Metrics:
- **Case Fatality Rate (CFR)**: Deaths per 100 confirmed cases
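- **Growth Rate**: Compound average daily percentage increase in cumulative cases over the most recent 30 days of data
- **Vaccination Coverage**: Total doses administered per 100 people, used as a proxy for coverage; it can exceed 100% because most vaccines require more than one dose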
- **Response Time**: Days from first case to policy implementation

### Advanced Metrics:
- **Peak Detection**: Identifying maximum daily case loads
- **Trend Analysis**: 7-day and 30-day moving averages
- **Anomaly Detection**: Statistical outliers in the data
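
As a rough illustration of the first two core metrics, the sketch below computes a case fatality rate and a compound daily growth rate. The helper names `case_fatality_rate` and `compound_daily_growth` are hypothetical, as are the input numbers; they are not part of the notebook.

```python
def case_fatality_rate(total_deaths: float, total_cases: float) -> float:
    """Deaths per 100 confirmed cases; returns 0.0 when there are no cases."""
    if total_cases <= 0:
        return 0.0
    return total_deaths / total_cases * 100


def compound_daily_growth(first_total: float, last_total: float, n_days: int) -> float:
    """Average daily growth (%) between two cumulative totals n_days apart."""
    if first_total <= 0 or n_days <= 0:
        return 0.0
    return ((last_total / first_total) ** (1 / n_days) - 1) * 100


# Hypothetical numbers, for illustration only.
cfr = case_fatality_rate(20, 1_000)
growth = compound_daily_growth(1_000, 1_210, 2)
print(f"CFR: {cfr:.2f}%")
print(f"Growth: {growth:.2f}% per day")
```

Note that `n_days` counts intervals, not observations: 30 daily rows span 29 intervals.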

In [None]:
# Calculate key performance metrics for each country
print("🧮 Calculating key performance metrics...")

metrics_summary = {}

# Population estimates for coverage calculations
population_estimates = {
    'Kenya': 54e6,
    'United States': 331e6,
    'India': 1380e6
}

for country in countries:
    country_data = df_filtered[df_filtered['location'] == country].copy()
    latest = country_data.iloc[-1]
    
    # Basic metrics
    total_cases = latest['total_cases']
    total_deaths = latest['total_deaths']
    total_vaccinations = latest['total_vaccinations'] if pd.notna(latest['total_vaccinations']) else 0
    
    # Calculate rates
    case_fatality_rate = (total_deaths / total_cases) * 100
    vaccination_coverage = (total_vaccinations / population_estimates[country]) * 100
    
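    # Compound daily growth rate over the last 30 rows (one row per day assumed)
    recent_data = country_data.tail(30)
    first_total = recent_data['total_cases'].iloc[0]
    if len(recent_data) > 1 and first_total > 0:
        # n rows span n - 1 daily intervals
        n_intervals = len(recent_data) - 1
        growth_rate = ((recent_data['total_cases'].iloc[-1] / first_total) ** (1 / n_intervals) - 1) * 100
    else:
        growth_rate = 0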
    
    # Peak detection
    peak_cases_value = country_data['new_cases'].max()
    peak_cases_date = country_data.loc[country_data['new_cases'].idxmax(), 'date']
    
    # First case date
    first_case_date = country_data[country_data['total_cases'] > 0]['date'].min()
    
    metrics_summary[country] = {
        'Total Cases': f"{total_cases:,.0f}",
        'Total Deaths': f"{total_deaths:,.0f}",
        'Case Fatality Rate (%)': f"{case_fatality_rate:.2f}%",
        'Vaccination Coverage (%)': f"{vaccination_coverage:.1f}%",
        'Daily Growth Rate, Last 30 Days (%)': f"{growth_rate:.2f}%",
        'Peak Daily Cases': f"{peak_cases_value:,.0f}",
        'Peak Date': peak_cases_date.strftime('%B %d, %Y'),
        'First Case': first_case_date.strftime('%B %d, %Y')
    }

# Create a comprehensive metrics table
metrics_df = pd.DataFrame(metrics_summary).T
print("\n📊 KEY PERFORMANCE METRICS SUMMARY")
print("=" * 60)
print(metrics_df.to_string())

print(f"\n✅ Metrics calculated for {len(countries)} countries")

## 3. 🔍 INSIGHT 1: Peak Infection Periods Analysis

Understanding when each country experienced its highest infection rates provides critical insights into:
- **Policy effectiveness** and timing of interventions
- **Healthcare system capacity** during peak periods  
- **Seasonal patterns** and variant impacts
- **Comparative response** across different healthcare systems

### Key Questions:
1. When did each country reach its peak daily cases?
2. How long did peak periods last?
3. What factors contributed to peak timing differences?
4. How quickly did countries recover from peaks?
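
Questions 1 and 2 can be approached with a rolling average, since raw daily counts are distorted by reporting gaps and backlogs. The following is a minimal sketch on hypothetical counts. Measuring the peak period as the contiguous run of days around the smoothed peak is one possible definition, chosen here for illustration.

```python
import pandas as pd

# Hypothetical daily counts with reporting gaps and one backlog spike.
dates = pd.date_range("2021-01-01", periods=14, freq="D")
new_cases = pd.Series(
    [10, 12, 0, 0, 15, 18, 20, 25, 90, 0, 0, 30, 28, 26],
    index=dates, dtype=float,
)

# 7-day moving average; the first six days have no full window and stay NaN.
smoothed = new_cases.rolling(window=7, min_periods=7).mean()

raw_peak_date = new_cases.idxmax()
smoothed_peak_date = smoothed.idxmax()

# Days at or above 75% of the smoothed peak (NaN compares as False).
above = smoothed >= 0.75 * smoothed.max()

# Label contiguous runs, then keep only the run that contains the peak.
run_id = above.astype(int).diff().ne(0).cumsum()
peak_run = above[run_id == run_id.loc[smoothed_peak_date]]
peak_duration_days = (peak_run.index.max() - peak_run.index.min()).days + 1

print(raw_peak_date.date(), smoothed_peak_date.date(), peak_duration_days)
```

Restricting the duration to one contiguous run avoids bridging two separate waves that both happen to cross the threshold.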

In [None]:
# Analyze peak infection periods for each country
print("🔍 ANALYZING PEAK INFECTION PERIODS")
print("=" * 50)

# Create visualization of peak periods
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
fig.suptitle('Peak Infection Periods Analysis', fontsize=16, fontweight='bold')

peak_analysis = {}

for i, country in enumerate(countries):
    country_data = df_filtered[df_filtered['location'] == country].copy()
    
    # Calculate 7-day moving average for smoother analysis
    country_data['new_cases_7day'] = country_data['new_cases'].rolling(window=7).mean()
    
    # Find peak information
    peak_value = country_data['new_cases'].max()
    peak_date = country_data.loc[country_data['new_cases'].idxmax(), 'date']
    peak_7day = country_data['new_cases_7day'].max()
    peak_7day_date = country_data.loc[country_data['new_cases_7day'].idxmax(), 'date']
    
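    # Find days at or above 75% of the peak (sustained high transmission).
    # The span below runs from the first to the last such day, so it can bridge separate waves.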
    high_threshold = peak_value * 0.75
    high_periods = country_data[country_data['new_cases'] >= high_threshold]
    
    if not high_periods.empty:
        peak_duration = (high_periods['date'].max() - high_periods['date'].min()).days
        peak_start = high_periods['date'].min()
        peak_end = high_periods['date'].max()
    else:
        peak_duration = 1
        peak_start = peak_date
        peak_end = peak_date
    
    # Store analysis results
    peak_analysis[country] = {
        'peak_value': peak_value,
        'peak_date': peak_date,
        'peak_7day': peak_7day,
        'peak_7day_date': peak_7day_date,
        'peak_duration': peak_duration,
        'peak_start': peak_start,
        'peak_end': peak_end
    }
    
    # Plot analysis
    ax = axes[i]
    ax.plot(country_data['date'], country_data['new_cases'], alpha=0.6, color='lightblue', label='Daily Cases')
    ax.plot(country_data['date'], country_data['new_cases_7day'], color='darkblue', linewidth=2, label='7-day Average')
    
    # Highlight peak period
    ax.axhline(y=high_threshold, color='red', linestyle='--', alpha=0.7, label='75% Peak Threshold')
    ax.axvline(x=peak_date, color='red', linestyle='-', alpha=0.8, label=f'Peak: {peak_date.strftime("%b %Y")}')
    
    ax.set_title(f'{country}\nPeak: {peak_value:,.0f} cases', fontweight='bold')
    ax.set_xlabel('Date')
    ax.set_ylabel('Daily New Cases')
    ax.tick_params(axis='x', rotation=45)
    ax.grid(True, alpha=0.3)
    ax.legend(fontsize=8)

plt.tight_layout()
plt.show()

# Print detailed peak analysis
print("\n📊 DETAILED PEAK ANALYSIS:")
print("-" * 50)

for country in countries:
    analysis = peak_analysis[country]
    print(f"\n{country} 🏔️:")
    print(f"  • Peak daily cases: {analysis['peak_value']:,.0f} on {analysis['peak_date'].strftime('%B %d, %Y')}")
    print(f"  • Peak 7-day average: {analysis['peak_7day']:,.0f} on {analysis['peak_7day_date'].strftime('%B %d, %Y')}")
    print(f"  • High transmission period: {analysis['peak_duration']} days")
    print(f"  • Peak period: {analysis['peak_start'].strftime('%B %d, %Y')} to {analysis['peak_end'].strftime('%B %d, %Y')}")

# Comparative analysis
print(f"\n🏆 PEAK COMPARISON:")
print("-" * 30)
highest_peak_country = max(peak_analysis.keys(), key=lambda x: peak_analysis[x]['peak_value'])
longest_peak_country = max(peak_analysis.keys(), key=lambda x: peak_analysis[x]['peak_duration'])

print(f"• Highest peak: {highest_peak_country} ({peak_analysis[highest_peak_country]['peak_value']:,.0f} cases)")
print(f"• Longest high transmission: {longest_peak_country} ({peak_analysis[longest_peak_country]['peak_duration']} days)")

## 4. 🚀 INSIGHT 2: Vaccination Rollout Performance

Vaccination rollout represents one of the most critical public health interventions in modern history. Our analysis examines:

### Rollout Efficiency Metrics:
- **Time to rollout**: Days from first vaccine approval to mass distribution
- **Daily vaccination rate**: Average doses administered per day
- **Coverage acceleration**: Speed of reaching population milestones
- **Supply chain effectiveness**: Consistency of vaccine delivery

### Comparative Assessment:
- **Resource allocation**: How countries prioritized vaccine distribution
- **Infrastructure capacity**: Healthcare system readiness
- **Public acceptance**: Uptake rates and hesitancy patterns
- **Global equity**: Access patterns across different economic levels
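
Coverage acceleration can be approximated by the first date on which cumulative doses cross a given share of the population. The sketch below uses hypothetical doses and a hypothetical population of 1,000. Because it counts doses rather than people, a value of 50 means 50 doses per 100 people, not 50% of people vaccinated.

```python
import pandas as pd

# Hypothetical cumulative doses for a hypothetical population of 1,000 people.
population = 1_000
doses = pd.Series(
    [0, 50, 120, 260, 400, 520],
    index=pd.date_range("2021-03-01", periods=6, freq="D"),
    dtype=float,
)

doses_per_100 = doses / population * 100

# First date on which each threshold (doses per 100 people) is reached.
milestones = {}
for threshold in (10, 25, 50, 75):
    reached = doses_per_100[doses_per_100 >= threshold]
    if not reached.empty:
        milestones[threshold] = reached.index[0]

for threshold, date in milestones.items():
    print(f"{threshold} doses per 100 people: {date.date()}")
```

Thresholds that were never reached are simply absent from `milestones`, which keeps "not yet" distinct from any date.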

In [None]:
# Analyze vaccination rollout performance
print("🚀 ANALYZING VACCINATION ROLLOUT PERFORMANCE")
print("=" * 55)

vaccination_analysis = {}

# Create comprehensive visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Vaccination Rollout Performance Analysis', fontsize=16, fontweight='bold')

for country in countries:
    country_data = df_filtered[df_filtered['location'] == country].copy()
    
    # Filter vaccination data
    vacc_data = country_data.dropna(subset=['total_vaccinations'])
    vacc_data = vacc_data[vacc_data['total_vaccinations'] > 0]
    
    if len(vacc_data) > 1:
        # Calculate rollout metrics
        vacc_start = vacc_data['date'].min()
        vacc_latest = vacc_data['date'].max()
        days_vaccinating = (vacc_latest - vacc_start).days
        total_vaccinations = vacc_data['total_vaccinations'].iloc[-1]
        daily_average = total_vaccinations / max(days_vaccinating, 1)
        
        # Doses per 100 people, used as a coverage proxy (can exceed 100% with multi-dose regimens)
        population = population_estimates[country]
        coverage_percent = (total_vaccinations / population) * 100
        
        # Calculate acceleration (time to reach dose milestones)
        milestones = [0.1, 0.25, 0.5, 0.75]  # doses equal to 10%, 25%, 50%, 75% of the population
        milestone_dates = {}
        
        for milestone in milestones:
            target_doses = population * milestone
            milestone_data = vacc_data[vacc_data['total_vaccinations'] >= target_doses]
            if not milestone_data.empty:
                milestone_dates[f"{milestone*100:.0f}%"] = milestone_data['date'].min()
        
        vaccination_analysis[country] = {
            'start_date': vacc_start,
            'total_vaccinations': total_vaccinations,
            'daily_average': daily_average,
            'days_active': days_vaccinating,
            'coverage_percent': coverage_percent,
            'milestone_dates': milestone_dates
        }

# Plot 1: Cumulative Vaccinations
ax1 = axes[0, 0]
for country in countries:
    if country in vaccination_analysis:
        country_data = df_filtered[df_filtered['location'] == country]
        vacc_data = country_data.dropna(subset=['total_vaccinations'])
        vacc_data = vacc_data[vacc_data['total_vaccinations'] > 0]
        
        ax1.plot(vacc_data['date'], vacc_data['total_vaccinations'], 
                 label=country, linewidth=3, marker='o', markersize=4)

ax1.set_title('Cumulative Vaccinations Over Time', fontweight='bold')
ax1.set_ylabel('Total Vaccinations')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.tick_params(axis='x', rotation=45)

# Plot 2: Daily Vaccination Rate
ax2 = axes[0, 1]
for country in countries:
    if country in vaccination_analysis:
        country_data = df_filtered[df_filtered['location'] == country].copy()
        country_data = country_data.sort_values('date')
        country_data['daily_vaccinations'] = country_data['total_vaccinations'].diff()
        
        daily_vacc = country_data.dropna(subset=['daily_vaccinations'])
        # Copy the filtered rows so the rolling-average column is not assigned to a view
        daily_vacc = daily_vacc[daily_vacc['daily_vaccinations'] > 0].copy()
        
        if not daily_vacc.empty:
            daily_vacc['vaccination_7day_avg'] = daily_vacc['daily_vaccinations'].rolling(window=7).mean()
            ax2.plot(daily_vacc['date'], daily_vacc['vaccination_7day_avg'], 
                     label=f'{country}', linewidth=2)

ax2.set_title('Daily Vaccination Rate (7-day Average)', fontweight='bold')
ax2.set_ylabel('Daily Vaccinations')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.tick_params(axis='x', rotation=45)

# Plot 3: Coverage Comparison
ax3 = axes[1, 0]
countries_with_data = [c for c in countries if c in vaccination_analysis]
coverage_data = [vaccination_analysis[c]['coverage_percent'] for c in countries_with_data]

bars = ax3.bar(countries_with_data, coverage_data, 
               color=['#2ECC71', '#3498DB', '#E74C3C'][:len(countries_with_data)])
ax3.set_title('Vaccination Coverage (% of Population)', fontweight='bold')
ax3.set_ylabel('Coverage (%)')
ax3.set_ylim(0, max(coverage_data) * 1.1)

# Add percentage labels
for bar, percentage in zip(bars, coverage_data):
    height = bar.get_height()
    ax3.text(bar.get_x() + bar.get_width()/2., height + 1,
             f'{percentage:.1f}%', ha='center', va='bottom', fontweight='bold')

# Plot 4: Rollout Speed Comparison
ax4 = axes[1, 1]
speed_data = [vaccination_analysis[c]['daily_average'] for c in countries_with_data]

bars = ax4.bar(countries_with_data, speed_data,
               color=['#FF6B6B', '#4ECDC4', '#45B7D1'][:len(countries_with_data)])
ax4.set_title('Average Daily Vaccination Rate', fontweight='bold')
ax4.set_ylabel('Doses per Day')

# Add value labels
for bar, value in zip(bars, speed_data):
    height = bar.get_height()
    ax4.text(bar.get_x() + bar.get_width()/2., height,
             f'{value:,.0f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

# Print detailed vaccination analysis
print("\n💉 DETAILED VACCINATION ANALYSIS:")
print("-" * 50)

for country in countries:
    if country in vaccination_analysis:
        analysis = vaccination_analysis[country]
        print(f"\n{country} 🏥:")
        print(f"  • Vaccination started: {analysis['start_date'].strftime('%B %d, %Y')}")
        print(f"  • Daily average: {analysis['daily_average']:,.0f} doses/day")
        print(f"  • Total administered: {analysis['total_vaccinations']:,.0f}")
        print(f"  • Population coverage: {analysis['coverage_percent']:.1f}%")
        print(f"  • Active vaccination period: {analysis['days_active']} days")
        
        if analysis['milestone_dates']:
            print(f"  • Milestones reached:")
            for milestone, date in analysis['milestone_dates'].items():
                print(f"    - {milestone} coverage: {date.strftime('%B %d, %Y')}")

# Performance comparison
if vaccination_analysis:
    print(f"\n🏆 VACCINATION PERFORMANCE RANKING:")
    print("-" * 40)
    
    # Fastest rollout
    fastest_country = max(vaccination_analysis.keys(), 
                         key=lambda x: vaccination_analysis[x]['daily_average'])
    print(f"• Fastest daily rate: {fastest_country} ({vaccination_analysis[fastest_country]['daily_average']:,.0f} doses/day)")
    
    # Highest coverage
    highest_coverage = max(vaccination_analysis.keys(),
                          key=lambda x: vaccination_analysis[x]['coverage_percent'])
    print(f"• Highest coverage: {highest_coverage} ({vaccination_analysis[highest_coverage]['coverage_percent']:.1f}%)")
    
    # Most total doses
    most_doses = max(vaccination_analysis.keys(),
                    key=lambda x: vaccination_analysis[x]['total_vaccinations'])
    print(f"• Most total doses: {most_doses} ({vaccination_analysis[most_doses]['total_vaccinations']:,.0f})")

## 5. 💔 INSIGHT 3: Mortality Rate Patterns Analysis

Understanding mortality patterns provides crucial insights into healthcare system effectiveness and pandemic management:

### Clinical Outcomes:
- **Case Fatality Rate (CFR)**: Deaths per 100 confirmed cases
- **Mortality trends**: How death rates changed over time
- **Treatment improvements**: Evidence of better clinical outcomes
- **Healthcare capacity**: System performance under pressure

### Comparative Factors:
- **Demographics**: Age structure and comorbidity prevalence
- **Healthcare infrastructure**: ICU capacity and medical resources
- **Testing strategies**: Case detection and reporting accuracy
- **Policy interventions**: Lockdowns, masking, and social distancing

> **Note**: Mortality rates are influenced by many factors including testing capacity, reporting standards, and healthcare system quality.
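
A CFR built from running totals changes slowly, because early months stay in both the numerator and the denominator. To see how the rate moved over time, one option is to divide deaths by cases within each month. The sketch below contrasts the two on hypothetical data. It ignores the lag between case confirmation and death, so treat the per-month figure as indicative only.

```python
import pandas as pd

# Hypothetical daily data spanning two months.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-10", "2020-04-20", "2020-05-10", "2020-05-20"]),
    "new_cases": [100, 100, 400, 400],
    "new_deaths": [5, 5, 4, 4],
})

# Sum new cases and deaths within each calendar month.
monthly = toy.groupby(toy["date"].dt.to_period("M"))[["new_cases", "new_deaths"]].sum()

# CFR of the month itself versus CFR of all data up to that month.
monthly["period_cfr"] = monthly["new_deaths"] / monthly["new_cases"] * 100
monthly["cumulative_cfr"] = (
    monthly["new_deaths"].cumsum() / monthly["new_cases"].cumsum() * 100
)

print(monthly)
```

In this toy example the per-month CFR falls much further in May than the cumulative figure does, which is the effect the running totals hide.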

In [None]:
# Analyze mortality rate patterns
print("💔 ANALYZING MORTALITY RATE PATTERNS")
print("=" * 45)

mortality_analysis = {}

# Create comprehensive mortality analysis
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('COVID-19 Mortality Rate Analysis', fontsize=16, fontweight='bold')

for country in countries:
    country_data = df_filtered[df_filtered['location'] == country].copy()
    
    if not country_data.empty:
        latest_data = country_data.iloc[-1]
        
        # Calculate mortality metrics
        total_cases = latest_data['total_cases']
        total_deaths = latest_data['total_deaths']
        case_fatality_rate = (total_deaths / total_cases) * 100
        
        # Track the cumulative CFR at each month end (running totals, not deaths and cases within the month)
        country_data['month'] = country_data['date'].dt.to_period('M')
        monthly_mortality = country_data.groupby('month').agg({
            'total_cases': 'max',
            'total_deaths': 'max'
        })
        monthly_mortality['monthly_cfr'] = (monthly_mortality['total_deaths'] / monthly_mortality['total_cases']) * 100
        monthly_mortality = monthly_mortality.dropna()
        
        # Find peak mortality period
        peak_mortality_month = monthly_mortality['monthly_cfr'].idxmax()
        peak_mortality_rate = monthly_mortality['monthly_cfr'].max()
        
        # Calculate improvement over time
        if len(monthly_mortality) > 6:
            early_cfr = monthly_mortality['monthly_cfr'].iloc[:3].mean()
            recent_cfr = monthly_mortality['monthly_cfr'].iloc[-3:].mean()
            cfr_improvement = early_cfr - recent_cfr
        else:
            early_cfr = recent_cfr = cfr_improvement = 0
        
        mortality_analysis[country] = {
            'total_deaths': total_deaths,
            'case_fatality_rate': case_fatality_rate,
            'peak_mortality_month': peak_mortality_month,
            'peak_mortality_rate': peak_mortality_rate,
            'early_cfr': early_cfr,
            'recent_cfr': recent_cfr,
            'cfr_improvement': cfr_improvement,
            'monthly_data': monthly_mortality
        }

# Plot 1: Overall Case Fatality Rates
ax1 = axes[0, 0]
cfr_data = [mortality_analysis[c]['case_fatality_rate'] for c in countries]
bars = ax1.bar(countries, cfr_data, color=['#E74C3C', '#F39C12', '#8E44AD'])
ax1.set_title('Overall Case Fatality Rate by Country', fontweight='bold')
ax1.set_ylabel('Case Fatality Rate (%)')
ax1.grid(axis='y', alpha=0.3)

# Add percentage labels
for bar, rate in zip(bars, cfr_data):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 0.02,
             f'{rate:.2f}%', ha='center', va='bottom', fontweight='bold')

# Plot 2: Mortality Rate Trends Over Time
ax2 = axes[0, 1]
for country in countries:
    if country in mortality_analysis:
        monthly_data = mortality_analysis[country]['monthly_data']
        if not monthly_data.empty:
            ax2.plot(monthly_data.index.astype(str), monthly_data['monthly_cfr'], 
                     label=country, linewidth=2, marker='o', markersize=4)

ax2.set_title('Cumulative Case Fatality Rate by Month', fontweight='bold')
ax2.set_ylabel('Cumulative CFR (%)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.tick_params(axis='x', rotation=45)

# Plot 3: Total Deaths Comparison
ax3 = axes[1, 0]
deaths_data = [mortality_analysis[c]['total_deaths'] for c in countries]
bars = ax3.bar(countries, deaths_data, color=['#C0392B', '#D35400', '#7D3C98'])
ax3.set_title('Total Deaths by Country', fontweight='bold')
ax3.set_ylabel('Total Deaths')
ax3.grid(axis='y', alpha=0.3)

# Add value labels
for bar, deaths in zip(bars, deaths_data):
    height = bar.get_height()
    ax3.text(bar.get_x() + bar.get_width()/2., height,
             f'{deaths:,.0f}', ha='center', va='bottom', fontweight='bold')

# Plot 4: CFR Improvement Over Time
ax4 = axes[1, 1]
improvement_data = [mortality_analysis[c]['cfr_improvement'] for c in countries]
colors = ['green' if x > 0 else 'red' for x in improvement_data]
bars = ax4.bar(countries, improvement_data, color=colors, alpha=0.7)
ax4.set_title('Case Fatality Rate Improvement\n(Early vs Recent Periods)', fontweight='bold')
ax4.set_ylabel('CFR Improvement (%)')
ax4.axhline(y=0, color='black', linestyle='-', alpha=0.3)
ax4.grid(axis='y', alpha=0.3)

# Add value labels
for bar, improvement in zip(bars, improvement_data):
    height = bar.get_height()
    label_y = height + 0.1 if height >= 0 else height - 0.1
    ax4.text(bar.get_x() + bar.get_width()/2., label_y,
             f'{improvement:.2f}%', ha='center', va='bottom' if height >= 0 else 'top', fontweight='bold')

plt.tight_layout()
plt.show()

# Print detailed mortality analysis
print("\n⚰️ DETAILED MORTALITY ANALYSIS:")
print("-" * 50)

for country in countries:
    analysis = mortality_analysis[country]
    print(f"\n{country} 📊:")
    print(f"  • Total deaths: {analysis['total_deaths']:,.0f}")
    print(f"  • Overall case fatality rate: {analysis['case_fatality_rate']:.2f}%")
    print(f"  • Peak mortality period: {analysis['peak_mortality_month']} ({analysis['peak_mortality_rate']:.2f}%)")
    print(f"  • Early pandemic CFR: {analysis['early_cfr']:.2f}%")
    print(f"  • Recent CFR: {analysis['recent_cfr']:.2f}%")
    
    if analysis['cfr_improvement'] > 0:
        print(f"  • Improvement: ✅ {analysis['cfr_improvement']:.2f} percentage points lower")
    elif analysis['cfr_improvement'] < 0:
        print(f"  • Change: ⚠️ {abs(analysis['cfr_improvement']):.2f} percentage points higher")
    else:
        print(f"  • Change: ➡️ No significant change")

# Comparative mortality analysis
print(f"\n🏆 MORTALITY PERFORMANCE COMPARISON:")
print("-" * 45)

lowest_cfr_country = min(mortality_analysis.keys(), key=lambda x: mortality_analysis[x]['case_fatality_rate'])
most_improved_country = max(mortality_analysis.keys(), key=lambda x: mortality_analysis[x]['cfr_improvement'])

print(f"• Lowest case fatality rate: {lowest_cfr_country} ({mortality_analysis[lowest_cfr_country]['case_fatality_rate']:.2f}%)")
print(f"• Most improved CFR: {most_improved_country} ({mortality_analysis[most_improved_country]['cfr_improvement']:.2f} pp improvement)")

# Additional insights
print(f"\n🔍 KEY MORTALITY INSIGHTS:")
print("-" * 35)
avg_cfr = np.mean([mortality_analysis[c]['case_fatality_rate'] for c in countries])
print(f"• Average CFR across countries: {avg_cfr:.2f}%")
print(f"• CFR range: {min([mortality_analysis[c]['case_fatality_rate'] for c in countries]):.2f}% - {max([mortality_analysis[c]['case_fatality_rate'] for c in countries]):.2f}%")

countries_improved = sum(1 for c in countries if mortality_analysis[c]['cfr_improvement'] > 0)
print(f"• Countries with improved CFR: {countries_improved}/{len(countries)}")

## 6. 🔬 INSIGHT 4: Data Anomalies & Outlier Detection

Identifying unusual patterns and data anomalies helps us understand:

### Statistical Anomalies:
- **Sudden spikes**: Unexplained jumps in cases or deaths
- **Data corrections**: Large negative values indicating reporting adjustments
- **Missing periods**: Gaps in data collection or reporting
- **Outlier events**: Values significantly different from normal patterns

### Methodological Considerations:
- **Reporting changes**: Modifications in testing or counting methods
- **Weekend effects**: Lower reporting on weekends and holidays
- **Policy impacts**: Effects of lockdowns, testing campaigns, or mass events
- **Technical issues**: Data system failures or processing errors

### Quality Assessment:
Understanding data reliability is crucial for accurate interpretation and policy decisions.
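
The two simplest checks, IQR outliers and day-over-day spikes, can be sketched as follows on hypothetical counts. One pitfall is worth showing: a percentage change measured from a zero day is infinite, so those values are replaced with NaN here before the spike threshold is applied. The 500% threshold is taken from the analysis below; everything else is illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts: two zero-report days, then one backlog dump.
new_cases = pd.Series([0, 0, 10, 12, 11, 13, 12, 200, 11, 10], dtype=float)

# IQR rule: flag values more than 1.5 * IQR outside the middle 50%.
q1, q3 = new_cases.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = new_cases[(new_cases < lower) | (new_cases > upper)]

# Day-over-day change in percent; changes from a zero day are infinite,
# so they are set to NaN instead of being counted as spikes.
pct = new_cases.pct_change().replace([np.inf, -np.inf], np.nan) * 100
spikes = pct[pct > 500]

print("IQR outliers at positions:", iqr_outliers.index.tolist())
print("Spikes at positions:", spikes.index.tolist())
```

The IQR rule flags the zero-report days as well as the backlog day, which illustrates why a flagged value is a prompt for inspection rather than proof of an error.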

In [None]:
# Detect data anomalies and outliers
print("🔬 DETECTING DATA ANOMALIES & OUTLIERS")
print("=" * 50)

anomaly_analysis = {}

# Create anomaly detection visualization
fig, axes = plt.subplots(3, 1, figsize=(16, 15))
fig.suptitle('COVID-19 Data Anomaly Detection Analysis', fontsize=16, fontweight='bold')

for country in countries:
    country_data = df_filtered[df_filtered['location'] == country].copy()
    country_data = country_data.sort_values('date')
    
    # Detect anomalies in new cases
    new_cases = country_data['new_cases'].fillna(0)
    
    # Statistical outlier detection using IQR method
    Q1 = new_cases.quantile(0.25)
    Q3 = new_cases.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    
    # Identify outliers
    outliers = country_data[(new_cases < lower_bound) | (new_cases > upper_bound)]
    positive_outliers = country_data[new_cases > upper_bound]
    negative_outliers = country_data[new_cases < lower_bound]
    
    # Calculate z-scores for additional analysis
    mean_cases = new_cases.mean()
    std_cases = new_cases.std()
    country_data['z_score'] = (new_cases - mean_cases) / std_cases
    extreme_outliers = country_data[abs(country_data['z_score']) > 3]
    
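    # Detect sudden spikes (day-over-day increases > 500%)
    # A zero on the previous day yields an infinite change; drop those so they are not counted as spikes
    country_data['daily_change'] = new_cases.pct_change().replace([np.inf, -np.inf], np.nan) * 100
    sudden_spikes = country_data[country_data['daily_change'] > 500]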
    
    # Detect data corrections (large negative values)
    data_corrections = country_data[new_cases < -1000]
    
    # Store anomaly analysis
    anomaly_analysis[country] = {
        'total_outliers': len(outliers),
        'positive_outliers': len(positive_outliers),
        'negative_outliers': len(negative_outliers),
        'extreme_outliers': len(extreme_outliers),
        'sudden_spikes': len(sudden_spikes),
        'data_corrections': len(data_corrections),
        'outlier_dates': outliers['date'].tolist(),
        'spike_dates': sudden_spikes['date'].tolist(),
        'correction_dates': data_corrections['date'].tolist(),
        'max_spike_value': new_cases.max(),
        'min_value': new_cases.min(),
        'outlier_data': outliers,
        'spike_data': sudden_spikes
    }

# Plot 1: New Cases with Outliers Highlighted
ax1 = axes[0]
for country in countries:
    country_data = df_filtered[df_filtered['location'] == country]
    new_cases = country_data['new_cases'].fillna(0)
    
    # Plot main data
    ax1.plot(country_data['date'], new_cases, label=country, alpha=0.7, linewidth=1)
    
    # Highlight outliers
    if country in anomaly_analysis:
        outlier_data = anomaly_analysis[country]['outlier_data']
        if not outlier_data.empty:
            ax1.scatter(outlier_data['date'], outlier_data['new_cases'], 
                       s=50, alpha=0.8, edgecolors='red', facecolors='none', linewidth=2)

ax1.set_title('Daily New Cases with Statistical Outliers (Red Circles)', fontweight='bold')
ax1.set_ylabel('Daily New Cases')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Z-Score Analysis
ax2 = axes[1]
for country in countries:
    country_data = df_filtered[df_filtered['location'] == country].copy()
    new_cases = country_data['new_cases'].fillna(0)
    mean_cases = new_cases.mean()
    std_cases = new_cases.std()
    z_scores = (new_cases - mean_cases) / std_cases
    
    ax2.plot(country_data['date'], z_scores, label=country, alpha=0.7, linewidth=1)

# Add threshold lines
ax2.axhline(y=3, color='red', linestyle='--', alpha=0.7, label='Extreme Outlier Threshold (±3σ)')
ax2.axhline(y=-3, color='red', linestyle='--', alpha=0.7)
ax2.axhline(y=2, color='orange', linestyle='--', alpha=0.7, label='Moderate Outlier Threshold (±2σ)')
ax2.axhline(y=-2, color='orange', linestyle='--', alpha=0.7)

ax2.set_title('Z-Score Analysis (Standardized Anomaly Detection)', fontweight='bold')
ax2.set_ylabel('Z-Score')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: Data Quality Assessment
ax3 = axes[2]
countries_list = list(anomaly_analysis.keys())
metrics = ['total_outliers', 'sudden_spikes', 'data_corrections', 'extreme_outliers']
metric_labels = ['Total Outliers', 'Sudden Spikes', 'Data Corrections', 'Extreme Outliers']

x = np.arange(len(countries_list))
width = 0.2

for i, metric in enumerate(metrics):
    values = [anomaly_analysis[country][metric] for country in countries_list]
    ax3.bar(x + i*width, values, width, label=metric_labels[i], alpha=0.8)

ax3.set_title('Data Quality Assessment: Anomaly Counts by Country', fontweight='bold')
ax3.set_ylabel('Number of Anomalies')
ax3.set_xlabel('Country')
ax3.set_xticks(x + width * 1.5)
ax3.set_xticklabels(countries_list)
ax3.legend()
ax3.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

# Print detailed anomaly analysis
print("\n🔍 DETAILED ANOMALY ANALYSIS:")
print("-" * 50)

for country in countries:
    if country in anomaly_analysis:
        analysis = anomaly_analysis[country]
        print(f"\n{country} 🔬:")
        print(f"  • Total statistical outliers: {analysis['total_outliers']}")
        print(f"  • Positive outliers (spikes): {analysis['positive_outliers']}")
        print(f"  • Negative outliers: {analysis['negative_outliers']}")
        print(f"  • Extreme outliers (>3σ): {analysis['extreme_outliers']}")
        print(f"  • Sudden spikes (>500% increase): {analysis['sudden_spikes']}")
        print(f"  • Data corrections (daily values below -1,000): {analysis['data_corrections']}")
        print(f"  • Maximum single-day cases: {analysis['max_spike_value']:,.0f}")
        print(f"  • Minimum value (corrections): {analysis['min_value']:,.0f}")
        
        # Show specific anomaly dates if any
        if analysis['spike_dates']:
            print(f"  • Notable spike dates: {[d.strftime('%Y-%m-%d') for d in analysis['spike_dates'][:3]]}")
        if analysis['correction_dates']:
            print(f"  • Data correction dates: {[d.strftime('%Y-%m-%d') for d in analysis['correction_dates'][:3]]}")

# Summary statistics
print(f"\n📊 ANOMALY SUMMARY STATISTICS:")
print("-" * 40)

# Sum over the countries that actually have anomaly results (avoids a KeyError for any that were skipped)
total_outliers = sum(a['total_outliers'] for a in anomaly_analysis.values())
total_spikes = sum(a['sudden_spikes'] for a in anomaly_analysis.values())
total_corrections = sum(a['data_corrections'] for a in anomaly_analysis.values())

print(f"• Total outliers detected: {total_outliers}")
print(f"• Total sudden spikes: {total_spikes}")
print(f"• Total data corrections: {total_corrections}")

# Data quality ranking
print(f"\n🏆 DATA QUALITY RANKING (Lower anomalies = better quality):")
print("-" * 65)

quality_scores = {country: anomaly_analysis[country]['total_outliers'] + 
                          anomaly_analysis[country]['sudden_spikes'] * 2 + 
                          anomaly_analysis[country]['data_corrections'] * 3 
                  for country in countries if country in anomaly_analysis}

sorted_quality = sorted(quality_scores.items(), key=lambda x: x[1])
for i, (country, score) in enumerate(sorted_quality, 1):
    print(f"{i}. {country}: Quality Score = {score} (lower is better)")

print(f"\n⚠️ INTERPRETATION NOTES:")
print("• Outliers may indicate real events (mass testing, super-spreader events)")
print("• Data corrections are normal for improving accuracy over time")
print("• Weekend and holiday effects can cause regular data anomalies")
print("• High-quality data has consistent reporting patterns")
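
The quality ranking above weights each anomaly type differently: statistical outliers count once, sudden spikes twice, and negative-value corrections three times. The sketch below restates that weighting as a small function so the effect of the weights is easier to see. The function name `quality_score` and the anomaly counts are hypothetical placeholders, not results from this analysis.

```python
def quality_score(total_outliers, sudden_spikes, data_corrections):
    """Weighted anomaly count: 1x outliers, 2x sudden spikes, 3x data corrections."""
    return total_outliers + 2 * sudden_spikes + 3 * data_corrections

# Hypothetical counts: (total_outliers, sudden_spikes, data_corrections)
example_counts = {
    'Country X': (5, 2, 1),
    'Country Y': (8, 0, 0),
}

example_scores = {name: quality_score(*counts) for name, counts in example_counts.items()}

# Lower score ranks first, matching the "lower is better" convention above
example_ranking = sorted(example_scores, key=example_scores.get)

for rank, name in enumerate(example_ranking, 1):
    print(f"{rank}. {name}: Quality Score = {example_scores[name]}")
```

In this toy case Country Y has more raw outliers but still ranks ahead, because spikes and corrections carry heavier weights.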

## 7. 📊 Executive Summary Dashboard

This section builds a dashboard that brings the key findings together into a single view for stakeholders:

### Dashboard Components:
1. **Overall Impact Summary** - Total cases, deaths, and vaccinations
2. **Performance Rankings** - Countries ranked by key metrics
3. **Timeline Analysis** - Critical dates and milestones
4. **Recommendations** - Data-driven policy suggestions

### Key Metrics Dashboard:
The executive dashboard gives a bird's-eye view of each country's cases, deaths, vaccination coverage, and data quality.
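
As a minimal sketch of how the impact and ranking components can be derived, the snippet below computes a case fatality rate from the last row where both cumulative totals are reported, then ranks countries by it. The column names follow the Our World in Data layout used in this notebook. The helper `latest_cfr` and the toy numbers are hypothetical and only illustrate the calculation.

```python
import numpy as np
import pandas as pd

# Toy frame in the OWID column layout; the numbers are illustrative, not real data.
toy = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'total_cases': [100.0, 200.0, np.nan, 50.0, 80.0, 100.0],
    'total_deaths': [1.0, 4.0, np.nan, 2.0, 4.0, 5.0],
})

def latest_cfr(df, country):
    """Hypothetical helper: CFR (%) from the last row with both totals reported."""
    rows = (df[df['location'] == country]
            .sort_values('date')
            .dropna(subset=['total_cases', 'total_deaths']))
    if rows.empty or rows.iloc[-1]['total_cases'] == 0:
        return float('nan')  # no usable data, so no rate
    last = rows.iloc[-1]
    return last['total_deaths'] / last['total_cases'] * 100

cfr = {c: latest_cfr(toy, c) for c in ['A', 'B']}
ranking = sorted(cfr, key=cfr.get)  # lowest CFR first

print(cfr)
print(ranking)
```

Country A's most recent row is empty, so the helper falls back to the previous day rather than returning a missing value.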

In [None]:
# Create Executive Summary Dashboard
print("📊 CREATING EXECUTIVE SUMMARY DASHBOARD")
print("=" * 50)

# Compile all insights into a comprehensive dashboard
dashboard_data = {}

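for country in countries:
    country_data = df_filtered[df_filtered['location'] == country].sort_values('date')
    # The most recent row often has missing cumulative totals, so take the
    # last row where both totals are actually reported.
    latest = country_data.dropna(subset=['total_cases', 'total_deaths']).iloc[-1]

    # Compile key metrics
    dashboard_data[country] = {
        'total_cases': latest['total_cases'],
        'total_deaths': latest['total_deaths'],
        # Guard against division by zero when no cases have been recorded
        'case_fatality_rate': (latest['total_deaths'] / latest['total_cases'] * 100) if latest['total_cases'] else 0.0,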
        'peak_cases': peak_analysis[country]['peak_value'] if country in peak_analysis else 0,
        'peak_date': peak_analysis[country]['peak_date'] if country in peak_analysis else None,
        'vaccination_coverage': vaccination_analysis[country]['coverage_percent'] if country in vaccination_analysis else 0,
        'daily_vaccination_rate': vaccination_analysis[country]['daily_average'] if country in vaccination_analysis else 0,
        'mortality_improvement': mortality_analysis[country]['cfr_improvement'] if country in mortality_analysis else 0,
        'data_quality_score': quality_scores.get(country, 999),
        'population': population_estimates[country]
    }

# Create comprehensive dashboard visualization
fig = plt.figure(figsize=(20, 16))
gs = fig.add_gridspec(4, 4, hspace=0.3, wspace=0.3)

# Dashboard Title
fig.suptitle('COVID-19 EXECUTIVE SUMMARY DASHBOARD\nKenya 🇰🇪 | United States 🇺🇸 | India 🇮🇳', 
             fontsize=20, fontweight='bold', y=0.95)

# 1. Overall Impact Summary (Top row, spans 2 columns)
ax1 = fig.add_subplot(gs[0, :2])
impact_metrics = ['total_cases', 'total_deaths', 'vaccination_coverage']
impact_labels = ['Total Cases (M)', 'Total Deaths (K)', 'Vaccination Coverage (%)']

x = np.arange(len(countries))
width = 0.25

for i, metric in enumerate(impact_metrics):
    if metric == 'total_cases':
        values = [dashboard_data[c][metric]/1e6 for c in countries]
    elif metric == 'total_deaths':
        values = [dashboard_data[c][metric]/1e3 for c in countries]
    else:
        values = [dashboard_data[c][metric] for c in countries]
    
    bars = ax1.bar(x + i*width, values, width, label=impact_labels[i], alpha=0.8)
    
    # Add value labels
    for bar, value in zip(bars, values):
        height = bar.get_height()
        if metric == 'vaccination_coverage':
            label = f'{value:.1f}%'
        else:
            label = f'{value:.1f}'
        ax1.text(bar.get_x() + bar.get_width()/2., height + max(values)*0.01,
                 label, ha='center', va='bottom', fontsize=9, fontweight='bold')

ax1.set_title('Overall Pandemic Impact Summary', fontweight='bold', fontsize=14)
ax1.set_xticks(x + width)
ax1.set_xticklabels(countries)
ax1.legend()
ax1.grid(axis='y', alpha=0.3)

# 2. Case Fatality Rates (Top right)
ax2 = fig.add_subplot(gs[0, 2:])
cfr_values = [dashboard_data[c]['case_fatality_rate'] for c in countries]
colors = ['#E74C3C', '#F39C12', '#8E44AD']
bars = ax2.bar(countries, cfr_values, color=colors, alpha=0.8)
ax2.set_title('Case Fatality Rates Comparison', fontweight='bold', fontsize=14)
ax2.set_ylabel('Case Fatality Rate (%)')

for bar, value in zip(bars, cfr_values):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height + 0.02,
             f'{value:.2f}%', ha='center', va='bottom', fontweight='bold')

# 3. Peak Cases Timeline (Second row, left)
ax3 = fig.add_subplot(gs[1, :2])
peak_dates = [dashboard_data[c]['peak_date'] for c in countries if dashboard_data[c]['peak_date']]
peak_values = [dashboard_data[c]['peak_cases'] for c in countries if dashboard_data[c]['peak_date']]
peak_countries = [c for c in countries if dashboard_data[c]['peak_date']]

if peak_dates:
    scatter = ax3.scatter(peak_dates, peak_values, s=200, alpha=0.7, c=colors[:len(peak_dates)])
    for i, country in enumerate(peak_countries):
        ax3.annotate(country, (peak_dates[i], peak_values[i]), 
                    xytext=(10, 10), textcoords='offset points', fontsize=10, fontweight='bold')

ax3.set_title('Peak Daily Cases Timeline', fontweight='bold', fontsize=14)
ax3.set_ylabel('Peak Daily Cases')
ax3.tick_params(axis='x', rotation=45)
ax3.grid(True, alpha=0.3)

# 4. Vaccination Performance (Second row, right)
ax4 = fig.add_subplot(gs[1, 2:])
countries_with_vacc = [c for c in countries if c in vaccination_analysis]
vacc_rates = [dashboard_data[c]['daily_vaccination_rate']/1e6 for c in countries_with_vacc]

if vacc_rates:
    bars = ax4.bar(countries_with_vacc, vacc_rates, color=['#2ECC71', '#3498DB', '#E74C3C'][:len(vacc_rates)], alpha=0.8)
    ax4.set_title('Daily Vaccination Rate (Millions/day)', fontweight='bold', fontsize=14)
    ax4.set_ylabel('Daily Vaccination Rate (M/day)')
    
    for bar, value in zip(bars, vacc_rates):
        height = bar.get_height()
        # Two decimals so rates well below 0.1M/day are not displayed as 0.0M
        ax4.text(bar.get_x() + bar.get_width()/2., height + max(vacc_rates)*0.01,
                 f'{value:.2f}M', ha='center', va='bottom', fontweight='bold')

# 5. Performance Rankings Table (Third row, spans full width)
ax5 = fig.add_subplot(gs[2, :])
ax5.axis('off')

# Create ranking table
rankings = {
    'Country': countries,
    'Cases (M)': [f"{dashboard_data[c]['total_cases']/1e6:.1f}" for c in countries],
    'Deaths (K)': [f"{dashboard_data[c]['total_deaths']/1e3:.1f}" for c in countries],
    'CFR (%)': [f"{dashboard_data[c]['case_fatality_rate']:.2f}" for c in countries],
    'Vacc Coverage (%)': [f"{dashboard_data[c]['vaccination_coverage']:.1f}" for c in countries],
    'Peak Cases (K)': [f"{dashboard_data[c]['peak_cases']/1e3:.1f}" for c in countries],
    'Data Quality': ['★★★★★' if dashboard_data[c]['data_quality_score'] < 10 
                    else '★★★★☆' if dashboard_data[c]['data_quality_score'] < 20 
                    else '★★★☆☆' for c in countries]
}

table_data = []
for i in range(len(countries)):
    row = [rankings[col][i] for col in rankings.keys()]
    table_data.append(row)

table = ax5.table(cellText=table_data, colLabels=list(rankings.keys()),
                  cellLoc='center', loc='center', bbox=[0, 0, 1, 1])
table.auto_set_font_size(False)
table.set_fontsize(11)
table.scale(1, 2)

# Style the table
for i in range(len(rankings.keys())):
    table[(0, i)].set_facecolor('#3498DB')
    table[(0, i)].set_text_props(weight='bold', color='white')

ax5.set_title('Comprehensive Performance Rankings', fontweight='bold', fontsize=14, pad=20)

# 6. Key Insights Summary (Bottom row)
ax6 = fig.add_subplot(gs[3, :])
ax6.axis('off')

# Generate key insights text
insights_text = f"""
🔍 KEY INSIGHTS & FINDINGS:

📈 SCALE & IMPACT:
• Total cases analyzed: {sum(dashboard_data[c]['total_cases'] for c in countries):,.0f}
• Total deaths tracked: {sum(dashboard_data[c]['total_deaths'] for c in countries):,.0f}
• Combined population: {sum(dashboard_data[c]['population'] for c in countries)/1e9:.1f} billion people

🏆 PERFORMANCE LEADERS:
• Lowest CFR: {min(countries, key=lambda c: dashboard_data[c]['case_fatality_rate'])} ({min(dashboard_data[c]['case_fatality_rate'] for c in countries):.2f}%)
• Highest Vaccination Coverage: {max(countries, key=lambda c: dashboard_data[c]['vaccination_coverage'])} ({max(dashboard_data[c]['vaccination_coverage'] for c in countries):.1f}%)
• Best Data Quality: {min(countries, key=lambda c: dashboard_data[c]['data_quality_score'])}

💡 CRITICAL OBSERVATIONS:
• Peak periods occurred at different times, which may reflect differing policy responses and outbreak dynamics
• Vaccination rollout speed varies significantly between countries
• Mortality improvement patterns suggest healthcare system adaptation
• Data quality affects reliability of cross-country comparisons

🎯 RECOMMENDATIONS:
• Strengthen data collection and reporting systems
• Share best practices in vaccination rollout strategies
• Improve early warning systems for outbreak detection
• Invest in healthcare system capacity and resilience
"""

ax6.text(0.02, 0.98, insights_text, transform=ax6.transAxes, fontsize=11,
         verticalalignment='top', bbox=dict(boxstyle='round,pad=0.5', facecolor='lightblue', alpha=0.1))

plt.show()

# Generate final summary statistics
print("\n" + "="*80)
print("🎯 EXECUTIVE SUMMARY: COVID-19 ANALYSIS COMPLETE")
print("="*80)

print(f"\n📊 ANALYSIS OVERVIEW:")
print(f"• Countries analyzed: {len(countries)}")
print(f"• Total data points: {len(df_filtered):,}")
print(f"• Analysis period: {df_filtered['date'].min().strftime('%B %Y')} - {df_filtered['date'].max().strftime('%B %Y')}")
print(f"• Combined population: {sum(population_estimates.values())/1e9:.1f} billion")

print(f"\n🏆 KEY PERFORMANCE INDICATORS:")
best_cfr = min(countries, key=lambda c: dashboard_data[c]['case_fatality_rate'])
best_vaccination = max(countries, key=lambda c: dashboard_data[c]['vaccination_coverage'])
best_data_quality = min(countries, key=lambda c: dashboard_data[c]['data_quality_score'])

print(f"• Best mortality outcome: {best_cfr} (CFR: {dashboard_data[best_cfr]['case_fatality_rate']:.2f}%)")
print(f"• Best vaccination coverage: {best_vaccination} ({dashboard_data[best_vaccination]['vaccination_coverage']:.1f}%)")
print(f"• Best data quality: {best_data_quality}")

print(f"\n✅ DELIVERABLES COMPLETED:")
print("• ✅ Comprehensive data analysis")
print("• ✅ Key insights identification")
print("• ✅ Anomaly detection")
print("• ✅ Performance benchmarking")
print("• ✅ Executive dashboard")
print("• ✅ Actionable recommendations")

print(f"\n📋 READY FOR:")
print("• Stakeholder presentations")
print("• Policy decision support")
print("• Academic publication")
print("• Public health planning")
print("• International comparison studies")

## 8. 📤 Export Analysis Results

Saving the key insights and summary data for reporting and presentation purposes:

### Export Options:
1. **Summary Tables** - CSV files with key metrics and insights
2. **Key Insights Report** - Text file with formatted findings
3. **Performance Rankings** - JSON file with comparative data
4. **Anomaly Summary** - CSV file with the data quality assessment

### File Outputs:
- `covid19_insights_summary.csv` - Main summary table
- `covid19_key_findings.txt` - Formatted insights report  
- `covid19_performance_rankings.json` - Comparative performance data
- `covid19_anomalies_detected.csv` - Data quality assessment

> **Note**: These files can be used for presentations, reports, and further analysis in other tools.
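
A quick way to confirm that exported files are usable elsewhere is to read them back. The sketch below writes a tiny CSV and JSON file to a temporary directory and reloads them. The file names match the outputs listed above, but the rows and rankings are placeholder values rather than results from this analysis.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    csv_path = out_dir / 'covid19_insights_summary.csv'
    json_path = out_dir / 'covid19_performance_rankings.json'

    # Placeholder content standing in for the real export
    sample_summary = pd.DataFrame({
        'Country': ['A', 'B'],
        'Case_Fatality_Rate_Percent': [2.0, 5.0],
    })
    sample_summary.to_csv(csv_path, index=False)

    sample_rankings = {'rankings': {'lowest_case_fatality_rate': ['A', 'B']}}
    with open(json_path, 'w', encoding='utf-8') as f:
        json.dump(sample_rankings, f, indent=2)

    # Round trip: reload both files the way a downstream tool would
    loaded_summary = pd.read_csv(csv_path)
    with open(json_path, encoding='utf-8') as f:
        loaded_rankings = json.load(f)

print(loaded_summary)
print(loaded_rankings['rankings']['lowest_case_fatality_rate'])
```

The same `pd.read_csv` and `json.load` calls apply to the real files once the export cell has run in the working directory.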

In [None]:
# Export analysis results for reporting and presentation
print("📤 EXPORTING ANALYSIS RESULTS")
print("=" * 40)

# 1. Create comprehensive summary table
summary_export = []
for country in countries:
    row = {
        'Country': country,
        'Population': f"{population_estimates[country]/1e6:.0f}M",
        'Total_Cases': dashboard_data[country]['total_cases'],
        'Total_Deaths': dashboard_data[country]['total_deaths'],
        'Case_Fatality_Rate_Percent': round(dashboard_data[country]['case_fatality_rate'], 2),
        'Peak_Daily_Cases': dashboard_data[country]['peak_cases'],
        'Peak_Date': dashboard_data[country]['peak_date'].strftime('%Y-%m-%d') if dashboard_data[country]['peak_date'] else 'N/A',
        'Vaccination_Coverage_Percent': round(dashboard_data[country]['vaccination_coverage'], 1),
        'Daily_Vaccination_Rate': round(dashboard_data[country]['daily_vaccination_rate'], 0),
        'Mortality_Improvement': round(dashboard_data[country]['mortality_improvement'], 2),
        'Data_Quality_Score': dashboard_data[country]['data_quality_score']
    }
    summary_export.append(row)

summary_df = pd.DataFrame(summary_export)

# Save summary table
try:
    summary_df.to_csv('covid19_insights_summary.csv', index=False)
    print("✅ Exported: covid19_insights_summary.csv")
except Exception as e:
    print(f"❌ Error exporting summary CSV: {e}")

# 2. Create detailed findings report
findings_report = f"""
COVID-19 DATA ANALYSIS: KEY INSIGHTS & FINDINGS REPORT
Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Analysis Period: {df_filtered['date'].min().strftime('%B %Y')} - {df_filtered['date'].max().strftime('%B %Y')}

COUNTRIES ANALYZED:
{chr(10).join([f"• {country} (Population: {population_estimates[country]/1e6:.0f}M)" for country in countries])}

================================================================================
EXECUTIVE SUMMARY
================================================================================

OVERALL IMPACT:
• Total Cases: {sum(dashboard_data[c]['total_cases'] for c in countries):,.0f}
• Total Deaths: {sum(dashboard_data[c]['total_deaths'] for c in countries):,.0f}
• Combined Population: {sum(population_estimates.values())/1e9:.1f} billion

PERFORMANCE LEADERS:
• Lowest Case Fatality Rate: {min(countries, key=lambda c: dashboard_data[c]['case_fatality_rate'])} ({min(dashboard_data[c]['case_fatality_rate'] for c in countries):.2f}%)
• Highest Vaccination Coverage: {max(countries, key=lambda c: dashboard_data[c]['vaccination_coverage'])} ({max(dashboard_data[c]['vaccination_coverage'] for c in countries):.1f}%)
• Fastest Vaccination Rollout: {max(countries, key=lambda c: dashboard_data[c]['daily_vaccination_rate'])} ({max(dashboard_data[c]['daily_vaccination_rate'] for c in countries)/1e6:.1f}M doses/day)

================================================================================
DETAILED FINDINGS BY COUNTRY
================================================================================

{chr(10).join([f'''
{country.upper()}:
• Total Cases: {dashboard_data[country]['total_cases']:,.0f}
• Total Deaths: {dashboard_data[country]['total_deaths']:,.0f}
• Case Fatality Rate: {dashboard_data[country]['case_fatality_rate']:.2f}%
• Peak Daily Cases: {dashboard_data[country]['peak_cases']:,.0f} on {dashboard_data[country]['peak_date'].strftime('%B %d, %Y') if dashboard_data[country]['peak_date'] else 'N/A'}
• Vaccination Coverage: {dashboard_data[country]['vaccination_coverage']:.1f}%
• Daily Vaccination Rate: {dashboard_data[country]['daily_vaccination_rate']:,.0f} doses/day
• Mortality Rate Change: {dashboard_data[country]['mortality_improvement']:+.2f} percentage points
• Data Quality Score: {dashboard_data[country]['data_quality_score']} (lower is better)
''' for country in countries])}

================================================================================
KEY INSIGHTS
================================================================================

1. PEAK INFECTION PATTERNS:
   - Peak infection periods varied significantly between countries
   - {max(countries, key=lambda c: dashboard_data[c]['peak_cases'])} experienced the highest single-day peak ({max(dashboard_data[c]['peak_cases'] for c in countries):,.0f} cases)
   - Peak timing differences may reflect varying policy responses and outbreak dynamics

2. VACCINATION ROLLOUT ANALYSIS:
   - Vaccination coverage ranges from {min(dashboard_data[c]['vaccination_coverage'] for c in countries):.1f}% to {max(dashboard_data[c]['vaccination_coverage'] for c in countries):.1f}%
   - Daily vaccination rates varied from {min(dashboard_data[c]['daily_vaccination_rate'] for c in countries)/1e6:.2f}M to {max(dashboard_data[c]['daily_vaccination_rate'] for c in countries)/1e6:.2f}M doses per day
   - Infrastructure and supply chain capabilities likely influenced rollout speed

3. MORTALITY PATTERNS:
   - Case fatality rates range from {min(dashboard_data[c]['case_fatality_rate'] for c in countries):.2f}% to {max(dashboard_data[c]['case_fatality_rate'] for c in countries):.2f}%
   - {sum(1 for c in countries if dashboard_data[c]['mortality_improvement'] > 0)}/{len(countries)} countries showed improvement in mortality rates over time
   - Improvements are consistent with better treatment and healthcare capacity, though changes in testing also affect CFR

4. DATA QUALITY ASSESSMENT:
   - Data quality varies significantly between countries
   - Outliers and anomalies detected suggest different reporting standards
   - Weekend effects and data corrections are common across all countries

================================================================================
RECOMMENDATIONS
================================================================================

IMMEDIATE ACTIONS:
1. Strengthen data collection and standardization across countries
2. Share vaccination rollout best practices and supply chain strategies
3. Implement early warning systems for outbreak detection
4. Invest in healthcare system surge capacity

LONG-TERM STRATEGIES:
1. Develop international pandemic preparedness frameworks
2. Create standardized reporting and data sharing protocols
3. Build resilient public health infrastructure
4. Establish global vaccine manufacturing and distribution networks

================================================================================
METHODOLOGY NOTES
================================================================================

• Data Source: Our World in Data COVID-19 dataset
• Statistical Methods: IQR outlier detection, Z-score analysis, trend analysis
• Population Estimates: Used for coverage calculations (may vary from official figures)
• Data Quality: Assessed through anomaly detection and consistency checks
• Limitations: Reporting differences may affect cross-country comparisons

Report compiled using Python data analysis tools (pandas, matplotlib, seaborn)
"""

# Save findings report
try:
    with open('covid19_key_findings.txt', 'w', encoding='utf-8') as f:
        f.write(findings_report)
    print("✅ Exported: covid19_key_findings.txt")
except Exception as e:
    print(f"❌ Error exporting findings report: {e}")

# 3. Create performance rankings JSON
performance_rankings = {
    'metadata': {
        'analysis_date': datetime.now().strftime('%Y-%m-%d'),
        'countries_analyzed': countries,
        'data_period': f"{df_filtered['date'].min().strftime('%Y-%m-%d')} to {df_filtered['date'].max().strftime('%Y-%m-%d')}"
    },
    'rankings': {
        'lowest_case_fatality_rate': sorted(countries, key=lambda c: dashboard_data[c]['case_fatality_rate']),
        'highest_vaccination_coverage': sorted(countries, key=lambda c: dashboard_data[c]['vaccination_coverage'], reverse=True),
        'fastest_vaccination_rollout': sorted(countries, key=lambda c: dashboard_data[c]['daily_vaccination_rate'], reverse=True),
        'best_data_quality': sorted(countries, key=lambda c: dashboard_data[c]['data_quality_score'])
    },
    'summary_statistics': {
        'total_cases_all_countries': int(sum(dashboard_data[c]['total_cases'] for c in countries)),
        'total_deaths_all_countries': int(sum(dashboard_data[c]['total_deaths'] for c in countries)),
        'average_cfr': round(np.mean([dashboard_data[c]['case_fatality_rate'] for c in countries]), 2),
        'average_vaccination_coverage': round(np.mean([dashboard_data[c]['vaccination_coverage'] for c in countries]), 1)
    }
}

# Save performance rankings
try:
    with open('covid19_performance_rankings.json', 'w') as f:
        json.dump(performance_rankings, f, indent=2, default=str)
    print("✅ Exported: covid19_performance_rankings.json")
except Exception as e:
    print(f"❌ Error exporting performance rankings: {e}")

# 4. Create anomalies summary
if 'anomaly_analysis' in locals():
    anomalies_export = []
    for country in countries:
        if country in anomaly_analysis:
            analysis = anomaly_analysis[country]
            anomalies_export.append({
                'Country': country,
                'Total_Outliers': analysis['total_outliers'],
                'Positive_Outliers': analysis['positive_outliers'],
                'Negative_Outliers': analysis['negative_outliers'],
                'Extreme_Outliers': analysis['extreme_outliers'],
                'Sudden_Spikes': analysis['sudden_spikes'],
                'Data_Corrections': analysis['data_corrections'],
                'Max_Spike_Value': analysis['max_spike_value'],
                'Min_Value': analysis['min_value'],
                'Data_Quality_Score': dashboard_data[country]['data_quality_score']
            })
    
    anomalies_df = pd.DataFrame(anomalies_export)
    
    try:
        anomalies_df.to_csv('covid19_anomalies_detected.csv', index=False)
        print("✅ Exported: covid19_anomalies_detected.csv")
    except Exception as e:
        print(f"❌ Error exporting anomalies CSV: {e}")

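from pathlib import Path

print(f"\n📁 EXPORT SUMMARY:")
print("─" * 30)
export_files = {
    'covid19_insights_summary.csv': 'Main metrics table',
    'covid19_key_findings.txt': 'Detailed findings report',
    'covid19_performance_rankings.json': 'Comparative rankings',
    'covid19_anomalies_detected.csv': 'Data quality assessment',
}
# Check the disk so a failed export above is not reported as a success
missing = [name for name in export_files if not Path(name).exists()]
for name, description in export_files.items():
    marker = "❌" if name in missing else "•"
    print(f"{marker} {name} - {description}")

if missing:
    print(f"\n⚠️ ANALYSIS COMPLETE, but {len(missing)} export file(s) are missing.")
else:
    print(f"\n✅ ANALYSIS COMPLETE!")
    print("─" * 25)
    print("All insights generated and exported successfully.")
    print("Ready for stakeholder presentations and decision-making support.")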

# Display final completion message
print(f"\n" + "="*80)
print("🎯 COVID-19 INSIGHTS & REPORTING ANALYSIS COMPLETE")
print("="*80)
print("📊 Comprehensive analysis completed successfully!")
print("📋 All deliverables ready for presentation")
print("📤 Export files created for external use")
print("✅ Ready for policy and decision-making support")
print("="*80)