# COVID-19 Global Data Analysis
**Author:** Rensee Gajipara  
**GitHub:** [github.com/RENSEE-GAJIPARA](https://github.com/RENSEE-GAJIPARA)  
**Tools:** Python, Pandas, Matplotlib, Seaborn  
**Countries:** France, Italy, Russia, South Africa, Australia  
**Period:** January 2020 – December 2023

---

## Objective
Analyse COVID-19 trends across 5 countries covering:
- Weekly cases and deaths over time
- Population-adjusted comparisons
- Vaccination rollout progress
- Case Fatality Rate (CFR)
- Recovery rate trends *(new analysis)*

---
## Step 1: Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
import warnings

# Note: this silences every warning, including pandas/Matplotlib deprecation notices.
warnings.filterwarnings('ignore')

sns.set_theme(style='whitegrid', palette='muted')
COLORS = ['#2C7BB6', '#D7191C', '#1A9641', '#FDAE61', '#7B2D8B']
plt.rcParams.update({'font.family': 'DejaVu Sans', 'figure.dpi': 150})

print('Libraries loaded successfully!')
print(f'Pandas: {pd.__version__}')

---
## Step 2: Load & Explore Dataset

In [None]:
df = pd.read_csv('covid19_global_data.csv', parse_dates=['Date'])

print(f'Shape       : {df.shape}')
print(f'Countries   : {df["Country"].unique().tolist()}')
print(f'Date range  : {df["Date"].min().date()} to {df["Date"].max().date()}')
print(f'Missing vals: {df.isnull().sum().sum()}')
df.head()
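
Before moving on, it can help to confirm that the CSV actually contains the columns the later steps read. The sketch below is illustrative only: `REQUIRED_COLUMNS` is collected from the column names used in this notebook, `missing_columns` is a hypothetical helper, and the toy frame stands in for the real file.

```python
import pandas as pd

# Columns the later steps of this notebook read from the CSV.
REQUIRED_COLUMNS = [
    'Date', 'Country', 'Weekly_New_Cases', 'Weekly_New_Deaths',
    'Cumulative_Cases', 'Cumulative_Deaths',
    'Cases_Per_Million', 'Deaths_Per_Million',
    'Case_Fatality_Rate_Pct', 'Vaccinated_Pct', 'Recovery_Rate_Pct',
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Hypothetical helper: list required column names absent from `frame`."""
    return [c for c in required if c not in frame.columns]

# Toy frame standing in for the real CSV (values are made up).
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-05', '2020-01-12']),
    'Country': ['France', 'France'],
    'Weekly_New_Cases': [10, 20],
})

absent = missing_columns(toy)
# Per-column null counts are more informative than a single grand total.
per_column_nulls = toy.isnull().sum()
print('Missing columns:', absent)
print(per_column_nulls)
```

On the real data, `missing_columns(df)` should return an empty list; any name it reports would cause a `KeyError` in a later step.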

---
## Step 3: Clean & Prepare Data

In [None]:
FOCUS = ['France', 'Italy', 'Russia', 'South Africa', 'Australia']
df_focus = df[df['Country'].isin(FOCUS)].copy().sort_values(['Country', 'Date'])

# 4-week rolling average, computed per country (assumes one row per country per week)
for col in ['Weekly_New_Cases', 'Weekly_New_Deaths']:
    df_focus[f'{col}_avg'] = (
        df_focus.groupby('Country')[col]
        .transform(lambda x: x.rolling(4, min_periods=1).mean())
    )

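# Latest snapshot per country. GroupBy.last() returns the last non-null value
# in each column, so a row may combine values from different dates.
latest = df_focus.groupby('Country').last().reset_index()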
print('Data prepared!')
df_focus.describe().round(2)
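
To see what the per-country rolling average does, here is a minimal sketch on made-up numbers. It uses the same `groupby(...).transform(...)` pattern as the cell above; the country labels `A` and `B` and their values are invented for illustration.

```python
import pandas as pd

# Toy data: country A has five weeks, country B has three (values are made up).
toy = pd.DataFrame({
    'Country': ['A'] * 5 + ['B'] * 3,
    'Weekly_New_Cases': [10, 20, 30, 40, 50, 100, 200, 300],
})

# The window restarts for each country, so B's first value is not
# averaged with A's last weeks. min_periods=1 averages over however
# many weeks are available until the window fills up.
toy['Weekly_New_Cases_avg'] = (
    toy.groupby('Country')['Weekly_New_Cases']
    .transform(lambda x: x.rolling(4, min_periods=1).mean())
)
print(toy)
```

With `min_periods=1`, the first three weeks of each country are averaged over fewer than four values, so the start of each curve is less smoothed than the rest.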

---
## Step 4: Chart 1 — Weekly Cases & Deaths Over Time

In [None]:
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)
fig.suptitle('COVID-19: Weekly Cases & Deaths Over Time\n(4-Week Rolling Average)',
             fontsize=14, fontweight='bold', y=0.99)

for i, (col, label) in enumerate([
    ('Weekly_New_Cases_avg',  'Weekly New Cases'),
    ('Weekly_New_Deaths_avg', 'Weekly New Deaths'),
]):
    ax = axes[i]
    for j, country in enumerate(FOCUS):
        data = df_focus[df_focus['Country'] == country]
        ax.plot(data['Date'], data[col], label=country, color=COLORS[j], linewidth=1.8)
    ax.set_ylabel(label, fontsize=11)
    ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f'{x:,.0f}'))
    ax.legend(loc='upper left', fontsize=9)
    ax.set_facecolor('#F9FBFC')

axes[1].set_xlabel('Date', fontsize=11)
plt.tight_layout()
plt.show()

---
## Step 5: Chart 2 — Population-Adjusted Country Comparison

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
fig.suptitle('Population-Adjusted Comparison: Cases & Deaths per Million',
             fontsize=14, fontweight='bold')

for ax, (col, title, color) in zip(axes, [
    ('Cases_Per_Million',  'Total Cases per Million',  COLORS[0]),
    ('Deaths_Per_Million', 'Total Deaths per Million', COLORS[1]),
]):
    plot_data = latest[['Country', col]].dropna().sort_values(col, ascending=True)
    bars = ax.barh(plot_data['Country'], plot_data[col], color=color, edgecolor='white', height=0.55)
    # '{}'-style fmt strings in bar_label require Matplotlib 3.7 or newer.
    ax.bar_label(bars, fmt='{:,.0f}', padding=4, fontsize=9)
    ax.set_title(title, fontsize=12, fontweight='bold')
    ax.set_facecolor('#F9FBFC')

plt.tight_layout()
plt.show()
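
The dataset already ships with `Cases_Per_Million` and `Deaths_Per_Million`, so the notebook does not compute them. For reference, a per-million figure is usually the raw count divided by population and scaled by one million. The sketch below assumes a `Population` column, which is hypothetical and not confirmed to exist in the CSV; all numbers are made up.

```python
import pandas as pd

# Two toy countries with the same case count but different populations.
toy = pd.DataFrame({
    'Country': ['X', 'Y'],
    'Cumulative_Cases': [500_000, 500_000],
    'Population': [10_000_000, 50_000_000],  # hypothetical column
})

# Cases per million = cases * 1,000,000 / population
toy['Cases_Per_Million'] = (
    toy['Cumulative_Cases'] * 1_000_000 / toy['Population']
)
print(toy[['Country', 'Cases_Per_Million']])
```

Identical raw counts give very different per-million values here, which is why the chart compares adjusted rather than absolute totals.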

---
## Step 6: Chart 3 — Vaccination Rollout Progress

In [None]:
df_vax = df_focus[['Date', 'Country', 'Vaccinated_Pct']].dropna()

fig, ax = plt.subplots(figsize=(14, 6))
for j, country in enumerate(FOCUS):
    data = df_vax[df_vax['Country'] == country]
    ax.plot(data['Date'], data['Vaccinated_Pct'], label=country, color=COLORS[j], linewidth=2)

ax.axhline(70, color='gray', linestyle='--', linewidth=1.2, alpha=0.7, label='70% Herd Immunity Target')
# Shade the band above the 70% line; axhspan does not depend on the order of the dates.
ax.axhspan(70, 100, alpha=0.05, color='green')
ax.set_title('Vaccination Rollout: % Population Vaccinated (At Least 1 Dose)', fontsize=13, fontweight='bold')
ax.set_ylabel('% Vaccinated', fontsize=11)
ax.set_ylim(0, 105)
ax.legend(fontsize=10)
ax.set_facecolor('#F9FBFC')
plt.tight_layout()
plt.show()
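
The chart shows when each curve crosses the 70% line, but the date can also be read off numerically. This is a rough sketch on made-up data that assumes the same `Date`, `Country` and `Vaccinated_Pct` columns as `df_vax`; the name `first_cross` is introduced here for illustration.

```python
import pandas as pd

# Toy data: country A crosses 70% in its second week, B never does.
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-03', '2021-01-10', '2021-01-17'] * 2),
    'Country': ['A'] * 3 + ['B'] * 3,
    'Vaccinated_Pct': [50.0, 71.0, 80.0, 10.0, 20.0, 30.0],
})

THRESHOLD = 70

# Keep only rows at or above the threshold, then take the earliest date per country.
first_cross = (
    toy[toy['Vaccinated_Pct'] >= THRESHOLD]
    .groupby('Country')['Date']
    .min()
)
print(first_cross)
```

Countries that never reach the threshold simply do not appear in `first_cross`.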

---
## Step 7: Chart 4 — Case Fatality Rate

In [None]:
cfr_data = latest[['Country', 'Case_Fatality_Rate_Pct']].dropna().sort_values('Case_Fatality_Rate_Pct', ascending=True)

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.barh(cfr_data['Country'], cfr_data['Case_Fatality_Rate_Pct'],
               color=COLORS[2], edgecolor='white', height=0.5)
# '{}'-style fmt strings in bar_label require Matplotlib 3.7 or newer.
ax.bar_label(bars, fmt='{:.2f}%', padding=4, fontsize=10)
ax.set_title('Case Fatality Rate (CFR) by Country\n(Total Deaths / Total Cases × 100)',
             fontsize=13, fontweight='bold')
ax.set_xlabel('Case Fatality Rate (%)', fontsize=11)
ax.set_facecolor('#F9FBFC')
plt.tight_layout()
plt.show()
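
The chart title gives the formula as Total Deaths / Total Cases × 100. The notebook reads `Case_Fatality_Rate_Pct` directly from the CSV, so the sketch below is only a worked example of that formula on made-up numbers, using the `Cumulative_Cases` and `Cumulative_Deaths` column names from the summary step.

```python
import pandas as pd

# Toy cumulative totals (values are made up); Z has no recorded cases.
toy = pd.DataFrame({
    'Country': ['X', 'Y', 'Z'],
    'Cumulative_Cases': [1000, 2000, 0],
    'Cumulative_Deaths': [20, 10, 0],
})

cases = toy['Cumulative_Cases']
# CFR (%) = deaths / cases * 100; rows with zero cases become NaN
# instead of producing a misleading rate.
toy['Case_Fatality_Rate_Pct'] = (
    toy['Cumulative_Deaths'] / cases.where(cases > 0) * 100
)
print(toy[['Country', 'Case_Fatality_Rate_Pct']])
```

Recomputing the rate this way on the real data is one way to cross-check the precomputed column.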

---
## Step 8: Chart 5 — Recovery Rate Trend *(New Analysis)*

In [None]:
df_rec = df_focus[['Date', 'Country', 'Recovery_Rate_Pct']].dropna()

fig, ax = plt.subplots(figsize=(14, 6))
for j, country in enumerate(FOCUS):
    data = df_rec[df_rec['Country'] == country]
    ax.plot(data['Date'], data['Recovery_Rate_Pct'], label=country, color=COLORS[j], linewidth=2)

ax.axhline(95, color='green', linestyle='--', linewidth=1.2, alpha=0.6, label='95% Recovery Benchmark')
ax.set_title('Recovery Rate Trend Over Time by Country', fontsize=13, fontweight='bold')
ax.set_ylabel('Recovery Rate (%)', fontsize=11)
ax.set_ylim(80, 102)
ax.legend(fontsize=10)
ax.set_facecolor('#F9FBFC')
plt.tight_layout()
plt.show()

---
## Step 9: Summary Report

In [None]:
summary = latest[['Country', 'Cumulative_Cases', 'Cumulative_Deaths',
                   'Cases_Per_Million', 'Deaths_Per_Million',
                   'Case_Fatality_Rate_Pct', 'Vaccinated_Pct', 'Recovery_Rate_Pct']].copy()

summary.columns = ['Country', 'Total Cases', 'Total Deaths',
                   'Cases/Million', 'Deaths/Million', 'CFR (%)', 'Vaccinated (%)', 'Recovery Rate (%)']

summary = summary.set_index('Country')
summary.round(2)

---
## Conclusion

| Country | CFR (%) | Vaccinated (%) | Recovery Rate (%) |
|---------|---------|---------------|-------------------|
| France | 1.63 | 92.0 | 98.4 |
| Italy | 1.67 | 92.0 | 98.3 |
| Russia | 1.68 | 92.0 | 98.3 |
| South Africa | 1.65 | 92.0 | 98.3 |
| Australia | 1.65 | 92.0 | 98.3 |

### Key Insights
- **Australia** had the highest cases per million, which may reflect its delayed but large variant-driven waves.
- **Russia** reported the lowest cases per million, possibly reflecting lower testing rates.
- **Italy and South Africa** had the highest deaths per million among the five countries.
- All five countries crossed the **70% vaccination target** and maintained **98%+ recovery rates**.
- The 4-week rolling average smooths week-to-week noise and makes each country's distinct epidemic waves easier to see.
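
The ranking statements above can be checked against a table shaped like `summary` from Step 9. The sketch below uses a made-up `toy_summary` with the same `Cases/Million` and `Deaths/Million` column labels; the country names and values are placeholders, not results from the dataset.

```python
import pandas as pd

# Toy table indexed by country, mirroring the layout of `summary`.
toy_summary = pd.DataFrame(
    {
        'Cases/Million': [300.0, 100.0, 200.0],
        'Deaths/Million': [3.0, 5.0, 1.0],
    },
    index=pd.Index(['A', 'B', 'C'], name='Country'),
)

# idxmax / idxmin return the index label (the country) of the extreme value.
highest_cases = toy_summary['Cases/Million'].idxmax()
lowest_cases = toy_summary['Cases/Million'].idxmin()
# nlargest(2) keeps the two largest values, in descending order.
top2_deaths = toy_summary['Deaths/Million'].nlargest(2).index.tolist()

print(highest_cases, lowest_cases, top2_deaths)
```

Running the same three lines on the real `summary` would show whether the countries named in the insights match the data.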

---
*Rensee Gajipara | B.Tech AI & Data Science | SCET, Surat | 2026*