# COVID-19 Global Data Tracker

This notebook loads a sample COVID-19 dataset from `covid_data.csv`, or builds a small inline fallback when that file is missing (no internet required), explores trends using **pandas**, and visualizes them with **matplotlib**.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

pd.set_option('display.max_rows', 10)


In [None]:
csv_path = Path('covid_data.csv')
if not csv_path.exists():
    # Fallback: create a tiny dataset inline so the notebook always runs
    data = {
        'date': ['2020-03-01','2020-03-02','2020-03-03','2020-03-04','2020-03-05'],
        'country': ['Kenya','Kenya','Kenya','Kenya','Kenya'],
        'new_cases': [5, 8, 13, 21, 34],
        'new_deaths': [0, 0, 1, 0, 1]
    }
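    df = pd.DataFrame(data)
    # Parse dates so the fallback matches the dtype produced by read_csv(parse_dates=...)
    df['date'] = pd.to_datetime(df['date'])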
    df['cumulative_cases'] = df['new_cases'].cumsum()
    df['cumulative_deaths'] = df['new_deaths'].cumsum()
else:
    df = pd.read_csv(csv_path, parse_dates=['date'])
df.head()

## Quick Exploration
- Shape, columns, and date range
- Missing values check
- Countries available

In [None]:
print('Rows, Cols:', df.shape)
print('Columns:', list(df.columns))
print('Date range:', df['date'].min(), '→', df['date'].max())
print('\nMissing values per column:')
print(df.isna().sum())
print('\nCountries:', df['country'].unique())
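
Beyond the missing-values check, one further sanity check is to confirm that the cumulative column agrees with the running sum of the daily column. The following is a minimal sketch, not part of the original notebook: the helper name `check_cumulative` is hypothetical, and it assumes each country's series starts at its first reported day (a CSV that begins mid-outbreak would legitimately fail this check).

```python
import pandas as pd

# Rows copied from the notebook's inline fallback dataset
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03',
                            '2020-03-04', '2020-03-05']),
    'country': ['Kenya'] * 5,
    'new_cases': [5, 8, 13, 21, 34],
})
df['cumulative_cases'] = df['new_cases'].cumsum()


def check_cumulative(frame, daily_col, cum_col):
    """Hypothetical helper: return rows where cum_col differs from the
    per-country running sum of daily_col (assumes complete series)."""
    ordered = frame.sort_values(['country', 'date'])
    expected = ordered.groupby('country')[daily_col].cumsum()
    return ordered[ordered[cum_col] != expected]


mismatches = check_cumulative(df, 'new_cases', 'cumulative_cases')
print('Inconsistent rows:', len(mismatches))
```

An empty result means the two columns agree; any returned rows point to dates worth inspecting before plotting.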

## Country-Level Aggregation
Total cases and deaths per country.

In [None]:
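summary = (
    df.groupby('country')
      .agg(
          total_cases=('new_cases', 'sum'),    # totals come from summing daily counts
          total_deaths=('new_deaths', 'sum'),
          cumulative_cases=('cumulative_cases', 'max'),  # latest cumulative value
          cumulative_deaths=('cumulative_deaths', 'max'),
      )
      .sort_values('cumulative_cases', ascending=False)
)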
summary

## Visualizations
All plots use **matplotlib** (no seaborn), one chart per figure, and default colors.

In [None]:
plt.figure()
for country, sub in df.groupby('country'):
    sub = sub.sort_values('date')
    plt.plot(sub['date'], sub['new_cases'], label=country)
plt.title('Daily New Cases by Country')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
country_choice = 'Kenya'  # change to explore another country
sub = df[df['country'] == country_choice].sort_values('date')
plt.figure()
plt.plot(sub['date'], sub['cumulative_cases'])
plt.title(f'Cumulative Cases — {country_choice}')
plt.xlabel('Date')
plt.ylabel('Cumulative Cases')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
totals = df.groupby('country')['new_cases'].sum().sort_values(ascending=False)
plt.figure()
totals.plot(kind='bar')
plt.title('Total New Cases by Country (Sample Data)')
plt.xlabel('Country')
plt.ylabel('Total New Cases')
plt.tight_layout()
plt.show()

## Insights & Reflections
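- With a multi-country CSV, the per-country lines show how case trajectories differ between countries; the inline fallback contains only Kenya, so it shows a single trajectory.
- Daily counts show how quickly cases are changing, while cumulative counts show the total to date, so the two charts answer different questions.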
- A reproducible, offline dataset ensures the notebook runs end-to-end without network access.
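
The relationship between the daily and cumulative views can be sketched with the fallback values. This is an illustrative example rather than part of the original analysis: the variable names `daily`, `cumulative`, and `recovered` are assumptions, and the numbers are the Kenya rows from the inline dataset.

```python
import pandas as pd

# Daily new cases from the notebook's inline fallback dataset
daily = pd.Series(
    [5, 8, 13, 21, 34],
    index=pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03',
                          '2020-03-04', '2020-03-05']),
    name='new_cases',
)

# Cumulative counts are the running sum of the daily counts
cumulative = daily.cumsum()

# diff() inverts cumsum(); the first difference is NaN, so restore it
# from the first cumulative value before casting back to integers
recovered = cumulative.diff().fillna(cumulative.iloc[0]).astype(int)

print(pd.DataFrame({'daily': daily,
                    'cumulative': cumulative,
                    'recovered_daily': recovered}))
```

Because each series can be derived from the other, plotting both adds no new data; it changes which feature is easy to see (rate of change versus total to date).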