# COVID-19 Global Data Tracker
This Jupyter Notebook tracks global COVID-19 trends by analyzing cases, deaths, and vaccinations across countries and over time.
It includes data cleaning, exploratory data analysis (EDA), visualizations, and narrative insights.

## 1. Data Collection
We use the cleaned COVID-19 dataset from Our World in Data (owid-covid-data.csv).
Make sure the CSV file is saved in the working directory.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.express as px

# Load the dataset
df = pd.read_csv('owid-covid-data.csv')
df.head()
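The full OWID file is wide, and `date` is read as plain text by default. As a minimal sketch, assuming only a handful of columns are needed, `usecols` and `parse_dates` can trim and type the data at load time. The inline CSV below is a made-up stand-in for `owid-covid-data.csv`, and `extra_column` is a hypothetical column included only to show it being skipped.

```python
import io
import pandas as pd

# Made-up three-row extract standing in for owid-covid-data.csv (values are illustrative)
csv_text = """iso_code,location,date,total_cases,total_deaths,extra_column
KEN,Kenya,2021-01-01,100,1,x
KEN,Kenya,2021-01-02,120,2,y
IND,India,2021-01-01,1000,10,z
"""

# Columns this sketch assumes are needed for the later steps
wanted = ['iso_code', 'location', 'date', 'total_cases', 'total_deaths']

# usecols skips unneeded columns; parse_dates converts 'date' to datetime64 on load
sample = pd.read_csv(io.StringIO(csv_text), usecols=wanted, parse_dates=['date'])
print(sample.dtypes)
```

With the real file, the same two arguments can be passed to `pd.read_csv('owid-covid-data.csv', ...)`.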

## 2. Data Loading & Exploration
Explore the dataset structure and check for missing values.

In [None]:
# Check columns
print(df.columns)

# Preview rows (print explicitly: only the last expression in a cell is displayed)
print(df.head())

# Identify missing values
df.isnull().sum()
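Raw missing counts are hard to compare across columns, so a percentage view can help decide which columns are usable. The helper below is a sketch, not part of the original notebook. `missing_summary` is a hypothetical name, and the small frame is made-up data standing in for `df`.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most incomplete first."""
    summary = pd.DataFrame({
        'missing': frame.isnull().sum(),
        'percent': (frame.isnull().mean() * 100).round(1),
    })
    return summary.sort_values('missing', ascending=False)

# Made-up frame standing in for df
sample = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B'],
    'total_cases': [1.0, np.nan, 3.0, 4.0],
    'total_vaccinations': [np.nan, np.nan, np.nan, 10.0],
})
report = missing_summary(sample)
print(report)
```

Calling `missing_summary(df)` on the loaded dataset would list the sparsest columns at the top.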

## 3. Data Cleaning
Filter countries of interest and clean the data for analysis.

In [None]:
# Select countries of interest (the OWID dataset labels the USA as 'United States')
countries = ['Kenya', 'United States', 'India']
df_countries = df[df['location'].isin(countries)].copy()

# Convert date column to datetime and drop rows without a valid date
df_countries['date'] = pd.to_datetime(df_countries['date'], errors='coerce')
df_countries = df_countries.dropna(subset=['date'])

# Fill gaps in cumulative columns by interpolating within each country.
# transform() keeps the original index, so the result aligns on assignment.
df_countries = df_countries.sort_values(['location', 'date'])
for col in ['total_cases', 'total_deaths', 'total_vaccinations']:
    df_countries[col] = df_countries.groupby('location')[col].transform(lambda s: s.interpolate())

# Drop rows that still lack critical values (e.g. dates before the first report)
df_countries = df_countries.dropna(subset=['total_cases', 'total_deaths'])

df_countries.head()
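To see what per-country interpolation does, the sketch below runs it on a made-up two-country frame. It uses `limit_area='inside'`, which is an assumption of this example rather than the notebook's setting: only gaps between two known values are filled, so nothing is extrapolated before the first report or after the last one.

```python
import numpy as np
import pandas as pd

# Made-up data: country A has an interior gap, country B has gaps at both ends
toy = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'total_cases': [10.0, np.nan, 30.0, np.nan, 5.0, np.nan],
})
toy = toy.sort_values(['location', 'date'])

# Interpolate inside each country separately so values never leak across countries
toy['total_cases_filled'] = (
    toy.groupby('location')['total_cases']
       .transform(lambda s: s.interpolate(limit_area='inside'))
)
print(toy)
```

Country A's gap is filled linearly, while country B's leading and trailing gaps stay missing because they have no value on both sides.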

## 4. Exploratory Data Analysis (EDA)
Analyze and visualize COVID-19 cases and deaths over time.

In [None]:
# Plot total cases over time for selected countries
plt.figure(figsize=(12, 6))
sns.lineplot(data=df_countries, x='date', y='total_cases', hue='location')
plt.title('Total COVID-19 Cases Over Time')
plt.ylabel('Total Cases')
plt.xlabel('Date')
plt.legend(title='Country')
plt.show()

# Plot total deaths over time
plt.figure(figsize=(12, 6))
sns.lineplot(data=df_countries, x='date', y='total_deaths', hue='location')
plt.title('Total COVID-19 Deaths Over Time')
plt.ylabel('Total Deaths')
plt.xlabel('Date')
plt.legend(title='Country')
plt.show()

# Compare daily new cases between countries
plt.figure(figsize=(12, 6))
sns.lineplot(data=df_countries, x='date', y='new_cases', hue='location')
plt.title('Daily New COVID-19 Cases')
plt.ylabel('New Cases')
plt.xlabel('Date')
plt.legend(title='Country')
plt.show()

# Calculate death rate and plot
df_countries['death_rate'] = df_countries['total_deaths'] / df_countries['total_cases']
plt.figure(figsize=(12, 6))
sns.lineplot(data=df_countries, x='date', y='death_rate', hue='location')
plt.title('COVID-19 Death Rate Over Time')
plt.ylabel('Death Rate')
plt.xlabel('Date')
plt.legend(title='Country')
plt.show()
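The death rate above is total deaths divided by total cases, which is undefined on days where a country has reported zero cases. One possible guard, shown here as a sketch on made-up numbers, masks zero-case rows before dividing and expresses the result as a percentage. The column name `death_rate_pct` is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up cumulative counts for a single country
toy = pd.DataFrame({
    'location': ['A', 'A', 'A'],
    'total_cases': [0.0, 100.0, 200.0],
    'total_deaths': [0.0, 2.0, 5.0],
})

# Replace zero case counts with NaN so the division yields NaN instead of 0/0 or inf
cases = toy['total_cases'].where(toy['total_cases'] > 0)

# Deaths per 100 reported cases
toy['death_rate_pct'] = toy['total_deaths'] * 100 / cases
print(toy)
```

Rows with a missing rate are skipped when plotted, so the line starts at the first day with reported cases.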

## 5. Visualizing Vaccination Progress
Analyze vaccination rollouts and compare vaccination coverage.

In [None]:
# Plot cumulative vaccinations over time
plt.figure(figsize=(12, 6))
sns.lineplot(data=df_countries, x='date', y='total_vaccinations', hue='location')
plt.title('Cumulative COVID-19 Vaccinations Over Time')
plt.ylabel('Total Vaccinations')
plt.xlabel('Date')
plt.legend(title='Country')
plt.show()
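Raw dose counts favor large countries, so comparing coverage usually means dividing by population. The sketch below assumes the dataset provides `people_vaccinated` and `population` columns and uses made-up values. `pct_vaccinated` is a hypothetical column name.

```python
import pandas as pd

# Made-up data: a small country and a large one
toy = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2021-06-01', '2021-07-01'] * 2),
    'people_vaccinated': [100.0, 250.0, 4000.0, 6000.0],
    'population': [1000.0, 1000.0, 100000.0, 100000.0],
})

# Share of the population with at least one dose, in percent
toy['pct_vaccinated'] = toy['people_vaccinated'] * 100 / toy['population']

# Most recent coverage figure per country
latest = toy.sort_values('date').groupby('location')['pct_vaccinated'].last()
print(latest)
```

In this toy data country B has administered far more doses, yet country A has the higher coverage, which is the distinction a per-capita plot makes visible.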

## 6. Optional: Build a Choropleth Map
Visualize total cases by country on a world map. The same code can show case density or vaccination rates by changing the `color` column.

In [None]:
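# Take each country's most recent row with a reported total; the final date
# in the file is often incomplete for many countries.
df_latest = (df.dropna(subset=['total_cases'])
               .sort_values('date')
               .groupby('iso_code', as_index=False)
               .last())
# Drop OWID-specific codes (e.g. World, continents), which are not ISO-3 codes
df_latest = df_latest[~df_latest['iso_code'].str.startswith('OWID_')]
latest_date = str(df_latest['date'].max())

fig = px.choropleth(df_latest, locations='iso_code', color='total_cases',
                    hover_name='location',
                    color_continuous_scale='Reds',
                    title='Global COVID-19 Total Cases as of ' + latest_date)
fig.show()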

## 7. Insights & Reporting
Key insights and observations from the data analysis.

### Key Insights:
- The United States has had the highest total cases and deaths among the selected countries.
- India experienced large waves of new cases with significant death tolls.
- Kenya reports lower total cases and deaths, and its vaccination rollout has been slower than in the United States and India.
- The death rate (total deaths divided by total cases) fluctuates over time, likely due to reporting and healthcare factors.
- Vaccination rollouts have varied significantly across countries, which may have influenced case trends.

### Anomalies and Patterns:
- Sudden spikes in new cases may correspond to variant outbreaks or reporting changes.
- Some missing data points were interpolated to maintain trend continuity.
- Vaccination data is less complete for some countries, affecting analysis accuracy.