# COVID-19 Global Data Tracker

This project tracks global COVID-19 trends in cases, deaths, and vaccinations across countries and over time. We clean and analyze real-world data with pandas, visualize it with Matplotlib, Seaborn, and Plotly, and summarize the key insights for presentation.

**Case Study Countries**: Kenya, United States, India  
**Data Source**: Our World in Data (owid-covid-data.csv)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set plot style
sns.set_theme(style='whitegrid')

In [None]:
# Load data
df = pd.read_csv("owid-covid-data.csv")

# Preview
df.head()
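
The full OWID file has many columns, so it can help to load only the ones the analysis needs and to parse dates while reading. The sketch below uses a small in-memory CSV with made-up values; the `wanted` list and the `extra_column` field are illustrative assumptions, not part of the original notebook.

```python
import io
import pandas as pd

# Made-up CSV text standing in for owid-covid-data.csv
csv_text = """iso_code,location,date,total_cases,extra_column
KEN,Kenya,2021-01-01,100,x
IND,India,2021-01-01,200,y
"""

# Hypothetical subset of columns to keep
wanted = ["iso_code", "location", "date", "total_cases"]

# usecols skips unneeded columns; parse_dates converts 'date' while reading
toy = pd.read_csv(io.StringIO(csv_text), usecols=wanted, parse_dates=["date"])
print(toy.dtypes)
```

With the real file, the same two arguments could be passed to `pd.read_csv("owid-covid-data.csv", ...)`.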

In [None]:
# Column overview
df.columns

In [None]:
# Check missing values
df.isnull().sum()
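
Raw missing counts are easier to interpret as a share of rows, broken down by country. This is a minimal sketch on a toy frame with invented numbers; the column names mirror the OWID ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values, standing in for the OWID data
toy = pd.DataFrame({
    "location": ["Kenya", "Kenya", "India", "India"],
    "total_cases": [10.0, np.nan, 5.0, 8.0],
    "total_deaths": [np.nan, np.nan, 1.0, 1.0],
})

# Fraction of missing entries per column, computed separately for each country
missing_share = (
    toy.drop(columns="location")
       .isnull()
       .groupby(toy["location"])
       .mean()
)
print(missing_share)
```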

In [None]:
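# Filter countries of interest (copy to avoid SettingWithCopyWarning on later assignments)
countries = ['Kenya', 'United States', 'India']
df = df[df['location'].isin(countries)].copy()

# Convert date to datetime and order rows chronologically within each country
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['location', 'date'])

# Drop rows with missing critical values
df = df.dropna(subset=['date', 'total_cases'])

# Forward-fill cumulative columns within each country only, so one country's
# values never carry over into another's; fillna(method='ffill') is deprecated
cumulative_cols = ['total_deaths', 'total_vaccinations']
df[cumulative_cols] = df.groupby('location')[cumulative_cols].ffill()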
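
Forward-filling a frame that holds several countries can carry one country's last value into the next country's first rows. The sketch below, on made-up numbers, contrasts a plain forward fill with one grouped by `location`.

```python
import numpy as np
import pandas as pd

# Toy data: Kenya's first vaccination value is missing
toy = pd.DataFrame({
    "location": ["India", "India", "Kenya", "Kenya"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]
    ),
    "total_vaccinations": [100.0, 150.0, np.nan, 20.0],
})
toy = toy.sort_values(["location", "date"])

# Plain forward fill: Kenya's gap inherits India's last value
naive = toy["total_vaccinations"].ffill()

# Grouped forward fill: the gap stays missing because Kenya has no earlier value
grouped = toy.groupby("location")["total_vaccinations"].ffill()

print(pd.DataFrame({"naive": naive, "grouped": grouped}))
```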

In [None]:
plt.figure(figsize=(12, 6))
for country in countries:
    subset = df[df['location'] == country]
    plt.plot(subset['date'], subset['total_cases'], label=country)

plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(12, 6))
for country in countries:
    subset = df[df['location'] == country]
    plt.plot(subset['date'], subset['total_deaths'], label=country)

plt.title('Total COVID-19 Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Total Deaths')
plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(12, 6))
for country in countries:
    subset = df[df['location'] == country]
    plt.plot(subset['date'], subset['new_cases'], label=country)

plt.title('Daily New COVID-19 Cases')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.legend()
plt.show()
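
Daily counts tend to be noisy because of reporting gaps, so a rolling mean per country is often plotted instead. This is a sketch on invented numbers; the `new_cases_rolling` column name and the short window are assumptions chosen for the toy data (a 7-day window is a common choice for real data).

```python
import pandas as pd

WINDOW = 3  # toy value; 7 would give a weekly average on real data

# Invented daily counts for two countries
toy = pd.DataFrame({
    "location": ["Kenya"] * 4 + ["India"] * 4,
    "date": list(pd.date_range("2021-01-01", periods=4)) * 2,
    "new_cases": [10, 0, 20, 30, 100, 200, 0, 300],
})
toy = toy.sort_values(["location", "date"])

# Rolling mean computed within each country so windows never span two countries
toy["new_cases_rolling"] = (
    toy.groupby("location")["new_cases"]
       .transform(lambda s: s.rolling(window=WINDOW, min_periods=1).mean())
)
print(toy)
```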

In [None]:
plt.figure(figsize=(12, 6))
for country in countries:
    subset = df[df['location'] == country].copy()
    subset['death_rate'] = subset['total_deaths'] / subset['total_cases']
    plt.plot(subset['date'], subset['death_rate'], label=country)

plt.title('COVID-19 Death Rate Over Time')
plt.xlabel('Date')
plt.ylabel('Death Rate (total deaths / total cases)')
plt.legend()
plt.show()
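
The death rate plotted here is total deaths divided by total confirmed cases. A minimal sketch of the same ratio, expressed as a percentage and guarded against rows with zero cases, is shown below on made-up numbers; the `death_rate_pct` column name is an assumption.

```python
import pandas as pd

# Invented cumulative counts for one country
toy = pd.DataFrame({
    "location": ["Kenya", "Kenya", "Kenya"],
    "total_cases": [0.0, 50.0, 200.0],
    "total_deaths": [0.0, 1.0, 3.0],
})

# Replace zero case counts with NaN so the ratio is undefined rather than inf
cases = toy["total_cases"].where(toy["total_cases"] > 0)

# Deaths as a percentage of confirmed cases
toy["death_rate_pct"] = 100 * toy["total_deaths"] / cases
print(toy)
```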

In [None]:
plt.figure(figsize=(12, 6))
for country in countries:
    subset = df[df['location'] == country]
    plt.plot(subset['date'], subset['total_vaccinations'], label=country)

plt.title('Total Vaccinations Over Time')
plt.xlabel('Date')
plt.ylabel('Cumulative Vaccinations')
plt.legend()
plt.show()
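
Cumulative dose counts are hard to compare across countries of very different sizes. One option is to normalize by population, sketched below with invented figures. It assumes a `population` column is available alongside `total_vaccinations`; the `doses_per_hundred` name is hypothetical. Note that doses per hundred people is not the same as the share of people vaccinated, since one person can receive several doses.

```python
import pandas as pd

# Invented totals and populations, for illustration only
toy = pd.DataFrame({
    "location": ["Kenya", "India"],
    "total_vaccinations": [1_000_000.0, 50_000_000.0],
    "population": [50_000_000.0, 1_000_000_000.0],
})

# Doses administered per 100 people
toy["doses_per_hundred"] = 100 * toy["total_vaccinations"] / toy["population"]
print(toy[["location", "doses_per_hundred"]])
```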

In [None]:
# Interactive map with Plotly (requires the plotly package)
import plotly.express as px

# Latest available row for each country; df holds only the three case-study countries
latest = df.sort_values('date').groupby('location').tail(1)
choropleth_df = latest[['iso_code', 'location', 'total_cases']].dropna()

fig = px.choropleth(choropleth_df,
                    locations="iso_code",
                    color="total_cases",
                    hover_name="location",
                    title="Total COVID-19 Cases: Kenya, United States, India")
fig.show()

## Key Insights

1. India experienced a large second wave in mid-2021.
2. The United States had the highest vaccination count among the three.
3. Kenya showed slower vaccine uptake than India and the United States.
4. Death rates remained below 2% in all three countries after mid-2021.
5. Daily new cases spiked in the United States during the winter of 2020-21.
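
Statements about waves and spikes like the ones above can be checked numerically rather than read off a chart. The sketch below finds the date of the highest daily case count per country, using invented numbers that do not reflect the real data.

```python
import pandas as pd

# Invented daily counts; not the real OWID figures
toy = pd.DataFrame({
    "location": ["India"] * 3 + ["Kenya"] * 3,
    "date": pd.to_datetime(["2021-04-01", "2021-05-01", "2021-06-01"] * 2),
    "new_cases": [100, 400, 150, 5, 3, 9],
})

# idxmax returns the row label of the maximum within each country
peak_rows = toy.loc[toy.groupby("location")["new_cases"].idxmax()]
peaks = peak_rows.set_index("location")["date"]
print(peaks)
```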