# COVID-19 Global Data Tracker

This project tracks COVID-19 cases, deaths, and vaccinations using real-world data from [Our World in Data](https://github.com/owid/covid-19-data). It includes data cleaning, analysis, and visualizations for selected countries.

## Step 1: Data Collection
Download `owid-covid-data.csv` from the [OWID GitHub Repository](https://github.com/owid/covid-19-data/tree/master/public/data) and save it in a folder named `data/`.
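The download can also be scripted. The sketch below is one way to do it with the standard library; the raw-file URL and the `download_csv` helper are assumptions for illustration, not part of the original project, so check the URL against the repository before relying on it.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed raw-file URL; verify it against the repository before use.
OWID_CSV_URL = (
    "https://raw.githubusercontent.com/owid/covid-19-data/"
    "master/public/data/owid-covid-data.csv"
)


def download_csv(url: str, dest: str = "data/owid-covid-data.csv") -> Path:
    """Download `url` to `dest`, creating the parent folder if needed."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of holding the whole file in memory
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Usage (requires network access):
# download_csv(OWID_CSV_URL)
```

The call is left commented out so that running the cell does not trigger a large download by accident.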

## Step 2: Data Loading & Exploration

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv("data/owid-covid-data.csv")

# View column names
print(df.columns)

# Preview data
df.head()

In [None]:
# Count missing values in each column
df.isnull().sum()
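Raw counts are hard to compare across a dataset with many columns, so a percentage view can help decide which columns are usable. The following is a minimal sketch; `missing_report` is a hypothetical helper and the sample rows are made up for illustration.

```python
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns that actually have gaps
    report = report[report["missing"] > 0]
    return report.sort_values("missing", ascending=False)


# Made-up sample rows, for illustration only
sample = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "total_cases": [1.0, 2.0, None, 4.0],
    "total_vaccinations": [None, None, None, 10.0],
})
print(missing_report(sample))
```

On the real data the same call would be `missing_report(df)`.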

## Step 3: Data Cleaning

In [None]:
# Convert 'date' to datetime
df['date'] = pd.to_datetime(df['date'])

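# Filter selected countries; .copy() avoids SettingWithCopyWarning on later edits
countries = ['Kenya', 'United States', 'India']
df = df[df['location'].isin(countries)].copy()

# Drop rows with missing date or total_cases
df = df.dropna(subset=['date', 'total_cases'])

# Cumulative columns have gaps on days without a report; carry the last known
# value forward within each country so the plots do not dip to zero
df = df.sort_values(['location', 'date'])
cumulative_cols = ['total_deaths', 'total_vaccinations']
df[cumulative_cols] = df.groupby('location')[cumulative_cols].ffill()

# Fill remaining missing numeric values with 0
numeric_cols = df.select_dtypes(include='number').columns
df[numeric_cols] = df[numeric_cols].fillna(0)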
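A blanket zero fill sets cumulative columns such as `total_vaccinations` to 0 on days without a report, which shows up as dips in the line plots. One alternative is a forward fill within each country. The sketch below demonstrates it on made-up rows; `forward_fill_cumulative` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def forward_fill_cumulative(frame, columns):
    """Forward-fill cumulative columns within each location, ordered by date."""
    out = frame.sort_values(["location", "date"]).copy()
    out[columns] = out.groupby("location")[columns].ffill()
    # Days before the first report have nothing to carry forward; use 0 there
    out[columns] = out[columns].fillna(0)
    return out


# Made-up sample rows, for illustration only
sample = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"]
    ),
    "total_vaccinations": [None, 100.0, None, 5.0, None],
})
filled = forward_fill_cumulative(sample, ["total_vaccinations"])
print(filled)
```

Grouping by `location` matters here: without it, the last value of one country would leak into the first rows of the next.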

## Step 4: Exploratory Data Analysis (EDA)
### Total COVID-19 Cases Over Time

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
for country in countries:
    country_data = df[df['location'] == country]
    plt.plot(country_data['date'], country_data['total_cases'], label=country)

plt.title("Total COVID-19 Cases Over Time")
plt.xlabel("Date")
plt.ylabel("Total Cases")
plt.legend()
plt.grid(True)
plt.show()

### Total COVID-19 Deaths Over Time

In [None]:
plt.figure(figsize=(10, 6))
for country in countries:
    country_data = df[df['location'] == country]
    plt.plot(country_data['date'], country_data['total_deaths'], label=country)

plt.title("Total COVID-19 Deaths Over Time")
plt.xlabel("Date")
plt.ylabel("Total Deaths")
plt.legend()
plt.grid(True)
plt.show()
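The cases and deaths plots share the same loop, so the pattern could be factored into a small helper. This is a sketch only; `plot_metric` is a hypothetical name, and it assumes the frame has `location` and `date` columns as in the cells above.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_metric(frame, countries, column, title, ylabel):
    """Plot one column over time, one line per country, and return the Axes."""
    fig, ax = plt.subplots(figsize=(10, 6))
    for country in countries:
        country_data = frame[frame["location"] == country]
        ax.plot(country_data["date"], country_data[column], label=country)
    ax.set_title(title)
    ax.set_xlabel("Date")
    ax.set_ylabel(ylabel)
    ax.legend()
    ax.grid(True)
    return ax


# Usage with the notebook's variables:
# plot_metric(df, countries, "total_deaths",
#             "Total COVID-19 Deaths Over Time", "Total Deaths")
# plt.show()
```

Returning the Axes leaves room for further tweaks, such as a log scale via `ax.set_yscale("log")`, before calling `plt.show()`.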

## Step 5: Visualizing Vaccination Progress

In [None]:
plt.figure(figsize=(10, 6))
for country in countries:
    country_data = df[df['location'] == country]
    plt.plot(country_data['date'], country_data['total_vaccinations'], label=country)

plt.title("Vaccination Progress Over Time")
plt.xlabel("Date")
plt.ylabel("Total Vaccinations")
plt.legend()
plt.grid(True)
plt.show()
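Raw dose counts favor countries with large populations, so a per-capita view makes the comparison between the three countries fairer. The sketch below assumes the dataset's `population` column is present and non-zero; `doses_per_hundred` is a hypothetical helper and the sample rows are made up. Note that `total_vaccinations` counts doses administered, not people vaccinated.

```python
import pandas as pd


def doses_per_hundred(frame):
    """Return total_vaccinations scaled to doses per 100 people."""
    # Assumes `population` is present and non-zero for every row
    return frame["total_vaccinations"] * 100 / frame["population"]


# Made-up sample rows, for illustration only
sample = pd.DataFrame({
    "location": ["A", "B"],
    "total_vaccinations": [50.0, 300.0],
    "population": [100.0, 1000.0],
})
sample["doses_per_hundred"] = doses_per_hundred(sample)
print(sample)
```

In the sample, country B has administered more doses in total but fewer per 100 people than country A.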

## Step 6: (Optional) Choropleth Map

In [None]:
import plotly.express as px

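# Use each country's most recent row; reporting dates differ between countries.
# df was filtered in Step 3, so only the selected countries are shaded.
latest_data = df.sort_values('date').groupby('location').tail(1)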
fig = px.choropleth(latest_data,
                    locations="iso_code",
                    color="total_cases",
                    hover_name="location",
                    color_continuous_scale="Reds",
                    title="Total COVID-19 Cases (Latest Date)")
fig.show()
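Selecting rows on the single global maximum date drops any country whose last report is earlier than that date. A minimal sketch of picking the most recent reported row per country instead, using made-up rows; `latest_per_country` is a hypothetical helper.

```python
import pandas as pd


def latest_per_country(frame, value_column):
    """Return each location's most recent row that has a value in value_column."""
    reported = frame.dropna(subset=[value_column])
    return reported.sort_values("date").groupby("location").tail(1)


# Made-up sample rows: B has no report on the final date
sample = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]
    ),
    "total_cases": [10.0, 20.0, 5.0, None],
})
latest = latest_per_country(sample, "total_cases")
print(latest)
```

The result keeps one row for each of A and B, whereas filtering on the global latest date with a missing-value drop would keep only A.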

## Step 7: Insights & Reporting
- The United States had the highest total number of cases of the three countries.
- Kenya had a slower vaccine rollout than the United States and India.
- Total deaths differed substantially across the three countries.
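Observations like these can be checked against a small summary table rather than read off the plots. The sketch below computes the latest totals per country and deaths per 100 confirmed cases; `summarize` is a hypothetical helper and the sample numbers are made up, so they say nothing about the real countries.

```python
import pandas as pd


def summarize(frame):
    """Latest totals per country plus deaths per 100 confirmed cases."""
    latest = frame.sort_values("date").groupby("location").tail(1)
    summary = latest.set_index("location")[["total_cases", "total_deaths"]].copy()
    summary["deaths_per_100_cases"] = (
        summary["total_deaths"] * 100 / summary["total_cases"]
    )
    return summary.sort_values("total_cases", ascending=False)


# Made-up sample rows, for illustration only
sample = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]
    ),
    "total_cases": [100.0, 200.0, 500.0, 1000.0],
    "total_deaths": [2.0, 4.0, 5.0, 10.0],
})
print(summarize(sample))
```

On the real data the call would be `summarize(df)`. Deaths per 100 confirmed cases depends heavily on how much testing each country did, so it is a rough comparison at best.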