# COVID-19 Global Data Tracker

In this project, we will build a data analysis and reporting notebook that tracks global COVID-19 trends. The project will analyze cases, deaths, recoveries, and vaccinations across countries and time. We will clean and process real-world data, perform exploratory data analysis (EDA), generate insights, and visualize trends using Python data tools.

By the end, we’ll have a data analysis report with visuals and narrative insights, suitable for presentation or publishing.

## Project Objectives:

- ✅ Import and clean COVID-19 global data
- ✅ Analyze time trends (cases, deaths, vaccinations)
- ✅ Compare metrics across countries/regions
- ✅ Visualize trends with charts and maps
- ✅ Communicate findings in a Jupyter Notebook or PDF report

## 1️⃣ Data Collection

**Goal:** Obtain a reliable COVID-19 dataset.

**✅ Data Sources:**

- Our World in Data COVID-19 Dataset (CSV & API)
- Johns Hopkins University GitHub Repository

👉 Recommended for beginners: Use the cleaned CSV from Our World in Data (easy to load with pandas).

**✅ Action:**

- Download `owid-covid-data.csv` from the [Our World in Data](https://ourworldindata.org/covid-deaths) website.
- Save it in your working folder.

In [None]:
# Import pandas
import pandas as pd

## 2️⃣ Data Loading & Exploration

**Goal:** Load the dataset and explore its structure.

**✅ Tasks:**

- Load data using `pandas.read_csv()`.
- Check columns: `df.columns`.
- Preview rows: `df.head()`.
- Identify missing values: `df.isnull().sum()`.

**✅ Tools:**

- pandas

**📌 Key columns:**

`date`, `location`, `total_cases`, `total_deaths`, `new_cases`, `new_deaths`, `total_vaccinations`, etc.

In [None]:
# Load the data
df = pd.read_csv('owid-covid-data.csv')

# Check columns
print("Columns:", df.columns.tolist())

# Preview rows
print("\nFirst 5 rows:")
print(df.head())

# Identify missing values
print("\nMissing values per column:")
print(df.isnull().sum())
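
With dozens of columns, raw missing-value counts are hard to scan. A minimal sketch of a summary helper follows; the `missing_summary` name and the sample values are hypothetical, used only for illustration.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most-missing first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

# Tiny stand-in for the real dataset (values are made up for illustration)
sample = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"],
    "location": ["Kenya"] * 4,
    "total_cases": [100.0, None, 120.0, 130.0],
    "total_vaccinations": [None, None, None, 10.0],
})

report = missing_summary(sample)
print(report)
```

On the real data, `missing_summary(df).head(10)` would list the ten sparsest columns.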

## 3️⃣ Data Cleaning

**Goal:** Prepare data for analysis.

**✅ Tasks:**

- Filter countries of interest (e.g., Kenya, United States, India), using the names exactly as they appear in the `location` column.
- Drop rows with missing dates/critical values.
- Convert date column to datetime: `pd.to_datetime()`.
- Handle missing numeric values with `fillna()` or `interpolate()`.

**✅ Tools:**

- pandas

In [None]:
# Filter for countries of interest
# (the OWID dataset lists the USA as 'United States' in the location column)
countries = ['Kenya', 'United States', 'India']
df_filtered = df[df['location'].isin(countries)].copy()

# Convert date column to datetime
df_filtered['date'] = pd.to_datetime(df_filtered['date'])

# Drop rows with missing dates
df_filtered.dropna(subset=['date'], inplace=True)

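# Handle missing numeric values: carry cumulative totals forward within each
# country so the curves do not drop to 0 on days without a report, then
# treat any remaining numeric gaps (e.g., before the first report) as 0
df_filtered.sort_values(['location', 'date'], inplace=True)
total_cols = ['total_cases', 'total_deaths', 'total_vaccinations']
df_filtered[total_cols] = df_filtered.groupby('location')[total_cols].ffill()
num_cols = df_filtered.select_dtypes('number').columns
df_filtered[num_cols] = df_filtered[num_cols].fillna(0)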

# Display the cleaned data
print(df_filtered.head())
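
The task list mentions `interpolate()` as an alternative to `fillna()`, but the cell above does not show it. Here is a minimal sketch on made-up rows, assuming linear interpolation within each country is acceptable for a cumulative column such as `total_cases`. The `toy` frame and the `total_cases_interp` column are hypothetical names.

```python
import pandas as pd

# Made-up rows standing in for df_filtered
toy = pd.DataFrame({
    "location": ["Kenya"] * 4 + ["India"] * 4,
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"] * 2
    ),
    "total_cases": [10.0, None, None, 40.0, 100.0, None, 300.0, 400.0],
})

# Sort so each country's rows are in date order before interpolating
toy = toy.sort_values(["location", "date"]).reset_index(drop=True)

# Interpolate per country so one country's values never leak into another.
# method="linear" treats rows as equally spaced; limit_area="inside" only
# fills gaps that have valid values on both sides.
toy["total_cases_interp"] = toy.groupby("location")["total_cases"].transform(
    lambda s: s.interpolate(method="linear", limit_area="inside")
)

print(toy)
```

Interpolated values are estimates, so it may be worth keeping the original column alongside the filled one, as done here.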

## 4️⃣ Exploratory Data Analysis (EDA)

**Goal:** Generate descriptive statistics & explore trends.

**✅ Tasks:**

- Plot total cases over time for selected countries.
- Plot total deaths over time.
- Compare daily new cases between countries.
- Calculate the death rate: `total_deaths` / `total_cases`.
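
The death-rate task can be sketched as follows. This is a rough illustration on made-up numbers; the `toy` frame and the `death_rate` column name are hypothetical. The ratio of reported deaths to reported cases is a crude case fatality ratio, so it depends heavily on how much testing each country did.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for df_filtered
toy = pd.DataFrame({
    "location": ["Kenya", "Kenya", "India", "India"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]
    ),
    "total_cases": [0.0, 200.0, 1000.0, 2000.0],
    "total_deaths": [0.0, 4.0, 10.0, 30.0],
})

# Replace zero case counts with NaN so the division yields NaN, not inf
cases = toy["total_cases"].replace(0, np.nan)
toy["death_rate"] = toy["total_deaths"] / cases

# Most recent death rate per country
latest = toy.sort_values("date").groupby("location").tail(1)
print(latest[["location", "death_rate"]])
```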

**✅ Visualizations:**

- Line charts (cases & deaths over time).
- Bar charts (top countries by total cases).
- Heatmaps (optional for correlation analysis).

**✅ Tools:**

- matplotlib
- seaborn

In [None]:
# Import visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for plots
sns.set(style='whitegrid')

# Plot total cases over time for selected countries
plt.figure(figsize=(12, 6))
sns.lineplot(x='date', y='total_cases', hue='location', data=df_filtered)
plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.show()

# Plot total deaths over time
plt.figure(figsize=(12, 6))
sns.lineplot(x='date', y='total_deaths', hue='location', data=df_filtered)
plt.title('Total COVID-19 Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Total Deaths')
plt.show()

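# Compare average daily new cases between countries
# (barplot shows the mean of new_cases per country by default)
plt.figure(figsize=(12, 6))
# errorbar=None (seaborn >= 0.12) replaces the deprecated ci=None
sns.barplot(x='location', y='new_cases', data=df_filtered, errorbar=None)
plt.title('Average Daily New COVID-19 Cases')
plt.xlabel('Country')
plt.ylabel('Average New Cases per Day')
plt.show()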
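
The bar chart reduces each country to a single number. To compare how daily cases moved over time, one common option is a 7-day rolling average per country, which smooths out weekly reporting cycles. The sketch below uses made-up data; the `new_cases_7d_avg` column name is hypothetical.

```python
import pandas as pd

# Made-up daily counts for two countries over ten days
dates = pd.date_range("2021-01-01", periods=10, freq="D")
toy = pd.DataFrame({
    "location": ["Kenya"] * 10 + ["India"] * 10,
    "date": list(dates) * 2,
    "new_cases": list(range(10, 110, 10)) + [100] * 10,
})

# Rolling windows depend on row order, so sort by country and date first
toy = toy.sort_values(["location", "date"]).reset_index(drop=True)

# Compute the rolling mean within each country; the first six days of each
# country stay NaN because a full 7-day window is not yet available
toy["new_cases_7d_avg"] = toy.groupby("location")["new_cases"].transform(
    lambda s: s.rolling(window=7, min_periods=7).mean()
)

print(toy.tail(4))
```

The smoothed column can be passed to `sns.lineplot` as `y='new_cases_7d_avg'`, in the same way as the line charts above.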

## 5️⃣ Visualizing Vaccination Progress

**Goal:** Analyze vaccination rollouts.

**✅ Tasks:**

- Plot cumulative vaccinations over time for selected countries.
- Compare % vaccinated population.

**✅ Charts:**

- Line charts.
- Optional: Pie charts for vaccinated vs. unvaccinated.

**✅ Tools:**

- matplotlib
- seaborn

In [None]:
# Plot cumulative vaccinations over time for selected countries
plt.figure(figsize=(12, 6))
sns.lineplot(x='date', y='total_vaccinations', hue='location', data=df_filtered)
plt.title('Total COVID-19 Vaccinations Over Time')
plt.xlabel('Date')
plt.ylabel('Total Vaccinations')
plt.show()
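
The task of comparing the percentage of the population vaccinated has no code above. The sketch below assumes the dataset provides `people_vaccinated` and `population` columns; neither appears in the key-column list earlier, so check `df.columns` before relying on them. The numbers are made up.

```python
import pandas as pd

# Made-up rows; people_vaccinated and population are assumed column names
toy = pd.DataFrame({
    "location": ["Kenya", "Kenya", "India", "India"],
    "date": pd.to_datetime(
        ["2021-06-01", "2021-07-01", "2021-06-01", "2021-07-01"]
    ),
    "people_vaccinated": [1_000_000.0, 2_000_000.0, 200_000_000.0, None],
    "population": [50_000_000.0, 50_000_000.0, 1_400_000_000.0, 1_400_000_000.0],
})

toy = toy.sort_values(["location", "date"])

# Take each country's most recent row that actually has a vaccination figure
latest = (
    toy.dropna(subset=["people_vaccinated"])
    .groupby("location")
    .tail(1)
    .set_index("location")
)

# Share of the population with at least one dose, in percent
pct_vaccinated = (latest["people_vaccinated"] / latest["population"] * 100).round(2)
print(pct_vaccinated)
```

Dropping missing rows before taking the latest one matters because vaccination figures are often reported less frequently than cases.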

## 6️⃣ Optional: Build a Choropleth Map

**Goal:** Visualize cases or vaccination rates by country on a world map.

**✅ Tools:**

- Plotly Express
- Or geopandas (advanced)

**✅ Tasks:**

- Prepare a dataframe with `iso_code`, `total_cases` for the latest date.
- Plot a choropleth showing case density or vaccination rates.

In [None]:
# This section is optional and requires additional libraries like Plotly.
# Here's a placeholder for the code that would generate a choropleth map.
# You would need to install plotly: pip install plotly

# import plotly.express as px
#
# # Get the most recent data for each country
# df_latest = df.groupby('iso_code')['total_cases'].last().reset_index()
#
# # Create the choropleth map
# fig = px.choropleth(
#     df_latest,
#     locations="iso_code",
#     color="total_cases",
#     hover_name="iso_code",
#     title="COVID-19 Total Cases by Country"
# )
# fig.show()
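
Preparing the latest-date dataframe can be sketched with pandas alone. This sketch assumes that aggregate rows such as "World" carry an `iso_code` starting with `OWID_`, which a choropleth cannot place on the map; verify this against your copy of the data. The rows below are made up.

```python
import pandas as pd

# Made-up rows; the OWID_ prefix for aggregates is an assumption to verify
toy = pd.DataFrame({
    "iso_code": ["KEN", "KEN", "IND", "IND", "OWID_WRL", "OWID_WRL"],
    "location": ["Kenya", "Kenya", "India", "India", "World", "World"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 3),
    "total_cases": [100.0, 150.0, 1000.0, None, 5000.0, 6000.0],
})

# Keep only real countries by dropping aggregate rows
countries_only = toy[~toy["iso_code"].str.startswith("OWID_", na=False)]

# Sort by date so "last" means "most recent"; the groupby "last" aggregation
# skips missing values, so a gap on the final day falls back to the
# previous reported total
df_latest = (
    countries_only.sort_values("date")
    .groupby("iso_code", as_index=False)
    .agg(location=("location", "last"), total_cases=("total_cases", "last"))
)

print(df_latest)
```

The resulting `df_latest` has one row per country and can be passed to `px.choropleth` with `locations="iso_code"` and `hover_name="location"`.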

## 7️⃣ Insights & Reporting

**Goal:** Summarize findings.

**✅ Tasks:**

- Write 3-5 key insights from the data (e.g., "X country had the fastest vaccine rollout").
- Highlight anomalies or interesting patterns.
- Use markdown cells in Jupyter Notebook to write your narrative.

**✅ Deliverables:**

- A well-documented Jupyter Notebook combining:
  - Code
  - Visualizations
  - Narrative explanations
- Optional export: Notebook → PDF or a PowerPoint with screenshots.

**Key Insights:**

1.  **USA** had the highest number of total COVID-19 cases compared to Kenya and India.
2.  The **vaccination rollout** showed a steady increase over time for all three countries.
3.  **Kenya** had significantly lower total cases and deaths compared to the USA and India, which may reflect different testing rates, population densities, or public health strategies.