# COVID-19 Global Data Tracker
This notebook analyzes and reports global COVID-19 trends in cases, deaths, and vaccinations.

## Data Collection
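The analysis uses the `owid-covid-data.csv` dataset from [Our World in Data](https://ourworldindata.org/covid-cases). The cell below downloads it into the working folder if it is not already there; you can also download it manually and save it under the same name.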

In [None]:
# Download the dataset
import os
import urllib.request

dataset_url = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
dataset_path = "owid-covid-data.csv"

if not os.path.exists(dataset_path):
    urllib.request.urlretrieve(dataset_url, dataset_path)
    print(f"Dataset downloaded and saved as {dataset_path}")
else:
    print(f"Dataset already exists at {dataset_path}")

## Data Loading & Exploration
Use pandas to load the dataset, inspect its structure with `df.columns` and `df.head()`, and identify missing values with `df.isnull().sum()`.

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv(dataset_path)

# Inspect the structure of the dataset
print("Columns in the dataset:")
print(df.columns)

print("\nFirst 5 rows of the dataset:")
print(df.head())

# Identify missing values
print("\nMissing values in each column:")
print(df.isnull().sum())
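
Raw missing-value counts are hard to compare across columns, so expressing them as a share of all rows can help decide which columns are usable. The sketch below illustrates the idea on a small made-up dataframe (`sample` and its values are hypothetical stand-ins for `df`, not real OWID data).

```python
import pandas as pd

# Hypothetical miniature stand-in for the OWID dataframe; values are made up.
sample = pd.DataFrame({
    "location": ["India", "India", "Brazil", "Brazil"],
    "total_cases": [100.0, None, 50.0, 80.0],
    "total_vaccinations": [None, None, None, 10.0],
})

# Percentage of missing values per column, largest first
missing_pct = sample.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

Applied to the real data, the same expression with `df` in place of `sample` would rank the dataset's columns by how sparse they are.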

## Data Cleaning
Filter countries of interest, drop rows with missing critical values, convert the date column to datetime, and handle missing numeric values using `fillna()` or `interpolate()`.

In [None]:
# Filter for specific countries (e.g., USA, India, Brazil)
countries_of_interest = ["United States", "India", "Brazil"]
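# .copy() gives an independent dataframe, avoiding SettingWithCopyWarning on later column assignments
df_filtered = df[df["location"].isin(countries_of_interest)].copy()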

# Drop rows with missing critical values
df_filtered = df_filtered.dropna(subset=["total_cases", "total_deaths", "date"])

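# Convert the date column to datetime and order rows chronologically within each country
df_filtered["date"] = pd.to_datetime(df_filtered["date"])
df_filtered = df_filtered.sort_values(["location", "date"])

# Forward-fill missing numeric values within each country, so values never carry
# over from one country to the next (fillna(method="ffill") is deprecated in recent pandas)
numeric_cols = df_filtered.select_dtypes(include="number").columns
df_filtered[numeric_cols] = df_filtered.groupby("location")[numeric_cols].ffill()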

print("Data cleaning completed.")
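
The choice between forward-filling and interpolating matters for cumulative columns. The following sketch compares the two on a made-up sample (`sample`, `vax_ffill`, and `vax_interp` are hypothetical names, and the numbers are invented), applying each method per country so that gaps are never filled from another country's rows.

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
sample = pd.DataFrame({
    "location": ["India"] * 3 + ["Brazil"] * 3,
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "total_vaccinations": [10.0, None, 30.0, None, 5.0, None],
})
sample = sample.sort_values(["location", "date"])

by_country = sample.groupby("location")["total_vaccinations"]

# Forward fill: repeat the last reported value; leading gaps stay missing
sample["vax_ffill"] = by_country.ffill()

# Linear interpolation: only fill gaps that lie between two reported values
sample["vax_interp"] = by_country.transform(
    lambda s: s.interpolate(limit_area="inside")
)

print(sample)
```

Forward fill keeps a cumulative series flat across a gap, while interpolation assumes steady growth between reports; neither invents values before a country's first report.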

## Exploratory Data Analysis (EDA)
Generate descriptive statistics and visualize trends, such as total cases and deaths over time, daily new cases, and death rates. Use matplotlib and seaborn for line and bar charts.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Plot total cases over time for each country
plt.figure(figsize=(12, 6))
sns.lineplot(data=df_filtered, x="date", y="total_cases", hue="location")
plt.title("Total COVID-19 Cases Over Time")
plt.xlabel("Date")
plt.ylabel("Total Cases")
plt.legend(title="Country")
plt.show()

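# Plot daily new cases, computed within each country so that one country's first
# row is not differenced against another country's last row
# (this overwrites the dataset's own new_cases column)
df_filtered = df_filtered.sort_values(["location", "date"])
df_filtered["new_cases"] = df_filtered.groupby("location")["total_cases"].diff()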
plt.figure(figsize=(12, 6))
sns.barplot(data=df_filtered, x="date", y="new_cases", hue="location")
plt.title("Daily New COVID-19 Cases")
plt.xlabel("Date")
plt.ylabel("New Cases")
plt.legend(title="Country")
plt.show()
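
Daily counts are noisy, and the death rate mentioned above is not plotted in this section. As a minimal sketch on made-up data, the block below derives per-country daily new cases, a rolling average, and a crude death rate (cumulative deaths divided by cumulative cases). The names `sample`, `new_cases_smoothed`, and `death_rate` are hypothetical, and the window of 3 is chosen only because the sample is short; a 7-day window is the more usual choice for daily data.

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
sample = pd.DataFrame({
    "location": ["A"] * 4 + ["B"] * 4,
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"] * 2
    ),
    "total_cases": [100.0, 150.0, 210.0, 300.0, 10.0, 20.0, 40.0, 80.0],
    "total_deaths": [1.0, 3.0, 4.0, 6.0, 0.0, 1.0, 1.0, 2.0],
})
sample = sample.sort_values(["location", "date"])

# Daily new cases, differenced within each country
sample["new_cases"] = sample.groupby("location")["total_cases"].diff()

# Rolling mean of new cases within each country (window=7 would be typical)
sample["new_cases_smoothed"] = sample.groupby("location")["new_cases"].transform(
    lambda s: s.rolling(window=3, min_periods=1).mean()
)

# Crude death rate: cumulative deaths as a fraction of cumulative cases
sample["death_rate"] = sample["total_deaths"] / sample["total_cases"]

print(sample)
```

The smoothed column can be passed to `sns.lineplot` in place of `new_cases` to make successive waves easier to see.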

## Visualizing Vaccination Progress
Analyze vaccination rollouts by plotting cumulative vaccinations over time and comparing the percentage of vaccinated populations using line and pie charts.

In [None]:
# Plot cumulative vaccinations over time
plt.figure(figsize=(12, 6))
sns.lineplot(data=df_filtered, x="date", y="total_vaccinations", hue="location")
plt.title("Cumulative COVID-19 Vaccinations Over Time")
plt.xlabel("Date")
plt.ylabel("Total Vaccinations")
plt.legend(title="Country")
plt.show()
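
To compare the percentage of each population that is vaccinated, one option is to take each country's most recent reported coverage figure. The sketch below assumes the `people_fully_vaccinated_per_hundred` column from the OWID dataset and uses made-up values in a hypothetical `sample` dataframe.

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
sample = pd.DataFrame({
    "location": ["India", "India", "Brazil", "Brazil"],
    "date": pd.to_datetime(["2021-06-01", "2021-07-01"] * 2),
    "people_fully_vaccinated_per_hundred": [3.0, 5.0, 10.0, None],
})

# Latest reported coverage per country, ignoring dates with no report
latest_coverage = (
    sample.dropna(subset=["people_fully_vaccinated_per_hundred"])
    .sort_values("date")
    .groupby("location")["people_fully_vaccinated_per_hundred"]
    .last()
    .sort_values(ascending=False)
)

print(latest_coverage)
```

For a per-country pie chart, the two slices would be the coverage value and `100 - coverage`; a bar chart of `latest_coverage` is usually easier to read when comparing several countries.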

## Optional: Build a Choropleth Map
Prepare a dataframe with `iso_code` and `total_cases` for the latest date, and use Plotly Express or geopandas to create a choropleth map showing case density or vaccination rates by country.

In [None]:
import plotly.express as px

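# Use the full dataset with parsed dates, dropping OWID aggregates such as World
# and continents (their iso_code starts with "OWID_")
df_map = df[~df["iso_code"].str.startswith("OWID_", na=False)].copy()
df_map["date"] = pd.to_datetime(df_map["date"])

# Keep each country's most recent row that reports total_cases
df_latest = df_map.dropna(subset=["total_cases"]).sort_values("date").groupby("iso_code").tail(1)

# Create a choropleth map
fig = px.choropleth(df_latest, locations="iso_code", color="total_cases",
                    hover_name="location", title="Total COVID-19 Cases by Country",
                    color_continuous_scale="Reds")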
fig.show()

## Insights & Reporting
Summarize findings with 3-5 key insights, highlight anomalies, and document the analysis with markdown cells in the notebook. Optionally, export the notebook to PDF or PowerPoint.

### Key Insights
1. The United States, India, and Brazil have experienced significant COVID-19 case surges at different points in time.
2. Vaccination rollouts have varied widely across countries, with some achieving higher coverage earlier than others.
3. Daily new cases show clear peaks and troughs, indicating waves of infections.
4. The choropleth map highlights countries with high cumulative case counts, providing a global perspective on the pandemic's impact.