# COVID-19 Global Data Tracker

This notebook analyzes global COVID-19 trends in cases, deaths, and vaccinations across selected countries over time.

## 1. Data Collection
We use the Our World in Data (OWID) COVID-19 dataset, stored locally at `data/owid-covid-data.csv`.
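
The full OWID file has dozens of columns, while this notebook only uses a handful. Below is a minimal sketch, assuming the column names used later in this notebook, of reading just those columns and parsing `date` at load time. `load_owid_subset` and `USED_COLUMNS` are hypothetical helpers, not part of the original notebook, and the toy CSV holds made-up values only to exercise the function.

```python
import io

import pandas as pd

# Hypothetical list of the columns this notebook uses; extend it if you add metrics.
USED_COLUMNS = [
    'location', 'date',
    'total_cases', 'total_deaths', 'new_cases', 'new_deaths',
    'total_vaccinations', 'people_fully_vaccinated_per_hundred',
]


def load_owid_subset(source, columns=None):
    """Read only `columns` (default: USED_COLUMNS) and parse 'date' as datetime.

    `source` may be a file path or any file-like object accepted by pd.read_csv.
    """
    columns = USED_COLUMNS if columns is None else columns
    return pd.read_csv(source, usecols=columns, parse_dates=['date'])


# Toy CSV with made-up values, only to check the function; 'extra' should be dropped.
toy_csv = io.StringIO(
    'location,date,total_cases,total_deaths,new_cases,new_deaths,'
    'total_vaccinations,people_fully_vaccinated_per_hundred,extra\n'
    'Kenya,2021-01-01,100,2,5,0,,,x\n'
)
toy_df = load_owid_subset(toy_csv)
print(toy_df.dtypes)

# In the notebook: df = load_owid_subset('data/owid-covid-data.csv')
```

If a listed column is missing from the file, `read_csv` raises a `ValueError`, which doubles as a quick check that the download has the expected schema.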

## 2. Data Loading & Exploration


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('data/owid-covid-data.csv')

# Display basic information about the dataset
# (df.info() prints its report directly and returns None, so it is not wrapped in print())
print("Dataset Info:")
df.info()

# Display first few rows
print("\nFirst few rows:")
print(df.head())

# Check for missing values
print("\nMissing values:")
print(df.isnull().sum())
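
The raw missing-value counts are easier to compare as percentages of rows. A small sketch that ranks columns by their share of missing values; `missing_summary` is a hypothetical helper and the toy frame uses made-up values just to show the output shape.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Count and percentage of missing values per column, most-missing first."""
    is_missing = frame.isna()
    summary = pd.DataFrame({
        'missing': is_missing.sum(),
        'pct_missing': (is_missing.mean() * 100).round(1),
    })
    return summary.sort_values('pct_missing', ascending=False)


# Toy frame with made-up values
toy_missing = pd.DataFrame({
    'a': [1, np.nan, 3, np.nan],
    'b': [1, 2, 3, 4],
    'c': [np.nan, np.nan, np.nan, 1],
})
toy_summary = missing_summary(toy_missing)
print(toy_summary)

# In the notebook: missing_summary(df).head(15)
```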

## 3. Data Cleaning

We'll clean the data by:
1. Converting the `date` column to datetime
2. Filtering to the countries of interest
3. Handling missing values in the case and death columns

In [None]:
# Convert date to datetime
df['date'] = pd.to_datetime(df['date'])

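# Select countries of interest; .copy() makes an independent DataFrame so the
# column assignments below do not trigger a SettingWithCopyWarning
countries = ['Kenya', 'United States', 'India', 'United Kingdom', 'South Africa']
df_selected = df[df['location'].isin(countries)].copy()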

# Handle missing values in critical columns
critical_columns = ['total_cases', 'total_deaths', 'new_cases', 'new_deaths']
df_selected[critical_columns] = df_selected[critical_columns].fillna(0)
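
Filling the cumulative columns (`total_cases`, `total_deaths`) with 0 makes a running total fall to zero on any day its value is missing, which appears as sharp dips in the plots below. One alternative, sketched here under the assumption that each country's rows can be ordered by `date`, is to carry the last reported total forward within each country and zero-fill only the daily columns. `fill_cumulative` is a hypothetical helper, and the toy frame contains made-up values.

```python
import numpy as np
import pandas as pd


def fill_cumulative(frame, cumulative_cols, daily_cols, group_col='location'):
    """Forward-fill cumulative totals within each country; zero-fill daily counts.

    Rows before a country's first reported total are still NaN after the
    per-country forward fill, so they are set to 0 afterwards.
    """
    out = frame.sort_values([group_col, 'date']).copy()
    # groupby keeps one country's last value from leaking into the next country
    out[cumulative_cols] = out.groupby(group_col)[cumulative_cols].ffill().fillna(0)
    out[daily_cols] = out[daily_cols].fillna(0)
    return out


# Toy data with made-up values; B's first total is missing
toy_gaps = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-01-01', '2021-01-02']),
    'total_cases': [np.nan, 10, np.nan, np.nan, 7],
    'new_cases': [np.nan, 10, np.nan, np.nan, 7],
})
filled = fill_cumulative(toy_gaps, ['total_cases'], ['new_cases'])
print(filled)

# In the notebook, use this in place of the fillna(0) line:
# df_selected = fill_cumulative(df_selected, ['total_cases', 'total_deaths'],
#                               ['new_cases', 'new_deaths'])
```

Without the `groupby`, country B's first row would inherit country A's last total (10) instead of starting at 0.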

## 4. Exploratory Data Analysis (EDA)

We'll analyze:
1. Total cases over time
2. Total deaths over time
3. Daily new cases
4. Death rates (case fatality rate)
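
The death rate in this notebook is the crude case fatality rate (CFR) for a country on date $t$, where $D_t$ is `total_deaths` and $C_t$ is `total_cases`:

$$
\mathrm{CFR}_t = 100 \times \frac{D_t}{C_t}, \qquad C_t > 0
$$

It is undefined when no cases have been recorded, and because $C_t$ counts only confirmed cases it depends on testing levels, so it is not the same as the infection fatality rate. For example, 50 deaths among 2,000 confirmed cases gives $100 \times 50 / 2000 = 2.5\%$.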

In [None]:
# Function to plot a metric over time, one line per country
def plot_metric_over_time(data, metric, title):
    plt.figure(figsize=(12, 6))
    for country in countries:
        # Drop missing values so sparsely reported metrics (e.g. vaccinations)
        # are drawn as connected lines instead of broken fragments
        country_data = data[data['location'] == country].dropna(subset=[metric])
        plt.plot(country_data['date'], country_data[metric], label=country)

    plt.title(title)
    plt.xlabel('Date')
    plt.ylabel(metric.replace('_', ' ').title())
    plt.legend()
    plt.xticks(rotation=45)
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Plot total cases
plot_metric_over_time(df_selected, 'total_cases', 'Total COVID-19 Cases Over Time')

# Plot total deaths
plot_metric_over_time(df_selected, 'total_deaths', 'Total COVID-19 Deaths Over Time')

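# Calculate and plot the case fatality rate (%): cumulative deaths / cumulative cases.
# Zero case counts are treated as missing to avoid division by zero (inf values).
df_selected['death_rate'] = df_selected['total_deaths'] / df_selected['total_cases'].replace(0, np.nan) * 100
plot_metric_over_time(df_selected, 'death_rate', 'COVID-19 Death Rate Over Time (%)')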
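
The EDA plan lists daily new cases, but the cell above only plots cumulative totals and the death rate. Daily counts fluctuate with reporting schedules, so a trailing 7-day average is a common way to plot them. A minimal sketch, assuming one row per country per day with no missing dates (the rolling window counts rows, not calendar days); `add_rolling_mean` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def add_rolling_mean(frame, column, window=7, group_col='location'):
    """Return a copy of `frame` with a trailing rolling mean of `column` per country.

    The new column is named f'{column}_{window}d_avg'. The window counts rows,
    so it spans `window` days only if each country has one row per day.
    """
    out = frame.sort_values([group_col, 'date']).copy()
    out[f'{column}_{window}d_avg'] = (
        out.groupby(group_col)[column]
        .transform(lambda s: s.rolling(window, min_periods=1).mean())
    )
    return out


# Toy data with made-up values; window=2 keeps the arithmetic easy to check
toy_daily = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-01-01', '2021-01-02']),
    'new_cases': [3, 6, 9, 10, 20],
})
smoothed = add_rolling_mean(toy_daily, 'new_cases', window=2)
print(smoothed)

# In the notebook:
# df_selected = add_rolling_mean(df_selected, 'new_cases')
# plot_metric_over_time(df_selected, 'new_cases_7d_avg', 'Daily New COVID-19 Cases (7-Day Average)')
```

`min_periods=1` lets the first few days of each country show a partial average rather than NaN; drop it if you prefer to plot only full windows.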

## 5. Vaccination Progress Analysis

In [None]:
# Plot vaccination progress
plot_metric_over_time(df_selected, 'total_vaccinations', 'Total Vaccinations Over Time')

# Plot percentage of population vaccinated
plot_metric_over_time(df_selected, 'people_fully_vaccinated_per_hundred', 'Percentage of Population Fully Vaccinated')
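
Vaccination columns such as `people_fully_vaccinated_per_hundred` are often not reported every day for every country, and they are not covered by the missing-value handling above, so the raw series contain gaps. For a cross-country comparison it can help to take each country's most recent reported value. A sketch; `latest_non_null` is a hypothetical helper and the toy values are made up.

```python
import numpy as np
import pandas as pd


def latest_non_null(frame, column, group_col='location'):
    """Most recent non-missing value of `column` per country, with its date."""
    valid = frame.dropna(subset=[column]).sort_values('date')
    latest = valid.groupby(group_col).tail(1)
    return latest.set_index(group_col)[['date', column]].sort_index()


# Toy data with made-up values: A's last day and B's first day are missing
toy_vax = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-01-01', '2021-01-02']),
    'people_fully_vaccinated_per_hundred': [10.0, 20.0, np.nan, np.nan, 5.0],
})
latest_vax = latest_non_null(toy_vax, 'people_fully_vaccinated_per_hundred')
print(latest_vax)

# In the notebook: latest_non_null(df_selected, 'people_fully_vaccinated_per_hundred')
```

Keeping the `date` column next to the value shows how stale each country's latest figure is, which matters when reporting dates differ between countries.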

## 6. Key Insights & Findings

1. Case Trends:
   - [Will be filled based on actual data analysis]

2. Death Rates:
   - [Will be filled based on actual data analysis]

3. Vaccination Progress:
   - [Will be filled based on actual data analysis]

4. Country Comparisons:
   - [Will be filled based on actual data analysis]

5. Notable Patterns:
   - [Will be filled based on actual data analysis]
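
To ground these findings in the data, a per-country table of the latest reported totals and crude CFR can serve as a starting point. A sketch; `country_summary` is a hypothetical helper and the toy values are made up. Note that `groupby(...).last()` takes the last non-missing value of each column separately, so cases and deaths may come from different dates when reporting is uneven.

```python
import numpy as np
import pandas as pd


def country_summary(frame, group_col='location'):
    """Latest reported totals per country plus the crude CFR (%), largest case count first.

    groupby().last() picks the last non-missing value of each column independently.
    """
    latest = frame.sort_values('date').groupby(group_col).last()
    summary = latest[['total_cases', 'total_deaths']].copy()
    # Treat zero case counts as missing so the CFR becomes NaN instead of inf
    summary['cfr_pct'] = 100 * summary['total_deaths'] / summary['total_cases'].replace(0, np.nan)
    return summary.sort_values('total_cases', ascending=False)


# Toy data with made-up values; B's latest death count is missing, C has no cases
toy_totals = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B', 'C'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-01',
                            '2021-01-02', '2021-01-01']),
    'total_cases': [100.0, 200.0, 50.0, 80.0, 0.0],
    'total_deaths': [1.0, 4.0, 2.0, np.nan, 0.0],
})
toy_country_table = country_summary(toy_totals)
print(toy_country_table)

# In the notebook: country_summary(df_selected)
```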