# COVID-19 Global Data Analysis

This project analyzes global COVID-19 data to understand case trends, deaths, vaccination rollout, and regional impacts using visualizations.


## Objectives

- Clean and preprocess the COVID-19 dataset.
- Analyze case trends and death rates.
- Visualize vaccination progress.
- Draw insights from the data.


In [None]:
import pandas as pd

# Load the OWID dataset (update this path to wherever the CSV is stored on your machine)
df = pd.read_csv('C:/Users/user/Desktop/COVID-19 Global Data Tracker Project/Data/owid-covid-data.csv')

# 1. Handling missing values:
# Fill missing continent with 'Unknown' (in the OWID file these are mainly aggregate rows such as 'World' or 'Africa')
df['continent'] = df['continent'].fillna('Unknown')

# Sort each country's rows chronologically (dates are ISO 'YYYY-MM-DD' strings, so string order is date order)
df = df.sort_values(['location', 'date'])

# These columns are running totals, so a gap is filled with the country's last reported value
# instead of 0 (a 0 would make the total drop to zero for that day). Only gaps before a
# country's first report are set to 0, assuming nothing had been reported yet.
cumulative_cols = ['total_cases', 'total_deaths', 'total_vaccinations']
df[cumulative_cols] = df.groupby('location')[cumulative_cols].ffill().fillna(0)

# Interpolate missing excess-mortality values within each country, so no value is borrowed from another country's rows
df['excess_mortality'] = df.groupby('location')['excess_mortality'].transform(lambda s: s.interpolate(method='linear'))

# 2. Filter countries of interest
# The 'location' column uses full country names: the United States is 'United States' ('USA' is only its iso_code)
countries_of_interest = ['Kenya', 'United States', 'India']  # The countries this analysis focuses on
df_filtered = df[df['location'].isin(countries_of_interest)]  # Keep only rows for these countries

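# 3. Drop any rows still missing critical values (a safety net: the fills above should leave none)
# .copy() gives an independent DataFrame, so adding columns below does not trigger SettingWithCopyWarning
df_cleaned = df_filtered.dropna(subset=['total_cases', 'total_deaths', 'total_vaccinations']).copy()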

# 4. Convert the 'date' column to datetime format
df_cleaned['date'] = pd.to_datetime(df_cleaned['date'])  # Convert the date column to a pandas datetime object for better time-based analysis

# 5. Check the cleaned data
print("\nCleaned DataFrame (first 5 rows):")
print(df_cleaned.head())  # Display the first 5 rows of the cleaned DataFrame

# Check for missing values again after cleaning
print("\nMissing values in cleaned dataset:")
print(df_cleaned.isnull().sum())  # Check if any missing values still exist in the cleaned dataset

import matplotlib.pyplot as plt
import seaborn as sns

# Set up the style of the plots
sns.set(style="whitegrid")  # Set a clean white grid background for the plots

# 1. Plot total cases over time for selected countries
plt.figure(figsize=(12, 6))  # Set the figure size for the plot
for country in countries_of_interest:
    country_data = df_cleaned[df_cleaned['location'] == country]  # Filter data for each country
    plt.plot(country_data['date'], country_data['total_cases'], label=country)  # Plot the total cases over time for each country

# Total COVID-19 Cases Over Time Explanation:
# This chart shows the cumulative number of cases over time for the selected countries.
# Because these are running totals, the curves keep rising; the steepest stretches, especially in mid-2021, mark the fastest growth in cases.
plt.title("Total COVID-19 Cases Over Time")  # Set the title for the plot
plt.xlabel("Date")  # Label for the x-axis
plt.ylabel("Total Cases")  # Label for the y-axis
plt.legend()  # Add a legend to the plot to distinguish between countries
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()  # Adjust layout to make sure everything fits in the figure area
plt.show()  # Display the plot for total cases

# 2. Plot total deaths over time for selected countries
plt.figure(figsize=(12, 6))  # Set the figure size for the plot
for country in countries_of_interest:
    country_data = df_cleaned[df_cleaned['location'] == country]  # Filter data for each country
    plt.plot(country_data['date'], country_data['total_deaths'], label=country)  # Plot the total deaths over time for each country

# Total COVID-19 Deaths Over Time Explanation:
# This chart shows the total number of deaths over time for selected countries.
# We can observe a rise in death numbers that generally follows the case trends.
plt.title("Total COVID-19 Deaths Over Time")  # Set the title for the plot
plt.xlabel("Date")  # Label for the x-axis
plt.ylabel("Total Deaths")  # Label for the y-axis
plt.legend()  # Add a legend to the plot
plt.xticks(rotation=45)  # Rotate x-axis labels for readability
plt.tight_layout()  # Adjust layout
plt.show()  # Display the plot for total deaths

# 3. Compare daily new cases between countries
plt.figure(figsize=(12, 6))  # Set the figure size for the plot
for country in countries_of_interest:
    country_data = df_cleaned[df_cleaned['location'] == country]  # Filter data for each country
    plt.plot(country_data['date'], country_data['new_cases'], label=country)  # Plot the daily new cases over time for each country

# Daily New COVID-19 Cases Over Time Explanation:
# This chart shows the daily new cases reported for the selected countries.
# Spikes mark periods of faster spread (waves); some sharp day-to-day jumps may also reflect reporting, such as a backlog released in a single day.
plt.title("Daily New COVID-19 Cases Over Time")  # Set the title for the plot
plt.xlabel("Date")  # Label for the x-axis
plt.ylabel("New Cases")  # Label for the y-axis
plt.legend()  # Add a legend to the plot
plt.xticks(rotation=45)  # Rotate x-axis labels for readability
plt.tight_layout()  # Adjust layout
plt.show()  # Display the plot for daily new cases

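# 4. Calculate the death rate (case fatality rate) and plot it
# Total deaths divided by total cases; where total_cases is 0 the denominator is set to NaN, so the result is NaN instead of inf
df_cleaned['death_rate'] = df_cleaned['total_deaths'] / df_cleaned['total_cases'].where(df_cleaned['total_cases'] > 0)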

plt.figure(figsize=(12, 6))  # Set the figure size for the plot
for country in countries_of_interest:
    country_data = df_cleaned[df_cleaned['location'] == country]  # Filter data for each country
    plt.plot(country_data['date'], country_data['death_rate'], label=country)  # Plot the death rate over time for each country

# COVID-19 Death Rate Over Time Explanation:
# This chart shows the case fatality rate (total deaths divided by total cases) over time for the selected countries.
# It gives a rough view of how severity evolved in each country; because it depends on how many cases were detected, differences in testing also shift the curve.
plt.title("COVID-19 Death Rate Over Time")  # Set the title for the plot
plt.xlabel("Date")  # Label for the x-axis
plt.ylabel("Death Rate (Total Deaths / Total Cases)")  # Label for the y-axis
plt.legend()  # Add a legend to the plot
plt.xticks(rotation=45)  # Rotate x-axis labels for readability
plt.tight_layout()  # Adjust layout
plt.show()  # Display the plot for the death rate
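
Filling cumulative totals and dividing by a case count that can be zero are the two steps in this cleaning that are easiest to get wrong. The sketch below checks both ideas on a tiny invented table rather than the full OWID file: totals are forward-filled within each country (only gaps before the first report become 0), and the death rate stays NaN while a country has no recorded cases. The helper name `clean_and_add_cfr`, the locations `A` and `B`, and all numbers are hypothetical.

```python
import pandas as pd


def clean_and_add_cfr(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-country forward fill of running totals, then case fatality rate."""
    out = df.sort_values(['location', 'date']).copy()
    cumulative_cols = ['total_cases', 'total_deaths']
    # Carry each country's last reported total forward; gaps before the first report become 0.
    out[cumulative_cols] = out.groupby('location')[cumulative_cols].ffill().fillna(0)
    # Case fatality rate; a zero case count is turned into NaN so the ratio is NaN, not inf.
    out['death_rate'] = out['total_deaths'] / out['total_cases'].where(out['total_cases'] > 0)
    return out


# Invented data: 'A' has a one-day reporting gap, 'B' starts with no cases.
raw = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'total_cases': [100, None, 200, None, 0, 50],
    'total_deaths': [2, None, 4, None, 0, 1],
})

cleaned = clean_and_add_cfr(raw)
print(cleaned[['location', 'date', 'total_cases', 'total_deaths', 'death_rate']])
```

In the invented table, country `A` keeps 100 cases through its gap instead of dropping to 0, and country `B` has no death rate until its first case is recorded.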

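The daily new-case series plotted in the code cell is jagged partly because of how cases are reported. A 7-day rolling mean computed separately for each country, which the original analysis does not use, is one common way to make the waves easier to compare. This is a minimal sketch on invented numbers for a single hypothetical country `A`; `new_cases_7day` is a hypothetical column name, and on the real data the same `groupby`/`transform` call would run on `df_cleaned`.

```python
import pandas as pd

# Invented daily counts: a two-day reporting gap (0, 0) followed by a catch-up batch (210)
data = pd.DataFrame({
    'location': ['A'] * 10,
    'date': pd.date_range('2021-05-01', periods=10, freq='D'),
    'new_cases': [70, 80, 90, 0, 0, 210, 100, 110, 120, 130],
})

# 7-day rolling mean within each country; the first 6 rows of each country are NaN
# because a full 7-day window is not available yet.
data['new_cases_7day'] = (
    data.groupby('location')['new_cases']
        .transform(lambda s: s.rolling(window=7).mean())
)
print(data)
```

The gap and the catch-up batch are averaged out in the smoothed column, so it changes gradually where the raw counts jump from 0 to 210.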

## Key Insights

- India had one of the fastest vaccine rollouts.
- Africa had the lowest case reporting rates.
- Global cases spiked in mid-2021.
- Some countries have fully vaccinated at least 60% of their population.
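
The vaccination points in this list are not backed by a chart in the code cell, although the objectives include visualizing vaccination progress. One way to check a statement like the 60% one is to take each country's most recent non-missing `people_fully_vaccinated_per_hundred` value (an OWID column assumed here and not used elsewhere in this notebook) and compare it with the threshold. This is a sketch on invented rows for hypothetical countries `A`, `B` and `C`; on the real data, `vacc` would be replaced by `df`.

```python
import pandas as pd

# Invented rows shaped like the OWID file (one row per location per date)
vacc = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B', 'C'],
    'date': pd.to_datetime(['2021-06-01', '2021-12-01', '2021-06-01', '2021-12-01', '2021-12-01']),
    'people_fully_vaccinated_per_hundred': [20.0, 65.0, 5.0, None, 30.0],
})

# Latest non-missing value per country: drop gaps, sort by date, keep each country's last row
latest = (
    vacc.dropna(subset=['people_fully_vaccinated_per_hundred'])
        .sort_values('date')
        .groupby('location')
        .tail(1)
        .set_index('location')['people_fully_vaccinated_per_hundred']
)

# Countries at or above the 60% threshold
above_60 = latest[latest >= 60].index.tolist()
print(latest)
print(above_60)
```

Using the latest non-missing value matters because the most recent row for a country (such as `B` here) is often empty for vaccination columns.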
