# COVID-19 Global Data Analysis and Reporting

This notebook analyzes COVID-19 trends in cases, deaths, and vaccinations for five selected countries: Kenya, the United States, India, Brazil, and South Africa. It loads the Our World in Data (OWID) COVID-19 dataset, cleans the data, explores key metrics, and visualizes the trends with commentary on the main findings.

## 1. Data Loading & Exploration

```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style="darkgrid")

# Load dataset (OWID COVID-19 CSV, expected in the working directory)
df = pd.read_csv('owid-covid-data.csv')

# Preview dataset
df.head()
```

ModuleNotFoundError: No module named 'matplotlib'
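The OWID file is large. You may want a loading step that reads only the columns this notebook uses and parses `date` at read time. The sketch below assumes the column names listed in `key_cols` further down. The helper name `load_covid_data` and the tiny inline CSV are illustrative and are not part of the original notebook.

```python
import io

import pandas as pd

# Columns used later in this notebook (assumed list; extend it if you plot more metrics)
USECOLS = ['date', 'location', 'total_cases', 'total_deaths',
           'new_cases', 'new_deaths', 'total_vaccinations']


def load_covid_data(source):
    """Read an OWID-style CSV, keeping only USECOLS and parsing 'date' as datetime."""
    return pd.read_csv(source, usecols=USECOLS, parse_dates=['date'])


# Tiny made-up CSV standing in for owid-covid-data.csv
sample_csv = io.StringIO(
    "iso_code,location,date,total_cases,total_deaths,new_cases,new_deaths,total_vaccinations,extra\n"
    "KEN,Kenya,2021-03-01,10,1,2,0,,x\n"
    "KEN,Kenya,2021-03-02,12,1,2,0,5,y\n"
)
df_sample = load_covid_data(sample_csv)
print(df_sample.dtypes)
```

With the real file, this would be `df = load_covid_data('owid-covid-data.csv')`. Note that `usecols` raises a `ValueError` if any listed column is missing from the file.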

### Dataset columns overview

```python
# Show column names
df.columns
```
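A downloaded snapshot may not contain every column the later cells rely on. A quick check against the required names can fail early with a clear message. This is a minimal sketch: `check_required_columns` is a hypothetical helper, and the empty frame is made up for demonstration.

```python
import pandas as pd


def check_required_columns(frame, required):
    """Return the required column names missing from `frame`, in the order given."""
    present = set(frame.columns)
    return [col for col in required if col not in present]


key_cols = ['date', 'location', 'total_cases', 'total_deaths',
            'new_cases', 'new_deaths', 'total_vaccinations']

# Made-up empty frame that lacks 'total_vaccinations'
demo = pd.DataFrame(columns=['date', 'location', 'total_cases', 'total_deaths',
                             'new_cases', 'new_deaths'])
missing = check_required_columns(demo, key_cols)
if missing:
    print(f"Missing columns: {missing}")
```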

### Check missing values in key columns

```python
# Count missing values in the columns used throughout this notebook
key_cols = ['date', 'location', 'total_cases', 'total_deaths', 'new_cases', 'new_deaths', 'total_vaccinations']
df[key_cols].isnull().sum()
```
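The overall counts above hide the fact that missingness can differ between countries. For example, vaccination figures only begin once a country's rollout starts. One way to see this, sketched here on made-up rows, is to compute the fraction of missing values per column within each country:

```python
import numpy as np
import pandas as pd

# Made-up rows: Kenya has gaps in both columns, India has none
demo = pd.DataFrame({
    'location': ['Kenya', 'Kenya', 'India', 'India'],
    'total_cases': [10, np.nan, 100, 120],
    'total_vaccinations': [np.nan, 5, 50, 60],
})

# Fraction of missing values per column, computed separately for each country
missing_share = demo.drop(columns='location').isna().groupby(demo['location']).mean()
print(missing_share)
```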

## 2. Data Cleaning

```python
# Filter countries of interest
countries = ['Kenya', 'United States', 'India', 'Brazil', 'South Africa']
df_countries = df[df['location'].isin(countries)].copy()

# Drop rows with missing dates
df_countries = df_countries.dropna(subset=['date'])

# Convert date to datetime and sort so filling runs chronologically within each country
df_countries['date'] = pd.to_datetime(df_countries['date'])
df_countries = df_countries.sort_values(['location', 'date'])

# Cumulative columns: forward-fill within each country, then set leading gaps to 0
cumulative_cols = ['total_cases', 'total_deaths', 'total_vaccinations']
df_countries[cumulative_cols] = df_countries.groupby('location')[cumulative_cols].ffill().fillna(0)

# Daily columns: interpolate within each country, then set remaining gaps to 0
daily_cols = ['new_cases', 'new_deaths']
df_countries[daily_cols] = (
    df_countries.groupby('location')[daily_cols]
    .transform(lambda s: s.interpolate())
    .fillna(0)
)
```
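Forward-filling a cumulative column is only safe within a single country. When several countries are stacked in one frame, a plain `ffill()` can carry one country's last value into the next country's leading gaps. The made-up example below shows the difference between filling the whole column and filling per `location` group:

```python
import numpy as np
import pandas as pd

# Made-up data: Kenya's first total_cases value is missing
demo = pd.DataFrame({
    'location': ['India', 'India', 'Kenya', 'Kenya'],
    'date': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-01', '2020-03-02']),
    'total_cases': [100.0, 150.0, np.nan, 3.0],
})

# Plain ffill runs down the whole column, so India's last value spills into Kenya
naive = demo['total_cases'].ffill().fillna(0)

# Group-wise ffill restarts at each country boundary; leading gaps become 0
grouped = demo.groupby('location')['total_cases'].ffill().fillna(0)

print(pd.DataFrame({'location': demo['location'], 'naive': naive, 'grouped': grouped}))
```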

## 3. Exploratory Data Analysis (EDA)

### Total Cases Over Time

```python
# Plot cumulative cases for each selected country (matplotlib imported in the first cell)
plt.figure(figsize=(12,6))
for country in countries:
    subset = df_countries[df_countries['location'] == country]
    plt.plot(subset['date'], subset['total_cases'], label=country)
plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.tight_layout()
plt.show()
```
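Total case counts are not directly comparable across countries with very different population sizes. The OWID file includes a `population` column and precomputed per-capita columns such as `total_cases_per_million`. The sketch below shows the underlying arithmetic on made-up numbers and assumes `population` is loaded alongside the other columns:

```python
import pandas as pd

# Made-up figures: identical case counts, very different populations
demo = pd.DataFrame({
    'location': ['Country A', 'Country B'],
    'total_cases': [50_000, 50_000],
    'population': [1_000_000, 50_000_000],
})

# Cases per million people = total_cases * 1,000,000 / population
demo['total_cases_per_million'] = demo['total_cases'] * 1_000_000 / demo['population']
print(demo)
```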

### Total Deaths Over Time

```python
# Plot cumulative deaths for each selected country
plt.figure(figsize=(12,6))
for country in countries:
    subset = df_countries[df_countries['location'] == country]
    plt.plot(subset['date'], subset['total_deaths'], label=country)
plt.title('Total COVID-19 Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Total Deaths')
plt.legend()
plt.tight_layout()
plt.show()
```
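The plotting cells in this section repeat the same loop. Only the column, title, and axis label change. A hypothetical helper like `plot_metric` below could factor that out. It is a sketch using matplotlib's object-oriented API and made-up data. The non-interactive `Agg` backend is selected so the sketch runs without a display; inside Jupyter, drop that line and let the figure render inline.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_metric(data, countries, column, title, ylabel):
    """Draw one line per country for `column` and return the Axes (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(12, 6))
    for country in countries:
        subset = data[data['location'] == country]
        ax.plot(subset['date'], subset[column], label=country)
    ax.set_title(title)
    ax.set_xlabel('Date')
    ax.set_ylabel(ylabel)
    ax.legend()
    fig.tight_layout()
    return ax


# Made-up data for two countries
demo = pd.DataFrame({
    'location': ['Kenya', 'Kenya', 'India', 'India'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02'] * 2),
    'total_deaths': [1, 2, 10, 12],
})
ax = plot_metric(demo, ['Kenya', 'India'], 'total_deaths',
                 'Total COVID-19 Deaths Over Time', 'Total Deaths')
```

In the notebook, the deaths chart would then reduce to `plot_metric(df_countries, countries, 'total_deaths', 'Total COVID-19 Deaths Over Time', 'Total Deaths')`.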

### New Cases Comparison

```python
# Plot daily new cases for each selected country
plt.figure(figsize=(12,6))
for country in countries:
    subset = df_countries[df_countries['location'] == country]
    plt.plot(subset['date'], subset['new_cases'], label=country)
plt.title('Daily New COVID-19 Cases')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.legend()
plt.tight_layout()
plt.show()
```
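Daily new-case counts are noisy, partly because many countries reported in batches or skipped some days. A common remedy is a 7-day rolling mean computed separately for each country. OWID also publishes a precomputed `new_cases_smoothed` column. The sketch below uses made-up counts with one reporting spike per week:

```python
import pandas as pd

# Made-up daily counts: everything reported on one day each week
demo = pd.DataFrame({
    'location': ['Kenya'] * 14,
    'date': pd.date_range('2021-01-01', periods=14, freq='D'),
    'new_cases': [0, 0, 0, 0, 0, 0, 70] * 2,
})

demo = demo.sort_values(['location', 'date'])
# Trailing 7-day mean, computed within each country so windows never span two countries
demo['new_cases_7day'] = (
    demo.groupby('location')['new_cases']
    .transform(lambda s: s.rolling(window=7).mean())
)
print(demo[['date', 'new_cases', 'new_cases_7day']])
```

The first six values are `NaN` because the window needs seven observations. Passing `min_periods=1` to `rolling` would fill them with partial-window averages instead.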

### Death Rate (Total Deaths / Total Cases)

```python
# Death rate here is the crude case fatality rate: total deaths / total cases.
# Vectorized division; rows with no recorded cases get 0 instead of dividing by zero.
df_countries['death_rate'] = (
    (df_countries['total_deaths'] / df_countries['total_cases'])
    .where(df_countries['total_cases'] > 0, 0)
)

plt.figure(figsize=(12,6))
for country in countries:
    subset = df_countries[df_countries['location'] == country]
    plt.plot(subset['date'], subset['death_rate'], label=country)
plt.title('COVID-19 Death Rate Over Time')
plt.xlabel('Date')
plt.ylabel('Death Rate')
plt.legend()
plt.tight_layout()
plt.show()
```
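Total deaths divided by total cases is the crude case fatality rate. It depends heavily on how many cases are detected, so differences in testing between countries shift it as much as differences in outcomes do. The sketch below works on made-up numbers. It leaves the ratio as `NaN` rather than 0 where no cases are recorded, so early dates do not show an artificial zero. It then takes the most recent value per country:

```python
import pandas as pd

# Made-up cumulative figures for two countries on two dates
demo = pd.DataFrame({
    'location': ['Kenya', 'Kenya', 'India', 'India'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02'] * 2),
    'total_cases': [0, 200, 1000, 2000],
    'total_deaths': [0, 4, 30, 50],
})

# Case fatality rate; dividing by NaN where total_cases is 0 leaves the ratio undefined
demo['death_rate'] = demo['total_deaths'] / demo['total_cases'].where(demo['total_cases'] > 0)

# Most recent ratio per country (GroupBy.last skips NaN values)
latest = demo.sort_values('date').groupby('location')['death_rate'].last()
print(latest)
```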

## 4. Vaccination Progress

### Cumulative Vaccinations Over Time

```python
# Plot cumulative vaccine doses administered for each selected country
plt.figure(figsize=(12,6))
for country in countries:
    subset = df_countries[df_countries['location'] == country]
    plt.plot(subset['date'], subset['total_vaccinations'], label=country)
plt.title('Total COVID-19 Vaccinations Over Time')
plt.xlabel('Date')
plt.ylabel('Total Vaccinations')
plt.legend()
plt.tight_layout()
plt.show()
```
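In the OWID data, `total_vaccinations` counts doses administered, not people. A country's total can therefore exceed its population, and raw totals favor large countries. Dividing by population gives doses per 100 people. OWID also provides `total_vaccinations_per_hundred` and `people_vaccinated_per_hundred`. A sketch with made-up figures, assuming a `population` column is available:

```python
import pandas as pd

# Made-up latest figures for two countries
demo = pd.DataFrame({
    'location': ['Country A', 'Country B'],
    'total_vaccinations': [3_000_000, 30_000_000],
    'population': [2_000_000, 100_000_000],
})

# Doses per 100 people; can exceed 100 because most vaccine courses need several doses
demo['doses_per_hundred'] = demo['total_vaccinations'] * 100 / demo['population']
print(demo)
```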

## 5. Key Insights

### Insight 1
- The United States shows high total COVID-19 cases and deaths, but also a strong vaccination rollout.

### Insight 2
- India and Brazil experienced sharp increases in daily new cases during waves, impacting global trends.

### Insight 3
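- Kenya and South Africa report lower total cases and deaths than the other three countries and show steady vaccination progress. Absolute totals partly reflect population size, so per-capita figures give a fairer comparison.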

### Insight 4
- The death rate varies significantly across countries, and in some it declines over time. This may reflect improved treatment, but also wider testing that detects more mild cases.

### Insight 5
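- In some countries, vaccination rollouts coincide with slowing growth in new cases. This is consistent with vaccine effectiveness, but the correlation alone does not establish it, because variants, testing, and public-health measures also changed over the same period.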

## Conclusion

This analysis highlights disparities in COVID-19 impact and vaccination progress across the five selected countries. Continued data monitoring and vaccination efforts remain critical in controlling the pandemic. This notebook serves as a foundation for further analysis and reporting.