# COVID-19 Analysis for South Africa, Nigeria, Kenya, and the USA

## Key Insights
1. **United States** consistently reported the highest total cases and deaths across the period.
2. **South Africa** showed a high death rate (deaths relative to reported cases), especially in the early waves.
3. **Kenya** and **Nigeria** reported fewer cases and deaths, possibly because of testing and reporting limitations.
4. A noticeable **spike in daily new cases** occurred in mid-2021 in all four countries, coinciding with the spread of the Delta variant.
5. The cumulative **death rate declined over time**, consistent with improved treatment or vaccination effectiveness.
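
Insights 4 and 5 can be checked per country with a short helper. This is a minimal sketch, assuming the cleaned `df` produced by the cells below (columns `location`, `date`, `new_cases`, `death_rate`). The function name `summarize_trends` and the smoothing window are illustrative choices, not part of the original analysis.

```python
import pandas as pd


def summarize_trends(data: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Per-country peak of smoothed new cases plus first/last death_rate.

    Hypothetical helper: assumes OWID-style columns 'location', 'date',
    'new_cases' and 'death_rate' (as computed in the cleaning cell).
    """
    rows = []
    for country, g in data.sort_values("date").groupby("location"):
        # Trailing rolling mean so a single backlog dump does not define the peak;
        # the window counts rows, which equals days only with one row per day
        smoothed = g.set_index("date")["new_cases"].rolling(window, min_periods=1).mean()
        rate = g["death_rate"].dropna()
        rows.append({
            "location": country,
            "peak_new_cases_date": smoothed.idxmax(),
            "peak_new_cases_smoothed": smoothed.max(),
            "first_death_rate": rate.iloc[0] if len(rate) else float("nan"),
            "last_death_rate": rate.iloc[-1] if len(rate) else float("nan"),
        })
    return pd.DataFrame(rows).set_index("location")
```

Calling `summarize_trends(df)` would list, for each country, the date of its smoothed peak in new cases and the first and last values of `death_rate`, which is enough to see whether the peaks fall in mid-2021 and whether the rate actually fell.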

## Anomalies & Patterns
- There were **sudden drops or spikes** in daily new cases, likely due to data reporting lags or adjustments.
- Some countries (like Nigeria) showed extended periods of **zero new cases**, suggesting data unavailability rather than a true absence of cases.
- The correlation heatmap revealed that **new deaths are more strongly correlated with new cases** than with total cases.
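
The reporting artifacts above can be flagged programmatically. A minimal sketch, assuming one row per country per day and the OWID column names used in this notebook; `reporting_anomalies` is a hypothetical helper, and reading negative daily counts as retroactive corrections is an assumption about how such values arise.

```python
import pandas as pd


def reporting_anomalies(data: pd.DataFrame) -> pd.DataFrame:
    """Flag two reporting artifacts per country (hypothetical helper).

    - longest_zero_run: longest run of consecutive rows with exactly 0 new cases
    - negative_days: rows with negative new cases (assumed to be corrections)
    Missing values (NaN) break a zero run instead of counting as zeros.
    """
    out = {}
    for country, g in data.sort_values("date").groupby("location"):
        cases = g["new_cases"]
        is_zero = cases.eq(0)
        # A new run id starts whenever the zero / non-zero state changes
        run_id = is_zero.ne(is_zero.shift()).cumsum()
        # Summing booleans per run gives the run length for zero runs, 0 otherwise
        run_lengths = is_zero.groupby(run_id).sum()
        out[country] = {
            "longest_zero_run": int(run_lengths.max()) if len(run_lengths) else 0,
            "negative_days": int(cases.lt(0).sum()),
        }
    return pd.DataFrame.from_dict(out, orient="index")
```

Running it on the raw `new_cases` column (before any `fillna(0)`) keeps genuinely missing days separate from reported zeros.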

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

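# Load the Our World in Data COVID-19 dataset and keep the four countries of interest
df = pd.read_csv("owid-covid-data.csv")
countries = ['South Africa', 'Nigeria', 'Kenya', 'United States']
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added below
df = df[df['location'].isin(countries)].copy()
df = df.dropna(subset=['date', 'total_cases', 'total_deaths'])
df['date'] = pd.to_datetime(df['date'])
# Cumulative death rate (deaths / cases); rows with zero cases become NaN instead of inf
df['death_rate'] = df['total_deaths'] / df['total_cases'].replace(0, float('nan'))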

In [None]:
# Plot total cases over time
plt.figure(figsize=(10, 6))
for country in countries:
    country_data = df[df['location'] == country]
    plt.plot(country_data['date'], country_data['total_cases'], label=country)
plt.title("Total COVID-19 Cases Over Time")
plt.xlabel("Date")
plt.ylabel("Total Cases")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Plot total deaths over time
plt.figure(figsize=(10, 6))
for country in countries:
    country_data = df[df['location'] == country]
    plt.plot(country_data['date'], country_data['total_deaths'], label=country)
plt.title("Total COVID-19 Deaths Over Time")
plt.xlabel("Date")
plt.ylabel("Total Deaths")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
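
The two line-plot cells above differ only in the column they draw. A hypothetical helper, `plot_metric`, could factor out the repetition and optionally switch to a logarithmic y-axis, which may keep Kenya and Nigeria readable next to the much larger US totals. This is a sketch, not the notebook's original code.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_metric(data: pd.DataFrame, countries, column, title, ylabel, logy=False, ax=None):
    """Draw one line per country for `column` against date and return the Axes."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 6))
    for country in countries:
        country_data = data[data["location"] == country]
        ax.plot(country_data["date"], country_data[column], label=country)
    if logy:
        # Log scale compresses the gap between large and small countries;
        # matplotlib masks values <= 0 on a log axis
        ax.set_yscale("log")
    ax.set_title(title)
    ax.set_xlabel("Date")
    ax.set_ylabel(ylabel)
    ax.legend()
    ax.grid(True)
    return ax
```

For example, `plot_metric(df, countries, 'total_deaths', 'Total COVID-19 Deaths Over Time', 'Total Deaths', logy=True)` followed by `plt.tight_layout()` and `plt.show()` would redraw the deaths chart on a log scale.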

In [None]:
# Compare daily new cases between countries
plt.figure(figsize=(10, 6))
for country in countries:
    country_data = df[df['location'] == country]
    # Leave missing values as NaN so reporting gaps show as breaks, not as zero cases
    plt.plot(country_data['date'], country_data['new_cases'], label=country)
plt.title("Daily New COVID-19 Cases")
plt.xlabel("Date")
plt.ylabel("New Cases")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
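
Daily counts are noisy, partly because of the reporting lags noted above. A trailing 7-day average per country is a common way to smooth them. The sketch below assumes the cleaned `df`; `add_rolling_mean` and the `new_cases_7d_avg` column name are hypothetical. The window counts rows, so it equals seven calendar days only when each country has one row per day.

```python
import pandas as pd


def add_rolling_mean(data: pd.DataFrame, column: str = "new_cases", window: int = 7) -> pd.DataFrame:
    """Return a copy of `data` with a per-country trailing rolling mean of `column`."""
    out = data.sort_values(["location", "date"]).copy()
    # Compute the mean within each country so windows never span two countries;
    # min_periods=1 yields a value from the first row onward
    out[f"{column}_{window}d_avg"] = (
        out.groupby("location")[column]
        .transform(lambda s: s.rolling(window, min_periods=1).mean())
    )
    return out
```

After `df = add_rolling_mean(df)`, plotting `country_data['new_cases_7d_avg']` instead of `new_cases` in the cell above would give the smoothed curves.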

In [None]:
# Bar chart of total cases per country on each country's latest available date
latest = df.sort_values("date").groupby("location").tail(1)
latest_sorted = latest.sort_values("total_cases", ascending=False)
plt.figure(figsize=(8, 5))
# hue + legend=False colors bars by category without the palette-without-hue warning (seaborn >= 0.13)
sns.barplot(data=latest_sorted, x="location", y="total_cases", hue="location", palette="viridis", legend=False)
plt.title("Total Cases by Country (Latest Available Date per Country)")
plt.xlabel("Country")
plt.ylabel("Total Cases")
plt.tight_layout()
plt.show()
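
Because rows with missing totals were dropped during cleaning, each country's latest available date may differ, so the bar chart can compare totals from different days. One alternative, sketched here with a hypothetical `latest_common_snapshot` helper, is to use the most recent date that every country reports. It assumes at most one row per country per date.

```python
import pandas as pd


def latest_common_snapshot(data: pd.DataFrame, column: str = "total_cases"):
    """Return (date, values) for `column` on the latest date reported by every country."""
    n_locations = data["location"].nunique()
    # Count how many countries have a row on each date
    per_date = data.groupby("date")["location"].nunique()
    shared = per_date[per_date == n_locations]
    if shared.empty:
        raise ValueError("No date is shared by all countries")
    snapshot_date = shared.index.max()
    snap = data[data["date"] == snapshot_date].set_index("location")[column]
    return snapshot_date, snap.sort_values(ascending=False)
```

The returned Series could feed the bar chart directly, e.g. `sns.barplot(x=snap.index, y=snap.values)`, with `snapshot_date` added to the title.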

In [None]:
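# Heatmap for correlation analysis
# Rows from all four countries are pooled, so differences in scale between
# countries (US counts are far larger) also shape these correlations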
plt.figure(figsize=(10, 6))
correlation_data = df[['total_cases', 'new_cases', 'total_deaths', 'new_deaths', 'death_rate']]
sns.heatmap(correlation_data.corr(), annot=True, cmap="coolwarm", linewidths=0.5)
plt.title("Correlation Heatmap of COVID-19 Metrics")
plt.tight_layout()
plt.show()
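
The heatmap pools all four countries, so its correlations also reflect differences in scale between countries. A within-country version of the same comparison could be sketched as follows; `per_country_corr` is a hypothetical helper built on pandas' `Series.corr` (Pearson by default), which ignores rows where either value is missing.

```python
import pandas as pd


def per_country_corr(data: pd.DataFrame, target: str = "new_deaths",
                     features=("new_cases", "total_cases")) -> pd.DataFrame:
    """Pearson correlation of `target` with each feature, computed within each country."""
    rows = {}
    for country, g in data.groupby("location"):
        # Series.corr drops pairs where either value is NaN
        rows[country] = {f: g[target].corr(g[f]) for f in features}
    return pd.DataFrame.from_dict(rows, orient="index")
```

If `per_country_corr(df)` still shows `new_deaths` tracking `new_cases` more closely than `total_cases` in each country, the heatmap's pattern is not just an artifact of pooling.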