# COVID-19 Data Analysis and Visualization

In this notebook, we analyze the Our World in Data COVID-19 dataset (`owid-covid-data.csv`) for three countries: the United States, India, and Kenya. We explore case and death trends, death rates, and vaccination rates.

## Data Loading & Exploration

In [None]:

import pandas as pd

# Load the dataset
df = pd.read_csv("owid-covid-data.csv")

# Check columns
print(df.columns)

# Preview first rows
print(df.head())

# Check for missing values
print(df.isnull().sum())
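
Raw missing-value counts are hard to compare across columns, so the share of missing rows per column can be easier to read. The following is a minimal sketch on a hypothetical toy frame (the `toy` data and its values are placeholders, not taken from the dataset); the same expression should work on `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset (placeholder values)
toy = pd.DataFrame({
    "location": ["Country A", "Country A", "Country B", "Country B"],
    "total_cases": [1.0, np.nan, 5.0, 8.0],
    "total_deaths": [np.nan, np.nan, 1.0, np.nan],
})

# Fraction of missing values per column, largest first
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a very high missing share may be better excluded than interpolated.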


## Data Cleaning & Preprocessing

In [None]:

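# Filter countries of interest; .copy() makes an independent frame so the
# column assignments below do not trigger SettingWithCopyWarning
countries_of_interest = ['Kenya', 'United States', 'India']
df_filtered = df[df['location'].isin(countries_of_interest)].copy()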

# Drop rows with missing critical data
df_filtered = df_filtered.dropna(subset=['date', 'total_cases'])

# Convert 'date' column to datetime
df_filtered['date'] = pd.to_datetime(df_filtered['date'])

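# Handle missing values: interpolate numeric columns within each country,
# in date order, so values are never blended across country boundaries
df_filtered = df_filtered.sort_values(['location', 'date'])
numeric_cols = df_filtered.select_dtypes(include='number').columns
df_filtered[numeric_cols] = (
    df_filtered.groupby('location')[numeric_cols]
    .transform(lambda s: s.interpolate())
)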


## Data Analysis & Key Insights

In [None]:

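# Compute death rate (cumulative deaths / cumulative cases);
# rows with zero cases are masked to NaN to avoid division by zero
valid_cases = df_filtered['total_cases'].where(df_filtered['total_cases'] > 0)
df_filtered['death_rate'] = df_filtered['total_deaths'] / valid_cases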
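
The `death_rate` column holds one value per country per day, so a per-country summary helps when comparing countries. Here is a rough sketch that takes the most recent death rate for each location, using a hypothetical toy frame with placeholder values; on the real data, `df_filtered` would take the place of `toy`.

```python
import pandas as pd

# Hypothetical toy data (placeholder values, not real statistics)
toy = pd.DataFrame({
    "location": ["Country A", "Country A", "Country B", "Country B"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]
    ),
    "total_cases": [100.0, 200.0, 1000.0, 2000.0],
    "total_deaths": [2.0, 5.0, 10.0, 30.0],
})
toy["death_rate"] = toy["total_deaths"] / toy["total_cases"]

# Keep the latest row per country and index the result by location
latest = (
    toy.sort_values("date")
    .groupby("location")
    .tail(1)
    .set_index("location")["death_rate"]
)

# Show the latest death rate as a percentage
print((latest * 100).round(2))
```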


## Visualizations

### Total Cases Over Time

In [None]:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
for country in countries_of_interest:
    subset = df_filtered[df_filtered['location'] == country]
    plt.plot(subset['date'], subset['total_cases'], label=country)
plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


### Total Deaths Over Time

In [None]:

plt.figure(figsize=(12, 6))
for country in countries_of_interest:
    subset = df_filtered[df_filtered['location'] == country]
    plt.plot(subset['date'], subset['total_deaths'], label=country)
plt.title('Total COVID-19 Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Total Deaths')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


### Daily New COVID-19 Cases

In [None]:

plt.figure(figsize=(12, 6))
for country in countries_of_interest:
    subset = df_filtered[df_filtered['location'] == country]
    plt.plot(subset['date'], subset['new_cases'], label=country)
plt.title('Daily New COVID-19 Cases')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
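
Daily counts are often noisy because of reporting delays, so a 7-day rolling mean is a common way to smooth them before plotting. Below is a minimal sketch on hypothetical toy data; the `new_cases_7d` column name and the choice of a 7-day window are assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data: ten days for two placeholder countries
dates = pd.date_range("2021-01-01", periods=10, freq="D")
toy = pd.DataFrame({
    "location": ["Country A"] * 10 + ["Country B"] * 10,
    "date": list(dates) * 2,
    "new_cases": [0, 70, 0, 70, 0, 70, 0, 70, 0, 70] + [100] * 10,
})

# Sort so each country's rows are in date order, then compute the
# rolling mean per country so windows never span two countries
toy = toy.sort_values(["location", "date"])
toy["new_cases_7d"] = (
    toy.groupby("location")["new_cases"]
    .transform(lambda s: s.rolling(window=7, min_periods=7).mean())
)
print(toy.tail(4))
```

With `min_periods=7`, the first six days of each country are left as NaN rather than averaged over a shorter window. Plotting `new_cases_7d` in place of `new_cases` should give a smoother curve.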


## Choropleth Visualization

In [None]:

import plotly.express as px

# Prepare dataframe with iso_code and total_cases
latest_by_country = df_filtered.sort_values('date').groupby('location').tail(1)
map_df = latest_by_country[['iso_code', 'location', 'total_cases']]

# Choropleth for total cases
fig = px.choropleth(
    map_df,
    locations='iso_code',
    color='total_cases',
    hover_name='location',
    color_continuous_scale='Reds',
    title='COVID-19 Total Cases (Latest) for Selected Countries',
    projection='natural earth'
)
fig.show()


## Summary & Key Insights


### Key Insights:
1. **The United States has the highest total cases** among the three selected countries, followed by India and then Kenya.
2. **India's largest daily increases** in cases occurred during the Delta variant wave.
3. **Kenya has the lowest vaccination rate** among the selected countries.
4. **Death rates appear to be associated with the timing of vaccine rollout**: delayed vaccination coincided with higher mortality, though this analysis does not establish causation.
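
The vaccination comparison in the third insight can be checked with a short query. This sketch assumes the dataset's `people_fully_vaccinated_per_hundred` column and uses a hypothetical toy frame with placeholder values; on the real data, `df_filtered` would replace `toy`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (placeholder values, not real statistics)
toy = pd.DataFrame({
    "location": ["Country A", "Country A", "Country B", "Country B"],
    "date": pd.to_datetime(
        ["2021-06-01", "2021-06-02", "2021-06-01", "2021-06-02"]
    ),
    "people_fully_vaccinated_per_hundred": [1.0, np.nan, 10.0, 12.0],
})

# GroupBy.last() returns the last non-null value per group, which matters
# because vaccination figures are not reported every day
latest_vax = (
    toy.sort_values("date")
    .groupby("location")["people_fully_vaccinated_per_hundred"]
    .last()
    .sort_values()
)
print(latest_vax)
```

The first entry of `latest_vax` is the location with the lowest most recently reported vaccination rate.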


## Conclusion & Future Steps


Possible next steps include extending the analysis to more countries and incorporating other datasets (e.g., hospitalizations, vaccine efficacy).
