# COVID-19 Data Analysis Project

In this project, we analyze COVID-19 data using Python libraries such as Pandas, Matplotlib, Seaborn, and Plotly.  
We explore trends in cases, deaths, and vaccinations across Kenya, India, and the United States.


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Optional: Adjust plotting style
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)


## Load Dataset

We use the "Our World in Data" COVID-19 dataset, which contains daily global data on cases, deaths, and vaccinations.


In [2]:
# Load dataset (make sure the CSV file is in the same folder)
df = pd.read_csv("owid-covid-data.csv")

# Preview the data
df.head()


Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,population,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-01-05,0.0,0.0,,0.0,0.0,,...,,37.75,0.5,64.83,0.51,41128772,,,,
1,AFG,Asia,Afghanistan,2020-01-06,0.0,0.0,,0.0,0.0,,...,,37.75,0.5,64.83,0.51,41128772,,,,
2,AFG,Asia,Afghanistan,2020-01-07,0.0,0.0,,0.0,0.0,,...,,37.75,0.5,64.83,0.51,41128772,,,,
3,AFG,Asia,Afghanistan,2020-01-08,0.0,0.0,,0.0,0.0,,...,,37.75,0.5,64.83,0.51,41128772,,,,
4,AFG,Asia,Afghanistan,2020-01-09,0.0,0.0,,0.0,0.0,,...,,37.75,0.5,64.83,0.51,41128772,,,,
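
The full file is wide, so it can help to read only the columns an analysis needs and to parse dates while loading. The sketch below shows this with `usecols` and `parse_dates` on a tiny in-memory CSV. The rows are made-up toy values and `toy_csv` / `toy_df` are hypothetical names, not part of the real dataset.

```python
import io

import pandas as pd

# Toy stand-in for owid-covid-data.csv (values are invented for illustration).
toy_csv = io.StringIO(
    "iso_code,location,date,total_cases,new_cases,population\n"
    "KEN,Kenya,2020-03-13,1,1,50000000\n"
    "KEN,Kenya,2020-03-14,1,0,50000000\n"
    "IND,India,2020-03-13,81,7,1400000000\n"
)

# usecols keeps only the listed columns; parse_dates converts "date" on read.
cols = ["iso_code", "location", "date", "total_cases", "new_cases"]
toy_df = pd.read_csv(toy_csv, usecols=cols, parse_dates=["date"])

print(toy_df.dtypes)
```

With the real file, the same two arguments could be passed to `pd.read_csv("owid-covid-data.csv", ...)`.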


In [None]:
# Show all column names
print("Column Names:")
print(df.columns)

# Show the shape of the DataFrame
print("\nDataset shape (rows, columns):")
print(df.shape)

# Check for missing values
print("\nMissing Values in Each Column:")
print(df.isnull().sum())
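
Raw missing-value counts are hard to compare across columns, so a share per column is often easier to read. The following is a minimal sketch on a hypothetical toy frame (the values are invented); it relies on the fact that the mean of a boolean mask is the fraction of `True` entries.

```python
import pandas as pd

# Toy frame with deliberately missing entries (values are invented).
toy = pd.DataFrame({
    "location": ["Kenya", "Kenya", "India", "India"],
    "total_cases": [1.0, None, 81.0, 82.0],
    "total_vaccinations": [None, None, None, 5.0],
})

# Fraction of missing values per column, most incomplete column first.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to `df`, the same expression would show which columns are too sparse to plot without further handling.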


## Filter Data for Specific Countries

We'll analyze data for Kenya, India, and the United States.


In [None]:
# Select countries of interest; .copy() avoids SettingWithCopyWarning on later assignments
countries = ['Kenya', 'India', 'United States']
df_filtered = df[df['location'].isin(countries)].copy()

# Drop rows with missing dates or total cases
df_filtered = df_filtered.dropna(subset=['date', 'total_cases'])

# Convert date column to datetime
df_filtered['date'] = pd.to_datetime(df_filtered['date'])

# Fill remaining missing values with 0 (unreported days in cumulative columns become 0)
df_filtered = df_filtered.fillna(0)
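
Filling every gap with 0 is simple, but for cumulative columns it turns an unreported day into an apparent reset to zero. One possible alternative, sketched here on invented toy data, is to forward-fill cumulative columns within each country. The frame `toy` and the column `total_vaccinations_ffill` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Toy cumulative series with reporting gaps (values are invented).
toy = pd.DataFrame({
    "location": ["Kenya"] * 3 + ["India"] * 3,
    "date": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"] * 2),
    "total_vaccinations": [None, 100.0, None, 500.0, None, 900.0],
})

# Sort so that "previous row" means "previous day in the same country".
toy = toy.sort_values(["location", "date"])

# Carry the last reported total forward, without leaking across countries.
toy["total_vaccinations_ffill"] = (
    toy.groupby("location")["total_vaccinations"].ffill()
)

print(toy)
```

Rows before a country's first report stay missing, which is arguably more honest than showing 0.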


## Total COVID-19 Cases Over Time


In [None]:
for country in countries:
    data = df_filtered[df_filtered['location'] == country]
    plt.plot(data['date'], data['total_cases'], label=country)

plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.show()


## Daily New Cases Comparison

In [None]:
for country in countries:
    data = df_filtered[df_filtered['location'] == country]
    plt.plot(data['date'], data['new_cases'], label=country)

plt.title('Daily New COVID-19 Cases')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.legend()
plt.show()
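
Daily counts are noisy because reporting varies by weekday, so a 7-day rolling mean is a common way to make the waves easier to compare. The sketch below computes one per country on invented toy data; `new_cases_7d` is a hypothetical column name. The dataset preview also lists a `new_cases_smoothed` column, which may serve the same purpose.

```python
import pandas as pd

# Toy daily counts for two countries (values are invented).
dates = pd.date_range("2021-01-01", periods=8, freq="D")
toy = pd.DataFrame({
    "location": ["Kenya"] * 8 + ["India"] * 8,
    "date": list(dates) * 2,
    "new_cases": [0, 0, 0, 0, 0, 0, 70, 0] + [7] * 8,
})

# Rolling mean over 7 rows, computed separately for each country so that
# one country's window never includes another country's values.
toy["new_cases_7d"] = (
    toy.groupby("location")["new_cases"]
       .transform(lambda s: s.rolling(window=7).mean())
)

print(toy)
```

The first six rows of each country are missing because a full 7-day window is not yet available.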


## Death Rate Calculation


In [None]:
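# Add death rate column: cumulative deaths / cumulative cases.
# Zero case counts are treated as missing so the ratio is never computed against 0.
df_filtered['death_rate'] = (
    df_filtered['total_deaths'] / df_filtered['total_cases'].replace(0, float('nan'))
)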

# Plot death rate
for country in countries:
    data = df_filtered[df_filtered['location'] == country]
    plt.plot(data['date'], data['death_rate'], label=country)

plt.title('COVID-19 Death Rate Over Time')
plt.xlabel('Date')
plt.ylabel('Death Rate')
plt.legend()
plt.show()
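
Alongside the time series, a single summary number per country can be useful. As a rough sketch on invented toy data, the snippet below takes each country's most recent row and expresses the death rate as a percentage. The frame `toy` and the column `death_rate_pct` are hypothetical.

```python
import pandas as pd

# Toy cumulative totals for two countries (values are invented).
toy = pd.DataFrame({
    "location": ["Kenya", "Kenya", "India", "India"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
    "total_cases": [100.0, 200.0, 1000.0, 2000.0],
    "total_deaths": [1.0, 4.0, 10.0, 30.0],
})

# Most recent row per country, indexed by country name.
latest = toy.sort_values("date").groupby("location").last()

# Cumulative deaths as a percentage of cumulative cases.
latest["death_rate_pct"] = (
    latest["total_deaths"]
    / latest["total_cases"].replace(0, float("nan"))
    * 100
)

print(latest[["total_cases", "total_deaths", "death_rate_pct"]])
```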


## Vaccination Rollout Over Time


In [None]:
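for country in countries:
    data = df_filtered[df_filtered['location'] == country]
    # Days without a reported total were filled with 0 earlier; skip them
    # so the cumulative line does not drop to zero between reports
    reported = data[data['total_vaccinations'] > 0]
    plt.plot(reported['date'], reported['total_vaccinations'], label=country)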

plt.title('Total COVID-19 Vaccinations Over Time')
plt.xlabel('Date')
plt.ylabel('Total Vaccinations')
plt.legend()
plt.show()
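
Absolute totals favor large countries, so scaling by population can make the comparison fairer. Here is a minimal sketch that computes doses per 100 people from `total_vaccinations` and `population`, both of which appear in the dataset preview. The numbers are invented, and `vaccinations_per_hundred` is a hypothetical column name for this illustration.

```python
import pandas as pd

# Toy totals and populations (values are invented, not real figures).
toy = pd.DataFrame({
    "location": ["Kenya", "India"],
    "total_vaccinations": [5_000_000.0, 700_000_000.0],
    "population": [50_000_000, 1_400_000_000],
})

# Doses administered per 100 people.
toy["vaccinations_per_hundred"] = (
    toy["total_vaccinations"] / toy["population"] * 100
)

print(toy)
```

Because this counts doses rather than people, the value can exceed 100 once second doses and boosters are included.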


## Choropleth Map of COVID-19 Cases

We'll use Plotly to show the distribution of cases by country on the latest date available.


In [None]:
# Get latest date
latest_date = df['date'].max()

# Prepare data
latest_data = df[df['date'] == latest_date]
map_data = latest_data[['iso_code', 'location', 'total_cases']].dropna()

# Create map
fig = px.choropleth(map_data,
                    locations="iso_code",
                    color="total_cases",
                    hover_name="location",
                    color_continuous_scale="Plasma",
                    title=f'COVID-19 Total Cases by Country as of {latest_date}')
fig.show()
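
Filtering on one global latest date can leave out countries that stopped reporting earlier. A possible alternative, sketched on invented toy data, keeps each country's most recent row with a case total. It also assumes that aggregate rows such as "World" carry an `iso_code` starting with `OWID_`. That prefix is an assumption about the file and is worth checking against the actual data.

```python
import pandas as pd

# Toy rows including one aggregate entry (values are invented).
toy = pd.DataFrame({
    "iso_code": ["KEN", "KEN", "IND", "OWID_WRL"],
    "location": ["Kenya", "Kenya", "India", "World"],
    "date": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
    "total_cases": [10.0, 12.0, 100.0, 112.0],
})

# Assumption: aggregates use an "OWID_" prefix instead of an ISO-3 code.
countries_only = toy[~toy["iso_code"].str.startswith("OWID_")]

# Latest row with a reported total for each country.
latest_rows = (
    countries_only.dropna(subset=["total_cases"])
    .sort_values("date")
    .groupby("iso_code", as_index=False)
    .last()
)

print(latest_rows)
```

The resulting frame has the same `iso_code`, `location`, and `total_cases` columns that the map above expects.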


## Key Insights

1. Among the three selected countries, the United States recorded the highest total case count.
2. India's vaccination rollout expanded substantially after April 2021.
3. Kenya’s death rate remained low relative to India and the United States.
4. Daily new cases showed distinct spikes that aligned with global COVID-19 waves.
5. In some countries, the case curve flattened as vaccination coverage grew, although these plots alone do not establish causation.

## Conclusion

This analysis shows the power of data science in understanding and communicating public health trends. Using open data, we were able to examine and compare how different countries responded to the pandemic.
