# Exploratory Data Analysis (EDA)

In this notebook, we perform exploratory data analysis on the COVID-19 dataset. We compute summary statistics, plot daily cases and deaths over time, and compare total cases across regions.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('../data/covid19_data.csv')

# Display the first few rows of the dataset
data.head()

In [2]:
# Calculate basic statistics (count, mean, std, min, quartiles, max) for the numeric columns
summary_stats = data.describe()
summary_stats
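
`describe()` skips missing values and, by default, reports only on numeric columns, so a quick check of gaps and non-numeric columns can complement it. The following is a minimal sketch on a small hypothetical frame; the `sample` data is invented for illustration, with column names mirroring those used in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (values are made up).
sample = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03"],
    "region": ["North", "North", "South"],
    "daily_cases": [10, None, 7],
    "daily_deaths": [0, 1, None],
})

# Count missing values per column; describe() ignores NaN when computing statistics.
missing_counts = sample.isna().sum()
print(missing_counts)

# include="all" adds non-numeric columns (count, unique, top, freq) to the summary.
full_summary = sample.describe(include="all")
print(full_summary)
```

On the real data, the same two calls would be applied to `data` instead of `sample`.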

In [3]:
# Convert date column to datetime format
data['date'] = pd.to_datetime(data['date'])

# Set date as index and sort chronologically so line plots run left to right
data = data.set_index('date').sort_index()

# Plotting daily cases over time
plt.figure(figsize=(14, 7))
plt.plot(data['daily_cases'], label='Daily Cases', color='blue')
plt.title('Daily COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Cases')
plt.legend()
plt.grid()
plt.show()
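
If the dataset holds one row per region per date (an assumption; the notebook does not state its layout), plotting `daily_cases` directly draws several values for each date. One possible approach is to sum across regions first and then smooth with a 7-day rolling mean to damp weekly reporting swings. The sketch below uses a small hypothetical frame with invented values.

```python
import pandas as pd

# Hypothetical data: two regions, ten consecutive days each (values are made up).
dates = pd.date_range("2020-03-01", periods=10, freq="D")
sample = pd.DataFrame({
    "date": list(dates) * 2,
    "region": ["North"] * 10 + ["South"] * 10,
    "daily_cases": list(range(10)) + list(range(10, 20)),
}).set_index("date")

# Sum across regions so that each date appears exactly once.
cases_per_day = sample.groupby(level="date")["daily_cases"].sum()

# Trailing 7-day mean; min_periods=1 keeps the first six days instead of yielding NaN.
smoothed = cases_per_day.rolling(window=7, min_periods=1).mean()
print(smoothed)
```

Either series can be passed to `plt.plot` in place of `data['daily_cases']`.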

In [4]:
# Plotting daily deaths over time
plt.figure(figsize=(14, 7))
plt.plot(data['daily_deaths'], label='Daily Deaths', color='red')
plt.title('Daily COVID-19 Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Deaths')
plt.legend()
plt.grid()
plt.show()

In [5]:
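# Grouping data by region for comparative analysis.
# numeric_only=True skips text columns, which cannot be meaningfully summed.
# Caution: summing 'total_cases' over dates overstates the total if it is a running total.
grouped_data = data.groupby('region').sum(numeric_only=True)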

# Plotting total cases by region
plt.figure(figsize=(14, 7))
grouped_data['total_cases'].plot(kind='bar')
plt.title('Total COVID-19 Cases by Region')
plt.xlabel('Region')
plt.ylabel('Total Cases')
plt.xticks(rotation=45)
plt.grid()
plt.show()
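
Whether summing is the right aggregation depends on what `total_cases` holds. If it is a running total per region (an assumption; the notebook does not say), summing it over dates counts earlier cases repeatedly, and the latest value per region is the figure to compare. A minimal sketch on hypothetical data with invented values:

```python
import pandas as pd

# Hypothetical data in which total_cases is the running sum of daily_cases per region.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02"]),
    "region": ["North", "North", "South", "South"],
    "daily_cases": [5, 3, 2, 4],
    "total_cases": [5, 8, 2, 6],
}).set_index("date")

# Summing a running total counts earlier days more than once.
summed = sample.groupby("region")["total_cases"].sum()

# Latest running total per region (rows sorted by date first).
latest = sample.sort_index().groupby("region")["total_cases"].last()

# Summing the daily counts gives the same regional totals as `latest`.
from_daily = sample.groupby("region")["daily_cases"].sum()

print(pd.DataFrame({"summed": summed, "latest": latest, "from_daily": from_daily}))
```

In this toy example `latest` and `from_daily` agree, while `summed` is larger for both regions.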

## Conclusion

In this exploratory data analysis, we calculated basic statistics, visualized daily COVID-19 cases and deaths over time, and compared total cases across regions. These views are descriptive only; further analysis is needed to explain the patterns they show.