## Covid-19 Data Project Overview:


The Covid-19 pandemic had a profound worldwide impact, causing extensive loss of life and significant disruption to daily life. Analyzing Covid-19 data can help anticipate outcomes in future pandemics and evaluate how effectively vaccines protected human lives. The following sections apply data analysis techniques for this purpose. The datasets contain aggregated counts rather than individual records, so the privacy of individuals is respected.

### Steps in this analysis of global Covid-19 data:

* Exploring the available information
* Preparing data for analysis
* Assessing the number of Covid-19 cases by country
* Determining the proportion of fully vaccinated individuals in each continent
* Evaluating the effects of vaccination 

### Analyzing Covid-19 Data Using Python 

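I used Covid-19 data from Our World in Data (OWID), split into two CSV files. The first holds cases and deaths. The second holds testing, vaccination and demographic indicators. OWID updates its data regularly, but this analysis reads a downloaded snapshot, so the results reflect the data as of the download date.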

**Loading the data sets:**


```python
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns

# Folder holding the local copies of the two OWID extracts (adjust to your own path)
DATA_DIR = '/Users/zoewalp/Desktop/DataScienceDataSets'

df_d = pd.read_csv(os.path.join(DATA_DIR, 'CovidDeaths1a.csv'))        # cases and deaths
df_v = pd.read_csv(os.path.join(DATA_DIR, 'CovidVaccinations1a.csv'))  # tests, vaccinations, demographics
```


**Looking at each data table:**

```python
df_d.head()
```

```
Unnamed: 0,iso_code,continent,location,date,population,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,...,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million
0,AFG,Asia,Afghanistan,2020-01-03,41128772.0,,0.0,,,0.0,...,,,,,,,,,,
1,AFG,Asia,Afghanistan,2020-01-04,41128772.0,,0.0,,,0.0,...,,,,,,,,,,
2,AFG,Asia,Afghanistan,2020-01-05,41128772.0,,0.0,,,0.0,...,,,,,,,,,,
3,AFG,Asia,Afghanistan,2020-01-06,41128772.0,,0.0,,,0.0,...,,,,,,,,,,
4,AFG,Asia,Afghanistan,2020-01-07,41128772.0,,0.0,,,0.0,...,,,,,,,,,,
```


```python
df_v.head()
```

```
Unnamed: 0,iso_code,continent,location,date,total_tests,new_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,...,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,Unnamed: 41,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-01-03,,,,,,,...,,37.746,0.5,64.83,0.511,,,,,
1,AFG,Asia,Afghanistan,2020-01-04,,,,,,,...,,37.746,0.5,64.83,0.511,,,,,
2,AFG,Asia,Afghanistan,2020-01-05,,,,,,,...,,37.746,0.5,64.83,0.511,,,,,
3,AFG,Asia,Afghanistan,2020-01-06,,,,,,,...,,37.746,0.5,64.83,0.511,,,,,
4,AFG,Asia,Afghanistan,2020-01-07,,,,,,,...,,37.746,0.5,64.83,0.511,,,,,
```
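Both tables begin with an `Unnamed: 0` column. This usually means the CSV files were saved together with their pandas index. The minimal sketch below uses a small hypothetical in-memory CSV as a stand-in for the real files. It shows how passing `index_col=0` to `pd.read_csv` turns that column back into the row index instead of keeping it as data.

```python
import io

import pandas as pd

# Hypothetical stand-in for CovidDeaths1a.csv: the first header field is empty,
# as written by DataFrame.to_csv() with its default index=True
csv_text = """,iso_code,location,date,new_cases
0,AFG,Afghanistan,2020-01-03,0.0
1,AFG,Afghanistan,2020-01-04,0.0
"""

# Default read: the saved index shows up as an extra column named 'Unnamed: 0'
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the row index; parse_dates converts 'date'
clean = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=['date'])

print(list(with_extra.columns))
print(list(clean.columns))
```

Applying the same `index_col=0` to the two real files would drop `Unnamed: 0`. It would not remove the empty `Unnamed: 41` column in the vaccinations table.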


### Metadata Exploration: Examining the Schema

Here I review the data types, note any missing data, check the dimensions of each data set, and convert data types where appropriate.

```python
df_d.info()
df_d.describe()
```

```python
df_v.info()
df_v.describe()
```
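`info()` reports non-null counts, but the share of missing values per column is easier to compare when computed directly. The minimal sketch below runs on a toy frame. The column names are borrowed from the deaths table and the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for df_d; NaN marks an unreported value
toy = pd.DataFrame({
    'location': ['Afghanistan', 'Afghanistan', 'France', 'France'],
    'total_cases': [np.nan, 5.0, 10.0, np.nan],
    'new_deaths': [0.0, 0.0, 1.0, 2.0],
})

# isna() marks missing cells; the column mean of those booleans is the missing share
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

On the real tables, `df_d.isna().mean().sort_values(ascending=False).head(10)` would list the ten sparsest columns.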

Here are the dimensions (rows, columns) of the two data sets:

```python
display(df_d.shape)
display(df_v.shape)
```

Converting the text columns to pandas' `string` dtype and the `date` column to datetime in both data sets:

```python
# Vaccination data set: text columns -> pandas 'string' dtype, 'date' -> datetime64
df_v_columns = ['iso_code', 'continent', 'location', 'tests_units']
df_v = df_v.astype({col: 'string' for col in df_v_columns})
df_v['date'] = pd.to_datetime(df_v['date'])

# Deaths data set: the same conversions
df_d_columns = ['iso_code', 'continent', 'location']
df_d = df_d.astype({col: 'string' for col in df_d_columns})
df_d['date'] = pd.to_datetime(df_d['date'])

display(df_v.dtypes)
display(df_d.dtypes)
```
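The conversions can be checked on a toy frame before running them on the full tables. In the sketch below, the rows are invented and `errors='coerce'` is an optional addition not used above. With it, `pd.to_datetime` turns an unparseable date into `NaT` instead of raising an error.

```python
import pandas as pd

# Invented rows with the same text columns as the deaths table
toy = pd.DataFrame({
    'iso_code': ['AFG', 'FRA'],
    'continent': ['Asia', 'Europe'],
    'location': ['Afghanistan', 'France'],
    'date': ['2020-01-03', 'not a date'],
})

text_columns = ['iso_code', 'continent', 'location']
# astype with a dict converts only the listed columns
toy = toy.astype({col: 'string' for col in text_columns})

# errors='coerce' replaces values that cannot be parsed with NaT
toy['date'] = pd.to_datetime(toy['date'], errors='coerce')

print(toy.dtypes)
```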



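Joining the two tables with an inner join on `location` and `date`. Columns that appear in both tables, such as `iso_code` and `continent`, receive pandas' default `_x` and `_y` suffixes. This is why later cells refer to `continent_x`.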

```python
d_v_tables = pd.merge(df_d, df_v, on=['location', 'date'], how='inner')
display(d_v_tables)
```
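The toy sketch below uses invented rows to show two things: which rows survive an inner join, and where the `_x`/`_y` suffixes come from. The `validate='one_to_one'` argument is an optional addition. It makes pandas raise a `MergeError` if either table contains duplicate `(location, date)` pairs.

```python
import pandas as pd

# Invented rows; Peru has no matching row in the vaccinations table
deaths = pd.DataFrame({
    'location': ['France', 'France', 'Peru'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-01']),
    'continent': ['Europe', 'Europe', 'South America'],
    'new_deaths': [10.0, 12.0, 3.0],
})
vaccinations = pd.DataFrame({
    'location': ['France', 'France'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02']),
    'continent': ['Europe', 'Europe'],
    'new_vaccinations': [100.0, 150.0],
})

# Inner join keeps only (location, date) pairs present in both tables;
# the shared non-key column 'continent' becomes continent_x and continent_y
merged = pd.merge(deaths, vaccinations, on=['location', 'date'], how='inner',
                  validate='one_to_one')
print(merged.columns.tolist())
```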


### Average Daily New Covid-19 Deaths per Location, by Month:

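```python
d_v_tables['month'] = d_v_tables['date'].dt.strftime('%b %Y')
# Month labels in chronological order (plain strings would otherwise follow row order)
month_order = d_v_tables.sort_values('date')['month'].unique().tolist()

sns.set(style='whitegrid')
plt.figure(figsize=(10, 6))
# sns.barplot draws the mean of y per category by default; estimator=np.mean makes that explicit
ax = sns.barplot(x='month', y='new_deaths', data=d_v_tables, order=month_order,
                 estimator=np.mean, palette='Set2')
ax.set_xlabel('Month')
ax.set_ylabel('Mean new deaths per location-day')
plt.xticks(rotation=45, ha='right', fontsize=8)
plt.subplots_adjust(bottom=0.2)
plt.show()
```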
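`sns.barplot` draws the mean of `y` within each category by default. Each bar is therefore an average over location-day rows, not a monthly total. The OWID tables also contain aggregate rows such as `World`, whose `continent` is empty, and these would double-count in a total.

The minimal sketch below computes monthly totals on invented toy data. It assumes that aggregate rows are exactly those with a missing `continent_x`.

```python
import pandas as pd

# Invented rows; 'World' is an aggregate row with no continent
toy = pd.DataFrame({
    'location': ['France', 'Peru', 'World', 'France', 'Peru', 'World'],
    'continent_x': ['Europe', 'South America', None, 'Europe', 'South America', None],
    'date': pd.to_datetime(['2021-01-15', '2021-01-15', '2021-01-15',
                            '2021-02-15', '2021-02-15', '2021-02-15']),
    'new_deaths': [100.0, 50.0, 150.0, 80.0, 20.0, 100.0],
})

# Keep country rows only (assumption: aggregates are the rows without a continent)
countries = toy[toy['continent_x'].notna()]

# Sum per calendar month; a monthly Period key sorts chronologically
monthly_deaths = countries.groupby(countries['date'].dt.to_period('M'))['new_deaths'].sum()
print(monthly_deaths)
```

`monthly_deaths.plot(kind='bar')` would chart these totals directly.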

### Average Daily New Covid-19 Cases per Location, by Month:

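```python
# Work on a filtered copy so the rows dropped here stay available to later cells
cases = d_v_tables.dropna(subset=['new_cases']).copy()
cases['new_cases'] = pd.to_numeric(cases['new_cases'])
cases['month'] = cases['date'].dt.strftime('%b %Y')
month_order = cases.sort_values('date')['month'].unique().tolist()

sns.set(style='whitegrid')
plt.figure(figsize=(12, 8))
# Each bar is the mean of new_cases per location-day (barplot's default estimator)
ax = sns.barplot(x='month', y='new_cases', data=cases, order=month_order,
                 estimator=np.mean, palette='Set2')
ax.set_xlabel('Month')
ax.set_ylabel('Mean new cases per location-day')
plt.xticks(rotation=45, ha='right', fontsize=6)
plt.subplots_adjust(bottom=0.3)
plt.show()
```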
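A bar chart of means and a bar chart of totals can look very different. The toy comparison below uses invented numbers to show the gap. `sort=False` keeps the months in order of first appearance rather than alphabetical order, which would put `Feb` before `Jan`.

```python
import pandas as pd

# Invented daily case counts for two months
toy = pd.DataFrame({
    'month': ['Jan 2022'] * 4 + ['Feb 2022'] * 4,
    'new_cases': [10.0, 30.0, 20.0, 40.0, 5.0, 5.0, 10.0, 20.0],
})

# Mean per row (what barplot shows by default) next to the monthly total
summary = toy.groupby('month', sort=False)['new_cases'].agg(['mean', 'sum'])
print(summary)
```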

### Percentage of New Cases By Continent:

```python
# Total new cases per continent; rows without a continent (OWID aggregates) drop out of the groupby
continent_cases_sum = d_v_tables.groupby('continent_x')['new_cases'].sum().reset_index()
total_cases = continent_cases_sum['new_cases'].sum()
continent_cases_sum['percentage_cases'] = (continent_cases_sum['new_cases'] / total_cases) * 100

sns.set(style='whitegrid')
plt.figure(figsize=(10, 6))
ax = sns.barplot(x='continent_x', y='percentage_cases', data=continent_cases_sum, color='blue')
ax.set_xlabel('Continent')
ax.set_ylabel('Percentage of New Cases by Continent')
ax.set_title('Percentage of COVID-19 New Cases by Continent')
plt.xticks(rotation=45)
# Values are already on a 0-100 scale, matching PercentFormatter's default xmax=100
ax.yaxis.set_major_formatter(mticker.PercentFormatter(decimals=2))
plt.tight_layout()
plt.show()
```
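The share calculation can be wrapped in a small reusable function. `percentage_share` below is a hypothetical helper, not part of any library. The toy data is invented, and the row without a continent stands in for an aggregate row.

```python
import pandas as pd

def percentage_share(frame, group_col, value_col):
    """Hypothetical helper: each group's share (in %) of value_col, summed per group."""
    # groupby drops rows whose group key is missing (dropna=True by default)
    totals = frame.groupby(group_col)[value_col].sum()
    return (totals * 100 / totals.sum()).rename(f'percentage_{value_col}')

toy = pd.DataFrame({
    'continent_x': ['Asia', 'Asia', 'Europe', None],
    'new_cases': [30.0, 30.0, 40.0, 100.0],
})

shares = percentage_share(toy, 'continent_x', 'new_cases')
print(shares)
```

Because the denominator is the sum of the group totals, the shares always add up to 100%.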

### Percentage of New Deaths By Continent:

```python
continent_deaths_sum = d_v_tables.groupby('continent_x')['new_deaths'].sum().reset_index()
# Use the continent totals as the denominator; summing over every row of d_v_tables
# would also count OWID aggregate rows (e.g. World), whose continent is empty
total_deaths = continent_deaths_sum['new_deaths'].sum()
continent_deaths_sum['percentage_deaths'] = (continent_deaths_sum['new_deaths'] / total_deaths) * 100

plt.figure(figsize=(10, 6))
ax = sns.barplot(x='continent_x', y='percentage_deaths', data=continent_deaths_sum, color='red')
ax.set_xlabel('Continent')
ax.set_ylabel('Percentage of New Deaths by Continent')
ax.set_title('Percentage of COVID-19 New Deaths by Continent')
plt.xticks(rotation=45)
ax.yaxis.set_major_formatter(mticker.PercentFormatter(decimals=2))
plt.tight_layout()
plt.show()
```
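The denominator matters here. Summing `new_deaths` over every row of the merged table also counts OWID aggregate rows such as `World`, whose `continent` is empty. The continent shares would then add up to less than 100%. The toy comparison below uses invented numbers.

```python
import pandas as pd

# Invented rows; 'World' is an aggregate row that repeats the countries' deaths
toy = pd.DataFrame({
    'location': ['China', 'Germany', 'World'],
    'continent_x': ['Asia', 'Europe', None],
    'new_deaths': [60.0, 40.0, 100.0],
})

continent_deaths = toy.groupby('continent_x')['new_deaths'].sum()

# Denominator over all rows also counts the 'World' aggregate
share_all_rows = continent_deaths * 100 / toy['new_deaths'].sum()
# Denominator over continent totals only: shares add up to 100
share_continents = continent_deaths * 100 / continent_deaths.sum()

print(share_all_rows)
print(share_continents)
```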

### Trends of People Fully Vaccinated in the United States, India, France and South Korea:

```python
desired_locations = ['France', 'United States', 'India', 'South Korea']

fig, ax = plt.subplots(figsize=(10, 6))

for desired_location in desired_locations:
    country = d_v_tables[d_v_tables['location'] == desired_location].copy()
    current_population = country['population'].iloc[0]

    # Coerce to numbers (blank or non-numeric entries become NaN), then drop missing values
    country['people_fully_vaccinated'] = pd.to_numeric(country['people_fully_vaccinated'], errors='coerce')
    country = country.dropna(subset=['people_fully_vaccinated'])

    # The count is cumulative, so its maximum within a month is the month-end level.
    # A plain max also keeps months whose top value is tied, which rank 1 filtering can drop.
    month_start = country['date'].dt.to_period('M').dt.to_timestamp()
    monthly_max = country.groupby(month_start)['people_fully_vaccinated'].max()
    percentage_fully_vaccinated = monthly_max / current_population * 100

    # Plot against real dates so all four lines share one chronological x-axis
    ax.plot(percentage_fully_vaccinated.index, percentage_fully_vaccinated.values,
            marker='o', label=desired_location)

ax.set_xlabel('Month')
ax.set_ylabel('Percentage of People Fully Vaccinated (%)')
ax.set_title('Percentage of People Fully Vaccinated Over Time')
ax.yaxis.set_major_formatter(mticker.ScalarFormatter(useMathText=True, useOffset=False))
ax.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```
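For a cumulative series such as `people_fully_vaccinated`, the month-end level is simply the monthly maximum. An alternative is to rank with `rank(method='max', ascending=False)` and keep rank 1. That approach can silently drop a month when the top value is tied, which is common once a cumulative count stops changing. The toy illustration below uses invented numbers.

```python
import pandas as pd

# Invented cumulative counts; January's top value appears twice
toy = pd.DataFrame({
    'month': ['Jan 2021'] * 3 + ['Feb 2021'] * 2,
    'people_fully_vaccinated': [10.0, 20.0, 20.0, 25.0, 30.0],
})

# With a tie at the top, method='max' gives both tied rows rank 2, so January has no rank 1
ranks = toy.groupby('month')['people_fully_vaccinated'].rank(method='max', ascending=False)
months_with_rank_1 = toy.loc[ranks == 1, 'month'].unique().tolist()

# A direct monthly maximum keeps every month
month_end = toy.groupby('month', sort=False)['people_fully_vaccinated'].max()

print(months_with_rank_1)
print(month_end)
```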

### Visualization of Vaccine Distribution Over Time Across Continents with a Line Chart:

```python
# Daily new vaccinations per continent; rows without a continent drop out of the groupby
vaccination_data = d_v_tables.groupby(['date', 'continent_x'])['new_vaccinations'].sum().reset_index()
pivot_vaccination_data = vaccination_data.pivot(index='date', columns='continent_x', values='new_vaccinations')

# DataFrame.plot creates its own figure, so the size is passed here instead of via plt.figure()
ax = pivot_vaccination_data.plot(kind='line', linewidth=1, figsize=(10, 6))
ax.set_xlabel('Date')
ax.set_ylabel('Number of New Vaccinations')
ax.set_title('Distribution of New Vaccinations Over Time By Continent')
ax.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```
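A stacked area chart is another way to show the same data. pandas' `DataFrame.plot.area()` stacks columns by default. In a stacked chart, the upper edge of each band is the running total across the columns up to that band. The toy sketch below computes those edges. Its frame is an invented stand-in for `pivot_vaccination_data`, and treating missing reports as zero is an assumption.

```python
import numpy as np
import pandas as pd

# Invented stand-in for pivot_vaccination_data: one row per date, one column per continent
pivot = pd.DataFrame(
    {'Africa': [1.0, 2.0], 'Asia': [5.0, np.nan], 'Europe': [3.0, 4.0]},
    index=pd.to_datetime(['2021-06-01', '2021-06-02']),
)

# Assumption: a missing report counts as zero; then accumulate left to right across continents
band_tops = pivot.fillna(0).cumsum(axis=1)
print(band_tops)
```

With matplotlib available, `pivot_vaccination_data.plot.area(figsize=(10, 6))` would draw the stacked version of the chart.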

### Significant Findings

Completing this project gave me insight into the global and country-specific effects of Covid-19. Exploratory data analysis and visualization made it possible to examine how cases, deaths, and vaccinations were distributed over time and across continents and countries.

### Summaries of Discoveries 
* **Deaths per Month:** The deaths chart peaks in January 2021, before vaccines were widely distributed.
* **Cases per Month:** Newly recorded cases peaked in early 2022. The January 2022 bar reaches roughly 50,000. Because `sns.barplot` averages by default, this is a mean per location-day rather than a worldwide monthly total.
* **New Vaccinations Count:** Most global vaccinations were administered between May 2021 and December 2021, with values in the chart ranging from 1,000,000 to 1,600,000 vaccinations.
* **Percentage of Cases By Continent:** Asia reported the highest percentage of Covid-19 cases among all continents, which may be partly attributable to its large and dense population.
* **Percentage of Deaths By Continent:** Europe accounted for a larger share of Covid-19 deaths than Asia. The World Happiness Report article 'Reasons for Asia-Pacific Success in Suppressing Covid-19' attributes the region's success to effective non-pharmaceutical interventions, including quarantines, mask use, physical distancing, and widespread testing.
* **Comparing Vaccinations in the US, France, India and South Korea:** France saw a surge in full vaccinations from April to September 2021. In the US, the pace of vaccination slowed from May 2021 onward. India's fully vaccinated share kept rising until January 2022 and then leveled off. In South Korea, the share rose sharply from the summer of 2021, reaching approximately 90% fully vaccinated by December 2021.
* **Vaccinations By Continent:** For most continents, mass vaccination began between July 2021 and January 2022; the US had begun vaccinating by January 2021.