INTRODUCTION

This project aims to analyze the global impact of the COVID-19 pandemic on health outcomes and socioeconomic status. By examining datasets related to COVID-19 case numbers, deaths, vaccination rates, and socioeconomic indicators such as GDP, we will explore how the pandemic has affected different population groups worldwide. The goal is to identify patterns and provide insights that could inform public health policies and economic recovery efforts.

First Perspective:  
Countries with higher GDP and higher vaccination rates have managed the COVID-19 pandemic more effectively, resulting in lower mortality rates and better health outcomes despite high case numbers.

Arguments:  
Higher GDP Allows for Better Healthcare Infrastructure and Access to Medical Supplies:  
Visualization 1: GDP vs. COVID-19 Deaths (scatter plot)  
Visualization 2: GDP vs. Tests per Thousand (scatter plot)  
Higher Vaccination Rates Reduce Severe Cases and Deaths:  
Visualization 3: Mortality Rates and Vaccinations per 100 (map)  
Visualization 4: Excess Mortality vs. Vaccinations per 100 (scatter plot)

Second Perspective:  
Lower-income countries faced greater difficulties in managing the COVID-19 pandemic due to limited healthcare resources and slower vaccine distribution, leading to higher mortality rates.

Arguments:  
Limited healthcare infrastructure and economic instability:  
Visualization 1: GDP vs. COVID-19 Deaths (scatter plot)  
Visualization 2: GDP vs. Tests per Thousand (scatter plot)  
Visualization 5: Mortality Rates and Case Numbers per Country (map)  
Visualization 6: Tests per Thousand vs. Total Cases (scatter plot)  
Slower vaccine distribution:  
Visualization 3: Mortality Rates and Vaccinations per 100 (map)  
Visualization 4: Excess Mortality vs. Vaccinations per 100 (scatter plot)

Dataset and preprocessing  
We use two datasets: the OWID COVID-19 dataset and the World Bank dataset of GDP per capita, PPP (current international $). The COVID-19 dataset contains COVID-19 statistics for every country over the years 2020-2024, with variables such as “total_deaths” and “total_cases”. The second dataset contains GDP per capita per country per year, that is, the economic output per inhabitant. PPP stands for purchasing power parity: the figures are adjusted for differences in price levels between countries, which makes comparisons between countries fairer.  
We use this dataset as a reliable overview of each country's economic strength per year.
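
Because the GDP file stores one column per year, the value for a single year has to be picked out before it can be compared with the COVID-19 figures. The following is a minimal sketch of that step, not the code used in this notebook: the toy values are made up, and the function name `gdp_for_year` and the column name `gdp_per_capita_ppp` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the World Bank file (values are made up for illustration).
gdp_wide = pd.DataFrame({
    "Country Name": ["Aruba", "Albania"],
    "Country Code": ["ABW", "ALB"],
    "2020": [30000.0, 13000.0],
    "2021": [35000.0, 15000.0],
})


def gdp_for_year(gdp_wide: pd.DataFrame, year: int) -> pd.DataFrame:
    """Return one row per country with the GDP per capita (PPP) of a single year."""
    # Year columns are stored as strings in the wide-format file.
    out = gdp_wide[["Country Name", "Country Code", str(year)]].copy()
    # Rename so the key matches the 'iso_code' column of the COVID-19 dataset.
    out = out.rename(columns={"Country Code": "iso_code",
                              str(year): "gdp_per_capita_ppp"})
    # Countries without a value for that year cannot be plotted.
    return out.dropna(subset=["gdp_per_capita_ppp"])


gdp_2021 = gdp_for_year(gdp_wide, 2021)
print(gdp_2021)
```

The resulting frame has a single GDP column, which is easier to use as a plot axis than the full table of yearly columns.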

The links for these datasets are:  
OWID Covid-19: [https://ourworldindata.org/coronavirus#deaths-and-cases-our-data-source](https://ourworldindata.org/coronavirus#deaths-and-cases-our-data-source)  
GDP per capita: [https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD](https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD)

We preprocessed these datasets by filtering them for the year 2021. For cumulative variables, we then take the last value per country in 2021. For a few graphs we also removed the entries of the location variable that are not countries (continents, the world as a whole, and income categories), because these aggregates would appear as outliers in the graphs.
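
The preprocessing described above can be sketched as follows. This is an illustration under assumptions rather than the exact code of the notebook: the toy rows are made up, the helper `last_value_in_year` is hypothetical, and the names in `AGGREGATES` are assumed and should be checked against the `location` values in the real file.

```python
import pandas as pd

# Toy stand-in for owid-covid-data.csv (values are made up for illustration).
covid = pd.DataFrame({
    "iso_code": ["NLD", "NLD", "NLD", "OWID_WRL", "OWID_WRL"],
    "location": ["Netherlands", "Netherlands", "Netherlands", "World", "World"],
    "date": ["2021-06-30", "2021-12-30", "2022-01-01", "2021-12-30", "2021-12-31"],
    "total_deaths_per_million": [1000.0, 1200.0, 1250.0, 680.0, 690.0],
})

# Assumed names of non-country rows (continents, world, income categories).
AGGREGATES = {
    "World", "Africa", "Asia", "Europe", "North America", "South America",
    "Oceania", "High income", "Upper middle income", "Lower middle income",
    "Low income",
}


def last_value_in_year(covid: pd.DataFrame, year: int, column: str) -> pd.DataFrame:
    """Return the last available value of `column` per country within `year`."""
    data = covid.copy()
    data["date"] = pd.to_datetime(data["date"])
    # Step 1: keep only rows from the requested year.
    data = data[data["date"].dt.year == year]
    # Step 2: drop aggregate rows that are not individual countries.
    data = data[~data["location"].isin(AGGREGATES)]
    # Step 3: sort by date and keep the last non-missing value per country.
    data = data.dropna(subset=[column]).sort_values("date")
    return data.groupby("iso_code", as_index=False).last()


latest_2021 = last_value_in_year(covid, 2021, "total_deaths_per_million")
print(latest_2021)
```

Taking the last non-missing value, instead of the value on one fixed date, keeps countries that did not report on that exact day.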


In [2]:
import plotly.express as px
import pandas as pd
import seaborn as sns

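# Load the World Bank GDP file, skipping the metadata lines at the top of the CSV
file_path = 'GDP-data.csv'
GDPdata = pd.read_csv(file_path, skiprows=4)
# Load the OWID COVID-19 file
file_path = 'owid-covid-data.csv'
CovidData = pd.read_csv(file_path)

# Rename the World Bank country code so both dataframes share the 'iso_code' key
GDPdata = GDPdata.rename(columns={'Country Code': 'iso_code'})

# Keep only the snapshot of 31 December 2020
CovidData['date'] = pd.to_datetime(CovidData['date'])
CovidData = CovidData[CovidData['date'] == '2020-12-31']

# Join the dataframes on the iso_code column
df = pd.merge(GDPdata, CovidData, on='iso_code', how='inner')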

fig1 = px.choropleth(
    df, 
    locations="iso_code",
    color="total_cases_per_million",
    hover_name="Country Name",
    color_continuous_scale=px.colors.sequential.Plasma,
    title="Cases per million by Country"
)

fig1.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
    height=600
)

fig1.show()

fig2 = px.choropleth(
    df, 
    locations="iso_code",
    color="total_deaths_per_million",
    hover_name="Country Name",
    color_continuous_scale=px.colors.sequential.Plasma,
    title="Deaths per million by Country"
)

fig2.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
    height=600
)

fig2.show()

  Country Name iso_code                                 Indicator Name  \
0        Aruba      ABW  GDP per capita, PPP (current international $)   
1  Afghanistan      AFG  GDP per capita, PPP (current international $)   
2       Angola      AGO  GDP per capita, PPP (current international $)   
3      Albania      ALB  GDP per capita, PPP (current international $)   
4      Andorra      AND  GDP per capita, PPP (current international $)   

      Indicator Code  1960  1961  1962  1963  1964  1965  ...  male_smokers  \
0  NY.GDP.PCAP.PP.CD   NaN   NaN   NaN   NaN   NaN   NaN  ...           NaN   
1  NY.GDP.PCAP.PP.CD   NaN   NaN   NaN   NaN   NaN   NaN  ...           NaN   
2  NY.GDP.PCAP.PP.CD   NaN   NaN   NaN   NaN   NaN   NaN  ...           NaN   
3  NY.GDP.PCAP.PP.CD   NaN   NaN   NaN   NaN   NaN   NaN  ...          51.2   
4  NY.GDP.PCAP.PP.CD   NaN   NaN   NaN   NaN   NaN   NaN  ...          37.8   

   handwashing_facilities  hospital_beds_per_thousand  life_expectancy  \
0     

In [4]:
import plotly.express as px
import pandas as pd
import plotly.graph_objects as go
import numpy as np
import seaborn as sns

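# Rename the World Bank country code so both dataframes share the 'iso_code' key
GDPdata = GDPdata.rename(columns={'Country Code': 'iso_code'})

# Keep only the snapshot of 31 December 2020
CovidData['date'] = pd.to_datetime(CovidData['date'])
CovidData = CovidData[CovidData['date'] == '2020-12-31']

# Join the dataframes on the iso_code column
df = pd.merge(GDPdata, CovidData, on='iso_code', how='inner')
# Drop countries that lack either of the two plotted values
df = df.dropna(subset=['total_cases_per_million', 'total_deaths_per_million'])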

fig = px.scatter(
    df,
    x="total_cases_per_million",
    y="total_deaths_per_million",
    hover_name="Country Name",
    trendline="ols",
    title="Comparison of Total Cases and Deaths per Million by Country",
    labels={
        "total_cases_per_million": "Total Cases per Million",
        "total_deaths_per_million": "Total Deaths per Million"
    }
)

correlation = df['total_cases_per_million'].corr(df['total_deaths_per_million'])
print(correlation)

# Update layout for better appearance
fig.update_traces(textposition='top center')
fig.update_layout(
    height=600
)

fig.show()

fig = px.scatter_geo(
    df, 
    locations="iso_code",
    size="total_cases_per_million",
    color="total_deaths_per_million",
    hover_name="Country Name",
    size_max=50,
    color_continuous_scale=px.colors.sequential.Plasma,
    title="Total Cases and Deaths per Million by Country"
)

fig.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
    height=600
)

fig.show()

0.7246975031268591
