# Interactive Maps Exploring Relationships Between Fermented Vegetables and Covid-19 Mortality Rates

The original [paper](https://www.medrxiv.org/content/10.1101/2020.07.06.20147025v1) suggests that low COVID-19 death rates at the country level were linked to high fermented vegetable consumption in Europe. However, this conclusion was based on data from June 2020, an early stage of the three-year pandemic. I aim to explore whether this finding holds with the latest data using interactive maps. Additionally, I will visualize longitudinal trends in death rates or absolute death numbers.

## Load modules

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.io as pio
import kaleido # This package is required to save the map as a static image

## Load and preprocess epidemiological data

### Covid 19 mortality and population data

In [None]:
# Load Covid-19 death data
covid_death_df = pd.read_csv('time_series_covid19_deaths_global.csv')

In [3]:
covid_death_df.head() 

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,2/28/23,3/1/23,3/2/23,3/3/23,3/4/23,3/5/23,3/6/23,3/7/23,3/8/23,3/9/23
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,3598,3598,3598,3598,3598,3598,3598,3598,3598,3598
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,6881,6881,6881,6881,6881,6881,6881,6881,6881,6881
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,165,165,165,165,165,165,165,165,165,165
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,1933,1933,1933,1933,1933,1933,1933,1933,1933,1933


The Covid-19 death data contains geographic information in the first four columns, followed by daily death counts. I will aggregate the data to get the total number of deaths and death rates per country. 

In [None]:
# Aggregate daily death counts to get total death count
covid_death_df['Total Deaths'] = covid_death_df.iloc[:, 4:].sum(axis=1) 

count    2.890000e+02
mean     3.058697e+07
std      1.233291e+08
min      0.000000e+00
25%      4.949400e+04
50%      1.036828e+06
75%      1.007104e+07
max      1.427754e+09
Name: Total Deaths, dtype: float64

In [6]:
covid_death_df['Total Deaths'].describe() 

count    2.890000e+02
mean     3.058697e+07
std      1.233291e+08
min      0.000000e+00
25%      4.949400e+04
50%      1.036828e+06
75%      1.007104e+07
max      1.427754e+09
Name: Total Deaths, dtype: float64

In [7]:
# Inspect Country/Region column
covid_death_df['Country/Region'].value_counts()

Country/Region
China             34
Canada            16
United Kingdom    15
France            12
Australia          8
                  ..
Guinea             1
Guinea-Bissau      1
Guyana             1
Haiti              1
Zimbabwe           1
Name: count, Length: 201, dtype: int64

In [10]:
# Inspect Province/State column
covid_death_df['Province/State'].value_counts()

Province/State
Australian Capital Territory    1
Saint Pierre and Miquelon       1
Reunion                         1
New Caledonia                   1
Mayotte                         1
                               ..
Gansu                           1
Fujian                          1
Chongqing                       1
Beijing                         1
Turks and Caicos Islands        1
Name: count, Length: 91, dtype: int64

There are multiple states or provinces within a country in the data. I will aggregate the data to the country level.

In [11]:
# Aggregate total deaths by country 
covid_death_country_df = covid_death_df.groupby('Country/Region')['Total Deaths'].sum().reset_index() 

In [None]:
# Sort countries by total deaths in descending order 
covid_death_country_df = covid_death_country_df.sort_values(by='Total Deaths', ascending=False) 

In [15]:
covid_death_country_df.head()

Unnamed: 0,Country/Region,Total Deaths
186,US,1427754430
24,Brazil,976362000
80,India,729842474
117,Mexico,482170378
147,Russia,441967180


In [16]:
covid_death_country_df.tail()

Unnamed: 0,Country/Region,Total Deaths
5,Antarctica,0
185,Tuvalu,0
197,Winter Olympics 2022,0
170,Summer Olympics 2020,0
76,Holy See,0


In [17]:
covid_death_country_df.rename(columns={'Country/Region': 'Region'}, inplace=True)

The dataframe `covid_death_country_df` contains total COVID-19 deaths at the country level from January 2020 to March 2023, used for the following visualization.

### Fermented vegetable consumption data