# Interactive Maps Exploring Relationships Between Fermented Vegetables and COVID-19 Mortality Rates

The original [paper](https://www.medrxiv.org/content/10.1101/2020.07.06.20147025v1) suggests that low COVID-19 death rates at the country level were linked to high fermented vegetable consumption in Europe. However, this conclusion was based on data from June 2020, an early stage of the three-year pandemic. I aim to explore whether this finding holds with the latest data using interactive maps. Additionally, I will visualize longitudinal trends in death rates or absolute death numbers.

## Load modules

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.io as pio
import kaleido # This package is required to save the map as a static image

## Load and preprocess epidemiological data

### COVID-19 mortality and population data

In [4]:
# Load Covid-19 death data
covid_death_df = pd.read_csv('time_series_covid19_deaths_global.csv')

In [5]:
covid_death_df.head() 

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,2/28/23,3/1/23,3/2/23,3/3/23,3/4/23,3/5/23,3/6/23,3/7/23,3/8/23,3/9/23
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,3598,3598,3598,3598,3598,3598,3598,3598,3598,3598
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,6881,6881,6881,6881,6881,6881,6881,6881,6881,6881
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,165,165,165,165,165,165,165,165,165,165
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,1933,1933,1933,1933,1933,1933,1933,1933,1933,1933


The COVID-19 death data contains geographic information in the first four columns (`Province/State`, `Country/Region`, `Lat`, `Long`), followed by one column of death counts per date from 1/22/20 to 3/9/23. I will aggregate the data to get the total number of deaths and death rates per country.

In [6]:
# Sum the date columns (column 4 onward) row-wise to get a total death count per row
covid_death_df['Total Deaths'] = covid_death_df.iloc[:, 4:].sum(axis=1) 

In [7]:
covid_death_df['Total Deaths'].describe() 

count    2.890000e+02
mean     1.529348e+07
std      6.166454e+07
min      0.000000e+00
25%      2.474700e+04
50%      5.184140e+05
75%      5.035519e+06
max      7.138772e+08
Name: Total Deaths, dtype: float64
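
The maximum above (about 7.1e8) is far larger than any plausible national death toll. One likely cause, assuming the date columns hold cumulative counts (the preview shows Afghanistan unchanged at 7896 across the final ten dates), is that the row-wise sum counts each death once for every day after it was recorded. A minimal sketch on a made-up two-row frame, taking the latest date column as the total instead; `toy_df` and its values are invented for illustration:

```python
import pandas as pd

# Toy frame mimicking the layout above: four geographic columns, then one
# column per date. Values are invented and ASSUMED to be cumulative counts.
toy_df = pd.DataFrame({
    'Province/State': [None, None],
    'Country/Region': ['A', 'B'],
    'Lat': [0.0, 1.0],
    'Long': [0.0, 1.0],
    '1/1/23': [10, 0],
    '1/2/23': [12, 3],
    '1/3/23': [12, 5],
})

date_cols = toy_df.columns[4:]           # every column after Long

# Row-wise sum over cumulative columns counts each death repeatedly
summed = toy_df[date_cols].sum(axis=1)

# The last date column already holds the running total
latest = toy_df[date_cols[-1]]

comparison = pd.DataFrame({
    'Country/Region': toy_df['Country/Region'],
    'summed': summed,
    'latest': latest,
})
print(comparison)
```

If that assumption holds for the real file, the same `date_cols[-1]` selection on `covid_death_df` would give totals on the scale of the final columns in the preview (7896 for Afghanistan, for example) rather than hundreds of millions.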

In [8]:
# Inspect Country/Region column
covid_death_df['Country/Region'].value_counts()

Country/Region
China             34
Canada            16
United Kingdom    15
France            12
Australia          8
                  ..
Guinea             1
Guinea-Bissau      1
Guyana             1
Haiti              1
Zimbabwe           1
Name: count, Length: 201, dtype: int64

In [9]:
# Inspect Province/State column
covid_death_df['Province/State'].value_counts()

Province/State
Australian Capital Territory    1
Saint Pierre and Miquelon       1
Reunion                         1
New Caledonia                   1
Mayotte                         1
                               ..
Gansu                           1
Fujian                          1
Chongqing                       1
Beijing                         1
Turks and Caicos Islands        1
Name: count, Length: 91, dtype: int64

Some countries (for example China, Canada, and the United Kingdom) are split across multiple provinces or states, so I will aggregate the rows to the country level.

In [10]:
# Aggregate total deaths by country 
covid_death_country_df = covid_death_df.groupby('Country/Region')['Total Deaths'].sum().reset_index() 

In [11]:
# Sort countries by total deaths in descending order 
covid_death_country_df = covid_death_country_df.sort_values(by='Total Deaths', ascending=False) 

In [12]:
covid_death_country_df.head()

Unnamed: 0,Country/Region,Total Deaths
186,US,713877215
24,Brazil,488181000
80,India,364921237
117,Mexico,241085189
147,Russia,220983590


In [13]:
covid_death_country_df.tail()

Unnamed: 0,Country/Region,Total Deaths
5,Antarctica,0
185,Tuvalu,0
197,Winter Olympics 2022,0
170,Summer Olympics 2020,0
76,Holy See,0
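
The tail includes entries that are not countries (`Antarctica`, `Winter Olympics 2022`, `Summer Olympics 2020`), which a country-level map cannot place. A minimal sketch of dropping them before plotting; the exclusion list is my own assumption based only on the rows visible above and is not exhaustive, and the toy values are invented:

```python
import pandas as pd

# Toy stand-in for the country-level frame; numbers are invented
toy_country_df = pd.DataFrame({
    'Country/Region': ['US', 'Antarctica', 'Winter Olympics 2022',
                       'Summer Olympics 2020', 'Tuvalu'],
    'Total Deaths': [5, 0, 0, 0, 0],
})

# Hypothetical exclusion list, taken from the entries seen in the tail
NON_COUNTRY_ENTRIES = ['Antarctica', 'Winter Olympics 2022', 'Summer Olympics 2020']

# Keep only rows whose name is not in the exclusion list
is_country = ~toy_country_df['Country/Region'].isin(NON_COUNTRY_ENTRIES)
countries_only_df = toy_country_df[is_country].reset_index(drop=True)
print(countries_only_df)
```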


In [14]:
covid_death_country_df.rename(columns={'Country/Region': 'Region'}, inplace=True)

The dataframe `covid_death_country_df` now holds one row per country with its aggregated COVID-19 deaths for January 2020 to March 2023, and it is used in the visualizations that follow.

### Fermented vegetable consumption data

In [17]:
# Read in fermented vegetable consumption data in xlsx format
food_df = pd.read_excel('Foodex 2 L4 dashboard.xlsx', skiprows=2)

In [18]:
food_df.head()

Unnamed: 0,Survey's country,Survey start year,Survey name,Population Group (L2),Exposure hierarchy (L1),Exposure hierarchy (L2),Exposure hierarchy (L3),Exposure hierarchy (L4),Number of subjects,Number of consumers,Mean,Standard Deviation,5th percentile,10th percentile,Median,95th percentile,97.5th percentile,99th percentile,Comment
0,Austria,2010,Austrian Study on Nutritional Status 2010-12 -...,Adults,Vegetables and vegetable products,Processed or preserved vegetables and similar,Fermented or pickled vegetables,Fermented vegetables,308,11,2.383117,12.625241,0,0,0,0.0,65.5,65.5,
1,Austria,2010,Austrian Study on Nutritional Status 2010-12 -...,Elderly,Vegetables and vegetable products,Processed or preserved vegetables and similar,Fermented or pickled vegetables,Fermented vegetables,67,4,4.044776,16.201789,0,0,0,65.5,65.5,69.9775,"'Number of observations lower than 180, the 97..."
2,Austria,2010,Austrian Study on Nutritional Status 2010-12 -...,Very elderly,Vegetables and vegetable products,Processed or preserved vegetables and similar,Fermented or pickled vegetables,Fermented vegetables,25,3,7.86,21.723892,0,0,0,65.5,65.5,65.5,"'Number of observations lower than 60, the 95t..."
3,Austria,2010,Austrian Study on Nutritional Status 2010-12 -...,Adolescents,Vegetables and vegetable products,Processed or preserved vegetables and similar,Fermented or pickled vegetables,Fermented vegetables,237,8,1.485232,8.153679,0,0,0,0.0,43.666667,43.666667,"'Number of observations lower than 298, the 99..."
4,Austria,2010,Austrian Study on Nutritional Status 2010-12 -...,Other children,Vegetables and vegetable products,Processed or preserved vegetables and similar,Fermented or pickled vegetables,Fermented vegetables,128,2,0.463542,3.722254,0,0,0,0.0,0.0,28.64,"'Number of observations lower than 180, the 97..."


In [21]:
# Rename the country column for convenience, then count survey rows per country
food_df.rename(columns={"Survey's country": 'Country'}, inplace=True) 
food_df['Country'].value_counts()

Country
Netherlands                    15
France                         10
Latvia                         10
Germany                        10
Poland                          9
Austria                         8
Romania                         8
Hungary                         8
Estonia                         8
Belgium                         7
Croatia                         7
Serbia                          7
Montenegro                      6
Sweden                          6
Finland                         5
Slovenia                        5
United Kingdom                  4
Czechia                         3
Bosnia and Herzegovina          3
Bulgaria                        2
Portugal                        2
Republic of North Macedonia     2
Greece                          1
Name: count, dtype: int64
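
Because the two sources will presumably be joined on country name, it may help to check which survey countries have no exact match in the death data. A minimal sketch with invented name lists; the mismatch shown (`Republic of North Macedonia` versus `North Macedonia`) is an assumed example of the kind of difference to look for, not a confirmed property of either file, and `name_fixes` is a hypothetical mapping:

```python
import pandas as pd

# Invented name lists standing in for food_df['Country'] and the death data
food_countries = pd.Series(['Austria', 'Czechia', 'Republic of North Macedonia'])
death_regions = pd.Series(['Austria', 'Czechia', 'North Macedonia', 'US'])

# Survey countries with no exact counterpart in the death data
unmatched = sorted(set(food_countries) - set(death_regions))
print(unmatched)

# Hypothetical mapping from survey spelling to death-data spelling
name_fixes = {'Republic of North Macedonia': 'North Macedonia'}
harmonized = food_countries.replace(name_fixes)

# After harmonizing, nothing should be left unmatched
still_unmatched = sorted(set(harmonized) - set(death_regions))
print(still_unmatched)
```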

In [22]:
# Survey start year 
food_df['Survey start year'].value_counts()

Survey start year
2019    23
2012    17
2017    16
2007    15
2010    10
2014    10
2013     8
2018     8
2003     8
2006     6
2008     5
2021     4
2004     4
2011     3
2000     2
2001     2
2016     2
2015     2
1997     1
Name: count, dtype: int64

In [25]:
# Population 
food_df['Population Group (L2)'].value_counts()

Population Group (L2)
Adults             32
Elderly            24
Adolescents        24
Other children     23
Toddlers           16
Very elderly       13
Pregnant women      7
Vegetarians         4
Infants             2
Lactating women     1
Name: count, dtype: int64

Next, I average the mean daily consumption of fermented vegetables across all population groups and survey years to obtain a single value per country.

In [29]:
avg_consumption_country = food_df.groupby('Country')['Mean'].mean().reset_index()

In [31]:
avg_consumption_country.describe() # Summary statistics 

Unnamed: 0,Mean
count,23.0
mean,3.432499
std,3.132373
min,0.012646
25%,0.795268
50%,2.660731
75%,5.7428
max,10.636471
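
The per-country value above is an unweighted mean of the survey-group means, so a group of 25 subjects counts as much as one of 308. If a subject-weighted average were preferred, it could be sketched as below, using the `Number of subjects` column as weights. The toy frame reuses the first two Austrian rows from the preview plus one invented row (`Xland`); this is an alternative under that weighting assumption, not the method used in this notebook.

```python
import pandas as pd

# Two Austrian rows copied from the preview above, plus one invented row
toy_food_df = pd.DataFrame({
    'Country': ['Austria', 'Austria', 'Xland'],
    'Mean': [2.383117, 4.044776, 1.0],
    'Number of subjects': [308, 67, 50],
})

# Weight each group mean by its number of subjects
toy_food_df['Weighted total'] = toy_food_df['Mean'] * toy_food_df['Number of subjects']

# Sum the weighted totals and the weights per country, then divide
sums = toy_food_df.groupby('Country')[['Weighted total', 'Number of subjects']].sum()
sums['Weighted mean'] = sums['Weighted total'] / sums['Number of subjects']

weighted_consumption_country = sums[['Weighted mean']].reset_index()
print(weighted_consumption_country)
```

For the two Austrian rows, the weighted mean sits closer to the larger adult group (308 subjects) than the plain average of the two group means does.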


## Reproduce the static map (baseline comparison)