# Simple Data

The goal of this notebook is to simplify the data so that it is directly readable, as was done in exp001 (the procedure is very similar).

In [1]:
INHABITANTS_GERMANY = 83.2E6 # https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bevoelkerung/Bevoelkerungsstand/_inhalt.html

In [2]:
import pandas as pd
import numpy as np
from datetime import datetime

In [3]:
corona_data = pd.read_csv("../dat/CoronaData.csv", parse_dates=["date"])
weather_data = pd.read_csv("../dat/WeatherData.csv", parse_dates=["date"])

In [4]:
corona_data.head()

Unnamed: 0.1,Unnamed: 0,id_county,name_county,id_state,name_state,cases,deaths,date,cum_cases,cum_deaths
0,0,1001.0,SK Flensburg,1.0,Schleswig-Holstein,0.0,0.0,2020-01-02,0.0,0.0
1,1,1001.0,SK Flensburg,1.0,Schleswig-Holstein,0.0,0.0,2020-01-03,0.0,0.0
2,2,1001.0,SK Flensburg,1.0,Schleswig-Holstein,0.0,0.0,2020-01-04,0.0,0.0
3,3,1001.0,SK Flensburg,1.0,Schleswig-Holstein,0.0,0.0,2020-01-05,0.0,0.0
4,4,1001.0,SK Flensburg,1.0,Schleswig-Holstein,0.0,0.0,2020-01-06,0.0,0.0


In [5]:
weather_data.head()

Unnamed: 0,date,air temperature
0,2020-01-01 01:00:00,-0.452
1,2020-01-01 02:00:00,-0.584
2,2020-01-01 03:00:00,-0.948
3,2020-01-01 04:00:00,-1.08
4,2020-01-01 05:00:00,-1.308


## Simplify data

For this first experiment, we simplify the data by reducing it to daily resolution and aggregating it to Germany-wide totals.

### Corona Data

In [6]:
# Sum cases and deaths over all counties for each date
corona_data_simple = corona_data.groupby("date")[["cases", "deaths"]].sum()
corona_data_simple

date,cases,deaths
2020-01-02,1.0,0.0
2020-01-03,0.0,0.0
2020-01-04,0.0,0.0
2020-01-05,0.0,0.0
2020-01-06,0.0,0.0
...,...,...
2022-01-16,33898.0,7.0
2022-01-17,76094.0,25.0
2022-01-18,111752.0,10.0
2022-01-19,91985.0,0.0


To make the daily counts more useful, we smooth them with a centered seven-day moving average.

In [7]:
# Centered seven-day window sums; mode="same" keeps the original length
cases = corona_data_simple.cases.to_numpy()
cases_smoothed = np.convolve(cases, np.ones(7), mode="same")
deaths = corona_data_simple.deaths.to_numpy()
deaths_smoothed = np.convolve(deaths, np.ones(7), mode="same")

# Divide the window sums by 7 to obtain daily averages
corona_data_simple["cases_smoothed"] = cases_smoothed / 7
corona_data_simple["deaths_smoothed"] = deaths_smoothed / 7
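With `mode="same"`, `np.convolve` implicitly pads the series with zeros, so the first and last three smoothed values are sums over fewer real days that are still divided by 7. The following is a minimal sketch on a made-up constant series (`toy` is hypothetical and not taken from the data) that compares this behavior with a pandas rolling mean, where `min_periods=1` averages only over the days that actually exist:

```python
import numpy as np
import pandas as pd

toy = pd.Series([7.0] * 10)  # hypothetical constant series of daily counts

# Approach used in this notebook: centered window, zeros assumed beyond both ends
conv = np.convolve(toy.to_numpy(), np.ones(7), mode="same") / 7

# Rolling variant: centered window, edges averaged over the available days only
roll = toy.rolling(window=7, center=True, min_periods=1).mean()

print(conv)             # drops below 7 near both ends
print(roll.to_numpy())  # stays at 7 everywhere
```

Both variants agree away from the edges, so the choice mainly affects the first and last three days of the series.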

Now we can calculate the case fatality rate (CFR). Since the values are not reliable in the beginning, we set $cfr = 0$ whenever the smoothed number of new cases is at most 2 (this should only occur before the first wave).

In [8]:
# Replace denominators of at most 2 by infinity so that the ratio becomes 0
cases_smoothed_modified = corona_data_simple["cases_smoothed"].apply(lambda x: x if x > 2 else np.inf)
cfr = corona_data_simple["deaths_smoothed"] / cases_smoothed_modified

In [9]:
corona_data_simple["cfr"] = cfr
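The thresholding can also be written without `apply`. The sketch below uses hypothetical smoothed values, chosen only to exercise the threshold, and checks that masking the ratio with `Series.where` gives the same result as dividing by infinity:

```python
import numpy as np
import pandas as pd

# Hypothetical smoothed values, not taken from the data
df = pd.DataFrame({
    "cases_smoothed": [0.5, 2.0, 10.0, 200.0],
    "deaths_smoothed": [0.0, 1.0, 1.0, 4.0],
})

# Approach used in this notebook: small denominators become infinity, so the ratio is 0
denominator = df["cases_smoothed"].apply(lambda x: x if x > 2 else np.inf)
cfr_inf = df["deaths_smoothed"] / denominator

# Vectorized variant: keep the ratio only where the denominator exceeds 2, else 0
ratio = df["deaths_smoothed"] / df["cases_smoothed"]
cfr_where = ratio.where(df["cases_smoothed"] > 2, 0.0)

print(cfr_where.tolist())
```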

In [10]:
corona = corona_data_simple

### Weather data

In [11]:
weather_data

Unnamed: 0,date,air temperature
0,2020-01-01 01:00:00,-0.452
1,2020-01-01 02:00:00,-0.584
2,2020-01-01 03:00:00,-0.948
3,2020-01-01 04:00:00,-1.080
4,2020-01-01 05:00:00,-1.308
...,...,...
17994,2022-01-19 19:00:00,2.048
17995,2022-01-19 20:00:00,2.152
17996,2022-01-19 21:00:00,2.028
17997,2022-01-19 22:00:00,1.884


In [12]:
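# Take the reading at 00:00 of each day; days without one (the first day) are back-filled from the next day
weather = weather_data.set_index("date").resample("D").asfreq().bfill()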
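Note that `asfreq()` samples a single reading per day rather than averaging the hourly values. As a sketch under made-up data (the frame `hourly` is hypothetical and only mimics the layout of `weather_data`), the following contrasts the midnight sample with a daily mean, which would be an alternative and is not what this notebook computes:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly temperatures starting at 01:00, like weather_data
idx = pd.date_range("2020-01-01 01:00", periods=47, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({"air temperature": np.arange(47, dtype=float)}, index=idx)

# Approach used in this notebook: value at 00:00, gaps filled from the next day
midnight = hourly.resample("D").asfreq().bfill()

# Alternative: average of all readings within each calendar day
daily_mean = hourly.resample("D").mean()

print(midnight)
print(daily_mean)
```

In the sample, the first day has no 00:00 reading and therefore inherits the value of the second day, which matches the repeated temperature on the first two days in the real output.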

## Combining the data

In [13]:
weather.head()

date,air temperature
2020-01-01,-2.128
2020-01-02,-2.128
2020-01-03,0.652
2020-01-04,4.496
2020-01-05,1.748


In [14]:
corona.head()

date,cases,deaths,cases_smoothed,deaths_smoothed,cfr
2020-01-02,1.0,0.0,0.142857,0.0,0.0
2020-01-03,0.0,0.0,0.142857,0.0,0.0
2020-01-04,0.0,0.0,0.142857,0.0,0.0
2020-01-05,0.0,0.0,0.142857,0.0,0.0
2020-01-06,0.0,0.0,0.0,0.0,0.0


In [16]:
combined = corona.merge(weather, how="inner", left_index=True, right_index=True)
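An inner merge on the index keeps only the dates present in both frames, so a day that exists in just one of them (here 2020-01-01, which only the weather data covers) is dropped. A minimal sketch with hypothetical miniature frames (`corona_toy` and `weather_toy` are illustrative names) shows this, together with the equivalent `DataFrame.join` shorthand:

```python
import pandas as pd

# Hypothetical miniature frames indexed by date
corona_toy = pd.DataFrame(
    {"cases": [1.0, 0.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)
weather_toy = pd.DataFrame(
    {"air temperature": [-2.0, -2.0, 0.5]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)

# Index-on-index inner merge, as in this notebook
merged = corona_toy.merge(weather_toy, how="inner", left_index=True, right_index=True)

# join() performs the same index-on-index merge with less typing
joined = corona_toy.join(weather_toy, how="inner")

print(merged)
```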

In [17]:
combined.head()

date,cases,deaths,cases_smoothed,deaths_smoothed,cfr,air temperature
2020-01-02,1.0,0.0,0.142857,0.0,0.0,-2.128
2020-01-03,0.0,0.0,0.142857,0.0,0.0,0.652
2020-01-04,0.0,0.0,0.142857,0.0,0.0,4.496
2020-01-05,0.0,0.0,0.142857,0.0,0.0,1.748
2020-01-06,0.0,0.0,0.0,0.0,0.0,1.296


In [18]:
combined.to_csv("../dat/SimpleCombinedData.csv")
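When the file is read again later, restoring the date index directly avoids extra `Unnamed: 0` columns like those in the input files. The sketch below assumes the index is named `date`, as in the outputs above, and uses an in-memory buffer with a hypothetical frame `toy` instead of the real file:

```python
import io
import pandas as pd

# Hypothetical frame with a date index, standing in for the combined data
toy = pd.DataFrame(
    {"cases": [1.0, 0.0]},
    index=pd.DatetimeIndex(["2020-01-02", "2020-01-03"], name="date"),
)

buffer = io.StringIO()
toy.to_csv(buffer)  # the index is written as the first column, named "date"
buffer.seek(0)

# Read the first column back as a parsed date index instead of a regular column
restored = pd.read_csv(buffer, index_col="date", parse_dates=["date"])

print(restored)
```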