# Covid-19 Data Science Project

A data analysis of Covid-19 data. The goal is to develop a model that predicts future case numbers from several input variables, such as location, the measures currently in force (lockdown), and previous case numbers. Regression models are used for the analysis; afterwards, a deep learning algorithm (a neural network) is applied.

In [57]:
import io
import time
import urllib.request
import warnings
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from progressbar import ProgressBar
from wetterdienst.dwd.observations import (
    DwdObservationDataset,
    DwdObservationParameter,
    DwdObservationPeriod,
    DwdObservationRequest,
    DwdObservationResolution,
)

# Suppress library warnings to keep the notebook output readable.
warnings.filterwarnings("ignore")

Historical RKI data for the federal state of Baden-Württemberg.

Column descriptions: https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0

Further descriptions: https://www.bbsr.bund.de/BBSR/DE/forschung/raumbeobachtung/InteraktiveAnwendungen/corona-dashboard/corona-dashboard_einstieg.html

Calculation of the 7-day incidence based on the reporting date (Meldedatum): https://lua.rlp.de/de/presse/detail/news/News/detail/corona-hinweise-zur-berechnung-der-7-tage-inzidenz/

In [58]:
# # Data download
# url_cases_rki = "https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv?where=IdBundesland%20%3E%3D%208%20AND%20IdBundesland%20%3C%3D%208"
# filename_cases = "./RKI_daily.csv"

# urllib.request.urlretrieve(url_cases_rki, filename_cases)

In [59]:
data_all = pd.read_csv("./RKI_daily.csv")

In [60]:
# data_all["Altersgruppe"].unique()

data_all = data_all.sort_values(by=['Landkreis','Meldedatum'])\
            .drop(["ObjectId", "IdBundesland", "Bundesland", "Altersgruppe2", "AnzahlTodesfall", "Geschlecht",\
                    "NeuerFall", "NeuerTodesfall", "Refdatum", "NeuGenesen", "IstErkrankungsbeginn"], axis=1)

# Cases in the 60+ age groups; rows from all other age groups contribute 0.
# np.where avoids chained assignment, which may silently write to a copy.
mask_age = data_all["Altersgruppe"].isin(["A60-A79", "A80+"])
data_all["AnzahlFall>59"] = np.where(mask_age, data_all["AnzahlFall"], 0)

In [61]:
anzahl_faelle = data_all["AnzahlFall"].sum()
data_all

Unnamed: 0,Landkreis,Altersgruppe,AnzahlFall,Meldedatum,IdLandkreis,Datenstand,AnzahlGenesen,AnzahlFall>59
183523,LK Alb-Donau-Kreis,A35-A59,1,2020/02/28 00:00:00+00,8425,"19.03.2021, 00:00 Uhr",1,0
181105,LK Alb-Donau-Kreis,A15-A34,1,2020/03/04 00:00:00+00,8425,"19.03.2021, 00:00 Uhr",1,0
183524,LK Alb-Donau-Kreis,A35-A59,1,2020/03/04 00:00:00+00,8425,"19.03.2021, 00:00 Uhr",1,0
183525,LK Alb-Donau-Kreis,A35-A59,1,2020/03/04 00:00:00+00,8425,"19.03.2021, 00:00 Uhr",1,0
183526,LK Alb-Donau-Kreis,A35-A59,1,2020/03/07 00:00:00+00,8425,"19.03.2021, 00:00 Uhr",1,0
...,...,...,...,...,...,...,...,...
177559,SK Ulm,A05-A14,2,2021/03/18 00:00:00+00,8421,"19.03.2021, 00:00 Uhr",0,0
179542,SK Ulm,A15-A34,2,2021/03/18 00:00:00+00,8421,"19.03.2021, 00:00 Uhr",0,0
180404,SK Ulm,A60-A79,1,2021/03/18 00:00:00+00,8421,"19.03.2021, 00:00 Uhr",0,1
180601,SK Ulm,A60-A79,1,2021/03/18 00:00:00+00,8421,"19.03.2021, 00:00 Uhr",0,1


In [62]:
data_all["Meldedatum"] = data_all["Meldedatum"].str.slice(stop=10)
data_all["Meldedatum"] = pd.to_datetime(data_all["Meldedatum"], format='%Y/%m/%d') #- pd.to_timedelta(7, unit='d')

In [63]:
landkreise_id = data_all[["Landkreis", "IdLandkreis"]].drop_duplicates()

In [64]:
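# Aggregate the daily reports to weeks ending on Friday, per district.
# The column selection uses a list (double brackets); a bare tuple of column names is no longer accepted by recent pandas versions.
data_all = data_all.groupby(['IdLandkreis', pd.Grouper(key='Meldedatum', freq='W-FRI', label="right")])[['AnzahlFall', 'AnzahlGenesen', 'AnzahlFall>59']]\
       .sum()\
       .reset_index()\
       .sort_values(['IdLandkreis', 'Meldedatum'])

# Shift the week label from Friday to the following Saturday.
data_all['Meldedatum'] = data_all['Meldedatum'] + timedelta(days=1)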

In [65]:
data_all = data_all.merge(landkreise_id, left_on='IdLandkreis', right_on='IdLandkreis')
data_all

Unnamed: 0,IdLandkreis,Meldedatum,AnzahlFall,AnzahlGenesen,AnzahlFall>59,Landkreis
0,8111,2020-03-07,5,5,0,SK Stuttgart
1,8111,2020-03-14,78,78,3,SK Stuttgart
2,8111,2020-03-21,310,303,41,SK Stuttgart
3,8111,2020-03-28,275,261,65,SK Stuttgart
4,8111,2020-04-04,237,225,65,SK Stuttgart
...,...,...,...,...,...,...
2345,8437,2021-02-20,40,40,11,LK Sigmaringen
2346,8437,2021-02-27,80,78,14,LK Sigmaringen
2347,8437,2021-03-06,108,96,19,LK Sigmaringen
2348,8437,2021-03-13,161,37,22,LK Sigmaringen


In [66]:
print("Does the number of cases still match? " + str(anzahl_faelle == data_all["AnzahlFall"].sum()))

Does the number of cases still match? True


<br>
Population development of the urban and rural districts (Stadt- und Landkreise) in Baden-Württemberg
https://www.statistik-bw.de/BevoelkGebiet/Bevoelkerung/01035055.tab?R=LA

Population density of the urban and rural districts in Baden-Württemberg
https://www.statistik-bw.de/BevoelkGebiet/Bevoelkerung/01515020.tab?R=LA

In [67]:
data_inhab = pd.read_csv("https://www.statistik-bw.de/BevoelkGebiet/Bevoelk_I_D_A_vj.csv",
                 encoding = "ISO-8859-1",
                 sep=";",
                 decimal=",",
                 skiprows=17)
data_inhab = data_inhab[(data_inhab["Amtlicher Gemeindeschlüssel (AGS)"]>1000) &
                        (data_inhab["Amtlicher Gemeindeschlüssel (AGS)"]<10000) &
                        # (data_inhab["Bevölkerung insgesamt"].str.isnumeric())==True &
                        (data_inhab["Stichtag"] == "30.09.2020")]
                         
# Most recent reference date; all figures are estimates, so no further differentiation is needed

In [68]:
data_inhab

Unnamed: 0,Kürzel der Regionaleinheit,Amtlicher Gemeindeschlüssel (AGS),Regionalname,Stichtag,Bevölkerung insgesamt,Bevölkerung männlich,Bevölkerung weiblich,Deutsche zusammen,Deutsche männlich,Deutsche weiblich,Ausländer zusammen,Ausländer männlich,Ausländer weiblich
6989,KR,8111,Stadtkreis Stuttgart,30.09.2020,631688,315677,316011,474057,233142,240915,157631,82535,75096
6990,KR,8115,Landkreis Böblingen,30.09.2020,393609,195789,197820,319500,157258,162242,74109,38531,35578
6991,KR,8116,Landkreis Esslingen,30.09.2020,534525,266993,267532,439603,215137,224466,94922,51856,43066
6992,KR,8117,Landkreis Göppingen,30.09.2020,259076,129150,129926,214155,105076,109079,44921,24074,20847
6993,KR,8118,Landkreis Ludwigsburg,30.09.2020,545782,270971,274811,445920,218034,227886,99862,52937,46925
6994,KR,8119,Landkreis Rems-Murr-Kreis,30.09.2020,428007,211651,216356,358065,175040,183025,69942,36611,33331
6995,KR,8121,Stadtkreis Heilbronn,30.09.2020,126241,63617,62624,92994,45529,47465,33247,18088,15159
6996,KR,8125,Landkreis Heilbronn,30.09.2020,346652,174017,172635,294400,145993,148407,52252,28024,24228
6997,KR,8126,Landkreis Hohenlohekreis,30.09.2020,112964,56992,55972,99599,49667,49932,13365,7325,6040
6998,KR,8127,Landkreis Schwäbisch Hall,30.09.2020,197898,99252,98646,173652,86061,87591,24246,13191,11055


In [69]:
data_inhab_perkm = pd.read_csv("https://www.statistik-bw.de/BevoelkGebiet/Bevoelk_I_Flaeche_j.csv",
                 encoding = "ISO-8859-1",
                 sep=";",
                 decimal=",",
                 skiprows=18)

In [70]:
data_inhab_perkm = data_inhab_perkm[(data_inhab_perkm["Amtlicher Gemeindeschlüssel (AGS)"]>1000) &
                        (data_inhab_perkm["Amtlicher Gemeindeschlüssel (AGS)"]<10000) &
                        (data_inhab_perkm["Stichtag"] == "31.12.2019")]
# Most recent reference date

In [71]:
data_inhab_perkm

Unnamed: 0,Kürzel der Regionaleinheit,Amtlicher Gemeindeschlüssel (AGS),Postleitzahl,Regionalname,Stichtag,Bevölkerung insgesamt,Gemeindegebiet ha,Bevölkerungsdichte EW/km²
4665,KR,8111,X,Stadtkreis Stuttgart,31.12.2019,635911,20733,3067
4666,KR,8115,X,Landkreis Böblingen,31.12.2019,392807,61776,636
4667,KR,8116,X,Landkreis Esslingen,31.12.2019,535024,64128,834
4668,KR,8117,X,Landkreis Göppingen,31.12.2019,258145,64234,402
4669,KR,8118,X,Landkreis Ludwigsburg,31.12.2019,545423,68677,794
4670,KR,8119,X,Landkreis Rems-Murr-Kreis,31.12.2019,427248,85808,498
4671,KR,8121,X,Stadtkreis Heilbronn,31.12.2019,126592,9990,1267
4672,KR,8125,X,Landkreis Heilbronn,31.12.2019,344456,109991,313
4673,KR,8126,X,Landkreis Hohenlohekreis,31.12.2019,112655,77676,145
4674,KR,8127,X,Landkreis Schwäbisch Hall,31.12.2019,196761,148407,133


<br>
Age structure of the urban and rural districts in Baden-Württemberg
https://www.statistik-bw.de/BevoelkGebiet/Alter/98015200.tab?R=KR237

Each download is requested per district via the official municipality key (Amtlicher Gemeindeschlüssel, AGS).
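
The per-district request can be sketched as follows. This is only an illustration of the URL pattern used in the commented-out download cell that follows; the helper names `district_code` and `age_structure_url` are hypothetical and not part of the notebook.

```python
# Sketch: build the per-district CSV URL from a full AGS such as 8111.
# The URL pattern is taken from the notebook; the helper names are hypothetical.
BASE_URL = "https://www.statistik-bw.de/BevoelkGebiet/Alter/98015200.tab"


def district_code(ags: int) -> int:
    """Strip the leading state digit (8 for Baden-Württemberg): 8111 -> 111."""
    return ags % 1000


def age_structure_url(ags: int) -> str:
    """Return the CSV export URL for a single district."""
    return f"{BASE_URL}?R=KR{district_code(ags)}&form=csv"


print(age_structure_url(8111))
```

Adding 8000 to the three-digit code afterwards restores the full AGS, which is what the download cell does before saving the CSV.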

In [72]:
# pbar = ProgressBar()

# AGS = ( pd.unique(data_inhab_perkm["Amtlicher Gemeindeschlüssel (AGS)"]) )%1000
# data_age = pd.DataFrame(columns = ["Jahr", "Unter20", "Über65", "20bis65", "AGS"])

# for ags in pbar(AGS):
#     temp_data = pd.read_csv("https://www.statistik-bw.de/BevoelkGebiet/Alter/98015200.tab?R=KR" + str(ags) + "&form=csv",
#                  encoding = "ISO-8859-1",
#                  sep=";",
#                  decimal=",",
#                  names = ["Jahr", "Unter20", "Über65", "20bis65"],
#                  usecols=[0,1,2,3],
#                  skiprows = 28,
#                  skipfooter = 20,
#                     engine='python')
#     temp_data["AGS"] = ags
#     data_age = data_age.append(temp_data, ignore_index=True)

# data_age["AGS"] = data_age["AGS"] + 8000

# data_age.to_csv("altersstruktur.csv", index = False)

In [73]:
data_age = pd.read_csv("altersstruktur.csv")
data_age

Unnamed: 0,Jahr,Unter20,Über65,20bis65,AGS
0,2020,109.680,113.716,421.919,8111
1,2021,109.892,114.218,423.461,8111
2,2020,78.538,80.109,236.644,8115
3,2021,78.881,81.539,236.339,8115
4,2020,101.883,111.712,325.572,8116
...,...,...,...,...,...
83,2021,40.311,50.212,126.459,8435
84,2020,56.464,58.085,172.557,8436
85,2021,56.517,59.122,172.422,8436
86,2020,25.482,27.600,78.546,8437


In [74]:
data_age['Bevölkerung gesamt'] = data_age['Unter20'] + data_age['Über65'] + data_age['20bis65']
data_age['Ü65%'] = data_age['Über65']/ data_age['Bevölkerung gesamt'] 
data_age

#data_inhab_perkm1 = data_inhab_perkm.drop(['Kürzel der Regionaleinheit', 'Postleitzahl', 'Regionalname', 'Stichtag', 'Bevölkerung insgesamt', 'Gemeindegebiet ha'], axis = 1)
#data_inhab_perkm1

data_inhab['Ausländer%'] = data_inhab['Ausländer zusammen'].astype(int) / data_inhab['Bevölkerung insgesamt'].astype(int)
data_inhab = data_inhab.drop(['Kürzel der Regionaleinheit', 'Regionalname', 'Stichtag', 'Bevölkerung insgesamt', 'Bevölkerung männlich',\
                            'Bevölkerung weiblich', 'Deutsche männlich', 'Deutsche weiblich',\
                            'Deutsche zusammen', 'Ausländer zusammen', 'Ausländer männlich', 'Ausländer weiblich'], axis = 1)

In [75]:
data_age

Unnamed: 0,Jahr,Unter20,Über65,20bis65,AGS,Bevölkerung gesamt,Ü65%
0,2020,109.680,113.716,421.919,8111,645.315,0.176218
1,2021,109.892,114.218,423.461,8111,647.571,0.176379
2,2020,78.538,80.109,236.644,8115,395.291,0.202658
3,2021,78.881,81.539,236.339,8115,396.759,0.205513
4,2020,101.883,111.712,325.572,8116,539.167,0.207194
...,...,...,...,...,...,...,...
83,2021,40.311,50.212,126.459,8435,216.982,0.231411
84,2020,56.464,58.085,172.557,8436,287.106,0.202312
85,2021,56.517,59.122,172.422,8436,288.061,0.205241
86,2020,25.482,27.600,78.546,8437,131.628,0.209682


In [76]:
# Attach the age structure and keep only the rows whose reporting year matches the year of the age data.
data_all = data_all.merge(right=data_age, how='inner', left_on='IdLandkreis', right_on='AGS')
data_all = data_all[data_all["Meldedatum"].dt.year == data_all["Jahr"]]
data_all = data_all.drop(['Jahr', 'Unter20', 'Über65', '20bis65', 'AGS'], axis = 1)

data_all = data_all.merge(right = data_inhab, how = 'inner', right_on = 'Amtlicher Gemeindeschlüssel (AGS)', left_on = 'IdLandkreis')
data_all = data_all.drop(['Amtlicher Gemeindeschlüssel (AGS)'], axis = 1)
data_all = data_all.merge(right = data_inhab_perkm, how = 'inner', right_on = 'Amtlicher Gemeindeschlüssel (AGS)', left_on = 'IdLandkreis')
data_all = data_all.drop(['Amtlicher Gemeindeschlüssel (AGS)', 'Kürzel der Regionaleinheit', 'Postleitzahl', 'Regionalname',\
                          'Stichtag', 'Bevölkerung insgesamt', 'Gemeindegebiet ha'], axis = 1)

# The age table stores population in thousands; round before casting so that
# floating-point error cannot truncate a value by one.
data_all["Bevölkerung gesamt"] = (data_all["Bevölkerung gesamt"] * 1000).round().astype(int)
data_all

Unnamed: 0,IdLandkreis,Meldedatum,AnzahlFall,AnzahlGenesen,AnzahlFall>59,Landkreis,Bevölkerung gesamt,Ü65%,Ausländer%,Bevölkerungsdichte EW/km²
0,8111,2020-03-07,5,5,0,SK Stuttgart,645315,0.176218,0.249539,3067
1,8111,2020-03-14,78,78,3,SK Stuttgart,645315,0.176218,0.249539,3067
2,8111,2020-03-21,310,303,41,SK Stuttgart,645315,0.176218,0.249539,3067
3,8111,2020-03-28,275,261,65,SK Stuttgart,645315,0.176218,0.249539,3067
4,8111,2020-04-04,237,225,65,SK Stuttgart,645315,0.176218,0.249539,3067
...,...,...,...,...,...,...,...,...,...,...
2345,8437,2021-02-20,40,40,11,LK Sigmaringen,131950,0.212952,0.113027,109
2346,8437,2021-02-27,80,78,14,LK Sigmaringen,131950,0.212952,0.113027,109
2347,8437,2021-03-06,108,96,19,LK Sigmaringen,131950,0.212952,0.113027,109
2348,8437,2021-03-13,161,37,22,LK Sigmaringen,131950,0.212952,0.113027,109


In [77]:
print("Climate data from the Ohlsbach station (1602) - centrally located in Baden-Württemberg")

request = DwdObservationRequest(
    parameter=[
        DwdObservationParameter.MONTHLY.TEMPERATURE_AIR_200
    ],
    resolution=DwdObservationResolution.MONTHLY,
    period=DwdObservationPeriod.RECENT,
).filter(station_id=(1602, ))

station_data = request.values.all().df

station_data.tail()

  0%|          | 0/1 [00:00<?, ?it/s]

Climate data from the Ohlsbach station (1602) - centrally located in Baden-Württemberg


100%|██████████| 1/1 [00:00<00:00,  4.02it/s]


Unnamed: 0,STATION_ID,FROM_DATE,DATASET,TO_DATE,PARAMETER,VALUE,QUALITY
14,1602,2020-10-01 00:00:00+00:00,CLIMATE_SUMMARY,2020-10-31 00:00:00+00:00,TEMPERATURE_AIR_200,11.53,3
15,1602,2020-11-01 00:00:00+00:00,CLIMATE_SUMMARY,2020-11-30 00:00:00+00:00,TEMPERATURE_AIR_200,6.95,3
16,1602,2020-12-01 00:00:00+00:00,CLIMATE_SUMMARY,2020-12-31 00:00:00+00:00,TEMPERATURE_AIR_200,5.07,3
17,1602,2021-01-01 00:00:00+00:00,CLIMATE_SUMMARY,2021-01-31 00:00:00+00:00,TEMPERATURE_AIR_200,2.42,1
18,1602,2021-02-01 00:00:00+00:00,CLIMATE_SUMMARY,2021-02-28 00:00:00+00:00,TEMPERATURE_AIR_200,5.21,1


In [78]:
station_data.FROM_DATE = station_data.FROM_DATE.astype(str).str.slice(stop=7)
data_all["FROM_DATE"] = data_all.Meldedatum.astype(str).str.slice(stop=7)
station_data = station_data.drop(columns=["STATION_ID", "DATASET", "TO_DATE", "PARAMETER", "QUALITY"])

In [79]:
data_all = pd.merge(data_all, station_data, on="FROM_DATE")\
            .sort_values(['IdLandkreis', 'Meldedatum'])\
            .drop(columns=["FROM_DATE"])\
            .rename(columns={"VALUE": "Temperatur"})
data_all

Unnamed: 0,IdLandkreis,Meldedatum,AnzahlFall,AnzahlGenesen,AnzahlFall>59,Landkreis,Bevölkerung gesamt,Ü65%,Ausländer%,Bevölkerungsdichte EW/km²,Temperatur
0,8111,2020-03-07,5,5,0,SK Stuttgart,645315,0.176218,0.249539,3067,7.49
1,8111,2020-03-14,78,78,3,SK Stuttgart,645315,0.176218,0.249539,3067,7.49
2,8111,2020-03-21,310,303,41,SK Stuttgart,645315,0.176218,0.249539,3067,7.49
3,8111,2020-03-28,275,261,65,SK Stuttgart,645315,0.176218,0.249539,3067,7.49
163,8111,2020-04-04,237,225,65,SK Stuttgart,645315,0.176218,0.249539,3067,13.72
...,...,...,...,...,...,...,...,...,...,...,...
2029,8437,2021-01-30,129,128,20,LK Sigmaringen,131950,0.212952,0.113027,109,2.42
2202,8437,2021-02-06,93,93,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21
2203,8437,2021-02-13,44,44,9,LK Sigmaringen,131950,0.212952,0.113027,109,5.21
2204,8437,2021-02-20,40,40,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21


<br>

GDP (BIP) data download

Latest available data: 2018

GDP per capita is computed with the 2020 population figures, which is sufficient for an estimate.

GDP is reported in millions, hence BIP * 1000000.
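
As a plausibility check, the per-capita figure can be reproduced by hand. The sketch below uses the SK Stuttgart values that appear in this notebook's tables (BIP 57369000 from `bip.csv`, population 645315); the function name `gdp_per_capita` is hypothetical.

```python
def gdp_per_capita(gdp: float, population: int) -> float:
    """Divide a district's GDP by its population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return gdp / population


# Values for SK Stuttgart as shown in the notebook's tables.
stuttgart = gdp_per_capita(57_369_000, 645_315)
print(round(stuttgart, 6))
```

The result matches the `BIPK` column. Judging only by its magnitude (about 88.9), the stored values are probably best read as thousand euros per person; this is an inference from the numbers, not something the data source states.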

In [80]:
# pbar = ProgressBar()

# AGS = ( pd.unique(data_inhab_perkm["Amtlicher Gemeindeschlüssel (AGS)"]) )%1000
# data_bip = pd.DataFrame(columns = ["BIP", "AGS"])

# for ags in pbar(AGS):
#     condition = 0
#     while (condition < 1):
#         temp_data2 = pd.read_csv("https://www.statistik-bw.de/GesamtwBranchen/VGR/20013001.tab?R=KR" + str(ags) + "&form=csv",
#                      encoding = "ISO-8859-1",
#                      sep=";",
#                      decimal=",",
#                      names = ["BIP"],
#                      usecols=[1],
#                      skiprows = 33,
#                      skipfooter = 3,
#                         engine='python')
#         condition = temp_data2["BIP"].fillna(0).values[0]
        
#     temp_data2["AGS"] = ags
#     data_bip = data_bip.append(temp_data2, ignore_index=True)

# data_bip["AGS"] = data_bip["AGS"] + 8000
# data_bip["BIP"] = data_bip["BIP"] * 1000000
# data_bip["BIP"] = data_bip["BIP"].astype(int)
# data_bip.to_csv("bip.csv", index = False)

In [81]:
data_bip = pd.read_csv("bip.csv")
data_bip

Unnamed: 0,BIP,AGS
0,57369000,8111
1,25988000,8115
2,22779000,8116
3,8894000,8117
4,25522000,8118
5,15081000,8119
6,6993000,8121
7,19019000,8125
8,5676000,8126
9,8231000,8127


In [82]:
data_all = pd.merge(data_all, data_bip, left_on="IdLandkreis", right_on="AGS")\
            .sort_values(['IdLandkreis', 'Meldedatum'])\
            .drop(columns=["AGS"])
data_all["BIPK"] = data_all["BIP"]/data_all["Bevölkerung gesamt"]
data_all = data_all.drop(columns=["BIP"])
data_all

Unnamed: 0,IdLandkreis,Meldedatum,AnzahlFall,AnzahlGenesen,AnzahlFall>59,Landkreis,Bevölkerung gesamt,Ü65%,Ausländer%,Bevölkerungsdichte EW/km²,Temperatur,BIPK
0,8111,2020-03-07,5,5,0,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769
1,8111,2020-03-14,78,78,3,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769
2,8111,2020-03-21,310,303,41,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769
3,8111,2020-03-28,275,261,65,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769
4,8111,2020-04-04,237,225,65,SK Stuttgart,645315,0.176218,0.249539,3067,13.72,88.900769
...,...,...,...,...,...,...,...,...,...,...,...,...
2213,8437,2021-01-30,129,128,20,LK Sigmaringen,131950,0.212952,0.113027,109,2.42,37.953770
2214,8437,2021-02-06,93,93,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770
2215,8437,2021-02-13,44,44,9,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770
2216,8437,2021-02-20,40,40,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770


<br>

Employees in hospitality and in health and social services (2020)

https://www.statistik-bw.de/Arbeit/Beschaeftigte/0302317x.tab?R=KR111&form=csv

<br>

In [83]:
data_all["BIPK"].astype(str)

0       88.90076939169244
1       88.90076939169244
2       88.90076939169244
3       88.90076939169244
4       88.90076939169244
              ...        
2213    37.95377036756347
2214    37.95377036756347
2215    37.95377036756347
2216    37.95377036756347
2217    37.95377036756347
Name: BIPK, Length: 2218, dtype: object

In [84]:
# pbar = ProgressBar()

# AGS = ( pd.unique(data_inhab_perkm["Amtlicher Gemeindeschlüssel (AGS)"]) )%1000
# data_besch = pd.DataFrame(columns = ["AGS", "Bgast", "Bgesund"])

# for ags in pbar(AGS):
#     condition = 0
#     while (condition < 1):
#         temp_data3 = pd.read_csv("https://www.statistik-bw.de/Arbeit/Beschaeftigte/0302317x.tab?R=KR" + str(ags) + "&form=csv",
#                      encoding = "ISO-8859-1",
#                      sep=";",
#                      decimal=",",
#                      names = ["TEMPCOL"],
#                      usecols=[2],
#                      skiprows = 12,
#                      skipfooter = 8,
#                      dtype = {
#                          "TEMPCOL": str
#                      },
#                      engine='python')
#         condition = temp_data3["TEMPCOL"].astype(float).fillna(0).values[4]
        
# data_besch["AGS"] = data_besch["AGS"] + 8000
# data_besch["Bgast"] = data_besch["Bgast"].str.replace(".", "")
# data_besch["Bgesund"] = data_besch["Bgesund"].str.replace(".", "")
# data_besch.to_csv("besch.csv", index = False)

# data_besch

In [85]:
data_besch = pd.read_csv("besch.csv")

In [86]:
data_all = pd.merge(data_all, data_besch, left_on="IdLandkreis", right_on="AGS")\
            .sort_values(['IdLandkreis', 'Meldedatum'])\
            .drop(columns=["AGS"])
data_all

Unnamed: 0,IdLandkreis,Meldedatum,AnzahlFall,AnzahlGenesen,AnzahlFall>59,Landkreis,Bevölkerung gesamt,Ü65%,Ausländer%,Bevölkerungsdichte EW/km²,Temperatur,BIPK,Bgast,Bgesund
0,8111,2020-03-07,5,5,0,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955
1,8111,2020-03-14,78,78,3,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955
2,8111,2020-03-21,310,303,41,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955
3,8111,2020-03-28,275,261,65,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955
4,8111,2020-04-04,237,225,65,SK Stuttgart,645315,0.176218,0.249539,3067,13.72,88.900769,12014,47955
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2213,8437,2021-01-30,129,128,20,LK Sigmaringen,131950,0.212952,0.113027,109,2.42,37.953770,1086,7365
2214,8437,2021-02-06,93,93,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365
2215,8437,2021-02-13,44,44,9,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365
2216,8437,2021-02-20,40,40,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365


<br>

Google Trends for the search term "Corona", normalized to a scale from 0 to 100

CSV download via: https://trends.google.de/trends/explore?date=2020-01-01%202021-03-18&geo=DE-BW&q=Corona
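
Because the weekly dates in the trends export need not coincide with the reporting weeks of the case data, the two tables are joined on the nearest date. A minimal sketch of that technique with `pd.merge_asof`: the case and trend values are taken from the SK Stuttgart rows of this notebook, while the `Woche` dates are made up for illustration.

```python
import pandas as pd

# Weekly case rows (values from the notebook) and trend rows with assumed week dates.
cases = pd.DataFrame({
    "Meldedatum": pd.to_datetime(["2020-03-07", "2020-03-14", "2020-03-21"]),
    "AnzahlFall": [5, 78, 310],
})
trends = pd.DataFrame({
    "Woche": pd.to_datetime(["2020-03-08", "2020-03-15", "2020-03-22"]),  # hypothetical dates
    "GoogleCorona": [73, 100, 74],
})

# Both frames must be sorted by their key column.
# direction="nearest" picks the closest week, whether earlier or later.
merged = pd.merge_asof(
    cases.sort_values("Meldedatum"),
    trends.sort_values("Woche"),
    left_on="Meldedatum",
    right_on="Woche",
    direction="nearest",
)
print(merged[["Meldedatum", "AnzahlFall", "GoogleCorona"]])
```

Passing `tolerance=pd.Timedelta(days=7)` as well would leave rows unmatched when no trend week lies within a week of the reporting date, which can help to spot gaps in the export.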

<br>

In [87]:
data_ggl = pd.read_csv("googleTrendsCorona.csv", skiprows = 2)\
            .rename(columns={"Corona: (Baden-Württemberg)": "GoogleCorona"})
data_ggl["Woche"] = pd.to_datetime(data_ggl["Woche"])
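# Google Trends reports very small values as "<1"; treat them as 0 and keep the column numeric.
data_ggl["GoogleCorona"] = data_ggl["GoogleCorona"].replace("<1", 0).astype(int)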

In [88]:
data_all = pd.merge_asof(data_all.sort_values(['Meldedatum']), data_ggl, left_on="Meldedatum", right_on="Woche", direction="nearest")\
            .sort_values(['IdLandkreis', 'Meldedatum'])\
            .drop(columns=["Woche"])
data_all

Unnamed: 0,IdLandkreis,Meldedatum,AnzahlFall,AnzahlGenesen,AnzahlFall>59,Landkreis,Bevölkerung gesamt,Ü65%,Ausländer%,Bevölkerungsdichte EW/km²,Temperatur,BIPK,Bgast,Bgesund,GoogleCorona
23,8111,2020-03-07,5,5,0,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955,73
44,8111,2020-03-14,78,78,3,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955,100
112,8111,2020-03-21,310,303,41,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955,74
171,8111,2020-03-28,275,261,65,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955,59
180,8111,2020-04-04,237,225,65,SK Stuttgart,645315,0.176218,0.249539,3067,13.72,88.900769,12014,47955,46
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024,8437,2021-01-30,129,128,20,LK Sigmaringen,131950,0.212952,0.113027,109,2.42,37.953770,1086,7365,27
2082,8437,2021-02-06,93,93,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365,33
2090,8437,2021-02-13,44,44,9,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365,27
2141,8437,2021-02-20,40,40,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365,29


Incidence calculation: AnzahlFall / Bevölkerung gesamt * 100,000

Share of recovered persons per district

Share of cases in the over-59 age groups
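
The two ratios can be checked on a few rows. The sketch below uses two weekly SK Stuttgart rows from this notebook plus one hypothetical zero-case row, added only to show how a week without cases behaves; replacing zeros with NaN is an assumption of this sketch and not what the notebook cell does.

```python
import numpy as np
import pandas as pd

# Two weekly rows for SK Stuttgart (from the notebook) and one hypothetical zero-case week.
df = pd.DataFrame({
    "AnzahlFall": [78, 310, 0],
    "AnzahlFall>59": [3, 41, 0],
    "Bevölkerung gesamt": [645315, 645315, 645315],
})

# Weekly cases per 100,000 inhabitants.
df["7dInzidenz"] = df["AnzahlFall"] / df["Bevölkerung gesamt"] * 100_000

# Share of cases in the 60+ groups. Zero-case weeks give NaN here,
# whereas plain division could yield inf or NaN depending on the numerator.
df["AnzahlFall>59%"] = df["AnzahlFall>59"] / df["AnzahlFall"].replace(0, np.nan)

print(df.round(6))
```

The first two rows reproduce the values in the notebook's result table (12.087120 and 48.038555 for the incidence, 0.038462 and 0.132258 for the share).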

In [89]:
data_all['7dInzidenz'] = data_all['AnzahlFall'] /  data_all['Bevölkerung gesamt'] * 100000
data_all['7dInzidenzFall>59'] = data_all['AnzahlFall>59'] / data_all['Bevölkerung gesamt'] * 100000
data_all['AnzahlFall>59%'] = data_all['AnzahlFall>59'] / data_all['AnzahlFall']

In [90]:
data_all['AnzahlGenesenLandkreis'] = data_all.groupby(['Landkreis'])['AnzahlGenesen'].cumsum()
data_all['AnzahlGenesenLandkreis%'] = data_all['AnzahlGenesenLandkreis'] / data_all['Bevölkerung gesamt']
data_all

Unnamed: 0,IdLandkreis,Meldedatum,AnzahlFall,AnzahlGenesen,AnzahlFall>59,Landkreis,Bevölkerung gesamt,Ü65%,Ausländer%,Bevölkerungsdichte EW/km²,Temperatur,BIPK,Bgast,Bgesund,GoogleCorona,7dInzidenz,7dInzidenzFall>59,AnzahlFall>59%,AnzahlGenesenLandkreis,AnzahlGenesenLandkreis%
23,8111,2020-03-07,5,5,0,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955,73,0.774815,0.000000,0.000000,5,0.000008
44,8111,2020-03-14,78,78,3,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955,100,12.087120,0.464889,0.038462,83,0.000129
112,8111,2020-03-21,310,303,41,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955,74,48.038555,6.353486,0.132258,386,0.000598
171,8111,2020-03-28,275,261,65,SK Stuttgart,645315,0.176218,0.249539,3067,7.49,88.900769,12014,47955,59,42.614847,10.072600,0.236364,647,0.001003
180,8111,2020-04-04,237,225,65,SK Stuttgart,645315,0.176218,0.249539,3067,13.72,88.900769,12014,47955,46,36.726250,10.072600,0.274262,872,0.001351
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024,8437,2021-01-30,129,128,20,LK Sigmaringen,131950,0.212952,0.113027,109,2.42,37.953770,1086,7365,27,97.764305,15.157257,0.155039,2699,0.020455
2082,8437,2021-02-06,93,93,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365,33,70.481243,8.336491,0.118280,2792,0.021160
2090,8437,2021-02-13,44,44,9,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365,27,33.345964,6.820765,0.204545,2836,0.021493
2141,8437,2021-02-20,40,40,11,LK Sigmaringen,131950,0.212952,0.113027,109,5.21,37.953770,1086,7365,29,30.314513,8.336491,0.275000,2876,0.021796


<br>
<span style="color:red">
    Task:
</span>

- Compute the incidence values using the population figures (a link to the method is given above)
- AnzahlFall>59 as a percentage of AnzahlFall = Fall>59
- GenesenBevölkerung
    - Cumulatively sum AnzahlGenesen up to each date
    - As a percentage of the population

<br>

<br>
<span style="color:red">
    Task:
</span>

- Average income: gross wages https://www.statistik-bw.de/GesamtwBranchen/VGR/20023030.tab?R=KR111
- Unemployment rates: https://www.statistik-bw.de/Arbeit/Arbeitslose/03033022.tab?R=KR111
- Persons in need of care: https://www.statistik-bw.de/SozSicherung/Pflege/15163020.tab?R=KR111

<br>

<br>
<span style="color:red">
    Task:
</span>

- Dummy: school closures
- Dummy: restaurant and bar closures
- Number of persons at private gatherings
    - Average household size + x
    - Document what was chosen and why
- Number of persons at public gatherings
- Dummy: events
    - Document what this means
- Dummy: mask mandate
- Dummy: retail closures
- Dummy: curfew
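
One possible way to derive such dummy variables is to mark each weekly row as 1 when its date falls within a period in which the measure was in force. The sketch below is only an assumption about how this could be done: the helper `add_measure_dummies`, the column name `SchoolClosure`, and above all the dates are hypothetical placeholders, since the real periods still have to be researched and documented.

```python
import pandas as pd

# PLACEHOLDER periods, not researched dates: replace with documented start and end dates.
MEASURES = {
    "SchoolClosure": [("2020-04-01", "2020-04-30")],
}


def add_measure_dummies(df, measures, date_col="Meldedatum"):
    """Add one 0/1 column per measure: 1 if the row's date lies in any listed period."""
    out = df.copy()
    for name, periods in measures.items():
        active = pd.Series(False, index=out.index)
        for start, end in periods:
            # between() is inclusive on both ends by default.
            active |= out[date_col].between(pd.Timestamp(start), pd.Timestamp(end))
        out[name] = active.astype(int)
    return out


weeks = pd.DataFrame({"Meldedatum": pd.to_datetime(["2020-03-28", "2020-04-04", "2020-05-09"])})
print(add_measure_dummies(weeks, MEASURES))
```

Keeping the periods in one dictionary also serves as the documentation that the task list asks for, because every dummy can be traced back to explicit dates.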


<br>

<br>
<span style="color:red">
    Task:
</span>

- Two tables
    - One table for visualizations, regression analyses, etc.
    - One standardized table for the neural network
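
A minimal sketch of how the standardized table could be derived from the plain one, assuming z-score standardization (mean 0, standard deviation 1) is what is meant here; the notebook does not specify the method, and the helper `standardize` is hypothetical. The sample values are taken from the SK Stuttgart rows shown earlier.

```python
import pandas as pd


def standardize(df, columns):
    """Return a copy with the given columns scaled to mean 0 and standard deviation 1."""
    out = df.copy()
    mean = out[columns].mean()
    std = out[columns].std(ddof=0)  # population standard deviation
    for col in columns:
        out[col] = (out[col] - mean[col]) / std[col]
    return out, mean, std


# Plain table: stays untouched and is used for visualizations and regressions.
table_plain = pd.DataFrame({
    "Landkreis": ["SK Stuttgart"] * 3,
    "7dInzidenz": [12.087120, 48.038555, 42.614847],
    "GoogleCorona": [100, 74, 59],
})

# Standardized copy intended as input for the neural network.
table_scaled, mean, std = standardize(table_plain, ["7dInzidenz", "GoogleCorona"])
print(table_scaled)
```

Returning `mean` and `std` makes it possible to apply the same scaling to new data later and to convert predictions back to the original scale.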


<br>

<br>
<span style="color:green">
    Examples:
</span>

- https://www.kaggle.com/docxian/covid-19-tracking-germany
- https://www.kaggle.com/mreverybody/covid19-germany-deutschland-rki-data
- Examples of table manipulation: https://datascience-enthusiast.com/R/pandas_datatable.html