Truth data
Seven-day hospitalization incidences have replaced case incidence rates as the central indicator for steering the pandemic response in Germany (see here for a media article from September 2021). Descriptions of this indicator can be found:
- in this summary on the website of the German Federal Ministry of Health (in German)
- in this FAQ on the website of RKI (question "Wie wird die 7-Tage-Hospitalisierungsinzidenz berechnet und was ist bei der Bewertung zu berücksichtigen?")
As the actual date of hospitalization is often not available, the official seven-day hospitalization incidences are aggregated by the so-called Meldedatum, which is typically the date on which a person first tested positive for COVID-19. Because hospitalization can occur, and be reported, well after this date, the total number of hospitalizations associated with a given date in this data set only becomes known several weeks later (see here for a media article on these delays).
Example: Assume a person gets her first positive test for COVID-19 on 1 October 2021, is hospitalized on 15 October 2021 and the hospitalization gets reported to RKI on 16 October 2021. Then this person/hospitalization will be assigned to the Meldedatum 1 October 2021, and the number of hospitalizations for this date will retrospectively be increased by one.
The Robert Koch Institute publishes seven-day hospitalization incidences on a daily basis via its reports, dashboards and a GitHub repository. The latter contains snapshots of the time series of seven-day hospitalization incidences since April 2021 and serves as our data source. Note that we use absolute numbers (seven-day sums; column 7T_Hospitalisierung_Faelle) rather than numbers per 100,000 population. All data are available stratified by location and age group.
The definition of the nowcast targets can be found in the entry Targets.
The RKI GitHub repository contains 7-day sums of hospitalizations by Meldedatum (i.e., for a Monday, the sum over the period from the preceding Tuesday through that Monday is reported; for a Tuesday, the sum from the preceding Wednesday through that Tuesday; and so on). After reformatting, these rolling sums are stored in the folder rolling-sum.
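To make the seven-day window concrete, here is a minimal sketch of a trailing seven-day sum. The function name and the input numbers are hypothetical and only serve as an illustration; they are not taken from the repository.

```python
def rolling_seven_day_sum(daily):
    """Trailing 7-day sums: entry i covers days i-6 through i.

    The first six entries are None because a full window is not yet available.
    """
    sums = []
    for i in range(len(daily)):
        if i < 6:
            sums.append(None)
        else:
            sums.append(sum(daily[i - 6:i + 1]))
    return sums


# Hypothetical daily counts for nine consecutive Meldedatum values.
daily_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9]
seven_day_sums = rolling_seven_day_sum(daily_counts)
print(seven_day_sums)
```

The value reported for a given day therefore always includes that day and the six days before it.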
To obtain data at daily resolution (i.e., actual daily values rather than seven-day sums), we deconvolved the data and stored the results in the folder deconvoluted.
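The exact deconvolution procedure is not described here. One possible approach, sketched below under the assumption that the daily values of the first seven days are known, uses the fact that consecutive trailing sums differ by the day entering the window minus the day leaving it. The function name and all numbers are hypothetical.

```python
def deconvolve_rolling_sums(rolling, first_week):
    """Recover daily values from trailing 7-day sums.

    rolling[t] is the sum over days t-6..t (entries before index 6 are ignored).
    first_week holds the daily values for days 0..6 (an assumption of this sketch).
    Uses: daily[t] = rolling[t] - rolling[t-1] + daily[t-7].
    """
    if len(first_week) != 7:
        raise ValueError("first_week must contain exactly seven values")
    daily = list(first_week)
    for t in range(7, len(rolling)):
        daily.append(rolling[t] - rolling[t - 1] + daily[t - 7])
    return daily


# Hypothetical rolling sums; the first six windows are incomplete.
rolling_sums = [None] * 6 + [28, 35, 42]
recovered = deconvolve_rolling_sums(rolling_sums, first_week=[1, 2, 3, 4, 5, 6, 7])
print(recovered)
```

Any error in the assumed starting week propagates with a period of seven days, so the choice of initial values matters in practice.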
From these files we then created a "reporting triangle" (COVID-19_hospitalizations.csv). Each row shows the initially reported count of hospitalizations for a given Meldedatum ("value_0d") and the increments of that count on subsequent days (e.g., "value_2d" represents by how much the number of hospitalizations for a given Meldedatum increased between the first and second day after the Meldedatum). The data set contains these values for the following 80 days; hospitalizations added to the record with a delay of more than 80 days are reported in the column "value_>80d". Note that the row sum over "value_0d", "value_1d", ..., "value_80d", "value_>80d" is the current number of all reported hospitalizations for a given Meldedatum. Note: These increments are only a proxy for hospitalizations reported with a given delay. Retrospective downward corrections of the RKI data can even lead to occasional negative values.
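As an illustration of how such increments relate to successive data snapshots, the following sketch computes one row of a reporting triangle from the totals reported for a single Meldedatum on consecutive days. The function name and the numbers are hypothetical; this is not the script used to produce the file.

```python
def increments_from_snapshots(snapshots):
    """Turn successive totals for one Meldedatum into delay increments.

    snapshots[d] is the total reported d days after the Meldedatum.
    The result starts with the initial count, followed by day-to-day changes.
    """
    if not snapshots:
        return []
    increments = [snapshots[0]]
    for previous, current in zip(snapshots, snapshots[1:]):
        increments.append(current - previous)
    return increments


# Hypothetical totals as of 0, 1, 2 and 3 days after the Meldedatum.
# The drop from 14 to 13 mimics a retrospective downward correction.
snapshot_totals = [10, 14, 13, 17]
row = increments_from_snapshots(snapshot_totals)
print(row)
```

The row sum equals the most recent total, and the downward correction shows up as a negative increment.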
We provide a second, pre-processed version of the data in COVID-19_hospitalizations_preprocessed.csv. Here, we redistributed negative values to the preceding days with positive counts, so that the row sum is preserved (e.g., a sequence of values (3, 2, -1) would become (3, 1, 0) and (3, 1, -2) would become (2, 0, 0)).
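A minimal sketch of one way to carry out such a redistribution is shown below. It walks backwards from each negative value and subtracts from the nearest preceding positive counts until the deficit is absorbed. The function name is hypothetical, and leaving an unabsorbable remainder in place is an assumption of this sketch rather than documented behaviour.

```python
def redistribute_negatives(values):
    """Push each negative entry back onto preceding positive entries.

    The sum of the sequence is preserved. If earlier entries cannot absorb
    the whole deficit, the remainder stays negative at its original position.
    """
    out = list(values)
    for i, value in enumerate(out):
        if value >= 0:
            continue
        deficit = -value
        j = i - 1
        while deficit > 0 and j >= 0:
            take = min(out[j], deficit)
            if take > 0:
                out[j] -= take
                deficit -= take
            j -= 1
        out[i] = -deficit  # zero when the deficit was fully absorbed
    return out


print(redistribute_negatives([3, 1, -2]))
print(redistribute_negatives([5, 0, -2]))
```

Working from the nearest preceding day outwards keeps the correction as close as possible to the delay at which it was recorded.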