## Data extraction
This script extracts the number of hospitalised and ICU cases from the Sciensano database: https://epistat.wiv-isp.be/covid/

An Excel workbook is updated daily with, for each province and region, the number of patients in hospital, new admissions, and discharges.

The data format is described in the codebook: https://epistat.sciensano.be/COVID19BE_codebook.pdf

In [1]:
import pandas as pd 
import numpy as np

### Define data source

In [2]:
url = 'https://epistat.sciensano.be/Data/COVID19BE.xlsx'

### Extract data from Excel sheet
Hospitalisation data is found in the sheet named *HOSP*.

In [3]:
df = pd.read_excel(url, sheet_name="HOSP")
print(df)

          DATE        PROVINCE    REGION  NR_REPORTING  TOTAL_IN  \
0   2020-03-15       Antwerpen  Flanders            14        50   
1   2020-03-15        Brussels  Brussels            14        58   
2   2020-03-15         Hainaut  Wallonia            15        56   
3   2020-03-15         Limburg  Flanders             7        20   
4   2020-03-15           Liège  Wallonia            12        22   
..         ...             ...       ...           ...       ...   
413 2020-04-21           Namur  Wallonia             6       184   
414 2020-04-21  OostVlaanderen  Flanders            14       597   
415 2020-04-21   VlaamsBrabant  Flanders             6       242   
416 2020-04-21   BrabantWallon  Wallonia             2        47   
417 2020-04-21  WestVlaanderen  Flanders            11       574   

     TOTAL_IN_ICU  TOTAL_IN_RESP  TOTAL_IN_ECMO  NEW_IN  NEW_OUT  
0               9              4              0       8        8  
1              11              8              0  
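
Since the workbook is updated daily, a quick sanity check after loading can catch a changed layout early. The following is a minimal sketch, not part of the original notebook: the helper name `check_hosp_columns` and the one-row stand-in frame are assumptions for illustration, and the column names are the ones visible in the printed output above.

```python
import pandas as pd

# Columns this notebook relies on, taken from the printed output above.
REQUIRED = ['DATE', 'PROVINCE', 'REGION', 'TOTAL_IN', 'TOTAL_IN_ICU']

def check_hosp_columns(frame, required=REQUIRED):
    """Raise a ValueError if a required column is missing (hypothetical helper)."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise ValueError(f"HOSP sheet is missing columns: {missing}")
    return frame

# Stand-in for the real sheet so the check can be tried offline.
sample = pd.DataFrame({
    'DATE': pd.to_datetime(['2020-03-15']),
    'PROVINCE': ['Antwerpen'],
    'REGION': ['Flanders'],
    'TOTAL_IN': [50],
    'TOTAL_IN_ICU': [9],
})
checked = check_hosp_columns(sample)
```

With the real data, the same call would be `check_hosp_columns(df)` right after `pd.read_excel`.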

### Extract and lump cases in Flanders

In [4]:
df.loc[df['REGION'] == 'Flanders']

| | DATE | PROVINCE | REGION | NR_REPORTING | TOTAL_IN | TOTAL_IN_ICU | TOTAL_IN_RESP | TOTAL_IN_ECMO | NEW_IN | NEW_OUT |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-15 | Antwerpen | Flanders | 14 | 50 | 9 | 4 | 0 | 8 | 8 |
| 3 | 2020-03-15 | Limburg | Flanders | 7 | 20 | 6 | 3 | 0 | 9 | 3 |
| 7 | 2020-03-15 | OostVlaanderen | Flanders | 14 | 16 | 5 | 1 | 0 | 5 | 1 |
| 8 | 2020-03-15 | VlaamsBrabant | Flanders | 6 | 14 | 2 | 0 | 0 | 2 | 0 |
| 10 | 2020-03-15 | WestVlaanderen | Flanders | 11 | 19 | 3 | 1 | 0 | 6 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 407 | 2020-04-21 | Antwerpen | Flanders | 14 | 747 | 170 | 133 | 4 | 39 | 76 |
| 410 | 2020-04-21 | Limburg | Flanders | 7 | 360 | 77 | 60 | 1 | 22 | 30 |
| 414 | 2020-04-21 | OostVlaanderen | Flanders | 14 | 597 | 136 | 91 | 2 | 28 | 55 |
| 415 | 2020-04-21 | VlaamsBrabant | Flanders | 6 | 242 | 56 | 29 | 4 | 17 | 17 |
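
The filter in cell 4 keeps one row per Flemish province per day. To lump these into a single regional total per day, one option is to group the filtered rows by `DATE` and sum. A minimal sketch on a small stand-in frame follows; the names `toy`, `flanders`, and `flanders_daily` are illustrative, and the values are copied from a few rows shown in this notebook.

```python
import pandas as pd

# Two Flemish provinces on two days, plus one Brussels row that must be excluded.
toy = pd.DataFrame({
    'DATE': pd.to_datetime(['2020-03-15', '2020-03-15', '2020-03-15',
                            '2020-04-21', '2020-04-21']),
    'PROVINCE': ['Antwerpen', 'Limburg', 'Brussels', 'Antwerpen', 'Limburg'],
    'REGION': ['Flanders', 'Flanders', 'Brussels', 'Flanders', 'Flanders'],
    'TOTAL_IN': [50, 20, 58, 747, 360],
    'TOTAL_IN_ICU': [9, 6, 11, 170, 77],
})

# Keep the Flemish rows, then add up the provinces for each date.
flanders = toy.loc[toy['REGION'] == 'Flanders']
flanders_daily = flanders.groupby('DATE')[['TOTAL_IN', 'TOTAL_IN_ICU']].sum()
print(flanders_daily)
```

Applied to the full `df`, the same two lines would give one row per day for Flanders.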


### Extract initial number of cases

In [5]:
# Select all rows for the earliest reporting date
df.loc[df['DATE'] == df['DATE'].min()]

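| | DATE | PROVINCE | REGION | NR_REPORTING | TOTAL_IN | TOTAL_IN_ICU | TOTAL_IN_RESP | TOTAL_IN_ECMO | NEW_IN | NEW_OUT |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-15 | Antwerpen | Flanders | 14 | 50 | 9 | 4 | 0 | 8 | 8 |
| 1 | 2020-03-15 | Brussels | Brussels | 14 | 58 | 11 | 8 | 0 | 7 | 2 |
| 2 | 2020-03-15 | Hainaut | Wallonia | 15 | 56 | 13 | 11 | 1 | 26 | 1 |
| 3 | 2020-03-15 | Limburg | Flanders | 7 | 20 | 6 | 3 | 0 | 9 | 3 |
| 4 | 2020-03-15 | Liège | Wallonia | 12 | 22 | 2 | 1 | 0 | 4 | 1 |
| 5 | 2020-03-15 | Luxembourg | Wallonia | 3 | 4 | 0 | 0 | 0 | 3 | 0 |
| 6 | 2020-03-15 | Namur | Wallonia | 6 | 2 | 1 | 1 | 0 | 0 | 0 |
| 7 | 2020-03-15 | OostVlaanderen | Flanders | 14 | 16 | 5 | 1 | 0 | 5 | 1 |
| 8 | 2020-03-15 | VlaamsBrabant | Flanders | 6 | 14 | 2 | 0 | 0 | 2 | 0 |
| 9 | 2020-03-15 | BrabantWallon | Wallonia | 2 | 5 | 2 | 2 | 0 | 1 | 0 |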


### Initial cases of hospitalisation and ICU for all regions

In [6]:
# Sum hospital and ICU occupancy over all provinces on the earliest date
df.loc[df['DATE'] == df['DATE'].min(), ['TOTAL_IN', 'TOTAL_IN_ICU']].sum()

TOTAL_IN        266
TOTAL_IN_ICU     54
dtype: int64

### Sum cases for all regions and resample

In [7]:
df.loc[:,['DATE','TOTAL_IN','TOTAL_IN_ICU']].resample('D', on='DATE').sum()

| DATE | TOTAL_IN | TOTAL_IN_ICU |
|---|---|---|
| 2020-03-15 | 266 | 54 |
| 2020-03-16 | 370 | 79 |
| 2020-03-17 | 497 | 100 |
| 2020-03-18 | 650 | 131 |
| 2020-03-19 | 844 | 165 |
| 2020-03-20 | 1099 | 228 |
| 2020-03-21 | 1384 | 290 |
| 2020-03-22 | 1646 | 322 |
| 2020-03-23 | 1883 | 385 |
| 2020-03-24 | 2140 | 474 |
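
One property of `resample('D', on='DATE').sum()` is worth keeping in mind: it builds a continuous daily index, so a day with no rows in the sheet appears with a total of 0 instead of being skipped. The sketch below illustrates this on a toy frame; the values and the `toy`, `daily`, and `gaps` names are made up for illustration.

```python
import pandas as pd

# Toy data with no report on 2020-03-16 (illustrative values only).
toy = pd.DataFrame({
    'DATE': pd.to_datetime(['2020-03-15', '2020-03-15', '2020-03-17']),
    'TOTAL_IN': [50, 58, 120],
})

# Daily sums; the missing day is inserted with a sum of 0.
daily = toy.resample('D', on='DATE').sum()

# Days whose total is 0 are candidates for reporting gaps rather than true zeros.
gaps = daily.index[daily['TOTAL_IN'] == 0]
print(daily)
print(gaps)
```

If such gaps ever show up in the real data, they would need handling (for example interpolation) before the series is used for fitting.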


### Save initial date and convert to arrays

In [8]:
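# Earliest reporting date as a 'YYYY-MM-DD' string (avoids casting the whole frame to str)
initial = df['DATE'].min().strftime('%Y-%m-%d')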
print(initial)

2020-03-15


In [9]:
hospital = df.loc[:,['DATE','TOTAL_IN']]
hospital = hospital.resample('D', on='DATE').sum()
hospital = np.array([hospital.loc[:,'TOTAL_IN'].tolist()])
print(hospital)

[[ 266  370  497  650  844 1099 1384 1646 1883 2140 2721 3077 3650 4089
  4480 4897 4989 5220 5378 5513 5531 5620 5759 5715 5616 5636 5663 5437
  5441 5554 5532 5331 5181 5088 4892 4940 4996 4765]]


In [10]:
# Daily national ICU totals as a 1 x n_days array
ICU = df.loc[:, ['DATE', 'TOTAL_IN_ICU']].resample('D', on='DATE').sum()
ICUvect = np.array([ICU.loc[:, 'TOTAL_IN_ICU'].tolist()])
print(ICUvect)

[[  54   79  100  131  165  228  290  322  385  474  612  690  789  867
   942 1021 1088 1144 1205 1245 1261 1267 1260 1276 1285 1278 1262 1232
  1234 1226 1204 1182 1140 1119 1081 1071 1079 1020]]
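
Cells 9 and 10 repeat the same steps for two columns. A small helper could wrap them; the sketch below is one possible way to do that, not the notebook's own code. The function name `daily_totals` and the toy frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def daily_totals(frame, column):
    """Return national daily totals of `column` as a 1 x n_days array (hypothetical helper)."""
    daily = frame.loc[:, ['DATE', column]].resample('D', on='DATE').sum()
    return np.array([daily[column].tolist()])

# Toy frame: two provinces on the first day, one row on the second.
toy = pd.DataFrame({
    'DATE': pd.to_datetime(['2020-03-15', '2020-03-15', '2020-03-16']),
    'TOTAL_IN': [50, 58, 370],
    'TOTAL_IN_ICU': [9, 11, 79],
})
hospital_toy = daily_totals(toy, 'TOTAL_IN')
icu_toy = daily_totals(toy, 'TOTAL_IN_ICU')
print(hospital_toy.shape)
```

With the real data, `daily_totals(df, 'TOTAL_IN')` and `daily_totals(df, 'TOTAL_IN_ICU')` should reproduce `hospital` and `ICUvect`.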
