In this notebook, we process the Proximus data into a usable mobility matrix at the level of postal codes, municipalities, arrondissements and provinces.

- Question: how should we interpret foreigners who go abroad? The current matrix for December 21st reports roughly 200k people who come from abroad and go abroad. What does that mean?
- Question: how should we handle the -1 values? Is there a clever way to go about it?
    - Replace with 0
    - Replace with a fixed value between 0 and 30 (such as 10)
    - Replace with a random value between 0 and 30 every time
    - Extrapolate the actual value from the values on all previous days: if the value exceeds 30 on some days, the censored values are expected to be close to 30 as well, whereas a value that is _always_ -1 is expected to be much lower.
- Question: how should we handle multiple visits per day (i.e. people coming from place A, visiting places B and C, and returning to place A)?
- Note: the raw data is not a square matrix, because some postal codes are missing. This is no longer an issue after aggregation into municipalities.
- Problem: there is no data for a number of small PC regions, such as Herstappe (PC 3717, NIS 73028; only 88 inhabitants). How should we handle this?
- Question: _our_ list of postal codes is 1147 PCs long. The Internet claims there are 1169 PCs in Belgium (including special PCs for e.g. military bases). Do we stick with the 1147?
- Question: what is considered a *visit*? During the pandemic, people stay connected but perhaps do not leave their house. How is this counted? An effect is definitely visible, because the overall number of visits clearly drops, especially during the first wave.
- Problem: there is no data on who goes abroad in the first few weeks (so no baseline mobility can be calculated there either).
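
The first three replacement options for the -1 values can be compared on a toy matrix. The sketch below is not the `GDPR_replace` implementation from `covid19model`. The helper name `replace_censored`, its parameters, and the choice to draw random integers from 1 to 29 (assuming a censored cell is non-zero but below 30) are hypothetical.

```python
import numpy as np
import pandas as pd

GDPR_THRESHOLD = 30  # counts below this value are reported as -1 in the raw data


def replace_censored(matrix, strategy="random", constant=10, seed=None):
    """Return a copy of `matrix` in which every -1 is replaced according to `strategy`."""
    values = matrix.to_numpy(dtype=float, copy=True)
    censored = values == -1
    n_censored = int(censored.sum())
    if strategy == "zero":
        fill = np.zeros(n_censored)
    elif strategy == "constant":
        fill = np.full(n_censored, constant)
    elif strategy == "random":
        rng = np.random.default_rng(seed)
        # Assumption: a censored cell holds at least 1 and fewer than 30 visits
        fill = rng.integers(1, GDPR_THRESHOLD, size=n_censored)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    values[censored] = fill
    return pd.DataFrame(values.astype(int), index=matrix.index, columns=matrix.columns)


# Toy matrix with made-up numbers
toy = pd.DataFrame(
    [[120, -1, 0], [-1, 95, -1]],
    index=["1000", "1020"],
    columns=["1000", "1020", "ABROAD"],
)
filled = replace_censored(toy, strategy="random", seed=0)
```

The fourth option (extrapolating from other days) needs the full time series per origin-destination pair and is not covered by this sketch.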

# Load packages

In [52]:
import pandas as pd  # needed below for pd.DataFrame and pd.read_csv
import numpy as np
import glob
%matplotlib notebook
import matplotlib.pyplot as plt
import sys
import datetime
# sys.path.insert(0, "../tools")
from covid19model.data.mobility import * # contains all necessary functions
from covid19model.visualization.utils import moving_avg

# OPTIONAL: Load the "autoreload" extension so that package code can change
%load_ext autoreload
# OPTIONAL: always reload modules so that as you change code in src, it gets loaded
%autoreload 2

**Legend**

- _mllp_postalcode_ : Postalcode (PC) of the most likely living place (MLLP) of the Proximus client
- _postalcode_ : Visited postalcode (when device had consecutive transactions on the same cell in the PC for at least 15 min)
- _imsisinpostalcode_ : Number of users having that PC as MLLP 
- _habitants_ : Reference data for that PC 
- _nrofimsi_ : Number of users (proximus only) that were in the PC, -1 if < 30 (we can only report about groups >= 30, for GDPR reasons)
- _visitors_ : extrapolated value of the users, -1 if < 30 (we can only report about groups >= 30, for GDPR reasons)
- _est_staytime_ : total time spent (in seconds) by all users from *mllp_postalcode* in *postalcode*
- _total_est_staytime_ : total registered time spent (in seconds) by all users from *mllp_postalcode* on the network
- _est_staytime_perc_ : _est_staytime_ / _total_est_staytime_ * 100%: This almost provides the mobility matrix that we need, but not quite.

We are mainly interested in _nrofimsi_, because we want to know how many people from PC _x_ travel to PC _y_. In a next stage, it may be interesting to weight these counts by the length of stay.
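
To make the definition of _est_staytime_perc_ concrete, the sketch below recomputes it from _est_staytime_ and _total_est_staytime_ and pivots the result into an origin-destination matrix. The numbers are made up, and the long-format layout (one row per origin-destination pair) is an assumption about the raw files.

```python
import pandas as pd

# Made-up records: one row per (home postal code, visited postal code) pair
records = pd.DataFrame({
    "mllp_postalcode": ["1000", "1000", "9000", "9000"],
    "postalcode": ["1000", "9000", "1000", "9000"],
    "est_staytime": [7200, 1800, 900, 8100],           # seconds
    "total_est_staytime": [9000, 9000, 9000, 9000],    # seconds
})

# est_staytime_perc = est_staytime / total_est_staytime * 100
records["est_staytime_perc"] = 100 * records["est_staytime"] / records["total_est_staytime"]

# Rows: home postal code, columns: visited postal code
staytime_matrix = records.pivot(
    index="mllp_postalcode", columns="postalcode", values="est_staytime_perc"
)
```

In this toy example every row sums to 100, because the total stay time equals the sum over the visited postal codes. That need not hold in the real data.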

# Load and clean data

NOTE: these private data are _not_ shared on GitHub (.gitignore). Make sure you downloaded the latest version from the S-drive and manually updated the relevant directory.

In [2]:
# Example: data for a single date. Note that the function is made to load several dates at once in a dict.

data_location = "../../data/raw/mobility/proximus/"
date='20201221'
mmprox = load_mobility_proximus(date, data_location)

# Nonessential help functions
def print_date(today):
    print('==========')
    print(today[:4], today[4:6], today[6:])
    print('==========')
    
def visualise_matrix(mmprox, cmap='Wistia', interpolation=None):
    # Note the log scale
    offset=1.01
    raw_matrix=np.log(np.array(mmprox.values, dtype=float)+offset)
    plt.imshow(raw_matrix, cmap=cmap, interpolation=interpolation)
    plt.show()
    
print_date(date)
mmprox[date]

Loaded dataframe for date 20201221.
2020 12 21


Unnamed: 0_level_0,1000,1020,1030,1040,1050,1060,1070,1080,1081,1082,...,9970,9971,9980,9981,9982,9988,9990,9991,9992,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1000,12934,278,485,695,1069,1458,1264,1493,128,107,...,0,0,0,0,0,0,0,0,0,1080
1020,1063,12727,753,362,496,259,714,728,155,284,...,-1,-1,0,0,0,0,0,-1,0,406
1030,2398,959,25789,2417,1414,536,699,519,115,146,...,0,0,0,0,0,0,-1,-1,0,1432
1040,1597,201,1883,18738,3565,427,395,255,68,93,...,0,0,0,0,0,0,-1,-1,0,2527
1050,2654,238,719,2998,29072,1909,745,326,86,100,...,0,-1,0,0,0,0,-1,0,0,4000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9988,0,0,0,0,0,0,-1,0,0,0,...,86,-1,-1,-1,106,760,45,74,-1,71
9990,-1,-1,-1,-1,-1,-1,-1,0,0,-1,...,-1,-1,34,-1,-1,-1,4764,817,56,145
9991,-1,0,0,0,-1,-1,0,0,0,0,...,30,40,47,-1,-1,-1,812,2623,-1,63
9992,0,0,0,0,-1,0,0,0,0,0,...,-1,-1,-1,-1,-1,-1,87,-1,254,45


In [3]:
# Change the -1 values for visits to values between 1 and 30

mmprox_GDPR = GDPR_replace(mmprox[date])
print_date(date)
mmprox_GDPR

2020 12 21


Unnamed: 0_level_0,1000,1020,1030,1040,1050,1060,1070,1080,1081,1082,...,9970,9971,9980,9981,9982,9988,9990,9991,9992,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1000,12934,278,485,695,1069,1458,1264,1493,128,107,...,0,0,0,0,0,0,0,0,0,1080
1020,1063,12727,753,362,496,259,714,728,155,284,...,7,13,0,0,0,0,0,18,0,406
1030,2398,959,25789,2417,1414,536,699,519,115,146,...,0,0,0,0,0,0,18,2,0,1432
1040,1597,201,1883,18738,3565,427,395,255,68,93,...,0,0,0,0,0,0,2,4,0,2527
1050,2654,238,719,2998,29072,1909,745,326,86,100,...,0,14,0,0,0,0,4,0,0,4000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9988,0,0,0,0,0,0,7,0,0,0,...,86,4,11,2,106,760,45,74,1,71
9990,4,4,5,3,6,1,1,0,0,4,...,9,3,34,6,9,7,4764,817,56,145
9991,3,0,0,0,8,4,0,0,0,0,...,30,40,47,6,14,3,812,2623,7,63
9992,0,0,0,0,2,0,0,0,0,0,...,2,12,4,3,5,5,87,3,254,45


In [4]:
# Add missing postal codes in rows and columns

mmprox_complete = fill_missing_pc(mmprox_GDPR)
print_date(date)
mmprox_complete

2020 12 21


Unnamed: 0_level_0,1000,1020,1030,1040,1050,1060,1070,1080,1081,1082,...,3831,5572,5589,6986,7504,7533,7543,7783,8952,9403
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1000,12934,278,485,695,1069,1458,1264,1493,128,107,...,0,0,0,0,0,0,0,0,0,0
1020,1063,12727,753,362,496,259,714,728,155,284,...,0,0,0,0,0,0,0,0,0,0
1030,2398,959,25789,2417,1414,536,699,519,115,146,...,0,0,0,0,0,0,0,0,0,0
1040,1597,201,1883,18738,3565,427,395,255,68,93,...,0,0,0,0,0,0,0,0,0,0
1050,2654,238,719,2998,29072,1909,745,326,86,100,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7543,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7783,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8572,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8952,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
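
One plausible way to make the matrix square is to reindex it against the full list of postal codes and fill the new rows and columns with zeros. The sketch below uses a hypothetical helper `make_square`. Whether `fill_missing_pc` works this way internally is an assumption.

```python
import pandas as pd


def make_square(matrix, all_codes):
    """Append all-zero rows and columns for codes in `all_codes` that `matrix` lacks."""
    rows = list(matrix.index) + [c for c in all_codes if c not in matrix.index]
    cols = list(matrix.columns) + [c for c in all_codes if c not in matrix.columns]
    return matrix.reindex(index=rows, columns=cols, fill_value=0)


# Toy matrix with made-up numbers; postal code 3717 is missing entirely
toy = pd.DataFrame(
    [[50, 3, 7], [4, 60, 2]],
    index=["1000", "1020"],
    columns=["1000", "1020", "ABROAD"],
)
completed = make_square(toy, all_codes=["1000", "1020", "3717"])
```

Missing codes are appended at the end, which matches the column order in the output above.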


# Spatial aggregation

In [5]:
# Municipality level

agg='mun'
mmprox_mun = mm_aggregate(mmprox_complete, agg=agg)
print_date(date)
mmprox_mun

2020 12 21


Unnamed: 0_level_0,11001,11002,11004,11005,11007,11008,11009,11013,11016,11018,...,92141,92142,93010,93014,93018,93022,93056,93088,93090,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11001,4604,1692,3,441,7,3,3,168,21,101,...,0,5,1,0,0,0,0,0,0,66
11002,1085,184517,429,360,1539,2388,464,2584,76,615,...,13,49,12,22,0,31,5,45,20,3148
11004,36,831,4549,7,123,12,20,151,5,11,...,4,8,0,0,0,0,0,0,0,43
11005,465,1040,14,5271,3,5,3,79,3,67,...,0,16,4,0,0,0,0,0,0,74
11007,6,1533,99,9,2525,1,2,47,3,1,...,0,0,0,2,0,0,2,0,0,55
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93022,0,17,0,0,0,0,0,0,0,0,...,42,39,11,49,35,4462,364,532,14,63
93056,0,16,0,0,0,2,8,0,0,0,...,9,15,100,636,362,358,3795,307,177,190
93088,0,19,0,0,0,0,0,0,0,0,...,17,28,223,120,4,637,562,8882,4,176
93090,0,9,0,0,0,0,0,0,0,7,...,3,18,3,845,125,61,322,49,2890,233
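
The aggregation step can be thought of as relabelling every postal code with the NIS code of its municipality and then summing the rows and columns that share a label. The sketch below illustrates this on a toy matrix. The lookup `pc_to_nis` and the helper `aggregate` are illustrative only and are not taken from `mm_aggregate`.

```python
import pandas as pd

# Illustrative postal code -> municipality NIS lookup (tiny subset)
pc_to_nis = {"9000": "44021", "9030": "44021", "2000": "11002"}


def aggregate(matrix, mapping):
    """Relabel rows and columns with `mapping`, then sum entries that share a label."""
    renamed = matrix.rename(index=mapping, columns=mapping)
    summed_rows = renamed.groupby(level=0).sum()
    # Transpose so that columns can be grouped the same way as rows
    return summed_rows.T.groupby(level=0).sum().T


# Toy matrix with made-up numbers
toy = pd.DataFrame(
    [[500, 40, 10], [30, 200, 5], [8, 2, 900]],
    index=["9000", "9030", "2000"],
    columns=["9000", "9030", "2000"],
)
toy_mun = aggregate(toy, pc_to_nis)
```

Summing never creates or loses visits, so the grand total of the matrix is the same before and after aggregation. This makes for a cheap sanity check.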


In [6]:
# logarithmic visualisation of mobility matrix
visualise_matrix(mmprox_mun)


In [7]:
# Arrondissement level

agg='arr'
mmprox_arr = mm_aggregate(mmprox_complete, agg=agg)
mmprox_arr

Unnamed: 0_level_0,11000,12000,13000,21000,23000,24000,25000,31000,32000,33000,...,73000,81000,82000,83000,84000,85000,91000,92000,93000,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11000,484919,15117,11719,4135,7297,5579,1810,3092,315,756,...,1677,169,934,1274,1075,253,1281,975,332,6932
12000,19430,156002,6866,2631,9054,8801,998,905,99,356,...,709,161,366,608,400,189,709,552,117,1155
13000,13750,6403,208094,1602,2860,6601,678,1292,144,301,...,1323,115,461,667,431,119,382,327,89,3144
21000,4148,2116,1428,578204,56528,6081,14347,2122,186,386,...,919,633,956,1397,1510,901,2283,3274,773,20656
23000,8256,9167,2543,76497,377460,14763,16572,3064,268,448,...,1115,361,922,1514,1468,395,1917,3041,595,4616
24000,7187,9420,6843,9016,17441,275774,6599,1937,236,396,...,2515,226,677,890,725,247,1125,1771,272,1852
25000,1906,765,629,23702,18128,7183,307113,1417,54,217,...,392,710,852,1619,1858,942,2629,10611,892,4288
31000,1947,546,768,1180,1870,1079,566,134880,943,695,...,233,61,226,490,321,142,546,357,84,1342
32000,337,98,137,149,290,91,40,1914,17391,1473,...,24,1,18,85,49,18,69,70,18,156
33000,574,153,213,371,713,363,139,758,1147,52350,...,59,11,31,190,48,3,113,61,43,915


In [8]:
# logarithmic visualisation of mobility matrix
visualise_matrix(mmprox_arr)


In [9]:
# Province level

agg='prov'
mmprox_prov = mm_aggregate(mmprox_complete, agg=agg)
print_date(date)
mmprox_prov

2020 12 21


Unnamed: 0_level_0,10000,20001,20002,21000,30000,40000,50000,60000,70000,80000,90000,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
10000,922300,40192,3486,8368,18318,29704,6112,9872,20258,7222,4764,11231
20001,43416,685438,23171,85513,18268,36317,19307,13771,21721,7425,8721,6468
20002,3300,25311,307113,23702,4244,2901,22752,10619,1638,5981,14132,4288
21000,7692,62609,14347,578204,7469,7408,14006,8664,3185,5397,6330,20656
30000,13476,11949,2427,4649,669867,40706,20580,4620,4672,4944,4045,6166
40000,36045,39260,4158,11981,45573,837253,16597,8100,6841,7401,5947,7558
50000,5277,23197,28571,19350,25639,15972,999828,13092,1936,8093,30129,15692
60000,4528,10387,9988,8920,6309,3732,13444,871012,12754,17360,19850,12163
70000,21216,23037,1866,3456,6281,6514,2520,15749,438519,2791,2137,7412
80000,1689,2620,2837,2967,2120,1138,5581,15407,855,192296,14824,11972


In [10]:
# logarithmic visualisation of mobility matrix
visualise_matrix(mmprox_prov)


# All-in-one solution

In [11]:
agg = 'arr'
mmprox_clean = complete_data_clean(mmprox[date], agg=agg)
mmprox_clean

Unnamed: 0_level_0,11000,12000,13000,21000,23000,24000,25000,31000,32000,33000,...,73000,81000,82000,83000,84000,85000,91000,92000,93000,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11000,485351,15129,11502,4191,7398,5699,1916,2972,258,703,...,1534,228,1041,1354,1132,169,1312,1137,315,6926
12000,19377,155813,6811,2749,9189,8814,1114,1006,87,296,...,655,119,392,631,425,144,719,532,87,1162
13000,13722,6296,208229,1372,2823,6686,619,1302,165,224,...,1134,142,470,618,466,119,450,395,104,3135
21000,3966,2114,1559,578192,57085,6125,14193,2085,169,436,...,807,623,1023,1432,1441,989,2381,3560,826,20656
23000,8096,8995,2561,76514,376926,14465,16843,3051,232,525,...,1006,394,815,1344,1384,394,1819,3220,638,4595
24000,6863,9361,7170,8769,17334,275905,6619,2086,231,395,...,2327,242,717,1053,784,262,1017,1884,223,1880
25000,1695,849,618,23913,17939,7227,307099,1322,65,157,...,384,666,898,1522,1782,857,2569,10221,934,4284
31000,1951,596,847,1200,1921,1144,566,134922,1016,725,...,229,70,264,487,304,114,554,374,87,1337
32000,265,75,113,163,389,83,57,1937,17409,1495,...,33,1,49,90,79,9,54,69,36,144
33000,452,151,215,355,734,386,169,696,1166,52438,...,107,20,47,231,51,1,86,79,60,903


# Visualisation

In [12]:
# Visualisation in geopandas
import geopandas as gp

shp_dir = "../../data/raw/GIS/shapefiles/BE/"

# Load different geographical aggregations
country = gp.read_file(shp_dir + "AD_6_Country.shp")
regions = gp.read_file(shp_dir + "AD_5_Region.shp")
provinces = gp.read_file(shp_dir + "AD_4_Province.shp")
arrondissements = gp.read_file(shp_dir + "AD_3_District.shp")
municipalities = gp.read_file(shp_dir + "AD_2_Municipality.shp")

# Brussels has no NIS code in the province shapefile ('NA'): assign the code of Arrondissement Brussel-Hoofdstad (21000)
provinces.loc[provinces['NISCode']=='NA', 'NISCode'] = '21000'

# Create circle denoting foreigners
import shapely.affinity
from shapely.geometry import Point
circle = Point(570000, 600000).buffer(1)  # type(circle)=polygon
radius = 1.3e4
ellipse = shapely.affinity.scale(circle, radius, radius)  # type(ellipse)=polygon

# Add additional column to geopandas dataframes for foreigners, depicted by circle
foreign_mun = pd.DataFrame([[np.nan, np.nan, np.nan, 'Foreigner', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, ellipse]], columns=municipalities.columns)
municipalities = municipalities.append(foreign_mun, ignore_index=True)

foreign_arr = pd.DataFrame([[np.nan, 'Foreigner', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, ellipse]], columns=arrondissements.columns)
arrondissements = arrondissements.append(foreign_arr, ignore_index=True)

foreign_prov = pd.DataFrame([[np.nan, 'Foreigner', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, ellipse]], columns=provinces.columns)
provinces = provinces.append(foreign_prov, ignore_index=True)

## Municipality level

In [17]:
# Add columns with 'traffic to' values
to_NIS = ['44021', '21004', '11002'] # Ghent, Brussels, Antwerp
# to_NIS = ['93014']

# Prepare empty columns
for nis in to_NIS:
    municipalities['Traffic to ' + nis] = 0
    
# Add value to proper column
for nis_from in municipalities['NISCode'].values:
    for nis_to in to_NIS:
        municipalities.loc[municipalities['NISCode']==nis_from,'Traffic to ' + nis_to] = mmprox_mun.loc[nis_from, nis_to]
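
The nested loop above performs one lookup per origin and destination. A possible vectorised alternative is `Series.map`, which looks up every origin NIS code in one column of the matrix at once. The sketch below uses small stand-in frames with made-up numbers (`toy_regions` and `toy_matrix` are hypothetical names), not the real GeoDataFrame.

```python
import pandas as pd

# Stand-ins for the GeoDataFrame and the municipality-level mobility matrix
toy_regions = pd.DataFrame({"NISCode": ["11002", "21004", "44021"]})
toy_matrix = pd.DataFrame(
    [[900, 15, 10], [20, 800, 30], [12, 25, 700]],
    index=["11002", "21004", "44021"],
    columns=["11002", "21004", "44021"],
)

for nis_to in ["44021", "21004"]:
    # toy_matrix[nis_to] is indexed by origin NIS code, so map() finds each row's origin
    toy_regions["Traffic to " + nis_to] = toy_regions["NISCode"].map(toy_matrix[nis_to])
```

One difference in behaviour: an origin that is absent from the matrix yields NaN with `map`, whereas `.loc` raises a `KeyError`.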

In [18]:
# Plot connections

from mpl_toolkits.axes_grid1 import make_axes_locatable # for plot aesthetics
from matplotlib import colors

for nis in to_NIS:
    # Make figure
    fig, ax = plt.subplots(figsize = (8,7)) # 1200 pixels x 1200 pixels
    cax = make_axes_locatable(ax).append_axes("right", size="5%", pad=0.1) # Legend properties
    ax.set_axis_off()
    
    vmin=0
    vmax=max(municipalities['Traffic to ' + nis])
    fig = municipalities.plot(column='Traffic to ' + nis, ax=ax, cmap='hot',
                                legend=True, edgecolor = 'gray', norm=colors.SymLogNorm(linthresh=1, linscale=1, vmin=vmin, vmax=vmax), cax=cax)

    textbox = 'Traffic to NIS code ' + nis
    plt.figtext(.15, .25, textbox, backgroundcolor='whitesmoke', fontfamily='monospace', fontsize=14)
#     plt.close()

    # (Create directory and) save figure
    path = '../figures/maps/municipalities/time_series_tests/'
    chart = fig.get_figure()
    # chart.savefig('time-delays_arr_to_' + arr + '.jpg' ,dpi=50, bbox_inches='tight')


## Arrondissement level

In [19]:
# visualise in geopandas

# Add columns with 'traffic to' values
to_NIS = ['44000', '21000', '11000'] # Ghent, Brussels, Antwerp

# Prepare empty columns
for nis in to_NIS:
    arrondissements['Traffic to ' + nis] = 0
    
# Add value to proper column
for nis_from in arrondissements['NISCode'].values:
    for nis_to in to_NIS:
        arrondissements.loc[arrondissements['NISCode']==nis_from,'Traffic to ' + nis_to] = mmprox_arr.loc[nis_from, nis_to]

In [20]:
# Plot connections

for nis in to_NIS:
    # Make figure
    fig, ax = plt.subplots(figsize = (8,7)) # 1200 pixels x 1200 pixels
    cax = make_axes_locatable(ax).append_axes("right", size="5%", pad=0.1) # Legend properties
    ax.set_axis_off()
    
    vmin=0
    vmax=max(arrondissements['Traffic to ' + nis])
    fig = arrondissements.plot(column='Traffic to ' + nis, ax=ax, cmap='hot',
                                legend=True, edgecolor = 'gray', norm=colors.SymLogNorm(linthresh=10, linscale=1, vmin=vmin, vmax=vmax), cax=cax)

    textbox = 'Traffic to NIS code ' + nis
    plt.figtext(.15, .25, textbox, backgroundcolor='whitesmoke', fontfamily='monospace', fontsize=14)
    # plt.close()

    # (Create directory and) save figure
    path = '../figures/maps/arrondissements/time_series_tests/'
    chart = fig.get_figure()
    # chart.savefig('time-delays_arr_to_' + arr + '.jpg' ,dpi=50, bbox_inches='tight')


## Province level

In [21]:
# visualise in geopandas

# Add columns with 'traffic to' values
to_NIS = ['40000', '21000', '10000'] # East Flanders, Brussels, Antwerp (province)

# Prepare empty columns
for nis in to_NIS:
    provinces['Traffic to ' + nis] = 0
    
# Add value to proper column
for nis_from in provinces['NISCode'].values:
    for nis_to in to_NIS:
        provinces.loc[provinces['NISCode']==nis_from,'Traffic to ' + nis_to] = mmprox_prov.loc[nis_from, nis_to]

In [22]:
# Plot connections

for nis in to_NIS:
    # Make figure
    fig, ax = plt.subplots(figsize = (8,7)) # 1200 pixels x 1200 pixels
    cax = make_axes_locatable(ax).append_axes("right", size="5%", pad=0.1) # Legend properties
    ax.set_axis_off()
    
    vmin=0
    vmax=max(provinces['Traffic to ' + nis])
    fig = provinces.plot(column='Traffic to ' + nis, ax=ax, cmap='hot',
                                legend=True, edgecolor = 'gray', norm=colors.SymLogNorm(linthresh=100, linscale=1, vmin=vmin, vmax=vmax), cax=cax)

    textbox = 'Traffic to NIS code ' + nis
    plt.figtext(.15, .25, textbox, backgroundcolor='whitesmoke', fontfamily='monospace', fontsize=14)
    # plt.close()

    # (Create directory and) save figure
    path = '../figures/maps/provinces/time_series_tests/'
    chart = fig.get_figure()
    # chart.savefig('time-delays_arr_to_' + arr + '.jpg' ,dpi=50, bbox_inches='tight')


# Temporal aggregation

In [26]:
# TAKES A WHILE TO LOAD

# Select dates of the week the mobility will be averaged over
week_nr = 51
dates=make_date_list(week_nr) # [YYYYMMDD, ...]

# Load data for all these dates
data_location = '../../data/raw/mobility/proximus/'
mmprox = load_mobility_proximus(dates, data_location)

# Clean data for all dates
agg='mun'
for date in mmprox:
    mmprox[date] = complete_data_clean(mmprox[date], agg=agg)

... proceeding with 6 dates.
Loaded dataframe for date 20201220.    
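
For reference, a week number can be turned into a list of `YYYYMMDD` strings with the standard library alone. The sketch below assumes ISO week numbering (weeks start on Monday) and uses a hypothetical helper `iso_week_dates`. Whether `make_date_list` follows the same convention is an assumption.

```python
import datetime


def iso_week_dates(week, year=2020):
    """Return the seven dates of ISO week `week` as 'YYYYMMDD' strings, Monday first."""
    monday = datetime.date.fromisocalendar(year, week, 1)
    return [(monday + datetime.timedelta(days=d)).strftime("%Y%m%d") for d in range(7)]


week_dates = iso_week_dates(51)
```

Under ISO numbering, week 51 of 2020 runs from 20201214 to 20201220, which is consistent with the last date loaded above.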


In [27]:
# Take average of all mobility matrices in the mmprox dictionary
mmprox_avg = average_mobility(mmprox)
mmprox_avg

Unnamed: 0_level_0,11001,11002,11004,11005,11007,11008,11009,11013,11016,11018,...,92141,92142,93010,93014,93018,93022,93056,93088,93090,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11001,4640.500000,1705.833333,11.833333,436.833333,5.500000,7.500000,2.666667,176.666667,6.500000,97.500000,...,0.000000,3.000000,2.833333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,60.833333
11002,1067.500000,191359.333333,540.833333,401.500000,1528.666667,2677.666667,562.500000,2722.000000,94.333333,654.500000,...,23.833333,61.166667,10.333333,35.666667,7.166667,13.166667,24.833333,28.333333,23.333333,2726.000000
11004,35.000000,954.500000,4600.833333,9.666667,155.000000,7.833333,7.666667,176.166667,7.666667,12.500000,...,0.000000,3.666667,0.000000,0.333333,0.666667,0.000000,0.000000,0.166667,0.000000,45.500000
11005,492.666667,1052.333333,7.500000,5269.000000,5.666667,3.000000,3.166667,78.500000,1.833333,79.000000,...,0.000000,3.500000,1.833333,0.000000,0.000000,0.000000,0.000000,0.333333,4.000000,61.166667
11007,7.333333,1609.666667,116.333333,8.833333,2550.666667,12.666667,5.000000,35.833333,4.166667,2.833333,...,3.000000,1.000000,0.000000,0.333333,0.000000,0.000000,1.000000,0.000000,0.000000,31.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93022,0.000000,10.000000,0.000000,0.000000,0.000000,0.166667,0.166667,0.000000,0.000000,0.166667,...,14.666667,22.666667,15.000000,48.166667,44.333333,4523.666667,375.333333,535.000000,11.500000,59.166667
93056,0.000000,7.333333,0.000000,0.333333,0.000000,0.166667,0.000000,1.666667,0.000000,0.000000,...,8.166667,20.500000,106.666667,633.166667,355.833333,367.000000,3823.666667,324.000000,175.666667,176.666667
93088,2.833333,35.333333,0.166667,0.833333,0.000000,0.166667,0.333333,0.000000,1.500000,0.000000,...,16.500000,29.333333,224.666667,106.166667,10.000000,666.166667,670.500000,8920.166667,15.666667,111.166667
93090,0.000000,12.166667,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.166667,3.333333,...,4.666667,13.000000,4.000000,870.833333,130.166667,63.333333,361.000000,41.500000,2888.666667,228.333333
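
The weekly average is an element-wise mean over the daily matrices. A minimal sketch, assuming all matrices share the same index and columns, is given below. The helper `average_matrices` is hypothetical and is not the `average_mobility` function itself.

```python
import pandas as pd


def average_matrices(matrices):
    """Element-wise mean of a dict {date: DataFrame} with identical labels."""
    frames = list(matrices.values())
    total = frames[0].copy()
    for frame in frames[1:]:
        total = total + frame  # pandas aligns on row and column labels
    return total / len(frames)


# Two toy daily matrices with made-up numbers
labels = ["11001", "11002"]
daily = {
    "20201214": pd.DataFrame([[10, 2], [4, 20]], index=labels, columns=labels),
    "20201215": pd.DataFrame([[14, 4], [6, 22]], index=labels, columns=labels),
}
toy_avg = average_matrices(daily)
```

Dividing by the number of available dates explains the fractional values in the output above, such as multiples of 1/6 for a week with six dates.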


In the cell below, we save the processed mobility data to `data/interim/mobility/`. This directory is listed in .gitignore, so it is not part of the repository on GitHub and you may have to create it locally first.
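
The output directory can also be created from within the notebook. The sketch below uses a hypothetical helper `ensure_dir` and demonstrates it in a temporary directory so that running it leaves no files behind.

```python
import tempfile
from pathlib import Path


def ensure_dir(base, agg, mob_type):
    """Create base/agg/mob_type (including parents) if it does not exist yet."""
    target = Path(base) / agg / mob_type
    target.mkdir(parents=True, exist_ok=True)
    return target


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_dir(tmp, "mun", "visits")
    created_ok = created.is_dir()
```

In this notebook the call would be `ensure_dir("../../data/interim/mobility", agg, mob_type)`, placed before the loop that writes the CSV files.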

In [29]:
# Save the weekly average as a CSV.
# NOTE: TAKES LONG TIME

year = 2020
agg = 'mun'
mob_type = 'visits'
save_location = "../../data/interim/mobility/" + agg + '/' + mob_type + '/'
save_prefix = "average-mobility_" + agg + "_" + mob_type + '_' + str(year) + "-week"
weeks = range(7,54) # week 7 is the first week with data. Week 53 is the last week of 2020

data_location = "../../data/raw/mobility/proximus/"

for week in weeks:
    # Load all data
    dates = make_date_list(week, year)
    mmprox_dict_tmp = load_mobility_proximus(dates, data_location)
    for date in mmprox_dict_tmp:
        mmprox_dict_tmp[date] = complete_data_clean(mmprox_dict_tmp[date], agg=agg)
        print(f"Cleaned dataframe for date {date}")
    mmprox_avg = average_mobility(mmprox_dict_tmp)
    # Save all data
    save_name = save_location + save_prefix + str(week)
    mmprox_avg.to_csv(save_name)
    print(f"Saved average of week {week}.")

In [34]:
# Load the weekly average in the proper way (test)

mob_type = 'visits'
week = 7
agg = 'mun'

def load_avg_mobility(week, year=2020, agg='mun', mob_type='visits'):
    load_location = "../../data/interim/mobility/" + agg + '/' + mob_type + '/'
    load_prefix = "average-mobility_" + agg + "_" + mob_type + '_' + str(year) + "-week"
    load_name = load_location + load_prefix + str(week)
    index_col = 'mllp_postalcode'
    mmprox_avg = pd.read_csv(load_name, index_col=index_col)
    return mmprox_avg

mmprox_loaded = load_avg_mobility(week, agg=agg)

mmprox_loaded

Unnamed: 0_level_0,11001,11002,11004,11005,11007,11008,11009,11013,11016,11018,...,92141,92142,93010,93014,93018,93022,93056,93088,93090,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11001,4323.857143,2319.142857,25.857143,481.142857,2.571429,14.571429,3.142857,237.142857,6.571429,110.142857,...,4.857143,1.142857,0.000000,0.857143,1.142857,0.714286,1.428571,0.285714,0.000000,23.285714
11002,1276.714286,211832.857143,714.000000,488.857143,1814.285714,3144.142857,649.428571,3259.142857,97.428571,762.714286,...,13.285714,83.714286,6.714286,25.857143,9.285714,16.571429,31.285714,13.428571,29.714286,700.857143
11004,37.571429,1651.714286,4515.571429,9.000000,187.285714,12.714286,15.714286,239.142857,8.142857,9.285714,...,1.571429,2.428571,0.000000,2.285714,0.000000,0.000000,3.285714,0.000000,0.571429,18.714286
11005,538.571429,1608.857143,9.285714,5154.285714,5.857143,5.285714,3.142857,129.428571,4.571429,96.428571,...,0.000000,2.857143,0.000000,0.000000,0.000000,5.142857,0.428571,0.000000,0.571429,19.714286
11007,8.857143,2326.428571,169.714286,5.857143,2589.142857,8.285714,3.714286,60.142857,1.571429,7.428571,...,0.000000,0.428571,3.857143,0.142857,0.000000,0.000000,3.714286,0.000000,0.000000,10.571429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93022,0.000000,11.428571,0.000000,0.428571,0.000000,0.857143,0.857143,0.000000,0.000000,0.000000,...,24.571429,26.142857,23.000000,61.285714,34.857143,4473.000000,451.000000,549.714286,13.285714,9.142857
93056,0.000000,21.428571,0.000000,0.571429,0.000000,0.000000,0.571429,0.000000,0.000000,1.000000,...,14.000000,16.142857,128.142857,662.857143,341.571429,414.428571,3714.142857,345.571429,205.714286,23.142857
93088,0.571429,46.000000,0.000000,1.000000,0.000000,1.142857,0.857143,4.428571,3.142857,0.000000,...,22.142857,28.714286,276.142857,138.857143,9.142857,729.857143,788.714286,8749.857143,19.428571,21.571429
93090,0.428571,17.142857,0.000000,0.142857,0.000000,3.857143,0.000000,0.000000,0.000000,0.000000,...,9.857143,8.142857,12.142857,959.857143,155.428571,69.571429,433.428571,47.857143,2761.857143,32.571429


In [35]:
# Aggregate on arr and prov level
# Definitions

def agg_mun_to_arr(mmprox_mun): # copied from mm_aggregate function
    # Rename columns
    mmprox_arr = mmprox_mun.copy()
    for nis in mmprox_arr.columns:
        if nis != 'ABROAD':
            new_nis = nis[:-3] + '000'
            mmprox_arr = mmprox_arr.rename(columns={nis : new_nis})

    # Rename rows
    for nis in mmprox_arr.index:
        if nis != 'Foreigner':
            new_nis = nis[:-3] + '000'
            mmprox_arr = mmprox_arr.rename(index={nis : new_nis})

    # Collect rows and columns with the same NIS code, and automatically order column/row names
    mmprox_arr = mmprox_arr.groupby(level=0, axis=1).sum()
    mmprox_arr = mmprox_arr.groupby(level=0, axis=0).sum().astype(int)
    
    return mmprox_arr

def agg_arr_to_prov(mmprox_arr):
    # Rename columns
    mmprox_prov = mmprox_arr.copy()
    for nis in mmprox_arr.columns:
        if nis not in ['ABROAD', '21000', '23000', '24000', '25000']: # Brussels is '11th province'
            new_nis = nis[:-4] + '0000'
            mmprox_prov = mmprox_prov.rename(columns={nis : new_nis})
        if nis in ['23000', '24000']:
            new_nis = '20001'
            mmprox_prov = mmprox_prov.rename(columns={nis : new_nis})
        if nis == '25000':
            new_nis = '20002'
            mmprox_prov = mmprox_prov.rename(columns={nis : new_nis})

    # Rename rows
    for nis in mmprox_prov.index:
        if nis not in ['Foreigner', '21000', '23000', '24000', '25000']:
            new_nis = nis[:-4] + '0000'
            mmprox_prov = mmprox_prov.rename(index={nis : new_nis})
        if nis in ['23000', '24000']:
            new_nis = '20001'
            mmprox_prov = mmprox_prov.rename(index={nis : new_nis})
        if nis == '25000':
            new_nis = '20002'
            mmprox_prov = mmprox_prov.rename(index={nis : new_nis})

    # Collect rows and columns with the same NIS code, and automatically order column/row names
    mmprox_prov = mmprox_prov.groupby(level=0, axis=1).sum()
    mmprox_prov = mmprox_prov.groupby(level=0, axis=0).sum().astype(int)
    
    return mmprox_prov
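
The relabelling rules in `agg_arr_to_prov` can be isolated in a small pure function, which is easier to test than the rename loops. The sketch below mirrors the rules used above. The function name `arr_to_prov_code` is hypothetical.

```python
def arr_to_prov_code(nis):
    """Map an arrondissement NIS code to the province-level code used in this notebook."""
    if nis in ("ABROAD", "Foreigner", "21000"):
        return nis                # non-NIS labels and Brussels stay unchanged
    if nis in ("23000", "24000"):
        return "20001"            # Flemish Brabant
    if nis == "25000":
        return "20002"            # Walloon Brabant
    return nis[:-4] + "0000"      # e.g. 44000 -> 40000


prov_codes = [arr_to_prov_code(n) for n in ["11000", "23000", "25000", "21000", "ABROAD"]]
```

Because `DataFrame.rename` accepts a function, the same mapping could be applied to rows and columns in a single call: `mmprox_arr.rename(index=arr_to_prov_code, columns=arr_to_prov_code)`.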

In [36]:
# Save the weekly average of arrondissements as a CSV.

year = 2020
save_agg = 'arr'
load_agg = 'mun'
mob_type = 'visits'
save_location = "../../data/interim/mobility/" + save_agg + '/' + mob_type + '/'
save_prefix = "average-mobility_" + save_agg + "_" + mob_type + '_' + str(year) + "-week"

weeks = range(7,54) # week 7 is the first week with data. Week 53 is the last week of 2020

for week in weeks:
    # Load CSV
    mmprox_mun = load_avg_mobility(week, year=year, agg=load_agg)
    # Aggregate to higher level
    mmprox_arr = agg_mun_to_arr(mmprox_mun)
    # Save in proper location
    save_name = save_location + save_prefix + str(week)
    mmprox_arr.to_csv(save_name)
    print(f"Saved arrondissement average of week {week}.  ", end='\r')

Saved arrondissement average of week 53.  

In [38]:
# Save the weekly average of provinces as a CSV.

year = 2020
save_agg = 'prov'
load_agg = 'arr'
mob_type = 'visits'
save_location = "../../data/interim/mobility/" + save_agg + '/' + mob_type + '/'
save_prefix = "average-mobility_" + save_agg + "_" + mob_type + '_' + str(year) + "-week"

weeks = range(7,54) # week 7 is the first week with data. Week 53 is the last week of 2020

for week in weeks:
    # Load CSV
    mmprox_arr = load_avg_mobility(week, year=year, agg=load_agg)
    # Aggregate to higher level
    mmprox_prov = agg_arr_to_prov(mmprox_arr)
    # Save in proper location
    save_name = save_location + save_prefix + str(week)
    mmprox_prov.to_csv(save_name)
    print(f"Saved provincial average of week {week}.  ", end='\r')

Saved provincial average of week 53.  

# Temporal visualisation

In [41]:
# All aggregations are identical: good sanity check.

year=2020
aggs=['mun', 'arr', 'prov']
weeks=np.arange(7,54)
offset=1e5 # offset for visualisation
fontsize=12


figure=plt.figure()
for idx, agg in enumerate(aggs):
    dates=[]
    total_mob=[]
    for week in weeks:
        date = week_to_date(week, day=4) # day=4 for middle of week
        dates.append(date)
        total_mob_weekly = load_avg_mobility(week, year=year, agg=agg).sum().sum()
        total_mob.append(total_mob_weekly + idx*offset)
    plt.plot(dates, total_mob)
    
plt.title('Proximus data aggregated on 3 geographical levels',fontsize=fontsize)
plt.ylabel('Absolute number of visits (>15min)', fontsize=fontsize)
plt.xticks(fontsize=fontsize)
figure.autofmt_xdate(bottom=.2, rotation=25, ha='center', which=None) # Automatic x-tick fix!
figure.tight_layout()


In [63]:
# Overlay total mobility over hospitalisation data

# Note: takes input from private data

import pandas as pd # the pandas import in the first cell is commented out

fontsize=12

agg_type = 'prov'
data_file = '../../../COVID19_spatial_private/interim/all_nonpublic_timeseries_' + agg_type + '.csv'

# Load and copy the data file
raw_data = pd.read_csv(data_file, sep=',', header=0,  parse_dates = ['DATE'])
raw_data.fillna(0, inplace=True)

# select and average data
hosp_data = raw_data[['DATE','hospitalised_IN']].groupby(['DATE']).sum()
hosp_data = moving_avg(hosp_data)
hosp_indices = hosp_data.index.values
hosp_values = hosp_data['hospitalised_IN'].values


fig, ax1 = plt.subplots()
ax1.grid(False)

color = 'maroon'
# ax1.set_xlabel('date')
ax1.set_ylabel('absolute number of visits (>15 min)', color=color, fontsize=fontsize)
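# dates and total_mob come from the previous cell (last level in aggs); remove the visual offset added there
ax1.plot(dates, np.array(total_mob) - (len(aggs)-1)*offset, color=color)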
ax1.tick_params(axis='y', labelcolor=color, labelsize=fontsize)
ax1.tick_params(axis='x', labelsize=fontsize)

ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

color = 'olivedrab'
ax2.set_ylabel('New hospitalisations', color=color, fontsize=fontsize)  # we already handled the x-label with ax1
ax2.plot(hosp_indices, hosp_values, color=color)
ax2.tick_params(axis='y', labelcolor=color, labelsize=fontsize)

fig.autofmt_xdate(bottom=.2, rotation=25, ha='center', which=None) # Automatic x-tick fix!
fig.tight_layout()  # otherwise the right y-label is slightly clipped
plt.title('Mobility vs. Hospitalisations', fontsize=fontsize)
plt.show()



# figure=plt.figure()
# plt.plot(data.groupby(['DATE']).sum())
# figure.autofmt_xdate(bottom=.2, rotation=25, ha='center', which=None) # Automatic x-tick fix!
# # plt.xticks(rotation=30)
# figure.tight_layout()

# Animations

Note: this is additional and interesting for visualisation, not core science.

## Varying over destination NIS for fixed date

In [None]:
########################
# Uncomment to execute #
########################

# # Add columns with 'traffic to' values
# to_NIS = mmprox_mun.columns

# municipalities_anim = municipalities.copy()

# # Prepare empty columns
# for nis in to_NIS:
#     municipalities_anim['Traffic to ' + nis] = 0
    
# # Add value to proper column
# for nis_from in municipalities_anim['NISCode'].values:
#     for nis_to in to_NIS:
#         municipalities_anim.loc[municipalities_anim['NISCode']==nis_from,'Traffic to ' + nis_to] = mmprox_mun.loc[nis_from, nis_to]

In [None]:
########################
# Uncomment to execute #
########################

# Plot connections

# from mpl_toolkits.axes_grid1 import make_axes_locatable # for plot aesthetics
# from matplotlib import colors

# vmin=0
# vmax=np.max(np.max(mmprox_mun))
# dpi=200

# for nis in to_NIS:
#     # Make figure
#     fig, ax = plt.subplots(figsize = (8,7)) # 1200 pixels x 1200 pixels
#     cax = make_axes_locatable(ax).append_axes("right", size="5%", pad=0.1) # Legend properties
#     ax.set_axis_off()
    
#     fig = municipalities_anim.plot(column='Traffic to ' + nis, ax=ax, cmap='hot',
#                                 legend=True, edgecolor = 'gray', norm=colors.SymLogNorm(linthresh=1, linscale=1, vmin=vmin, vmax=vmax), cax=cax)

#     textbox = 'Traffic to NIS code ' + nis
#     plt.figtext(.15, .25, textbox, backgroundcolor='whitesmoke', fontfamily='monospace', fontsize=14)
#     plt.close()

#     # (Create directory and) save figure
#     path = '../figures/maps/mun/mobility/'
#     chart = fig.get_figure()
#     chart.savefig(path+'mobility-to-NIS-' + nis + '.jpg' ,dpi=dpi, bbox_inches='tight')
#     print("saving NIS", nis)

## Varying over date for fixed destination NIS

### Straight from the data

In [64]:
# Load data for all available dates 

def get_date_from_file(filename):
    suffix_len = len(proximus_mobility_suffix())
    date_len = 8
    date = filename[-suffix_len-date_len:-suffix_len]
    return date
    
import os

data_location = "../../data/raw/mobility/proximus/"

# ordered list of all available dates
dates_YYYYMMDD=[]
for file in os.listdir(data_location):
    if file.endswith(proximus_mobility_suffix()):
        date = get_date_from_file(file)
        dates_YYYYMMDD.append(date)
dates_YYYYMMDD.sort() # os.listdir returns files in arbitrary order

In [49]:
# Load all data in a big dictionary
# ## COMMENTED OUT FOR SAFETY ###

# mmprox = load_mobility_proximus(dates_YYYYMMDD, data_location)

Loaded dataframe for date 20210107.    


In [61]:
# Clean all data
# TAKES 4 seconds per date
### COMMENTED OUT FOR SAFETY ###

# agg='mun'
# mmprox_clean = dict({})
# for date in dates_YYYYMMDD:
#     mmprox_clean[date] = complete_data_clean(mmprox[date], agg=agg)
#     print(f"Total clean for date {date}.   ", end= '\r')
# print(f"Total clean for date {date}.")

Total clean for date 20210107.   


In [63]:
# Save this (locally) for future reference, because it takes a while to load/clean everything ...
# Also takes a while to save ...
### COMMENTED OUT FOR SAFETY ###

# save_location = #pick location
# save_prefix = 'cleaned-mobility-data_' + agg + '_'
# for date in dates_YYYYMMDD:
#     save_name = save_location + save_prefix + str(date)
#     mmprox_clean[date].to_csv(save_name) # save the cleaned data, not the raw data
#     print(f"Saved file {save_prefix + str(date) + '.csv'}   ", end= '\r')
# print(f"Saved file {save_prefix + str(date) + '.csv'}")

Saved file çleaned-mobility-data_mun_20210107.   


In [69]:
# Sanity check: 

import datetime

dates_datetime = [datetime.datetime.strptime(date, '%Y%m%d') for date in dates_YYYYMMDD]
total_sum=[]
for date in dates_YYYYMMDD:
    total_sum.append(mmprox_clean[date].sum().sum())
    
figure=plt.figure()
plt.plot(dates_datetime, total_sum)
figure.autofmt_xdate(bottom=.2, rotation=25, ha='center', which=None) # Automatic x-tick fix!
figure.tight_layout()

The overall mobility on its own is not very informative. We are more interested in seeing how mobility changes from day to day in a geographically intuitive way.

In [213]:
# Geopandas animation

########################
# Uncomment to execute #
########################

from mpl_toolkits.axes_grid1 import make_axes_locatable # for plot aesthetics
from matplotlib import colors

chosen_nis = '11002' # Antwerpen

# Add column with 'traffic to' values
municipalities_anim = municipalities.copy()
municipalities_anim['Traffic to ' + chosen_nis] = 0

# Choose limits of legend
vmin=0
vmax=0
for date in dates_YYYYMMDD: # mmprox_clean is keyed by YYYYMMDD strings
    vmax_tmp = mmprox_clean[date].max().max()
    if vmax_tmp > vmax : vmax = vmax_tmp

dpi=200

In [245]:
# Create sequence of geopandas maps for the chosen NIS value. Takes >1 second per image.

########################
# Uncomment to execute #
########################

# Suppress showing the images
plt.ioff()

for idx, date in enumerate(dates_YYYYMMDD): # mmprox_clean is keyed by YYYYMMDD strings
    # Prepare shapefile
    for nis_from in municipalities_anim['NISCode'].values:
        traffic_to = mmprox_clean[date].loc[nis_from, chosen_nis]
        municipalities_anim.loc[municipalities_anim['NISCode']==nis_from,'Traffic to ' + chosen_nis] = traffic_to
    
    # Make figure
    fig, ax = plt.subplots(figsize = (8,7)) # 8 x 7 inches (1600 x 1400 pixels at dpi=200, before cropping)
    cax = make_axes_locatable(ax).append_axes("right", size="5%", pad=0.1) # Legend properties
    ax.set_axis_off()
    fig = municipalities_anim.plot(column='Traffic to ' + chosen_nis, ax=ax, cmap='hot',
                                legend=True, edgecolor = 'gray', norm=colors.SymLogNorm(linthresh=1, linscale=1, vmin=vmin, vmax=vmax), cax=cax)

    textbox1 = 'Traffic to NIS code ' + chosen_nis
    textbox2 = 'Date: ' + date
    plt.figtext(.15, .25, textbox1, backgroundcolor='whitesmoke', fontfamily='monospace', fontsize=14)
    plt.figtext(.15, .20, textbox2, backgroundcolor='whitesmoke', fontfamily='monospace', fontsize=14)
    plt.close()

    # (Create directory and) save figure
    path = '../figures/maps/mun/mobility/daily/'
    chart = fig.get_figure()
    savename = 'mobility-to-NIS-' + chosen_nis + '_' + date + '_' + str(idx) + '.jpg'
    chart.savefig(path+savename, dpi=dpi, bbox_inches='tight')
    print("saving " + savename, end='\r')
print("saving " + savename)
    
plt.ion()

saving mobility-to-NIS-11002_20210107_330.jpg


### With moving average

uses `mmprox_clean` defined in previous cells

In [70]:
# This is not really essential but might be interesting to do: moving average
# Take moving average values for every date

import datetime
def date_shift(date, shift=0):
    date_original = datetime.datetime.strptime(date, '%Y%m%d')
    date_shifted = date_original + datetime.timedelta(days=shift)
    date_shifted_str = date_shifted.strftime('%Y%m%d')
    return date_shifted_str

def moving_avg_mobility(mmprox_dict, date, window_size=7, verbose=True):
    # Lean towards the past
    shift_left = int((window_size - window_size%2)/2) # gives 3 for 7, and 4 for 8
    shift_right = int((window_size - (window_size+1)%2)/2) # gives 3 for 7, and 3 for 8
    date_range = range(-shift_left, shift_right+1)
    dates=[date_shift(date, shift=i) for i in date_range]
    
    # Take subset of dates: only dates that are available in mmprox_dict
    dates_available = set(dates).intersection(set(mmprox_dict.keys()))
    if dates_available == set():
        raise Exception(f"Cannot average over empty set around date {date}.")
    if len(dates_available) < 3:
        print(f"Warning: dataframe set around date {date} only contains one or two elements.")
    dates_available = sorted(list(dates_available))
    
    # Calculate moving average
    first=True
    for d in dates_available:
        if first:
            mmprox_avg = mmprox_dict[d]
            first=False
        else: # only add subsequent dates, so the first date is not counted twice
            mmprox_avg = mmprox_avg.add(mmprox_dict[d])
    effective_window_size = len(dates_available)
    mmprox_avg = mmprox_avg / effective_window_size
    if verbose:
        print(f"Calculated moving average over {effective_window_size} dates: {dates_available}.")
    return mmprox_avg
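
The window bounds used in `moving_avg_mobility` can be checked in isolation. The sketch below is only an illustration: `window_offsets` is a hypothetical helper that is not part of the notebook, and it rewrites the two shift formulas with integer division, which should give the same values for positive window sizes.

```python
def window_offsets(window_size):
    """Day offsets covered by a window that leans towards the past."""
    shift_left = window_size // 2         # 3 for 7, 4 for 8
    shift_right = (window_size - 1) // 2  # 3 for 7, 3 for 8
    return list(range(-shift_left, shift_right + 1))

print(window_offsets(7))  # [-3, -2, -1, 0, 1, 2, 3]
print(window_offsets(8))  # [-4, -3, -2, -1, 0, 1, 2, 3]
```

For an even window size the extra day falls on the past side, which is what "lean towards the past" refers to.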

In [72]:
# Create dictionary with moving-average values
mmprox_moving_avg=dict({})
for date in dates_YYYYMMDD:
    mmprox_moving_avg[date] = moving_avg_mobility(mmprox_clean, date, verbose=False)
    print(f"Calculated moving average for date {date}.   ", end= '\r')
print(f"Calculated moving average for date {date}.")

In [73]:
mmprox_clean['20200218'].sum().sum()

In [74]:
# Sanity check: 

dates_datetime = [datetime.datetime.strptime(date, '%Y%m%d') for date in dates_YYYYMMDD]
total_sum_avg=[]
for date in dates_YYYYMMDD:
    total_sum_avg.append(mmprox_moving_avg[date].sum().sum())
    
figure=plt.figure()
plt.plot(dates_datetime, total_sum_avg, color='r', alpha=0.4)
plt.plot(dates_datetime, total_sum)
figure.autofmt_xdate(bottom=.2, rotation=25, ha='center', which=None) # Automatic x-tick fix!
figure.tight_layout()

In [125]:
date='20201221'
shift_left=3
shift_right=3
date_range = range(-shift_left, shift_right+1)
for i in date_range:
    date_shift(date)
# dates=[date_shift(date, shift=i) for i in date_range]

# Sketchbook

## Calculating baseline mobility

We want to know the mobility matrix for the average mobility in (at least) three different time frames:
1. Regular business day (perhaps on a day-per-day basis?)
2. Regular weekend day (perhaps Saturday distinct from Sunday?)
3. Vacation day (ideally also on a day-per-day basis)

To obtain these data, we use mobility matrices from before the first national lockdown (18 March). The available data run from week 7 (starting 10 February) to week 11, i.e. 5 complete weeks:
1. 10-14 February, 17-21 February, 2-6 March, 9-13 March
2. 15-16 February, 22-23 February, 7-8 March, 14-15 March
3. 24 February to 1 March (spring break)

Note: it is quite possible that the final week before lockdown cannot be considered as a regular week.
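
The split into the three day types can be sketched as follows. This is a minimal standalone illustration that assumes the spring break ran from 24 February to 1 March 2020 inclusive, as listed above. The names `day_type`, `VACATION_START` and `VACATION_END` are hypothetical and do not appear elsewhere in the notebook.

```python
from datetime import date, timedelta

# Assumption: spring break 2020 ran from 24 February to 1 March (inclusive)
VACATION_START = date(2020, 2, 24)
VACATION_END = date(2020, 3, 1)

def day_type(day):
    """Return 'vacation', 'weekend' or 'business' for a datetime.date."""
    if VACATION_START <= day <= VACATION_END:
        return 'vacation'
    if day.isoweekday() in (6, 7):  # Saturday or Sunday
        return 'weekend'
    return 'business'

# Classify every day from 10 February to 15 March 2020 (35 days)
first_day = date(2020, 2, 10)
days = [first_day + timedelta(days=i) for i in range(35)]
counts = {}
for day in days:
    kind = day_type(day)
    counts[kind] = counts.get(kind, 0) + 1
print(counts)  # {'business': 20, 'weekend': 8, 'vacation': 7}
```

The counts match the three date lists above: four regular weeks of five business days, four regular weekends, and one full vacation week.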

### Visualising the baseline data

In [400]:
# First load the data and inspect the mobility for this date range
# Takes around ... minutes.

# Set to True to reload data
update=False

from datetime import date, datetime, timedelta
dates_prelockdown=[]
dates_prelockdown_datetime=[]

sdate = datetime(2020, 2, 10, 0, 0)   # start date
edate = datetime(2020, 3, 15, 0, 0)   # end date
delta = edate - sdate       # as timedelta

for i in range(delta.days + 1):
    day = sdate + timedelta(days=i)
    dates_prelockdown_datetime.append(day)
    dates_prelockdown.append(date_to_YYYYMMDD(day))
    
# Load data for all these dates
if update:
    data_location = '../../data/raw/mobility/proximus/'
    mmprox_prelockdown, missing_dates = load_mobility_proximus(dates_prelockdown, data_location, return_missing=True)

# date array for xaxis (takes care of missing dates: needs same number of elements in array!)
dates_prelockdown_xaxis = dates_prelockdown_datetime.copy()
for d in missing_dates:
    dates_prelockdown_xaxis.remove(date_to_YYYYMMDD(d, inverse=True))

# Clean data for all dates at arrondissement level
if update:
    agg='arr'
    for d in mmprox_prelockdown:
        mmprox_prelockdown[d] = complete_data_clean(mmprox_prelockdown[d], agg=agg)
        print(f"cleaned data for date {d}", end='\r')
    print(f"cleaned data for date {d}")

    total_mobility_prelockdown=[]

    for d in mmprox_prelockdown:
        # Save array with available dates in datetime format
        date_prelockdown_datetime = date_to_YYYYMMDD(d, inverse=True)
        # Add total mobility to array
        total_mobility_tmp = mmprox_prelockdown[d].sum().sum()
        total_mobility_prelockdown.append(total_mobility_tmp)

### Calculating the baseline mobility per arrondissement

In [401]:
# Make arrays of relevant days
vacation_days=[]
business_days=[]
weekend_days=[]
vacation_dict=dict({})
business_dict=dict({})
weekend_dict=dict({})

vacation_sdate = datetime(2020, 2, 24, 0, 0)   # start date
for i in range(7):
    day = vacation_sdate + timedelta(days=i)
    day_YYYYMMDD = date_to_YYYYMMDD(day)
    vacation_days.append(day_YYYYMMDD)
    if day_YYYYMMDD in mmprox_prelockdown:
        vacation_dict[day_YYYYMMDD] = mmprox_prelockdown[day_YYYYMMDD]
    
for d in mmprox_prelockdown:
    if d not in vacation_days:
        d_datetime = date_to_YYYYMMDD(d, inverse=True)
        if d_datetime.isoweekday() in [6,7]:
            weekend_days.append(d)
            if d in mmprox_prelockdown:
                weekend_dict[d] = mmprox_prelockdown[d]
        else:
            business_days.append(d)
            if d in mmprox_prelockdown:
                business_dict[d] = mmprox_prelockdown[d]

In [501]:
# Calculate average matrices and average total sums
mmprox_baseline_vacation = average_mobility(vacation_dict)
mmprox_baseline_business = average_mobility(business_dict)
mmprox_baseline_weekend = average_mobility(weekend_dict)

vacation_baseline = mmprox_baseline_vacation.sum().sum()
business_baseline = mmprox_baseline_business.sum().sum()
weekend_baseline = mmprox_baseline_weekend.sum().sum()

mllp_postalcode,11000,12000,13000,21000,23000,24000,25000,31000,32000,33000,...,73000,81000,82000,83000,84000,85000,91000,92000,93000,ABROAD
11000,534616.571429,20444.142857,15411.285714,7264.714286,10828.142857,7778.714286,2524.428571,3220.571429,298.857143,834.428571,...,1821.571429,374.428571,1089.714286,1321.285714,1366.714286,293.428571,1463.714286,1372.571429,332.142857,0.0
12000,27830.285714,159161.428571,8923.142857,5123.142857,12837.571429,12274.285714,1398.285714,1179.857143,122.857143,371.571429,...,845.428571,210.428571,540.142857,563.714286,493.0,141.428571,645.0,654.142857,150.428571,0.0
13000,21154.142857,8170.714286,221327.0,2407.142857,3761.0,10189.0,1035.0,1345.571429,132.285714,301.142857,...,1461.0,162.285714,592.285714,627.142857,380.428571,139.0,550.0,540.857143,155.285714,0.0
21000,5305.142857,2599.571429,1749.0,637278.857143,68076.571429,8773.714286,20252.142857,2011.714286,178.428571,491.142857,...,884.428571,688.285714,952.428571,1219.142857,1578.0,1113.714286,2494.0,4170.857143,822.142857,0.0
23000,10318.571429,11615.428571,3451.714286,101033.0,382253.857143,18795.857143,18704.857143,3305.714286,328.571429,898.857143,...,1342.714286,697.142857,901.714286,1480.142857,1628.571429,558.285714,2360.428571,3765.571429,844.857143,0.0
24000,9026.142857,11946.714286,8605.571429,15998.714286,23547.142857,295436.428571,8305.0,2446.285714,247.285714,499.714286,...,2847.571429,408.142857,799.857143,1075.0,982.285714,279.857143,1280.0,1916.571429,279.428571,0.0
25000,2416.857143,926.428571,886.285714,38382.714286,22504.714286,9454.285714,308157.285714,1530.285714,78.428571,238.0,...,547.714286,740.285714,946.0,1488.428571,1879.285714,1014.857143,2968.142857,12524.428571,1039.571429,0.0
31000,2437.142857,718.714286,950.857143,1896.285714,2441.714286,1422.714286,689.857143,141473.0,1200.571429,980.0,...,370.285714,107.142857,337.142857,449.857143,350.142857,97.571429,545.857143,433.428571,128.0,0.0
32000,435.428571,130.142857,238.857143,264.571429,395.285714,268.714286,102.714286,2578.714286,17459.857143,1976.714286,...,49.857143,2.285714,54.428571,75.714286,51.285714,0.142857,51.571429,63.285714,31.714286,0.0
33000,870.285714,232.0,416.142857,609.714286,886.857143,610.285714,167.857143,1068.0,1397.428571,55581.428571,...,102.0,50.571429,118.285714,129.428571,121.428571,19.428571,149.857143,163.428571,39.714286,0.0


In [563]:
# Plotting environment
import matplotlib.dates as mdates
fontsize=12
color='darksalmon'
data_label='daily total mobility'
    
fig, ax1 = plt.subplots()
ax1.grid(False)

ax1.set_ylabel('Absolute number of visits (>15 min)', fontsize=fontsize)
ax1.tick_params(axis='y', labelsize=fontsize)
# ax1.set_xlabel('Date', fontsize=fontsize)
ax1.tick_params(axis='x', labelsize=fontsize)
myFmt = mdates.DateFormatter('%m-%d')
ax1.xaxis.set_major_formatter(myFmt)
ax1.set_xlim([dates_prelockdown_datetime[0]-timedelta(1), dates_prelockdown_datetime[-1]+timedelta(2)])
xticks_prelockdown = dates_prelockdown_datetime[0:-1:7] + [dates_prelockdown_datetime[-1]+timedelta(1)]
ax1.set_xticks(xticks_prelockdown)
ax1.set_ylim(.6e7, 1.2e7)

dates_prelockdown_xaxis_shifted = np.array(dates_prelockdown_xaxis)+timedelta(hours=12)
ax1.plot(dates_prelockdown_xaxis_shifted, total_mobility_prelockdown, label=data_label, color=color)
fig.tight_layout()  # otherwise the right y-label is slightly clipped

plt.title('Daily total Belgian mobility 5 weeks pre-lockdown', fontsize=fontsize)
fig.autofmt_xdate(bottom=.2, rotation=90, ha='center', which=None) # Automatic x-tick fix!

# Add coloured bands
week_colour = 'wheat'
weekend_colour = 'goldenrod'
vacation_colour = 'y'
week_baseline_colour = 'k'
weekend_baseline_colour = 'k'
vacation_baseline_colour = 'k'
alpha=0.5
linewidth=1
baseline_alpha=0.5
label_baseline='baseline total mobility'

# Business days
plt.axvspan(sdate + timedelta(0), sdate + timedelta(5), facecolor=week_colour, alpha=alpha)
plt.plot((sdate + timedelta(0), sdate + timedelta(5)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate + timedelta(7), sdate + timedelta(12), facecolor=week_colour, alpha=alpha)
plt.plot((sdate + timedelta(7), sdate + timedelta(12)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate + timedelta(21), sdate + timedelta(26), facecolor=week_colour, alpha=alpha)
plt.plot((sdate + timedelta(21), sdate + timedelta(26)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate + timedelta(28), sdate + timedelta(33), facecolor=week_colour, alpha=alpha)
plt.plot((sdate + timedelta(28), sdate + timedelta(33)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Weekends
plt.axvspan(sdate + timedelta(5), sdate + timedelta(7), facecolor=weekend_colour, alpha=alpha)
plt.plot((sdate + timedelta(5), sdate + timedelta(7)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate + timedelta(12), sdate + timedelta(14), facecolor=weekend_colour, alpha=alpha)
plt.plot((sdate + timedelta(12), sdate + timedelta(14)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate + timedelta(26), sdate + timedelta(28), facecolor=weekend_colour, alpha=alpha)
plt.plot((sdate + timedelta(26), sdate + timedelta(28)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate + timedelta(33), sdate + timedelta(35), facecolor=weekend_colour, alpha=alpha)
plt.plot((sdate + timedelta(33), sdate + timedelta(35)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Vacation days
plt.axvspan(sdate + timedelta(14), sdate + timedelta(19), facecolor=vacation_colour, alpha=alpha)
plt.plot((sdate + timedelta(14), sdate + timedelta(19)), (vacation_baseline, vacation_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate + timedelta(19), sdate + timedelta(21), facecolor=vacation_colour, alpha=alpha)
plt.plot((sdate + timedelta(19), sdate + timedelta(21)), (vacation_baseline, vacation_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Inbetween dotted lines (aesthetical)
plt.plot((sdate + timedelta(5), sdate + timedelta(5)), (business_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha, label=label_baseline)
plt.plot((sdate + timedelta(7), sdate + timedelta(7)), (weekend_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate + timedelta(12), sdate + timedelta(12)), (business_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate + timedelta(14), sdate + timedelta(14)), (weekend_baseline, vacation_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate + timedelta(21), sdate + timedelta(21)), (vacation_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate + timedelta(26), sdate + timedelta(26)), (business_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate + timedelta(28), sdate + timedelta(28)), (weekend_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate + timedelta(33), sdate + timedelta(33)), (business_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Edges
plt.axvspan(sdate + timedelta(35), sdate + timedelta(36), facecolor=week_colour, alpha=alpha)
plt.axvspan(sdate + timedelta(-2), sdate + timedelta(0), facecolor=weekend_colour, alpha=alpha)

plt.legend(fontsize=fontsize)
plt.show()

**Problems**:

1. Days are clearly different (especially Saturday and Sunday). I need a reference stating which days are similar to which other days.
2. At a small scale (municipality) the data is surely too noisy to find a good baseline.
3. When comparing to the baseline value, what to do when pandemic values go _over_ the baseline?
4. The final weekend doesn't really count (people were already holding back due to impending restrictions). It is probably better to take this out.
5. Aesthetic: it would be nice if the bands were translated 12 hours to the left, but the datetime objects don't easily support that.
6. I am still working with _all kinds_ of transportation. It would be interesting to filter out travel to the home patch, and only sum travel to other patches.
7. Explain very well in the preprint Appendix why it is necessary to take average mobilities and not just actual mobilities. Two reasons: (1) the Proximus data is also just an estimate, so averaging takes out the noise; (2) we want to be able to compare the actual mobility to the baseline mobility in order to estimate the contact behaviour, and comparing on a day-to-day basis definitely gives too much noise (perhaps demonstrate this?)
8. Fractional mobility _also_ goes down for home traffic. How is that possible? It certainly depends on the definition of home mobility.

**Solution**:
1. Don't compare _daily_ traffic, because that will surely be too noisy. Instead, average the mobility over each time slot (business days, weekend days, vacation days), compare that average with the baseline average for the same type of time slot, and assign this averaged mobility to every day in the slot.
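
As a rough sketch of that comparison, the averaged mobility of a time slot can be expressed as a fraction of the baseline for the same day type. The function name `relative_mobility` is hypothetical, and the totals below are made-up numbers for illustration only, not values from the Proximus data.

```python
def relative_mobility(slot_average, baseline):
    """Ratio of averaged mobility to the baseline of the same day type."""
    if baseline <= 0:
        raise ValueError("Baseline mobility must be positive.")
    return slot_average / baseline

# Hypothetical totals (number of visits), for illustration only
baselines = {'business': 1.0e7, 'weekend': 8.0e6}
lockdown_averages = {'business': 7.0e6, 'weekend': 6.0e6}

for kind, baseline in baselines.items():
    ratio = relative_mobility(lockdown_averages[kind], baseline)
    print(f"{kind}: {ratio:.2f} of baseline")
```

The same ratio could be taken element-wise on the mobility matrices instead of on the totals. How to treat ratios above 1 (problem 3) is left open here.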

### Comparing the baseline mobility with the actual mobility during lockdown

Total _daily mobility_ and _total mobility averaged_ over the relevant time frame (business days, weekends, vacation days) are plotted for 5 weeks: weeks 12 to 16. Weeks 15 and 16 are Easter vacation weeks. So ...
1. Week 12-14: 16 March to 5 April
2. Week 15-16: 6 April to 19 April

In [456]:
# Again load and clean the data, now for different time period
# Takes about 2 minutes

# Set update to True to reload
update=False

from datetime import date, timedelta, datetime
dates_postlockdown=[]
dates_postlockdown_datetime=[]

sdate_postlockdown = datetime(2020, 3, 16, 0, 0)   # start date
edate_postlockdown = datetime(2020, 4, 19, 0, 0)   # end date
delta_postlockdown = edate_postlockdown - sdate_postlockdown    # as timedelta

for i in range(delta_postlockdown.days + 1):
    day = sdate_postlockdown + timedelta(days=i)
    dates_postlockdown_datetime.append(day) # needed below for the x-axis
    dates_postlockdown.append(date_to_YYYYMMDD(day))
    
# Load data for all these dates
if update:
    data_location = '../../data/raw/mobility/proximus/'
    mmprox_postlockdown, missing_dates = load_mobility_proximus(dates_postlockdown, data_location, return_missing=True)

# date array for xaxis (takes care of missing dates: needs same number of elements in array!)
dates_postlockdown_xaxis = dates_postlockdown_datetime.copy()
for d in missing_dates:
    dates_postlockdown_xaxis.remove(date_to_YYYYMMDD(d, inverse=True))

# Clean data for all dates at arrondissement level
if update:
    agg='arr'
    for d in mmprox_postlockdown:
        mmprox_postlockdown[d] = complete_data_clean(mmprox_postlockdown[d], agg=agg)
        print(f"cleaned data for date {d}", end='\r')
    print(f"cleaned data for date {d}")

In [457]:
def is_weekend(day_YYYYMMDD):
    day_datetime = date_to_YYYYMMDD(day_YYYYMMDD, inverse=True)
    return day_datetime.isoweekday() in [6,7]

def is_vacation(day_YYYYMMDD, sday, eday):
    # True if the day lies between sday and eday (both inclusive, all in YYYYMMDD format)
    day = date_to_YYYYMMDD(day_YYYYMMDD, inverse=True)
    first = date_to_YYYYMMDD(sday, inverse=True)
    last = date_to_YYYYMMDD(eday, inverse=True)
    return first <= day <= last

In [490]:
vacation_first='20200406'
vacation_last='20200419'

# There must be an easier way ...

d = dates_postlockdown_datetime[0]
mmprox_average_dict = dict({})
while d <= dates_postlockdown_datetime[-1]:
    d_string = date_to_YYYYMMDD(d)
    if is_vacation(d_string, vacation_first, vacation_last):
        mmprox_vacation = 0
        dates_vacation = []
        number_of_days = 0
        while is_vacation(d_string, vacation_first, vacation_last): # First vacation (copy block for every vacation)
            dates_vacation.append(d_string)
            if d_string in mmprox_postlockdown:
                mmprox_vacation += mmprox_postlockdown[d_string]
                number_of_days += 1
            d += timedelta(1) # UPDATE VALUE
            d_string = date_to_YYYYMMDD(d)
        if number_of_days == 0:
            raise Exception(f"No vacation data available between {vacation_first} and {vacation_last}.")
        mmprox_vacation_avg = mmprox_vacation / number_of_days
        for d_tmp in dates_vacation: # Update overall dictionary
            mmprox_average_dict[d_tmp] = mmprox_vacation_avg
    else:
        if is_weekend(d_string):
            mmprox_weekend = 0
            dates_weekend = []
            number_of_days = 0
            while is_weekend(d_string) and not is_vacation(d_string, vacation_first, vacation_last):
                dates_weekend.append(d_string)
                if d_string in mmprox_postlockdown:
                    mmprox_weekend += mmprox_postlockdown[d_string]
                    number_of_days += 1
                d += timedelta(1) # UPDATE VALUE
                d_string = date_to_YYYYMMDD(d)
            if number_of_days == 0:
                raise Exception(f"No weekend data available in weekend right before {d_string}.")
            mmprox_weekend_avg = mmprox_weekend / number_of_days
            for d_tmp in dates_weekend: # Update overall dictionary
                mmprox_average_dict[d_tmp] = mmprox_weekend_avg
        else:
            mmprox_business = 0
            dates_business = []
            number_of_days = 0
            while not is_weekend(d_string) and not is_vacation(d_string, vacation_first, vacation_last):
                dates_business.append(d_string)
                if d_string in mmprox_postlockdown:
                    mmprox_business += mmprox_postlockdown[d_string]
                    number_of_days += 1
                d += timedelta(1) # UPDATE VALUE
                d_string = date_to_YYYYMMDD(d)
            if number_of_days == 0:
                raise Exception(f"No business day data available in week right before {d_string}.")
            mmprox_business_avg = mmprox_business / number_of_days
            for d_tmp in dates_business: # Update overall dictionary. Insert value for EVERY date
                mmprox_average_dict[d_tmp] = mmprox_business_avg
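
One possible simplification of the loop above is to let `itertools.groupby` collect consecutive days of the same type into slots. This is only a sketch under stated assumptions: `slot_type` and `average_per_slot` are hypothetical names, the dictionary is keyed by `datetime.date` rather than YYYYMMDD strings, and the Easter vacation is assumed to run from 6 to 19 April 2020. The values may be numbers or objects that support `+` and `/`, such as DataFrames.

```python
from datetime import date, timedelta
from itertools import groupby

# Assumption: Easter vacation 2020 ran from 6 to 19 April (inclusive)
VACATION_FIRST = date(2020, 4, 6)
VACATION_LAST = date(2020, 4, 19)

def slot_type(day):
    """Return 'vacation', 'weekend' or 'business' for a datetime.date."""
    if VACATION_FIRST <= day <= VACATION_LAST:
        return 'vacation'
    return 'weekend' if day.isoweekday() in (6, 7) else 'business'

def average_per_slot(daily_values):
    """Map every date in a slot to the mean of the available data in that slot."""
    first, last = min(daily_values), max(daily_values)
    all_days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    averaged = {}
    # groupby collects *consecutive* days with the same type into one slot
    for _, slot in groupby(all_days, key=slot_type):
        slot = list(slot)
        available = [daily_values[d] for d in slot if d in daily_values]
        if not available:
            raise ValueError(f"No data in slot starting {slot[0]}.")
        slot_mean = sum(available[1:], available[0]) / len(available)
        for d in slot:  # insert the slot mean for every date, also missing ones
            averaged[d] = slot_mean
    return averaged

# Toy example: scalar totals for 16-22 March 2020 (Monday to Sunday)
toy = {date(2020, 3, 16) + timedelta(days=i): float(v)
       for i, v in enumerate([10, 12, 11, 13, 14, 6, 8])}
averaged = average_per_slot(toy)
print(averaged[date(2020, 3, 16)])  # 12.0
print(averaged[date(2020, 3, 21)])  # 7.0
```

As in the loop above, dates without data still receive the slot average, and an empty slot raises an error.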

In [495]:
# Save array with total mobility (from raw data), belonging to dates in dates_postlockdown_xaxis
total_mobility_postlockdown=[]

for d in mmprox_postlockdown:
    # Add total mobility to array
    total_mobility_tmp = mmprox_postlockdown[d].sum().sum()
    total_mobility_postlockdown.append(total_mobility_tmp)
    
# Save array with total mobility averaged over particular time slot (processed data)
total_mobility_average=[]

for d in mmprox_average_dict:
    total_mobility_tmp = mmprox_average_dict[d].sum().sum()
    total_mobility_average.append(total_mobility_tmp)

In [423]:
# Save values averaged over relevant time frame

# Initiate arrays of relevant values
averaged_values=[]
vacation_len = 14 # days
business_len = 5
weekend_len = 2

day_idx = 0
while day_idx < len(dates_postlockdown_xaxis):
    day_datetime = dates_postlockdown_xaxis[day_idx]
    day_YYYYMMDD = date_to_YYYYMMDD(day_datetime)
    # Three scenarios
    # Vacation
    if is_vacation(day_YYYYMMDD, vacation_first, vacation_last): # easter_vac_days is not defined in this notebook
        averaged_value = np.mean(total_mobility_postlockdown[day_idx:day_idx+vacation_len])
        averaged_values = np.concatenate((averaged_values, [averaged_value]*vacation_len))
        day_idx += vacation_len
    else:
        # Weekend
        if day_datetime.isoweekday() in [6,7]:
            averaged_value = np.mean(total_mobility_postlockdown[day_idx:day_idx+weekend_len])
            averaged_values = np.concatenate((averaged_values, [averaged_value]*weekend_len))
            day_idx += weekend_len
        # Business day
        elif day_datetime.isoweekday() in [1, 2, 3, 4, 5]:
            averaged_value = np.mean(total_mobility_postlockdown[day_idx:day_idx+business_len])
            averaged_values = np.concatenate((averaged_values, [averaged_value]*business_len))
            day_idx += business_len

# plt.plot(dates_postlockdown_datetime, averaged_values)
# plt.plot(dates_postlockdown_datetime, total_mobility_postlockdown, alpha=0.5)
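
The loop above appends a fixed number of values per slot (`vacation_len`, `business_len`, `weekend_len`), so an incomplete slot at the end of the data would make `averaged_values` longer than the data itself. A minimal sketch of an alternative that derives the slot length from the data is given below. The function `average_per_slot` and the label list are hypothetical and not part of the notebook; a slot is assumed to be a run of consecutive days carrying the same label.

```python
from itertools import groupby

def average_per_slot(values, slot_labels):
    """Replace every value by the mean of its slot.

    A slot is a run of consecutive days that share the same label.
    """
    if len(values) != len(slot_labels):
        raise ValueError("values and slot_labels must have the same length")
    averaged = []
    idx = 0
    for _, run in groupby(slot_labels):
        run_len = len(list(run))            # number of days in this slot
        chunk = values[idx:idx + run_len]   # the daily values of this slot
        averaged.extend([sum(chunk) / run_len] * run_len)
        idx += run_len
    return averaged

# Toy data (hypothetical values): three business days, a weekend, one business day
daily = [10.0, 12.0, 14.0, 4.0, 6.0, 9.0]
labels = ["business", "business", "business", "weekend", "weekend", "business"]
slot_means = average_per_slot(daily, labels)
```

The output always has the same length as the input, also when the last slot is truncated.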

In [560]:
# Plotting environment
import matplotlib.dates as mdates
import datetime
fontsize=12
color_daily='lightsalmon'
alpha_daily=1
color_averaged='darksalmon'
data_label_daily='daily total mobility'
data_label_averaged='averaged total mobility'
    
fig, ax2 = plt.subplots()
ax2.grid(False)

ax2.set_ylabel('Absolute number of visits (>15 min)', fontsize=fontsize)
ax2.tick_params(axis='y', labelsize=fontsize)
# ax2.set_xlabel('Date', fontsize=fontsize)
ax2.tick_params(axis='x', labelsize=fontsize)
myFmt = mdates.DateFormatter('%m-%d')
ax2.xaxis.set_major_formatter(myFmt)
ax2.set_xlim([dates_postlockdown_datetime[0]-datetime.timedelta(1), dates_postlockdown_datetime[-1]+datetime.timedelta(2)])
xticks_postlockdown = dates_postlockdown_datetime[0:-1:7] + [dates_postlockdown_datetime[-1]+datetime.timedelta(1)]
ax2.set_xticks(xticks_postlockdown)
ax2.set_ylim(.6e7, 1.2e7)

dates_postlockdown_datetime_shifted = np.array(dates_postlockdown_datetime)+datetime.timedelta(hours=12)
ax2.plot(dates_postlockdown_datetime_shifted, total_mobility_postlockdown, label=data_label_daily, alpha=alpha_daily, linewidth=1,color=color_daily)
ax2.plot(dates_postlockdown_datetime_shifted, averaged_values, label=data_label_averaged, linewidth=3, color=color_averaged)
fig.tight_layout()  # otherwise the right y-label is slightly clipped

plt.title('Daily total Belgian mobility during lockdown', fontsize=fontsize)
fig.autofmt_xdate(bottom=.2, rotation=90, ha='center', which=None) # Automatic x-tick fix!

# Add coloured bands
week_colour = 'wheat'
weekend_colour = 'goldenrod'
vacation_colour = 'y'
week_baseline_colour = 'k'
weekend_baseline_colour = 'k'
vacation_baseline_colour = 'k'
alpha=0.5
linewidth=1
baseline_alpha=0.5
label_baseline='baseline total mobility'

# Business days
plt.axvspan(sdate_postlockdown + datetime.timedelta(0), sdate_postlockdown + datetime.timedelta(5), facecolor=week_colour, alpha=alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(0), sdate_postlockdown + datetime.timedelta(5)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(7), sdate_postlockdown + datetime.timedelta(12), facecolor=week_colour, alpha=alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(7), sdate_postlockdown + datetime.timedelta(12)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(14), sdate_postlockdown + datetime.timedelta(19), facecolor=week_colour, alpha=alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(14), sdate_postlockdown + datetime.timedelta(19)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(28), sdate_postlockdown + datetime.timedelta(33)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Weekends
plt.axvspan(sdate_postlockdown + datetime.timedelta(5), sdate_postlockdown + datetime.timedelta(7), facecolor=weekend_colour, alpha=alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(5), sdate_postlockdown + datetime.timedelta(7)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(12), sdate_postlockdown + datetime.timedelta(14), facecolor=weekend_colour, alpha=alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(12), sdate_postlockdown + datetime.timedelta(14)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(19), sdate_postlockdown + datetime.timedelta(21), facecolor=weekend_colour, alpha=alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(19), sdate_postlockdown + datetime.timedelta(21)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(26), sdate_postlockdown + datetime.timedelta(28)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(33), sdate_postlockdown + datetime.timedelta(35)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Vacation days
plt.axvspan(sdate_postlockdown + datetime.timedelta(21), sdate_postlockdown + datetime.timedelta(26), facecolor=vacation_colour, alpha=alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(26), sdate_postlockdown + datetime.timedelta(28), facecolor=vacation_colour, alpha=alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(28), sdate_postlockdown + datetime.timedelta(33), facecolor=vacation_colour, alpha=alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(33), sdate_postlockdown + datetime.timedelta(35), facecolor=vacation_colour, alpha=alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(21), sdate_postlockdown + datetime.timedelta(35)), (vacation_baseline, vacation_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Inbetween dotted lines (aesthetical)
plt.plot((sdate_postlockdown + datetime.timedelta(5), sdate_postlockdown + datetime.timedelta(5)), (business_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha, label=label_baseline)
plt.plot((sdate_postlockdown + datetime.timedelta(7), sdate_postlockdown + datetime.timedelta(7)), (weekend_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(12), sdate_postlockdown + datetime.timedelta(12)), (business_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(14), sdate_postlockdown + datetime.timedelta(14)), (weekend_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(19), sdate_postlockdown + datetime.timedelta(19)), (weekend_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.plot((sdate_postlockdown + datetime.timedelta(21), sdate_postlockdown + datetime.timedelta(21)), (weekend_baseline, vacation_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Edges
plt.axvspan(sdate_postlockdown + datetime.timedelta(35), sdate_postlockdown + datetime.timedelta(36), facecolor=week_colour, alpha=alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(-2), sdate_postlockdown + datetime.timedelta(0), facecolor=weekend_colour, alpha=alpha)

plt.legend(fontsize=fontsize-2, bbox_to_anchor=(.98, .9), loc='center right')

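[Figure: 'Daily total Belgian mobility during lockdown'. Daily and averaged total mobility (absolute number of visits >15 min) against date, with dashed baseline levels and coloured bands for business days, weekends and vacation.]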

### Calculating $\pi^{gh}$ per time slot

The plot above shows the total mobility. Eventually we are interested not in the total mobility, but in the mobility between regions $g$ and $h$: a $G \times G$ matrix with values $\pi^{gh}$ that denote how much traffic is _decreased_ compared to the baseline values. Each $\pi^{gh}$ is the ratio of the locally averaged mobility to the baseline mobility, the analogues of the darksalmon-coloured line and the black dashed line in the plot above.

We need two things:
1. Baseline mobility matrix: $G \times G$ baseline values based on the pre-lockdown period, averaged over _all_ data in the particular type of time frame (business/weekend/vacation). `mmprox_baseline_*`
2. Actual mobility matrix: actual values averaged over the _local_ time frame (i.e. just one weekend, or just one business week). `mmprox_average_dict`

In [534]:
# Define the ratio matrix \pi^{gh} for every date in a dictionary

vacation_first='20200406'
vacation_last='20200419'

pi_gh = dict({})
for d in dates_postlockdown:
    # Classify the loop date d itself to select the matching baseline
    if is_vacation(d, vacation_first, vacation_last):
        pi_gh[d] = mmprox_average_dict[d] / mmprox_baseline_vacation
    elif is_weekend(d):
        pi_gh[d] = mmprox_average_dict[d] / mmprox_baseline_weekend
    else:
        pi_gh[d] = mmprox_average_dict[d] / mmprox_baseline_business

In [559]:
nis_combos = [('11000', '21000'), ('81000', '21000'), ('72000', '31000'), ('13000', '11000'), ('44000', '44000')]

# Plotting environment
import matplotlib.dates as mdates
import datetime
fontsize=12
color1='dodgerblue'
color2='crimson'
color3='darkolivegreen'
color4='darkgrey'
color5='plum'
colors=[color1, color2, color3, color4, color5]
label1=nis_combos[0][0] + ' to ' + nis_combos[0][1]
label2=nis_combos[1][0] + ' to ' + nis_combos[1][1]
label3=nis_combos[2][0] + ' to ' + nis_combos[2][1]
label4=nis_combos[3][0] + ' to ' + nis_combos[3][1]
label5=nis_combos[4][0] + ' to ' + nis_combos[4][1]
labels=[label1, label2, label3, label4, label5]
    
fig, ax3 = plt.subplots()
ax3.grid(False)

ax3.set_ylabel('Fractional change in mobility', fontsize=fontsize)
ax3.tick_params(axis='y', labelsize=fontsize)
# ax3.set_xlabel('Date', fontsize=fontsize)
ax3.tick_params(axis='x', labelsize=fontsize)
myFmt = mdates.DateFormatter('%m-%d')
ax3.xaxis.set_major_formatter(myFmt)
ax3.set_xlim([dates_postlockdown_datetime[0]-datetime.timedelta(1), dates_postlockdown_datetime[-1]+datetime.timedelta(2)])
xticks_postlockdown = dates_postlockdown_datetime[0:-1:7] + [dates_postlockdown_datetime[-1]+datetime.timedelta(1)]
ax3.set_xticks(xticks_postlockdown)
ax3.set_ylim(0, 1)

dates_postlockdown_datetime_shifted = np.array(dates_postlockdown_datetime)+datetime.timedelta(hours=12)
for idx, nis_combo in enumerate(nis_combos):
    pi_values=[pi_gh[d].loc[nis_combo[0], nis_combo[1]] for d in dates_postlockdown]
    ax3.plot(dates_postlockdown_datetime_shifted, pi_values, label=labels[idx], color=colors[idx], linewidth=2, alpha=0.8)


# ax3.plot(dates_postlockdown_datetime_shifted, total_mobility_postlockdown, label=data_label_daily, alpha=alpha_daily, linewidth=1,color=color_daily)
# ax3.plot(dates_postlockdown_datetime_shifted, averaged_values, label=data_label_averaged, linewidth=3, color=color_averaged)
fig.tight_layout()  # otherwise the right y-label is slightly clipped

plt.title('Changing mobility during lockdown', fontsize=fontsize)
fig.autofmt_xdate(bottom=.2, rotation=90, ha='center', which=None) # Automatic x-tick fix!

# Add coloured bands
week_colour = 'wheat'
weekend_colour = 'goldenrod'
vacation_colour = 'y'
week_baseline_colour = 'k'
weekend_baseline_colour = 'k'
vacation_baseline_colour = 'k'
alpha=0.5
linewidth=1
baseline_alpha=0.5
label_baseline='baseline total mobility'

# Business days
plt.axvspan(sdate_postlockdown + datetime.timedelta(0), sdate_postlockdown + datetime.timedelta(5), facecolor=week_colour, alpha=alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(0), sdate_postlockdown + datetime.timedelta(5)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(7), sdate_postlockdown + datetime.timedelta(12), facecolor=week_colour, alpha=alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(7), sdate_postlockdown + datetime.timedelta(12)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(14), sdate_postlockdown + datetime.timedelta(19), facecolor=week_colour, alpha=alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(14), sdate_postlockdown + datetime.timedelta(19)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(28), sdate_postlockdown + datetime.timedelta(33)), (business_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Weekends
plt.axvspan(sdate_postlockdown + datetime.timedelta(5), sdate_postlockdown + datetime.timedelta(7), facecolor=weekend_colour, alpha=alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(5), sdate_postlockdown + datetime.timedelta(7)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(12), sdate_postlockdown + datetime.timedelta(14), facecolor=weekend_colour, alpha=alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(12), sdate_postlockdown + datetime.timedelta(14)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(19), sdate_postlockdown + datetime.timedelta(21), facecolor=weekend_colour, alpha=alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(19), sdate_postlockdown + datetime.timedelta(21)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(26), sdate_postlockdown + datetime.timedelta(28)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(33), sdate_postlockdown + datetime.timedelta(35)), (weekend_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Vacation days
plt.axvspan(sdate_postlockdown + datetime.timedelta(21), sdate_postlockdown + datetime.timedelta(26), facecolor=vacation_colour, alpha=alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(26), sdate_postlockdown + datetime.timedelta(28), facecolor=vacation_colour, alpha=alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(28), sdate_postlockdown + datetime.timedelta(33), facecolor=vacation_colour, alpha=alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(33), sdate_postlockdown + datetime.timedelta(35), facecolor=vacation_colour, alpha=alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(21), sdate_postlockdown + datetime.timedelta(35)), (vacation_baseline, vacation_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Inbetween dotted lines (aesthetical)
# plt.plot((sdate_postlockdown + datetime.timedelta(5), sdate_postlockdown + datetime.timedelta(5)), (business_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha, label=label_baseline)
# plt.plot((sdate_postlockdown + datetime.timedelta(7), sdate_postlockdown + datetime.timedelta(7)), (weekend_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(12), sdate_postlockdown + datetime.timedelta(12)), (business_baseline, weekend_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(14), sdate_postlockdown + datetime.timedelta(14)), (weekend_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(19), sdate_postlockdown + datetime.timedelta(19)), (weekend_baseline, business_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)
# plt.plot((sdate_postlockdown + datetime.timedelta(21), sdate_postlockdown + datetime.timedelta(21)), (weekend_baseline, vacation_baseline), 'k--', linewidth=linewidth, alpha=baseline_alpha)

# Edges
plt.axvspan(sdate_postlockdown + datetime.timedelta(35), sdate_postlockdown + datetime.timedelta(36), facecolor=week_colour, alpha=alpha)
plt.axvspan(sdate_postlockdown + datetime.timedelta(-2), sdate_postlockdown + datetime.timedelta(0), facecolor=weekend_colour, alpha=alpha)

plt.legend(fontsize=fontsize-2, loc='upper right')

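[Figure: 'Changing mobility during lockdown'. Fractional change in mobility against date for the five arrondissement pairs in `nis_combos`, with coloured bands for business days, weekends and vacation.]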

In [553]:
pi_gh['20200318']

mllp_postalcode,11000,12000,13000,21000,23000,24000,25000,31000,32000,33000,...,73000,81000,82000,83000,84000,85000,91000,92000,93000,ABROAD
11000,0.880595,0.677906,0.696152,0.521369,0.633479,0.652833,0.611465,0.619828,0.548757,0.520116,...,0.622649,0.426784,0.459754,0.478625,0.458911,0.725901,0.53781,0.681786,0.519054,inf
12000,0.592908,0.966195,0.723445,0.466354,0.702236,0.672984,0.741336,0.641095,0.603953,0.520492,...,0.503886,0.459063,0.361386,0.409782,0.460446,0.589697,0.483101,0.823368,0.655461,inf
13000,0.639383,0.72613,0.927829,0.617911,0.745919,0.664344,0.725217,0.522603,0.793737,0.637571,...,0.740452,0.836796,0.495369,0.539271,0.589861,0.423022,0.501455,0.689646,0.549954,inf
21000,0.688049,0.680651,0.841395,0.84874,0.710926,0.697401,0.517486,0.677035,0.735308,0.701629,...,0.699887,0.595973,0.80972,0.85273,0.730038,0.666778,0.866961,0.670126,0.785508,inf
23000,0.669027,0.718389,0.666799,0.547047,0.977828,0.691472,0.747849,0.615843,0.496696,0.48328,...,0.543824,0.412541,0.403676,0.520085,0.512351,0.586438,0.577353,0.743101,0.561515,inf
24000,0.676014,0.721805,0.724577,0.462212,0.680286,0.906332,0.774738,0.463969,0.329174,0.487078,...,0.739367,0.493455,0.396571,0.458047,0.418615,0.315161,0.500938,0.679129,0.442331,inf
25000,0.609221,0.6565,0.641554,0.430444,0.715557,0.693019,0.924235,0.372087,0.112204,0.494958,...,0.536411,0.611656,0.500846,0.560054,0.551912,0.56205,0.697406,0.681484,0.726646,inf
31000,0.738816,0.693739,0.706941,0.539687,0.755289,0.634983,0.631435,0.923284,0.771466,0.72449,...,0.339738,0.3136,0.175,0.601524,0.374704,0.65388,0.380319,0.476664,0.354687,inf
32000,0.842848,0.573216,0.566866,0.513283,0.781207,0.630409,0.858693,0.653039,1.005151,0.819238,...,0.593696,4.725,0.481365,0.290566,0.440669,22.4,0.80277,0.752144,0.826126,inf
33000,0.572226,0.617241,0.522417,0.50089,0.547326,0.482397,0.834043,0.692697,0.830525,0.954225,...,0.405882,0.19774,0.224879,0.278146,0.311294,0.597059,0.344328,0.538462,0.261871,inf
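
The `inf` entries in the ABROAD column are what floating-point division produces when a positive value is divided by a baseline of zero (a zero divided by zero gives `NaN` instead). A minimal sketch of how such entries could be masked before further use is given below. The toy matrices and their values are hypothetical, and replacing `inf` by `NaN` is only one possible choice.

```python
import numpy as np
import pandas as pd

# Toy matrices with hypothetical values; rows are origins, columns are destinations
regions = ["11000", "21000", "ABROAD"]
actual = pd.DataFrame([[80.0, 30.0, 5.0],
                       [20.0, 90.0, 0.0],
                       [0.0, 0.0, 0.0]], index=regions, columns=regions)
baseline = pd.DataFrame([[100.0, 60.0, 0.0],
                         [40.0, 100.0, 0.0],
                         [10.0, 0.0, 0.0]], index=regions, columns=regions)

# Element-wise ratio: x/0 gives inf for x > 0, and 0/0 gives NaN
ratio = actual / baseline

# Mark undefined ratios uniformly as NaN so they can be skipped or imputed later
ratio_clean = ratio.replace([np.inf, -np.inf], np.nan)
```

Ratios far above 1 in the table (such as 22.4) can arise in the same way from very small baseline counts, so they deserve the same caution.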


## Visualisation population per region

In [73]:
# Visualisation in geopandas
import geopandas as gp
import pandas as pd  # needed for pd.read_csv below

shp_dir = "../../data/raw/GIS/shapefiles/BE/"

# Load different geographical aggregations
country = gp.read_file(shp_dir + "AD_6_Country.shp")
regions = gp.read_file(shp_dir + "AD_5_Region.shp")
provinces = gp.read_file(shp_dir + "AD_4_Province.shp")
arrondissements = gp.read_file(shp_dir + "AD_3_District.shp")
municipalities = gp.read_file(shp_dir + "AD_2_Municipality.shp")

# Brussel-Hoofdstad has NISCode 'NA' in the provinces layer: assign it the NIS code of its arrondissement
provinces.loc[provinces['NISCode']=='NA', 'NISCode'] = '21000'

# Create population per aggregation
population = dict({})
pop_loc = '../../data/interim/demographic/'
for agg in ['mun', 'arr', 'prov']:
    filename = pop_loc + 'initN_' + agg + '.csv'
    population[agg] = pd.read_csv(filename, index_col='NIS')[['total']]

In [74]:
arrondissements

Unnamed: 0,ModifDate,NISCode,tgid,Shape_Leng,Shape_Area,NameDut,NameFre,NameGer,geometry
0,2007-01-05,81000,{3E0FC3C6-1C73-48A2-83D1-7DC3A7AA8514},138306.528302,318873300.0,,Arlon,,"POLYGON Z ((750399.997 528112.712 0.000, 75045..."
1,2007-01-05,82000,{EE733AA7-7FC2-4E69-888B-DF7ECFB64C35},274752.161091,1046626000.0,,Bastogne,,"POLYGON Z ((735178.278 562794.608 0.000, 73517..."
2,2007-01-05,83000,{A6C31231-3E61-43CE-84E2-1438698DC257},215795.867725,958051400.0,,Marche-en-Famenne,,"POLYGON Z ((720428.942 587448.112 0.000, 72042..."
3,2007-01-05,84000,{897E2BFE-980D-496E-881E-2ED9ABE005F5},277555.66142,1358428000.0,,Neufchâteau,,"POLYGON Z ((739947.421 559047.830 0.000, 73994..."
4,2007-01-05,85000,{EDA0B812-D16F-4395-80B2-0189F6AF09F5},206847.516841,777271100.0,,Virton,,"POLYGON Z ((746339.083 526486.275 0.000, 74634..."
5,2007-01-05,91000,{2E974C14-FD3B-407F-884D-479E8F93E9CD},372501.807316,1596279000.0,,Dinant,,"POLYGON Z ((693254.270 554921.425 0.000, 69322..."
6,2007-01-05,92000,{51EBBE59-3B18-4D17-BC43-89B4365D9EE0},251656.364154,1167705000.0,,Namur,,"POLYGON Z ((701084.688 616627.510 0.000, 70107..."
7,2007-01-05,93000,{A89A0808-863F-4357-B492-15A0BED55A66},195937.951463,910837900.0,,Philippeville,,"POLYGON Z ((671144.160 575881.570 0.000, 67113..."
8,2007-01-05,21000,{1AC6671D-002E-4412-B042-71527AA159D9},72160.956955,162419900.0,Brussel-Hoofdstad,Bruxelles-Capitale,,"POLYGON Z ((644970.934 666620.967 0.000, 64496..."
9,2007-01-05,11000,{0876A7BC-F0C1-4719-816A-44AC9D669ABF},184480.414202,1004125000.0,Antwerpen,,,"POLYGON Z ((655739.519 695716.868 0.000, 65571..."


In [79]:
agg = 'mun'
for nis in population[agg].index:
    municipalities.loc[municipalities['NISCode'] == str(nis),'Population'] = int(population[agg].loc[nis, 'total'])
municipalities.head()

Unnamed: 0,ModifDate,City,LanguageSt,NISCode,DistrictCa,ProvinceCa,RegionCapi,CountryCap,tgid,Shape_Leng,Shape_Area,NameDut,NameFre,NameGer,geometry,Population
0,2007-01-05,1,2,83034,1,0,0,0,{D0109DA3-34B8-46C0-92B8-BE65085732FB},85650.096718,122072300.0,,Marche-en-Famenne,,"POLYGON Z ((726984.424 600113.847 0.000, 72699...",17591.0
1,2007-01-05,1,1,23027,0,0,0,0,{2E6FAFC1-84DF-472E-AA01-3499B29786C5},45798.27663,44982130.0,Halle,,,"POLYGON Z ((643229.384 658679.116 0.000, 64323...",40182.0
2,2007-01-05,2,2,91103,0,0,0,0,{26279847-D2CA-49E1-9EAE-0D8595F62FA7},56990.07356,65658760.0,,Onhaye,,"POLYGON Z ((679740.122 601677.244 0.000, 67975...",3230.0
3,2007-01-05,1,2,91034,1,0,0,0,{CD6CED5D-6CD9-4446-B643-B8CDAABAC8BA},71326.001927,100056000.0,,Dinant,,"POLYGON Z ((692802.454 600840.203 0.000, 69279...",13374.0
4,2007-01-05,2,2,82037,0,0,0,0,{92E40A08-FC1A-44D5-94E3-D4D494FFB7C9},76994.149941,165360000.0,,Gouvy,,"POLYGON Z ((768030.137 598110.413 0.000, 76801...",5397.0


In [80]:
from mpl_toolkits.axes_grid1 import make_axes_locatable # for plot aesthetics
from matplotlib import colors

# Make figure
fig, ax = plt.subplots(figsize = (8,7)) # figsize is given in inches (width, height)
cax = make_axes_locatable(ax).append_axes("right", size="5%", pad=0.1) # Legend properties
ax.set_axis_off()
cmap='cividis'

vmin = min(municipalities['Population'])
vmax = max(municipalities['Population'])

fig = municipalities.plot(column='Population', ax=ax, cmap=cmap,
                            legend=True, edgecolor = 'gray', norm=colors.SymLogNorm(linthresh=100, linscale=1, vmin=vmin, vmax=vmax), cax=cax)

textbox = 'Population in municipalities'
plt.figtext(.11, .25, textbox, backgroundcolor='whitesmoke', fontfamily='monospace', fontsize=14)
# plt.close()

# (Create directory and) save figure
path = '../figures/maps/provinces/time_series_tests/'
chart = fig.get_figure()
# chart.savefig('time-delays_arr_to_' + arr + '.jpg' ,dpi=50, bbox_inches='tight')

[Figure: map of Belgian municipalities coloured by population (symmetric log colour scale, cividis colour map).]

## Explore other ways of analysing mobility data

We want to eventually have a mobility matrix $P$ with elements $P^{gh}$ that show the fraction of the population living in $g$ that are basically spending most of their day in region $h$. With the StatBel data this was rather easily achieved, because a person in region $g$ was simply registered as working in region $h$.

Example: if there are 1000 people in region 1, and 600 of these work in region 1, 300 in region 2 and 100 in region 3, this works out as simple fractions $P^{11} = 0.6, P^{12}=0.3, P^{13}=0.1$. Because a person can only go to *one* place, the sum $\sum_{h=1}^G P^{gh}$ will always be unity.
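
The same example can be written out in a few lines of Python. The variable names are hypothetical and only serve this illustration.

```python
# Commuter counts from the example: destinations of the 1000 inhabitants of region 1
commuters_from_1 = {1: 600, 2: 300, 3: 100}

# Each fraction P^{1h} is the count for destination h divided by the row total
total = sum(commuters_from_1.values())
P_row = {h: n / total for h, n in commuters_from_1.items()}

# Every person has exactly one destination, so the row sums to unity
row_sum = sum(P_row.values())
```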

The situation is not so easy for the Proximus data, because:
1. People that don't move all day are not registered at all
2. People can visit multiple places


This means that the total number of visits is not conserved, as it is dependent on the movement of the people. This calls for a different approach that is more complex but certainly also more informative - and an approach that we luckily have data for! Because the total *time* people have to spend certainly *is* conserved.

For every postal code, the number of users is registered (`imsisinpostalcode`). PC 2000 (Antwerp) on 16 February 2020, for example, has 16084 registered users (that is to say: 16084 users have PC 2000 as their most likely living place). That means they have a total of $16084 \times 24 \times 60 \times 60 = 1389657600$ seconds to spend that day. Only $1209790773$ of these are registered, which means that $179866827$ seconds have gone unregistered. These seconds are distributed over two classes:
1. Inhabitants of PC 2000 who travel too fast (no connection over 15 minutes can be established)
2. Inhabitants of PC 2000 who have not been connected to a different transmission tower all day (**it works like that for visits, but does it also work like that for estimated staytime? Check this with Gerdy**)

As it is impossible to track the first class, we choose to err on the side of *underestimating* the overall mobility by assuming that all missing seconds go into the second class. Total time spent in PC 2000 is therefore the `est_staytime` added to the calculated missing seconds. In general the `est_staytime` is much larger than the missing seconds; in this case the missing time only constitutes 16.4 percent of the total time spent in PC 2000.

Note that this choice may be altered. We may assume for example that it *only* consists of people that are travelling too fast, and therefore only undergo $N_{c,\text{transport}}$ contact.
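
The arithmetic for PC 2000 can be reproduced as follows. The numbers are the ones quoted above and in the output of the next cell; the variable names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

users = 16084                    # imsisinpostalcode for PC 2000
total_est_staytime = 1209790773  # all registered seconds of these users
est_staytime_home = 913765475    # registered seconds spent in PC 2000 itself

# Seconds available to all users minus the registered seconds
available = users * SECONDS_PER_DAY
missing = available - total_est_staytime

# Assumption made in the text: all missing seconds are spent in the home PC
time_in_home = est_staytime_home + missing
missing_share = missing / time_in_home
print(f"missing seconds: {missing}, share of home time: {100 * missing_share:.1f}%")
```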

In [18]:
import pandas as pd
datafile_name = "../../data/raw/mobility/proximus/outputPROXIMUS122747corona20200216AZUREREF001.csv"
raw_mob = pd.read_csv(datafile_name, sep=';', decimal=',',
                      dtype={'mllp_postalcode': str,
                             'postalcode': str,
                             'imsisinpostalcode': int,
                             'habitants': float,  # float, as shown in the output below
                             'nrofimsi': int,
                             'visitors': int,
                             'est_staytime': int,
                             'total_est_staytime': int,
                             'est_staytime_perc': float})
raw_mob2000 = raw_mob[raw_mob['mllp_postalcode']=='2000']
raw_mob2000[raw_mob2000['postalcode']=='2000']

Unnamed: 0,mllp_postalcode,postalcode,imsisinpostalcode,habitants,nrofimsi,visitors,est_staytime,total_est_staytime,est_staytime_perc
50479,2000,2000,16084,62398.0,15018,55266,913765475,1209790773,75.53087


In [21]:
# Note: the total_est_staytime also includes the times that are masked due to GDPR protocol

raw_mob2000.loc[raw_mob2000['nrofimsi']>=0, 'est_staytime'].sum()

1149114460

In [30]:
# The total percentage of time spent does not add up to 100, due to masked data.

raw_mob2000[raw_mob2000['nrofimsi']>=0]['est_staytime_perc'].sum()

94.98456315199999

**Translating time data to a mobility matrix**

Next we want to use these data to define the mobility matrix $P$, consisting of elements $P^{gh} \leq 1$ that denote the fraction of time spent. Neither the model nor the data can fully take into account that some people visit multiple places every day, nor is this really necessary, as we are working with a metapopulation model anyway. The mobility matrix is therefore calculated as a matrix of fractions calculated from the time spent from place $g$ at a certain place $h$, divided by the total time available to all inhabitants of place $g$.

We must also consider that
1. People typically sleep at home, give or take 8 hours a night. We may assume that in general people are not contagious at night for all practical purposes. Yet, these 8 hours are counted as seconds spent in the home patch. To get a better picture of how much time people are socially active in their home patch, we choose to subtract the equivalent of 8 hours per person from the time spent in the home patch.
2. Many people disconnect their phone at night (by turning off their phone or enabling flight mode). When they reconnect, this is counted as a visit to the home patch. We may generally presume that people turn off their phone before midnight and turn it back on in the morning, so the 'new visit' is counted every morning. This doesn't really matter, because in this approach we are concerned with the total time spent, not the absolute visit count. Whether this time is counted as "time in the home patch" or as "missing time" (**not sure which it is, perhaps ask this?**), it ends up being counted in fraction $P^{gg}$ anyway.
3. How do we handle -1 values?
    - The total time is saved, luckily, but if `nrofimsi` is -1, so are `visitors`, `est_staytime` and `est_staytime_perc`. We know how much time is spent in total, so one way to go about this would be to distribute this time equally over all destination postal codes. An obvious problem is that this is certainly not realistic, as it overestimates visits far away and underestimates visits close by. This would be an important flaw in the entire spatial approach. **Hopefully we can get our hands on data at arrondissement level, such that this problem is circumvented**
    - For a number of postal codes, `imsisinpostalcode` is also GDPR-protected. We may circumvent this problem by estimating the population from the total estimated staytime, combined with the average ratio between total staytime and number of users in PCs that *do* have `imsisinpostalcode` data. Again, this is an issue and a source of uncertainty that is ideally avoided when we have **arrondissement-level data**.
4. Importantly: this percentage is already quoted in the raw mobility data (as `est_staytime_perc`). *But* the definition of this quantity may be too simplistic (`est_staytime`/`total_est_staytime` $\times 100\%$). Still, it is probably of interest to inspect this quantity as well.
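
The equal-distribution option from point 3 can be sketched as follows. The function `fill_masked_staytime` and the toy row are hypothetical, and the sketch assumes that the masked time is spread only over the destinations that are listed with a -1 value.

```python
def fill_masked_staytime(est_staytime, total_est_staytime):
    """Spread the masked time equally over the masked destinations.

    est_staytime: dict destination -> seconds, where -1 marks a GDPR-masked value.
    total_est_staytime: total registered seconds, including the masked ones.
    """
    visible = sum(t for t in est_staytime.values() if t >= 0)
    masked = [pc for pc, t in est_staytime.items() if t < 0]
    if not masked:
        return dict(est_staytime)
    # Time that is known to be spent somewhere, but not where
    share = (total_est_staytime - visible) / len(masked)
    return {pc: (share if t < 0 else t) for pc, t in est_staytime.items()}

# Toy row with hypothetical values: two visible and two masked destinations
row = {"2000": 900, "2018": 60, "9000": -1, "6700": -1}
filled = fill_masked_staytime(row, total_est_staytime=1000)
```

In this toy row both masked destinations receive 20 seconds regardless of distance, which illustrates the flaw mentioned in point 3.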

Mathematically, if $T^g$ is the number of people registered in the PC, and $t^{gh}$ the time spent in $h \neq g$, we can define the mobility matrix:

$$
P^{gh} = \frac{t^{gh}}{T^g \times 16 \times 60 \times 60}
$$

Except when $g = h$, then

$$
P^{gg} = \frac{(t^{gg} + \Delta t - T^g \times 8 \times 60 \times 60)}{T^g \times 16 \times 60 \times 60}
$$

where $\Delta t$ are the 'missing seconds'. The row sum $\sum_{h=1}^G P^{gh}$ equals unity, because

$$
\sum_{h=1}^G t^{gh} = T^g \times 24 \times 60 \times 60 - \Delta t
$$
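
A minimal numerical check of these formulas is sketched below. The function `mobility_row` and the toy numbers are hypothetical; the 8 hours of sleep are taken from the text.

```python
def mobility_row(t_row, home, T_g, sleep_hours=8):
    """Compute the row P^{gh} for home patch g.

    t_row: dict destination -> registered seconds spent by inhabitants of `home`.
    T_g: number of users registered in the home patch.
    """
    day = 24 * 3600
    awake = (24 - sleep_hours) * 3600
    # Missing seconds: available time minus registered time
    delta_t = T_g * day - sum(t_row.values())
    # Off-diagonal elements: registered time over total waking time
    P = {h: t / (T_g * awake) for h, t in t_row.items() if h != home}
    # Diagonal element: home time plus missing time, minus the hours asleep
    P[home] = (t_row.get(home, 0) + delta_t - T_g * sleep_hours * 3600) / (T_g * awake)
    return P

# Toy example (hypothetical values): 10 users living in patch "A"
T_g = 10
t_row = {"A": 600000, "B": 150000, "C": 50000}
P = mobility_row(t_row, home="A", T_g=T_g)
row_sum = sum(P.values())
```

Here `row_sum` equals 1 up to floating-point rounding, as the derivation predicts.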

Below are the steps to get from the raw data to the mobility matrix $P$.

**Problems**
1. Some postal codes have -1 users (`imsisinpostalcode` is GDPR-protected): this can be fixed by comparing `total_est_staytime` with `imsisinpostalcode` for postal codes where both *are* available, and using the median ratio to estimate `imsisinpostalcode` for the GDPR-protected postal codes
2. Some postal codes are not mentioned because they have no visitors at all. These are added at the very end with an `est_staytime` of zero. As we aggregate on arrondissement level, the effect of this lack of data is negligible.
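
The second point amounts to padding the matrix with zero rows and columns. A minimal sketch of that idea, assuming a hypothetical helper `pad_missing_codes` (the notebook's own `fill_missing_pc` is defined elsewhere and may work differently) and a made-up list of postal codes:

```python
import pandas as pd

def pad_missing_codes(matrix, home_codes, visited_codes):
    """Hypothetical helper: add absent home/visited codes with a staytime of zero."""
    return matrix.reindex(index=home_codes, columns=visited_codes, fill_value=0.0)

# Toy staytime matrix in which postal code 3717 is absent (values are made up)
raw = pd.DataFrame([[5.0, 1.0],
                    [2.0, 7.0]],
                   index=["1000", "1020"], columns=["1000", "1020"])

all_codes = ["1000", "1020", "3717"]
padded = pad_missing_codes(raw, home_codes=all_codes, visited_codes=all_codes)
print(padded)
```

Existing entries are left untouched; only the absent row and column are created and filled with zeros.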

In [332]:
# Load the datafile
datum='20210127'
location='../../data/raw/mobility/proximus/'
unprocessed_data = load_datafile_proximus(datum, location)
unprocessed_data

Unnamed: 0,mllp_postalcode,postalcode,imsisinpostalcode,habitants,nrofimsi,visitors,est_staytime,total_est_staytime,est_staytime_perc
0,1000,1000,14993.0,52832.0,14250.0,46591.0,9.039503e+08,1.128257e+09,80.119170
1,1000,1020,14993.0,52832.0,373.0,1220.0,4.305160e+06,1.128257e+09,0.381576
2,1000,1030,14993.0,52832.0,606.0,1981.0,7.739741e+06,1.128257e+09,0.685991
3,1000,1040,14993.0,52832.0,949.0,3103.0,1.388068e+07,1.128257e+09,1.230276
4,1000,1050,14993.0,52832.0,1312.0,4290.0,1.691382e+07,1.128257e+09,1.499110
...,...,...,...,...,...,...,...,...,...
291501,Foreigner,9988,381744.0,,220.0,560.0,4.618521e+06,1.821687e+10,0.025353
291502,Foreigner,9990,381744.0,,564.0,1485.0,9.343111e+06,1.821687e+10,0.051288
291503,Foreigner,9991,381744.0,,305.0,798.0,4.450126e+06,1.821687e+10,0.024429
291504,Foreigner,9992,381744.0,,117.0,262.0,3.428636e+06,1.821687e+10,0.018821


In [333]:
# First we estimate the number of people that live in a certain PC if this information is not provided
# because of GDPR protection. Luckily total_est_staytime *is* always provided.

clients_per_pc = pd.pivot_table(unprocessed_data, index='mllp_postalcode', values=['imsisinpostalcode', 'total_est_staytime'], aggfunc='first')

# Take an explicit copy, so that adding a column does not trigger a SettingWithCopyWarning
clients_per_pc_available = clients_per_pc.loc[clients_per_pc['imsisinpostalcode']>0].copy()
clients_per_pc_available['people_per_staytime'] = clients_per_pc_available['imsisinpostalcode'] \
    / clients_per_pc_available['total_est_staytime']

# calculate median people per staytime
median_people_per_staytime = clients_per_pc_available['people_per_staytime'].median()
print("Median number of people per staytime:", median_people_per_staytime)

# Change negative values with estimated number of people living somewhere
clients_per_pc.loc[clients_per_pc['imsisinpostalcode']<0, 'imsisinpostalcode'] \
    = clients_per_pc.loc[clients_per_pc['imsisinpostalcode']<0, 'total_est_staytime'] * median_people_per_staytime

# now there are values <=30, giving us an estimate of a value that was previously unknown due to GDPR protection
clients_per_pc.loc[clients_per_pc['imsisinpostalcode']<=30]

Median number of people per staytime: 1.2982857899531083e-05


Unnamed: 0_level_0,imsisinpostalcode,total_est_staytime
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1
1473,20.897481,1609621.0
3051,4.382247,337541.0
3052,11.028808,849490.0
3473,10.73424,826801.0
3501,4.14431,319214.0
3721,7.555517,581961.0
3792,2.181704,168045.0
4252,1.088405,83834.0
4263,0.959368,73895.0
4672,1.574613,121284.0


In [334]:
# We use the number of people to estimate the number of seconds people are at home without having been registered
# These seconds (denoted \Delta t in the theory) may be added to the time spent for people 'travelling' to their own home

# Available time in seconds
clients_per_pc['total_available_time'] = clients_per_pc['imsisinpostalcode'] * 24 * 60 * 60
clients_per_pc['missing_seconds'] = clients_per_pc['total_available_time'] - clients_per_pc['total_est_staytime']

missing_seconds_per_pc = clients_per_pc.copy()
missing_seconds_per_pc
# missing_seconds_per_pc[missing_seconds_per_pc['missing_seconds']<0] # This should be empty, which it is

Unnamed: 0_level_0,imsisinpostalcode,total_est_staytime,total_available_time,missing_seconds
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1000,14993.0,1.128257e+09,1.295395e+09,1.671380e+08
1020,14024.0,1.063961e+09,1.211674e+09,1.477128e+08
1030,28934.0,2.178686e+09,2.499898e+09,3.212113e+08
1040,22926.0,1.726721e+09,1.980806e+09,2.540856e+08
1050,35097.0,2.628077e+09,3.032381e+09,4.043036e+08
...,...,...,...,...
9988,788.0,6.138766e+07,6.808320e+07,6.695535e+06
9990,4953.0,3.871779e+08,4.279392e+08,4.076134e+07
9991,2767.0,2.153407e+08,2.390688e+08,2.372807e+07
9992,287.0,2.130139e+07,2.479680e+07,3.495406e+06


That is **the first type of information** that we needed. We'll save it for when we have a grid of staytimes, to either add it to the rest of the `est_staytime`$^{gg}$, or to keep it aside (because this behaviour may imply different social interaction). **Note**, by the way, that for foreigners the `missing_seconds` are of the same order as the `total_est_staytime`: for the date loaded above they make up roughly 45% of the available time, against roughly 13% for PC 1000. This indicates that most foreigners spend only a fraction of their day connected for more than 15 minutes to a particular tower of the Belgian telecom grid (e.g. truckers that only briefly stop along the way but are counted as individual users).

Next we want to find an acceptable filler for the -1 values in the `est_staytime` slots that are protected by GDPR.

*pseudocode*
- calculate difference between `total_est_staytime` and the sum of all `est_staytime`s for a particular PC
- For every `mllp_postalcode`, calculate the number of times -1 occurs
- Divide the time difference by this number
- For this `mllp_postalcode`, change the -1 value by the newly calculated number
- Go to next `mllp_postalcode`

When all that is done, the first 'clean' has happened. Then we may proceed to aggregate.
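
To make the pseudocode concrete, here is a minimal sketch on a made-up 3x3 matrix (home PCs as rows, visited PCs as columns, -1 for protected entries). The toy values and the variable names `protected`, `known_time`, `n_protected`, `filler` and `filled` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy staytimes in seconds; -1 marks a GDPR-protected entry
staytime = pd.DataFrame(
    [[900.0, -1.0, -1.0],
     [-1.0, 500.0, 40.0],
     [10.0, 20.0, 300.0]],
    index=["A", "B", "C"], columns=["A", "B", "C"])
total_est_staytime = pd.Series({"A": 1000.0, "B": 600.0, "C": 330.0})

# Where are the protected entries, and how many per home PC?
protected = staytime.eq(-1)
n_protected = protected.sum(axis=1)

# Time that is accounted for by the non-protected entries
known_time = staytime.where(~protected, 0.0).sum(axis=1)

# Unaccounted time, shared equally over the protected entries of each row
# (rows without protected entries get NaN here, which is never used)
filler = (total_est_staytime - known_time) / n_protected.where(n_protected > 0)

# Replace every -1 by the filler value of its row
filled = staytime.mask(protected, filler, axis=0)
print(filled)
```

After filling, every row sums to its `total_est_staytime` again. The even split remains a crude assumption, for the reasons given earlier.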

In [340]:
# Simple "raw" grid of "people from g spend x time in h". 0 values means none are registered, -1 means it is protected
staytime_matrix = pd.pivot_table(unprocessed_data, index='mllp_postalcode', columns='postalcode', values='est_staytime').fillna(value=0)

# Total non-GDPR protected time per PC (should be smaller than total_est_staytime)
est_staytime_noGDPR = staytime_matrix[staytime_matrix>0].fillna(value=0).sum(axis=1)

# Number of times -1 occurs per PC (i.e. number of postal codes for which data is protected)
# Note: this is a lot! The maximum is 700 (which is more than half of all 1148 PCs)
small_number_freq = (staytime_matrix==-1).sum(axis=1)

# Average time difference per GDPR-protected postalcode, between total registered time and the sum of the individually registered times
est_staytime_GDPR = (missing_seconds_per_pc['total_est_staytime'] - est_staytime_noGDPR) / small_number_freq

# For most mllp_postalcode, this is a moderate value (of the order of half a day to a day, expressed in seconds)
# For foreigners it is of the order of a few days. Reasoning:
#   1. The registered total_est_staytime is large
#   2. The est_staytimes that are registered are low (just over threshold)
# Again this is due to the nature of foreigners visiting Belgium: they travel around a lot (truckers) and typically don't
# stay at a particular place for a long time. It would be nice to have an actual source for this rather than a reasoning
est_staytime_GDPR 

mllp_postalcode
1000          63221.349073
1020          53622.230216
1030          71988.330946
1040          76797.535484
1050          88932.543885
                 ...      
9988          46109.561905
9990          47991.168350
9991          38209.456140
9992          52205.828125
Foreigner    279281.846939
Length: 1126, dtype: float64

In [364]:
# Now we add the first and the second "extra time" to the proper place such that we can aggregate in the next stage

mmprox_staytime = pd.pivot_table(unprocessed_data, index='mllp_postalcode', columns='postalcode', \
                                 values='est_staytime').fillna(value=0)
for pc in mmprox_staytime.index:
    if pc != 'Foreigner':
        mmprox_staytime.loc[pc, pc] += missing_seconds_per_pc.loc[pc, 'missing_seconds']
    else:
        mmprox_staytime.loc[pc, 'ABROAD'] += missing_seconds_per_pc.loc[pc, 'missing_seconds']

# ... and change the -1 values to the estimated values. (Again: evenly distributing the time
# over all protected postal codes is arguably not the best way to go about this ...)

for pc in mmprox_staytime.index:
    mmprox_staytime.loc[pc, mmprox_staytime.loc[pc]<0] = est_staytime_GDPR[pc]
    
mmprox_staytime.head()

postalcode,1000,1020,1030,1040,1050,1060,1070,1080,1081,1082,...,9970,9971,9980,9981,9982,9988,9990,9991,9992,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1000,1071088000.0,4305160.0,7739741.0,13880680.0,16913820.0,27825344.0,18735826.0,21182361.0,1187681.0,1127368.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6987016.0
1020,14191170.0,979405457.0,11075020.0,5630825.0,6477743.0,2731249.0,9533940.0,7905203.0,1156584.0,2690473.0,...,0.0,0.0,0.0,0.0,0.0,0.0,53622.230216,53622.230216,0.0,4357490.0
1030,27820330.0,11496506.0,2083224000.0,34390110.0,20434600.0,6233719.0,9671119.0,7822761.0,1349087.0,2387922.0,...,0.0,0.0,0.0,0.0,0.0,0.0,71988.330946,0.0,0.0,10408673.0
1040,19670360.0,3062275.0,31622370.0,1653443000.0,63950680.0,5696658.0,5936898.0,4079135.0,757054.0,1119099.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12538043.0
1050,34928660.0,3436482.0,11595460.0,62322040.0,2564538000.0,22209525.0,11118688.0,5054392.0,1066980.0,1327043.0,...,88932.543885,0.0,0.0,0.0,0.0,0.0,88932.543885,0.0,0.0,26944174.0


In [370]:
# Next we may continue with the already defined clean (add additional postal codes) and aggregation
mmprox_staytime_mun = mm_aggregate(fill_missing_pc(mmprox_staytime), agg='mun')
mmprox_staytime_mun.head()

postalcode,11001,11002,11004,11005,11007,11008,11009,11013,11016,11018,...,92141,92142,93010,93014,93018,93022,93056,93088,93090,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11001,349477300.0,21249830.0,82434.43,5912289.0,41217.21,41217.21,41217.21,2117607.0,41217.214286,1033245.0,...,0.0,41217.214286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,41217.21
11002,14182720.0,9662644000.0,5738566.0,5430597.0,15897510.0,41747780.0,6958789.0,42223490.0,770540.571341,8381995.0,...,236638.170586,512262.237296,0.0,222966.752868,51750.117978,254385.755109,100857.191695,0.0,178206.015954,32584730.0
11004,539884.8,12330250.0,315454100.0,90573.91,1732551.0,90573.91,90573.91,1791637.0,46005.149573,90573.91,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,815991.8
11005,6900662.0,12728360.0,115987.1,391707600.0,57993.55,57993.55,57993.55,872491.0,0.0,1109016.0,...,0.0,57993.547619,0.0,0.0,0.0,0.0,57993.547619,57993.547619,0.0,1284571.0
11007,51480.48,23953270.0,1449894.0,51480.48,180527600.0,51480.48,51480.48,428623.0,51480.483871,51480.48,...,0.0,51480.483871,0.0,0.0,0.0,0.0,0.0,0.0,0.0,51480.48


In [377]:
mmprox_staytime_mun.div(mmprox_staytime_mun.sum(axis=1), axis=0)

postalcode,11001,11002,11004,11005,11007,11008,11009,11013,11016,11018,...,92141,92142,93010,93014,93018,93022,93056,93088,93090,ABROAD
mllp_postalcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11001,0.844442,0.051346,0.000199,0.014286,0.000100,0.000100,0.000100,0.005117,0.000100,0.002497,...,0.000000,0.000100,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000100
11002,0.001351,0.920765,0.000547,0.000517,0.001515,0.003978,0.000663,0.004024,0.000073,0.000799,...,0.000023,0.000049,0.000000,0.000021,0.000005,0.000024,0.000010,0.000000,0.000017,0.003105
11004,0.001411,0.032237,0.824732,0.000237,0.004530,0.000237,0.000237,0.004684,0.000120,0.000237,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.002133
11005,0.014567,0.026868,0.000245,0.826856,0.000122,0.000122,0.000122,0.001842,0.000000,0.002341,...,0.000000,0.000122,0.000000,0.000000,0.000000,0.000000,0.000122,0.000122,0.000000,0.002712
11007,0.000221,0.102680,0.006215,0.000221,0.773867,0.000221,0.000221,0.001837,0.000221,0.000221,...,0.000000,0.000221,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000221
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93022,0.000000,0.000359,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000344,0.000450,0.000225,0.001899,0.000893,0.839104,0.010882,0.017972,0.000225,0.000225
93056,0.000000,0.000144,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000144,0.000287,0.003707,0.026068,0.016627,0.011593,0.839493,0.010766,0.005762,0.003771
93088,0.000000,0.000212,0.000071,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000281,0.000351,0.003534,0.002714,0.000141,0.012550,0.010911,0.830042,0.000141,0.002466
93090,0.000000,0.000775,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000387,0.000581,0.000194,0.048522,0.005140,0.003313,0.017502,0.000387,0.844508,0.006572
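
A short sketch of how one might check the row normalisation used in the cell above, on made-up numbers (the region codes are only labels here and the values are not Proximus data). The names `row_totals`, `fractions` and `max_deviation` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical aggregated staytimes in seconds (rows: home region, columns: visited region)
staytime = pd.DataFrame(
    [[8.0e6, 1.5e6, 5.0e5],
     [1.0e6, 6.0e6, 1.0e6],
     [2.0e5, 3.0e5, 4.5e6]],
    index=["11001", "11002", "11004"], columns=["11001", "11002", "11004"])

# Divide every row by its own total, so that each row holds fractions of time
row_totals = staytime.sum(axis=1)
fractions = staytime.div(row_totals, axis=0)

# Largest deviation of any row sum from one; should be at floating-point level
max_deviation = (fractions.sum(axis=1) - 1.0).abs().max()
print(fractions)
print("Largest deviation from unity:", max_deviation)
```

Note that a home region with a total staytime of zero would give 0/0 and hence `NaN` fractions, so such rows need to be handled before normalising.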


In [None]:
def load_mobility_timefractions(dates, data_location, complete=False, verbose=True, return_missing=False):
    """
    Load Proximus mobility data (number of visitors or visitor time) corresponding to the requested dates
    
    Input
    -----
    dates: str or list of str
        Single date in YYYYMMDD form or list of these (output of make_date_list function): requested date(s)
    data_location: str
        Name of directory (relative or absolute) that contains all Proximus data files
    complete: boolean
        If True, this function raises an exception when 'dates' contains a date that does not correspond to a data file.
    verbose: boolean
        If True, print statement every time data for a date is loaded.
    return_missing: boolean
        If True, return array of missing dates in form YYYYMMDD as second return. False by default.
    
    Returns
    -------
    mmprox_dict: dict of pandas DataFrames
        Dictionary with YYYYMMDD dates as keys and pandas DataFrames with fractions of total time spent.
    """
    # Check dates type and change to single-element list if needed
    single_date = False
    if isinstance(dates,str):
        dates = [dates]
        single_date = True
    
    missing = check_missing_dates(dates, data_location)
    load_dates = set(dates).difference(missing)
    dates_left = len(load_dates)
    if dates_left == 0:
        raise Exception("None of the requested dates correspond to a Proximus mobility file.")
    if missing != set():
        print(f"Warning: some or all of the requested dates do not correspond to Proximus data. Dates: {sorted(missing)}")
        if complete:
            raise Exception("Some requested data is not found amongst the Proximus files. Set 'complete' parameter to 'False' and rerun if you wish to proceed with an incomplete data set (not using all requested data).")
        print(f"... proceeding with {dates_left} dates.")

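    # Initiate dict for remaining dates
    mmprox_dict = {}
    load_dates = sorted(load_dates)
    # 'values' was never defined in this draft, which raises a NameError.
    # Assumption: the column needed for time fractions is 'est_staytime'.
    # The -1 entries (GDPR-protected) are kept as they are, so that they can still be filled in.
    values = 'est_staytime'
    # Draft: the missing seconds, the -1 filling and the row normalisation from the cells above
    # still need to be applied here before the output really contains fractions.
    for date in load_dates:
        datafile = load_datafile_proximus(date, data_location)
        mmprox_temp = datafile[['mllp_postalcode', 'postalcode', values]]
        mmprox_temp = mmprox_temp.pivot_table(values=values,
                                              index='mllp_postalcode',
                                              columns='postalcode')
        mmprox_temp = mmprox_temp.fillna(value=0)
        mmprox_dict[date] = mmprox_temp.convert_dtypes()
        if verbose:
            print(f"Loaded dataframe for date {date}.    ", end='\r')
    if verbose:
        print(f"Loaded dataframe for date {date}.")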
    
    if not return_missing:
        return mmprox_dict
    else:
        return mmprox_dict, sorted(missing)

In [49]:
# This shows all places where `imsisinpostalcode` is <=30 (GDPR-protected, stored as -1)
raw_mob_low_mllp = pd.pivot_table(raw_mob, index='mllp_postalcode', values='imsisinpostalcode', aggfunc='first')
raw_mob_low_mllp[raw_mob_low_mllp['imsisinpostalcode']<=30]

Unnamed: 0_level_0,imsisinpostalcode
mllp_postalcode,Unnamed: 1_level_1
1473,-1
3051,-1
3052,-1
3473,-1
3501,-1
3721,-1
3792,-1
4252,-1
4263,-1
4672,-1


In [77]:
# Here are entries with negative total_est_staytime (buggy entries)
raw_mob_neg_staytime = pd.pivot_table(raw_mob, index='mllp_postalcode', values='total_est_staytime')
raw_mob_neg_staytime[raw_mob_neg_staytime['total_est_staytime']<0]

Unnamed: 0_level_0,total_est_staytime
mllp_postalcode,Unnamed: 1_level_1
1030,2116281040
1050,1666890116
1070,1844333656
1180,1782237825
4000,2081928297
9000,1485889850
