**CONTENT**

The calibration process is slightly different for the spatial model. This notebook first explores the challenges and options for calibrating the spatial model. It then serves as a scratch environment for writing new functions, if need be.

**OPEN TASKS**

*short term*  
1. Think of another way to downscale the number of contacts (currently done with the `prevention` parameter). Or, as Tijs puts it (translated from Dutch):
>these 'downgrade' the contacts because only 1 in 3 contacts appears to lead to transmission. I want to get rid of this in a next model version, because it also contains the effect of the weather etc. If you can think of another solution, it is more than welcome. I was thinking of, for example, playing with the contact intensities in the models (e.g. calibrating only on contacts of 5 minutes or longer) or, during the lockdown, dropping all contacts shorter than 1 h. That option is available in the contact matrix loading function.
2. Aggregate Belgian arrondissements in three regions: metropolitan, urban and rural
3. 

*long term*  
1. test
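
A minimal sketch for short-term task 2, assuming the aggregation amounts to summing per-age-group population counts over arrondissements that share a region label. The unit ids, the numbers and the classification into metropolitan, urban and rural are all made up for illustration; the real classification still has to be chosen.

```python
def aggregate_by_region(pop_by_unit, unit_to_region):
    """Sum per-age-group populations of spatial units that share a region label.

    pop_by_unit:    dict mapping unit id -> list of population counts per age group
    unit_to_region: dict mapping unit id -> region label (hypothetical classification)
    """
    totals = {}
    for unit, counts in pop_by_unit.items():
        region = unit_to_region[unit]
        if region not in totals:
            totals[region] = [0] * len(counts)
        # Element-wise sum over age groups
        totals[region] = [a + b for a, b in zip(totals[region], counts)]
    return totals


# Made-up numbers: four fictional units, three age groups
pop_by_unit = {
    'A': [100, 200, 50],
    'B': [10, 20, 5],
    'C': [30, 40, 10],
    'D': [1, 2, 3],
}
# Hypothetical assignment of units to the three region types
unit_to_region = {'A': 'metropolitan', 'B': 'rural', 'C': 'urban', 'D': 'rural'}

aggregated = aggregate_by_region(pop_by_unit, unit_to_region)
print(aggregated)
```

Only population counts are aggregated here; how the mobility matrix should be aggregated is left open.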

**OPEN QUESTIONS**

1. What are the parameters `sigma_H_in` and `extraTime` that the MCMC is running over?
2. What does `emcee.EnsembleSampler` do exactly? Reference: [this link](https://emcee.readthedocs.io/en/stable/user/sampler/)
3. 

# Load packages

In [2]:
# Established packages
import os
import numpy as np
import pandas as pd
import geopandas as gp
import datetime
import math
import xarray as xr # labels in the form of dimensions, coordinates and attributes
import matplotlib.pyplot as plt
import zarr

# Custom package covid19model
from covid19model.models import models
from covid19model.models.utils import name2nis, social_policy_func, save_sim, open_sim
from covid19model.data import model_parameters #, sciensano, google
from covid19model.visualization.output import population_status, infected, show_map, show_graphs

# Download function for complete calibration
from covid19model.optimization.run_optimization import full_calibration


# OPTIONAL: Load the "autoreload" extension so that package code can change
%load_ext autoreload
# OPTIONAL: always reload modules so that as you change code in src, it gets loaded
# This may be useful because the `covid19model` package is under construction
%autoreload 2

# Explore `0.1-twallema-calibration-stochastic.ipynb`

*Steps*  
1. Hospitalisation data from 15-21 March is taken
2. The parameters being varied are `sigma_H_in`, `extraTime` and `beta`. `sigma_H_in` is the uncertainty on `H_in`; `extraTime` is the time between the initialisation of the simulation and 
3. Bounds for flat priors are given
4. Particle Swarm Optimisation (`MCMC.fit_pso`) is executed to find the maximum likelihood estimates of model parameters mentioned above
5. The resulting `extraTime` is added as a model attribute, and the other parameter values are used as initial values for the MCMC (typically four slightly different initial points, one per walker, `nwalkers`). The MCMC runs in two dimensions with four walkers (paths)
6. The sampler used is the ensemble sampler from the `emcee` package. I'm not entirely sure what this means
7. It appears that the beta parameter is *first* calibrated, and only *later on* the compliance parameters. The reason is that beta is best calibrated to data before any measures ('pure beta')
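
To get a feeling for step 6: `emcee`'s default proposal is the "stretch move" of Goodman & Weare, in which each walker proposes a jump along the line towards or away from another, randomly chosen walker. The sketch below is a hand-written serial version of that move in plain Python, not `emcee`'s own code (which is vectorised and updates the ensemble in two halves). The target `log_prob` is a toy standard normal standing in for the real log posterior, and `theta_hat` is a made-up stand-in for the PSO estimate.

```python
import math
import random


def log_prob(theta):
    """Toy target: standard normal in every dimension (stand-in for a log posterior)."""
    return -0.5 * sum(x * x for x in theta)


def stretch_move_sampler(log_prob_fn, initial, nsteps, a=2.0, seed=0):
    """Serial stretch-move ensemble sampler.

    initial: list of walker positions, each a list of length ndim.
    Returns chain such that chain[step][walker] is a position.
    """
    rng = random.Random(seed)
    walkers = [list(w) for w in initial]
    nwalkers, ndim = len(walkers), len(walkers[0])
    logp = [log_prob_fn(w) for w in walkers]
    chain = []
    for _ in range(nsteps):
        for k in range(nwalkers):
            # Pick a different walker j to define the proposal direction
            j = rng.randrange(nwalkers - 1)
            if j >= k:
                j += 1
            # Draw the stretch factor z from g(z) proportional to 1/sqrt(z) on [1/a, a]
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = [walkers[j][d] + z * (walkers[k][d] - walkers[j][d])
                        for d in range(ndim)]
            logp_new = log_prob_fn(proposal)
            # Accept with probability min(1, z^(ndim-1) * p(new) / p(old))
            log_accept = (ndim - 1) * math.log(z) + logp_new - logp[k]
            if rng.random() < math.exp(min(0.0, log_accept)):
                walkers[k], logp[k] = proposal, logp_new
        chain.append([list(w) for w in walkers])
    return chain


# Four walkers in two dimensions, each a slightly perturbed copy of one start point
theta_hat = [0.5, -0.3]  # made-up stand-in for the PSO estimate
init_rng = random.Random(1)
initial = [[x + 1e-2 * init_rng.gauss(0.0, 1.0) for x in theta_hat] for _ in range(4)]

chain = stretch_move_sampler(log_prob, initial, nsteps=3000)
# Drop the first 500 steps as burn-in and pool all walkers
flat = [w for step in chain[500:] for w in step]
print(len(flat), "pooled samples")
```

Because the proposal scale comes from the distance between walkers, no step size has to be tuned by hand, which is presumably why several walkers are needed even for a two-parameter fit.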

# Explore `run_optimization.py`

This script wraps much of the workflow of the notebook described above.

**input**
Contains a single function called `full_calibration` that takes in the following arguments:
1. `model`: initialised model object (such as the output of the `COVID19_SEIRD_sto_spatial` function)
2. `timeseries`: `pandas` Series of Sciensano data to fit, with the date as index. Taken from the private data (if already available)
3. `spatial_unit`: name of the resulting files (e.g. Gent, Flanders, ...). Simply for *naming*
4. `start_date`: YYYY-MM-DD string for first day in the data to fit on
5. `end_beta`: YYYY-MM-DD string for last day to fit beta on
6. `end_ramp`: YYYY-MM-DD string for last day to fit compliance parameters on
7. `fig_path`: directory to save output figures in
8. `samples_path`: directory to save samples in. The saved samples dictionary has the following shape
```
    samples_dict={'calibration_data':states[0][0], 'start_date':start_date,
                  'end_beta':end_beta, 'end_ramp':end_ramp,
                  'maxiter': maxiter, 'popsize':popsize, 'steps_mcmc':steps_mcmc,
                  'R0':R0, 'R0_stratified_dict':R0_stratified_dict,
                  'lag_time': lag_time, 'beta': samples_beta['beta'],
                  'l': flat_samples_ramp[:,1].tolist(),'tau':flat_samples_ramp[:,2].tolist(),
                  'prevention':flat_samples_ramp[:,3].tolist()}
```
9. `maxiter`: maximal number of particle swarm steps
10. `popsize`: number of particles in the swarm
11. `steps_mcmc`: number of MCMC iterations. 5000 steps in the national model take about half an hour. Default is 10000

**output**

**notes**
1. `H_in` is hardcoded: if we want to choose e.g. the number of exposed people we need to fill in another string value here
2. Prior bounds of the PSO are hardcoded and may be adjusted (but probably these boundaries are wide enough)
3. 
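
Regarding note 2, here is a minimal sketch of how flat priors with adjustable bounds could enter the log posterior. It assumes a Gaussian error model with standard deviation `sigma` (playing the role of `sigma_H_in`), which these notes do not confirm. `toy_simulate` is a hypothetical stand-in for the model output `H_in`, and none of the function names below exist in `covid19model`.

```python
import math


def log_flat_prior(theta, bounds):
    """Return 0 inside the box defined by bounds and -inf outside it."""
    for value, (lower, upper) in zip(theta, bounds):
        if not lower <= value <= upper:
            return -math.inf
    return 0.0


def log_gaussian_likelihood(simulated, observed, sigma):
    """Log-likelihood of independent Gaussian errors with standard deviation sigma."""
    n = len(observed)
    sse = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return -0.5 * sse / sigma ** 2 - n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi)


def log_posterior(theta, observed, simulate, bounds):
    """theta = [sigma, *model_params]; bounds holds one (lower, upper) pair per entry."""
    lp = log_flat_prior(theta, bounds)
    if lp == -math.inf:
        return lp  # skip the simulation when the prior already rules theta out
    sigma, *model_params = theta
    simulated = simulate(model_params, len(observed))
    return lp + log_gaussian_likelihood(simulated, observed, sigma)


def toy_simulate(model_params, ndays):
    """Hypothetical stand-in for the model: exponential growth in daily admissions."""
    (beta,) = model_params
    return [math.exp(beta * t) for t in range(ndays)]


observed = toy_simulate([0.2], 7)    # synthetic "data" generated with beta = 0.2
bounds = [(0.1, 100.0), (0.0, 1.0)]  # (sigma, beta); passed in rather than hardcoded

inside = log_posterior([1.0, 0.2], observed, toy_simulate, bounds)
outside = log_posterior([1.0, 1.5], observed, toy_simulate, bounds)
print(inside, outside)
```

Passing `bounds` as an argument keeps the prior box in one place, so widening it does not require touching the likelihood code.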

In [14]:
# Load model parameters
spatial = 'arr'
initN, Nc_home, Nc_work, Nc_schools, Nc_transport, Nc_leisure, Nc_others, Nc_total = model_parameters.get_interaction_matrices(spatial=spatial)
params = model_parameters.get_COVID19_SEIRD_parameters(spatial=spatial)

# Load initial state: one thirty year old exposed individual in every arrondissement
G, N = initN.shape[0], initN.shape[1]
E = np.zeros((G, N))
E[:, 3] = 1  # one exposed individual in age-group index 3 of every spatial unit
states = {'S': initN, 'E': E}

# Load model
model = models.COVID19_SEIRD_sto_spatial(states, params, discrete=True, spatial=spatial)