# Validation: Forecasting Held-Out Data

This notebook:

* loads a library-defined model.
* loads the data using the data loader.
* holds out the final few days of the data and checks whether they can be predicted.
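
The hold-out step in the last bullet can be sketched with a NumPy masked array. This is a minimal illustration, not the loader's actual mechanism: the function name `hold_out_last_days`, the assumed array shape (regions by days), and the toy numbers are all assumptions. PyMC3 treats masked entries of an observed array as missing values and samples them, which may be where a variable such as `Observed_missing` comes from, though that is not confirmed here.

```python
import numpy as np

def hold_out_last_days(cases, n_days):
    """Return a masked copy of `cases` with the final `n_days` columns hidden.

    `cases` is assumed to have shape (n_regions, n_days_total).
    """
    if not 0 < n_days < cases.shape[1]:
        raise ValueError("n_days must be between 1 and the number of days - 1")
    mask = np.zeros(cases.shape, dtype=bool)
    mask[:, -n_days:] = True  # True marks a held-out (hidden) entry
    return np.ma.masked_array(cases, mask=mask)

# Toy data: 2 regions, 6 days of confirmed cases (made-up numbers).
cases = np.array([[1.0, 2.0, 4.0, 8.0, 16.0, 32.0],
                  [3.0, 4.0, 6.0, 9.0, 13.0, 20.0]])
train = hold_out_last_days(cases, n_days=2)

# After fitting on `train`, predictions for the hidden days would be compared
# with the true values, which remain available in `cases`.
held_out_truth = cases[train.mask].reshape(cases.shape[0], -1)
print(train)
print(held_out_truth)
```

Keeping the unmasked array alongside the masked one means the held-out truth never has to be reloaded when scoring the forecast.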

In [5]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [6]:
### Initial imports
import logging
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as T
import matplotlib.pyplot as plt

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from epimodel.pymc3_models.utils import geom_convolution
from epimodel.pymc3_models import cm_effect

%matplotlib inline

In [54]:
Regions = ["PT", "IS", "HU", "HR", "BE", "NL", "DK", "DE", "AT", "CZ", "GE", "FR", "ES", "GB", "PL", "GR", "CH", "FI", "NO", "SE", "SI", "SK"]
Features = ['Gatherings limited to', 'Business suspended',
       'Schools and universities closed', 'General curfew',
       'Healthcare specialisation', 'Minor distancing and hygiene measures',
       'Phone line', 'Mask wearing', 'Asymptomatic contact isolation']
data = cm_effect.loader.Loader('2020-02-10', '2020-04-05', Regions, Features)

# Probability that a case is confirmed X days after transmission, for X = 0, 1, 2, ...
# Generated from a Poisson distribution; must sum to 1.0.
DelayProb = np.array([0.00, 0.01, 0.02, 0.06, 0.10, 0.13, 0.15, 0.15, 0.13, 0.10, 0.07, 0.05, 0.03])
print(f"Delayprob sum={DelayProb.sum()}, E[DP]={np.sum(DelayProb*np.arange(len(DelayProb)))}")

INFO:epimodel.regions:Name index has 6 potential conflicts: ['american samoa', 'georgia', 'guam', 'northern mariana islands', 'puerto rico', 'united states minor outlying islands']


Delayprob sum=1.0, E[DP]=6.78
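
As a rough illustration of how a delay distribution like `DelayProb` links transmissions to confirmations, the sketch below convolves a daily series of new infections with the delay probabilities. It uses `np.convolve` as a stand-in; it is not the `geom_convolution` helper imported earlier, whose interface is not shown in this notebook, and the infection counts are made up.

```python
import numpy as np

DelayProb = np.array([0.00, 0.01, 0.02, 0.06, 0.10, 0.13, 0.15,
                      0.15, 0.13, 0.10, 0.07, 0.05, 0.03])

# A valid delay distribution is non-negative and sums to 1.
assert np.all(DelayProb >= 0)
assert np.isclose(DelayProb.sum(), 1.0)

def expected_confirmed(new_infections, delay_prob):
    """Expected new confirmations per day, given new infections per day."""
    # Full convolution, truncated to the observation window.
    return np.convolve(new_infections, delay_prob)[:len(new_infections)]

# Hypothetical input: 100 transmissions on day 0, none afterwards.
new_infections = np.zeros(20)
new_infections[0] = 100.0
confirmed = expected_confirmed(new_infections, DelayProb)

print(confirmed[:13])
print(confirmed.sum())
```

With a single burst of transmissions on day 0, the confirmations simply trace out the delay distribution scaled by the burst size, which makes the sketch easy to check by eye.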


In [56]:
# Note: this model should be exactly the same as the one in the Jupyter notebook,
# but the test point differs. I'm not sure why, or whether it's an issue.
with cm_effect.models.CMModelV2(data) as model:
    model.build_all()

INFO:epimodel.pymc3_models.cm_effect.models:Checking model test point
INFO:epimodel.pymc3_models.cm_effect.models:
CMReduction_log__            12.45
BaseGrowthRate_log__         -1.61
RegionGrowthRate_log__        6.27
RegionScaleMult_log__       -20.22
DailyGrowth_log__          1400.25
InitialSize_log__           -70.87
Observed_missing              0.00
Observed                 -15098.43
Name: Log-probability of test_point, dtype: float64
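
One way to investigate a test-point discrepancy between two builds of a model is to compare the two log-probability tables term by term. The sketch below assumes both are available as pandas Series (the printed output has that form). The helper name `compare_test_points` and the reference values are hypothetical, since the other notebook's output is not shown here.

```python
import pandas as pd

def compare_test_points(current, reference, tol=0.01):
    """Return the terms whose log-probabilities differ by more than `tol`."""
    diff = (current - reference).abs()
    # A term present in only one Series gives NaN, which is also reported.
    changed = diff[(diff > tol) | diff.isna()]
    return changed.sort_values(ascending=False, na_position="first")

# Values printed by this notebook.
current = pd.Series({
    "CMReduction_log__": 12.45,
    "BaseGrowthRate_log__": -1.61,
    "RegionGrowthRate_log__": 6.27,
    "RegionScaleMult_log__": -20.22,
    "DailyGrowth_log__": 1400.25,
    "InitialSize_log__": -70.87,
    "Observed_missing": 0.00,
    "Observed": -15098.43,
})

# Hypothetical reference: identical except for one made-up value.
reference = current.copy()
reference["Observed"] = -15000.00

result = compare_test_points(current, reference)
print(result)
```

If only the `Observed` term differs, the data or its masking is the likely cause; if prior terms differ, the model definitions themselves have diverged.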



In [57]:
model.run(1000)

Auto-assigning NUTS sampler...
INFO:pymc3:Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
INFO:pymc3:Initializing NUTS using adapt_diag...


CMReduction_log__            12.45
BaseGrowthRate_log__         -1.61
RegionGrowthRate_log__        6.27
RegionScaleMult_log__       -20.22
DailyGrowth_log__          1400.25
InitialSize_log__           -70.87
Observed_missing              0.00
Observed                 -15098.43
Name: Log-probability of test_point, dtype: float64


Multiprocess sampling (2 chains in 2 jobs)
INFO:pymc3:Multiprocess sampling (2 chains in 2 jobs)
NUTS: [Observed_missing, InitialSize, DailyGrowth, RegionScaleMult, RegionGrowthRate, BaseGrowthRate, CMReduction]
INFO:pymc3:NUTS: [Observed_missing, InitialSize, DailyGrowth, RegionScaleMult, RegionGrowthRate, BaseGrowthRate, CMReduction]
Sampling 2 chains, 0 divergences:   7%|▋         | 206/3000 [00:03<00:44, 63.25draws/s] 


RuntimeError: Chain 1 failed.

In [None]:
_ = model.plot_traces()