# Calibrating multiple parameters of the ASPICS model using the ABC method

This Jupyter notebook builds on previous work from DyME and Prof Nick Malleson (University of Leeds):

- [RAMP-UA Initiative](https://github.com/Urban-Analytics/RAMP-UA/blob/d5973dff007645f1700cded93aaf72298ef84c61/experiments/calibration/abc-1.ipynb)

- [Calibrating Agent-Based Models Using Uncertainty Quantification Methods](https://github.com/Urban-Analytics/uncertainty/blob/master/hm_abc_simple_example.ipyn)

SPC (Synthetic Population Catalyst) is a tool that helps urban modelling researchers obtain synthetic population datasets at the national level (currently limited to England). It opens up new challenges and possibilities: external (multi-level) models, such as agent-based models (ABMs), can now be tested across multiple regions. However, in models whose location parameters depend strictly on population interactions, internal validation and calibration are required to properly tune this national-level behaviour.

### To-do list to make progress in this experiment
- [ ] Read the synthetic population file and translate it to a snapshot so that ASPICS can read the new dataset.
- [ ] Read and plot the attributes we need.
- [ ] Read the baseline data used as priors. Areas to test: Leeds (ideally West Yorkshire), Liverpool, Devon, Manchester (Greater Manchester).

## Background Concepts

- Uncertainty in ABMs
- Methods for Calibration
- ABC 
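As a rough illustration of the ABC idea listed above, the sketch below implements plain rejection sampling on a toy model. It is not the calibration used later in this notebook: `toy_model`, `abc_rejection`, the flat prior and the tolerance `epsilon` are all hypothetical names and values chosen for the example.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sampler, n_samples, epsilon, rng):
    """Keep the prior draws whose simulated output lies within epsilon of the data."""
    accepted = []
    for _ in range(n_samples):
        theta = prior_sampler(rng)           # draw a candidate parameter from the prior
        simulated = simulate(theta, rng)     # run the (toy) model with that parameter
        distance = np.linalg.norm(simulated - observed)
        if distance <= epsilon:
            accepted.append(theta)
    return np.array(accepted)

# Toy stand-in for an ABM: cumulative cases grow linearly at an unknown rate.
def toy_model(rate, rng):
    days = np.arange(1, 11)
    return rate * days + rng.normal(0.0, 0.5, size=days.size)

rng = np.random.default_rng(42)
true_rate = 3.0                              # assumed "ground truth" for the toy data
observed = toy_model(true_rate, rng)

posterior = abc_rejection(
    observed=observed,
    simulate=toy_model,
    prior_sampler=lambda r: r.uniform(0.0, 10.0),  # flat prior on the rate
    n_samples=5000,
    epsilon=5.0,
    rng=rng,
)
print(f"Accepted {len(posterior)} of 5000 draws; posterior mean rate = {posterior.mean():.2f}")
```

The accepted draws approximate the posterior over the rate. A smaller `epsilon` gives a tighter approximation at the cost of fewer accepted samples.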

In [33]:
import math
import pandas as pd
import sys, os
import matplotlib.pyplot as plt
import numpy as np
import random
from run_model import OpenCLRunner

sys.path.append('../')
import synthpop_pb2
import convert_snapshot


The following function is based on the [SPC scripts](https://github.com/alan-turing-institute/uatk-spc/blob/main/python/protobuf_to_csv.py). The idea is to read the `.pb` file created with the tool. However, the protobuf data still has to be translated into a snapshot, which integrates the data in the form ASPICS needs.

```
def convert_to_csv(input_path):
    """Export some per-person attributes to CSV."""
    # Parse the .pb file
    print(f"Reading {input_path}")
    pop = synthpop_pb2.Population()
    with open(input_path, "rb") as f:
        pop.ParseFromString(f.read())

    # Based on the per-person information you're interested in, you can extract
    # and fill out different columns
    people = []
    for person in pop.people:
        # The Person message doesn't directly store MSOA. Look up from their household.
        msoa11cd = pop.households[person.household].msoa11cd

        record = {
            "person_id": person.id,
            "household_id": person.household,
            "msoa11cd": msoa11cd,
            "age_years": person.demographics.age_years,
            # Protobuf enum types show up as numbers; this converts to a string
            "pwkstat": synthpop_pb2.PwkStat.Name(person.employment.pwkstat),
            "diabetes": person.health.has_diabetes,
            "employment": person.employment.sic1d07,
        }

        # Add a column for the duration the person spends doing each activity
        for pair in person.activity_durations:
            key = synthpop_pb2.Activity.Name(pair.activity) + "_duration"
            record[key] = pair.duration

        people.append(record)

    df = pd.DataFrame.from_records(people)
    return df
```

```
# Call the function defined above on the Rutland population file
input_path = 'SPC_data/rutland.pb'
if __name__ == "__main__":
    df = convert_to_csv(input_path)
```
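Once the per-person DataFrame is available, the attributes can be summarised per MSOA before plotting. The sketch below uses a small made-up DataFrame (`df_example`, with invented area codes) standing in for the one returned by `convert_to_csv`; only the column names are taken from the function above.

```python
import pandas as pd

# Made-up rows standing in for the DataFrame returned by convert_to_csv.
df_example = pd.DataFrame(
    {
        "person_id": [0, 1, 2, 3, 4],
        "household_id": [0, 0, 1, 2, 2],
        "msoa11cd": ["MSOA_A", "MSOA_A", "MSOA_A", "MSOA_B", "MSOA_B"],
        "age_years": [34, 36, 71, 8, 40],
        "diabetes": [False, False, True, False, False],
    }
)

# One row per MSOA: head count, household count, mean age, share with diabetes.
summary = (
    df_example.groupby("msoa11cd")
    .agg(
        people=("person_id", "count"),
        households=("household_id", "nunique"),
        mean_age=("age_years", "mean"),
        diabetes_rate=("diabetes", "mean"),
    )
    .reset_index()
)
print(summary)
```

The same `groupby` call should work on the real `df`, and `summary.plot.bar(x="msoa11cd", y="people")` would give a quick per-area plot.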

## Read the baseline data, used as priors to calibrate the model to a given area
These are real observations (number of cases, deaths or hospital admissions in the given area).
They need to be made cumulative, as this is how they will be compared to the model.
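To make the comparison concrete, the sketch below turns daily counts into a cumulative series and measures how far a simulated series is from it. The numbers are invented, and the Euclidean distance is an assumption used for illustration; the notebook itself does not state which distance is used at this point.

```python
import numpy as np
import pandas as pd

# Made-up daily counts; the real values come from the baseline CSV files.
daily = pd.DataFrame({"Day": [0, 1, 2, 3, 4], "Cases": [1, 3, 2, 5, 4]})
observed = daily["Cases"].cumsum().to_numpy()   # cumulative observed cases

# Hypothetical cumulative output from one model run over the same days.
simulated = np.array([0, 3, 7, 12, 15])

def euclidean_distance(obs, sim):
    """Euclidean distance between two equal-length cumulative series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("Series must cover the same number of days")
    return float(np.sqrt(np.sum((obs - sim) ** 2)))

distance = euclidean_distance(observed, simulated)
print(f"Distance between observed and simulated series: {distance:.2f}")
```

A smaller distance means the simulated epidemic curve tracks the observations more closely, which is the quantity a calibration procedure tries to minimise.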

#### Rutland area as a test run due to its size
The number of cases and the `gam_cases` data were created using [Ramp-UA - Observation Data](https://github.com/Urban-Analytics/RAMP-UA/tree/master/experiments/calibration/observation_data)

In [18]:
# New per day:
gam_cases = pd.read_csv(os.path.join("baseline_data", "gam_rutland_cases.csv"), header=0, names=["Day", "Cases"])

# Cumulative
OBSERVATIONS = pd.DataFrame( {"Day": gam_cases['Day'], "Cases": gam_cases.cumsum()['Cases']} )

assert OBSERVATIONS.tail(1)['Cases'].values[0] == sum(gam_cases['Cases'])
print(f"Total cases: {sum(gam_cases['Cases'])}")

Total cases: 697


## Run ASPICS using the default parameters

The following cells provide a set of plots showing how the model runs with the default parameters (manually calibrated for the Devon area). In this example we use Rutland.

Before anything else, we need to translate the `.pb` file to the snapshot required by ASPICS; see the [Usage guide](docs/usage_guide.md).

In [23]:
%run ../convert_snapshot.py -i SPC_data/rutland.pb -o ../data/snapshots/Rutland/cache.npz

Reading SPC_data/rutland.pb
Code block took: 0.14636 s
Removing 6029 people from 70 households because the household has > 10 people
Collapsing flows for 33446 people


100%|██████████| 33446/33446 [00:01<00:00, 24826.25it/s]


Code block took: 1.35073 s
Finalizing all coordinates
Code block took: 0.07968 s
Creating snapshot
Code block took: 0.26521 s
Wrote ../data/snapshots/Rutland/cache.npz


Great, now we have `cache.npz` in `data/snapshots`; go and take a quick look.
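One way to take that quick look is to list the arrays stored in the `.npz` archive with NumPy. The sketch below builds a tiny stand-in archive so that it runs anywhere; the array names `people_ages` and `place_ids` are hypothetical and are not the keys of the real ASPICS snapshot.

```python
import os
import tempfile
import numpy as np

# Write a tiny stand-in archive; the array names here are hypothetical and
# are not the keys used by the real ASPICS snapshot.
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "example_cache.npz")
np.savez(example_path, people_ages=np.array([34, 36, 71]), place_ids=np.arange(4))

# Collect the shape and dtype of every array stored in the archive.
with np.load(example_path) as archive:
    contents = {
        name: (archive[name].shape, str(archive[name].dtype))
        for name in archive.files
    }

for name, (shape, dtype) in contents.items():
    print(f"{name}: shape={shape}, dtype={dtype}")
```

Pointing `example_path` at the file written above (`../data/snapshots/Rutland/cache.npz`) should list the real snapshot arrays. If the snapshot stores object arrays, `np.load` may need `allow_pickle=True`.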

In [27]:
sys.path.append('../')

In [None]:
PARAMETERS_FILE = os.path.join("../../","model_parameters", "default.yml")
PARAMS = OpenCLRunner.create_parameters(parameters_file=PARAMETERS_FILE)
OPENCL_DIR = "../../microsim/opencl"
SNAPSHOT_FILEPATH = os.path.join(OPENCL_DIR, "snapshots", "cache.npz")
assert os.path.isfile(SNAPSHOT_FILEPATH), f"Snapshot doesn't exist: {SNAPSHOT_FILEPATH}"

In [None]:


ITERATIONS = 10  # Number of iterations to run for (initially suggested as 100)
NUM_SEED_DAYS = 10  # Number of days to seed the population
USE_GPU = False
STORE_DETAILED_COUNTS = False
REPETITIONS = 2  # Initially suggested as 5

assert ITERATIONS < len(OBSERVATIONS), \
    f"Have more iterations ({ITERATIONS}) than observations ({len(OBSERVATIONS)})."

# Initialise the class so that it's ready to run the model.
# This isn't actually necessary immediately as the `run_opencl_model_multi` function is a static method
# so doesn't read any of the class parameters, but the init is necessary
# for calibration later when some parameters can't be passed to the run function directly
OpenCLRunner.init(
    iterations = ITERATIONS, 
    repetitions = REPETITIONS, 
    observations = OBSERVATIONS,
    use_gpu = USE_GPU,
    store_detailed_counts = STORE_DETAILED_COUNTS, 
    parameters_file = PARAMETERS_FILE, 
    opencl_dir = OPENCL_DIR, 
    snapshot_filepath = SNAPSHOT_FILEPATH,
    use_healthier_pop = False
)