# Microsim Analysis

Analysis and visualisation of the outputs from [microsim_model.py](./microsim_model.py). This includes preprocessing the pickled data on individuals' disease states and on the danger scores associated with venues across the time period of the simulation. If the model was run multiple times, data from all runs are first aggregated (using the mean and standard deviation).

## Running the script

The current script (microsim/dashboard.py) is standalone: it can be run after microsim/microsim_model.py has produced output, or on third-party output. The script expects the output to be in basedir/datadir/output, where basedir is Python's current working directory (usually RAMP-UA). Datadir is set to devon_data by the line ```data_dir = os.path.join(base_dir, "devon_data")```; change this line if your data are in a different directory.

To run the script from the command line, navigate to basedir and run:

```python microsim/dashboard.py```

This should automatically pop up an html page in your default browser displaying the interactive dashboard (created using bokeh). You can zoom, pan and reset using the toolbar next to (or on top of) each figure. You can get information by hovering over points and hide/unhide points and lines by clicking on their entry in the legend.

The html page is also saved in the data directory (as dashboard.html), so if it doesn't open automatically you can manually open it in a browser. You can also share the file with others; you do not need Python (or any other software other than a browser) in order to view and interact with the html dashboard.

## What the script does

### Initialisation

In addition to bokeh (https://docs.bokeh.org/en/latest/index.html#), the script needs the following libraries:

```python
import os
import pickle
import pandas as pd
import numpy as np
import geopandas as gpd
import imageio
from shapely.geometry import Point
import json
```

### Reading data

Each time the model is run, it writes its output to a new, incrementally numbered sub-directory under (basedir)/data/output/, e.g.:

- data/output/0
- data/output/1
- ...

The script assumes the current working directory is (basedir) and builds the paths from there:

```python
base_dir = os.getcwd()  # get current directory (usually RAMP-UA)
data_dir = os.path.join(base_dir, "data")  # go to data dir
```

In each run folder (/0, /1 etc), it reads in the following data:
- individuals.pickle (contains MSOA, condition for each individual and day)
- (venue).pickle files (contains danger scores for each venue and day) e.g. Retail.pickle, PrimarySchool.pickle etc
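
As an illustration of the reading step, here is a minimal sketch that finds the numbered run folders and loads the pickles from one of them. It assumes the run sub-directories have purely numeric names and that the files can be read with `pickle.load`; the helper names `list_run_dirs` and `load_run` are hypothetical and are not taken from dashboard.py.

```python
import os
import pickle


def list_run_dirs(output_dir):
    """Return the numerically named run sub-directories, sorted by run number."""
    runs = [d for d in os.listdir(output_dir)
            if d.isdigit() and os.path.isdir(os.path.join(output_dir, d))]
    # Sort numerically so that run 10 comes after run 2
    return [os.path.join(output_dir, d) for d in sorted(runs, key=int)]


def load_run(run_dir, venues=("Retail", "PrimarySchool")):
    """Load individuals.pickle and one pickle per venue type from a run folder."""
    with open(os.path.join(run_dir, "individuals.pickle"), "rb") as f:
        individuals = pickle.load(f)
    dangers = {}
    for venue in venues:
        with open(os.path.join(run_dir, venue + ".pickle"), "rb") as f:
            dangers[venue] = pickle.load(f)
    return individuals, dangers
```

Sorting with `key=int` rather than alphabetically keeps the runs in the order in which they were produced once there are ten or more of them.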

The script also reads in the following data (once):
- school and retail data (contains ID, postcode/MSOA and lat/lon coordinates)
- shp file of England MSOAs (for choropleth)
- postcode to MSOA conversion file (PCD_OA_LSOA_MSOA_LAD_AUG19_UK_LU.csv, for retail file)
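
The postcode to MSOA conversion for the retail data can be pictured as a table join. The sketch below uses small in-memory tables with made-up values; the column names `postcode`, `pcds` and `msoa11cd` are assumptions for this example and may not match the actual files.

```python
import pandas as pd

# Toy stand-ins for the retail file and the postcode lookup (values are made up)
retail = pd.DataFrame({"ID": [0, 1, 2],
                       "postcode": ["AA1 1AA", "AA2 2BB", "ZZ9 9ZZ"]})
lookup = pd.DataFrame({"pcds": ["AA1 1AA", "AA2 2BB"],
                       "msoa11cd": ["MSOA_A", "MSOA_B"]})

# A left join keeps every retail venue; the MSOA is missing where no postcode matches
retail_msoa = retail.merge(lookup, how="left", left_on="postcode", right_on="pcds")

# Count venues that could not be assigned to an MSOA
unmatched = int(retail_msoa["msoa11cd"].isna().sum())
```

Checking `unmatched` after the join is a cheap way to spot postcodes that are formatted differently in the two files.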

### Preprocessing data

The conditions and venues are defined in a dictionary for easy look up and looping across conditions/venues.

```python
conditions_dict = {
  "susceptible": 0,
  "presymptomatic": 1,
  "symptomatic": 2,
  "recovered": 3,
  "dead": 4,
}

locations_dict = {
  "PrimarySchool": "PrimarySchool",
  "SecondarySchool": "SecondarySchool",
  "Retail": "Retail",
  "Work": "Work",
  "Home": "Home",
}
```

Variables for plotting are read in from the pickle files, preprocessed and stored in dictionaries. The dictionary key is either the condition or the venue type, as defined in the look-up dictionaries above. The value stored for a given key is a dataframe or a series.
- msoacounts_dict (2D) - number of people per MSOA and per day (for given condition)
- totalcounts_dict (1D) - number of people per day, summed across all MSOAs (for given condition)
- cumcounts_dict (1D) - number of people per day, summed across given time period (for given condition)
- dangers_dict (2D) - danger score per venue and per day (for given venue type)
- dangers_msoa_dict (2D) - average danger score per MSOA and per day (for given venue type). Average danger = sum of danger scores in MSOA / nr of venues in MSOA
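
The average danger per MSOA defined in the last item can be sketched with a pandas group-by. This assumes one row per venue and one column per day, plus a venue to MSOA assignment; the variable names and values are illustrative only.

```python
import pandas as pd

# Toy danger scores: one row per venue, one column per day (assumed layout)
dangers = pd.DataFrame(
    {0: [0.0, 2.0, 4.0], 1: [1.0, 3.0, 8.0]},
    index=["venue_a", "venue_b", "venue_c"],
)
# Hypothetical assignment of each venue to an MSOA
venue_msoa = pd.Series(["MSOA_A", "MSOA_A", "MSOA_B"], index=dangers.index)

# Sum of danger scores in each MSOA divided by its number of venues, per day
grouped = dangers.groupby(venue_msoa)
dangers_msoa = grouped.sum() / grouped.count()
```

With no missing values this is the same as `grouped.mean()`; the explicit form is used here only to mirror the definition.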

There are 3 versions of the above dictionaries:
- ending in *dict* (as above): mean value across runs, used for plotting
- ending in *dict_3d*: raw data for all runs stored as a 3D array (the run being the third dimension), could be used to check the data
- ending in *dict_std*: standard deviation across runs, could be used for error bars
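
A minimal sketch of how the three versions relate, using NumPy and made-up numbers for a single condition. The array names are hypothetical, and whether the script uses the population or the sample standard deviation is not stated here; the sketch uses NumPy's default (population, `ddof=0`).

```python
import numpy as np

# Toy data: 2 MSOAs x 3 days for each of 2 runs (values are made up)
run0 = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
run1 = np.array([[3, 2, 1], [4, 7, 6]], dtype=float)

# Stack so that the run is the third dimension: shape (msoa, day, run)
counts_3d = np.stack([run0, run1], axis=2)

# Collapse the run dimension
counts_mean = counts_3d.mean(axis=2)  # mean across runs, used for plotting
counts_std = counts_3d.std(axis=2)    # spread across runs, e.g. for error bars
```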

### Plotting with bokeh

Bokeh uses the general structure below:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# create a column data source: a dictionary that points to the variables to plot, can be shared between plots
source = ColumnDataSource(data=dict(x1=x, y1=y))

# output to static HTML file
output_file("outputname.html")

# create a new plot with a title and axis labels
p1 = figure(title="title", x_axis_label='x', y_axis_label='y')

# add a renderer e.g. line, point..., refer to column data source keys by name (here "x1" and "y1")
p1.line("x1", "y1", source=source, legend_label="label", line_width=2)

# show the results
show(p1)
```

A toolbar is added by default (and can be customized e.g. to add hover) for each plot, allowing the user to zoom, pan, reset, save etc. 
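
As a sketch of the toolbar customisation mentioned above, the example below restricts the toolbar to a few tools and adds a hover tool. The data and column names are made up, and this is not necessarily how dashboard.py configures its figures.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Toy data: one count per day (values are made up)
source = ColumnDataSource(data=dict(day=[0, 1, 2], count=[5, 8, 13]))

# Restrict the toolbar to a chosen set of tools
p = figure(title="title", x_axis_label="day", y_axis_label="count",
           tools="pan,wheel_zoom,reset,save")
p.scatter("day", "count", source=source, size=8)

# "@name" in a tooltip refers to a column of the data source
hover = HoverTool(tooltips=[("day", "@day"), ("count", "@count")])
p.add_tools(hover)
```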

The script produces 3 types of plots (at the moment):

#### Line plot

Different conditions or venue types are plotted as lines. One of the axes is a count (number of people or danger score); the other can be time (days) or location (MSOA or venue).

![title](lineplot.jpg)
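
A minimal sketch of such a line plot, with one line per condition and a legend whose entries can be clicked to hide or show a line. The counts are made up and the colours are arbitrary choices for this example.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Toy totals per day for two conditions (values are made up)
source = ColumnDataSource(data=dict(
    day=[0, 1, 2, 3],
    susceptible=[100, 95, 85, 70],
    symptomatic=[0, 3, 9, 18],
))

p = figure(title="Total counts per day",
           x_axis_label="day", y_axis_label="number of people")

# One line per condition, each with its own legend entry
for condition, colour in [("susceptible", "blue"), ("symptomatic", "red")]:
    p.line("day", condition, source=source, legend_label=condition,
           line_color=colour, line_width=2)

# Clicking a legend entry hides or shows the corresponding line
p.legend.click_policy = "hide"
```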

#### Heatmap

Counts (number of people or danger score) are plotted as colours. The axes are time (days) and location (MSOA or venue).

![title](heatmap.jpg)
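
One way to draw a heatmap like this in bokeh is to plot one coloured rectangle per (day, location) pair. The sketch below assumes the data are reshaped into long format first; the names and values are illustrative only.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Viridis256
from bokeh.plotting import figure
from bokeh.transform import linear_cmap

# Toy counts per MSOA and day (values are made up)
msoas = ["MSOA_A", "MSOA_B"]
days = [0, 1, 2]
counts = {"MSOA_A": [1, 4, 9], "MSOA_B": [0, 2, 3]}

# Long format: one entry per (MSOA, day) cell
data = dict(
    day=[d for m in msoas for d in days],
    msoa=[m for m in msoas for d in days],
    count=[counts[m][d] for m in msoas for d in days],
)
source = ColumnDataSource(data=data)

# Categorical y axis with one row per MSOA
p = figure(title="Counts per MSOA and day",
           x_axis_label="day", y_axis_label="MSOA", y_range=msoas)

# Map the "count" column linearly onto the palette
colours = linear_cmap("count", Viridis256, low=0, high=max(data["count"]))
p.rect(x="day", y="msoa", width=1, height=1, source=source,
       fill_color=colours, line_color=None)
```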

#### Choropleth

Map of MSOAs (possibly masked to restrict them to a relevant area), coloured in proportion to the daily values (total number of people with a given condition, or average danger score of venues) within each area. A slider above the choropleth lets you move between days.

![title](choropleth.jpg)
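
The slider mechanism can be sketched in a standalone HTML page with a small JavaScript callback that swaps the plotted column when the slider moves. The example below uses two toy squares instead of real MSOA polygons, and the column names (`day0`, `day1`, `day2`, `value`) are assumptions for this sketch rather than the names used in dashboard.py.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, LinearColorMapper, Slider
from bokeh.palettes import Viridis256
from bokeh.plotting import figure

# Two toy square areas standing in for MSOA polygons, with one column per day
source = ColumnDataSource(data=dict(
    xs=[[0, 1, 1, 0], [1, 2, 2, 1]],
    ys=[[0, 0, 1, 1], [0, 0, 1, 1]],
    day0=[1, 5], day1=[3, 2], day2=[6, 0],
    value=[1, 5],  # column currently shown, starts as a copy of day0
))

# Fixed colour scale so that colours are comparable between days
mapper = LinearColorMapper(palette=Viridis256, low=0, high=6)
p = figure(title="Counts per area", match_aspect=True)
p.patches("xs", "ys", source=source, line_color="white",
          fill_color={"field": "value", "transform": mapper})

# Runs in the browser: copy the selected day's column into "value"
slider = Slider(start=0, end=2, value=0, step=1, title="Day")
slider.js_on_change("value", CustomJS(args=dict(source=source), code="""
    source.data["value"] = source.data["day" + cb_obj.value];
    source.change.emit();
"""))

# Place the slider above the map
layout = column(slider, p)
```

Because the callback is JavaScript embedded in the page, the slider keeps working when the saved HTML file is opened without Python.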