# Microsim Analysis and dashboard

Analysis and visualisation of the outputs from the [microsim_model.py](./microsim_model.py). This includes preprocessing the pickled data about individuals' disease states and the danger scores associated with venues across the time period of the simulation. The user can choose which scenario(s), repetition(s) and days to process and visualise. 

## Running the script

The current script (microsim/dashboard.py) is standalone. It can be run after microsim/microsim_model.py has produced output, or on third-party output in the same format. It reads its input parameters from a yml file (default: model_parameters/default_dashboard.yml).

The script expects the output to be in basedir/datadir/output/scenario_dir. Basedir is Python's current working directory (usually RAMP-UA). Datadir and scenario_dir can be specified in the yml file. The yml file can also set:

- the start and end day of the simulation (default: day 0 to the last day)
- the start and end run (default: 0 to 'number of folders in scenario_dir - 1')

The current version of the script accepts up to 10 scenarios.

```YAML
dashboard: # Parameters for the dashboard (visualisation), all are optional
  output_name: test_dashboard # name for dashboard output file (.html will be added)
  data_dir: devon_data # Root directory to load data from
  start_day: 0    # Start day
  end_day: 80      # End day (inclusive)
  start_run: 0   # Iteration/run to start
  end_run: 1     # Last iteration/run to include

  scenario_name: # names for scenario
  - Control
  - Alternative
  scenario_dir: # directory for scenario (subdir of /output), same order as names above
  - sc0
  - sc1 
```
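The sketch below shows how the optional parameters could fall back to the defaults described above. It is a minimal illustration, not the script's own code:

- `resolve_dashboard_params` and `default_end_run` are hypothetical helpers.
- The input dict is assumed to be what `yaml.safe_load` (PyYAML) returns for a file like the one above.
- `data_dir`, `scenario_name` and `scenario_dir` are assumed to be present, because their defaults are not documented here.
- When several scenarios are given, `end_run` defaults to the smallest run count across them. This is also an assumption, chosen so that every scenario contributes the same runs.

```python
import os


def default_end_run(scenario_path: str) -> int:
    """Number of numbered run folders (0, 1, ...) in scenario_path, minus 1."""
    runs = [
        d for d in os.listdir(scenario_path)
        if d.isdigit() and os.path.isdir(os.path.join(scenario_path, d))
    ]
    return len(runs) - 1


def resolve_dashboard_params(config: dict, basedir: str = ".") -> dict:
    """Hypothetical helper: fill in defaults for the optional 'dashboard' parameters.

    Assumes data_dir, scenario_name and scenario_dir are given.
    """
    dash = dict(config.get("dashboard") or {})
    dash.setdefault("output_name", "dashboard")
    dash.setdefault("start_day", 0)
    dash.setdefault("end_day", None)  # None = last simulated day, known only once data are loaded
    dash.setdefault("start_run", 0)

    names, dirs = dash["scenario_name"], dash["scenario_dir"]
    if len(names) != len(dirs):
        raise ValueError("scenario_name and scenario_dir must have the same length")
    if len(dirs) > 10:
        raise ValueError("at most 10 scenarios are supported")

    # basedir/datadir/output/scenario_dir for each scenario
    dash["scenario_paths"] = [
        os.path.join(basedir, dash["data_dir"], "output", d) for d in dirs
    ]
    if dash.get("end_run") is None:
        # Assumption: use the smallest run count so all scenarios share the same runs
        dash["end_run"] = min(default_end_run(p) for p in dash["scenario_paths"])
    return dash
```

With PyYAML installed, this could be called as `resolve_dashboard_params(yaml.safe_load(f))` on an open yml file `f`.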

To run the script from the command line, navigate to basedir and run:

```bash
python microsim/dashboard.py
```

The script's behaviour depends on the input parameters. 

- If only **1 scenario** is specified, the script generates a single file, output_name.html (default: dashboard.html), which should open automatically in your default browser. The values displayed in the plots and maps are averages across the repetitions (runs).

- If **2 scenarios** are specified, the script generates an output_name.html file as before. This time, the values displayed are the difference between the averages of the second and first scenarios (Alternative - Control in the yml above). A second dashboard, output_name_scenarios.html, is also created to compare selected summary measures per scenario. These correspond to the values one would obtain when running each scenario on its own.

- If **more than 2 scenarios** are specified, the script will only generate the output_name_scenarios.html file as above.
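The rules above can be sketched as follows. `dashboards_to_create` and `scenario_difference` are hypothetical names.

The sketch assumes each scenario's results for one quantity are held in a NumPy array whose last axis is the run. The difference is then taken between the run averages, second scenario minus first.

```python
import numpy as np


def dashboards_to_create(output_name: str, n_scenarios: int) -> list:
    """File names produced for a given number of scenarios (hypothetical helper)."""
    if n_scenarios < 1:
        raise ValueError("at least one scenario is required")
    files = []
    if n_scenarios <= 2:
        # averages (1 scenario) or differences between scenarios (2 scenarios)
        files.append(f"{output_name}.html")
    if n_scenarios >= 2:
        # per-scenario comparison of summary measures
        files.append(f"{output_name}_scenarios.html")
    return files


def scenario_difference(control_runs: np.ndarray, alternative_runs: np.ndarray) -> np.ndarray:
    """Mean over runs (last axis) of the second scenario minus that of the first."""
    return alternative_runs.mean(axis=-1) - control_runs.mean(axis=-1)
```

A value of zero in the result means the two scenarios agree on average for that entry.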

Each interactive dashboard (created using bokeh) allows you to zoom, pan and reset using the toolbar next to (or on top of) each figure. You can get information by hovering over points and hide/unhide points and lines by clicking on their entry in the legend.

The HTML pages are saved in the data directory, so if they don't open automatically you can open them manually in a browser. You can also share the files with others. No Python, or any other software apart from a browser, is needed to view and interact with the dashboards.

## What the script does

### Initialisation

In addition to bokeh (https://docs.bokeh.org/en/latest/index.html#), the script needs the following libraries:

```python
import os
import pickle
import pandas as pd
import numpy as np
import geopandas as gpd
import imageio
from shapely.geometry import Point
import json
```

### Reading data

Each time the model is run, it writes its output to a new, incrementally numbered sub-directory under (basedir)/(datadir)/output/(scenario_dir). For example:

- data/output/scenario/0
- data/output/scenario/1
- ...

The script assumes the current working directory is (basedir). It builds the paths from there, based on the data and scenario directory names provided in the yml file.

In each run folder (/0, /1 etc), it reads in the following data:
- individuals.pickle (contains MSOA, condition for each individual and day)
- (venue).pickle files (contains danger scores for each venue and day) e.g. Retail.pickle, PrimarySchool.pickle etc
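Reading the selected runs could look like the sketch below. `load_runs` is a hypothetical helper, and the sketch makes no assumption about what each pickle contains.

It uses `pandas.read_pickle`, which unpickles any object. The `venues` argument would be venue type names such as the keys of `locations_dict` (defined under "Preprocessing data"). The end run is treated as inclusive, matching `end_run` in the yml file.

```python
import os

import pandas as pd


def load_runs(scenario_path, start_run, end_run, venues):
    """Read individuals.pickle and one <venue>.pickle per venue type for each run.

    Hypothetical helper: returns {run: {"individuals": obj, venue: obj, ...}}.
    """
    data = {}
    for run in range(start_run, end_run + 1):  # end_run is inclusive
        run_dir = os.path.join(scenario_path, str(run))
        run_data = {"individuals": pd.read_pickle(os.path.join(run_dir, "individuals.pickle"))}
        for venue in venues:
            run_data[venue] = pd.read_pickle(os.path.join(run_dir, f"{venue}.pickle"))
        data[run] = run_data
    return data
```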

The script also reads in the following data (once):
- school and retail data (contains ID, postcode/MSOA and lat/lon coordinates)
- shp file of England MSOAs (for choropleth)
- postcode to MSOA conversion file (PCD_OA_LSOA_MSOA_LAD_AUG19_UK_LU.csv, for retail file)
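Attaching an MSOA code to each retail venue through the postcode lookup could look like the pandas sketch below. The column names are assumptions and should be checked against the actual files:

- `pcds` (postcode) and `msoa11cd` (MSOA code) in the lookup file
- `postcode` in the retail data

`add_msoa_to_retail` is a hypothetical helper. Postcodes are normalised before joining (upper case, all whitespace removed) because spacing and case often differ between sources.

```python
import pandas as pd


def add_msoa_to_retail(retail: pd.DataFrame, lookup: pd.DataFrame) -> pd.DataFrame:
    """Left-join MSOA codes onto retail venues by postcode (hypothetical helper).

    Assumed columns: retail['postcode']; lookup['pcds'] and lookup['msoa11cd'].
    """
    def norm(s: pd.Series) -> pd.Series:
        # Upper case and strip all whitespace so 'ex4 4qj' matches 'EX4 4QJ'
        return s.str.upper().str.replace(r"\s+", "", regex=True)

    keys = lookup[["pcds", "msoa11cd"]].assign(pc_key=norm(lookup["pcds"]))
    keys = keys.drop_duplicates("pc_key")[["pc_key", "msoa11cd"]]
    out = retail.assign(pc_key=norm(retail["postcode"]))
    # how="left" keeps every venue; unmatched postcodes get a missing MSOA
    out = out.merge(keys, on="pc_key", how="left").drop(columns="pc_key")
    return out.rename(columns={"msoa11cd": "MSOA"})
```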

### Preprocessing data

The conditions and venues are defined in a dictionary for easy look up and looping across conditions/venues.

```python
conditions_dict = {
    "susceptible": 0,
    "exposed": 1,
    "presymptomatic": 2,
    "symptomatic": 3,
    "asymptomatic": 4,
    "recovered": 5,
    "dead": 6,
}

locations_dict = {
    "PrimarySchool": "PrimarySchool",
    "SecondarySchool": "SecondarySchool",
    "Retail": "Retail",
    "Work": "Work",
    "Home": "Home",
}
```

Variables for plotting are read in from the pickle files, preprocessed and stored in dictionaries. The dictionary key is either a condition or a venue type, as defined in the look-up dictionaries above. The value stored for a given key is a DataFrame or Series.

- msoacounts_dict (2D) - number of people per MSOA and per day (for given condition)
- totalcounts_dict (1D) - number of people per day, summed across all MSOAs (for given condition)
- cumcounts_dict (1D) - number of people per day, summed across given time period (for given condition)
- agecounts (2D) - number of people per age category and per day (for given condition)
- uniquecounts (1D) - number of people with each 'final' disease status across the time period. For example, someone who is presymptomatic, then symptomatic and then recovered is only counted once, as recovered.

- dangers_dict (2D) - danger score per venue and per day (for given venue type)
- dangers_msoa_dict (2D) - average danger score per MSOA and per day (for given venue type). Average danger = sum of danger scores in MSOA / number of venues in MSOA
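The sketch below shows how some of these variables could be derived for a single run. The function names and the data layout are illustrations, not the script's actual code. It assumes, hypothetically, that:

- individuals.pickle holds one row per individual, with an `MSOA` column and one integer condition column per day (using the codes from `conditions_dict`).
- A venue danger table has one row per venue and one column per day.

Following the description above, `unique_counts` counts each person's status on the last day of the period. Average danger per MSOA is a groupby mean, which equals the sum of danger scores divided by the number of venues in the MSOA.

```python
import pandas as pd

# Same codes as conditions_dict above
conditions_dict = {"susceptible": 0, "exposed": 1, "presymptomatic": 2, "symptomatic": 3,
                   "asymptomatic": 4, "recovered": 5, "dead": 6}


def msoa_counts(individuals, day_cols, condition):
    """Number of people per MSOA (rows) and day (columns) with the given condition."""
    is_cond = individuals[day_cols] == conditions_dict[condition]
    return is_cond.groupby(individuals["MSOA"]).sum()


def total_counts(individuals, day_cols, condition):
    """Number of people per day with the given condition, summed across all MSOAs."""
    return msoa_counts(individuals, day_cols, condition).sum(axis=0)


def unique_counts(individuals, day_cols):
    """Number of people per 'final' status, i.e. their condition on the last day."""
    codes_to_names = {code: name for name, code in conditions_dict.items()}
    final = individuals[day_cols[-1]].map(codes_to_names)
    return final.value_counts()


def dangers_msoa(dangers, venue_msoa):
    """Average danger per MSOA and day: sum of venue dangers / number of venues in the MSOA.

    `venue_msoa` is a Series mapping each venue (same index as `dangers`) to its MSOA.
    """
    return dangers.groupby(venue_msoa).mean()
```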

There are 3 versions of each of the above dictionaries:
- ending in *dict* (as above): mean value across runs, used for plotting
- ending in *dict_3d*: raw data for all runs, stored as a 3D array with the run as the third dimension, useful for checking the data
- ending in *dict_std*: standard deviation across runs, which could be used for error bars
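The three versions could be built from per-run tables with NumPy, as sketched below. `combine_runs` is a hypothetical helper.

Missing rows or columns in a run (for example, an MSOA with no cases that day) are filled with 0. That suits counts but is an assumption. `np.std` uses `ddof=0` (population standard deviation) by default; pass `ddof=1` for the sample version. The script's choice is not stated here.

```python
import numpy as np
import pandas as pd


def combine_runs(per_run):
    """Stack per-run DataFrames and summarise across runs.

    Returns (mean, std, raw_3d): mean and std are DataFrames shaped like the first run;
    raw_3d has shape (rows, columns, runs), i.e. the run is the third dimension.
    """
    first = per_run[0]
    # Align every run to the first run's rows/columns; missing entries become 0 (assumption)
    aligned = [df.reindex(index=first.index, columns=first.columns, fill_value=0) for df in per_run]
    raw_3d = np.stack([df.to_numpy() for df in aligned], axis=2)
    mean = pd.DataFrame(raw_3d.mean(axis=2), index=first.index, columns=first.columns)
    std = pd.DataFrame(raw_3d.std(axis=2), index=first.index, columns=first.columns)
    return mean, std, raw_3d
```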

### Plotting with bokeh

Bokeh uses the general structure below:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# create a column data source: a dictionary that points to the variables to plot, can be shared between plots
source = ColumnDataSource(data=dict(x1=x, y1=y))

# output to static HTML file
output_file("outputname.html")

# create a new plot with a title and axis labels
p1 = figure(title="title", x_axis_label='x', y_axis_label='y')

# add a renderer (e.g. line, point...), referring to the column data source keys by name ('x1' and 'y1')
p1.line('x1', 'y1', source=source, legend_label="label", line_width=2)

# show the results (saves the HTML file and opens it in a browser)
show(p1)
```

A toolbar is added by default (and can be customized e.g. to add hover) for each plot, allowing the user to zoom, pan, reset, save etc. 

The script currently produces the following types of plot:

#### Line plot

Different conditions, venue types or age categories are plotted as lines. One axis is a count (number of people or danger score). The other is either time (days) or location (MSOA or venue).

For one scenario, the values are 'raw' data (the mean across repetitions):

![Line plot](lineplot.jpg)

When 2 scenarios have been specified, the values are the difference between the 'raw' data from both scenarios (scenario 1 - scenario 0). A value of zero means no difference between both scenarios. 

![Line plot of differences between scenarios per age category](agecat.jpg)

#### Heatmap

Counts (number of people or danger score) are plotted as colours. The axes are time (days) and location (MSOA or venue).

![Heatmap](heatmap.jpg)
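Bokeh heatmaps are commonly drawn with a `rect` glyph, which expects one row per cell. A wide location × day table such as `msoacounts_dict[condition]` would therefore first be reshaped into long format.

The pandas sketch below shows one way to do this. The helper name and output column names are hypothetical. Both keys are converted to strings because Bokeh categorical axes use string factors.

```python
import pandas as pd


def to_heatmap_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape a location x day table into rows of (location, day, value)."""
    long = (
        wide.rename_axis("location")
        .reset_index()
        .melt(id_vars="location", var_name="day", value_name="value")
    )
    # Bokeh categorical axes need string factors
    long["location"] = long["location"].astype(str)
    long["day"] = long["day"].astype(str)
    return long
```

The result could then be wrapped in a `ColumnDataSource` and drawn with `rect(x="day", y="location", width=1, height=1, ...)`, using a colour mapper such as `LinearColorMapper` for the fill.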

#### Choropleth

Map of MSOAs (possibly masked to restrict them to a relevant area) coloured in proportion to daily values (total number of people with given condition, or average danger score of venues) within each area. A slider above the choropleth allows you to select/move between days.

![Choropleth](choropleth.jpg)

#### Histograms

For comparison between scenarios, dodged bar charts show counts (number of people or danger score) on the y axis per category (condition or venue) and scenario (different colours). 

![Dodged bar chart](histogram.jpg)
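For dodged bar charts, Bokeh's `dodge` transform (`bokeh.transform.dodge`) offsets each scenario's bars along a categorical x axis. The data source then needs one list of categories plus one column per scenario.

The pandas-only sketch below builds that structure. The input layout (one row per scenario and category) and the helper name are assumptions.

```python
import pandas as pd


def dodged_bar_data(summary: pd.DataFrame, scenarios):
    """Build {'category': [...], <scenario>: [...], ...} for a dodged bar chart.

    `summary` has columns 'scenario', 'category' and 'value' (assumed layout).
    """
    wide = summary.pivot(index="category", columns="scenario", values="value")
    # Keep the scenario order from the yml file; missing combinations become 0
    wide = wide.reindex(columns=scenarios).fillna(0)
    data = {"category": [str(c) for c in wide.index]}
    for name in scenarios:
        data[name] = wide[name].tolist()
    return data
```

Each scenario could then get its own `vbar`, e.g. `p.vbar(x=dodge("category", -0.15, range=p.x_range), top="Control", width=0.3, source=ColumnDataSource(data))`, with a different offset and colour per scenario.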