# RAMP - Urban Analytics: Microsim User Guide

## Introduction

This microsimulation model aims to simulate the spread of a pathogen through a population, taking into account where people live and the places they visit (e.g. retail, work, school). In this version of the model, we simulate COVID-19/SARS-CoV-2 transmission in England, but the model could be repurposed for other infectious diseases and/or geographical areas. Potential uses for the model include simulating the effect of starting and stopping different interventions (e.g. school closures) and predicting the impact of asymptomatic transmission.

This notebook is a work in progress, as is the model itself. Please check back later for updated versions of the code and documentation.

## Overview of the model's components

This section shows how the different components (classes, functions, input data etc) fit together to simulate how danger scores for individuals and locations change over time.

### Input data

#### Individuals and households

SPENSER (Synthetic Population Estimation and Scenario Projection Model) is used to create population estimates at the household level for England (https://lida.leeds.ac.uk/research-projects/spenser-synthetic-population-estimation-and-scenario-projection-model/). In other words, it generates a set of representative individuals (with a given age, sex and ethnicity) assigned to households. A household is one or more individuals assumed to be living in the same 'home' (a physical space such as a house or flat).

#### Health status

Individuals are given one of five possible health/disease states:
 - 0: Susceptible -- never had disease or no longer immune.
 - 1: Exposed (pre-symptomatic) -- individual has been infected and can infect others, but not (yet) showing symptoms.
 - 2: Exposed (symptomatic) -- individual has been infected, can infect others, and is showing symptoms.
 - 3: Recovered -- individual has immunity.
 - 4: Died -- individual has died.
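
The numeric codes above can be given readable names in analysis code. The following is a minimal sketch only: the names `DiseaseStatus`, `INFECTIOUS` and `is_infectious` are hypothetical and are not taken from the model code. It treats states 1 and 2 as infectious, following the descriptions in the list.

```python
from enum import IntEnum


class DiseaseStatus(IntEnum):
    """Hypothetical names for the five health/disease codes listed above."""
    SUSCEPTIBLE = 0
    EXPOSED_PRESYMPTOMATIC = 1
    EXPOSED_SYMPTOMATIC = 2
    RECOVERED = 3
    DIED = 4


# States described above as able to infect others
INFECTIOUS = {DiseaseStatus.EXPOSED_PRESYMPTOMATIC, DiseaseStatus.EXPOSED_SYMPTOMATIC}


def is_infectious(status):
    """Return True if the numeric status code is one of the infectious states."""
    return DiseaseStatus(status) in INFECTIOUS
```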

#### Activities

Individuals are assigned a job ('work', high level category for now), a primary or secondary school and/or shopping behaviour ('retail'). Other activities will be added in future iterations as required. Each of these activities is assigned a duration (fraction of day the individual spends doing this) and one or more possible locations (e.g. individual has 1 office but may visit multiple shops). A 'flow' variable gives the probability of an individual going to each location (fractions which sum to 1 per activity). E.g. if there are three local shops that an individual is most likely to visit, each will have a 'probability' of ~0.33.

#### Travel mode and time 

To be added. We will include the risks associated with travelling by different modes.

#### 'Danger' scores

When 'infectious' individuals visit locations, they increase the 'danger' score for that location. This translates into an increased likelihood ('risk') that another 'susceptible' individual travelling to the same location will become 'exposed'. 

The danger score assigned to each location is calculated as

$danger_l = \sum_{i=1}^{nr-infectious} ( duration_{il} * flow_{il} * danger$_$multiplier )$

The risk that individuals get from a location is calculated as

$risk_i = \sum_{l=1}^{nr-locations} ( duration_{il} * flow_{il} * danger_l * risk$_$multiplier )$

`danger_multiplier` and `risk_multiplier` are user-defined weights (both set to 1 by default). With these default weights, if an infectious individual spends all of their time in one location, they give that location a 'danger' of 1.0. If another individual also spends all of their time in that location, they receive a 'risk' of 1.0.
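
The two formulas can be illustrated with a small sketch in plain Python. This is not the model's implementation: the `visits` data, the `infectious` set and the function names `danger_scores` and `risk_scores` are assumptions made for this example. It reproduces the case described above, where one infectious individual and one other individual both spend all of their time in the same location.

```python
from collections import defaultdict

# Dummy visits: (individual, location, duration, flow). Illustrative values only.
visits = [
    ("A", "shop", 1.0, 1.0),    # A spends all of their time at the shop
    ("B", "shop", 1.0, 1.0),    # B spends all of their time at the shop
    ("C", "office", 0.5, 1.0),  # C spends half of their time at the office
]
infectious = {"A"}  # assumed set of currently infectious individuals


def danger_scores(visits, infectious, danger_multiplier=1.0):
    """Sum duration * flow * danger_multiplier over infectious visitors per location."""
    danger = defaultdict(float)
    for person, location, duration, flow in visits:
        if person in infectious:
            danger[location] += duration * flow * danger_multiplier
    return dict(danger)


def risk_scores(visits, danger, risk_multiplier=1.0):
    """Sum duration * flow * danger * risk_multiplier over visited locations per individual."""
    risk = defaultdict(float)
    for person, location, duration, flow in visits:
        risk[person] += duration * flow * danger.get(location, 0.0) * risk_multiplier
    return dict(risk)


danger = danger_scores(visits, infectious)
risk = risk_scores(visits, danger)
print(danger)
print(risk)
```

With the default multipliers, the shop receives a danger of 1.0 from individual A, and individual B receives a risk of 1.0 from the shop. Individual C only visits the office, which no infectious individual has visited, so their risk stays at 0.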

### Classes and functions

When running the code, a [**`Microsim`**](./microsim/microsim_model.py) object is created which holds information about the individuals in the population (demographics, health and their activities, including where they go and for how long). The component `activity_location` points to a number of different [**`ActivityLocation`**](./microsim/activity_location.py) objects. These hold information about the locations of different activities (mainly an identifier, flows and danger score).

After the initial `Microsim` object is created, it can then be updated through a series of steps (days). At each step, the danger scores assigned to all venues are updated based on how many infectious individuals have visited. Next, each individual gets an updated risk based on where they have been, and their disease status is updated accordingly.

#### Microsim functions
![title](microsim_defs.jpg)

#### ActivityLocation functions
![title](actloc_defs.jpg)

#### Resulting objects
![title](Objs.jpg)

## Example with dummy data

For instructions on installing the required libraries, see the [README](./README.md).

This section follows an example dummy population of 17 individuals. To start the model using the defaults, run microsim_model.py from the root directory:

```
python microsim/microsim_model.py 
```

The Command Line Interface Creation Kit (Click) is used to start the script, optionally with the additional parameters `iterations` (number of steps/days) and `data_dir` (directory where the data is stored, usually root/data). To see the arguments that are available, run with the `--help` flag:

```
python microsim/microsim_model.py --help
Usage: microsim_model.py [OPTIONS]

Options:
  --iterations INTEGER      Number of model iterations. 0 means just run the
                            initialisation

  --data_dir TEXT           Root directory to load data from
  --do_visualisations TEXT  Whether to generate plots and associated data
                            (default True)

  --debug TEXT              Whether to run some more expensive checks (default
                            False)

  --help                    Show this message and exit.
```

This calls the **run** function and passes on the optional arguments `iterations` and `data_dir`. This will create a **Microsim** object (in this case restricting MSOAs to Devon only) and run **step** to advance the model for as many iterations as requested.

```
import os
import pandas as pd

def run(iterations, data_dir):
    num_iter = iterations
    # Restrict MSOAs to Devon only - read in list
    devon_msoas = pd.read_csv(os.path.join(data_dir, "devon_msoas.csv"), header=None,
                              names=["x", "y", "Num", "Code", "Desc"])
    # Create Microsim object (the Microsim class is defined in the same file)
    m = Microsim(study_msoas=list(devon_msoas.Code), data_dir=data_dir)
    # Step the model
    for i in range(num_iter):
        m.step()  # Run 1 more iteration
```

This function also saves some data to pickle files in an *output* directory in the *data* directory: the initial microsim object (*m0.pickle*), an *Individuals.pickle* file with disease states across time and files for each venue with danger scores across time (e.g. *Work.pickle*, *Retail.pickle* etc). These are useful for further analysis and plotting.
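
Reading one of these files back for analysis could look like the sketch below. The helper name `load_pickle` is hypothetical, and because the structure of the saved objects is not described here, the demonstration writes and reads a stand-in dictionary in a temporary directory rather than real model output.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load a single pickled object from the given file path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Self-contained demonstration using a stand-in object, not real model output
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "Individuals.pickle"
    with open(demo_path, "wb") as f:
        pickle.dump({"example": [0, 1, 1, 2]}, f)
    restored = load_pickle(demo_path)

print(restored)
```

With real output, the same helper would be pointed at a file in the *output* directory. Two general points about pickle apply: the classes of the pickled objects (e.g. `Microsim` for *m0.pickle*) must be importable when the file is loaded, and pickle files should only be loaded from trusted sources.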

Most classes and functions should be well documented. For more information, see the help documentation. E.g.:

```
m = Microsim()
help(m)
```

The first step when creating the `Microsim` object (see the `__init__()` function) is to read in data about the population as created by SPENSER (see earlier) via **read_msm_data**. These data are collections of CSV files located in data/msm_data, which are read into the dataframes *individuals* and *households* (linked via HID, the household identifier).

A list of unique and sorted MSOAs *all_msoas* is extracted from this data using **extract_msoas_from_indiviuals** (from *individuals.Area*).

Next, **check_study_area** will remove individuals and households outside the predefined study area if required. If *study_msoas* has been specified by the user, it will remove all rows in the *individuals* and *households* dataframes where the *Area* column is not a member of the *study_msoas* list. If *study_msoas*  is empty (not defined), the *individuals* and *households* dataframes are kept as they are and the parameter *study_msoas* is set to *all_msoas*.

```
self.individuals, self.households = Microsim.read_msm_data()

self.all_msoas = Microsim.extract_msoas_from_indiviuals(self.individuals)

self.study_msoas, self.individuals, self.households = \
    Microsim.check_study_area(self.all_msoas, study_msoas, self.individuals, self.households)
```
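
The study-area filtering described above can be sketched with pandas as follows. This is an illustration of the described behaviour, not the actual `check_study_area` code: the function name `filter_to_study_area`, the `PID` column and the dummy area codes are assumptions, and only the *Area* and *HID* columns are taken from the text.

```python
import pandas as pd


def filter_to_study_area(all_msoas, study_msoas, individuals, households):
    """Keep only rows whose Area is in study_msoas; keep everything if it is empty."""
    if not study_msoas:
        # No study area given: leave the dataframes as they are and use all MSOAs
        return list(all_msoas), individuals, households
    keep = list(study_msoas)
    individuals = individuals[individuals["Area"].isin(keep)]
    households = households[households["Area"].isin(keep)]
    return keep, individuals, households


# Dummy data: four individuals in three households across three areas
individuals = pd.DataFrame({"PID": [0, 1, 2, 3],
                            "HID": [0, 0, 1, 2],
                            "Area": ["A", "A", "B", "C"]})
households = pd.DataFrame({"HID": [0, 1, 2], "Area": ["A", "B", "C"]})
all_msoas = sorted(individuals["Area"].unique())

# Restrict to areas A and B
study, ind, hh = filter_to_study_area(all_msoas, ["A", "B"], individuals, households)

# An empty study area keeps everything
study_all, ind_all, hh_all = filter_to_study_area(all_msoas, [], individuals, households)
```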

For each type of 'activity' (retail, work, primary and secondary school and home), an **ActivityLocation** object will be created (code in *activity_location.py*). These ActivityLocation objects are stored in a dictionary, which makes it possible to run through all activities and calculate risks and dangers using the same code.

```
self.activity_locations: Dict[str, ActivityLocation] = {}  # initialise empty dictionary
```

For each 'activity' (e.g. shopping), we need to store the following things:

- A *locations* dataframe containing all the places where the activity can take place (e.g. a list of shops), possibly with further details (such as shop name, geographical coordinates etc). Importantly, this dataframe has a *Danger* column which records each location's danger score (based on how many infectious people have visited the location recently).

- Three columns in the *individuals* dataframe, e.g. *Retail_Venues*, *Retail_Flows* and *Retail_Duration*:
 -  *\_Venues*: the location(s) where each individual is likely to do that activity. This can be a single value (e.g. home) or a list (e.g. retail, where one individual may visit multiple shops) of indexes to the *locations* dataframe. E.g. one individual might have as shop venues [2,54,19]. Those numbers refer to the row numbers of locations in the retail locations dataframe. Row numbers start at 0, so venue '2' is the third venue in the list of all the locations associated with retail.
 -  *\_Flows*: how likely the individual is to do the activity at each of the venues. If venues is a list then flows are also a list (in the same order). E.g. for the individual mentioned above this might be flows=[0.8,0.1,0.1], which means they are most likely to go to the venue with index 2, and less likely to go to the other two.
 -  *\_Duration*: the proportion of time the individual spends doing this activity (the durations across all of an individual's activities sum to 1).
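
To make the relationship between the three columns concrete, the sketch below uses the example individual above in plain Python. The duration value and the variable names are assumptions made for this illustration. Multiplying duration by flow gives the share of the day attributed to each venue, which is the same product that appears in the danger and risk formulas.

```python
# Dummy retail entry for one individual, mirroring the example in the list above
retail_venues = [2, 54, 19]     # row numbers in the retail locations dataframe
retail_flows = [0.8, 0.1, 0.1]  # probability of visiting each venue (same order)
retail_duration = 0.1           # assumed fraction of the day spent shopping

# Flows for a single activity are fractions that sum to 1
flows_total = sum(retail_flows)

# Share of the day attributed to each venue: duration * flow
time_per_venue = {venue: retail_duration * flow
                  for venue, flow in zip(retail_venues, retail_flows)}

# The venue with the largest flow is the most likely destination
most_likely = max(zip(retail_venues, retail_flows), key=lambda pair: pair[1])[0]

print(time_per_venue)
print(most_likely)
```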
 
The duration data for all activities comes from input data in data/\*tu_health/\*Complete.txt which is read in using `attach_time_use_and_health_data`. In addition to time use (columns *pwork*, *pschool* etc), this file also contains information about individuals' occupation (used to create the *workplaces* variable, see below) and health. All this data is added to the individuals dataframe. 

Next the same function also creates the necessary information for the activity of staying at home. For this activity, the individual will be assigned a virtual house based on their household ID (so they can possibly infect others in the same household while at home) and a flow of 1 (assume they only have 1 virtual house). The function appends *Home_Venues* (the household ID) and *Home_Flows* (set to 1) columns to the *individuals* dataframe. It also adds a *Danger* column to the *households* dataframe (the locations dataframe for the home activity) to start tracking each house's danger score. 

For going to work, the individual is assigned a virtual office based on their job title (*workplaces* variable, which is the *location* dataframe for the work activity) and a flow of 1 (assume they only have 1 virtual office and job) through the `add_work_flows()` function. In other words, all accountants work in the same virtual office, all police officers in the same virtual police station etc. 
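
The idea of one virtual office per job can be sketched in plain Python as below. The job titles and variable names are dummy values for illustration, and storing each venue and flow as a one-element list is an assumption; the actual `add_work_flows()` function may represent them differently.

```python
# Dummy job titles for five individuals (illustrative values only)
jobs = ["accountant", "police officer", "accountant", "teacher", "police officer"]

# One virtual office per distinct job, numbered in alphabetical order
possible_jobs = sorted(set(jobs))
office_id = {job: i for i, job in enumerate(possible_jobs)}

# Each individual gets a single work venue (their job's office) with a flow of 1
work_venues = [[office_id[job]] for job in jobs]
work_flows = [[1.0] for _ in jobs]

print(office_id)
print(work_venues)
```

In this sketch both accountants share office 0 and both police officers share office 1, so an infectious individual raises the danger score only for colleagues with the same job.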

In the case of Retail and Schools, the flow data are estimates based on a spatial interaction model, read in via the `read_retail_flows_data()` or `read_school_flows_data()` function. Next, all individuals in each MSOA are assigned the appropriate flows via the `add_individual_flows()` function.

```
# Home
home_name = "Home"
self.individuals, self.households = Microsim.attach_time_use_and_health_data(self.individuals, home_name, self.study_msoas)
self.activity_locations[home_name] = ActivityLocation(name=home_name, locations=self.households,
                                                      flows=None, individuals=self.individuals,
                                                      duration_col="phome")

# Retail
retail_name = "Retail"
stores, stores_flows = Microsim.read_retail_flows_data(self.study_msoas)
self.individuals = Microsim.add_individual_flows(retail_name, self.individuals, stores_flows)
self.activity_locations[retail_name] = \
    ActivityLocation(retail_name, stores, stores_flows, self.individuals, "pshop")

# Schools (primary and secondary)
primary_name = "PrimarySchool"
secondary_name = "SecondarySchool"
schools, primary_flows, secondary_flows = \
    Microsim.read_school_flows_data(self.study_msoas)
self.individuals = Microsim.add_individual_flows(primary_name, self.individuals, primary_flows)
self.activity_locations[primary_name] = \
    ActivityLocation(primary_name, schools.copy(), primary_flows, self.individuals, "pschool")
self.individuals = Microsim.add_individual_flows(secondary_name, self.individuals, secondary_flows)
self.activity_locations[secondary_name] = \
    ActivityLocation(secondary_name, schools.copy(), secondary_flows, self.individuals, "pschool")

# Work
work_name = "Work"
possible_jobs = sorted(self.individuals.soc2010b.unique())  # list of possible jobs in alphabetical order
workplaces = pd.DataFrame({'ID': range(0, 0+len(possible_jobs))})  # df with all possible 'virtual offices'
Microsim._add_location_columns(workplaces, location_names=possible_jobs)
self.individuals = Microsim.add_work_flows(work_name, self.individuals, workplaces)
self.activity_locations[work_name] = ActivityLocation(name=work_name, locations=workplaces, flows=None,
                                                      individuals=self.individuals, duration_col="pwork")
```

Next, individuals are assigned an initial disease (SEIR) status:

```
self.individuals = Microsim.add_disease_columns(self.individuals)  # Add some necessary columns
self.individuals = Microsim.assign_initial_disease_status(self.individuals)
```

To be added: step functionality and plotting.

## Assumptions

This section discusses assumptions, simplifications etc