## Data

Let's choose a dataset and read it in. The available datasets are 'conference', 'hospital', and 'looped_primary_school', all taken from sociopatterns.org. In these studies, participants wore radiofrequency identification (RFID) sensors that detect face-to-face proximity to other participants within $1$ to $1.5$ meters, in $20$-second intervals. Each raw dataset lists the identities of the people in contact and the $20$-second interval in which the contact was detected. Here the data have been reformatted into a table that lists the start and end time of each interaction instead. To exclude contacts detected while participants momentarily walked past one another, only contacts detected in at least two consecutive intervals are counted as interactions.
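The filtering rule above can be sketched as follows. This is a minimal illustration, not the code that produced the formatted files. It assumes each raw record is a tuple (t, i, j), where t is the start of the 20-second detection interval, and that end_time marks the end of the last interval in a run (last detection time + 20 s). The helper name merge_detections is hypothetical.

```python
from collections import defaultdict

def merge_detections(rows, interval=20, min_intervals=2):
    """Merge raw (t, i, j) detections into (i, j, start_time, end_time) contacts.

    Consecutive detections of the same pair (t values exactly `interval` apart)
    form one run; runs shorter than `min_intervals` detections are discarded.
    """
    # Collect the detection times of each unordered pair, ignoring duplicates
    by_pair = defaultdict(set)
    for t, i, j in rows:
        by_pair[(min(i, j), max(i, j))].add(t)

    contacts = []
    for (i, j), time_set in sorted(by_pair.items()):
        times = sorted(time_set)
        run_start = prev = times[0]
        run_len = 1
        for t in times[1:] + [None]:  # None is a sentinel that closes the last run
            if t is not None and t - prev == interval:
                prev, run_len = t, run_len + 1
                continue
            # The current run has ended: keep it only if it is long enough
            if run_len >= min_intervals:
                contacts.append((i, j, run_start, prev + interval))
            if t is not None:
                run_start = prev = t
                run_len = 1
    return contacts

rows = [(100, 1, 2), (120, 1, 2), (140, 2, 1), (300, 1, 2), (500, 2, 3), (520, 3, 2)]
print(merge_detections(rows))  # prints [(1, 2, 100, 160), (2, 3, 500, 540)]
```

The isolated detection of pair (1, 2) at t = 300 is dropped, as it would be for two people walking past each other.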

```python
import pandas as pd

data = 'looped_primary_school'  # one of 'conference', 'hospital', 'looped_primary_school'
original_df = pd.read_csv('data/' + data + '.txt', sep='\t', header=None,
                          names=['ID1', 'ID2', 'start_time', 'end_time'])
# Duplicate every contact with ID1 and ID2 swapped, so that the disease
# can be transmitted in both directions along each edge
reverse_df = original_df.rename(columns={'ID1': 'ID2', 'ID2': 'ID1'})
# Stack both directions into a single table with a fresh index
df = pd.concat([original_df, reverse_df], ignore_index=True)

print(df.head())
```

```
    ID1   ID2  end_time  start_time
0  1538  1539     56580       56520
1  1538  1546     35340       35180
2  1538  1546     37720       37480
3  1538  1546     55960       55900
4  1538  1546     56060       56000
```


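To keep the simulation simple and fast, we store the contacts in a dictionary that maps each individual's ID to a list of that individual's contacts, with each contact recorded as [other ID, start time, end time, 0]: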

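```python
ID_list = list(set(df['ID1']))  # every individual that appears in the data
contacts_dict = {}
for node in ID_list:
    node_df = df[df['ID1'] == node]  # all contacts originating from `node`
    # One entry per contact: [other individual, start_time, end_time, 0].
    # The trailing 0 is an extra per-contact field initialised here for the simulation.
    contacts_dict[node] = [
        [other, start, end, 0]
        for other, start, end in zip(node_df['ID2'].tolist(),
                                     node_df['start_time'].tolist(),
                                     node_df['end_time'].tolist())
    ]
```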
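To illustrate why this layout is convenient, the hypothetical helper below (not part of the original notebook) scans one individual's contact list and returns how many seconds of each contact fall inside a time window, such as that individual's infectious period. It assumes times are in seconds, as in the table above, and ignores the trailing field of each entry.

```python
def overlapping_contacts(contacts_dict, node, t_from, t_to):
    """Return (other_id, seconds_of_overlap) for every contact of `node`
    that overlaps the time window [t_from, t_to)."""
    overlaps = []
    for other, start, end, _field in contacts_dict.get(node, []):
        seconds = min(end, t_to) - max(start, t_from)
        if seconds > 0:  # the contact and the window actually overlap
            overlaps.append((other, seconds))
    return overlaps

# Toy dictionary in the same format as contacts_dict
toy = {1538: [[1539, 56520, 56580, 0], [1546, 35180, 35340, 0]]}
print(overlapping_contacts(toy, 1538, 56000, 56550))  # prints [(1539, 30)]
```

Because each individual's contacts are already grouped, this lookup touches only that individual's list rather than the whole table.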


In addition to the data, we need to choose the parameters that describe the disease model. They are all held in one dictionary:

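```python
# Disease-model parameters (each is described in the sections below)
parameters = {'beta': 0.001,                   # transmissibility beta, per second of contact
              'l_mode': 22,                    # latent period mode Delta_E
              'l_dispersion': 1.1,             # latent period dispersion factor sigma_g^(E)
              'i_mode': 2,                     # infectious period mode Delta_I (hours)
              'i_shape': 5,                    # gamma shape k_I (perseverance = 1/k_I)
              'asymptomatic_proportion': 0.0,  # fraction a of asymptomatic individuals
              }
```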

## Transmissibility $\beta$

$\beta$ is the probability that transmission occurs during any one second of contact between an infectious individual and a susceptible individual.
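If each second of contact is treated as an independent trial (an assumption; the simulation may implement the per-second process differently), the probability of at least one transmission over a contact lasting $t$ seconds is $1-(1-\beta)^{t}$. The sketch below computes this; transmission_probability is a hypothetical helper.

```python
def transmission_probability(beta, seconds):
    """Probability of at least one transmission during `seconds` of contact,
    assuming each second is an independent trial with success probability beta."""
    return 1.0 - (1.0 - beta) ** seconds

# A 60-second contact (start_time 56520, end_time 56580 in the table above)
# with beta = 0.001
p = transmission_probability(0.001, 56580 - 56520)
print(f"{p:.4f}")  # prints 0.0583
```

For small $\beta t$ this is close to $\beta t$, so doubling a short contact roughly doubles the chance of transmission.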

## Latent period mode $\hat{\Delta}_{E}$


## Latent period dispersion $\sigma_{g}^{(E)}$

The duration of the latent period may vary between individuals depending on their age, gender, or other characteristics \cite{blythe1988distributed,lloyd2001destabilization,lloyd2001realistic,10.1371/journal.pmed.0020174}. While the latent period and the incubation period are not the same, we assume that the biological factors determining their lengths are similar (i.e., the processes described in \cite{10.7554/eLife.30212}), and thus that the distribution of latent periods is log-Normal \cite{sartwell1950distribution}. In the simulation, the latent duration of each infected individual is drawn from a log-Normal distribution with mode $\hat{\Delta}_{E}$ and dispersion factor $\sigma_{g}^{(E)}$ (the geometric standard deviation of the distribution). Since the geometric standard deviation of a log-Normal distribution is $e^{\sigma}$ and its mode is $e^{\mu-\sigma^{2}}$, the standard log-Normal parameters are $\sigma=\log\sigma_{g}^{(E)}$ and $\mu=\sigma^{2}+\log(\hat{\Delta}_{E})$.
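A minimal sketch of this conversion and of drawing latent periods with NumPy's Generator.lognormal, which takes the mean $\mu$ and standard deviation $\sigma$ of the underlying normal distribution. It assumes, as stated above, that the dispersion factor is the geometric standard deviation, so $\sigma=\log\sigma_{g}^{(E)}$; the helper names are hypothetical and the time unit is whatever l_mode is expressed in.

```python
import numpy as np

def lognormal_params(mode, dispersion):
    """Convert a mode and a dispersion factor (geometric standard deviation)
    into (mu, sigma) of the underlying normal distribution."""
    sigma = np.log(dispersion)        # geometric SD = exp(sigma)
    mu = sigma**2 + np.log(mode)      # because mode = exp(mu - sigma^2)
    return mu, sigma

def sample_latent_period(mode, dispersion, rng, size=None):
    """Draw latent period(s) from the log-Normal distribution with the given mode."""
    mu, sigma = lognormal_params(mode, dispersion)
    return rng.lognormal(mean=mu, sigma=sigma, size=size)

rng = np.random.default_rng(0)
mu, sigma = lognormal_params(22, 1.1)   # l_mode and l_dispersion from `parameters`
print(np.exp(mu - sigma**2))            # recovers the mode, 22 (up to rounding)
samples = sample_latent_period(22, 1.1, rng, size=1000)
print(samples[:5])
```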


## Infectious period mode $\hat{\Delta}_{I}$


## Perseverance $1/k_{I}$

Once infected, individuals may respond differently: some might leave the system (or take other measures to avoid infecting others) immediately, whereas some may remain a risk to others for longer \cite{doi:10.1093/aje/kwt196}. In the simulation, the duration of each individual's infectious period is drawn at random from a gamma distribution with a mode of $\hat{\Delta}_{I}$ hours. We define perseverance as $1/k_{I}$, where $k_{I}$ is the shape parameter of the gamma distribution. Because the mode of a gamma distribution with shape $k_{I}>1$ and scale $\theta$ is $(k_{I}-1)\theta$, choosing $\theta=\hat{\Delta}_{I}/(k_{I}-1)$ keeps the mode fixed at $\hat{\Delta}_{I}$, while increasing the perseverance fattens the tail of the distribution.
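The mode-preserving choice of scale can be sketched with NumPy's Generator.gamma(shape, scale, size). The helper names below are hypothetical; the loop shows that the mode stays at $\hat{\Delta}_{I}$ while the mean $k_{I}\theta$ grows as perseverance $1/k_{I}$ increases.

```python
import numpy as np

def gamma_scale_for_mode(mode, shape):
    """Scale theta such that a gamma(shape, theta) distribution has the given mode.
    The mode of a gamma distribution is (shape - 1) * theta, defined for shape > 1."""
    if shape <= 1:
        raise ValueError("shape k_I must exceed 1 (perseverance 1/k_I < 1)")
    return mode / (shape - 1)

def sample_infectious_period(mode, shape, rng, size=None):
    """Draw infectious period(s) in the same unit as `mode` (hours here)."""
    return rng.gamma(shape, gamma_scale_for_mode(mode, shape), size=size)

for k in (5, 2, 1.25):                  # perseverance 1/k = 0.2, 0.5, 0.8
    theta = gamma_scale_for_mode(2, k)  # i_mode = 2 hours
    print(k, (k - 1) * theta, k * theta)  # shape, mode, mean

rng = np.random.default_rng(0)
print(sample_infectious_period(2, 5, rng, size=3))
```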

## Asymptomatic proportion $a$

Some members of the population may show no signs of infection (up to $28\%$ reported for influenza \cite{leung2015review} and $32\%$ for rhinovirus \cite{jacobs2013human}) or may ignore their symptoms entirely, in which case their behavior does not change. At the beginning of the simulation, a random sample of the population is chosen to be asymptomatic. These individuals, who make up a fraction $a$ of the total population, have an infectious period of $24$ hours. We also acknowledge that immunocompromised individuals can be asymptomatic and infectious for extremely long periods \cite{10.1371/journal.pone.0148258}; however, we consider these cases too rare to incorporate into the model.
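A minimal sketch of the asymptomatic assignment, assuming the sample size is $a N$ rounded to the nearest integer (the rounding rule is an assumption) and that symptomatic individuals draw their infectious period from the mode-preserving gamma distribution described above. Both helper names are hypothetical.

```python
import numpy as np

ASYMPTOMATIC_INFECTIOUS_HOURS = 24  # fixed infectious period for asymptomatic individuals

def choose_asymptomatic(ids, a, rng):
    """Pick round(a * N) of the N individuals, without replacement (rounding assumed)."""
    n_asym = int(round(a * len(ids)))
    return set(rng.choice(ids, size=n_asym, replace=False).tolist())

def infectious_period(node, asymptomatic, rng, mode, shape):
    """24 h if asymptomatic, otherwise a gamma draw with the given mode (shape > 1)."""
    if node in asymptomatic:
        return ASYMPTOMATIC_INFECTIOUS_HOURS
    return rng.gamma(shape, mode / (shape - 1))

rng = np.random.default_rng(0)
ids = list(range(100))
asymptomatic = choose_asymptomatic(ids, 0.25, rng)  # a = 0.25 for illustration
print(len(asymptomatic))  # prints 25
```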


![Disease simulation flow chart](Figs/Disease_simulation_flow_chart.png "Disease simulation flow chart")