# Stochastically Generated Clinical Visits
## A Multi-Graph Micro-Simulation Example

### Introduction

During a privacy and anonymity software course we were challenged to provide an example of graph data, analogous to datasets found in social networks, that is native to health care. In this notebook we develop an example dynamic graph through the simulation of patient visits to clinical staff.

In a fixed population, where all members are patients and a subset of the population are clinical staff, the visits of patients to clinical staff form a dynamic graph. The fixed nodes of the graph are the individual members of the population.
$$
\begin{align}
N & = \left\lbrace \text{people} \right\rbrace
\end{align}
$$
A subset of these nodes corresponds to the clinicians in the population.
$$
\begin{align}
N_\text{staff} & = \left\lbrace \text{staff} \right\rbrace\\
    & \subseteq N
\end{align}
$$
On each day there is a set of transient edges, one for each visit of a patient to a clinician on that day.
$$
\begin{align}
V_\text{day} & = \left\lbrace \left( \text{day}, \text{patient}, \text{staff} \right) : \text{patient visits staff on day} \right\rbrace
\end{align}
$$
The entirety of all visits that occurred is then the union of the transient edge sets over all the days.
$$
\begin{align}
V & = \bigcup_{\text{days}} V_\text{day}
\end{align}
$$
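
As a minimal sketch of this construction, the daily visit sets and their union can be written out directly in Julia. The identifiers and dates below are made up for illustration and are not output from `MicroSimulation`; the names `Visit`, `visitsbyday`, and `allvisits` are likewise hypothetical.

```julia
using Dates

# A visit is a (day, patient, staff) triple. All identifiers are invented.
Visit = Tuple{Date, Int, Int}

# Hypothetical transient visit sets, keyed by day.
visitsbyday = Dict{Date, Set{Visit}}(
    Date(2016, 4, 1) => Set([(Date(2016, 4, 1), 7, 1), (Date(2016, 4, 1), 9, 2)]),
    Date(2016, 4, 2) => Set([(Date(2016, 4, 2), 7, 1)]),
)

# The set of all visits is the union of the daily sets.
allvisits = reduce(union, values(visitsbyday); init = Set{Visit}())

# Person 7 saw staff member 1 on two different days, so the same pair of
# nodes is connected more than once.
repeats = count(v -> v[2] == 7 && v[3] == 1, allvisits)
```

Because the day is part of each triple, repeat visits between the same patient and clinician stay distinct in the union, which is what makes the structure a multi-graph.
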
However, this construction naively assumes that any patient can visit any clinician on any given day. This assumption is flawed, and leads to overestimating the anonymity of the data. In the most rudimentary form, patients and clinicians are subdivided into clinics. Each clinic is effectively a smaller subpopulation, constraining the clinicians a patient can see, and the patients a clinician can see. Furthermore, the clinicians work according to a shift schedule, so the clinicians that can be seen on any given day are constrained to a smaller subset. Finally, clinicians generally do not treat themselves. In the context of this simulation we accomplish this by ensuring no clinician is a patient at the clinic in which they are employed.
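
The three constraints can be sketched as a filter over the staff. This is only an illustration, not the logic inside `MicroSimulation`: the two clinic vectors are borrowed from the ten-person example in the `Registrations` documentation, while the `rostered` set and the `eligiblestaff` helper are assumptions introduced here.

```julia
# Clinic assignments from the ten-person Registrations example:
# a staff clinic of 0 marks a person who is not a clinician.
staffclinics   = [1, 1, 1, 2, 0, 0, 0, 0, 0, 0]
patientclinics = [2, 2, 2, 1, 2, 2, 1, 1, 1, 1]

# Hypothetical roster: the staff working on one particular day.
rostered = Set([1, 3, 4])

# Staff a patient could see that day: employed at the patient's clinic
# and on the roster.
eligiblestaff(patient) = [s for s in eachindex(staffclinics)
                          if staffclinics[s] == patientclinics[patient] && s in rostered]

eligiblestaff(7)  # person 7 is a patient at clinic 1
eligiblestaff(5)  # person 5 is a patient at clinic 2
```

Since every clinician is a patient at a different clinic from the one employing them, no person ever appears in their own list of eligible staff.
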

The implication for anonymity is that the pool of people a record could belong to shrinks from the whole population to the patients of a single clinic. For example, in a population of $1000000$ people with $1000$ clinics, the events of a single person are effectively a sample from $1$ in $1000$, not $1$ in $1000000$, because there are only $1000$ patients per clinic on average.

The engine of the simulation has been wrapped in a module called `MicroSimulation`, which can be found on GitHub [here](https://github.com/aaronsheldon/tableau-examples/blob/master/MicroSimulation.jl).

In [1]:
include("MicroSimulation.jl")

MicroSimulation

Before running the stochastic event simulation a population must be created, randomly assigning clinics to patients and staff. We do this using the `Registrations` constructor.

In [2]:
?MicroSimulation.Registrations

```
Registrations(personcount, staffcount, cliniccount)
```

A list of persons generated by incrementally assigning a person identifier, randomly assigning a clinic identifier from `1:cliniccount` to the first `staffcount` persons, and randomly assigning a second clinic identifier to every person. The clinic a staff is employed at will never be the same as the clinic at which the staff is a patient.

# Fields

  * `personidentifiers`: identifiers of the people.
  * `staffclinics`: identifiers of the clinic the person provides care.
  * `patientclinics`: identifiers of the clinic at which the person is a patient.
  * `personcount`: number of people.
  * `staffcount`: number of clinical staff.
  * `cliniccount`: number of clinics employing staff, and caring for patients.

# Example

```
julia> rs = MicroSimulation.Registrations(10, 4, 2)
MicroSimulation.Registrations([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 1, 1, 2, 0, 0, 0, 0, 0, 0], [2, 2, 2, 1, 2, 2, 1, 1, 1, 1], 10, 4, 2)
```
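
To make the assignment rule concrete, here is a hedged sketch of one way such a registration list could be generated. It is not the code in `MicroSimulation.jl`: the function `sketchregistrations`, its explicit random number generator argument, and the returned named tuple are all assumptions made for illustration, and it assumes `staffcount <= personcount`.

```julia
using Random

# Hypothetical stand-in for the Registrations constructor.
function sketchregistrations(rng, personcount, staffcount, cliniccount)
    cliniccount >= 2 || throw(ArgumentError("need at least two clinics"))

    # The first staffcount persons are staff; 0 marks non-staff.
    staffclinics = zeros(Int, personcount)
    staffclinics[1:staffcount] .= rand(rng, 1:cliniccount, staffcount)

    # Every person is a patient at some clinic.
    patientclinics = rand(rng, 1:cliniccount, personcount)

    # Redraw for staff from the other cliniccount - 1 clinics,
    # stepping over the clinic that employs them.
    for p in 1:staffcount
        c = rand(rng, 1:(cliniccount - 1))
        patientclinics[p] = c >= staffclinics[p] ? c + 1 : c
    end

    return (personidentifiers = collect(1:personcount),
            staffclinics = staffclinics,
            patientclinics = patientclinics)
end

sketch = sketchregistrations(MersenneTwister(1), 10, 4, 2)
```

Drawing from `cliniccount - 1` values and shifting past the employer keeps the choice uniform over the remaining clinics without needing a rejection loop.
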


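This simulation consists of $4100000$ patients, of which $13000$ are also clinical staff. These counts are rounded from a rough aggregation of the fiscal 2016/17 visit statistics below. The statistics report $2687$ clinics, roughly $2700$, although the run shown here passes $1000$ clinics to `Registrations`.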

| Statistic     | Value      |
|---------------|------------|
| Population    | $4067175$  |
| Patients      | $3578579$  |
| Providers     | $12486$    |
| Clinics       | $2687$     |
| Patient Days  | $24438296$ |
| Provider Days | $1423060$  |
| Visits        | $26360227$ |
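
Two ratios from this table are worth computing, since they sit close to the simulation parameters used later in the notebook ($6$ visits per person and $19$ patients per clinician per day). That those parameters were derived this way is an assumption; the notebook does not say so explicitly.

```julia
# Figures copied from the table above.
population   = 4067175
providerdays = 1423060
visits       = 26360227

# Average visits per member of the population over the fiscal year.
visitsperperson = visits / population

# Average visits handled per provider per working day.
visitsperproviderday = visits / providerdays

println(round(visitsperperson, digits = 2))
println(round(visitsperproviderday, digits = 2))
```
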

In [10]:
rs = MicroSimulation.Registrations(4100000, 13000, 1000)

MicroSimulation.Registrations([1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  4099991, 4099992, 4099993, 4099994, 4099995, 4099996, 4099997, 4099998, 4099999, 4100000], [703, 46, 289, 83, 488, 759, 386, 799, 605, 500  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [926, 539, 424, 203, 969, 913, 533, 113, 155, 923  …  1, 347, 64, 791, 776, 510, 331, 525, 191, 459], 4100000, 13000, 1000)

Given the randomly generated assignments of clinics to patients and staff we can now stochastically generate visit events by sequentially sampling the conditional probabilities. This is done by calling the `Events` constructor.

In [4]:
?MicroSimulation.Events

```
Events(daterange, registrations, expectedvisits, expectedload)
```

A list of events stochastically generated by the microsimulation at the time of instantiation of the object. The process is parameterized by `expectedvisits`, the mean expected number of clinic visits a single person will have in the `daterange`, and by `expectedload`, the expected number of patients a single clinician can attend to in a single day. These parameters are used to calculate the number of staff that are rostered on a single day.

# Fields

  * `days`: days of the patient visits.
  * `patientidentifiers`: identifiers of the patients.
  * `staffidentifiers`: identifiers of the staff caring for the patients.

# Example

```
julia> es = MicroSimulation.Events(Date(2014, 01, 01):Date(2014, 12, 31), MicroSimulation.Registrations(400, 10, 2), 4, 28)
MicroSimulation.Events(Date[2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-02, 2014-01-02, 2014-01-02, 2014-01-02, 2014-01-03  …  2014-12-29, 2014-12-30, 2014-12-30, 2014-12-30, 2014-12-30, 2014-12-31, 2014-12-31, 2014-12-31, 2014-12-31, 2014-12-31], [47, 78, 188, 267, 298, 286, 302, 387, 395, 13  …  365, 3, 56, 121, 232, 5, 35, 187, 224, 252], [5, 5, 6, 6, 5, 7, 7, 7, 7, 6  …  9, 7, 7, 7, 1, 6, 8, 6, 8, 6])
```


We simulate a single year's events, where each person has $6$ visits on average, and each member of the clinical staff can attend to $19$ patients in a single day on average. Together these define the average number of staff that must be scheduled to work on any single day in any single clinic.
$$
\begin{align}
\text{scheduled staff at clinic} & = \frac{\text{number of patients at clinic} \times \text{expected visits per patient}}{\text{days simulated} \times \text{expected daily client load per staff}} 
\end{align}
$$
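
Plugging in the values used in this notebook gives a feel for the scale. The sketch below assumes patients are spread evenly across clinics, so that the number of patients at a clinic is the population divided by the clinic count passed to `Registrations` in the run above; the variable names are illustrative.

```julia
using Dates

# Average patients per clinic, assuming an even spread.
patientsperclinic = 4100000 / 1000

expectedvisits = 6    # visits per person over the date range
expectedload   = 19   # patients per clinician per day

# Number of days in the simulated fiscal year, both endpoints included.
dayssimulated = length(Date(2016, 4, 1):Day(1):Date(2017, 3, 31))

# Average number of staff scheduled per clinic per day.
scheduledstaff = patientsperclinic * expectedvisits / (dayssimulated * expectedload)
```

Under these assumptions roughly three to four clinicians are rostered at a clinic on a typical day, out of an average of $13$ employed there.
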

In [11]:
evs = MicroSimulation.Events(Date(2016, 4, 1):Date(2017, 3, 31), rs, 6, 19)

MicroSimulation.Events(Date[2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01  …  2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31], [150, 248, 286, 349, 503, 508, 564, 607, 671, 704  …  4099320, 4099431, 4099579, 4099609, 4099643, 4099685, 4099692, 4099725, 4099745, 4099817], [3645, 4221, 3100, 12200, 3381, 12923, 11723, 855, 454, 7634  …  6784, 12345, 1598, 5646, 11022, 2927, 6824, 11581, 5306, 1074])
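
The parallel vectors in an `Events` object can be folded into edge multiplicities for the multi-graph by counting how often each patient and staff pair occurs. The sketch below uses short made-up vectors in place of `evs.patientidentifiers` and `evs.staffidentifiers`, so it runs without the module; `edgecounts` is a hypothetical name.

```julia
# Hypothetical stand-ins for evs.patientidentifiers and evs.staffidentifiers.
patientidentifiers = [47, 78, 47, 188, 47]
staffidentifiers   = [5, 5, 5, 6, 6]

# Number of visits for each (patient, staff) pair.
edgecounts = Dict{Tuple{Int, Int}, Int}()
for (p, s) in zip(patientidentifiers, staffidentifiers)
    edgecounts[(p, s)] = get(edgecounts, (p, s), 0) + 1
end

edgecounts[(47, 5)]  # patient 47 visited staff member 5 twice
```

A pair with a count above one is a repeated edge, and the distribution of these counts is one simple summary of how identifiable a patient's visit pattern might be.
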