<h1 style="text-align: center;">Individual-based Simulation Models of Public Health</h1>

<p style="text-align: center;">James Collins</p>
<p style="text-align: center;">July 25, 2019</p>

Hi everyone.  

My name is ...

Today I'll be talking about TITLE. In particular, I'll be talking about the process of simulation modeling and implementation.

I prefer an open style of presentations, so feel free to stop me at any time if you have questions.

# The Simulation Science Team

I lead the engineering efforts on the simulation science team. What we do, in many ways, is similar to what happens on many other teams throughout the institute: public health modeling.

**Slide**

In particular, we try to analyze the impacts of interventions on health systems. We look at outcomes like the change in disease burden, risk exposure, and the health system costs.

**Slide**

I like to always keep in mind why we're doing what we do because it helps guide model development. 

We try to provide an agenda-free evidence base for people trying to make hard decisions.  Decisions like which studies to fund, which intervention programs to roll out, and sometimes where there are important data gaps that should be filled.

**Slide**

We produce this evidence using computer simulations of health systems.

I'm not going to go much into the intervention side of things today. 
If you join us next week, for Nathaniel's presentation, he'll dive into some detail about one of our recent simulation models and show you some results.  

Instead, I'm going to talk more generally about the modeling process and how our models are implemented.  

Let's start with a simple question:

- **What we do**: Analyze the impacts of interventions on population health and health systems.

- **Why we do it**: To give policy and decision makers evidence for intervention comparisons.

- **How we do it**: Computer simulations!

# What is Simulation?

What is a simulation?

No, seriously. Simulation is the kind of field that defies easy definition. What do you all think?  

<img src="astronaut.png" style="display: block; margin-left: auto; margin-right: auto; width: 30%">

This is one of my favorite examples of simulation. Here you see an astronaut training for space walk missions in his space suit underwater.  

One of the reasons people run simulations is that the actual system is hard to measure or interact with.  

This is an example of a physical simulation of a real system.  It's also an example of a human-in-the-loop simulation, which means that some part of the evolution of the system is due to unmodeled human behavior.

<img src="flight-sim.jpg" style="display: block; margin-left: auto; margin-right: auto; width: 30%">


Another common example is the training flight simulator. The Museum of Flight has a cool one that I definitely recommend.  

This is a hybrid computer/physical simulation where a great deal of modeling is done on the computational side while still providing the pilot with realistic flight controls.


<img src="typhoon.gif" style="display: block; margin-left: auto; margin-right: auto; width: 30%">


Here's an example of something considerably closer to what we do.  This is a simulation of Typhoon Mawar in 2005. It's a purely computational simulation of an actual weather event.  It's driven by data and mathematical models of pressure, temperature, currents, and many other things.

The simulations we produce are also purely computational simulations, so we're going to focus in on what that process typically looks like.

<img src="sim_flow1.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

Computational simulation is a complex process, but let's look at a simplified version of how this might go (and one that coincidentally mirrors how my team works).

We'll talk through this process at a high level first, and then dive into some details.

<img src="sim_flow2.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">


Computational simulation pretty much always starts with a real world system. 

To carry us through here, let's think about the world of global health: how people are born and die, how much disability they experience, and how their interactions with the health system, infrastructure, transportation, marketing, etc., shape their health outcomes.

<img src="sim_flow3.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

Meanwhile, diligent scientists and doctors and bureaucrats and college students are out collecting data about all these interactions with experiments and observational studies and surveys, building their own models and writing papers.

<img src="sim_flow4.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

Here at the institute, we're gathering all that data through our worldwide collaborator network. You all are extracting, cleaning, and labeling that data, and using your own statistical models and simulations to produce descriptive and predictive outputs about disease prevalence and risk exposure and mortality and health expenditure.

And seriously, thanks. My work would be totally impossible without all the work you all do.

<img src="sim_flow5.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

Meanwhile, in this **definitely apocryphal** overview, Chris Murray says he wants to be able to count how many hamburgers a kid eats, see if they really do brush their teeth when their parents tell them to, and determine whether giving them an egg every day will make them healthier.  

More seriously, we wanted to take all this incredible descriptive analysis available from the GBD and try to make models of how interventions like new vaccines, different medication schedules, and yes, giving kids eggs every day, could alter population health outcomes. 

These are complex questions because they really do depend on things like how many healthcare facilities are nearby, how heterogeneous the population is with respect to risk exposure and disease prevalence, and how those things all relate to each other. 

We decided early on to use a modeling paradigm using discrete-time, individual-based, Monte Carlo simulations. We'll come back to that in a bit. The important part is that this is a reasonably common modeling strategy for intervention analysis. 

The models are still difficult to put together, but we had a few pilot models in mind (a model of vaccines for diarrheal diseases and a model for opportunistic screening of blood pressure followed by a prescription program).

<img src="sim_flow6.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

Knowing that this was a common modeling paradigm for intervention analysis, we went out looking for tools.  "Don't reinvent the wheel if you don't have to" is probably the best piece of programming advice I can give you.  

We had several major considerations, but two important ones were whether we could leverage all the data produced in the GBD and whether the tool was flexible enough to support a variety of models and interventions. 

After several months we came to the sad (but fortuitous for me!) decision that the easiest thing to do would be to build our own. This is always a dangerous proposition. 

There are thousands of hyper-specific, probably incorrect individual-based models programmed by beleaguered grad students floating around on hard drives and academic servers and, occasionally, on GitHub.  

Building good software is hard, and a very different skill from building good public health models. Luckily the team hired an incredibly deft engineer named Alec Deason and he single-handedly built out a great deal of the infrastructure at the heart of our current software ecosystem.

<img src="sim_flow7.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">


There are other important aspects of computational modeling that I'm going to gloss over a bit here. 

Importantly, the model by itself is something we can reason about and make predictions from. It is usually made of math equations, flow charts and diagrams, and narrative descriptions. The predictions we make from the model are vital in validating our implementations.  

Additionally, the production of observations from the executable models is also delicate work that we spend a lot of time on.  For our purposes, we'll pretend it's a trivial task and that we have immediate access to everything in the simulation.  

<img src="sim_flow8.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">


We're going to focus in on this section of the diagram and talk in more detail about how my team does model development and implementation.  Let's re-orient.

<img src="modeling1.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">


So we've squashed some of our boxes and bubbles here.  The previous diagram is a bit idealized at the start of a complex project. Model building and implementation are typically moving hand in hand early on as we figure out what's feasible and what's not.

When we first got started, everything was incredibly labor intensive. Our first models took more than a year to produce. We didn't know yet exactly what was going on. We didn't understand our data sources. 

We built a bunch of prototypes and threw them away. We built more prototypes, and a few stuck.  

One of the key problems early on was the lack of a common language for talking about models and their implementations. For those of you who've only come into a well-established modeling project, you may one day find that all this jargon we use takes lots of intentional effort to create. And it's one of the most vital things to the success of a project.

<img src="modeling2.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">


Here's a picture that more closely describes our current process.

We now build 4-6 models concurrently with turnaround times of 2-6 months, depending on the complexity of the research questions and of the real world systems we're trying to model. 

The key piece of this improvement was creating a clear (but porous) boundary between the model development process and the model implementation process. Then we focused (and are continuing to focus) on communication and developing a common language. This common language is built around categories and patterns we've found in previous models and their implementations. 

On the model building side, this language is formalized in templates and procedures and requirements documents full of diagrams. On the model implementation side, this looks like software abstractions that mirror the modeling language. 

As my purview is mostly engineering, I'm going to focus us in particularly on the software and model implementation side for the rest of this talk. 

I do want to mention that Christine Allen, one of our former researchers, took a paper on our model development process to a conference this last May that was extremely well received. Compared with software development, there are far fewer best practices around model development, and we should all talk a lot more about it.

<h1 style="text-align: center;">Model implementation with <span style="font-family:Courier; color: blue">vivarium</span> and <span style="font-family:Courier; color: blue">vivarium_public_health</span></h1>

I mentioned before that one of the early decisions we made was to build our models using an individual-based, discrete-time, Monte Carlo approach. 

This is a generic modeling approach. And one of Alec's early pieces of wisdom was to separate the modeling approach from the public health aspects. 

As a software library, vivarium itself knows nothing about public health. It's simply a framework for building discrete-time, Monte Carlo simulations. Even the individual-based part of our work is not strictly enforced by vivarium itself.

This has made the library incredibly stable over time. Stability is vital in research software.  It reduces error rates, makes systems easier to reason about, and makes reproducible research much, much easier.

All the public health behavior is captured in a toolbox we've called, aptly, vivarium_public_health. 

Despite this separation between the modeling paradigm and public health, I'm going to use a public health model to talk through what a simulation actually looks like because I find it much easier to understand a concrete example.

<h1 style="text-align: center;">So what is a <span style="font-family:Courier; color: blue">vivarium</span> simulation?</h1>

Let's dig in. So what is a vivarium simulation?

... just kidding.  

It's my job to tell you this time.

<img src="simulation.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

A vivarium simulation is a series of time steps.  

Let's think about simulating a bunch of people. I've represented them two ways here.  

Beautiful, realistic portraits of individuals and the big abstract letter X.  Both of them are representations of the **state** of the simulation.  

The state of the simulation is not the same as the state of an individual. Rather it is the combined state of all individuals.  

In our pictorial representation it's a column of stick figures and tombstones.  In our mathematical representation, it's a vector of, say, 1s and 0s with all the living people represented by 1s. 

The simulation produces a trajectory of the state over time and the model gives us rules for how that state changes each time step.

In practice, each individual has many attributes such as age and sex and whether they're sick, which makes X a matrix instead of a vector.  In the matrix, each row represents an individual and each column represents an attribute. All individuals share the same set of attributes.
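As a rough illustration of that state matrix, here is a small sketch in plain Python. This is not how vivarium stores its state; the attribute names and values are made up for the example.

```python
# Hypothetical attribute names; every individual shares the same set.
COLUMNS = ("alive", "age", "sex", "sick")

# The state X: one row per individual, one column per attribute.
state = [
    # alive, age,  sex,      sick
    (1,      34.0, "female", 0),
    (1,      2.5,  "male",   1),
    (0,      71.0, "female", 0),
]


def column(state, name):
    """Return one attribute for every individual (a column of X)."""
    index = COLUMNS.index(name)
    return [row[index] for row in state]


# The simple vector picture is just the "alive" column: 1s for the living.
alive_vector = column(state, "alive")
print(alive_vector)  # [1, 1, 0]
```

The vector of 1s and 0s from the simpler picture is just one column of the larger table.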

All the discrete-time moniker means is that time proceeds in discrete chunks of a pre-determined size.

Let's go a little deeper then.  What exactly is a time step?

<img src="timestep.png" style="display: block; margin-left: auto; margin-right: auto; width: 80%">

  
Notice the little x here.  I've zoomed down to a single person (or row in our state matrix).  

Our time steps work essentially as a loop over every person in our simulation.

For every person, we make a series of decisions.  

<h1 style="text-align: center;">Cool. So what is a decision?</h1>

A decision is the process used to ask and answer questions.  

Some decisions are deterministic. "How much older do I get?" for instance. Well, our time steps are a fixed width, so a person's age increases by the size of the time step as long as they're alive. 
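Here is a minimal sketch of the aging decision and of a time step that visits every person and applies each decision in turn. It assumes a decision is a plain function that takes a person and the step size and returns the updated person. The names `Person`, `age_up`, and `take_time_step` are illustrative only and are not part of vivarium.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    alive: bool
    age: float  # years


def age_up(person, step_size):
    """A deterministic decision: the living get older by one step."""
    if not person.alive:
        return person
    return replace(person, age=person.age + step_size)


def take_time_step(population, decisions, step_size):
    """Visit every person and apply each decision in order."""
    new_population = []
    for person in population:
        for decide in decisions:
            person = decide(person, step_size)
        new_population.append(person)
    return new_population


population = [Person(alive=True, age=30.0), Person(alive=False, age=80.0)]
step_size = 0.5  # years per time step (an arbitrary choice)

# A simulation is a series of time steps.
for _ in range(4):
    population = take_time_step(population, [age_up], step_size)

print(population[0].age)  # 32.0
print(population[1].age)  # 80.0
```

After four half-year steps the living person is two years older, and the person who was already dead has not aged.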

Some questions are harder to answer though. "Do I die?", for instance. Well, maybe. Maybe not. 

For these kinds of questions, we have to turn to some mid-20th-century mathematicians.

<img src="montecarlo.jpg" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

No seriously. 

The Monte Carlo technique is named for the casino in Monaco. 

Much of the development of Monte Carlo techniques was done by mathematicians working on or around the Manhattan Project. Since the work was secret, it needed a code name, and one of the mathematicians had an uncle who liked to visit the Monte Carlo casino.  

So.  Ya know.  Naming is hard.

So how does the Monte Carlo technique work?

<img src="decision1.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

Surprisingly, it's pretty straightforward. 

We have a question and likely we have some data we can bring to bear to help answer that question. 

We use those two to create a probability that the thing happens. This is not a distribution, but an actual number between 0 and 1. 

We then do some computer magic to get a random number to "sample" that probability and determine our answer.

Okay, so I'm being a little glib and hand-wavy here. Let's look at a concrete example.

<img src="decision2.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

We have a question: does the individual die sometime in the next time step?  

We also have all this marvelous GBD data about mortality. 

We go to our survival analysis textbooks (ask Drew if you need a copy) and find a precise mathematical way to ask our question: 

What is the probability that the time of my death is within the next time step given that it's definitely after right now? 

It even has a nice math equation using our data if we're careful about our assumptions. 
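One common textbook form of that equation is sketched here. The symbols are introduced for illustration and are not taken from the slides: $T$ is the time of death, $t$ is right now, $\Delta t$ is the size of the time step, and $m$ is the mortality rate. The last step assumes that $m$ is constant over the time step.

$$
P(T \le t + \Delta t \mid T > t) = 1 - \exp\left(-\int_t^{t+\Delta t} m(s)\, ds\right) = 1 - e^{-m \Delta t}
$$

When $m \Delta t$ is small, this is approximately $m \Delta t$.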

Then we ask the computer for a uniform random number between 0 and 1.  If it's less than our calculated probability, the individual dies. If it's more, they don't die.

TADA. That's really all there is to it. It's essentially deciding uncertain things using a weighted coin.
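A small sketch of that weighted coin in plain Python. It assumes a constant mortality rate over the time step, so the probability of dying is 1 - exp(-rate × step size). The rate and step size are made-up values rather than GBD data, and the function names are illustrative only, not vivarium's API.

```python
import math
import random


def death_probability(mortality_rate, step_size):
    """Probability of dying within one step, assuming a constant rate."""
    return 1.0 - math.exp(-mortality_rate * step_size)


def decide(probability, rng):
    """Flip a weighted coin: True with the given probability."""
    return rng.random() < probability


rng = random.Random(12345)  # seeded so the run is reproducible

mortality_rate = 0.01  # deaths per person-year (made-up value)
step_size = 1.0        # years per time step (made-up value)
p = death_probability(mortality_rate, step_size)

# Ask the question once for each of 100,000 simulated individuals.
n_people = 100_000
deaths = sum(decide(p, rng) for _ in range(n_people))
print(f"p = {p:.5f}, observed fraction = {deaths / n_people:.5f}")
```

With enough individuals, the observed fraction of deaths should land close to the calculated probability, which is a handy sanity check on a decision like this.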



### To quickly recap

Let's recap quickly.

**Slide**

A sim is ...

**Slide**

A time step is ..

**Slide**

A decision is ..

So I want to stop here for a bit and take some questions if you've got them. 

- A **simulation** is a series of **discrete time steps** of a known size.

- A **time step** involves going to each individual and making a series of **decisions**.

- A **decision** is the process for asking and answering a question about what happens to an individual, either deterministically, or with a **Monte Carlo sample** informed by some data and a random number.

<h1 style="text-align: center;">What do our models actually look like?</h1>

Given the sort of framework for modeling we've set up, you may be asking where these decisions come from.  Let's look at that question with some examples.

<img src="conceptmodel1.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

This is what we call a Concept Model diagram. It's probably our most frequently used modeling tool. It is a semi-formal depiction of the causal relationships in our models.  

All the way on the right is the demographic model. A demographic model is at the heart of every simulation we produce.  It dictates the shape of our starting population and how the population counts and demographics change over time.

Just to the left we have all the causes that are **explicitly** included in the model.  All other causes of death are tracked implicitly in the mortality component. More on that later.

The arrows between the causes and mortality indicate, well, a causal relationship between an individual's disease status and their mortality rate.

Next from the right are the risk factors **explicitly** included in the model. Here we have two of the child growth failure risks.  Whether or not an individual is experiencing child growth failure changes how likely they are to have a bout of diarrhea or to get pneumonia. The red arrow between wasting and PEM indicates a PAF of 1 relationship, which is GBD's favorite way of annoying me.  PAF of 1 relationships are special, and sometimes unique, relationships between particular risk factors and causes.

Finally, on the very left we have our fabled egg-a-day intervention with proposed causal effects on stunting and wasting.  

This is on the simpler end of the models we've worked on, though it's already much more complex than we have time to dig into. It is a fair representation of the kinds of models we consider though. A comparison between this model and a maternal intervention formed the basis for the thesis work of Derrick Tsoi (a former sim science PBF).

For the remainder of our time, we're going to look at a much simpler model and its constituent parts.

<img src="conceptmodel2.png" style="display: block; margin-left: auto; margin-right: auto; width: 70%">

In this model, we've stripped away many things.  There's no intervention here. There's also no fertility model, meaning we'll be working with a closed cohort. However, there is still more than enough here to discuss.

Again, I want to stop a second for questions.  