FAQ

This document contains answers to frequently (and some not so frequently) asked questions. If there are others you'd like to see included, please email us at covasim@idmod.org.

If your system can run scientific Python (Numpy, SciPy, and Matplotlib), then you can probably run Covasim. Covasim requires 1 GB of RAM per 1 million people, and can simulate roughly 5-10 million person-days per second. A typical use case, such as a population of 100,000 agents running for 500 days, would require 100 MB of memory and take about 5-10 seconds to run.
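A quick way to check performance on your own machine is to time a run directly; the following minimal sketch uses the population size and duration from the example above:

import time
import covasim as cv

start = time.time()
sim = cv.Sim(pop_size=100e3, n_days=500)  # 100,000 agents for 500 days
sim.run()
print(f'Elapsed: {time.time() - start:0.1f} s')  # roughly 5-10 s on a typical machine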

Yes. On a single-node setup, it is quite easy: in fact, MultiSim objects will automatically scale to the number of cores available. This can also be specified explicitly with e.g. msim.run(n_cpus=24).
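For example, the following sketch (the seed values and core count are illustrative) runs four simulations in parallel:

import covasim as cv

sims = [cv.Sim(rand_seed=i, label=f'Seed {i}') for i in range(4)]
msim = cv.MultiSim(sims)
msim.run(n_cpus=4)  # omit n_cpus to use all available cores
msim.plot()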

For more complex use cases (e.g. running across multiple virtual machines), we recommend using Celery; please email us for more information.

The recommended way to save a simulation is simply via sim.save(filename). By default, this does not save the people (sim.people), since they are very large (i.e., 7 KB without people vs. 7 MB with people for 100,000 agents). However, if you really want to save the people, pass keep_people=True.

To load, you can use cv.load(filename) or cv.Sim.load(filename) (they are identical except for type checking). Under the hood, Covasim uses sc.saveobj() from Sciris, which in turn is a gzipped pickle. If you need a more "portable" format, you can use sim.to_json(), but note that this will only save the results and the parameters, not the full sim object.
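For example, a minimal save/load sketch (the filename is illustrative):

import covasim as cv

sim = cv.Sim()
sim.run()
sim.save('my-sim.sim')         # add keep_people=True to also save sim.people
sim2 = cv.load('my-sim.sim')   # equivalently, cv.Sim.load('my-sim.sim')
json_str = sim.to_json()       # more portable, but results and parameters only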

Typically, parameters are held constant for the duration of the simulation. However, it is possible to modify them dynamically – see the cv.dynamic_pars() "intervention".
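For example, a sketch of changing beta partway through a run (the days and values are illustrative):

import covasim as cv

dp = cv.dynamic_pars({'beta': {'days': [30, 60], 'vals': [0.008, 0.016]}})  # halve beta on day 30, restore on day 60
sim = cv.Sim(beta=0.016, interventions=dp)
sim.run()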

These are referred to as importations. You can set the n_imports parameter for a fixed number of importations each day (or make it time-varying with cv.dynamic_pars(), as described above). Alternatively, you can infect people directly using sim.people.infect(). Since version 3.0, you can also import specific strains on a given day: e.g., cv.Sim(strains=cv.strain('b117', days=50, n_imports=10)).
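As a sketch (the numbers of importations and the day are illustrative), constant and time-varying importations could be set up as follows:

import covasim as cv

s1 = cv.Sim(n_imports=2, label='Constant importations')  # ~2 imported infections per day
s2 = cv.Sim(interventions=cv.dynamic_pars(n_imports=dict(days=20, vals=10)),  # switch to ~10/day on day 20
            label='Time-varying importations')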

Most parameters can be set quite simply, e.g.:

import covasim as cv
sim = cv.Sim(beta=0.008)

or:

import covasim as cv
pars = dict(beta=0.008, verbose=0)
sim = cv.Sim(pars)

However, prognoses parameters are a bit different since they're a dictionary of dictionaries of arrays. Usually the easiest solution is to create the simulation first, and then modify these parameters before initializing the sim:

import covasim as cv
sim = cv.Sim()
sim['prognoses']['death_probs'][-1] *= 2 # Double the risk of death in the oldest age group

Another option is to create the parameters first, then modify them and provide them to the sim:

import covasim as cv
prognoses = cv.get_prognoses()
prognoses['death_probs'][-1] *= 2
sim = cv.Sim(prognoses=prognoses)

One thing to be careful of is that since the prognoses are used when the population properties are set, you must make any changes to them before you initialize the sim (i.e. sim.initialize()). If you want to change prognoses for an already-created simulation, it is best to call sim.init_people() to ensure the sim parameters (sim.pars) are synchronized with the people parameters (sim.people.pars).
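For example, a sketch of updating prognoses on an already-initialized sim:

import covasim as cv

sim = cv.Sim()
sim.initialize()                          # people are created using the default prognoses
sim['prognoses']['death_probs'][-1] *= 2  # modify prognoses after initialization...
sim.init_people()                         # ...so re-create the people for the change to take effect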

There are a few options. For many cases, the default options work reasonably well, i.e. sim = cv.Sim(pop_type='hybrid', location='eswatini'). If you want to use a location that is not currently supported, a lot of data is generally required (census data, school enrolment rates, workplace size and participation rates, etc.). Detailed contact networks are generally created using the SynthPops library.

Another option is to adapt the functions in population.py for your purposes. Covasim can also read in fairly generic representations of populations; for example you could create a random network and then modify the edge list (i.e. sim.people.contacts) to reflect the network you want. Please email us for more information.
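For example, here is a minimal sketch of inspecting (and hence modifying) the edge list of a random network; the population size is illustrative:

import covasim as cv

sim = cv.Sim(pop_type='random', pop_size=5000)
sim.initialize()
edges = sim.people.contacts['a']  # the single 'a' layer of a random population
print(edges)                      # columns p1, p2, and beta define the edge list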

Possible, but not easy. Your best option is to create a single simulation where the contact network structure reflects the different regions. Please email us for more information.

Actually, you can! R's reticulate package lets you easily interface between Python and R. For example:

library(reticulate)
cv <- import('covasim')
sim <- cv$Sim()
sim$run()
sim$plot()

(NB: if the above doesn't bring up a figure, try adding plt <- import('matplotlib.pyplot') and plt$show().)

The terms are a bit confusing and may be refactored in a future version of Covasim. The pop_size parameter actually controls the number of agents in the simulation. In many cases this is the same as the "total population size" or "scaled population size" being simulated, i.e., the actual number of people. The "actual number of people" (not agents) is available in the simulation as sim.scaled_pop_size. If (and only if) pop_scale is greater than 1, the total population size will be greater than the number of agents. Some examples might help make this clearer:

Example 1. You want to simulate a population of 100,000 people. This will only take a few seconds to run, so you set pop_size = 100e3 and pop_scale = 1. In this example the population size is 100,000, the scaled population size is 100,000, the number of agents is 100,000, and the number of people being represented is also 100,000. Life is simple and you are happy.

Example 2. You want to simulate a population of 1,000,000 people. This would take too long to run easily (several minutes per run), so you set pop_size = 200e3 and pop_scale = 5 with dynamic rescaling on (rescale = True). In this example the (simulated) population size is 200,000, the (final) scaled population size is 1,000,000, the number of agents is always 200,000, and the (final) number of people being represented is 1,000,000. Since dynamic rescaling is on, when the simulation starts, one agent represents one person, but only 200,000 people are included in the simulation (the other 800,000 are not infected and are not exposed to anyone who is infected, so are not represented in the sim). As more and more people become infected – say, 10,000 infections – 200,000 people is no longer enough to accurately represent the epidemic, since 10,000 infections out of 200,000 people is a prevalence of 5%, whereas the real prevalence is 1% (10,000 infections out of 1,000,000 people). Dynamic rescaling kicks in (rescale_threshold = 0.05, the current prevalence level), and half of the infected people are converted back to susceptibles (rescale_factor = 2). There are now 5,000 infected agents in the model, corresponding to 10,000 infected people, i.e. one agent now counts as (represents) two people. This is equivalent to saying that for any given agent in the model (e.g., an infected 57-year-old woman who has 2 household contacts and 8 workplace contacts), there is another identical person somewhere else in the population.

Example 3. As in example 2, but you turn dynamic rescaling off. In this case, from the very beginning of the simulation, one agent represents 5 people (since pop_scale = 5). This is basically the same as running a simulation of 200,000 agents with pop_scale = 1 and then multiplying the results (e.g., cumulative number of infections) by a factor of 5 after the simulation finishes running: each infection counts as 5 infections, each death counts as 5 deaths, etc. Note that with dynamic rescaling off, the number of seed infections should be divided by pop_scale in order to give the same results.

TLDR? Except for a few corner cases (e.g., calculating transmission trees), you should get nearly identical results with and without dynamic rescaling, so feel free to use it (it's turned on by default). That said, it's always best to use as small a population scale factor as you can, although once you reach roughly 200,000 agents, using more agents shouldn't make much difference.

This example illustrates the three different ways to simulate a population of 200,000 people:

import covasim as cv

s1 = cv.Sim(n_days=120, pop_size=200e3, pop_infected=50, pop_scale=1,  rescale=True,  label='Full population')
s2 = cv.Sim(n_days=120, pop_size=20e3,  pop_infected=50, pop_scale=10, rescale=True,  label='Dynamic rescaling')
s3 = cv.Sim(n_days=120, pop_size=20e3,  pop_infected=5,  pop_scale=10, rescale=False, label='Static rescaling')

msim = cv.MultiSim([s1, s2, s3])
msim.run(verbose=-1)
msim.plot()

Note that using the full population and using dynamic rescaling give virtually identical results, whereas static scaling gives slightly different results.

The results are reported for the date of the test which came back positive, not the date of diagnosis. The reason for this is that in most places, this is how the data are reported – if 100 tests are done on August 1st, say, and there is a 2-4 day test delay so that 5 of these tests come back positive on each of August 2nd, 3rd, and 4th, then in most places this would be reported as 100 tests on August 1st, 15 diagnoses on August 1st (even though the lab work was done over August 2-4), and 85 negative tests on August 1st. The reason for doing it this way – both in real-world reporting and in the model – is that otherwise, if there is a big change in the number of tests from day to day, you could have more diagnoses on that day than tests. However, in terms of the model, the test delay is still being correctly taken into account. Specifically, sim.people.date_pos_test is used to (temporarily) store the date of the positive test, which is what's shown in the plots, but sim.people.date_diagnosed has the correct (true) diagnosis date for each person. For example:

import numpy as np  # used for the np.nanmin() check below
import covasim as cv
tn = cv.test_num(daily_tests=100, start_day=10, test_delay=10)
sim = cv.Sim(interventions=tn)
sim.run()
sim.plot(to_plot=['new_infections', 'new_tests', 'new_diagnoses'])

shows that positive tests start coming back on day 10 (the start day of the intervention), but:

>>> np.nanmin(sim.people.date_diagnosed)
20.0

shows that the earliest date a person is actually diagnosed is on day 20 (the start day of the intervention plus the test delay).

Yes, although oscillatory modes are not a natural state of the system. You can get them with a combination of high infection rates, low testing rates, and high contact tracing rates with significant delays. This will create little clusters that grow stochastically until someone gets tested; then most of the cluster gets traced and shut down, but a few people usually escape to start the next cluster.

One of the trickiest aspects of working with agent-based models is getting the random number stream right. Covasim uses both numpy and numba random number streams. These are usually initialized automatically when a simulation is created/run (via cv.set_seed(seed), which you can also call directly), but anything that disrupts the random number stream will result in differences between two simulation runs. This is also why seemingly trivial changes (e.g., adding an intervention that doesn't actually do anything) can cause simulation trajectories to diverge.

In addition, random number streams sometimes change with different library versions. For example, due to a bugfix, random number streams changed between numba 0.48 and 0.49. Therefore, simulations run with numba 0.48 or earlier won't (exactly) match simulations run with numba 0.49 or later.

If you're having trouble reproducing results between simulations that should be the same, check: (a) the Covasim version, (b) the numpy version, (c) the numba version, and (d) the SynthPops version (if using). If all these match but results still differ, then a useful debugging strategy can be to insert print(np.random.rand()) at various points throughout the code to see at what point the two versions diverge.
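For example, a minimal sketch (the seed and result key are illustrative) checking that two simulations with the same seed produce identical results:

import numpy as np
import covasim as cv

s1 = cv.Sim(rand_seed=1).run()
s2 = cv.Sim(rand_seed=1).run()
assert np.array_equal(s1.results['cum_infections'].values,
                      s2.results['cum_infections'].values)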

The webapp is limited by the results needing to be returned before the request times out. However, when running directly via Python, you are limited only by your computer's RAM (and your patience) in terms of simulation duration or population size.

If you are running on Windows, then because of the way Python's multiprocessing library is implemented, you must start the run from inside a __main__ block (see discussion here). For example, instead of this:

import covasim as cv
sims = [cv.Sim(pop_infected=100, beta=0.005*i, label=f'Beta factor {i}') for i in range(5)]
msim = cv.MultiSim(sims)
msim.run()
msim.plot()

do this:

import covasim as cv
sims = [cv.Sim(pop_infected=100, beta=0.005*i, label=f'Beta factor {i}') for i in range(5)]
msim = cv.MultiSim(sims)

if __name__ == '__main__':
    msim.run()
    msim.plot()

When parallelizing inside Jupyter notebooks, you may sometimes encounter a "Duplicate signature" error. This is because of how multiprocessing conflicts with Jupyter's internal threading (see discussion here). One solution is to move msim.run() (or other parallel command) into a separate .py file rather than having it be part of the notebook itself. This problem should be fixed as of version 2.0, so if you're using an older version, consider upgrading.