The inputs and outputs of the `CovidSim` model

This is WIP. Know something not documented here? Please add and open a PR!

The geography

CovidSim simulates disease spread in a geographical region, which in principle can be at any scale, but in practice is a region or country.

In consequence, the model must be told the geography of a region, such as its population density, plus other specific information. This information is specified as a mixture of parameters and input population density files.

Main command-line arguments

A typical run specifies (i) files that contain simulation parameters (the /A, /PP and /P options), (ii) a population density file for the country being simulated (the /D option) and (iii) the prefix of the output files that summarise the results of the simulation (the /O option).

CovidSim
    /c:NumThreads
    /A:AdminParamFile
    /PP:PreParameterFile
    /P:ParameterFile
    /O:OutputFilesPrefix
    [/D:PopulationDensityFile]
    [/L:NetworkFileToLoad | /S:NetworkFileToSave]
    SetupSeed1 SetupSeed2 RunSeed1 RunSeed2

Explanation of the arguments with examples:

/c:32 the number of parallel threads to use
/A:sample_admin.txt a file specifying parameters determining the geography to be modelled
/PP:preUS_R0=2.0_BM.txt a file that defines transmission and calibration parameters for a specific run
/P:p_NoInt.txt a file that defines intervention parameters for a specific run
/O:./output/NoInt_R0=1 specifies the prefix pathname for a collection of output files that contain simulation data. The output files are tabular tsv data (but with the extension .xls)
[/D:pop_usa_adm2.txt] a population density file for a specific geography (e.g. a country)
[/L:NetworkFileToLoad | /S:NetworkFileToSave]. For efficiency, we can run and, as a side-effect, generate a [network file](./model-glossary.md#Network\ file) that assigns people to places. The [network file](./model-glossary.md#Network\ file) may then be re-used for subsequent runs (with different input parameters for the same geography). The network file is a non-portable .bin. Generate this file with the /S option and re-use it (in a subsequent run) with the /L option.
SetupSeed1 SetupSeed2 Random number generator seeds used when initialising the model, including creating the network file (large positive integers).
RunSeed1 RunSeed2 Random number generator seeds used when running the model. These can be varied to do multiple runs with the same network file (large positive integers).
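
As a rough illustration of how the arguments above fit together, the following Python sketch assembles the argument list for one run. The helper name `build_covidsim_command`, the seed values and the file name `network.bin` are assumptions made for this example only; they are not part of CovidSim.

```python
import shlex


def build_covidsim_command(threads, admin_file, pre_param_file, param_file,
                           output_prefix, seeds, density_file=None,
                           network_load=None, network_save=None,
                           executable="CovidSim"):
    """Assemble the argument list for one run (hypothetical helper)."""
    if network_load and network_save:
        # The synopsis lists /L and /S as alternatives.
        raise ValueError("/L and /S are alternatives; pass only one")
    if len(seeds) != 4:
        raise ValueError("expected SetupSeed1 SetupSeed2 RunSeed1 RunSeed2")
    args = [
        executable,
        f"/c:{threads}",
        f"/A:{admin_file}",
        f"/PP:{pre_param_file}",
        f"/P:{param_file}",
        f"/O:{output_prefix}",
    ]
    if density_file:
        args.append(f"/D:{density_file}")
    if network_load:
        args.append(f"/L:{network_load}")
    if network_save:
        args.append(f"/S:{network_save}")
    # The four seeds come last, in the order given in the synopsis.
    args.extend(str(seed) for seed in seeds)
    return args


cmd = build_covidsim_command(
    32, "sample_admin.txt", "preUS_R0=2.0_BM.txt", "p_NoInt.txt",
    "./output/NoInt_R0=1",
    seeds=(98798150, 729101, 17389101, 4797132),  # arbitrary example seeds
    density_file="pop_usa_adm2.txt",
    network_save="network.bin",  # hypothetical file name
)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting problems with characters such as `=` in file names.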

Additional command-line arguments

CovidSim
    /c:NumThreads
    /A:AdminFile
    /PP:PreParameterFile
    /P:ParameterFile
    /O:OutputFilesPrefix
    [/D:PopulationDensityFile]
    /CLP1:100000
    /CLP2:0
    /M:US_LS2018.bin
    [/L:NetworkFileToLoad | /S:NetworkFileToSave]
    [/AP:AirTravelFile]
    [/s:SchoolFile]
    [/R:R0scaling]
    SetupSeed1 SetupSeed2 RunSeed1 RunSeed2

Explanation of additional arguments:

[/AP:AirTravelFile] Air travel data for a specific geography (unused currently)
[/s:USschools.txt] School information for a specific geography (currently only used for US).
[/R:1.1] Specifies the reproduction number (R0) as a multiplier of 2. R0 for a disease is the number of secondary cases in susceptibles per infected case. This command-line parameter is read into P.R0scaling, which scales the R0 parameter specified in the parameter file; this is useful for repeated runs that vary only R0. For COVID-19, /R:1.4 to /R:1.6 is suitable.
/CLP1:100000, /CLP2:0 etc. are special parameters that interact with wildcards #1, #2 etc. in the intervention parameter file (and less often the pre-parameter file). Wildcard #n is replaced by the value of CLPn. This is useful to vary parts of parameter files at run-time (e.g. to undertake sensitivity analysis) without needing to generate entirely new parameter files.
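
The wildcard mechanism can be pictured as a plain text substitution. The sketch below is only an illustration of that idea in Python, not how CovidSim implements it internally; the function `substitute_wildcards` and the two parameter descriptions in the template are hypothetical.

```python
import re


def substitute_wildcards(text, clp_values):
    """Replace each wildcard #n with clp_values[n] (illustrative only)."""
    def replace(match):
        n = int(match.group(1))
        if n not in clp_values:
            raise KeyError(f"no /CLP{n} value supplied for wildcard #{n}")
        return str(clp_values[n])
    return re.sub(r"#(\d+)", replace, text)


# Hypothetical parameter descriptions, used only to show the substitution.
template = (
    "[Hypothetical threshold parameter]\n#1\n"
    "[Hypothetical delay parameter]\n#2\n"
)
filled = substitute_wildcards(template, {1: 100000, 2: 0})
print(filled)
```

With /CLP1:100000 and /CLP2:0, the template's `#1` and `#2` become `100000` and `0`, so one parameter file can serve a whole sensitivity sweep.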

Input files

The main input files are parameter files and population density files (for specific geographies).

Parameters

CovidSim has a very large number of parameters. This repo is undergoing active development and rationalisation. The parameters are currently not self-documenting.

Parameter values are read from parameter files by the function ReadParams, which matches each parameter description string to the corresponding variable in the source code. The only way to determine the precise meaning of a specific parameter is to read the code.

Parameter files

The parameters are specified in admin, pre-parameter and intervention parameter files. All three kinds of file have the same format.

Admin and pre-parameter files contain parameters whose values are common to a series of runs (i.e. those defining geographies and transmission parameters). Intervention parameter files group parameters whose values are more likely to differ between the runs in a series.

The format is a sequence of:

[Description of Parameter]
value

If further numbers appear beyond those immediately below a parameter description, disregard them: the simulation uses only the numbers immediately below the description.

An example parameter file is ./data/param_files/p_NoInt.txt.
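
To make the format concrete, here is a minimal Python sketch of a reader for it. It is not ReadParams and does not reproduce its behaviour; in particular, it assumes that "immediately below" means the first non-blank line after the description, which may not hold for parameters whose values span several lines. The second parameter in the sample is hypothetical.

```python
def parse_param_text(text):
    """Map each [Description] to the values on the line directly below it."""
    params = {}
    pending = None  # description still waiting for its value line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            pending = line[1:-1]
        elif pending is not None:
            params[pending] = line.split()
            pending = None
        # Assumption: any further lines before the next description are ignored.
    return params


sample = """[Do Severity Analysis]
1
[Hypothetical vector parameter]
0.1 0.2 0.3
9 9 9
"""
params = parse_param_text(sample)
print(params)
```

Values are kept as strings because the format itself does not say whether a parameter is an integer, a float or a list.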

Population density file

A binary geography-specific file used to assign people to cells. Currently these files are generated and provided by Imperial College.

An example population density file is ./data/populations/wpop_eur.txt.

The information contained in this file includes:

longitude	latitude	number of people	country code	admin unit code
-156.68333	71.325	30	46	4602017
-156.76666	71.3	1	46	4602017
...	...	...	...	...
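
As a sketch of how rows like these could be consumed, the Python below sums the number of people per admin unit code. It assumes a tab-separated text layout with the five columns shown above and skips any row whose third column is not an integer (such as a header line); whether real files carry a header, or use this text layout at all, is not confirmed here.

```python
import csv
import io
from collections import Counter


def population_by_admin_unit(handle):
    """Sum the 'number of people' column for each admin unit code."""
    totals = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        try:
            people = int(row[2])
        except ValueError:
            continue  # skip a header row or filler such as "..."
        totals[row[4]] += people
    return totals


sample = io.StringIO(
    "longitude\tlatitude\tnumber of people\tcountry code\tadmin unit code\n"
    "-156.68333\t71.325\t30\t46\t4602017\n"
    "-156.76666\t71.3\t1\t46\t4602017\n"
)
totals = population_by_admin_unit(sample)
print(totals["4602017"])  # 31
```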

How are population density files produced?

Physical geography data: each geography has a shape file (.shp) of polygons and meta-data (.dbf) with GPS coordinates. Admin units are a set of polygons.

Human geography data specifies where people live on the same scale as a CovidSim microcell (1/120th of a degree).

Imperial College combines the physical and human data to calculate population densities per polygon. This process produces the population density file.

A companion to the population density file is a metafile that maps admin unit codes to string descriptions (e.g., codes to US state names).

School files

The first line of a school file has (1 + 2n) integer values, where n is the number of school types. The values are:

Index 0: The number of types of schools. E.g. a geography might have two school place types (primary and secondary).
Index 1 + 2i: The total number of schools of type i
Index 2 + 2i: The number of age bands in schools of type i

E.g., if a geography has 2 school types then the first line of the school file might be:

2 100 3 50 4

representing 2 school types, with 100 schools of type 0 (which has 3 age bands) and 50 schools of type 1 (which has 4 age bands).
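
The index arithmetic for the first line can be checked with a short Python sketch. The function name `parse_school_header` and the dictionary keys are chosen for this example; only the layout (index 0, then pairs at 1 + 2i and 2 + 2i) comes from the description above.

```python
def parse_school_header(line):
    """Decode the first line of a school file into one record per school type."""
    values = [int(v) for v in line.split()]
    n_types = values[0]
    if len(values) != 1 + 2 * n_types:
        raise ValueError(
            f"expected {1 + 2 * n_types} integers, found {len(values)}"
        )
    return [
        {
            "type": i,
            "schools": values[1 + 2 * i],    # total number of schools of type i
            "age_bands": values[2 + 2 * i],  # number of age bands for type i
        }
        for i in range(n_types)
    ]


header = parse_school_header("2 100 3 50 4")
print(header)
```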

The remainder of the file has a row per school. E.g.:

longitude	latitude	place type index	#people in the school	#people in age band 1	# people in age band 2	...	# people in age band n
-156.68333	71.325	0	80	30	46	...	4
-123.32	70.35	0	32	23	3	...	6
...	...	...	...	...	...	...	...

The place type index for schools is 0.

Output files

Each run produces simulation output files. Switches in parameter files can control the precise nature of the outputs (e.g., at country level, or at admin unit level, or both etc.). E.g., if a parameter file contains

[Do Severity Analysis]
1

then severity.xls is generated.

A run is extinct if the disease dies out; otherwise the run is non-extinct.

Outputs can be averaged over all extinct (avE suffix) and non-extinct (avNE suffix) runs. Currently, we are simulating large epidemics that essentially become deterministic, so we pay most attention to the avNE (average of non-extinct realisations) files.
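
The idea of averaging separately over extinct and non-extinct runs can be sketched in Python as follows. This is an illustration under stated assumptions, not CovidSim's implementation: each run is represented as a hypothetical (extinct flag, time series) pair with made-up values, and the population variance is used, although which variance estimator CovidSim reports is not stated here.

```python
from statistics import fmean, pvariance


def average_runs(runs, extinct):
    """Per-timestep mean and variance over runs whose flag equals `extinct`."""
    selected = [series for flag, series in runs if flag == extinct]
    if not selected:
        return [], []
    columns = list(zip(*selected))  # one tuple of values per timestep
    means = [fmean(column) for column in columns]
    variances = [pvariance(column) for column in columns]
    return means, variances


# Made-up incidence series for three runs; the third one dies out.
runs = [
    (False, [1, 4, 10]),
    (False, [3, 8, 14]),
    (True, [1, 0, 0]),
]
means, variances = average_runs(runs, extinct=False)
print(means, variances)
```

Calling `average_runs(runs, extinct=True)` gives the counterpart of the avE files.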

Below is an incomplete specification of the output file formats.

`name.avNE.xls`

Contains time-stamped (e.g., daily) statistics for the simulation over the whole country.

column	meaning
t	sample time, with the interval specified in the pre-parameter file by Sampling timestep; generally the day in 2020 (t=1 -> Jan 1)
S	total number of susceptibles in the population
L	total number of latently infected people in the population
I	total number of infectious people in the population
R	total number of recovered people in the population
D	total number of deaths in the population
incI	incidence of infections at that timestep
incR	incidence of recoveries
incFC	incidence of false cases, i.e. false positives
incC	incidence of cases
incDC	incidence of detected cases
incTC	incidence of treated cases
incH	incidence of hospitalisations (this can probably be ignored, as it was written specifically for the Ebola model and we’re using a different approach here)
cumT	cumulative number of treated cases
cumTmax	the maximum number of cumulative treated cases from the runs being averaged over
cumTP	cumulative number of privately treated cases
cumV	cumulative number of vaccinations
cumVmax	the maximum number of cumulative vaccinations from the runs being averaged over
Extinct	Is the run extinct or not?
rmsRad	root mean square radius of infections from seed point
maxRad	maximum radius of an infection from the seed point
v*	a sequence of columns containing the variance of the above quantities in the same order (excluding the time step)
value 1	Number of non-extinct runs
value 2	Number of extinct runs
value 3	R0 in households
value 4	R0 in places
value 5	R0 of spatial transmission
value 6	Mean peak height
value 7	Variance of peak height
value 8	Mean peak time
value 9	Variance of peak time
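
Because the output files are tab-separated text despite the .xls extension, they can be read with a standard CSV reader. The Python sketch below picks out a few of the columns listed above and converts t to a calendar date, assuming t=1 is 1 January 2020. The sample values are made up, and rows without numeric values in the requested columns are skipped, since the exact layout of the trailing summary values is not specified here.

```python
import csv
import io
from datetime import date, timedelta


def day_to_date(t, year=2020):
    """Convert a sample time to a calendar date, assuming t=1 is Jan 1."""
    return date(year, 1, 1) + timedelta(days=int(t) - 1)


def read_columns(handle, names):
    """Return rows of floats for the named columns of a tab-separated file."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        try:
            rows.append({name: float(record[name]) for name in names})
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that lack a numeric value in these columns
    return rows


# Made-up sample with a subset of the documented columns.
sample = io.StringIO("t\tS\tI\tD\n1\t1000\t0\t0\n32\t990\t8\t1\n")
rows = read_columns(sample, ["t", "S", "I", "D"])
for row in rows:
    print(day_to_date(row["t"]), row["I"])
```

For a real file, replace the `io.StringIO` sample with `open(path, newline="")`.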

`name.avNE.adunit.xls`

Contains time-stamped statistics per [admin unit](./model-glossary.md#Admin\ unit) (hopefully with headers matching the codes in a population index file).

column	meaning
t	time
I(admincode) ...	Incidence of infection in each admin unit (the number of columns equals the number of admin units used)
C(admincode) ...	Incidence of cases in each admin unit.
DC(admincode) ...	Incidence of detected cases in each admin unit
T(admincode) ...	Incidence of treated cases in each admin unit
value ...	A sequence of column values of the population of each admin unit

`name.avNE.age.xls`

column	meaning
t	time
I(age band) ...	incidence of cases in each age band
C(age band) ...	incidence of critical cases in each age band
D(age band) ...	incidence of deaths in each age band

`name.avNE.severity.xls`

Contains statistics on the prevalence of the infection.

column	meaning
t	time
PropSchClosed	proportion of schools closed
PropSocDist	unknown
mild	total number of mild cases at time t
ILI	total number of influenza-like illness cases at time t (assume represents GP demand)
SARI	total number of severe acute respiratory illness cases at time t (assume represents hospital demand)
Crit	total number of critical cases (assume represents ICU demand)
CritRecov	total number of critical cases who are well enough to be out of ICU but still need a hospital bed
incMild	incidence of mild cases
incILI	incidence of ILI cases
incSARI	incidence of SARI cases
incCrit	incidence of critical cases
incCritRecov	incidence of critical cases still in hospital but no longer requiring ICU
incDeath	incidence of death
cumMild	cumulative number of mild cases
cumILI	cumulative number of ILI cases
cumSARI	cumulative number of SARI cases
cumCrit	cumulative number of critical cases
cumCritRecov	cumulative number of critical cases still in hospital but no longer requiring ICU
v*	a sequence of columns containing the variance of the above quantities in the same order (excluding PropSchClosed and PropSocDist)

`name.avNE.severity.adunit.xls`

As per name.avNE.severity.xls, excluding PropSchClosed and PropSocDist, and with each quantity listed for each admin unit in turn.

R summary visualisations

Some R scripts provide basic visualisations of model runs.

If R is installed and the output files of model runs have been written to a folder, they can be visualised using the commands

Rscript Rscripts/PlotsSpatial.R [folder-where-the-data-is]
Rscript Rscripts/CompareScenarios.R [folder-where-the-data-is]

This will create .png files visualising the data in a new subfolder called Plots.
