This project is a Python disease modelling framework used by the AuTuMN tuberculosis modelling project. It is currently being applied to COVID-19 as well. This project is used by the Monash University Epidemiological Modelling Unit.
See this guide for information on how to set up this project. See here for steps on how to add a new COVID model.
All of Autumn's features can be accessed from the command line. You can run commands as follows:
python -m apps <YOUR COMMANDS>
To see a list of options, try:
python -m apps --help
├── .github             GitHub config
├── apps                Specific applications of the framework
├── autumn              AuTuMN framework module
├── data                Data to be used by the models
|   ├─ inputs           Input data for the models
|   └─ outputs          Module run outputs (not in source control)
|
├── docs                Documentation
├── scripts             Utility scripts
|   ├─ aws              Scripts to run tasks on AWS
|   ├─ buildkite        Configuration of Buildkite pipelines
|   └─ massive          Scripts to run tasks on MASSIVE (not used)
|
├── summer              SUMMER framework module
├── tasks               Cloud computing tasks
├── tests               Automated tests
├── .gitignore          Files for Git to ignore
├── plots.py            Streamlit entrypoint
└── requirements.txt    Python library dependencies
You can run all the scenarios for a specific application using the run command. For example, to run the "malaysia" region of the "covid" model, you can run:
python -m apps run covid malaysia
Model run outputs are written to data/outputs/run and can be viewed in Streamlit (see below).
You can run a model MCMC calibration as follows:
python -m apps calibrate MODEL_NAME MAX_SECONDS RUN_ID
For example, to calibrate the malaysia COVID model for 30 seconds you can run:
python -m apps calibrate malaysia 30 0
The RUN_ID argument does not matter for local use, so you can always set it to "0".
Model calibration outputs are written to data/outputs/calibrate and can be viewed in Streamlit (see below).
We use Streamlit to visualise the output of local model runs. You can run streamlit from the command line to view your model's outputs as follows:
streamlit run plots.py
If you want to view the outputs of a calibration, run:
streamlit run plots.py mcmc
We have a suite of automated tests that verify that the code works. Some of these are rigorous "unit" tests which validate functionality, while others are only "smoke" tests, which verify that the code runs without crashing. These tests are written with pytest.
You are encouraged to run the tests locally before pushing your code to GitHub. Automated tests may be run via PyCharm or via the command line using pytest:
pytest -v
Tests are also run automatically via GitHub Actions on any pull request or commit to the master branch.
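To illustrate the difference between a "smoke" test and a "unit" test, here is a minimal sketch. The `run_toy_model` function is a hypothetical stand-in invented for this example and is not part of AuTuMN. pytest collects any function whose name starts with `test_`, so no extra imports are needed.

```python
def run_toy_model(population: float, growth_rate: float, steps: int) -> list:
    """Hypothetical stand-in for a model run: simple geometric growth."""
    sizes = [population]
    for _ in range(steps):
        sizes.append(sizes[-1] * (1 + growth_rate))
    return sizes


def test_toy_model__smoke():
    """Smoke test: only checks that the model runs without raising."""
    run_toy_model(population=1000.0, growth_rate=0.1, steps=5)


def test_toy_model__unit():
    """Unit test: checks the outputs against hand-calculated values."""
    sizes = run_toy_model(population=100.0, growth_rate=0.5, steps=2)
    assert sizes == [100.0, 150.0, 225.0]
```

The smoke test would pass even if the model produced wrong numbers, which is why the unit tests are the more rigorous of the two.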
The codebase can be auto-formatted using Black:
./scripts/format.ps1
Input data is stored in text format in the data/inputs/ folder. All input data required to run the app should be stored in this folder, along with a README explaining its meaning and provenance. Input data is preprocessed into an SQLite database at runtime, inside the autumn.inputs module. A unique identifier for the latest input data is stored in data/inputs/input-hash.txt. If you want to add new input data or modify existing data, then:
- add or update the source CSV/XLS files
- adjust the preprocess functions in autumn.inputs as required
- rebuild the database, forcing a new file hash to be written
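As a rough illustration of why changing an input file produces a new hash, the sketch below computes a single digest over every file in a folder. This is an assumption about how such an identifier could be built, not the actual code in autumn.inputs: the function name `hash_input_files`, the use of SHA-256, and the decision to skip the hash file itself are all hypothetical.

```python
import hashlib
from pathlib import Path


def hash_input_files(input_dir: str, exclude=("input-hash.txt",)) -> str:
    """Return a hex digest covering every file under input_dir (hypothetical helper)."""
    digest = hashlib.sha256()
    # Sort paths so the digest does not depend on directory listing order.
    for path in sorted(Path(input_dir).rglob("*")):
        if path.is_file() and path.name not in exclude:
            # Include the relative path so that renaming a file also changes the hash.
            digest.update(path.relative_to(input_dir).as_posix().encode("utf-8"))
            digest.update(path.read_bytes())
    return digest.hexdigest()
```

Under this scheme, any edit to a source file yields a different digest, which is what lets a stored hash signal that the database needs rebuilding.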
To fetch the latest data, run:
python -m apps db fetch
You will need to ensure that the latest date in all user-specified mixing data params is greater than or equal to the most recent Google Mobility date.
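The date requirement above can be checked mechanically. The following is a minimal sketch only: the function `find_stale_mixing_params` and the shape of its input (a dict mapping each mixing parameter name to its list of dates) are assumptions made for illustration and do not reflect how AuTuMN stores its parameters.

```python
from datetime import date


def find_stale_mixing_params(mixing_dates: dict, latest_mobility_date: date) -> list:
    """Return the names of mixing params whose latest date precedes the mobility data.

    mixing_dates is assumed to map a param name to a non-empty list of dates.
    """
    return [
        name
        for name, dates in mixing_dates.items()
        if max(dates) < latest_mobility_date
    ]


# Example with made-up values: "school" ends before the mobility data does.
stale = find_stale_mixing_params(
    {
        "school": [date(2020, 3, 1), date(2020, 6, 1)],
        "work": [date(2020, 3, 1), date(2020, 9, 1)],
    },
    latest_mobility_date=date(2020, 8, 15),
)
```

Any parameter returned by this check would need a later date added before the models are run against the new data.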
To rebuild the database with new data, run:
python -m apps db build --force
Once you are satisfied that all your models work again (run the tests), commit your changes and push up:
- The updated CSV files
- The updated input-hash.txt file
- Any required changes to model parameters (eg. dynamic mixing dates)
We often need to run long, computationally expensive jobs. We are currently using Amazon Web Services (AWS) to do this. The scripts and documentation that allow you to do this can be found in the scripts/aws/ folder. The following jobs are run in AWS:
- Calibration: Finding maximum likelihood parameters for some historical data using MCMC
- Full model runs: Running all scenarios for all accepted MCMC parameter sets
- PowerBI processing: Post-processing of full model runs for display in PowerBI
All outputs, logs and plots for all model runs are stored in AWS S3, and they are publicly available at this website. Application logs should still be uploaded if the app crashes midway.
Each job is run on its own server, which is transient: it will be created for the job and will be destroyed at the end.
The AWS tasks are run using Luigi, which is a tool for building data processing pipelines. The Luigi tasks can be found in the tasks folder.
We have a self-serve job-runner website available here, built on the Buildkite platform. This website can be used to run jobs in AWS. Buildkite runs on a small persistent server in AWS. Buildkite configuration is stored in scripts/buildkite/.