COVID-19 bindings with Python #222

Open
dentarthur opened this issue Mar 21, 2020 · 4 comments

@dentarthur

Python Bindings for covid-19

tl;dr

Many data scientists who work with the SciPy and NumFOCUS data stack are likely to be very interested in the epidemiological models influencing current life-and-death policy debates. Over the next 18 months those debates are likely to require widespread use of more granular Agent Based models of local as well as global social distancing transmission networks. FLAMEGPU2 could be very relevant.

From Supplementary notes to reference 5 (2006) of "Imperial College COVID-19 Response Team" model in the news right now, worldwide:

"The simulation was written in C using the OpenMP 2.0 SMP libraries. The US simulation used
some 55GB of RAM, and each realization ran in 1-2 hours on 8 CPU Opteron 854 based servers,
and in 2-5 hours on 16 CPUs of the NCSA SGI Altix 3700 system (‘cobalt’). The longer run
times on the latter system were mainly due to increased memory latency on that larger system.
Overall, approximately 20,000 CPU hours were used to generate the US results, and
approximately 8000 hours to generate the GB results."

https://static-content.springer.com/esm/art%3A10.1038%2Fnature04795/MediaObjects/41586_2006_BFnature04795_MOESM28_ESM.pdf
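To get a feel for the scale implied by the quoted figures, here is a rough back-of-envelope sketch in Python. It assumes, purely for illustration, that all 20,000 CPU hours of the US results were spent on one machine type at a time, which the quote does not say; the function name `realization_range` is hypothetical.

```python
# Back-of-envelope arithmetic on the figures quoted above.
# Assumption: all CPU hours are attributed to a single machine type,
# which the source does not state; treat the result as a rough bound.

TOTAL_CPU_HOURS_US = 20_000  # "approximately 20,000 CPU hours" for the US results


def realization_range(total_cpu_hours, cpus, hours_min, hours_max):
    """Return (fewest, most) realizations that fit in the CPU-hour budget."""
    fewest = total_cpu_hours / (cpus * hours_max)  # every run takes the longest time
    most = total_cpu_hours / (cpus * hours_min)    # every run takes the shortest time
    return fewest, most


# 8-CPU Opteron servers, 1-2 hours per realization
opteron = realization_range(TOTAL_CPU_HOURS_US, cpus=8, hours_min=1, hours_max=2)

# 16 CPUs of the SGI Altix system, 2-5 hours per realization
altix = realization_range(TOTAL_CPU_HOURS_US, cpus=16, hours_min=2, hours_max=5)

print("Opteron bound:", opteron)
print("Altix bound:", altix)
```

Under those assumptions the US results correspond to somewhere between a few hundred and a few thousand realizations.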

https://time.com/5804555/coronavirus-lockdown-uk/

https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf

Reference 5 from the current model above, with direct doi link:

  1. Ferguson, N., Cummings, D., Fraser, C. et al. Strategies for mitigating an influenza pandemic. Nature 442, 448–452 (2006).
    https://doi.org/10.1038/nature04795

Hopefully you are already fully aware of the possible relevance of FLAMEGPU2 and python bindings, and of the availability of support for a larger team to develop them more rapidly. If not, I hope these direct links help speed things up.

I have no relevant qualifications to assist but am adding some uninformed notes below.

Notes

More References

The only other reference to the actual model implementation I noticed in the above paper on current parameters was reference 6, with direct doi link added below.

  1. Modeling targeted layered containment of an influenza pandemic in the United States
    M. Elizabeth Halloran, Neil M. Ferguson, Stephen Eubank, Ira M. Longini Jr., Derek A. T. Cummings, Bryan Lewis, Shufu Xu, Christophe Fraser, Anil Vullikanti, Timothy C. Germann, Diane Wagener, Richard Beckman, Kai Kadau, Chris Barrett, Catherine A. Macken, Donald S. Burke, and Philip Cooley
    PNAS March 25, 2008 105 (12) 4639-4644;
    https://doi.org/10.1073/pnas.0706849105

Review (critique) from: https://www.endcoronavirus.org/
Chen Shen, Nassim Nicholas Taleb and Yaneer Bar-Yam, Review of Ferguson et al "Impact of non-pharmaceutical interventions...", New England Complex Systems Institute (March 17, 2020).
https://necsi.edu/review-of-ferguson-et-al-impact-of-non-pharmaceutical-interventions

Lots of people like me will need to acquire some basic epidemiological concepts to understand what the models are about, which is useful, even if not strictly essential, for implementing them on GPUs. I will be starting here:

https://www.nature.com/articles/nrmicro1845

doi:10.1038/nrmicro1845 available at Sci-Hub

https://sci-hub.tw/10.1038/nature04017

Book chapters: https://www.bmj.com/about-bmj/resources-readers/publications/epidemiology-uninitiated

Epidemiology software

Epidemiologists are rather busy at the moment but are well organized for improving their computational tools.

My understanding is that they work with R statistical software using tools such as:

https://www.repidemicsconsortium.org/

Thibaut Jombart, David M. Aanensen, Marc Baguelin, Paul Birrell, Simon Cauchemez, Anton Camacho, Caroline Colijn, Caitlin Collins, Anne Cori, Xavier Didelot, Christophe Fraser, Simon Frost, Niel Hens, Joseph Hugues, Michael Höhle, Lulla Opatowski, Andrew Rambaut, Oliver Ratmann, Samuel Soubeyrand, Marc A. Suchard, Jacco Wallinga, Rolf Ypma, Neil Ferguson. OutbreakTools: A new platform for disease outbreak analysis using the R software. Epidemics, Volume 7, 2014, Pages 28-34. ISSN 1755-4365.
https://doi.org/10.1016/j.epidem.2014.04.003
(http://www.sciencedirect.com/science/article/pii/S1755436514000206)
Abstract: The investigation of infectious disease outbreaks relies on the analysis of increasingly complex and diverse data, which offer new prospects for gaining insights into disease transmission processes and informing public health policies. However, the potential of such data can only be harnessed using a number of different, complementary approaches and tools, and a unified platform for the analysis of disease outbreaks is still lacking. In this paper, we present the new R package OutbreakTools, which aims to provide a basis for outbreak data management and analysis in R. OutbreakTools is developed by a community of epidemiologists, statisticians, modellers and bioinformaticians, and implements classes and methods for storing, handling and visualizing outbreak data. It includes real and simulated outbreak datasets. Together with a number of tools for infectious disease epidemiology recently made available in R, OutbreakTools contributes to the emergence of a new, free and open-source platform for the analysis of disease outbreaks.
Keywords: Software; Free; Bioinformatics; Epidemiology; R; Epidemics; Public health; Infectious disease

Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol. 2013 Nov 1;178(9):1505-12. doi: 10.1093/aje/kwt133. Epub 2013 Sep 15. PMID: 24043437; PMCID: PMC3816335.
https://doi.org/10.1093/aje/kwt133

They work with Jupyter R notebooks and publish interactive web visualization dashboards for real-time analysis using the R library "Shiny" and python servers such as:

https://nextstrain.org/

This world is not isolated from the world of Python datascience as the two languages interoperate directly with rpy2 accessing R from python and reticulate accessing Python from R.

Python Datascience

There are far more data scientists able to help with python tools than there are epidemiologists, and they are also well organized, with python now dominant as a "glue" language, not only with R but also between pretty well everything.

In particular, I believe it should be relatively straightforward for them to enable rapid use of ABMs based on CSXMs, written as ordinary arithmetic expressions in python, rather than leaving such models accessible only to people who can and will wrap C or C++ functions in XML.

https://numba.pydata.org/
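As a rough illustration of what "ordinary arithmetic expressions in python" could look like for a toy agent-based model, here is a minimal sketch in plain Python. It is not FLAME GPU code and does not use Numba; the names `step` and `run` and all parameter values are hypothetical, chosen only to show the shape of such a model.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA = 0.05   # assumed probability that one contact transmits infection
GAMMA = 0.1   # assumed probability that an infected agent recovers in a step
CONTACTS = 5  # assumed number of random contacts per infected agent per step


def step(states, rng):
    """Advance every agent by one time step and return the new state list."""
    new_states = list(states)
    n = len(states)
    for i, state in enumerate(states):
        if state != "I":
            continue
        # Each infected agent meets a few random agents and may infect them.
        for _ in range(CONTACTS):
            j = rng.randrange(n)
            if states[j] == "S" and rng.random() < BETA:
                new_states[j] = "I"
        # Recovery is decided by a plain comparison as well.
        if rng.random() < GAMMA:
            new_states[i] = "R"
    return new_states


def run(n_agents=1000, n_seed=10, n_steps=50, seed=0):
    """Run the toy model and return a list of (S, I, R) counts per step."""
    rng = random.Random(seed)
    states = ["I"] * n_seed + ["S"] * (n_agents - n_seed)
    history = []
    for _ in range(n_steps):
        states = step(states, rng)
        history.append((states.count("S"), states.count("I"), states.count("R")))
    return history


if __name__ == "__main__":
    for day, counts in enumerate(run(), start=1):
        print(day, counts)
```

The per-agent logic is only comparisons and arithmetic, which is the kind of code a modeller could write without touching C, C++ or XML.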

Also, I assume PyTables could snapshot data from GPU memory to standard Hierarchical Data Format (HDF5) files a lot more efficiently than XML.

GPUs

GPUs on gamer PCs probably outnumber data scientists as much as data scientists outnumber epidemiologists.

It is already feasible to mobilize those resources for science projects and there are a LOT more people keen to help right now:

https://boinc.berkeley.edu/

Conclusion

The potential role of FLAMEGPU suggests that the urgency of python bindings has grown dramatically since they were first raised as an issue here on Jan 14:

#174

There are probably lots of people who can help out there right now looking for projects to help with and massive funding available for staffing up.

Hope this helps.

Arthur

@mondus
Member

mondus commented Mar 24, 2020

@dentarthur First of all, thanks for your detailed issue, comments and links. I am already aware of the potential of FLAME GPU for epidemiology research and have a collaborator at Sheffield who has developed models for TB.

We are making good progress with FLAME GPU 2. In particular, we have found a way to get run-time compilation, which is necessary for python bindings. This is still a couple of months away in terms of development, especially with some development attention shifted to support remote teaching.

If you are interested in developing a FLAME GPU model I would suggest starting with FLAME GPU 1. Anything you develop will very easily translate to FLAME GPU 2 once we get the alpha release out. After a release I am keen to have a solid epidemiology example which people can base models from.

@dentarthur
Author

Thanks for clear response.

  1. As soon as I get time (hopefully a couple of weeks) I will more carefully study the literature and docs on both FLAME and FLAME GPU 1 with a view to future use for Agent Based Stock Flow consistent micro foundations of macroeconomic models of the business cycle (different from EURACE but also studying that). If notes of what I understand or misunderstand and questions or comments on what I think needs improving in literature and/or documentation might be helpful please let me know where to post a link to any such future notes.

  2. I cannot help with a solid epidemiology example myself. I got in touch because it suddenly occurred to me that FLAME GPU might have become relevant to the current pandemic and that you and others might not be aware of how relevant it could be, even though it has been used for pandemic modeling as mentioned in the literature.

  3. I hope you ask your collaborator in Sheffield familiar with epidemiological models to carefully study the links in this issue and my elaboration below.

  4. My central point may have been buried in the detail. It is the possibility that additional resources such as developers for rapidly completing FLAME GPU with python bindings (as well as for remote teaching) might be available right now from people who urgently want to be able to work with the particular very important solid epidemiological model I provided links for. My uninformed impression is that FLAME GPU is particularly suitable for that because there may be a very urgent need for replication of simulation results with much more finely grained models accessible to a much wider range of data scientists (via python bindings on GPUs rather than large HPC clusters with C/C++ developers).

  5. Major public health decisions affecting millions of people are being taken right now based on simulation runs from the 16 March model at my third link, which has been mentioned in worldwide media (Time magazine, second link). Capability to do simulation runs using this model is not widely accessible for replicating results. Figure 3 at p 11 of 20 suggests such major decisions to trigger or lift "social distancing" suppression strategies could be taken every 2 or 3 months up to at least the end of next year. The second para on p15 suggests that such decisions could be needed on a much more granular local district level in many countries. Page 16 explains that the change in UK strategy over 5 days informed by this model concerns calculations about 250,000 deaths in the UK and 1.1-1.2 million in the US. Helping with such decisions could be very urgent.

  6. Your collaborator in Sheffield familiar with epidemiological modelling should be able to make a quick judgment whether or not there are people who should be urgently contacted about the possible relevance and usefulness of helping to accelerate development here. The actual epidemiologists working on the current pandemic certainly won't have time to consider such possibilities at the moment. But other people able to help accelerate your development might. Support for remote teaching is obviously a very high priority with the shutdown of face to face teaching. It suddenly became a shutdown of all UK schools 5 days after a decision against such a shutdown, with both decisions based on this model. It may turn out that support for more granular epidemiological modelling is likewise of very high priority (and also that python bindings are very relevant to remote teaching). I simply don't know about the urgency of possible relevance to covid-19 public health policy right now and you might not either. But your collaborator might well understand both the model at the links I provided and FLAME GPU well enough to know or at least should be able to find out who to ask for a quick check.

  7. I won't be attempting to generate any actual simulation until well after your projected ETA for python bindings. But I do hope to begin at least thinking about preparing CSXM dataflow and state machine models compatible with FLAME GPU ready for eventual generation with python bindings. Am unlikely to attempt generating anything before being able to do it from python (as are many others ;-)

  8. Will take your advice to orient towards FLAME GPU 1 with a view to translate to v2 when released with python bindings.

  9. Am also assuming that the use of XML schemas is intended for automated verification of CSXM models (eg for FPGAs) but presumably also enables design initially for FLAME on a single CPU to be translated from there to HPC cluster and then to FLAME GPU without needing to start from scratch (whereas working back the other way would not allow a simplified model to just run on PCs with no GPU if it was initially designed for FLAME GPU).

@mondus
Member

mondus commented Apr 30, 2020

@dentarthur Thanks for your detailed comments. The FLAME GPU team submitted an application to the RAMP call for covid modelling assistance. It looks like there might be some suitable models where FLAME GPU can be applied.

Please feel free to raise questions about FLAME GPU on the issue tracker. To answer some of your questions above:

  1. Yes I also believe that FLAME GPU is suitable to epidemiology modelling. We will be exploring this through the RAMP collaboration.

  2. I am working hard on runtime compilation support necessary for Python bindings.

  3. The plan is to release backward compatibility for FLAME GPU 1 models. FLAME GPU 1 will not however get Python bindings.

  4. FLAME GPU uses slightly different XML and a completely different template engine to FLAME HPC to generate code. It is hoped that with FLAME GPU 2 there will be an opportunity to merge the code with FLAME 2 for HPC at some stage. Regardless, FLAME and FLAME GPU share the same underlying principle, so some models can be fairly easily translated between the two.

@dentarthur
Author

Glad to see this!!!

Re my 4:

The particular model I was referring to (Imperial College covid-19 response team Report 9) has now been released:

https://github.com/mrc-ide/covid-sim

Re my 5:

Here is an example of a more granular Agent Based Model also recently released:

https://github.com/BDI-pathogens/OpenABM-Covid19/blob/master/documentation/covid19_model.pdf

Described as Individual Based Model, IBM, rather than ABM. Perhaps because the "Individuals" are more passive like micro cells in the micro-simulation model of Report 9 above rather than being active "Agent" actors with their own behaviour.

This illustrates my point about the sort of more granular modelling that will be needed for localized decisions about triggering or lifting suppression strategies locally during successive waves of covid-19 over more than a year until vaccine.

Re your 7 (and my 7 and 9 mention of CSXMs). Given that use of XML does not relate to some stable external constraint as I had assumed in 9, I will elaborate on potential relevance of CSXMs to the particular type of model described as IBM above.

I have not studied the C code for OpenABM-Covid19 above nor yet done the "more careful study of literature and docs" re Flame etc I mentioned in my 1.

But my prejudice that CSXMs are highly relevant is reinforced by what I have read so far.

As I see it Python bindings are very important for making Flame GPU accessible at all and you are working hard on that.

Use of XML is not such a big barrier as C++, but if there is some way to directly write executable models of ABM simulations in terms of UML Activity and State Machine diagrams of CSXMs on networks, that would be a very big breakthrough, so that people who understand the epidemiological models but do not understand HPC GPU clusters can develop scalable simulations.

Deployment of Digital Tracing apps on mobile phones will require processing of literally tens of millions of updates about potential transmission contacts (CSXM messages) between Infected and Susceptible Individuals and Locations (SXM Agents) on several network graphs for work, school and other social contacts. This will be used for real time decisions to automatically impose or lift quarantine restrictions on literally millions of individuals as well as to modify local "social distancing" policies.

Simulation modelling of the effects of different filters and parameters will clearly need HPC GPU clusters.

The conceptual gap between the high level domain model and the low level considerations for optimizing task and data parallel performance on GPU clusters is enormous.

CSXMs look to me like an ideal intermediate level that can be understood both by epidemiological domain experts and GPU coders so that the models are designed to work efficiently with GPU clusters. (Small messages, larger Agents, phases etc).

My vantage point is based on ignorance rather than expertise. Knowing nothing much about GPUs I can "sort of get" why messages should be small and Activity (or Dataflow) diagrams should be tagged with phases when designing an ABM simulation for use with Flame CSXMs. A modeler should only need to be told that sort of thing rather than needing to know anything at all about C++ or GPUs or data or task parallelism. That works for me with the concept of CSXMs.
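To make "small messages, larger Agents, phases" a bit more concrete, here is a minimal plain-Python sketch of how I currently picture it. It is not the FLAME or FLAME GPU API, and it may well misrepresent CSXMs; the names `Agent`, `ContactMessage`, `emit_phase`, `react_phase` and `run_round`, and the "Q" (quarantined) state, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A 'larger' agent: a state plus memory (here, its contact list)."""
    ident: int
    state: str  # "S" susceptible, "I" infected, "Q" quarantined
    contacts: list = field(default_factory=list)  # ids of known contacts


@dataclass(frozen=True)
class ContactMessage:
    """A 'small' message: only the id of the infected sender."""
    sender: int


def emit_phase(agents):
    """Phase 1: every infected agent outputs one small message."""
    return [ContactMessage(a.ident) for a in agents.values() if a.state == "I"]


def react_phase(agents, messages):
    """Phase 2: susceptible agents read the messages and may change state."""
    infected_ids = {m.sender for m in messages}
    for a in agents.values():
        if a.state == "S" and infected_ids.intersection(a.contacts):
            a.state = "Q"  # a contact of an infected agent is quarantined


def run_round(agents):
    """One iteration: all messages are written before any are read."""
    messages = emit_phase(agents)
    react_phase(agents, messages)
    return messages


if __name__ == "__main__":
    agents = {
        0: Agent(0, "I", contacts=[1]),
        1: Agent(1, "S", contacts=[0, 2]),
        2: Agent(2, "S", contacts=[1]),
    }
    run_round(agents)
    print({i: a.state for i, a in agents.items()})
```

Keeping the two phases separate means that no agent reads a message before every agent has finished writing, which is the only scheduling rule the modeller has to be told about in this sketch.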

I suspect that intermediate level of CSXMs would also be very helpful for people who understand what they want to model for epidemiological purposes far better than I do.

The language for communication between Domain experts and coders is UML (which uses XML for interchange but only behind the scenes). The docs I have read here repeatedly remind users to think in terms of CSXMs while actually describing details about XML files and distracting from any focus on CSXMs. The docs I would like to be reading would explain how to express a design thought of in terms of CSXMs using UML graphical design tools.

Working with a team that designs UML profiles for use with standard graphical model tools (Eclipse etc) to produce CSXM models for input to Flame GPU might make a big difference for RAMP proposals.


2 participants