- MIMIC-III Clinical Database: https://mimic.physionet.org/
- eICU Collaborative Research Database: https://eicu-crd.mit.edu/
- General Internal Medicine (GIM) dataset: https://lks-chart.github.io/gim-data-dictionary/
- Medical Imaging in Cervical Spine Trauma (MICST) dataset
- Getting the data: https://github.com/MIT-LCP/2019_toronto_health_hack/blob/master/talks/2019-10-03-toronto.ipynb
The datasets are hosted on Google Cloud, which requires a Gmail account to manage permissions.
- Create a Gmail account, if you don't already have one. It will be used to manage your access to the resources.
- Give your Gmail address to the session hosts.
BigQuery is a database system that makes it easy to explore data with Structured Query Language ("SQL").
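As an illustration of exploring data with SQL, BigQuery exposes dataset metadata through `INFORMATION_SCHEMA` views, so even listing the tables in a dataset is a query. The Python sketch below is a minimal example, not part of the workshop materials. It assumes three things: the `google-cloud-bigquery` package is installed, you have authenticated, and your access grant lets you read the dataset's metadata. `list_tables_query` and `list_tables` are hypothetical helper names.

```python
# Sketch: list the tables in a BigQuery dataset with standard SQL.
# Assumes google-cloud-bigquery is installed and you are authenticated;
# the helper names are hypothetical, chosen for this example only.


def list_tables_query(project: str, dataset: str) -> str:
    """Return a query over the dataset's INFORMATION_SCHEMA.TABLES view."""
    return (
        "SELECT table_name "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES` "
        "ORDER BY table_name"
    )


def list_tables(project="physionet-data", dataset="eicu_crd_demo", client=None):
    """Run the metadata query and return the table names as strings."""
    if client is None:
        # Imported lazily so list_tables_query() works without the package.
        from google.cloud import bigquery
        # Queries are billed to the project passed here.
        client = bigquery.Client(project="tdothealthhack-team")
    job = client.query(list_tables_query(project, dataset))
    return [row["table_name"] for row in job.result()]
```

Calling `list_tables()` with no arguments should return the table names of the demo eICU dataset. The optional `client` argument exists only so the function can be exercised without network access.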
- At the top of the BigQuery console, select `tdothealthhack-team` as the project. This indicates the account used for billing.
- "Pin" a project to the resources menu to view the available datasets. In the Resources menu on the left, click "Add data", then "Pin a project", and add the following project names: `physionet-data` and `tdothealthhack-data`.
- You should be able to preview the data available in these projects using the graphical interface.
- Now try running a query. For example, try counting the number of rows in the demo eICU patient table:

  ```sql
  SELECT count(*) FROM `physionet-data.eicu_crd_demo.patient`
  ```
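The same row count can be run from Python rather than the console, which is closer to how the notebooks work. The following is a minimal sketch, not the workshop's own code. It assumes the `google-cloud-bigquery` package is installed and that you have authenticated (for example with `gcloud auth application-default login`). `count_rows_query` and `count_rows` are hypothetical helper names.

```python
# Sketch: run the row-count query from Python instead of the console.
# Assumes google-cloud-bigquery is installed and you are authenticated;
# the helper names are hypothetical, chosen for this example only.

BILLING_PROJECT = "tdothealthhack-team"  # the project selected for billing
PATIENT_TABLE = "physionet-data.eicu_crd_demo.patient"


def count_rows_query(table: str) -> str:
    """Build a COUNT(*) query for a fully qualified project.dataset.table."""
    return f"SELECT COUNT(*) AS n FROM `{table}`"


def count_rows(table: str = PATIENT_TABLE, client=None) -> int:
    """Run the count query and return the single integer it produces."""
    if client is None:
        # Imported lazily so count_rows_query() works without the package.
        from google.cloud import bigquery
        client = bigquery.Client(project=BILLING_PROJECT)
    rows = client.query(count_rows_query(table)).result()
    # The query returns exactly one row with one column named "n".
    return next(iter(rows))["n"]
```

`count_rows()` should report the same number as the console query. Passing the billing project to `bigquery.Client` plays the same role as selecting it at the top of the console.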
Several tutorials are listed below. To run these notebooks, (1) you need a Gmail account and (2) your Gmail address must have been added to the appropriate Google Group by the workshop hosts.
- 01-accessing-the-data.ipynb
- 02-explore-patients.ipynb
- 03-severity-of-illness.ipynb
- 04-summary-statistics.ipynb
- 05-prediction.ipynb
- Understanding Electronic Health Records with BigQuery ML
- Datathon Tutorial
- MIMIC-IV
Datasets can also be queried directly from R. You should be able to run the following R Markdown notebook in a local installation of RStudio: https://github.com/MIT-LCP/2019_toronto_health_hack/blob/master/tutorials/mimic-iii/mimic-iii-los.Rmd