MIT-LCP/bidmc-datathon

BIDMC Datathon (29 February 2020)

This repository contains resources for the BIDMC Datathon 2020.

Contents

  1. Getting started
  2. Documentation
  3. Databases on BigQuery
  4. Analysing data with Google Colab
  5. Python and R notebooks that we prepared earlier
  6. Sample projects

1. Getting started

The datasets are hosted on Google Cloud, which requires a Gmail account to manage permissions.

  1. Create a Gmail account, if you don't already have one. It will be used to manage your access to the resources.
  2. Give your Gmail address to the session hosts.

2. Documentation

We will be working with two critical care datasets during the event: MIMIC-III and the eICU Collaborative Research Database.

3. Databases on BigQuery

BigQuery is a database system that makes it easy to explore data with Structured Query Language ("SQL"). There are several datasets on BigQuery available for you to explore, including eicu_crd (the eICU Collaborative Research Database) and mimiciii_clinical (the MIMIC-III Clinical Database).

You will also find "derived" databases, which include tables derived from the original data using the code in the eICU and MIMIC code repositories. These are helpful if you are looking for something like a sepsis cohort or first day vital signs.

  1. Open BigQuery.

  2. At the top of the console, select bidmc-datathon as the project. This indicates the account used for billing.

  3. "Pin" a project to the resources menu to view available datasets. In the Resources menu on the left, click "Add data", "Pin a project", then add the following project names: physionet-data and bidmc-datathon.

  4. You should be able to preview the data available on these projects using the graphical interface.

  5. Now try running a query. For example, try counting the number of rows in the demo eICU patient table:

    -- Count the rows in the demo eICU patient table
    SELECT COUNT(*) AS n_rows
    FROM `physionet-data.eicu_crd_demo.patient`
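The same query can also be run from Python rather than the BigQuery console. The sketch below is one possible way to do it, assuming the `google-cloud-bigquery` package is installed and the session is already authenticated. The helper names `count_rows_query` and `run_query` are hypothetical and are not part of the datathon materials.

```python
def count_rows_query(project: str, dataset: str, table: str) -> str:
    """Build a row-count query for a fully qualified BigQuery table."""
    return f"SELECT COUNT(*) AS n_rows\nFROM `{project}.{dataset}.{table}`"


def run_query(sql: str, billing_project: str = "bidmc-datathon"):
    """Run a query and return the result as a pandas DataFrame.

    The BigQuery client is imported lazily so that the query builder above
    still works where google-cloud-bigquery is not installed.
    Assumption: bidmc-datathon is the project used for billing, as selected
    in the console steps above.
    """
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return client.query(sql).to_dataframe()


sql = count_rows_query("physionet-data", "eicu_crd_demo", "patient")
print(sql)
# In an authenticated session: df = run_query(sql)
```

Keeping the query text separate from the code that executes it makes it easy to inspect the SQL before any billed query is sent.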

4. Analysing data with Google Colab

Python is an increasingly popular programming language for analysing data. We will explore the data using Python notebooks, which allow code and text to be combined into executable documents. First, try opening a blank document using the link below:

5. Python and R notebooks that we prepared earlier

Several tutorials are provided below. Requirements for these notebooks are: (1) you have a Gmail account and (2) your Gmail address has been added to the appropriate Google Group by the workshop hosts.
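Inside a Colab notebook, the Gmail account is typically linked to the session by authenticating before any data is queried. The following is a minimal sketch of that step, assuming the notebook runs in a Colab runtime where the `google.colab` package is preloaded. The helper names `in_colab` and `authenticate` are hypothetical.

```python
import sys


def in_colab() -> bool:
    """Return True when the code appears to run inside a Colab runtime.

    Assumption: the Colab kernel has already imported google.colab,
    so the module name shows up in sys.modules.
    """
    return "google.colab" in sys.modules


def authenticate() -> bool:
    """Authenticate the Colab session; do nothing outside Colab."""
    if not in_colab():
        return False
    from google.colab import auth

    auth.authenticate_user()  # prompts for the Google account to use
    return True


if authenticate():
    print("Authenticated this Colab session.")
else:
    print("Not running in Colab; skipping authentication.")
```

The guard means the same cell can be run locally without raising an import error.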

  • BIDMC Q1: Data extraction for the English vs. Non-English Speaker project (MIMIC/R)
  • BIDMC Q1: Exploratory analysis in English vs. Non-English Speakers (MIMIC/Python)
  • BIDMC Q1: Exploratory analysis in English vs. Non-English Speakers (MIMIC/R)
  • Decision trees for mortality prediction (eICU)
  • Renal replacement therapy (eICU)
  • Exploring the patient table (eICU)
  • Severity of illness (eICU)
  • Summary statistics (eICU)
  • Timeseries for a single patient (eICU)
  • Mortality prediction (eICU)
  • Acute kidney injury (eICU)
  • Weekend effect on mortality (MIMIC)
  • Project work (eICU)

6. Sample projects

These papers and repositories may be helpful for reference. They are definitely not perfect: code may be untidy, poorly documented, buggy, or outdated. Think about how they could be improved or adapted. For example, you could:

  • replicate the study on a different dataset (e.g. MIMIC vs eICU)
  • improve the methodology

  1. The association between mortality among patients admitted to the intensive care unit on a weekend compared to a weekday
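As a starting point for replicating a weekend versus weekday comparison, the core grouping step can be sketched in plain Python. This is an illustration only, not the method used in the paper: it assumes a weekend means Saturday or Sunday by admission time, the function names are hypothetical, and the rows are made up rather than taken from MIMIC or eICU.

```python
from datetime import datetime


def is_weekend(admit_time: datetime) -> bool:
    """Assumption: weekend means Saturday (5) or Sunday (6)."""
    return admit_time.weekday() >= 5


def mortality_by_admission_day(records):
    """Compute the crude mortality rate per group.

    records: iterable of (admit_time, died) pairs.
    Returns a dict mapping "weekend" and "weekday" to a rate,
    or None for a group with no admissions.
    """
    counts = {"weekend": [0, 0], "weekday": [0, 0]}  # [deaths, admissions]
    for admit_time, died in records:
        group = "weekend" if is_weekend(admit_time) else "weekday"
        counts[group][0] += int(died)
        counts[group][1] += 1
    return {g: (d / n if n else None) for g, (d, n) in counts.items()}


# Made-up rows for illustration only; not real patient data.
toy_records = [
    (datetime(2020, 2, 29, 8, 30), True),   # Saturday
    (datetime(2020, 3, 1, 14, 0), False),   # Sunday
    (datetime(2020, 3, 2, 9, 15), False),   # Monday
    (datetime(2020, 3, 3, 22, 45), False),  # Tuesday
]
rates = mortality_by_admission_day(toy_records)
print(rates)
```

A real analysis would also need to adjust for confounders such as severity of illness, which a crude rate like this ignores.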

To be continued...
