Initial update to Section 3 #2

Merged
merged 35 commits into from
Jun 6, 2016
35 commits
cf76aae
Add files via upload
peterhcharlton Jun 3, 2016
c590ae5
Update README.md
peterhcharlton Jun 3, 2016
dbda9fc
Update README.md
peterhcharlton Jun 3, 2016
c57e5da
Update README.md
peterhcharlton Jun 3, 2016
ac02830
Create README
peterhcharlton Jun 3, 2016
7d310db
Add files via upload
peterhcharlton Jun 3, 2016
e066a49
Rename README to README.md
peterhcharlton Jun 3, 2016
23af0ce
Update README.md
peterhcharlton Jun 3, 2016
9a83722
Create README.md
peterhcharlton Jun 3, 2016
8f635b9
Create Code_BP_AKI
peterhcharlton Jun 3, 2016
0e17db8
Update README.md
peterhcharlton Jun 3, 2016
83e8954
Update README.md
peterhcharlton Jun 3, 2016
31392f6
Create README.md
peterhcharlton Jun 3, 2016
8297f43
Add files via upload
peterhcharlton Jun 3, 2016
4c64a66
Update README.md
peterhcharlton Jun 3, 2016
7b2a2dd
Create README.md
peterhcharlton Jun 3, 2016
e1d45e6
Add files via upload
peterhcharlton Jun 3, 2016
f4faa0f
Update README.md
peterhcharlton Jun 3, 2016
1cb9c19
Create chapter04_code.R
peterhcharlton Jun 3, 2016
e4cb73a
Create README.md
peterhcharlton Jun 3, 2016
d8d6504
Update README.md
peterhcharlton Jun 3, 2016
31a3c5c
Rename R code - Mortality Prediction in ICU.R to chapter3_05.R
peterhcharlton Jun 6, 2016
8adb43e
Rename SQL Query - Mortality Prediction in ICU.sql to chapter3_05.sql
peterhcharlton Jun 6, 2016
3dd9a91
Rename chapter3_05.sql to chapter3_05_query.sql
peterhcharlton Jun 6, 2016
991cc5b
Rename chapter3_05.R to chapter3_05_analysis.R
peterhcharlton Jun 6, 2016
924355a
Rename Afib case study data extraction code.m to chapter3_07_data_ext…
peterhcharlton Jun 6, 2016
817986e
Rename Afib case study_database query.sql to chapter3_07_database_que…
peterhcharlton Jun 6, 2016
d31fd8c
Rename Afib case study_propensity score analysis code.r to chapter3_0…
peterhcharlton Jun 6, 2016
f298ff3
Rename Afib case study_propensity score matching code.r to chapter3_0…
peterhcharlton Jun 6, 2016
3f7b75d
Rename Matlab IED_transition.m to chapter3_08_IED_transition.m
peterhcharlton Jun 6, 2016
8a7566a
Rename Matlab MCMC_solver.m to chapter3_08_MCMC_solver.m
peterhcharlton Jun 6, 2016
83035f0
Rename Matlab health_forecast.m to chapter_3_08_health_forecast.m
peterhcharlton Jun 6, 2016
5bb13b1
Rename Matlab2R health_forecast.m to chapter3_08_2R_health_forecast.m
peterhcharlton Jun 6, 2016
d9ee92d
Rename Code_BP_AKI to chapter_3_09_code
peterhcharlton Jun 6, 2016
dcf6969
Delete rpeakdetect.m
peterhcharlton Jun 6, 2016
26 changes: 25 additions & 1 deletion section3/README.md
@@ -1 +1,25 @@
# section 3 content
# Section 3: Case Studies

## Summary

This section presents twelve case studies of secondary analyses of electronic health records. The case studies cover a range of research areas: from analysis of organisational change over a period of years to assessment of the clinical effectiveness of specific interventions; from prediction of patient outcomes to continuous estimation of physiological parameters. The range of methodologies demonstrated is no less broad: from machine learning techniques to propensity score matching; from natural language processing to physiological signal processing. Approaches to commonly encountered hurdles such as patient cohort identification and outcome extraction are illustrated on publicly available data, with analytical code supplied. Consequently, the case studies form a valuable resource for clinicians and data scientists, beginners and experts alike.

## Contents

The code used in this section of the book is provided to allow users to replicate the case studies. You may wish to focus on those case studies which are written in the programming language(s) with which you are most familiar. Here are the programming languages used in each of the case studies:

| Chapter | Title | Programming Languages |
|---|---|---|
| 3.1 | Introduction | *none* |
| 3.2 | Trend Analysis: Evolution of tidal volume over time for patients receiving invasive mechanical ventilation | |
| 3.3 | Instrumental Variable Analysis of Electronic Health Records | R |
| 3.4 | Mortality prediction in the ICU based on MIMIC-II: Results from the Super ICU Learner Algorithm (SICULA) project | R |
| 3.5 | Mortality Prediction in the ICU | SQL, R |
| 3.6 | Data Fusion Techniques for Early Warning of Clinical Deterioration | SQL, Matlab |
| 3.7 | Comparative effectiveness: Propensity Score Analysis | SQL, Matlab, R |
| 3.8 | Markov Models and Cost Effectiveness Analysis: Applications in Medical Research | Matlab |
| 3.9 | Blood Pressure and the Risk of Acute Kidney Injury in the ICU: Case-Control vs. Case-Crossover Designs | *insert* |
| 3.10 | Waveform Analysis to Estimate Respiratory Rate | Matlab |
| 3.11 | Signal processing: False Alarm Reduction | |
| 3.12 | Improving Patient Cohort Identification Using Natural Language Processing | SQL |
| 3.13 | Hyperparameter Selection | |
22 changes: 22 additions & 0 deletions section3/chapter03/README.md
@@ -0,0 +1,22 @@
# 3.3 Instrumental Variable Analysis of Electronic Health Records

This directory contains the code used in the following publication:

Penna N. D. *et al.* **Instrumental Variable Analysis of Electronic Health Records**, in *Secondary analysis of Electronic Health Record Data*, Springer, [Under Review]

Code is provided in R format.

## Summary of Publication

Sources of variation in treatments received that are exogenous to patients can be used to estimate causal effects from observational data. We present an example of this methodology that estimates the effect of critically ill patients being cared for in “non-target ICUs” due to capacity constraints - a process known as *boarding*.

## Replicating this Publication

The work presented in this case study can be replicated as follows:

*insert*


***
Part of the wider **[Secondary Analysis of Electronic Health Records](https://github.com/MIT-LCP/critical-data-book)** book
***
148 changes: 148 additions & 0 deletions section3/chapter03/analysis_main.Rmd
@@ -0,0 +1,148 @@
---
title: "Boarders Analysis Jan 30, 2016"
output: pdf_document
---

TODO:
*Days survived -- add explanation as to why we defined it with 45 days as the cut-off*
*Show that the % of boarders who become non-boarders later is very small, and the change in this over the years*

We start by importing the data into R.

Patients with a recorded age > 90 in MIMIC-III are assigned random ages greater than 90, so we recode them as age = 90. Similarly, we cap *days_survived* at 45 days.

Boarders are defined as patients who spent any portion of their ICU stay outside of the MICU.

```{r}
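# Load the extracted cohort (the path is specific to the original author's machine)
dat <- read.csv("~/Downloads/boarders_periods1-3_LAST_ICU_Jan21_v11.csv", header = TRUE)

# Cap recorded ages at 90 and survival time at 45 days
dat$icustay_admit_age[which(dat$icustay_admit_age > 90)] <- 90
dat$days_survived[which(dat$days_survived > 45)] <- 45

# Flag boarders: any part of the ICU stay spent outside the MICU
dat$boarder <- 0
dat$boarder[which(dat$icustay_los_boarder > 0)] <- 1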

```

Next, we define a data subset *dat_all*. This includes all rows except those with missing or errant data (169 rows excluded [2.0%]).

```{r}

dat_all <- subset(dat, ( !is.na(elixhauser_28day) &
(los_days_prior_to_icu >= 0) &
(days_survived >= 0) &
(west_initial_remaining_beds >= 0)
)
)

nrow(dat)
nrow(dat_all)

```
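The excluded-row count and percentage quoted above can be derived directly from the two `nrow()` values. The following is a minimal sketch on an invented toy data frame (`toy` and its values are assumptions for illustration, not MIMIC data); only the column names are taken from the analysis.

```r
# Toy stand-in for `dat`; the values are invented for illustration only.
toy <- data.frame(
  elixhauser_28day            = c(3, NA, 5, 1, 7),
  los_days_prior_to_icu       = c(0, 2, -1, 4, 1),
  days_survived               = c(45, 10, 30, 45, 12),
  west_initial_remaining_beds = c(0, 3, 1, 2, 5)
)

# Same exclusion rules as for dat_all: drop missing or errant rows
toy_all <- subset(toy, !is.na(elixhauser_28day) &
                    los_days_prior_to_icu >= 0 &
                    days_survived >= 0 &
                    west_initial_remaining_beds >= 0)

# Report how many rows were dropped, as a count and a percentage
n_excluded   <- nrow(toy) - nrow(toy_all)
pct_excluded <- 100 * n_excluded / nrow(toy)
cat(sprintf("%d rows excluded [%.1f%%]\n", n_excluded, pct_excluded))
```

Replacing `toy` with `dat` and `toy_all` with `dat_all` should reproduce the figures reported in the text, assuming the same input file.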

Next, we run our instrumental variable analysis using *icustay_expire_flag_mod* (death within 24 hours of the end of the ICU stay) as the outcome, and *west_initial_remaining_beds* (the number of remaining MICU beds) as our instrument.

We adjust for team census size because there is an intuitive inverse relationship between team census size and the number of remaining beds, and it is conceivable that team census size could affect the outcome.

We also control for

* Age
* Gender
* Length of hospital stay prior to ICU admission
* Severity of illness (OASIS)
* Comorbidities (Elixhauser score)
* Number of boarders under the care of the MICU
* Year

```{r}

library(SemiParBIVProbit)

boarder_smooth.eq <- boarder ~ s(west_initial_remaining_beds) + s(icustay_admit_age) +
gender + s(OASIS) + s(elixhauser_28day) + s(west_initial_team_census) +
s(west_initial_outboarder_count) + s(los_days_prior_to_icu) +
factor(transfers.intime_year)

expire_smooth.eq <- icustay_expire_flag_mod ~ boarder + s(icustay_admit_age) +
gender + s(OASIS) + s(elixhauser_28day) + s(west_initial_team_census) +
s(west_initial_outboarder_count) + s(los_days_prior_to_icu) +
factor(transfers.intime_year)

bpN_smooth_ICU_smooth <- SemiParBIVProbit(list(
boarder_smooth.eq,
expire_smooth.eq),
data = dat_all)

RR(bpN_smooth_ICU_smooth, 'boarder', n.sim=2000)

```
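An instrument is only useful if it actually predicts treatment. One way to check this, sketched here on simulated data, is to fit a first-stage probit model and test the instrument with a likelihood-ratio test. The data-generating process, the sample size and the variable names `remaining_beds` and `team_census` are assumptions made for this illustration; the sketch uses base R `glm()` rather than the bivariate probit model fitted above.

```r
# Simulated data only: names echo the analysis, values are made up.
set.seed(1)
n <- 5000
remaining_beds <- rpois(n, lambda = 2)
team_census    <- rpois(n, lambda = 8)

# Assumed mechanism: fewer free beds means a higher chance of boarding
p_boarder <- pnorm(0.5 - 0.6 * remaining_beds + 0.02 * team_census)
boarder   <- rbinom(n, size = 1, prob = p_boarder)
sim <- data.frame(boarder, remaining_beds, team_census)

# First-stage probit: does the instrument predict treatment after adjustment?
first_stage <- glm(boarder ~ remaining_beds + team_census,
                   family = binomial(link = "probit"), data = sim)
reduced <- update(first_stage, . ~ . - remaining_beds)

# Likelihood-ratio test comparing models with and without the instrument
lr_test   <- anova(reduced, first_stage, test = "Chisq")
beds_coef <- coef(first_stage)["remaining_beds"]
lr_p      <- lr_test[2, "Pr(>Chi)"]
print(lr_test)
```

A clearly non-zero coefficient with a small p-value suggests the instrument is relevant. It says nothing about the exclusion restriction, which cannot be tested this way.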

We also use a Cox proportional hazards model to examine the effect of boarding on survival.

In this first Cox analysis we use *dat_all*, keeping in mind that there exists a strong selection bias in favor of boarders being less acutely ill.

```{r}

library(survival)

dat_all$start = 0
S <- Surv(
time = dat_all$start,
time2 = dat_all$days_survived,
event = dat_all$icustay_expire_flag_mod)

model <- coxph(S ~ boarder + poly(icustay_admit_age, 3) + gender + poly(OASIS, 3) +
poly(elixhauser_28day, 3) + west_initial_team_census +
west_initial_outboarder_count + los_days_prior_to_icu +
factor(transfers.intime_year),
data = dat_all )

exp(coef(model))
exp(confint(model))

```
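The two calls above print hazard ratios and confidence intervals separately. A small table that combines them can make it easier to see whether an interval excludes 1. The sketch below uses the `lung` dataset that ships with the `survival` package purely as stand-in data; the model formula is an assumption for illustration and is unrelated to the boarding analysis.

```r
library(survival)

# `lung` is bundled with survival; it is used here only as stand-in data.
fit <- coxph(Surv(time, status) ~ sex + age, data = lung)

ci <- confint(fit)  # 95% Wald intervals on the log-hazard scale
hr_table <- data.frame(
  term  = names(coef(fit)),
  HR    = exp(coef(fit)),
  lower = exp(ci[, 1]),
  upper = exp(ci[, 2]),
  row.names = NULL
)

# An interval that excludes 1 corresponds to p < 0.05 for that term (Wald test)
hr_table$excludes_1 <- hr_table$lower > 1 | hr_table$upper < 1
print(hr_table)
```

The same construction applies to `model` above: substitute it for `fit` and read off the row for `boarder`.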

In the above model, even *without* accounting in any way for the strong selection bias, there is a trend toward decreased survival among boarders.

Now, in order to minimize the selection bias present in our sample, we define a data subset *dat_2pop*. This includes only those rows where there were either *no MICU beds available* or *>2 MICU beds available* at the time a patient was admitted or transferred to the ICU (4293 rows excluded [50.9%]). These represent moments in time where there is no selection bias with respect to which patients will become boarders. In the first group (no MICU beds available), *any* new patient will have to become a boarder. In the second group (3 or more MICU beds available), there is no active or impending MICU capacity constraint that would promote boarding.

```{r}

dat_2pop <- subset(dat, ( ( ((west_initial_remaining_beds == 0) & (boarder == 1)) |
((west_initial_remaining_beds > 2) & (boarder == 0))
) &
( !is.na(elixhauser_28day) &
(los_days_prior_to_icu >= 0) &
(days_survived >= 0)
)
)
)

nrow(dat)
nrow(dat_2pop)

```

We now re-run the same Cox proportional hazards model using *dat_2pop*.

```{r}

dat_2pop$start = 0
S <- Surv(
time = dat_2pop$start,
time2 = dat_2pop$days_survived,
event = dat_2pop$icustay_expire_flag_mod)

model <- coxph(S ~ boarder + poly(icustay_admit_age, 3) + gender + poly(OASIS, 3) +
poly(elixhauser_28day, 3) + west_initial_team_census +
west_initial_outboarder_count + los_days_prior_to_icu +
factor(transfers.intime_year),
data = dat_2pop )


exp(coef(model))
exp(confint(model))

```

This demonstrates an even stronger effect of boarding on survival. Of note, the result also reaches statistical significance.
22 changes: 22 additions & 0 deletions section3/chapter04/README.md
@@ -0,0 +1,22 @@
# 3.4 Mortality prediction in the ICU based on MIMIC-II: Results from the Super ICU Learner Algorithm (SICULA) project

This directory contains the code used in the following publication:

Pirracchio R. **Mortality prediction in the ICU based on MIMIC-II: Results from the Super ICU Learner Algorithm (SICULA) project**, in *Secondary analysis of Electronic Health Record Data*, Springer, [Under Review]

Code is provided in R format.

## Summary of Publication

The MIMIC-II dataset offers a unique opportunity to develop and validate new severity scores. Non-parametric approaches are needed to model ICU mortality. Prediction of hospital mortality based on the Super Learner achieves significantly improved performance, both in terms of calibration and discrimination, as compared to conventional severity scores.
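Discrimination is commonly summarised by the area under the ROC curve (AUC). As a minimal base-R sketch, the AUC can be computed from ranks via the Mann-Whitney statistic. The function name `auc_rank` and the example values are assumptions for illustration; this is not the cross-validated AUC routine used by the chapter's code.

```r
# AUC as the probability that a random positive case scores higher than a
# random negative case, computed from midranks (ties count as one half).
auc_rank <- function(scores, labels) {
  stopifnot(length(scores) == length(labels), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # midranks handle tied scores
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Invented example: four predicted risks and their observed outcomes
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c(0, 0, 1, 1)
auc_rank(scores, labels)  # 0.75
```

An AUC of 0.5 corresponds to chance-level discrimination and 1 to perfect separation. Calibration has to be assessed separately.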

## Replicating this Publication

The work presented in this case study can be replicated as follows:

*insert*


***
Part of the wider **[Secondary Analysis of Electronic Health Records](https://github.com/MIT-LCP/critical-data-book)** book
***
51 changes: 51 additions & 0 deletions section3/chapter04/chapter04_code.R
@@ -0,0 +1,51 @@
library(SuperLearner)

# Candidate learners combined by the Super Learner
SL.library <- c("SL.glmnet", "SL.glm", "SL.stepAIC", "SL.nnet", "SL.polymars",
                "SL.randomForest", "SL.gam", "SL.ipredbagg", "SL.gbm",
                "SL.rpartPrune")

# RUN Super Learner only with predictors included in the SAPS = X
# `outcome` and `predictors` are placeholder names: supply your own data here
Y <- outcome     # outcome
X <- predictors  # set of predictors

# 10-fold cross-validated Super Learner
fitSL <- CV.SuperLearner(Y = Y, X = X, V = 10, family = binomial(),
                         SL.library = SL.library, method = "method.NNLS",
                         id = NULL, verbose = FALSE,
                         cvControl = list(stratifyCV = TRUE, shuffle = TRUE, V = 10))

predictSL <- fitSL$SL.predict
predictions <- cbind(fitSL$SL.predict, fitSL$library.predict)
labels <- fitSL$Y
folds <- fitSL$folds

# CV risk estimation for each candidate and SL
pdf(file = "FIT.pdf")
plot(fitSL, package = "ggplot2", constant = qnorm(0.975), sort = TRUE)
dev.off()

# AUC_IC() is not defined in this file. It is assumed to return, for column i
# of `predictions`, a list with elements `cvAUC` and `ci`.
result_AUC <- as.data.frame(matrix(data = NA, ncol = 3, nrow = length(SL.library) + 1))
for (i in 1:(length(SL.library) + 1)) {
  auc_i <- AUC_IC(i)
  result_AUC[i, ] <- c(auc_i$cvAUC, auc_i$ci[1], auc_i$ci[2])
}

colnames(result_AUC) <- c("AUC", "L-95%CI", "U-95%CI")
rownames(result_AUC) <- c("Super Learner", SL.library)

save(result_AUC, file = 'resutlAUC.RData')
22 changes: 22 additions & 0 deletions section3/chapter05/README.md
@@ -0,0 +1,22 @@
# 3.5 Mortality Prediction in the ICU

This directory contains the code used in the following publication:

Lee J. *et al.* **Mortality Prediction in the ICU**, in *Secondary analysis of Electronic Health Record Data*, Springer, [Under Review]

Code is provided in SQL and R format.

## Summary of Publication

This case study describes how to construct mortality prediction models using typical clinical data available in MIMIC-II. Several predictive models are utilized and compared.

## Replicating this Publication

The work presented in this case study can be replicated as follows:

*insert*


***
Part of the wider **[Secondary Analysis of Electronic Health Records](https://github.com/MIT-LCP/critical-data-book)** book
***