# Multiomics BMI Paper — Gut Microbiome-based Obesity Classifier in the Arivale Cohort: DeLong's Test

***by Kengo Watanabe***  

This Jupyter Notebook (R kernel) performs DeLong's test on the gut microbiome-based obesity classifiers in the Arivale cohort. It is a sub-notebook of the main Python notebook.  

Input files:  
* Arivale classifier predictions: 221010_Multiomics-BMI-NatMed1stRevision_Microbiome-RFclassifier-ver5_Arivale-wenceslaus_\[BMI/MetBMI\]class-BothSex.tsv  

Output figures and tables:  
* Intermediate tables for the main notebook (ROC curve, test result)  

Original notebook (memo for my future tracing):  
* wenceslaus:\[JupyterLab HOME\]/220621_Multiomics-BMI-NatMedRevision/221010_Multiomics-BMI-NatMed1stRevision_Microbiome-RFclassifier-DeLong-ver5_Arivale-wenceslaus.ipynb  

In [1]:
library("tidyverse")
options(repr.plot.width=5, repr.plot.height=5)#Default=7x7

#CRAN
for (package in c("pROC")) {
    #install.packages(package)
    eval(bquote(library(.(package))))
    print(str_c(package, ": ", as.character(packageVersion(package))))
}

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.5     ✔ dplyr   1.0.7
✔ tidyr   1.1.4     ✔ stringr 1.4.0
✔ readr   2.0.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Type 'citation("pROC")' for a citation.


Attaching package: ‘pROC’


The following objects are masked from ‘package:stats’:

    cov, smooth, var




[1] "pROC: 1.18.0"


## 1. Prepare classifier predictions

In [None]:
#Import classifier predictions (BMI class)
fileDir <- "./ExportData/"
ipynbName <- "221010_Multiomics-BMI-NatMed1stRevision_Microbiome-RFclassifier-ver5_Arivale-wenceslaus_"
fileName <- "BMIclass-BothSex.tsv"
temp <- read_delim(str_c(fileDir,ipynbName,fileName), delim="\t")
print(str_c("nrow: ",nrow(temp)))
head(temp)

predict_meas <- temp

In [None]:
#Import classifier predictions (MetBMI class)
fileDir <- "./ExportData/"
ipynbName <- "221010_Multiomics-BMI-NatMed1stRevision_Microbiome-RFclassifier-ver5_Arivale-wenceslaus_"
fileName <- "MetBMIclass-BothSex.tsv"
temp <- read_delim(str_c(fileDir,ipynbName,fileName), delim="\t")
print(str_c("nrow: ",nrow(temp)))
head(temp)

predict_met <- temp

## 2. Generate ROC object

In [None]:
#Generate ROC object with 95% CI for sensitivity
roc_meas <- roc(predict_meas$BMI_class_code, predict_meas$`BMI_class_predicted-probability`,
                ci=TRUE, of="se", conf.level=0.95, boot.n=10000, boot.stratified=TRUE,
                specificities=seq(0, 1, 0.01))
roc_meas

In [None]:
#Generate ROC object with 95% CI for sensitivity
roc_met <- roc(predict_met$MetBMI_class_code, predict_met$`MetBMI_class_predicted-probability`,
               ci=TRUE, of="se", conf.level=0.95, boot.n=10000, boot.stratified=TRUE,
               specificities=seq(0, 1, 0.01))
roc_met

In [None]:
#Visualization with 95% CI for sensitivity
plot.roc(roc_meas, legacy.axes=TRUE, col="black", print.auc=TRUE, print.auc.y=0.4,
         ci=TRUE, ci.type="shape", ci.col=rgb(red=0, green=0, blue=0, alpha=0.2),
         identity=TRUE, identity.col="red")
plot.roc(roc_met, add=TRUE, legacy.axes=TRUE, col="blue", print.auc=TRUE, print.auc.y=0.2,
         ci=TRUE, ci.type="shape", ci.col=rgb(red=0, green=0, blue=1, alpha=0.2),
         identity=TRUE, identity.col="red")

> –> This plot is for reference only. The 95% CI band for sensitivity should be removed from the final figure, because DeLong's test compares AUCs, not sensitivities at a given threshold.  
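
To make the distinction concrete: the AUC is a single threshold-free number, equal to the probability that a randomly chosen case receives a higher score than a randomly chosen control (ties counted as one half). The base-R sketch below computes it on hypothetical toy scores. `auc_pairwise` and the toy vectors are illustrative names, not part of this notebook or of pROC.

```r
# Hypothetical toy scores (NOT the Arivale predictions)
controls <- c(0.10, 0.35, 0.40, 0.60)
cases    <- c(0.30, 0.55, 0.70, 0.90)

# AUC as the mean over all case-control pairs of:
#   1 if case > control, 0.5 if tied, 0 otherwise
auc_pairwise <- function(cases, controls) {
    cmp <- outer(cases, controls, FUN = function(x, y) (x > y) + 0.5 * (x == y))
    mean(cmp)
}

auc_toy <- auc_pairwise(cases, controls)
print(auc_toy)
```

Because every threshold contributes to this average, a CI band for sensitivity at individual specificities describes a different quantity than the one the test compares.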

In [None]:
#Visualization
plot.roc(roc_meas, legacy.axes=TRUE, col="black", print.auc=TRUE, print.auc.y=0.4,
         ci=FALSE, identity=TRUE, identity.col="red")
plot.roc(roc_met, add=TRUE, legacy.axes=TRUE, col="blue", print.auc=TRUE, print.auc.y=0.2,
         ci=FALSE, identity=TRUE, identity.col="red")

> –> The final, polished figure will be generated in the main Python notebook.  

## 3. 95% CI of AUC

In [None]:
print("BMI")
print("AUC 95% CI (DeLong):")
ci.auc(roc_meas, conf.level=0.95, method="delong")
print("AUC 95% CI (10000 bootstrap):")
ci.auc(roc_meas, conf.level=0.95, method="bootstrap", boot.n=10000, boot.stratified=TRUE)

print("MetBMI")
print("AUC 95% CI (DeLong):")
ci.auc(roc_met, conf.level=0.95, method="delong")
print("AUC 95% CI (10000 bootstrap):")
ci.auc(roc_met, conf.level=0.95, method="bootstrap", boot.n=10000, boot.stratified=TRUE)
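
For intuition about the bootstrap intervals requested above, the following base-R sketch shows the general idea of a stratified bootstrap with a percentile interval: cases and controls are resampled separately (so group sizes stay fixed), the AUC is recomputed each time, and the 2.5% and 97.5% quantiles are taken. This is a sketch under assumptions, not pROC's internal code. The data are simulated, and `auc_pairwise`, `n_boot`, and `ci_boot` are hypothetical names.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical predicted probabilities (NOT the Arivale predictions)
controls <- runif(60, min = 0.0, max = 0.7)
cases    <- runif(40, min = 0.3, max = 1.0)

# AUC as the mean pairwise comparison (ties count one half)
auc_pairwise <- function(cases, controls) {
    mean(outer(cases, controls, FUN = function(x, y) (x > y) + 0.5 * (x == y)))
}

n_boot <- 2000  # smaller than the 10000 used in the cell above, to keep this fast

# Stratified resampling: draw cases and controls separately, with replacement
boot_auc <- replicate(n_boot, {
    auc_pairwise(sample(cases, replace = TRUE), sample(controls, replace = TRUE))
})

auc_hat <- auc_pairwise(cases, controls)
ci_boot <- quantile(boot_auc, probs = c(0.025, 0.975), names = FALSE)
print(c(lower = ci_boot[1], estimate = auc_hat, upper = ci_boot[2]))
```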

## 4. DeLong's test

In [None]:
#DeLong's test
test_delong <- roc.test(roc_meas, roc_met, method="delong",
                        alternative="two.sided", paired=FALSE, conf.level=0.95, conf.int=TRUE)
test_delong
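
As background for the call above, here is a minimal base-R sketch of the unpaired DeLong comparison. Each AUC's variance is estimated from "placement values" (for each case, the fraction of controls it outranks, and vice versa), and because the two ROC curves come from independent samples, the variances simply add. The sketch uses a normal approximation for the p-value, which is an assumption; pROC's own implementation may differ in details such as the reference distribution. All function names and toy scores are hypothetical.

```r
# AUC and its DeLong variance estimate from raw scores
delong_auc_var <- function(cases, controls) {
    # psi[i, j] = 1 if case i > control j, 0.5 if tied, 0 otherwise
    psi <- outer(cases, controls, FUN = function(x, y) (x > y) + 0.5 * (x == y))
    v10 <- rowMeans(psi)  # placement value of each case
    v01 <- colMeans(psi)  # placement value of each control
    list(auc = mean(psi),
         var = var(v10) / length(cases) + var(v01) / length(controls))
}

# Unpaired comparison: independent samples, so the variances add
delong_unpaired_z <- function(cases1, controls1, cases2, controls2) {
    r1 <- delong_auc_var(cases1, controls1)
    r2 <- delong_auc_var(cases2, controls2)
    z <- (r1$auc - r2$auc) / sqrt(r1$var + r2$var)
    # Two-sided p-value under a normal approximation (assumption of this sketch)
    list(auc1 = r1$auc, auc2 = r2$auc, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Hypothetical toy scores (NOT the Arivale predictions)
cases_a    <- c(0.30, 0.55, 0.70, 0.90)
controls_a <- c(0.10, 0.35, 0.40, 0.60)
cases_b    <- c(0.20, 0.45, 0.50, 0.80)
controls_b <- c(0.25, 0.30, 0.65, 0.75)

res <- delong_unpaired_z(cases_a, controls_a, cases_b, controls_b)
print(res)
```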

In [None]:
#Cf. Bootstrap test
test_boot <- roc.test(roc_meas, roc_met, method="bootstrap",
                      alternative="two.sided", paired=FALSE, conf.level=0.95, conf.int=TRUE,
                      boot.n=10000, boot.stratified=TRUE)
test_boot

## 5. Save result summary

In [None]:
#Check result object
summary(test_delong)
print("")
str(test_delong)

> –> Note that, even with conf.int=TRUE, the pROC package (version 1.18.0 here) does not currently report a confidence interval for the unpaired test.  

In [None]:
#Summarize the test result as a table
temp <- tibble(Variable="AUC (ROC)",
               nCtrls_BMI=length(test_delong$roc1$controls),
               nCases_BMI=length(test_delong$roc1$cases),
               Estimate_BMI=unname(test_delong$estimate[1]),
               nCtrls_MetBMI=length(test_delong$roc2$controls),
               nCases_MetBMI=length(test_delong$roc2$cases),
               Estimate_MetBMI=unname(test_delong$estimate[2]),
               DoF=unname(test_delong$parameter),
               Zstatistic=unname(test_delong$statistic),
               Pval=test_delong$p.value)
temp

#Save
fileDir <- "./ExportData/"
ipynbName <- "221010_Multiomics-BMI-NatMed1stRevision_Microbiome-RFclassifier-DeLong-ver5_Arivale-wenceslaus_"
fileName <- "result-summary.tsv"
temp %>% write_tsv(str_c(fileDir,ipynbName,fileName))

In [None]:
#Summarize ROC curve as a table (BMI class)
temp <- tibble(Sensitivity=test_delong$roc1$sensitivities,
               Specificity=test_delong$roc1$specificities,
               Thresholds=test_delong$roc1$thresholds)
print(str_c("nrow: ",nrow(temp)))
head(temp)
tail(temp)
#Save
fileDir <- "./ExportData/"
ipynbName <- "221010_Multiomics-BMI-NatMed1stRevision_Microbiome-RFclassifier-DeLong-ver5_Arivale-wenceslaus_"
fileName <- "BMI-ROC-curve.tsv"
temp %>% write_tsv(str_c(fileDir,ipynbName,fileName))

In [None]:
#Summarize ROC curve as a table (MetBMI class)
temp <- tibble(Sensitivity=test_delong$roc2$sensitivities,
               Specificity=test_delong$roc2$specificities,
               Thresholds=test_delong$roc2$thresholds)
print(str_c("nrow: ",nrow(temp)))
head(temp)
tail(temp)
#Save
fileDir <- "./ExportData/"
ipynbName <- "221010_Multiomics-BMI-NatMed1stRevision_Microbiome-RFclassifier-DeLong-ver5_Arivale-wenceslaus_"
fileName <- "MetBMI-ROC-curve.tsv"
temp %>% write_tsv(str_c(fileDir,ipynbName,fileName))
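
One way to sanity-check an exported ROC table before plotting it elsewhere is to integrate it with the trapezoidal rule and compare the result with the AUC reported by the test. The sketch below does this in base R on a small hypothetical table that reuses the `Sensitivity` and `Specificity` column names from the cells above. The values and the names `roc_tbl` and `auc_trap` are illustrative only.

```r
# Hypothetical ROC table (NOT the exported Arivale table)
roc_tbl <- data.frame(
    Sensitivity = c(1.00, 1.00, 0.75, 0.75, 0.75, 0.50, 0.50, 0.25, 0.00),
    Specificity = c(0.00, 0.25, 0.25, 0.50, 0.75, 0.75, 1.00, 1.00, 1.00)
)

# ROC space: x = false positive rate (1 - specificity), y = sensitivity
fpr <- 1 - roc_tbl$Specificity
tpr <- roc_tbl$Sensitivity

# Sort points from (0, 0) to (1, 1) so that consecutive points form trapezoids
ord <- order(fpr, tpr)
fpr <- fpr[ord]
tpr <- tpr[ord]

# Trapezoidal rule: interval width times the mean height of its two endpoints
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc_trap)
```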

# — Move back to the main Python notebook —  

# — Session information —

In [15]:
sessionInfo()

R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/envs/arivale-r/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pROC_1.18.0     forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7    
 [5] purrr_0.3.4     readr_2.0.2     tidyr_1.1.4     tibble_3.1.5   
 [9] ggplot2_3.3.5   tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7       lubridate_1.8.0  assertthat_0.2.1 digest_0.6.28   
 [5] utf8_1.2.2       