# Estimate Specificity and Sensitivity
- **Project:** Multi-ancestry PRS
- **Version:** Python/3.9
- **Status:** COMPLETE
- **Last Updated:** 2-MAY-2024

## Notebook Overview
- Specificity and Sensitivity 

In [1]:
## Load packages
module load python
module load R

[+] Loading python 3.10  ... 
[+] Loading gcc  11.3.0  ... 
[+] Loading HDF5  1.12.2 
[+] Loading netcdf  4.9.0 
[-] Unloading gcc  11.3.0  ... 
[+] Loading gcc  11.3.0  ... 
[+] Loading openmpi/4.1.3/gcc-11.3.0  ... 
[+] Loading pandoc  2.18  on cn4271 
[+] Loading pcre2  10.40 
[+] Loading R 4.3.2 


In [2]:
###################################### AAC ###################################### 
cd ${WORK_DIR}/imputed_data/AAC/

In [2]:
### RISK
library(data.table)
setwd("./imputed_data/AAC/")
library(caret)
library(pROC)
dat <- read.table("PRS_score_release_AFRICANS.profile", header = T) 
dat$CASE <- dat$PHENO - 1
dat <- subset(dat, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("AFRICANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
dat <- read.table("PRS_score_release_EUROPEAN.profile", header = T) 
dat$CASE <- dat$PHENO - 1
dat <- subset(dat, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EUROPEANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
dat <- read.table("PRS_score_release_LATINO.profile", header = T) 
dat$CASE <- dat$PHENO - 1
dat <- subset(dat, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("LATINO")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
dat <- read.table("PRS_score_release_EASTASIANS.profile", header = T) 
dat$CASE <- dat$PHENO - 1
dat <- subset(dat, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EASTASIANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc


“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "AFRICANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     808     257
   DISEASE       0       0
                                          
               Accuracy : 0.7587          
                 95% CI : (0.7318, 0.7841)
    No Information Rate : 0.7587          
    P-Value [Acc > NIR] : 0.5167          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.0000          
            Specificity : 1.0000          
         Pos Pred Value :    NaN          
         Neg Pred Value : 0.7587          
             Prevalence : 0.2413          
         Detection Rate : 0.0000          
   Detection Prevalence : 0.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 808 controls (dat$reported CONTROL) < 257 cases (dat$reported DISEASE).
Area under the curve: 0.5901

[1] 
[1] "EUROPEANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     804     254
   DISEASE       4       3
                                          
               Accuracy : 0.7577          
                 95% CI : (0.7309, 0.7832)
    No Information Rate : 0.7587          
    P-Value [Acc > NIR] : 0.5452          
                                          
                  Kappa : 0.0101          
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.011673        
            Specificity : 0.995050        
         Pos Pred Value : 0.428571        
         Neg Pred Value : 0.759924        
             Prevalence : 0.241315        
         Detection Rate : 0.002817        
   Detection Prevalence : 0.006573        
      Balanced Accuracy : 0.503361        
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 808 controls (dat$reported CONTROL) < 257 cases (dat$reported DISEASE).
Area under the curve: 0.6309

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "LATINO"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     808     257
   DISEASE       0       0
                                          
               Accuracy : 0.7587          
                 95% CI : (0.7318, 0.7841)
    No Information Rate : 0.7587          
    P-Value [Acc > NIR] : 0.5167          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.0000          
            Specificity : 1.0000          
         Pos Pred Value :    NaN          
         Neg Pred Value : 0.7587          
             Prevalence : 0.2413          
         Detection Rate : 0.0000          
   Detection Prevalence : 0.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 808 controls (dat$reported CONTROL) < 257 cases (dat$reported DISEASE).
Area under the curve: 0.5587

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "EASTASIANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     808     257
   DISEASE       0       0
                                          
               Accuracy : 0.7587          
                 95% CI : (0.7318, 0.7841)
    No Information Rate : 0.7587          
    P-Value [Acc > NIR] : 0.5167          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.0000          
            Specificity : 1.0000          
         Pos Pred Value :    NaN          
         Neg Pred Value : 0.7587          
             Prevalence : 0.2413          
         Detection Rate : 0.0000          
   Detection Prevalence : 0.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 808 controls (dat$reported CONTROL) < 257 cases (dat$reported DISEASE).
Area under the curve: 0.6163

In [1]:
###################################### AFR ###################################### 
cd ${WORK_DIR}/imputed_data/AFR/

In [1]:
### RISK
library(data.table)
setwd("./imputed_data/AFR/")
library(caret)
library(pROC)
data <- read.table("PRS_score_release_AFRICANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("AFRICANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EUROPEAN.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EUROPEANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_LATINO.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("LATINO")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EASTASIANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EASTASIANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

Loading required package: ggplot2

Loading required package: lattice

Type 'citation("pROC")' for a citation.


Attaching package: ‘pROC’


The following objects are masked from ‘package:stats’:

    cov, smooth, var


“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "AFRICANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    1683     920
   DISEASE       0       0
                                          
               Accuracy : 0.6466          
                 95% CI : (0.6278, 0.6649)
    No Information Rate : 0.6466          
    P-Value [Acc > NIR] : 0.509           
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.0000          
            Specificity : 1.0000          
         Pos Pred Value :    NaN          
         Neg Pred Value : 0.6466          
             Prevalence : 0.3534          
         Detection Rate : 0.0000          
   Detection Prevalence : 0.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 1683 controls (dat$reported CONTROL) < 920 cases (dat$reported DISEASE).
Area under the curve: 0.5419

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "EUROPEANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    1683     920
   DISEASE       0       0
                                          
               Accuracy : 0.6466          
                 95% CI : (0.6278, 0.6649)
    No Information Rate : 0.6466          
    P-Value [Acc > NIR] : 0.509           
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.0000          
            Specificity : 1.0000          
         Pos Pred Value :    NaN          
         Neg Pred Value : 0.6466          
             Prevalence : 0.3534          
         Detection Rate : 0.0000          
   Detection Prevalence : 0.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 1683 controls (dat$reported CONTROL) < 920 cases (dat$reported DISEASE).
Area under the curve: 0.5375

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "LATINO"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    1683     920
   DISEASE       0       0
                                          
               Accuracy : 0.6466          
                 95% CI : (0.6278, 0.6649)
    No Information Rate : 0.6466          
    P-Value [Acc > NIR] : 0.509           
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.0000          
            Specificity : 1.0000          
         Pos Pred Value :    NaN          
         Neg Pred Value : 0.6466          
             Prevalence : 0.3534          
         Detection Rate : 0.0000          
   Detection Prevalence : 0.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 1683 controls (dat$reported CONTROL) < 920 cases (dat$reported DISEASE).
Area under the curve: 0.5127

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "EASTASIANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    1683     920
   DISEASE       0       0
                                          
               Accuracy : 0.6466          
                 95% CI : (0.6278, 0.6649)
    No Information Rate : 0.6466          
    P-Value [Acc > NIR] : 0.509           
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.0000          
            Specificity : 1.0000          
         Pos Pred Value :    NaN          
         Neg Pred Value : 0.6466          
             Prevalence : 0.3534          
         Detection Rate : 0.0000          
   Detection Prevalence : 0.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 1683 controls (dat$reported CONTROL) < 920 cases (dat$reported DISEASE).
Area under the curve: 0.5318

In [1]:
###################################### AJ ###################################### 
cd ${WORK_DIR}/imputed_data/AJ/

In [2]:
### RISK
library(data.table)
setwd("./imputed_data/AJ/")
library(caret)
library(pROC)
data <- read.table("PRS_score_release_AFRICANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("AFRICANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EUROPEAN.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EUROPEANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_LATINO.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("LATINO")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EASTASIANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EASTASIANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "AFRICANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL       0       0
   DISEASE     459    1011
                                          
               Accuracy : 0.6878          
                 95% CI : (0.6634, 0.7114)
    No Information Rate : 0.6878          
    P-Value [Acc > NIR] : 0.5126          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.6878          
         Neg Pred Value :    NaN          
             Prevalence : 0.6878          
         Detection Rate : 0.6878          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 459 controls (dat$reported CONTROL) < 1011 cases (dat$reported DISEASE).
Area under the curve: 0.5351

[1] 
[1] "EUROPEANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL      67      56
   DISEASE     392     955
                                         
               Accuracy : 0.6952         
                 95% CI : (0.671, 0.7187)
    No Information Rate : 0.6878         
    P-Value [Acc > NIR] : 0.278          
                                         
                  Kappa : 0.1132         
                                         
 Mcnemar's Test P-Value : <2e-16         
                                         
            Sensitivity : 0.9446         
            Specificity : 0.1460         
         Pos Pred Value : 0.7090         
         Neg Pred Value : 0.5447         
             Prevalence : 0.6878         
         Detection Rate : 0.6497         
   Detection Prevalence : 0.9163         
      Balanced Accuracy : 0.5453         
                                         
       'Positive' Class : DISEASE        
                          

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 459 controls (dat$reported CONTROL) < 1011 cases (dat$reported DISEASE).
Area under the curve: 0.6796

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "LATINO"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL       0       0
   DISEASE     459    1011
                                          
               Accuracy : 0.6878          
                 95% CI : (0.6634, 0.7114)
    No Information Rate : 0.6878          
    P-Value [Acc > NIR] : 0.5126          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.6878          
         Neg Pred Value :    NaN          
             Prevalence : 0.6878          
         Detection Rate : 0.6878          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 459 controls (dat$reported CONTROL) < 1011 cases (dat$reported DISEASE).
Area under the curve: 0.5567

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "EASTASIANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL       0       0
   DISEASE     459    1011
                                          
               Accuracy : 0.6878          
                 95% CI : (0.6634, 0.7114)
    No Information Rate : 0.6878          
    P-Value [Acc > NIR] : 0.5126          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.6878          
         Neg Pred Value :    NaN          
             Prevalence : 0.6878          
         Detection Rate : 0.6878          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 459 controls (dat$reported CONTROL) < 1011 cases (dat$reported DISEASE).
Area under the curve: 0.5745

In [1]:
###################################### AMR ###################################### 
cd ${WORK_DIR}/imputed_data/AMR/

In [1]:
### RISK
library(data.table)
setwd("./imputed_data/AMR/")
library(caret)
library(pROC)
data <- read.table("PRS_score_release_AFRICANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("AFRICANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EUROPEAN.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EUROPEANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_LATINO.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("LATINO")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EASTASIANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EASTASIANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

Loading required package: ggplot2

Loading required package: lattice

Type 'citation("pROC")' for a citation.


Attaching package: ‘pROC’


The following objects are masked from ‘package:stats’:

    cov, smooth, var


“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "AFRICANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL       0       0
   DISEASE     139     367
                                          
               Accuracy : 0.7253          
                 95% CI : (0.6842, 0.7638)
    No Information Rate : 0.7253          
    P-Value [Acc > NIR] : 0.5228          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.7253          
         Neg Pred Value :    NaN          
             Prevalence : 0.7253          
         Detection Rate : 0.7253          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 139 controls (dat$reported CONTROL) < 367 cases (dat$reported DISEASE).
Area under the curve: 0.5709

[1] 
[1] "EUROPEANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL       3       3
   DISEASE     136     364
                                          
               Accuracy : 0.7253          
                 95% CI : (0.6842, 0.7638)
    No Information Rate : 0.7253          
    P-Value [Acc > NIR] : 0.5228          
                                          
                  Kappa : 0.0191          
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.99183         
            Specificity : 0.02158         
         Pos Pred Value : 0.72800         
         Neg Pred Value : 0.50000         
             Prevalence : 0.72530         
         Detection Rate : 0.71937         
   Detection Prevalence : 0.98814         
      Balanced Accuracy : 0.50670         
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 139 controls (dat$reported CONTROL) < 367 cases (dat$reported DISEASE).
Area under the curve: 0.621

“Levels are not in the same order for reference and data. Refactoring data to match.”


[1] 
[1] "LATINO"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL       0       0
   DISEASE     139     367
                                          
               Accuracy : 0.7253          
                 95% CI : (0.6842, 0.7638)
    No Information Rate : 0.7253          
    P-Value [Acc > NIR] : 0.5228          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.7253          
         Neg Pred Value :    NaN          
             Prevalence : 0.7253          
         Detection Rate : 0.7253          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 139 controls (dat$reported CONTROL) < 367 cases (dat$reported DISEASE).
Area under the curve: 0.5673

[1] 
[1] "EASTASIANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL       1       1
   DISEASE     138     366
                                          
               Accuracy : 0.7253          
                 95% CI : (0.6842, 0.7638)
    No Information Rate : 0.7253          
    P-Value [Acc > NIR] : 0.5228          
                                          
                  Kappa : 0.0064          
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.997275        
            Specificity : 0.007194        
         Pos Pred Value : 0.726190        
         Neg Pred Value : 0.500000        
             Prevalence : 0.725296        
         Detection Rate : 0.723320        
   Detection Prevalence : 0.996047        
      Balanced Accuracy : 0.502235        
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 139 controls (dat$reported CONTROL) < 367 cases (dat$reported DISEASE).
Area under the curve: 0.6116

In [1]:
###################################### CAS ###################################### 
cd ${WORK_DIR}/imputed_data/CAS/

In [1]:
### RISK
library(data.table)
setwd("./imputed_data/CAS/")
library(caret)
library(pROC)
data <- read.table("PRS_score_release_AFRICANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("AFRICANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EUROPEAN.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EUROPEANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_LATINO.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("LATINO")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EASTASIANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EASTASIANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

Loading required package: ggplot2

Loading required package: lattice

Type 'citation("pROC")' for a citation.


Attaching package: ‘pROC’


The following objects are masked from ‘package:stats’:

    cov, smooth, var




[1] 
[1] "AFRICANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     175     153
   DISEASE     119     133
                                          
               Accuracy : 0.531           
                 95% CI : (0.4895, 0.5723)
    No Information Rate : 0.5069          
    P-Value [Acc > NIR] : 0.1311          
                                          
                  Kappa : 0.0604          
                                          
 Mcnemar's Test P-Value : 0.0454          
                                          
            Sensitivity : 0.4650          
            Specificity : 0.5952          
         Pos Pred Value : 0.5278          
         Neg Pred Value : 0.5335          
             Prevalence : 0.4931          
         Detection Rate : 0.2293          
   Detection Prevalence : 0.4345          
      Balanced Accuracy : 0.5301          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 294 controls (dat$reported CONTROL) < 286 cases (dat$reported DISEASE).
Area under the curve: 0.5493

[1] 
[1] "EUROPEANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     168     146
   DISEASE     126     140
                                          
               Accuracy : 0.531           
                 95% CI : (0.4895, 0.5723)
    No Information Rate : 0.5069          
    P-Value [Acc > NIR] : 0.1311          
                                          
                  Kappa : 0.061           
                                          
 Mcnemar's Test P-Value : 0.2493          
                                          
            Sensitivity : 0.4895          
            Specificity : 0.5714          
         Pos Pred Value : 0.5263          
         Neg Pred Value : 0.5350          
             Prevalence : 0.4931          
         Detection Rate : 0.2414          
   Detection Prevalence : 0.4586          
      Balanced Accuracy : 0.5305          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 294 controls (dat$reported CONTROL) < 286 cases (dat$reported DISEASE).
Area under the curve: 0.5634

[1] 
[1] "LATINO"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     173     134
   DISEASE     121     152
                                          
               Accuracy : 0.5603          
                 95% CI : (0.5189, 0.6012)
    No Information Rate : 0.5069          
    P-Value [Acc > NIR] : 0.005604        
                                          
                  Kappa : 0.12            
                                          
 Mcnemar's Test P-Value : 0.452370        
                                          
            Sensitivity : 0.5315          
            Specificity : 0.5884          
         Pos Pred Value : 0.5568          
         Neg Pred Value : 0.5635          
             Prevalence : 0.4931          
         Detection Rate : 0.2621          
   Detection Prevalence : 0.4707          
      Balanced Accuracy : 0.5600          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 294 controls (dat$reported CONTROL) < 286 cases (dat$reported DISEASE).
Area under the curve: 0.5829

[1] 
[1] "EASTASIANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     169     141
   DISEASE     125     145
                                          
               Accuracy : 0.5414          
                 95% CI : (0.4998, 0.5825)
    No Information Rate : 0.5069          
    P-Value [Acc > NIR] : 0.05259         
                                          
                  Kappa : 0.0819          
                                          
 Mcnemar's Test P-Value : 0.35772         
                                          
            Sensitivity : 0.5070          
            Specificity : 0.5748          
         Pos Pred Value : 0.5370          
         Neg Pred Value : 0.5452          
             Prevalence : 0.4931          
         Detection Rate : 0.2500          
   Detection Prevalence : 0.4655          
      Balanced Accuracy : 0.5409          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 294 controls (dat$reported CONTROL) < 286 cases (dat$reported DISEASE).
Area under the curve: 0.563

In [1]:
###################################### EAS ###################################### 
cd ${WORK_DIR}/imputed_data/EAS/

In [1]:
### RISK
library(data.table)
setwd("./imputed_data/EAS/")
library(caret)
library(pROC)
data <- read.table("PRS_score_release_AFRICANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("AFRICANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EUROPEAN.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EUROPEANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_LATINO.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("LATINO")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EASTASIANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EASTASIANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

Loading required package: ggplot2

Loading required package: lattice

Type 'citation("pROC")' for a citation.


Attaching package: ‘pROC’


The following objects are masked from ‘package:stats’:

    cov, smooth, var




[1] 
[1] "AFRICANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    2273    1460
   DISEASE     105     117
                                          
               Accuracy : 0.6043          
                 95% CI : (0.5889, 0.6196)
    No Information Rate : 0.6013          
    P-Value [Acc > NIR] : 0.3548          
                                          
                  Kappa : 0.0351          
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.07419         
            Specificity : 0.95585         
         Pos Pred Value : 0.52703         
         Neg Pred Value : 0.60889         
             Prevalence : 0.39874         
         Detection Rate : 0.02958         
   Detection Prevalence : 0.05613         
      Balanced Accuracy : 0.51502         
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 2378 controls (dat$reported CONTROL) < 1577 cases (dat$reported DISEASE).
Area under the curve: 0.5724

[1] 
[1] "EUROPEANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    2063    1182
   DISEASE     315     395
                                          
               Accuracy : 0.6215          
                 95% CI : (0.6062, 0.6366)
    No Information Rate : 0.6013          
    P-Value [Acc > NIR] : 0.004822        
                                          
                  Kappa : 0.1301          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.25048         
            Specificity : 0.86754         
         Pos Pred Value : 0.55634         
         Neg Pred Value : 0.63575         
             Prevalence : 0.39874         
         Detection Rate : 0.09987         
   Detection Prevalence : 0.17952         
      Balanced Accuracy : 0.55901         
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 2378 controls (dat$reported CONTROL) < 1577 cases (dat$reported DISEASE).
Area under the curve: 0.6231

[1] 
[1] "LATINO"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    2261    1463
   DISEASE     117     114
                                         
               Accuracy : 0.6005         
                 95% CI : (0.585, 0.6158)
    No Information Rate : 0.6013         
    P-Value [Acc > NIR] : 0.5457         
                                         
                  Kappa : 0.027          
                                         
 Mcnemar's Test P-Value : <2e-16         
                                         
            Sensitivity : 0.07229        
            Specificity : 0.95080        
         Pos Pred Value : 0.49351        
         Neg Pred Value : 0.60714        
             Prevalence : 0.39874        
         Detection Rate : 0.02882        
   Detection Prevalence : 0.05841        
      Balanced Accuracy : 0.51154        
                                         
       'Positive' Class : DISEASE        
                          

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 2378 controls (dat$reported CONTROL) < 1577 cases (dat$reported DISEASE).
Area under the curve: 0.575

[1] 
[1] "EASTASIANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    2330    1543
   DISEASE      48      34
                                          
               Accuracy : 0.5977          
                 95% CI : (0.5822, 0.6131)
    No Information Rate : 0.6013          
    P-Value [Acc > NIR] : 0.6814          
                                          
                  Kappa : 0.0016          
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.021560        
            Specificity : 0.979815        
         Pos Pred Value : 0.414634        
         Neg Pred Value : 0.601601        
             Prevalence : 0.398736        
         Detection Rate : 0.008597        
   Detection Prevalence : 0.020733        
      Balanced Accuracy : 0.500687        
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 2378 controls (dat$reported CONTROL) < 1577 cases (dat$reported DISEASE).
Area under the curve: 0.5579

In [1]:
###################################### EUR ###################################### 
cd ${WORK_DIR}/imputed_data/EUR/

In [1]:
### RISK
library(data.table)
setwd("./imputed_data/EUR/")
library(caret)
library(pROC)
data <- read.table("PRS_score_release_AFRICANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("AFRICANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EUROPEAN.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EUROPEANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_LATINO.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("LATINO")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

library(data.table)
library(caret)
data <- read.table("PRS_score_release_EASTASIANS.profile", header = T) 
data$CASE <- data$PHENO - 1
dat <- subset(data, CASE != -10)
meanControls <- mean(dat$SCORE[dat$CASE == 0])
sdControls <- sd(dat$SCORE[dat$CASE == 0])
dat$zSCORE <- (dat$SCORE - meanControls)/sdControls
grsTests <- glm(CASE ~ zSCORE, family="binomial", data = dat)
dat$probDisease <- predict(grsTests, dat, type = c("response"))
dat$predicted <- ifelse(dat$probDisease > 0.5, "DISEASE", "CONTROL")
dat$reported <- ifelse(dat$CASE == 1, "DISEASE","CONTROL")
confMat <- confusionMatrix(data = as.factor(dat$predicted), reference = as.factor(dat$reported), positive = "DISEASE")
print(noquote(""))
print("EASTASIANS")
confMat
roc <- roc(response = dat$reported, predictor = dat$probDisease)
roc

Loading required package: ggplot2

Loading required package: lattice

Type 'citation("pROC")' for a citation.


Attaching package: ‘pROC’


The following objects are masked from ‘package:stats’:

    cov, smooth, var




[1] 
[1] "AFRICANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL       4       3
   DISEASE    7592   11838
                                          
               Accuracy : 0.6093          
                 95% CI : (0.6023, 0.6161)
    No Information Rate : 0.6092          
    P-Value [Acc > NIR] : 0.4973          
                                          
                  Kappa : 3e-04           
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.9997466       
            Specificity : 0.0005266       
         Pos Pred Value : 0.6092640       
         Neg Pred Value : 0.5714286       
             Prevalence : 0.6091990       
         Detection Rate : 0.6090446       
   Detection Prevalence : 0.9996399       
      Balanced Accuracy : 0.5001366       
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 7596 controls (dat$reported CONTROL) < 11841 cases (dat$reported DISEASE).
Area under the curve: 0.5385

[1] 
[1] "EUROPEANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL    1832    1390
   DISEASE    5764   10451
                                          
               Accuracy : 0.6319          
                 95% CI : (0.6251, 0.6387)
    No Information Rate : 0.6092          
    P-Value [Acc > NIR] : 3.669e-11       
                                          
                  Kappa : 0.138           
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.8826          
            Specificity : 0.2412          
         Pos Pred Value : 0.6445          
         Neg Pred Value : 0.5686          
             Prevalence : 0.6092          
         Detection Rate : 0.5377          
   Detection Prevalence : 0.8342          
      Balanced Accuracy : 0.5619          
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 7596 controls (dat$reported CONTROL) < 11841 cases (dat$reported DISEASE).
Area under the curve: 0.6303

[1] 
[1] "LATINO"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     344     321
   DISEASE    7252   11520
                                          
               Accuracy : 0.6104          
                 95% CI : (0.6035, 0.6172)
    No Information Rate : 0.6092          
    P-Value [Acc > NIR] : 0.3706          
                                          
                  Kappa : 0.0217          
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.97289         
            Specificity : 0.04529         
         Pos Pred Value : 0.61368         
         Neg Pred Value : 0.51729         
             Prevalence : 0.60920         
         Detection Rate : 0.59268         
   Detection Prevalence : 0.96579         
      Balanced Accuracy : 0.50909         
                                          
       'Positive' Class : DISEASE         
      

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 7596 controls (dat$reported CONTROL) < 11841 cases (dat$reported DISEASE).
Area under the curve: 0.5689

[1] 
[1] "EASTASIANS"


Confusion Matrix and Statistics

          Reference
Prediction CONTROL DISEASE
   CONTROL     697     586
   DISEASE    6899   11255
                                         
               Accuracy : 0.6149         
                 95% CI : (0.608, 0.6218)
    No Information Rate : 0.6092         
    P-Value [Acc > NIR] : 0.05205        
                                         
                  Kappa : 0.0497         
                                         
 Mcnemar's Test P-Value : < 2e-16        
                                         
            Sensitivity : 0.95051        
            Specificity : 0.09176        
         Pos Pred Value : 0.61997        
         Neg Pred Value : 0.54326        
             Prevalence : 0.60920        
         Detection Rate : 0.57905        
   Detection Prevalence : 0.93399        
      Balanced Accuracy : 0.52113        
                                         
       'Positive' Class : DISEASE        
                          

Setting levels: control = CONTROL, case = DISEASE

Setting direction: controls < cases




Call:
roc.default(response = dat$reported, predictor = dat$probDisease)

Data: dat$probDisease in 7596 controls (dat$reported CONTROL) < 11841 cases (dat$reported DISEASE).
Area under the curve: 0.5822