## Dyslipidemia and incident CH
The following threshold values for lipid levels (in mg/dL) are commonly attributed to the American Heart Association (AHA) and the American College of Cardiology (ACC); the categories largely correspond to the NCEP ATP III classification:

1. Total Cholesterol (TC):
   - Desirable level: Less than 200 mg/dL
   - Borderline high: 200-239 mg/dL
   - High: 240 mg/dL and above

2. Low-Density Lipoprotein Cholesterol (LDL-C):
   - Optimal: Less than 100 mg/dL
   - Near optimal/above optimal: 100-129 mg/dL
   - Borderline high: 130-159 mg/dL
   - High: 160-189 mg/dL
   - Very high: 190 mg/dL and above

3. High-Density Lipoprotein Cholesterol (HDL-C):
   - Low: Less than 40 mg/dL (in men), less than 50 mg/dL (in women)
   - High: 60 mg/dL and above (considered protective against heart disease)

4. Triglycerides:
   - Normal: Less than 150 mg/dL
   - Borderline high: 150-199 mg/dL
   - High: 200-499 mg/dL
   - Very high: 500 mg/dL and above

These thresholds serve as a general guideline for classifying lipid levels in the United States. Clinical management of dyslipidemia also depends on an individual's other risk factors and overall health.
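
The LDL-C categories listed above map directly onto interval bins. A minimal sketch, assuming values in mg/dL; the helper name `classify_ldl` is hypothetical and is not used elsewhere in this notebook:

```r
# Hypothetical helper: bin LDL-C (mg/dL) into the categories listed above.
classify_ldl <- function(ldl) {
  cut(ldl,
      breaks = c(-Inf, 100, 130, 160, 190, Inf),
      labels = c("Optimal", "Near optimal", "Borderline high",
                 "High", "Very high"),
      right = FALSE)  # intervals are [lower, upper), so 160 falls in "High"
}

# Made-up example values; NA stays NA
ldl_example <- c(85, 100, 145, 160, 190, NA)
as.character(classify_ldl(ldl_example))
```

The same pattern with different `breaks` would apply to total cholesterol and triglycerides. HDL-C needs sex-specific cut-points and so does not fit a single `cut()` call.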

## TG/HDL-C ratios

The triglyceride/HDL-C (TG/HDL-C) ratio combines the blood levels of triglycerides (TG) and high-density lipoprotein cholesterol (HDL-C). It is used as an indicator of cardiovascular risk and reflects the balance between atherogenic, triglyceride-rich lipoproteins and protective HDL.

To calculate the TG/HDL-C ratio, divide the triglyceride level (measured in mg/dL) by the HDL-C level (also measured in mg/dL).

The TG/HDL-C ratio is considered a useful marker of lipid abnormalities and insulin resistance, both of which are associated with an increased risk of cardiovascular disease. Higher TG levels and lower HDL-C levels are typically associated with an unfavorable lipid profile.

A higher TG/HDL-C ratio indicates a greater cardiovascular risk. It suggests an increased presence of small, dense LDL particles (which are more atherogenic) and decreased levels of beneficial HDL particles. Insulin resistance, obesity, metabolic syndrome, and diabetes are conditions commonly associated with higher TG/HDL-C ratios.

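In general, a TG/HDL-C ratio (both values in mg/dL) below 2 is considered optimal, as it indicates a lower risk of cardiovascular disease. Ratios from 2 up to 4 are considered average, while ratios of 4 or above are associated with an increased risk. These cut-points are commonly cited rules of thumb rather than formal guideline thresholds.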
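
The calculation and the rough categories described above can be sketched as follows. The helper name `tg_hdl_category` is hypothetical, and treating a ratio of exactly 4 as "increased risk" is an assumption about how to handle the boundary:

```r
# Hypothetical helper: TG/HDL-C ratio and the rough categories described above.
tg_hdl_category <- function(tg, hdl) {
  ratio <- tg / hdl  # both inputs in mg/dL
  category <- cut(ratio,
                  breaks = c(-Inf, 2, 4, Inf),
                  labels = c("Optimal", "Average", "Increased risk"),
                  right = FALSE)  # [2, 4) is "Average"; 4 and above is "Increased risk"
  data.frame(ratio = ratio, category = category)
}

# Made-up example values
tg_hdl_category(tg = c(90, 150, 240), hdl = c(60, 50, 40))
```

Because the ratio is unit dependent, these cut-points would not carry over to values reported in mmol/L.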

It's important to note that the TG/HDL-C ratio is just one component of a comprehensive assessment of cardiovascular risk. Other factors such as blood pressure, smoking status, family history, and additional lipid parameters should also be considered when evaluating overall cardiovascular health.

In [None]:
library(data.table) # version 1.14.6
library(dplyr)
# set working directory
setwd("/medpop/esp2/mesbah/projects/ch_progression/aric/epi/")

In [None]:
# Load data
## 0/1 CH status
# aric_baseline_n_v05 <- fread("../pheno/aric_baseline_n_v05_N4189.pheno_ch_status_trajectory.23Mar2023.csv", header=T)
# aric_baseline_n_v05$dAge <- aric_baseline_n_v05$Age - aric_baseline_n_v05$age_base
#summary(aric_baseline_n_v05$dAge)
aric_baseline_n_v05_noPrevHeme <- fread("../pheno/aric_baseline_n_v05_N4187.pheno_ch_status.noHemeCA.9May2023.csv", header=T)
# 
aric_baseline_n_v05_noPrevHeme$dAge <- aric_baseline_n_v05_noPrevHeme$Age - aric_baseline_n_v05_noPrevHeme$age_base
summary(aric_baseline_n_v05_noPrevHeme$dAge)
nrow(aric_baseline_n_v05_noPrevHeme)
table(aric_baseline_n_v05_noPrevHeme$incident_CH)

## corrected lipid values
lipids_base <- fread("../pheno/aric_baseline_vanilla_02082023.csv", header=T, sep="\t")

## Update lipid values in mg/dl
aric_baseline_n_v05 <- merge(aric_baseline_n_v05_noPrevHeme[, c(1:63,68:112)], 
                                        lipids_base[, c(1,16:19)], 
                                        by.x="GWAS_ID", 
                                        by.y = "gwasid")

names(aric_baseline_n_v05)
summary(aric_baseline_n_v05$ldl_base)
summary(aric_baseline_n_v05$chol_base)
summary(aric_baseline_n_v05$hdl_base)
summary(aric_baseline_n_v05$tg_base)


In [None]:

# # Unadjusted: cont. variable
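# chd_is_base: 1 if prevalent CHD or ischemic stroke (IS) at baseline,
# 0 if both are 0, NA if the status cannot be determined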
aric_baseline_n_v05$chd_is_base <- ifelse(aric_baseline_n_v05$chd_base==1 | aric_baseline_n_v05$is_base==1, 1,
                                          ifelse(aric_baseline_n_v05$chd_base==0 | aric_baseline_n_v05$is_base==0,
                                                 0,NA))
table(aric_baseline_n_v05$chd_is_base, exclude = NULL)

 # ASCVD = c("chd", "is")
# aric_baseline_n_v05$ascvd_base <- ifelse(aric_baseline_n_v05$chd_base==1 | 
  #                                         aric_baseline_n_v05$is_base==1 , 1,
   #                                       ifelse(aric_baseline_n_v05$chd_base==0 | 
    #                                               aric_baseline_n_v05$is_base==0 |  
     #                                            is.na( aric_baseline_n_v05$is_base),0,NA))
# table(aric_baseline_n_v05$ascvd_base, exclude = NULL)
# aric_baseline_n_v05$ascvd_base[is.na(aric_baseline_n_v05$ascvd_base)] <- 0
# table(aric_baseline_n_v05$ascvd_base, exclude = NULL)
nrow(aric_baseline_n_v05)



In [None]:
## corrected GPT version for missing data
# inverse_rank_normalize <- function(x) {
#  n <- sum(!is.na(x))
#  ranks <- rank(x, na.last = "keep")
#  normalized_values <- (ranks - 0.5) / n 
#  inverse_normalized_values <- qnorm(normalized_values)
#  return(inverse_normalized_values)
# }

### 
  ### source:  https://www.biostars.org/p/80597/ and the supplement of Yang et al. Nature 2012.
INT_yang2012 <- function(x){
  y<-qnorm((rank(x,na.last='keep')-0.5)/sum(!is.na(x)))
  return(y)
}

In [None]:
## Scale
    # INT
aric_baseline_n_v05$chol_base_INT <- INT_yang2012(aric_baseline_n_v05$chol_base)

aric_baseline_n_v05$ldl_base_INT <- INT_yang2012(aric_baseline_n_v05$ldl_base)

aric_baseline_n_v05$hdl_base_INT <- INT_yang2012(aric_baseline_n_v05$hdl_base)

aric_baseline_n_v05$tg_base_INT <- INT_yang2012(aric_baseline_n_v05$tg_base)

aric_baseline_n_v05$nonHDL_base <- (aric_baseline_n_v05$chol_base - aric_baseline_n_v05$hdl_base)

aric_baseline_n_v05$nonHDL_base_INT <- INT_yang2012(aric_baseline_n_v05$nonHDL_base)

aric_baseline_n_v05$bmi_base_INT <- INT_yang2012(aric_baseline_n_v05$bmi_base)

    # TG/HDL-C
aric_baseline_n_v05$tg_to_hdl_base <- (aric_baseline_n_v05$tg_base/aric_baseline_n_v05$hdl_base)

aric_baseline_n_v05$tg_to_hdl_base_INT <- INT_yang2012(aric_baseline_n_v05$tg_to_hdl_base)



In [None]:
# High LDL 
aric_baseline_n_v05$ldl_base_nomal_vs_high <- ifelse(aric_baseline_n_v05$ldl_base<160, 0, 
                                                     ifelse(aric_baseline_n_v05$ldl_base>=160, 1, NA) )
table(aric_baseline_n_v05$ldl_base_nomal_vs_high, exclude= NULL)

# Normal (<160 mg/dl) vs. High LDL-C (>=160 mg/dl) 
table( aric_baseline_n_v05$ldl_base_nomal_vs_high) 
# aric_baseline_n_v05$ldl_base_nomal_vs_high <- factor(aric_baseline_n_v05$ldl_base_nomal_vs_high, 
  #                                                   levels = c("<160", ">=160"))
str( aric_baseline_n_v05$ldl_base_nomal_vs_high) 

# Low HDL: 
aric_baseline_n_v05$hdl_base_low <- ifelse( (aric_baseline_n_v05$hdl_base>=40 & aric_baseline_n_v05$Gender=="M") | 
                                           (aric_baseline_n_v05$hdl_base>=50 & aric_baseline_n_v05$Gender=="F"), 0, 
                                                     ifelse( (aric_baseline_n_v05$hdl_base<40 & aric_baseline_n_v05$Gender=="M") | 
                                           (aric_baseline_n_v05$hdl_base<50 & aric_baseline_n_v05$Gender=="F"), 1, NA) )
table(aric_baseline_n_v05$hdl_base_low, exclude= NULL)

In [None]:
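# Dyslipidemia criteria:
#   LDL-C >= 160 mg/dL
#   total cholesterol >= 240 mg/dL
#   triglycerides >= 200 mg/dL
#   HDL-C < 40 mg/dL in men or < 50 mg/dL in women
#   or use of a statin
# NOTE: the expressions below join the four lipid criteria with `&`, so a
# participant is flagged only if ALL four hold, or if on a statin.
# Use `|` between the criteria if any single one should qualify.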
table( (aric_baseline_n_v05$ldl_base>=160 & aric_baseline_n_v05$chol_base>=240 & aric_baseline_n_v05$tg_base>=200) & ( (aric_baseline_n_v05$Gender=="M" & aric_baseline_n_v05$hdl_base<40) | (aric_baseline_n_v05$Gender=="F" & aric_baseline_n_v05$hdl_base<50) ) | aric_baseline_n_v05$statin_base==1)

aric_baseline_n_v05$Dyslipidemia <- ifelse((aric_baseline_n_v05$ldl_base>=160 & 
                                            aric_baseline_n_v05$chol_base>=240 & 
                                            aric_baseline_n_v05$tg_base>=200) & 
                                           ( (aric_baseline_n_v05$Gender=="M" & 
                                              aric_baseline_n_v05$hdl_base<40) | 
                                            (aric_baseline_n_v05$Gender=="F" & 
                                             aric_baseline_n_v05$hdl_base<50) ) | 
                                           aric_baseline_n_v05$statin_base==1, 1, 0)

table(aric_baseline_n_v05$Dyslipidemia, exclude=NULL)

In [None]:
ncol(aric_baseline_n_v05)
names(aric_baseline_n_v05)

In [None]:
summary(aric_baseline_n_v05[,c(114:125)])

In [None]:
## 
# fwrite(aric_baseline_n_v05, "../pheno/aric_baseline_n_v05_N4187.pheno_ch_status.noHemeCA.correct_lipids.Jun3May2023.csv", 
 #  row.names = F, col.names = T, sep=",")

In [None]:
aric_baseline_n_v05 <- fread("../pheno/aric_baseline_n_v05_N4187.pheno_ch_status.noHemeCA.correct_lipids.Jun3May2023.csv", header=T)

nrow(aric_baseline_n_v05)

ncol(aric_baseline_n_v05)
ls()
names(aric_baseline_n_v05)

In [None]:
summary(aric_baseline_n_v05[,c(114:125)])

In [None]:
### Exclude prevalent CH (rows where incident_CH is NA)
aric_baseline_n_v05 <- subset(aric_baseline_n_v05, !is.na(aric_baseline_n_v05$incident_CH))
nrow(aric_baseline_n_v05)

In [None]:
## Scale
### 
  ### source:  https://www.biostars.org/p/80597/ and the supplement of Yang et al. Nature 2012.
INT_yang2012 <- function(x){
  y<-qnorm((rank(x,na.last='keep')-0.5)/sum(!is.na(x)))
  return(y)
}
    # INT
aric_baseline_n_v05$chol_base_INT <- INT_yang2012(aric_baseline_n_v05$chol_base)

aric_baseline_n_v05$ldl_base_INT <- INT_yang2012(aric_baseline_n_v05$ldl_base)

aric_baseline_n_v05$hdl_base_INT <- INT_yang2012(aric_baseline_n_v05$hdl_base)

aric_baseline_n_v05$tg_base_INT <- INT_yang2012(aric_baseline_n_v05$tg_base)

aric_baseline_n_v05$nonHDL_base <- (aric_baseline_n_v05$chol_base - aric_baseline_n_v05$hdl_base)

aric_baseline_n_v05$nonHDL_base_INT <- INT_yang2012(aric_baseline_n_v05$nonHDL_base)

aric_baseline_n_v05$bmi_base_INT <- INT_yang2012(aric_baseline_n_v05$bmi_base)

    # TG/HDL-C
aric_baseline_n_v05$tg_to_hdl_base <- (aric_baseline_n_v05$tg_base/aric_baseline_n_v05$hdl_base)

aric_baseline_n_v05$tg_to_hdl_base_INT <- INT_yang2012(aric_baseline_n_v05$tg_to_hdl_base)


In [None]:
summary(aric_baseline_n_v05[,c(114:125)])

In [None]:
head(aric_baseline_n_v05)
table(aric_baseline_n_v05$ldl_base_nomal_vs_high)
table(aric_baseline_n_v05$Dyslipidemia)

table(aric_baseline_n_v05$ldl_base_nomal_vs_high, aric_baseline_n_v05$Dyslipidemia)
summary(aric_baseline_n_v05$tg_to_hdl_base)
summary(aric_baseline_n_v05$tg_to_hdl_base_INT)

In [None]:
nrow(aric_baseline_n_v05)
table(aric_baseline_n_v05$incident_CH, exclude = NULL)

In [None]:
## Save dataframe used in the final glm analysis
# fwrite(aric_baseline_n_v05, "../pheno/aric_baseline_n_v05_N3730.pheno_ch_status.noHemeCA.correct_lipids.FinalDataset_4_glm.July132023.csv", 
  # row.names = F, col.names = T, sep=",")


In [None]:
plot(density(aric_baseline_n_v05$ldl_base, na.rm = T))
plot(density(aric_baseline_n_v05$hdl_base, na.rm = T))
plot(density(aric_baseline_n_v05$chol_base, na.rm = T))
plot(density(aric_baseline_n_v05$tg_base, na.rm = T))
plot(density(aric_baseline_n_v05$tg_to_hdl_base, na.rm = T))

In [None]:
plot(density(aric_baseline_n_v05$ldl_base_INT, na.rm = T))
plot(density(aric_baseline_n_v05$hdl_base_INT, na.rm = T))
plot(density(aric_baseline_n_v05$chol_base_INT, na.rm = T))
plot(density(aric_baseline_n_v05$tg_base_INT, na.rm = T))
plot(density(aric_baseline_n_v05$tg_to_hdl_base_INT, na.rm = T))

In [None]:
## Regression
summary(aric_baseline_n_v05 %>% 
          glm(incident_CH ~ 
                ever_smoke + bmi_base_INT + age_base + Sex + race_BW +  
                nonHDL_base_INT + hdl_base_INT + 
                dm_126_base + htn_5_base + chd_is_base +  
                chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))


In [None]:
summary(aric_baseline_n_v05 %>% 
          glm(incident_TET2 ~ 
                Dyslipidemia + ever_smoke + bmi_base_INT + age_base + Sex + race_BW +  
                 nonHDL_base_INT + hdl_base_INT + 
                dm_126_base + htn_5_base + chd_is_base +  
                chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))


In [None]:
table(aric_baseline_n_v05$ldl_base_nomal_vs_high==1, aric_baseline_n_v05$incident_CH==1)
table(aric_baseline_n_v05$ldl_base_nomal_vs_high==1, aric_baseline_n_v05$incident_TET2==1)
table(aric_baseline_n_v05$ldl_base_nomal_vs_high==1, aric_baseline_n_v05$incident_DNMT3A==1)

In [None]:
table(aric_baseline_n_v05$Dyslipidemia==1, aric_baseline_n_v05$incident_CH==1)
table(aric_baseline_n_v05$Dyslipidemia==1, aric_baseline_n_v05$incident_TET2==1)
table(aric_baseline_n_v05$Dyslipidemia==1, aric_baseline_n_v05$incident_ASXL1==1)
table(aric_baseline_n_v05$Dyslipidemia==1, aric_baseline_n_v05$incident_DNMT3A==1)

In [None]:
summary(aric_baseline_n_v05 %>% 
          glm(incident_CH ~ 
                tg_to_hdl_base + ever_smoke + bmi_base_INT + age_base + Sex + race_BW +  
            hdl_base_INT + nonHDL_base_INT+
                dm_126_base + htn_5_base + chd_is_base +  
                chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))

In [None]:
summary(aric_baseline_n_v05 %>% 
          glm(incident_TET2 ~ 
                (ldl_base_nomal_vs_high) + hdl_base_low + ever_smoke + bmi_base_INT + (age_base) + Sex + race_BW +   
                dm_126_base + htn_5_base + chd_is_base +  
                chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))

In [None]:
names(aric_baseline_n_v05)

## Un-adjusted model: GLM
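
The univariable loop in this section can also be written so that each formula is built with `reformulate()` and the results are collected in a data frame instead of appended line by line. This is a minimal sketch on simulated data: the column names mirror the notebook, but the values, the `toy` data frame, and the helper `fit_one` are made up for illustration.

```r
set.seed(1)
# Simulated stand-in data; values are made up.
toy <- data.frame(
  incident_CH   = rbinom(500, 1, 0.2),
  incident_TET2 = rbinom(500, 1, 0.1),
  age_base      = rnorm(500, 57, 6),
  bmi_base_INT  = rnorm(500)
)

outcomes  <- c("incident_CH", "incident_TET2")
exposures <- c("age_base", "bmi_base_INT")

# One row per outcome-exposure pair.
grid <- expand.grid(Outcome = outcomes, Exposure = exposures,
                    stringsAsFactors = FALSE)

# Hypothetical helper: fit one univariable logistic model and return one row.
fit_one <- function(outcome, exposure, data) {
  # reformulate() builds e.g. incident_CH ~ age_base; glm() drops NA rows itself
  fit <- glm(reformulate(exposure, response = outcome),
             data = data, family = binomial())
  est <- coef(summary(fit))[exposure, ]  # row looked up by name
  data.frame(Dataset = "Univariable", Outcome = outcome, Exposure = exposure,
             Beta = est[["Estimate"]], SE = est[["Std. Error"]],
             z = est[["z value"]], P = est[["Pr(>|z|)"]])
}

univariable <- do.call(rbind, lapply(seq_len(nrow(grid)), function(r) {
  fit_one(grid$Outcome[r], grid$Exposure[r], toy)
}))
univariable
```

For a binomial `glm` the test statistic is a Wald z value, which is why the column is named `z` here. For factor exposures the coefficient row name includes the level, so the name-based lookup would need adapting.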

In [None]:
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "t-stat", "P"))),
  file = "final_glm.univariable.incident_ch.2023Jul07.csv", append = F, fill = T)

In [None]:
exposures <- c("age_base",  "bmi_base_INT",   
               "chol_base_INT", "ldl_base_INT",
               "hdl_base_INT", "tg_base_INT",
               "nonHDL_base_INT", "tg_to_hdl_base_INT",
               "ldl_base_nomal_vs_high", "Dyslipidemia",
               "hdl_base_low",
               "Sex", "race_BW", "ever_smoke", 
               "dm_126_base", "htn_5_base", 
               "chd_is_base")

ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

##
for(i in exposures){
  
  for (j in ch_phenotype){
    cat("outcome:",j," exposure:", i,"\n")
    # remove NA
    model1 <- summary(aric_baseline_n_v05 %>% filter(!is.na(get(i)) & !is.na(get(j))) %>%
                        glm(get(j) ~  get(i), 
                            data = ., family = "binomial"))
      
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Univariable", paste0(j), paste0(i), 
        model1$coefficients[2,1:4]) ) ), 
      file = "final_glm.univariable.incident_ch.2023Jul07.csv", append = T, fill = T)
    
  }
}

In [None]:
sd(aric_baseline_n_v05$ldl_base_INT, na.rm = T)
sd(aric_baseline_n_v05$tg_base_INT, na.rm = T)
sd(aric_baseline_n_v05$hdl_base_INT, na.rm = T)
sd(aric_baseline_n_v05$chol_base_INT, na.rm = T)
sd(aric_baseline_n_v05$tg_to_hdl_base_INT, na.rm = T)
plot(density(aric_baseline_n_v05$nonHDL_base_INT, na.rm = T), main="non-HDL-C")
plot(density(aric_baseline_n_v05$hdl_base_INT, na.rm = T), main="HDL-C")
plot(density(aric_baseline_n_v05$ldl_base_INT, na.rm = T), main="LDL-C")
plot(density(aric_baseline_n_v05$tg_base_INT, na.rm = T), main="TG")
plot(density(aric_baseline_n_v05$tg_to_hdl_base_INT, na.rm = T), main="TG-to-HDL-C")

## Adjusted model:
### all exposures:
#### adjusted for age, Sex, Race, Smoking, bmi, non-HDL-C, hdl-c, t2d, htn, ascvd, chol_med, batch(visit,center)
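
Since every exposure comes from the same adjusted model, one option is to fit the model once per outcome and pull the rows of interest from the coefficient table by name rather than by position. A minimal sketch on simulated data, with only a subset of the covariates; the `toy` data frame and the `Center` labels are made up:

```r
set.seed(2)
# Simulated stand-in data; names mirror the notebook, values are made up.
n <- 600
toy <- data.frame(
  incident_CH     = rbinom(n, 1, 0.2),
  age_base        = rnorm(n, 57, 6),
  ever_smoke      = rbinom(n, 1, 0.5),
  nonHDL_base_INT = rnorm(n),
  hdl_base_INT    = rnorm(n),
  Center          = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)

covariates     <- c("age_base", "ever_smoke", "nonHDL_base_INT",
                    "hdl_base_INT", "Center")
test_exposures <- c("age_base", "ever_smoke", "nonHDL_base_INT", "hdl_base_INT")

# Fit the adjusted model once, then select the reported rows by name.
fit <- glm(reformulate(covariates, response = "incident_CH"),
           data = toy, family = binomial())
adjusted <- coef(summary(fit))[test_exposures, , drop = FALSE]
adjusted
```

Name-based selection keeps working if covariates are reordered or if a multi-level factor such as `Center` is moved earlier in the formula.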

In [None]:
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "t-stat", "P"))),
  file = "final_glm.multivariable.incident_ch.2023Jul07.csv", append = F, fill = T)

In [None]:
# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base")

for (j in ch_phenotype){
    
  # The adjusted model does not depend on k, so fit it once per outcome
  model3 <- summary(aric_baseline_n_v05 %>% 
                      filter(!is.na(get(j))) %>% 
                      glm(get(j) ~ 
                          age_base + Sex + race_BW + 
                          ever_smoke + bmi_base_INT + 
                          nonHDL_base_INT + hdl_base_INT + 
                          dm_126_base + htn_5_base + chd_is_base +  
                          chol_med_base + Center + v2_vs_other, 
                          data = ., family="binomial"))
    
  for (k in seq_along(test_exposures)) {
      
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
      
    # Row k+1 of the coefficient table matches test_exposures[k]:
    # row 1 is the intercept and the exposures follow the formula order
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted", paste0(j), paste0(test_exposures[k]),
        model3$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_ch.2023Jul07.csv", 
      append = T, fill = T)
      
  }
}


In [None]:
table(aric_baseline_n_v05$BMI_cat)
aric_baseline_n_v05$BMI_cat <- factor(aric_baseline_n_v05$BMI_cat, levels = c("<=25", "25-30", ">30"))
table(aric_baseline_n_v05$cig_base)

In [None]:
 # smoking: never=3, former=2, current=1 
# aric_baseline_n_v05$cig_base_fact <- factor(aric_baseline_n_v05$cig_base, 
  #                                          levels = c(3,2,1))

aric_baseline_n_v05$Smoking_cat <- ifelse(aric_baseline_n_v05$cig_base==1,"Current smoker", 
                                          ifelse(aric_baseline_n_v05$cig_base==2, "Former smoker",
                                                 ifelse(aric_baseline_n_v05$cig_base==3,"Never smoker", NA))) 
table(aric_baseline_n_v05$Smoking_cat, exclude = NULL)

aric_baseline_n_v05$Smoking_cat_notordered <- factor(aric_baseline_n_v05$Smoking_cat, 
                                            levels = c("Never smoker", "Former smoker", "Current smoker"), 
                                          ordered =F)

# aric_baseline_n_v05$Smoking_cat_ordered <- factor(aric_baseline_n_v05$Smoking_cat, 
  #                                          levels = c("Never smoker", "Former smoker", "Current smoker"), 
   #                                       ordered =T)

In [None]:
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "t-stat", "P"))),
  file = "final_glm.multivariable_smoking_bmi_cat.incident_ch.2023Jul07.csv", append = F, fill = T)

# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("Former_smoker", "Current_smoker", 
                    "BMI_25-30", "BMI_>30")

for (j in ch_phenotype){
    
  # The adjusted model does not depend on k, so fit it once per outcome
  model_x <- summary(aric_baseline_n_v05 %>% 
                      filter(!is.na(get(j))) %>% 
                      glm(get(j) ~ 
                          Smoking_cat_notordered + BMI_cat +
                          age_base + Sex + race_BW + 
                          nonHDL_base_INT + hdl_base_INT + 
                          dm_126_base + htn_5_base + chd_is_base +  
                          chol_med_base + Center + v2_vs_other, 
                          data = ., family="binomial"))
    
  for (k in seq_along(test_exposures)) {
      
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
      
    # Rows 2-5 are the two smoking and two BMI dummy variables, in factor-level order
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted", paste0(j), paste0(test_exposures[k]),
        model_x$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable_smoking_bmi_cat.incident_ch.2023Jul07.csv", 
      append = T, fill = T)
      
  }
}


In [None]:
summary(aric_baseline_n_v05 %>% 
          glm(incident_SF ~ Smoking_cat_notordered + BMI_cat +
                            age_base + Sex + race_BW + 
                            nonHDL_base_INT + hdl_base_INT + 
                            dm_126_base + htn_5_base + chd_is_base +  
                            chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))

summary(aric_baseline_n_v05 %>% 
          glm(incident_SF ~ Smoking_cat_notordered + bmi_base_INT +
                            age_base + Sex + race_BW + 
                            nonHDL_base_INT + hdl_base_INT + 
                            dm_126_base + htn_5_base + chd_is_base +  
                            chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))

summary(aric_baseline_n_v05 %>% 
          glm(incident_SF ~  ever_smoke + BMI_cat +
                            age_base + Sex + race_BW + 
                            nonHDL_base_INT + hdl_base_INT + 
                            dm_126_base + htn_5_base + chd_is_base +  
                            chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))

#### Atherogenic lipids vs incident CH

In [None]:
summary(aric_baseline_n_v05 %>% 
          glm(incident_TET2 ~ tg_to_hdl_base_INT + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))

summary(aric_baseline_n_v05 %>% 
          glm(incident_TET2 ~ ldl_base_nomal_vs_high + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              chol_med_base + Center + v2_vs_other, 
              data = ., family="binomial"))


summary(aric_baseline_n_v05 %>% 
          glm(incident_TET2 ~ Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              Center + v2_vs_other, 
              data = ., family="binomial"))

In [None]:
ls()

In [None]:
exposures <- c("tg_to_hdl_base_INT",
               "ldl_base_nomal_vs_high", 
               "Dyslipidemia", "hdl_base_low")

ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "t-stat", "P"))),
  file = "final_glm.multivariable_atherogenic_lipid.incident_ch.2023Jul07.csv", append = F, fill = T)

for (j in ch_phenotype){
  for (k in exposures) {
      
    cat("outcome:",j," exposure:", k,"\n")
      
    model_athero <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(get(j))) %>% 
                        glm(get(j) ~ 
                            get(k) + age_base + Sex + race_BW + 
                            ever_smoke + bmi_base_INT + 
                             nonHDL_base_INT + hdl_base_INT + 
                            dm_126_base + htn_5_base + chd_is_base +  
                            chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial")) # $coefficients[2,1:4])
      
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted", paste0(j), paste0(k),
        model_athero$coefficients[1+1,1:4]) ) ),
      file = "final_glm.multivariable_atherogenic_lipid.incident_ch.2023Jul07.csv", 
       append = T, fill = T)
      cat("\n")
      
  }
}

## Smoking x Sex interaction
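
The interaction coefficient can be located by its row name instead of a fixed row index, which avoids depending on how many dummy variables the other covariates produce. A minimal sketch on simulated data; the `toy` data frame and its values are made up, and only a few covariates are included:

```r
set.seed(3)
# Simulated stand-in data; names mirror the notebook, values are made up.
n <- 800
toy <- data.frame(
  incident_CH = rbinom(n, 1, 0.2),
  ever_smoke  = rbinom(n, 1, 0.5),
  Sex         = factor(sample(c("F", "M"), n, replace = TRUE)),
  age_base    = rnorm(n, 57, 6)
)

fit <- glm(incident_CH ~ ever_smoke:Sex + ever_smoke + Sex + age_base,
           data = toy, family = binomial())
coefs <- coef(summary(fit))

# The interaction row is the only one whose name contains ":".
interaction_row <- grep(":", rownames(coefs), fixed = TRUE, value = TRUE)
coefs[interaction_row, ]
```

With a two-level `Sex` and a 0/1 `ever_smoke` there is exactly one interaction row. A multi-level smoking factor would yield several rows, and `interaction_row` would then be a vector.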

In [None]:
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

for(i in ch_phenotype){
    cat(i)
print(summary(aric_baseline_n_v05 %>% 
                        glm(get(i) ~ 
                            ever_smoke : Sex +  ever_smoke + Sex + 
                            bmi_base_INT + age_base + race_BW +  
                            hdl_base_INT + nonHDL_base_INT + 
                            dm_126_base + htn_5_base + chd_is_base +  
                            chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))$coefficients[16,1:4]) # row 16 is assumed to be the ever_smoke:Sex term; check rownames() of the coefficient table
cat ("\n")
}


for (j in ch_phenotype){
        
    cat("outcome:",j," exposure: sex_by_smoking","\n")
      
    model_sex_by_smoking <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(get(j))) %>% 
                        glm(get(j) ~ 
                           ever_smoke : Sex +  ever_smoke + Sex + 
                            bmi_base_INT + age_base + race_BW +  
                            hdl_base_INT + nonHDL_base_INT + 
                            dm_126_base + htn_5_base + chd_is_base +  
                            chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial")) # $coefficients[16,1:4])
      
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted", paste0(j), "sex_by_smoking",
        model_sex_by_smoking$coefficients[16,1:4]) ) ),
      file = "final_glm.multivariable_atherogenic_lipid.incident_ch.2023Jul07.csv", 
       append = T, fill = T)
      cat("\n")
      
  }



In [None]:
## w/o chol_med_base adjustment
exposures <- c("tg_to_hdl_base_INT",
               "ldl_base_nomal_vs_high", 
               "Dyslipidemia", "hdl_base_low")

ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "t-stat", "P"))),
  file = "final_glm.multivariable_atherogenic_lipid.incident_ch.nocholMed.2023Jul12.csv", append = F, fill = T)

for (j in ch_phenotype){
  for (k in exposures) {
      
    cat("outcome:",j," exposure:", k,"\n")
      
    model_athero <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(get(j))) %>% 
                        glm(get(j) ~ 
                            get(k) + age_base + Sex + race_BW + 
                            ever_smoke + bmi_base_INT + 
                             nonHDL_base_INT + hdl_base_INT + 
                            dm_126_base + htn_5_base + chd_is_base +  
                            Center + v2_vs_other, 
                            data = ., family="binomial")) # $coefficients[2,1:4])
      
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted_no_chol_med_base", paste0(j), paste0(k),
        model_athero$coefficients[1+1,1:4]) ) ),
      file = "final_glm.multivariable_atherogenic_lipid.incident_ch.nocholMed.2023Jul12.csv", 
       append = T, fill = T)
      cat("\n")
      
  }
}

In [None]:
## w/o chol_med_base adjustment, nonHDL_base_INT, hdl_base_INT
exposures <- c("tg_to_hdl_base_INT",
               "ldl_base_nomal_vs_high", 
               "Dyslipidemia", "hdl_base_low")

ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "t-stat", "P"))),
  file = "final_glm.multivariable_atherogenic_lipid.incident_ch.noChol_hdl_nonHdl.2023Jul12.csv", append = F, fill = T)

for (j in ch_phenotype){
  for (k in exposures) {
      
    cat("outcome:",j," exposure:", k,"\n")
      
    model_athero <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(get(j))) %>% 
                        glm(get(j) ~ 
                            get(k) + age_base + Sex + race_BW + 
                            ever_smoke + bmi_base_INT +  
                            dm_126_base + htn_5_base + chd_is_base +  
                            Center + v2_vs_other, 
                            data = ., family="binomial")) # $coefficients[2,1:4])
      
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted_no_chol_med_base_nonHDL_base_INT_hdl_base_INT", paste0(j), paste0(k),
        model_athero$coefficients[1+1,1:4]) ) ),
      file = "final_glm.multivariable_atherogenic_lipid.incident_ch.noChol_hdl_nonHdl.2023Jul12.csv", 
       append = T, fill = T)
      cat("\n")
      
  }
}

In [None]:
## Dyslipidemia vs. incident TET2 (no chol_med_base, nonHDL or HDL adjustment)

summary(aric_baseline_n_v05 %>% 
          glm(incident_TET2 ~ Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              Center + v2_vs_other, 
              data = ., family="binomial"))

## Dyslipidemia vs. incident TET2 (additionally without chd_is_base)

summary(aric_baseline_n_v05 %>% 
          glm(incident_TET2 ~ Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base +   
              Center + v2_vs_other, 
              data = ., family="binomial"))


## Dyslipidemia vs. incident CH

summary(aric_baseline_n_v05 %>% 
          glm(incident_CH ~ Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              Center + v2_vs_other, 
              data = ., family="binomial"))

## Dyslipidemia vs. incident DNMT3A

summary(aric_baseline_n_v05 %>% 
          glm(incident_DNMT3A ~ Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              Center + v2_vs_other, 
              data = ., family="binomial"))

## Dyslipidemia vs. incident ASXL1

summary(aric_baseline_n_v05 %>% 
          glm(incident_ASXL1 ~ Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              Center + v2_vs_other, 
              data = ., family="binomial"))

## Dyslipidemia vs. incident SF

summary(aric_baseline_n_v05 %>% 
          glm(incident_SF ~ Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              Center + v2_vs_other, 
              data = ., family="binomial"))

## Dyslipidemia vs. incident DDR

summary(aric_baseline_n_v05 %>% 
          glm(incident_DDR ~ Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              Center + v2_vs_other, 
              data = ., family="binomial"))


In [None]:
## Dyslipidemia vs. incident DNMT3A, with a Sex x ever_smoke interaction
summary(aric_baseline_n_v05 %>% 
          glm(incident_DNMT3A ~ Sex:ever_smoke + Dyslipidemia + age_base + Sex + race_BW + 
              ever_smoke + bmi_base_INT + 
              dm_126_base + htn_5_base + chd_is_base +  
              Center + v2_vs_other, 
              data = ., family="binomial"))

In [None]:
summary(aric_baseline_n_v05 %>% 
                        glm(incident_DNMT3A ~ 
                            Smoking_cat_notordered : Sex +  Smoking_cat_notordered + Sex + 
                            bmi_base_INT + age_base + race_BW +  
                            hdl_base_INT + nonHDL_base_INT + 
                            dm_126_base + htn_5_base + chd_is_base +  
                            chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))

#####################################

# Sensitivity Analysis 

* Keep only incident CH cases whose variant position had a baseline read depth of DP>=20


In [None]:
library(data.table)
library(dplyr)  # provides %>% and filter() used in the models below
## Load the data frame saved for the final glm analysis
aric_baseline_n_v05 <- fread("/medpop/esp2/mesbah/projects/ch_progression/aric/pheno/aric_baseline_n_v05_N3730.pheno_ch_status.noHemeCA.correct_lipids.FinalDataset_4_glm.July132023.csv", 
  header=TRUE)
nrow(aric_baseline_n_v05)
table(aric_baseline_n_v05$incident_CH, exclude=NULL)

In [None]:
## Clone data
cln_grt.vaf2.DP20_base.corrected <- fread("/medpop/esp2/mesbah/projects/ch_progression/aric/pheno/cln_grt.vaf2.DP20_base.relaxd.modified_hiseq.29Nov2023.csv", header=T)
nrow(cln_grt.vaf2.DP20_base.corrected)
summary(cln_grt.vaf2.DP20_base.corrected$DP.v2)
summary(cln_grt.vaf2.DP20_base.corrected$VAF.v2)

In [None]:
## Overlap: incident CH status among participants present in the DP>=20 clone table
table(aric_baseline_n_v05$incident_CH[aric_baseline_n_v05$ARIC_ID %in% cln_grt.vaf2.DP20_base.corrected$ARIC_ID],
      exclude = NULL)

In [None]:
table(aric_baseline_n_v05$incident_CH[aric_baseline_n_v05$ARIC_ID %in% 
                                      cln_grt.vaf2.DP20_base.corrected$ARIC_ID
                                      [round(cln_grt.vaf2.DP20_base.corrected$VAF.v2,2)>=0.02]],
      exclude = NULL)

table(aric_baseline_n_v05$incident_CH[aric_baseline_n_v05$ARIC_ID %in% 
                                      cln_grt.vaf2.DP20_base.corrected$ARIC_ID
                                      [round(cln_grt.vaf2.DP20_base.corrected$VAF.v2,2)>=0.01]],
      exclude = NULL)

table(aric_baseline_n_v05$incident_CH[aric_baseline_n_v05$ARIC_ID %in% 
                                      cln_grt.vaf2.DP20_base.corrected$ARIC_ID
                                      [round(cln_grt.vaf2.DP20_base.corrected$VAF.v2,2)<0.001]],
      exclude = NULL)

table(aric_baseline_n_v05$incident_CH[aric_baseline_n_v05$ARIC_ID %in% cln_grt.vaf2.DP20_base.corrected$ARIC_ID[round(cln_grt.vaf2.DP20_base.corrected$VAF.v5,2)>=0.02]],
      exclude = NULL)

table(aric_baseline_n_v05$incident_CH[aric_baseline_n_v05$ARIC_ID %in% cln_grt.vaf2.DP20_base.corrected$ARIC_ID])

In [None]:
table(aric_baseline_n_v05$ARIC_ID[aric_baseline_n_v05$incident_CH==0] %in% cln_grt.vaf2.DP20_base.corrected$ARIC_ID,
      exclude = NULL)

summary(aric_baseline_n_v05$Age - aric_baseline_n_v05$age_base)
aric_baseline_n_v05$Time_Followup <- aric_baseline_n_v05$Age - aric_baseline_n_v05$age_base

## Analysis 1: with baseline DP>=20

In [None]:

aric_baseline_n_v05$incident_CH_DPbase20 <- ifelse(aric_baseline_n_v05$incident_CH==1 & 
                                                   aric_baseline_n_v05$ARIC_ID %in% 
                                                   cln_grt.vaf2.DP20_base.corrected$ARIC_ID,
                                                   1,
                                                   ifelse(aric_baseline_n_v05$incident_CH==0,
                                                          0,NA))

table(aric_baseline_n_v05$incident_CH_DPbase20, exclude = NULL)
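
The recoding above keeps controls (`incident_CH == 0`) as 0, keeps a case as 1 only if its ID appears in the DP>=20 clone table, and sets the remaining cases to NA so that a later `filter(!is.na(...))` drops them. A toy sketch of that logic, where the `toy` data frame and `dp20_ids` vector are made up for illustration and are not ARIC data:

```r
# Toy data: IDs and case status are invented for illustration only
toy <- data.frame(ARIC_ID = c("A", "B", "C", "D"),
                  incident_CH = c(1, 1, 0, 0))
dp20_ids <- c("A", "C")  # hypothetical IDs passing the baseline DP>=20 filter

toy$incident_CH_DPbase20 <- ifelse(toy$incident_CH == 1 & toy$ARIC_ID %in% dp20_ids,
                                   1,
                                   ifelse(toy$incident_CH == 0, 0, NA))

# A: case in the DP>=20 table -> 1; B: case not in the table -> NA
# C and D: controls -> 0, whether or not they appear in the table
print(toy)
```

Controls are never excluded by this rule, so only the number of cases changes between the main and the sensitivity outcome.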


In [None]:
table(aric_baseline_n_v05$incident_CH, aric_baseline_n_v05$incident_CH_DPbase20, exclude = NULL)
table(aric_baseline_n_v05$incident_ASXL1, aric_baseline_n_v05$incident_CH_DPbase20, exclude = NULL)

In [None]:
#### Univariable
## Columns written per model: estimate (Beta), standard error, Wald z statistic, P value
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "z-stat", "P"))),
  file = "final_glm.univariable.incident_CH_DPbase20.2023Nov30.csv", append = FALSE, fill = TRUE)

exposures <- c("age_base",  "bmi_base_INT",   
               "chol_base_INT", "ldl_base_INT",
               "hdl_base_INT", "tg_base_INT",
               "nonHDL_base_INT", "tg_to_hdl_base_INT",
               "ldl_base_nomal_vs_high", "Dyslipidemia",
               "hdl_base_low",
               "Sex", "race_BW", "ever_smoke", 
               "dm_126_base", "htn_5_base", 
               "chd_is_base", "Time_Followup")

ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

##
for(i in exposures){
  
  for (j in ch_phenotype){
    cat("outcome:",j," exposure:", i,"\n")
    # remove NA
    model1 <- summary(aric_baseline_n_v05 %>% filter(!is.na(incident_CH_DPbase20) & 
                                                     !is.na(get(i)) & 
                                                     !is.na(get(j))) %>%
                        glm(get(j) ~  get(i), 
                            data = ., family = "binomial"))
    
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Univariable", paste0(j), paste0(i), 
        model1$coefficients[2,1:4]) ) ), 
      file = "final_glm.univariable.incident_CH_DPbase20.2023Nov30.csv", append = T, fill = T)
    
  }
}

## Multivariable
## adjusted for age, Sex, Race, Smoking, bmi, non-HDL-c, hdl-c, t2d, htn, ascvd, chol_med, batch(visit,center)
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "z-stat", "P"))),
  file = "final_glm.multivariable.incident_CH_DPbase20.2023Nov30.csv", append = FALSE, fill = TRUE)

# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base")

for (j in ch_phenotype){
  for (k in 1:length(test_exposures)) {
    
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
    
    model3 <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(incident_CH_DPbase20) & 
                               !is.na(get(j))) %>% 
                        glm(get(j) ~ 
                              age_base + Sex + race_BW + 
                              ever_smoke + bmi_base_INT + 
                              nonHDL_base_INT + hdl_base_INT + 
                              dm_126_base + htn_5_base + chd_is_base +  
                              chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))
    
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted", paste0(j), paste0(test_exposures[k]),
        model3$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_CH_DPbase20.2023Nov30.csv", 
      append = T, fill = T)
    
  }
}
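
The multivariable loop above refits the same model once per exposure and reads row `k+1` of the coefficient table, which only lines up while every term before it contributes exactly one coefficient. A hedged alternative is to fit once per outcome and select rows by name. The sketch below uses simulated data; `toy` and its values are invented, and only the column names echo the notebook:

```r
# Simulated stand-in data (random values, illustrative only)
set.seed(2)
n <- 300
toy <- data.frame(incident_CH = rbinom(n, 1, 0.3),
                  age_base = rnorm(n, 57, 6),
                  Sex = factor(sample(c("F", "M"), n, replace = TRUE)),
                  Center = factor(sample(c("c1", "c2", "c3"), n, replace = TRUE)))

# Fit the adjusted model once
fit <- glm(incident_CH ~ age_base + Sex + Center, data = toy, family = "binomial")
coef_tab <- summary(fit)$coefficients

# Rows are named by term; factor terms carry the level as a suffix (e.g. "SexM")
keep <- c("age_base", "SexM")
coef_tab[keep, , drop = FALSE]
```

Selecting by row name stays correct even if a multi-level factor such as `Center` is moved earlier in the formula.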


In [None]:
# Adjust for Follow-up time
# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base", "Time_Followup")
## 
for (j in ch_phenotype){
  for (k in 1:length(test_exposures)) {
    
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
    
    model4 <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(incident_CH_DPbase20) & 
                               !is.na(get(j))) %>% 
                        glm(get(j) ~ 
                              age_base + Sex + race_BW + 
                              ever_smoke + bmi_base_INT + 
                              nonHDL_base_INT + hdl_base_INT + 
                              dm_126_base + htn_5_base + chd_is_base + 
                            Time_Followup +
                              chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))
    
    # Label these rows separately: they are appended to the same file as the model without follow-up time
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted_FollowUp", paste0(j), paste0(test_exposures[k]),
        model4$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_CH_DPbase20.2023Nov30.csv", 
      append = TRUE, fill = TRUE)
    
  }
}
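
The models with and without `Time_Followup` differ by a single term. One way to keep the two specifications in sync is to build both formulas from a shared covariate vector with `reformulate()`. This is a sketch on simulated data: `toy`, `base_covars`, and the simulated values are assumptions for illustration, and only two of the notebook's covariates are used:

```r
# Simulated stand-in data (random values, illustrative only)
set.seed(3)
n <- 300
toy <- data.frame(incident_CH   = rbinom(n, 1, 0.3),
                  age_base      = rnorm(n, 57, 6),
                  bmi_base_INT  = rnorm(n),
                  Time_Followup = runif(n, 15, 25))

base_covars <- c("age_base", "bmi_base_INT")

# reformulate() turns character vectors into a formula: incident_CH ~ age_base + ...
f_base     <- reformulate(base_covars, response = "incident_CH")
f_followup <- reformulate(c(base_covars, "Time_Followup"), response = "incident_CH")

fit_base     <- glm(f_base, data = toy, family = "binomial")
fit_followup <- glm(f_followup, data = toy, family = "binomial")

# Coefficient names come from the data columns rather than from get(...)
names(coef(fit_followup))
```

The same pattern works for the outcome loop by passing `response = j`.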

## Analysis 2: DP>=20 and Baseline Clone VAF <0.1% (i.e. <0.001)


In [None]:
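## NOTE: VAF.v2 is rounded to 2 decimals before the comparison, so "< 0.001" keeps
## clones whose baseline VAF rounds to 0.00 (raw VAF below roughly 0.005)
aric_baseline_n_v05$incident_CH_DPbase20VAFbase001 <- ifelse(aric_baseline_n_v05$incident_CH==1 & 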
                                                   aric_baseline_n_v05$ARIC_ID %in% 
                                                   cln_grt.vaf2.DP20_base.corrected$ARIC_ID
                                                             [round(cln_grt.vaf2.DP20_base.corrected$VAF.v2,2)
                                                              <0.001],
                                                   1,
                                                   ifelse(aric_baseline_n_v05$incident_CH==0,
                                                          0,NA))

table(aric_baseline_n_v05$incident_CH_DPbase20VAFbase001, exclude = NULL)
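
With a cutoff this small, rounding before the comparison changes the effective threshold: a value rounded to two decimals is below 0.001 only when it rounds to 0.00. A short sketch with made-up VAF values (`vaf` is illustrative, not taken from the clone table) contrasts the two rules:

```r
vaf <- c(0, 0.0004, 0.002, 0.004, 0.006, 0.02)  # made-up VAF values

rounded_rule <- round(vaf, 2) < 0.001  # TRUE whenever the value rounds to 0.00
strict_rule  <- vaf < 0.001            # TRUE only below 0.1%

data.frame(vaf, rounded_rule, strict_rule)
```

Values such as 0.002 and 0.004 pass the rounded rule but not the strict one, so the choice between them affects which cases are retained.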

In [None]:
#### Univariable
## Columns written per model: estimate (Beta), standard error, Wald z statistic, P value
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "z-stat", "P"))),
  file = "final_glm.univariable.incident_CH_DPbase20VAFbase001.2023Nov30.csv", append = FALSE, fill = TRUE)

exposures <- c("age_base",  "bmi_base_INT",   
               "chol_base_INT", "ldl_base_INT",
               "hdl_base_INT", "tg_base_INT",
               "nonHDL_base_INT", "tg_to_hdl_base_INT",
               "ldl_base_nomal_vs_high", "Dyslipidemia",
               "hdl_base_low",
               "Sex", "race_BW", "ever_smoke", 
               "dm_126_base", "htn_5_base", 
               "chd_is_base","Time_Followup")

ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

##
for(i in exposures){
  
  for (j in ch_phenotype){
    cat("outcome:",j," exposure:", i,"\n")
    # remove NA
    model1 <- summary(aric_baseline_n_v05 %>% filter(!is.na(incident_CH_DPbase20VAFbase001) & 
                                                     !is.na(get(i)) & 
                                                     !is.na(get(j))) %>%
                        glm(get(j) ~  get(i), 
                            data = ., family = "binomial"))
    
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Univariable", paste0(j), paste0(i), 
        model1$coefficients[2,1:4]) ) ), 
      file = "final_glm.univariable.incident_CH_DPbase20VAFbase001.2023Nov30.csv", append = T, fill = T)
    
  }
}

## Multivariable
## adjusted for age, Sex, Race, Smoking, bmi, non-HDL-c, hdl-c, t2d, htn, ascvd, chol_med, batch(visit,center)
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "z-stat", "P"))),
  file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.2023Nov30.csv", append = FALSE, fill = TRUE)

# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base")

for (j in ch_phenotype){
  for (k in 1:length(test_exposures)) {
    
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
    
    model3 <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(incident_CH_DPbase20VAFbase001) & 
                               !is.na(get(j))) %>% 
                        glm(get(j) ~ 
                              age_base + Sex + race_BW + 
                              ever_smoke + bmi_base_INT + 
                              nonHDL_base_INT + hdl_base_INT + 
                              dm_126_base + htn_5_base + chd_is_base +  
                              chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))
    
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted", paste0(j), paste0(test_exposures[k]),
        model3$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.2023Nov30.csv", 
      append = T, fill = T)
    
  }
}

## Adjust for follow-up time (same covariates plus Time_Followup)
# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base", "Time_Followup")
## 
for (j in ch_phenotype){
  for (k in 1:length(test_exposures)) {
    
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
    
    model4 <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(incident_CH_DPbase20VAFbase001) & 
                               !is.na(get(j))) %>% 
                        glm(get(j) ~ 
                              age_base + Sex + race_BW + 
                              ever_smoke + bmi_base_INT + 
                              nonHDL_base_INT + hdl_base_INT + 
                              dm_126_base + htn_5_base + chd_is_base + 
                            Time_Followup +
                              chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))
    
    # Label these rows separately: they are appended to the same file as the model without follow-up time
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted_FollowUp", paste0(j), paste0(test_exposures[k]),
        model4$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.2023Nov30.csv", 
      append = TRUE, fill = TRUE)
    
  }
}

In [None]:
## Clone tables with additional read-support filters (labels taken from the input file names):
## indelAD5FRRR2 = "mild" filter, allAD5FRRR2 = "stringent" filter
cln_grt.vaf2.DP20_base.indelAD5FRRR2.corrected <- fread("../pheno/cln_grt.vaf2.DP20_base.indelAD5FRRR2.modified_hiseq.mild.29Nov2023.csv", header=TRUE)
nrow(cln_grt.vaf2.DP20_base.indelAD5FRRR2.corrected)

cln_grt.vaf2.DP20_base_allAD5FRRR2.corrected <- fread("../pheno/cln_grt.vaf2.DP20_base_allAD5FRRR2.modified_hiseq.stringent.29Nov2023.csv", header=TRUE)
nrow(cln_grt.vaf2.DP20_base_allAD5FRRR2.corrected)


In [None]:
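## Analysis 3: DP>=20, baseline VAF cutoff, and the mild indelAD5FRRR2 filter
## NOTE: VAF.v2 is rounded to 2 decimals before the comparison, so "< 0.001" keeps
## clones whose baseline VAF rounds to 0.00 (raw VAF below roughly 0.005)
aric_baseline_n_v05$incident_CH_DPbase20VAFbase001.indelAD5FRRR2 <- ifelse(aric_baseline_n_v05$incident_CH==1 & 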
                                                   aric_baseline_n_v05$ARIC_ID %in% 
                                                   cln_grt.vaf2.DP20_base.indelAD5FRRR2.corrected$ARIC_ID
                                                             [round(cln_grt.vaf2.DP20_base.indelAD5FRRR2.corrected$VAF.v2,2)
                                                              <0.001],
                                                   1,
                                                   ifelse(aric_baseline_n_v05$incident_CH==0,
                                                          0,NA))

table(aric_baseline_n_v05$incident_CH_DPbase20VAFbase001.indelAD5FRRR2, exclude = NULL)

In [None]:
#### Univariable
## Columns written per model: estimate (Beta), standard error, Wald z statistic, P value
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "z-stat", "P"))),
  file = "final_glm.univariable.incident_CH_DPbase20VAFbase001.indelAD5FRRR2.2023Nov30.csv", append = FALSE, fill = TRUE)

exposures <- c("age_base",  "bmi_base_INT",   
               "chol_base_INT", "ldl_base_INT",
               "hdl_base_INT", "tg_base_INT",
               "nonHDL_base_INT", "tg_to_hdl_base_INT",
               "ldl_base_nomal_vs_high", "Dyslipidemia",
               "hdl_base_low",
               "Sex", "race_BW", "ever_smoke", 
               "dm_126_base", "htn_5_base", 
               "chd_is_base","Time_Followup")

ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

##
for(i in exposures){
  
  for (j in ch_phenotype){
    cat("outcome:",j," exposure:", i,"\n")
    # remove NA
    model1 <- summary(aric_baseline_n_v05 %>% filter(!is.na(incident_CH_DPbase20VAFbase001.indelAD5FRRR2) & 
                                                     !is.na(get(i)) & 
                                                     !is.na(get(j))) %>%
                        glm(get(j) ~  get(i), 
                            data = ., family = "binomial"))
    
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Univariable", paste0(j), paste0(i), 
        model1$coefficients[2,1:4]) ) ), 
      file = "final_glm.univariable.incident_CH_DPbase20VAFbase001.indelAD5FRRR2.2023Nov30.csv", append = T, fill = T)
    
  }
}

## Multivariable
## adjusted for age, Sex, Race, Smoking, bmi, non-HDL-c, hdl-c, t2d, htn, ascvd, chol_med, batch(visit,center)
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "z-stat", "P"))),
  file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.indelAD5FRRR2.2023Nov30.csv", append = FALSE, fill = TRUE)

# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base")

for (j in ch_phenotype){
  for (k in 1:length(test_exposures)) {
    
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
    
    model3 <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(incident_CH_DPbase20VAFbase001.indelAD5FRRR2) & 
                               !is.na(get(j))) %>% 
                        glm(get(j) ~ 
                              age_base + Sex + race_BW + 
                              ever_smoke + bmi_base_INT + 
                              nonHDL_base_INT + hdl_base_INT + 
                              dm_126_base + htn_5_base + chd_is_base +  
                              chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))
    
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted", paste0(j), paste0(test_exposures[k]),
        model3$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.indelAD5FRRR2.2023Nov30.csv", 
      append = T, fill = T)
    
  }
}

## Adjust for follow-up time (same covariates plus Time_Followup)
# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base", "Time_Followup")
## 
for (j in ch_phenotype){
  for (k in 1:length(test_exposures)) {
    
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
    
    model4 <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(incident_CH_DPbase20VAFbase001.indelAD5FRRR2) & 
                               !is.na(get(j))) %>% 
                        glm(get(j) ~ 
                              age_base + Sex + race_BW + 
                              ever_smoke + bmi_base_INT + 
                              nonHDL_base_INT + hdl_base_INT + 
                              dm_126_base + htn_5_base + chd_is_base + 
                            Time_Followup +
                              chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))
    
    # Label these rows separately: they are appended to the same file as the model without follow-up time
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted_FollowUp", paste0(j), paste0(test_exposures[k]),
        model4$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.indelAD5FRRR2.2023Nov30.csv", 
      append = TRUE, fill = TRUE)
    
  }
}

In [None]:
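## Analysis 4: DP>=20, baseline VAF cutoff, and the stringent allAD5FRRR2 filter
## NOTE: VAF.v2 is rounded to 2 decimals before the comparison, so "< 0.001" keeps
## clones whose baseline VAF rounds to 0.00 (raw VAF below roughly 0.005)
aric_baseline_n_v05$incident_CH_DPbase20VAFbase001.allAD5FRRR2 <- ifelse(aric_baseline_n_v05$incident_CH==1 & 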
                                                   aric_baseline_n_v05$ARIC_ID %in% 
                                                   cln_grt.vaf2.DP20_base_allAD5FRRR2.corrected$ARIC_ID
                                                             [round(cln_grt.vaf2.DP20_base_allAD5FRRR2.corrected$VAF.v2,2)
                                                              <0.001],
                                                   1,
                                                   ifelse(aric_baseline_n_v05$incident_CH==0,
                                                          0,NA))

table(aric_baseline_n_v05$incident_CH_DPbase20VAFbase001.allAD5FRRR2, exclude = NULL)

In [None]:
#### Univariable
## Columns written per model: estimate (Beta), standard error, Wald z statistic, P value
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "z-stat", "P"))),
  file = "final_glm.univariable.incident_CH_DPbase20VAFbase001.allAD5FRRR2.2023Nov30.csv", append = FALSE, fill = TRUE)

exposures <- c("age_base",  "bmi_base_INT",   
               "chol_base_INT", "ldl_base_INT",
               "hdl_base_INT", "tg_base_INT",
               "nonHDL_base_INT", "tg_to_hdl_base_INT",
               "ldl_base_nomal_vs_high", "Dyslipidemia",
               "hdl_base_low",
               "Sex", "race_BW", "ever_smoke", 
               "dm_126_base", "htn_5_base", 
               "chd_is_base","Time_Followup")

ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

##
for(i in exposures){
  
  for (j in ch_phenotype){
    cat("outcome:",j," exposure:", i,"\n")
    # remove NA
    model1 <- summary(aric_baseline_n_v05 %>% filter(!is.na(incident_CH_DPbase20VAFbase001.allAD5FRRR2) & 
                                                     !is.na(get(i)) & 
                                                     !is.na(get(j))) %>%
                        glm(get(j) ~  get(i), 
                            data = ., family = "binomial"))
    
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Univariable", paste0(j), paste0(i), 
        model1$coefficients[2,1:4]) ) ), 
      file = "final_glm.univariable.incident_CH_DPbase20VAFbase001.allAD5FRRR2.2023Nov30.csv", append = T, fill = T)
    
  }
}

## Multivariable
## adjusted for age, Sex, Race, Smoking, bmi, non-HDL-c, hdl-c, t2d, htn, ascvd, chol_med, batch(visit,center)
cat(gsub(pattern = ", ", replacement = ",", x = toString(
  c("Dataset","Outcome", "Exposure","Beta", "SE", "z-stat", "P"))),
  file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.allAD5FRRR2.2023Nov30.csv", append = FALSE, fill = TRUE)

# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base")

for (j in ch_phenotype){
  for (k in 1:length(test_exposures)) {
    
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
    
    model3 <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(incident_CH_DPbase20VAFbase001.allAD5FRRR2) & 
                               !is.na(get(j))) %>% 
                        glm(get(j) ~ 
                              age_base + Sex + race_BW + 
                              ever_smoke + bmi_base_INT + 
                              nonHDL_base_INT + hdl_base_INT + 
                              dm_126_base + htn_5_base + chd_is_base +  
                              chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))
    
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted", paste0(j), paste0(test_exposures[k]),
        model3$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.allAD5FRRR2.2023Nov30.csv", 
      append = T, fill = T)
    
  }
}

## Adjust for follow-up time (same covariates plus Time_Followup)
# Outcomes
ch_phenotype <- c("incident_CH", 
                  "incident_DNMT3A",
                  "incident_TET2",
                  "incident_ASXL1",
                  "incident_SF",
                  "incident_DDR")

# Exposures
test_exposures <- c("age_base", "Sex", "race_BW", 
                    "ever_smoke", "bmi_base_INT", 
                    "nonHDL_base_INT", "hdl_base_INT", 
                    "dm_126_base", "htn_5_base", 
                    "chd_is_base", "Time_Followup")
## 
for (j in ch_phenotype){
  for (k in 1:length(test_exposures)) {
    
    cat("outcome:",j," exposure:", test_exposures[k],"\n")
    
    model4 <- summary(aric_baseline_n_v05 %>% 
                        filter(!is.na(incident_CH_DPbase20VAFbase001.allAD5FRRR2) & 
                               !is.na(get(j))) %>% 
                        glm(get(j) ~ 
                              age_base + Sex + race_BW + 
                              ever_smoke + bmi_base_INT + 
                              nonHDL_base_INT + hdl_base_INT + 
                              dm_126_base + htn_5_base + chd_is_base + 
                            Time_Followup +
                              chol_med_base + Center + v2_vs_other, 
                            data = ., family="binomial"))
    
    # Label these rows separately: they are appended to the same file as the model without follow-up time
    cat( gsub(pattern = ", ", replacement = ",", x = toString(
      c("Adjusted_FollowUp", paste0(j), paste0(test_exposures[k]),
        model4$coefficients[k+1,1:4]) ) ),
      file = "final_glm.multivariable.incident_CH_DPbase20VAFbase001.allAD5FRRR2.2023Nov30.csv", 
      append = TRUE, fill = TRUE)
    
  }
}

In [None]:
#### Concordance of effects between the original and the sensitivity analyses
library(data.table)
setwd("/medpop/esp2/mesbah/projects/ch_progression/aric/epi/")

In [None]:
uni_ch1 <- fread("final_glm.univariable.incident_ch.2023Jul07.csv", header=T)
uni_ch1$out_expo <- paste(uni_ch1$Dataset, uni_ch1$Outcome, uni_ch1$Exposure, sep="_")
# format 
uni_ch1$P_val <- formatC(x = uni_ch1$P, digits = 1,format = "E")

# OR
uni_ch1$OR <- as.numeric(formatC(round(exp(uni_ch1$Beta),2), digits = 2, format = "f"))

uni_ch1$lSE <- ( uni_ch1$Beta - 1.96 * uni_ch1$SE)
uni_ch1$uSE <- ( uni_ch1$Beta + 1.96 * uni_ch1$SE)

# 95% CI
uni_ch1$CI95 <- paste0("[",formatC(round(exp( uni_ch1$Beta - 1.96 * uni_ch1$SE),2), digits = 2, format = "f"),
                                     ", ",
                                     formatC(round(exp( uni_ch1$Beta + 1.96 * uni_ch1$SE),2), digits = 2, format = "f"), 
                                     "]")

head(uni_ch1)
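
The odds ratio and its Wald 95% confidence interval are obtained by exponentiating the log-odds estimate and its bounds, Beta ± 1.96 × SE. The same calculation is repeated for each results table below, so it could be wrapped in a small helper. This is a sketch only: `or_ci` is a hypothetical function name and the inputs are made-up numbers:

```r
# Convert a log-odds coefficient and its standard error to an odds ratio
# with a Wald confidence interval
or_ci <- function(beta, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  data.frame(OR    = exp(beta),
             lower = exp(beta - z * se),
             upper = exp(beta + z * se))
}

res <- or_ci(beta = c(0.10, -0.25), se = c(0.05, 0.10))  # made-up inputs
res$CI95 <- sprintf("[%.2f, %.2f]", res$lower, res$upper)
print(res)
```

Keeping `OR`, `lower`, and `upper` numeric and formatting only the display string avoids plotting or merging on character columns later.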

In [None]:
# ch2: sensitivity analysis DPbase20VAFbase001 (baseline DP>=20 plus the baseline VAF cutoff)
uni_chDPbase20VAFbase001 <- fread("final_glm.univariable.incident_CH_DPbase20VAFbase001.2023Nov30.csv", header=T)
uni_chDPbase20VAFbase001$out_expo <- paste(uni_chDPbase20VAFbase001$Dataset,
                                           uni_chDPbase20VAFbase001$Outcome, 
                                           uni_chDPbase20VAFbase001$Exposure, sep="_")
# format 
uni_chDPbase20VAFbase001$P_val <- formatC(x = uni_chDPbase20VAFbase001$P, digits = 1,format = "E")

# OR
uni_chDPbase20VAFbase001$OR <- as.numeric(formatC(round(exp(uni_chDPbase20VAFbase001$Beta),2), digits = 2, format = "f"))

uni_chDPbase20VAFbase001$lSE <- ( uni_chDPbase20VAFbase001$Beta - 1.96 * uni_chDPbase20VAFbase001$SE)
uni_chDPbase20VAFbase001$uSE <- ( uni_chDPbase20VAFbase001$Beta + 1.96 * uni_chDPbase20VAFbase001$SE)

# 95% CI
uni_chDPbase20VAFbase001$CI95 <- paste0("[",formatC(round(exp( uni_chDPbase20VAFbase001$Beta - 1.96 * uni_chDPbase20VAFbase001$SE),2), digits = 2, format = "f"),
                                     ", ",
                                     formatC(round(exp( uni_chDPbase20VAFbase001$Beta + 1.96 * uni_chDPbase20VAFbase001$SE),2), digits = 2, format = "f"), 
                                     "]")

head(uni_chDPbase20VAFbase001)
#
# ch3: sensitivity analysis with the stringent allAD5FRRR2 filter
uni_challAD5FRRR2 <- fread("final_glm.univariable.incident_CH_DPbase20VAFbase001.allAD5FRRR2.2023Nov30.csv", header=T)
uni_challAD5FRRR2$out_expo <- paste(uni_challAD5FRRR2$Dataset, uni_challAD5FRRR2$Outcome, uni_challAD5FRRR2$Exposure, sep="_")
# format (P_val stays a display string, as in the two tables above)
uni_challAD5FRRR2$P_val <- formatC(x = uni_challAD5FRRR2$P, digits = 1,format = "E")

# OR (numeric, so it can be plotted against the other analyses)
uni_challAD5FRRR2$OR <- as.numeric(formatC(round(exp(uni_challAD5FRRR2$Beta),2), digits = 2, format = "f"))

uni_challAD5FRRR2$lSE <- ( uni_challAD5FRRR2$Beta - 1.96 * uni_challAD5FRRR2$SE)
uni_challAD5FRRR2$uSE <- ( uni_challAD5FRRR2$Beta + 1.96 * uni_challAD5FRRR2$SE)

# 95% CI
uni_challAD5FRRR2$CI95 <- paste0("[",formatC(round(exp( uni_challAD5FRRR2$Beta - 1.96 * uni_challAD5FRRR2$SE),2), digits = 2, format = "f"),
                                     ", ",
                                     formatC(round(exp( uni_challAD5FRRR2$Beta + 1.96 * uni_challAD5FRRR2$SE),2), digits = 2, format = "f"), 
                                     "]")

head(uni_challAD5FRRR2)

In [None]:
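univariable_inc_CH <- merge(uni_ch1, 
                            uni_chDPbase20VAFbase001[,c(4:13)], 
                            by="out_expo")

## After the first merge, shared columns carry ".x" (uni_ch1) and ".y" (DPbase20VAFbase001).
## The second merge adds the allAD5FRRR2 columns without a suffix.
univariable_inc_CH <- merge(univariable_inc_CH, 
                            uni_challAD5FRRR2[,c(4:13)], 
                            by="out_expo")

str(univariable_inc_CH)

In [None]:
## as.numeric() guards against OR columns stored as formatted strings
plot(as.numeric(univariable_inc_CH$OR), as.numeric(univariable_inc_CH$OR.x),
     xlab = "OR (allAD5FRRR2)", ylab = "OR (original analysis, 2023Jul07)")
plot(as.numeric(univariable_inc_CH$OR), as.numeric(univariable_inc_CH$OR.y),
     xlab = "OR (allAD5FRRR2)", ylab = "OR (DPbase20VAFbase001)")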
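
Beyond the scatter plots, concordance between two analyses can be summarised numerically, for example by the correlation of the log-odds estimates and the share of outcome/exposure pairs with the same direction of effect. A minimal sketch with made-up estimates (`beta_main` and `beta_sens` are illustrative vectors, not results from this notebook):

```r
# Made-up effect estimates from two analyses of the same outcome/exposure pairs
beta_main <- c(0.30, -0.10, 0.05, 0.42, -0.22)
beta_sens <- c(0.25, -0.15, -0.02, 0.40, -0.20)

# Pearson correlation of the log-odds estimates
r <- cor(beta_main, beta_sens)

# Fraction of pairs with the same direction of effect
same_sign <- mean(sign(beta_main) == sign(beta_sens))

cat("correlation:", round(r, 3), " sign concordance:", same_sign, "\n")
```

Working on the Beta scale rather than the OR scale keeps effects symmetric around zero, which makes the sign comparison straightforward.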