This notebook walks through the code needed to reproduce the results in Table 4. Those results apply WiSER to data from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) study to investigate factors associated with glycemic levels and glycemic variability, using fasting plasma glucose (FPG) as the outcome.

## Availability & Description

Due to confidentiality concerns, the ACCORD dataset is available only through the National Institutes of Health's (NIH) Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC). Researchers can apply through BioLINCC for access to download it.

The URL for the webpage is https://biolincc.nhlbi.nih.gov/studies/accord/ and the Study Accession identifier is HLB01041317a. This page describes the dataset and the study, and explains how to request access to the data. We cannot give more details on the data because of BioLINCC's data use agreement.


When run on BioLINCC's ACCORD data, the code in this notebook reproduces the results in the paper (Table 4).

In [3]:
versioninfo()

Julia Version 1.4.0
Commit b8e9a9ecc6 (2020-03-21 16:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9920X CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8


## Data Cleaning

Most of the data cleaning was done in R using the provided `accordScript.Rmd` R Markdown file. The remaining steps, mostly quality control, are done in Julia below.

In [None]:
# load packages & data
using DataFrames, CSV, WiSER, KNITRO, StatsBase, CategoricalArrays, Statistics
ENV["COLUMNS"]=1200

df = DataFrame!(CSV.File("AccordWiSER_oralmeds.csv", missingstring="NA"))

We want to use `g2avetid` (total injected insulin), but some of its values are missing. To decide whether a missing value can be filled with 0, we check the prescribed-insulin indicator `g2prscbin` in those rows. If the indicator is 0 or missing, no insulin was prescribed and the correct `g2avetid` is 0. If it is nonzero, insulin was prescribed but no dose was recorded, so we remove the row in case the record is erroneous.
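The decision rule above can be sketched on a few made-up rows. The vectors `avetid` and `prscbin` are hypothetical stand-ins for `g2avetid` and `g2prscbin`; this illustrates the rule only and is not the cleaning code itself.

```julia
# Hypothetical toy columns: `avetid` stands in for g2avetid (insulin dose) and
# `prscbin` for g2prscbin (prescribed-insulin indicator).
avetid  = [10.0, missing, missing, missing, 0.0]
prscbin = [1, 0, missing, 1, 0]

# A row is suspect when the dose is missing but insulin was prescribed
# (indicator present and nonzero); `&&` short-circuits before comparing a missing.
suspect = [ismissing(a) && !ismissing(p) && p != 0 for (a, p) in zip(avetid, prscbin)]

# Drop suspect rows, then treat the remaining missing doses as 0 (no insulin).
cleaned = coalesce.(avetid[.!suspect], 0.0)
```

Only the fourth row is flagged. With the real data, `suspect` plays the role of `deleteids` in the next cell, and the fill step corresponds to setting missing insulin values to 0 later in the cleaning cell.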

In [None]:
missing_g2avetid = findall(ismissing.(df.g2avetid))
# flag rows with missing g2avetid whose prescribed-insulin indicator is present and nonzero
deleteids = map(x -> !ismissing(x) && x != 0, df[missing_g2avetid, :g2prscbin])
deleterows = missing_g2avetid[deleteids] # 31 possible errors, so we delete them
delete!(df, deleterows)

In [None]:
# map visit codes to months since baseline; EXIT visits map to missing and are removed by dropmissing below
visitnumber = Dict("BLR"  => 0,
                   "F01"  => 1,
                   "F04"  => 4,
                   "F08"  => 8,
                   "F12"  => 12,
                   "F16"  => 16,
                   "F20"  => 20,
                   "F24"  => 24,
                   "F28"  => 28,
                   "F32"  => 32,
                   "F36"  => 36,
                   "F40"  => 40,
                   "F44"  => 44,
                   "F48"  => 48,
                   "F52"  => 52,
                   "F56"  => 56,
                   "F60"  => 60,
                   "F64"  => 64,
                   "F68"  => 68,
                   "F72"  => 72,
                   "F76"  => 76,
                   "F80"  => 80,
                   "F84"  => 84,
                   "EXIT" => missing)


fpg_analysisvars = [:MaskID, :Visit, :VisitNumber, :fpg, :female, :baseline_age,
    :raceclass, :bmi, :std_bmi, :cvd_hx_baseline, :std_age,
    :std_visit, :g2avetid, :std_g2avetid, :g2aveba, :g2avebol, :g2avepba,
     :sulphonylureas, :alpha_glucosidase_inhibitors, :incretin_mimetics,
    :metformin, :meglitinides, :thiazolidinediones, :antihyperglycemics,
    :DPP4_inhibitors, :Insulin_BMI, :insulin_wtkg]


df[!, :raceclass] = levels!(CategoricalArray(df[!, :raceclass]),
    ["White"; "Black"; "Hispanic"; "Other"])

df.VisitNumber = map(x -> get(visitnumber, x, 0), df.Visit)
df.bmi = df.wt_kg ./ (df.ht_cm ./ 100).^2
standardizes(x) = (x .- mean(skipmissing(x))) ./ std(skipmissing(x))
df[!, :std_bmi] = standardizes(df[!, :bmi])
df[!, :std_age] = standardizes(df[!, :baseline_age])
df[!, :std_visit] = standardizes(df[!, :VisitNumber])

# after the deletion above, a missing insulin value means no insulin was prescribed, so set it to 0
df.g2avetid = coalesce.(df.g2avetid, 0)
df.g2aveba = coalesce.(df.g2aveba, 0)
df.g2avebol = coalesce.(df.g2avebol, 0)
df.g2avepba = coalesce.(df.g2avepba, 0)

df[!, :std_g2avetid] = standardizes(df[!, :g2avetid])
df[!, :insulin_wtkg] = df[!, :g2avetid] ./ df.wt_kg
df[!, :Insulin_BMI] = df[!, :g2avetid] ./ df[!, :bmi]



fpg_df = select(df, fpg_analysisvars)
fpg_df = dropmissing(fpg_df, fpg_analysisvars)
CSV.write("accord_fpg_final_withmeds.csv", fpg_df) # optionally save this cleaned dataset
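The visit-code lookup and the `standardizes` helper in the cell above can be sketched on toy data. This version parses the month directly from the code instead of using a dictionary, assuming every follow-up code has the form `F<months>` (true of all codes listed above). `visit_month` and `zscore_skipmissing` are hypothetical names.

```julia
using Statistics

# Sketch, assuming follow-up codes look like "F04", "BLR" is baseline,
# and "EXIT" visits are excluded (mapped to missing).
function visit_month(code::AbstractString)
    code == "BLR"  && return 0
    code == "EXIT" && return missing
    startswith(code, "F") || error("unrecognized visit code: $code")
    return parse(Int, code[2:end])
end

# Same z-score as `standardizes`: centre and scale using non-missing values only;
# missing entries stay missing.
zscore_skipmissing(x) = (x .- mean(skipmissing(x))) ./ std(skipmissing(x))

codes  = ["BLR", "F04", "F12", "EXIT"]
months = visit_month.(codes)
z      = zscore_skipmissing(months)
```

One design difference: the dictionary lookup with default `0` silently treats an unexpected code as baseline, whereas this sketch raises an error, which may be preferable for quality control.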

## Analysis

The following produces the results reported in Table 4, the analysis of ACCORD data using WiSER.
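For orientation, the model WiSER fits is, as we understand it from the WiSER package documentation, a linear mixed model whose within-subject variance depends on covariates. For person $i$ at visit $j$,

$$
y_{ij} = \mathbf{x}_{ij}^T \boldsymbol{\beta} + \mathbf{z}_{ij}^T \boldsymbol{\gamma}_i + \epsilon_{ij}, \qquad j = 1, \ldots, n_i,
$$

$$
\sigma^2_{\epsilon_{ij}} = \exp\!\left(\mathbf{w}_{ij}^T \boldsymbol{\tau} + \boldsymbol{\ell}_{\boldsymbol{\gamma}\omega}^T \begin{bmatrix} \boldsymbol{\gamma}_i \\ \omega_i \end{bmatrix}\right),
$$

where $y_{ij}$ is FPG, $\boldsymbol{\gamma}_i$ are subject-level random effects, and $\omega_i$ is a subject-level random effect on the within-subject variance, with $(\boldsymbol{\gamma}_i, \omega_i)$ having mean zero. The three `@formula` arguments to `WSVarLmmModel` supply, in order, the mean covariates $\mathbf{x}_{ij}$, the random-effect covariates $\mathbf{z}_{ij}$, and the within-subject variance covariates $\mathbf{w}_{ij}$; the `β` and `τ` labels used when renaming the coefficients refer to $\boldsymbol{\beta}$ and $\boldsymbol{\tau}$.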

In [None]:
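fpg_df = DataFrame!(CSV.File("accord_fpg_final_withmeds.csv"));
# the CSV round trip loses the category order set during cleaning; restore it so White is the reference level
fpg_df[!, :raceclass] = levels!(CategoricalArray(fpg_df[!, :raceclass]), ["White"; "Black"; "Hispanic"; "Other"]);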


solver = KNITRO.KnitroSolver(outlev=0)
fpg_withmeds_insulinkg = WSVarLmmModel(
    # mean formula
    @formula(fpg ~ 1 + VisitNumber + bmi + female + baseline_age + raceclass +
        cvd_hx_baseline + insulin_wtkg + sulphonylureas + metformin +
        meglitinides + thiazolidinediones),
    # random-effects formula
    @formula(fpg ~ 1 + VisitNumber),
    # within-subject variance formula
    @formula(fpg ~ 1 + VisitNumber + bmi + female + baseline_age + raceclass +
        cvd_hx_baseline + insulin_wtkg + sulphonylureas + metformin +
        meglitinides + thiazolidinediones),
    :MaskID, fpg_df);
@time WiSER.fit!(fpg_withmeds_insulinkg, solver, parallel = false, runs = 8)

In [None]:
# replace default coefficient names with more descriptive names
# (the mean and within-subject variance formulas share the same covariates)
covariate_labels = [
    "Intercept"
    "Visit Number"
    "BMI"
    "Female"
    "Baseline Age"
    "Race: Black"
    "Race: Hispanic"
    "Race: Other"
    "CVD History"
    "Total Injected Insulin (units/kg body weight)"
    "Sulphonylureas"
    "Metformin"
    "Meglitinides"
    "Thiazolidinediones"]
mean_names  = ["β$i: " for i in 1:14] .* covariate_labels
wsvar_names = ["τ$i: " for i in 1:14] .* covariate_labels

fpg_withmeds_insulinkg.meannames .= mean_names
fpg_withmeds_insulinkg.wsvarnames .= wsvar_names
fpg_withmeds_insulinkg
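The race labels above (`Race: Black`, `Race: Hispanic`, `Race: Other`) presume White is the reference category. As far as we know, StatsModels (which provides `@formula`) uses treatment (dummy) coding by default, with the first level as the reference, and a plain string column gets sorted (alphabetical) levels. The hypothetical `dummy_code` helper below sketches treatment coding to show how level order decides which categories receive coefficients.

```julia
# Hypothetical helper illustrating treatment (dummy) coding: the first entry of
# `levels_in_order` is the reference and gets no column; every other level gets
# a 0/1 indicator column.
function dummy_code(x::AbstractVector, levels_in_order::AbstractVector)
    others = levels_in_order[2:end]
    X = [xi == lev ? 1 : 0 for xi in x, lev in others]  # rows: observations, columns: levels
    return X, others
end

race = ["White", "Black", "Other", "Hispanic", "White"]

# White first: indicator columns for Black, Hispanic, Other (matching the labels above).
X, cols = dummy_code(race, ["White", "Black", "Hispanic", "Other"])

# Alphabetical order: Black becomes the reference and White gets a column instead.
_, cols_sorted = dummy_code(race, sort(unique(race)))
```

If the levels were alphabetical, the three race coefficients would compare Hispanic, Other, and White against Black, and the labels above would be wrong.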

### Supplementary Table S.4

The following computes the summary statistics reported in Supplementary Table S.4.
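Most of the person-level statistics below first collapse the visit-level rows to one value per person (`first`, `mean`, `length`, or `maximum` within each `MaskID`) and then take the mean and standard deviation across people. A minimal Base-Julia sketch of that pattern on made-up data, with a hypothetical `per_person` helper standing in for `combine(groupby(...))`:

```julia
using Statistics

# Hypothetical toy data: one row per visit, `id` identifies the person.
id  = [1, 1, 1, 2, 3, 3]
fpg = [200.0, 180.0, 190.0, 120.0, 150.0, 160.0]

# Apply `f` to each person's values; results are ordered by sorted id.
function per_person(f, id, x)
    groups = Dict{eltype(id), Vector{eltype(x)}}()
    for (i, v) in zip(id, x)
        push!(get!(Vector{eltype(x)}, groups, i), v)
    end
    return [f(groups[k]) for k in sort(collect(keys(groups)))]
end

person_means = per_person(mean, id, fpg)    # [190.0, 120.0, 155.0]
visit_count  = per_person(length, id, fpg)  # [3, 1, 2]

mean(person_means), std(person_means)  # each person counts once: mean is 155.0
mean(fpg)                              # per-visit mean, about 166.67
```

The per-visit mean is higher here because the person with the highest FPG has the most visits; a per-person summary weights everyone equally.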

In [None]:
fpg_df = DataFrame!(CSV.File("accord_fpg_final_withmeds.csv"));
# descriptive summary statistics
describe(fpg_df, :mean, :std, :median, :min, :max, :nunique)

In [None]:
# age 
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :baseline_age => first)[!, 2]), 
std(combine(DataFrames.groupby(fpg_df, :MaskID), :baseline_age => first)[!, 2])

In [None]:
# bmi
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :bmi => mean)[!, 2]), 
std(combine(DataFrames.groupby(fpg_df, :MaskID), :bmi => mean)[!, 2])

In [None]:
# fasting plasma glucose 
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :fpg => mean)[!, 2]), 
std(combine(DataFrames.groupby(fpg_df, :MaskID), :fpg => mean)[!, 2]) # did people with higher FPG have more visits?

In [None]:
# oral meds
countmap(fpg_df[!, :sulphonylureas]), proportionmap(fpg_df[!, :sulphonylureas])

In [None]:
countmap(fpg_df[!, :metformin]), proportionmap(fpg_df[!, :metformin])

In [None]:
countmap(fpg_df[!, :thiazolidinediones]), proportionmap(fpg_df[!, :thiazolidinediones])

In [None]:
countmap(fpg_df[!, :meglitinides]), proportionmap(fpg_df[!, :meglitinides])

In [None]:
# number of visits
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :VisitNumber => length)[!, 2]),
std(combine(DataFrames.groupby(fpg_df, :MaskID), :VisitNumber => length)[!, 2])

In [None]:
# summary stats for maximum number of months of treatment for each person
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :VisitNumber => maximum)[!, 2]),
std(combine(DataFrames.groupby(fpg_df, :MaskID), :VisitNumber => maximum)[!, 2])

In [None]:
# Base CVD history
@show proportionmap(combine(DataFrames.groupby(fpg_df, :MaskID), :cvd_hx_baseline => first)[!, 2])
countmap(combine(DataFrames.groupby(fpg_df, :MaskID), :cvd_hx_baseline => first)[!, 2])

In [None]:
# Sex
@show proportionmap(combine(DataFrames.groupby(fpg_df, :MaskID), :female => first)[!, 2])
countmap(combine(DataFrames.groupby(fpg_df, :MaskID), :female => first)[!, 2])

In [None]:
# Race
countmap(combine(DataFrames.groupby(fpg_df, :MaskID), :raceclass => first)[!, 2]), 
proportionmap(combine(DataFrames.groupby(fpg_df, :MaskID), :raceclass => first)[!, 2])