# Action to Control Cardiovascular Risk in Diabetes (ACCORD)

This notebook walks through the code needed to reproduce the results in Table 4, which applies WiSER to data from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial to investigate factors related to glycemic level and glycemic variability, with fasting plasma glucose as the outcome.

#### Packages and Reproducibility

Julia makes analyses easy to reproduce. Because a `Manifest.toml` and `Project.toml` pair is included, running `] activate .` from this directory activates the environment with the dependency versions used in the analysis.
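
Outside the Pkg REPL mode, the same setup can be scripted with the `Pkg` API. The sketch below is not part of the original notebook, and `setup_environment` is a hypothetical helper name. On a fresh machine, activation alone does not install anything, so the sketch also calls `Pkg.instantiate()` to install the versions recorded in `Manifest.toml`.

```julia
using Pkg

# Hypothetical helper (not in the original notebook): activate the project in
# `dir` and install the dependency versions recorded in its Manifest.toml.
function setup_environment(dir::AbstractString = ".")
    Pkg.activate(dir)     # same effect as `] activate .` when dir == "."
    Pkg.instantiate()     # install the packages listed in Manifest.toml
    Pkg.status()          # print the resolved package versions
end
```

Calling `setup_environment()` from the directory that contains `Project.toml` would then prepare the environment before any `using` statements are run.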

In [1]:
]activate .

[32m[1m Activating[22m[39m environment at `~/WiSER_Reproduce/accord_glycemic_variation_analysis/Project.toml`


Note: We use the KNITRO solver in our analysis, which requires a KNITRO license. If you wish to run the analysis without it, you can use another solver, but the results will be slightly different. Commented-out code for an alternative solver is provided with the model fit below.

## Availability & Description

Due to confidentiality concerns, the ACCORD dataset is available only through the National Institutes of Health's (NIH) Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC). Researchers can apply for access to download this dataset through BioLINCC.

The URL for the webpage is https://biolincc.nhlbi.nih.gov/studies/accord/ and the Study Accession identifier is HLB01041317a. This page includes a description of the dataset, study, and details on how to request access to the data. We cannot give more details on the data due to BioLINCC's data use agreement. 

Due to data confidentiality concerns, we suppress the output of dataframes that show subject-level data.

When run on BioLINCC's ACCORD data, the code in this notebook reproduces the results in the paper (Table 4).

In [2]:
versioninfo()

Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9920X CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)


## Data Cleaning

Most of the data cleaning was done in R using the provided `accordScript.Rmd` R Markdown file. The remaining steps, mostly quality control, are done in Julia below.

In [3]:
# load packages & data
using DataFrames, CSV, WiSER, KNITRO, StatsBase
ENV["COLUMNS"]=1200

df = DataFrame!(CSV.File("AccordWiSER_oralmeds.csv", missingstring="NA"));

We want to analyze `g2avetid` (total insulin), but some of its values are missing. To decide whether they can be filled with 0, we look at prescribed insulin (`g2prscbin`) in the rows where `g2avetid` is missing. If prescribed insulin is 0 or missing, the correct `g2avetid` should be 0. If it is greater than 0, we drop the row, since the record may be erroneous.

In [4]:
# rows where total insulin (g2avetid) is missing
missing_g2avetid = findall(ismissing.(df.g2avetid))
# among those rows, flag the ones with nonmissing, nonzero prescribed insulin
deleteids = map(x -> !ismissing(x) && x != 0, df[missing_g2avetid, :g2prscbin])
deleterows = missing_g2avetid[deleteids] # 31 possible errors, so we delete them
delete!(df, deleterows);
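
The rule above can be checked on a small example. The sketch below uses plain vectors with invented values (not ACCORD data) instead of a dataframe, and it uses `coalesce` for the later fill-with-zero step. The variable names mirror the columns, but the code is only an illustration of the logic.

```julia
# Toy vectors with invented values (not ACCORD data).
g2avetid  = [12.0, missing, missing, missing, 30.0]   # total insulin dose
g2prscbin = [10.0, 0.0, missing, 25.0, 28.0]          # prescribed insulin

# Rows where the total insulin dose is missing.
missing_rows = findall(ismissing, g2avetid)

# Among those, flag rows whose prescribed insulin is present and nonzero.
suspect = [!ismissing(x) && x != 0 for x in g2prscbin[missing_rows]]
rows_to_drop = missing_rows[suspect]

# Drop the suspect rows; the remaining missing doses are treated as 0.
keep = setdiff(eachindex(g2avetid), rows_to_drop)
cleaned = coalesce.(g2avetid[keep], 0.0)

println("dropped rows: ", rows_to_drop)
println("cleaned doses: ", cleaned)
```

Only the fourth row is dropped here, because it is the only row with a missing total dose and a positive prescribed dose.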

In [5]:
# convert visit codes to months; exit visits are set to missing and dropped below
visitnumber = Dict("BLR" => 0,
                 "F01" => 1,
                 "F04" => 4,
                 "F08" => 8,
                 "F12" => 12,
                 "F16" => 16,
                 "F20" => 20,
                 "F24" => 24,
                 "F28" => 28,
                 "F32" => 32,
                 "F36" => 36,
                 "F40" => 40,
                 "F44" => 44,
                 "F48" => 48,
                 "F52" => 52,
                 "F56" => 56,
                 "F60" => 60,
                 "F64" => 64,
                 "F68" => 68,
                 "F72" => 72,
                 "F76" => 76,
                 "F80" => 80,
                 "F84" => 84,
                 "EXIT" => missing
)


fpg_analysisvars = [:MaskID, :Visit, :VisitNumber, :fpg, :female, :baseline_age,
    :raceclass, :bmi, :std_bmi, :cvd_hx_baseline, :std_age,
    :std_visit, :g2avetid, :std_g2avetid, :g2aveba, :g2avebol, :g2avepba,
     :sulphonylureas, :alpha_glucosidase_inhibitors, :incretin_mimetics,
    :metformin, :meglitinides, :thiazolidinediones, :antihyperglycemics,
    :DPP4_inhibitors, :Insulin_BMI, :insulin_wtkg]


df[!, :raceclass] = levels!(CategoricalArray(df[!, :raceclass]),
    ["White"; "Black"; "Hispanic"; "Other"])

df.VisitNumber = map(x -> get(visitnumber, x, 0), df.Visit)
df.bmi = df.wt_kg ./ (df.ht_cm ./ 100).^2
standardizes(x) = (x .- mean(skipmissing(x))) ./ std(skipmissing(x))
df[!, :std_bmi] = standardizes(df[!, :bmi])
df[!, :std_age] = standardizes(df[!, :baseline_age])
df[!, :std_visit] = standardizes(df[!, :VisitNumber])

#set to 0 if missing since it should be 0. 
df.g2avetid = map(x -> ismissing(x) ? 0 : x, df.g2avetid)
df.g2aveba = map(x -> ismissing(x) ? 0 : x, df.g2aveba)
df.g2avebol = map(x -> ismissing(x) ? 0 : x, df.g2avebol)
df.g2avepba = map(x -> ismissing(x) ? 0 : x, df.g2avepba)

df[!, :std_g2avetid] = standardizes(df[!, :g2avetid])
df[!, :insulin_wtkg] = df[!, :g2avetid] ./ df.wt_kg
df[!, :Insulin_BMI] = df[!, :g2avetid] ./ df[!, :bmi]



fpg_df = select(df, fpg_analysisvars)
fpg_df = dropmissing(fpg_df, fpg_analysisvars)
CSV.write("accord_fpg_final_withmeds.csv", fpg_df) # optionally save this cleaned dataset
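
To make the visit recoding and standardization steps concrete, here is a minimal sketch on toy values. It uses only a subset of the visit dictionary and looks codes up by direct indexing rather than `get` with a default, which is a simplification of the cell above. The `standardizes` helper is the same one-liner as in the notebook.

```julia
using Statistics  # mean, std

# Subset of the visit-code dictionary; "EXIT" has no scheduled month.
visit_months = Dict("BLR" => 0, "F01" => 1, "F04" => 4, "EXIT" => missing)

visits = ["BLR", "F01", "F04", "EXIT"]        # toy visit codes
months = [visit_months[v] for v in visits]    # 0, 1, 4, missing

# Same z-score helper as in the notebook: missing values are skipped when
# computing the mean and standard deviation and stay missing in the result.
standardizes(x) = (x .- mean(skipmissing(x))) ./ std(skipmissing(x))
z = standardizes(months)

# Rows with a missing analysis variable (here the EXIT visit) are dropped later.
complete = collect(skipmissing(z))
println(z)
```

After dropping the missing entry, the standardized values have mean 0 and standard deviation 1 by construction.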

## Analysis

The following produces the results reported in Table 4, the analysis of ACCORD data using WiSER.

The following constructs and fits the model. The `Optimization unsuccessful` warnings can be ignored: KNITRO's default convergence criterion is very stringent, and a `FeasibleApproximate` status indicates that the solution is adequate. Other nonlinear optimization solvers, such as Ipopt, will return an `Optimal` status.
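
For orientation, the three formulas passed to `WSVarLmmModel` correspond, in order, to the mean, the random effects, and the within-subject variance. The following is a sketch of the model as described in the WiSER.jl documentation; the notation follows that documentation and is not taken from this notebook.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
       + \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i
       + \epsilon_{ij},
\qquad
\operatorname{Var}(\epsilon_{ij} \mid \omega_i)
       = \exp\!\left(\mathbf{w}_{ij}^{\top}\boldsymbol{\tau} + \omega_i\right)
```

Here $i$ indexes subjects (`MaskID`) and $j$ indexes visits. In the model below, $\mathbf{x}_{ij}$ and $\mathbf{w}_{ij}$ contain the same 14 terms, and $\mathbf{z}_{ij}$ contains an intercept and `VisitNumber`. Because $\boldsymbol{\tau}$ acts on the log of the within-subject variance, a one-unit increase in a covariate multiplies that variance by $\exp(\tau_k)$ under this model, holding the other terms fixed.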

In [23]:
fpg_df = DataFrame!(CSV.File("accord_fpg_final_withmeds.csv"));

fpg_df[!, :raceclass] = levels!(CategoricalArray(fpg_df[!, :raceclass]),
    ["White"; "Black"; "Hispanic"; "Other"])

solver = KNITRO.KnitroSolver(outlev=0)
fpg_withmeds_insulinkg = WSVarLmmModel(@formula(fpg ~ 1 + VisitNumber + bmi + female + baseline_age + 
        raceclass + cvd_hx_baseline + insulin_wtkg +
        sulphonylureas +
        metformin + meglitinides + thiazolidinediones),
    @formula(fpg ~ 1 + VisitNumber), @formula(fpg ~ 1 + VisitNumber + bmi + female + baseline_age + 
         raceclass + cvd_hx_baseline + insulin_wtkg +
        sulphonylureas +
        metformin + meglitinides + thiazolidinediones), 
    :MaskID, fpg_df);
@time WiSER.fit!(fpg_withmeds_insulinkg, solver, parallel = false, runs = 8)

### IF NO KNITRO LICENSE, remove KNITRO from the `using` line, comment out
### `solver = KNITRO.KnitroSolver(outlev=0)` and the `fit!` call above, and run:
# using Ipopt
# solver = Ipopt.IpoptSolver(print_level=0, watchdog_shortened_iter_trigger=3, max_iter=100)
# @time WiSER.fit!(fpg_withmeds_insulinkg, solver, parallel = false, runs = 8)

run = 1, ‖Δβ‖ = 16.146207, ‖Δτ‖ = 0.703541, ‖ΔL‖ = 1.039015, status = Optimal, time(s) = 2.506867
run = 2, ‖Δβ‖ = 1.686140, ‖Δτ‖ = 0.054100, ‖ΔL‖ = 0.316476, status = Optimal, time(s) = 2.437670
run = 3, ‖Δβ‖ = 0.582325, ‖Δτ‖ = 0.024910, ‖ΔL‖ = 0.018916, status = FeasibleApproximate, time(s) = 6.045182
└ @ WiSER /home/cgerman/.julia/packages/WiSER/tXr2S/src/fit.jl:63
run = 4, ‖Δβ‖ = 0.095528, ‖Δτ‖ = 0.008882, ‖ΔL‖ = 0.018769, status = FeasibleApproximate, time(s) = 3.577827
└ @ WiSER /home/cgerman/.julia/packages/WiSER/tXr2S/src/fit.jl:63
run = 5, ‖Δβ‖ = 0.060079, ‖Δτ‖ = 0.001524, ‖ΔL‖ = 0.003220, status = FeasibleApproximate, time(s) = 4.123524
└ @ WiSER /home/cgerman/.julia/packages/WiSER/tXr2S/src/fit.jl:63
run = 6, ‖Δβ‖ = 0.025578, ‖Δτ‖ = 0.001776, ‖ΔL‖ = 0.002901, status = FeasibleApproximate, time(s) = 5.161650
└ @ WiSER /home/cgerman/.julia/packages/WiSER/tXr2S/src/fit.jl:63
run = 7, ‖Δβ‖ = 0.012787, ‖Δτ‖ = 0.000557, ‖ΔL‖ = 0.001256, status = FeasibleApproximate, time(s) = 4.274475
└ @ WiSER /home/cgerman/.julia/packages/WiSER/tXr2S/src/fit.jl:63
run = 8, ‖Δβ‖ = 0.006982, ‖Δτ‖ = 0.000413, ‖ΔL‖ = 0.000709, status = FeasibleApproximate, time(s) = 5.806255
└ @ WiSER /home/cgerman/.julia/packages/WiSER/tXr2S/src/fit.jl:63

34.222602 seconds (20.95 k allocations: 1.615 MiB)



Within-subject variance estimation by robust regression (WiSER)
Number of individuals/clusters: 10195
Total observations: 67063

Fixed-effects parameters:
────────────────────────────────────────────────────────────────────
                             Estimate   Std. Error       Z  Pr(>|Z|)
────────────────────────────────────────────────────────────────────
β1: (Intercept)          219.009       4.21204       52.00    <1e-99
β2: VisitNumber           -0.214364    0.0103507    -20.71    <1e-94
β3: bmi                   -0.0367907   0.0576003     -0.64    0.5230
β4: female                -1.39081     0.674363      -2.06    0.0392
β5: baseline_age          -0.747074    0.0505662    -14.77    <1e-48
β6: raceclass: Black      -8.54919     0.85301      -10.02    <1e-22
β7: raceclass: Hispanic   -2.26932     1.29648       -1.75    0.0801
β8: raceclass: Other      -1.26865     1.12104       -1.13    0.2578
β9: cvd_hx_baseline        0.963786    0.684931       1.41    0.1594
β10: insulin_wtk

In [24]:
# replace names with more descriptive names

# the mean and within-subject variance formulas share the same 14 terms
labels = [
    "Intercept"
    "Visit Number"
    "BMI"
    "Female"
    "Baseline Age"
    "Race: Black"
    "Race: Hispanic"
    "Race: Other"
    "CVD History"
    "Total Injected Insulin (units/kg body weight)"
    "Sulphonylureas"
    "Metformin"
    "Meglitinides"
    "Thiazolidinediones"]
mean_names = ["β$i: " for i in 1:14] .* labels
wsvar_names = ["τ$i: " for i in 1:14] .* labels

fpg_withmeds_insulinkg.meannames .= mean_names
fpg_withmeds_insulinkg.wsvarnames .= wsvar_names
fpg_withmeds_insulinkg


Within-subject variance estimation by robust regression (WiSER)
Number of individuals/clusters: 10195
Total observations: 67063

Fixed-effects parameters:
───────────────────────────────────────────────────────────────────────────────────────────────
                                                        Estimate   Std. Error       Z  Pr(>|Z|)
───────────────────────────────────────────────────────────────────────────────────────────────
β1: Intercept                                       219.009       4.21204       52.00    <1e-99
β2: Visit Number                                     -0.214364    0.0103507    -20.71    <1e-94
β3: BMI                                              -0.0367907   0.0576003     -0.64    0.5230
β4: Female                                           -1.39081     0.674363      -2.06    0.0392
β5: Baseline Age                                     -0.747074    0.0505662    -14.77    <1e-48
β6: Race: Black                                      -8.54919     0.85301   
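
The `Z` column is the estimate divided by its standard error, which can be checked by hand. A minimal sketch using the `β4: Female` row above follows. The 1.96 multiplier for an approximate 95% interval assumes a normal reference distribution, and the interval itself is not part of the original output.

```julia
# Values copied from the "β4: Female" row of the fitted model table.
estimate = -1.39081
se       = 0.674363

z  = estimate / se                                    # Wald statistic
ci = (estimate - 1.96 * se, estimate + 1.96 * se)     # approximate 95% interval

println("z = ", round(z, digits = 2))
println("approximate 95% CI = ", round.(ci, digits = 2))
```

The rounded statistic agrees with the `-2.06` shown in the table, and the interval excludes zero, in line with the reported p-value of 0.0392.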

#### Supplementary Table S.4

The following code computes the summary statistics reported in Supplementary Table S.4.

In [9]:
fpg_df = DataFrame!(CSV.File("accord_fpg_final_withmeds.csv"));
# descriptive summary statistics
describe(fpg_df, :mean, :std, :median, :min, :max, :nunique)

| | variable | mean | std | median | min | max | nunique |
|---|---|---|---|---|---|---|---|
| | Symbol | Union… | Union… | Union… | Any | Any | Union… |
| 1 | MaskID | 105058.0 | 2951.15 | 105032.0 | 100001 | 110251 | |
| 2 | Visit | | | | BLR | F84 | 22.0 |
| 3 | VisitNumber | 21.1016 | 20.2368 | 12.0 | 0 | 84 | |
| 4 | fpg | 142.153 | 53.2534 | 134.0 | 17.0 | 620.0 | |
| 5 | female | 0.38182 | 0.485836 | 0.0 | 0 | 1 | |
| 6 | baseline_age | 62.7023 | 6.55255 | 61.9 | 44.4 | 79.3 | |
| 7 | raceclass | | | | Black | White | 4.0 |
| 8 | bmi | 32.679 | 5.75145 | 32.1693 | 16.7308 | 59.7833 | |
| 9 | std_bmi | -0.0158996 | 0.985565 | -0.103242 | -2.74878 | 4.62868 | |
| 10 | cvd_hx_baseline | 0.341858 | 0.474336 | 0.0 | 0 | 1 | |


In [10]:
# age 
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :baseline_age => first)[!, 2]), 
std(combine(DataFrames.groupby(fpg_df, :MaskID), :baseline_age => first)[!, 2])

(62.75461500735655, 6.636880021061068)

In [11]:
# bmi
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :bmi => mean)[!, 2]), 
std(combine(DataFrames.groupby(fpg_df, :MaskID), :bmi => mean)[!, 2])

(32.66151849598058, 5.585395294277677)

In [12]:
# fasting plasma glucose 
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :fpg => mean)[!, 2]), 
std(combine(DataFrames.groupby(fpg_df, :MaskID), :fpg => mean)[!, 2]) #people w higher plasma came more often? 

(143.75401007907138, 34.865447717327825)
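
The subject-level mean above (143.754) differs from the pooled mean in the `describe` output (142.153) because subjects contribute different numbers of visits. The sketch below illustrates the distinction with invented values and plain vectors. It is not the notebook's `combine`/`groupby` code, only the same two-stage averaging written out.

```julia
using Statistics

# Toy long-format data (invented): subject 1 has four visits, subject 2 has one.
ids = [1, 1, 1, 1, 2]
fpg = [100.0, 110.0, 90.0, 100.0, 200.0]

# Pooled mean: every observation has equal weight.
pooled_mean = mean(fpg)

# Subject-level mean: average within each subject first, then across subjects.
subject_means = [mean(fpg[ids .== id]) for id in unique(ids)]
mean_of_subject_means = mean(subject_means)

println("pooled: ", pooled_mean, ", subject-level: ", mean_of_subject_means)
```

In this toy case the subject with the higher values has fewer observations, so the pooled mean is lower than the mean of subject means, which is the same direction as in the ACCORD summaries.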

In [13]:
# oral meds
countmap(fpg_df[!, :sulphonylureas]), proportionmap(fpg_df[!, :sulphonylureas])

(Dict(0 => 28548,1 => 38515), Dict(0 => 0.42568927724676797,1 => 0.574310722753232))

In [14]:
countmap(fpg_df[!, :metformin]), proportionmap(fpg_df[!, :metformin])

(Dict(0 => 12842,1 => 54221), Dict(0 => 0.19149158254178905,1 => 0.8085084174582109))

In [15]:
countmap(fpg_df[!, :thiazolidinediones]), proportionmap(fpg_df[!, :thiazolidinediones])

(Dict(0 => 31400,1 => 35663), Dict(0 => 0.46821645318581034,1 => 0.5317835468141896))

In [16]:
countmap(fpg_df[!, :meglitinides]), proportionmap(fpg_df[!, :meglitinides])

(Dict(0 => 56879,1 => 10184), Dict(0 => 0.8481427911068696,1 => 0.15185720889313034))
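
Note that these medication counts are taken over visit-level observations, not subjects: each pair of counts sums to the 67063 total observations. As a rough illustration of what `countmap` and `proportionmap` compute, here is a Base-only sketch. The helpers `counts_by_value` and `proportions_by_value` are hypothetical stand-ins written for this example and are not StatsBase functions.

```julia
# Hypothetical Base-only stand-ins for StatsBase.countmap / proportionmap.
function counts_by_value(x)
    counts = Dict{eltype(x), Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1   # tally each distinct value
    end
    return counts
end

proportions_by_value(x) = Dict(k => c / length(x) for (k, c) in counts_by_value(x))

on_metformin = [1, 1, 0, 1]                 # toy indicator, one entry per visit
println(counts_by_value(on_metformin))
println(proportions_by_value(on_metformin))

# The sulphonylureas counts reported above sum to the number of observations.
n_obs = 28548 + 38515
println(n_obs)
```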

In [17]:
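# number of visits per subject
nvisits = combine(DataFrames.groupby(fpg_df, :MaskID), :VisitNumber => length)[!, 2]
# mean and standard deviation of the number of visits per subject
nvisits_mean, nvisits_std = mean(nvisits), std(nvisits)
nvisits # display the per-subject visit counts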

10195-element Array{Int64,1}:
  5
 10
  5
  4
  7
  7
  8
  7
 10
  2
  6
  7
  7
  ⋮
  7
  6
  4
  7
  8
  9
  7
  7
  7
  5
  1
  7

In [18]:
# summary stats for maximum number of months of treatment for each person
mean(combine(DataFrames.groupby(fpg_df, :MaskID), :VisitNumber => maximum)[!, 2]),
std(combine(DataFrames.groupby(fpg_df, :MaskID), :VisitNumber => maximum)[!, 2])

(47.452280529671405, 18.779508947137924)

In [19]:
# Base CVD history
@show proportionmap(combine(DataFrames.groupby(fpg_df, :MaskID), :cvd_hx_baseline => first)[!, 2])
countmap(combine(DataFrames.groupby(fpg_df, :MaskID), :cvd_hx_baseline => first)[!, 2])

proportionmap((combine(DataFrames.groupby(fpg_df, :MaskID), :cvd_hx_baseline => first))[!, 2]) = Dict(0 => 0.6485532123589995,1 => 0.3514467876410005)


Dict{Int64,Int64} with 2 entries:
  0 => 6612
  1 => 3583

In [20]:
# Sex
@show proportionmap(combine(DataFrames.groupby(fpg_df, :MaskID), :female => first)[!, 2])
countmap(combine(DataFrames.groupby(fpg_df, :MaskID), :female => first)[!, 2])

proportionmap((combine(DataFrames.groupby(fpg_df, :MaskID), :female => first))[!, 2]) = Dict(0 => 0.6142226581657675,1 => 0.38577734183423246)


Dict{Int64,Int64} with 2 entries:
  0 => 6262
  1 => 3933

In [21]:
# Race
countmap(combine(DataFrames.groupby(fpg_df, :MaskID), :raceclass => first)[!, 2]), 
proportionmap(combine(DataFrames.groupby(fpg_df, :MaskID), :raceclass => first)[!, 2])

(Dict("Black" => 1946,"Other" => 1165,"White" => 6351,"Hispanic" => 733), Dict("Black" => 0.19087788131436978,"Other" => 0.114271701814615,"White" => 0.622952427660618,"Hispanic" => 0.07189798921039725))