# NHANES Medical Conditions: Real Data Example

In this notebook we use the "Questionnaire data" from the [NHANES 2017-2020 datasets](https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Questionnaire&Cycle=2017-2020).

We compare estimates from a random intercept model with a Bernoulli base distribution, fit with QuasiCopula.jl, against those from MixedModels.jl, using the "Medical Conditions" questionnaire data. The data dictionary can be found [here](https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/P_MCQ.htm#).
    
 - We cluster by the ID variable `SEQN` and control for weight in kg.

 - Each outcome vector has length 5 and holds the following indicators:
    1. "MCQ010": Ever been told you have asthma {1 = Yes, 0 = No}
    2. "MCQ080": Doctor ever said you were overweight {1 = Yes, 0 = No}
    3. "MCQ160A": Doctor ever said you had arthritis {1 = Yes, 0 = No}
    4. "MCQ160P": Ever told you had COPD, emphysema, ChB {1 = Yes, 0 = No}
    5. "MCQ371B": Are you now increasing exercise {1 = Yes, 0 = No}

In [1]:
versioninfo()

Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8


In [2]:
using QuasiCopula, CSV, LinearAlgebra, DataFrames, GLM, MixedModels

In [3]:
BLAS.set_num_threads(1)
Threads.nthreads()

8

# Read in the dataset

In [4]:
df = CSV.read("nhanes_medicalconditions_random_5.csv", DataFrame)
df[!, :SEQN] .= string.(df[!, :SEQN])
df

| Row | Column1 | SEQN | Condition | Value | value | weight |
|---|---|---|---|---|---|---|
| | Int64 | String | String | Int64 | String | Float64 |
| 1 | 1 | 109266 | MCQ010 | 0 | BMXWT | 97.1 |
| 2 | 2 | 109266 | MCQ080 | 1 | BMXWT | 97.1 |
| 3 | 3 | 109266 | MCQ160A | 0 | BMXWT | 97.1 |
| 4 | 4 | 109266 | MCQ160P | 0 | BMXWT | 97.1 |
| 5 | 5 | 109266 | MCQ371B | 1 | BMXWT | 97.1 |
| 6 | 6 | 109271 | MCQ010 | 1 | BMXWT | 98.8 |
| 7 | 7 | 109271 | MCQ080 | 0 | BMXWT | 98.8 |
| 8 | 8 | 109271 | MCQ160A | 1 | BMXWT | 98.8 |
| 9 | 9 | 109271 | MCQ160P | 1 | BMXWT | 98.8 |
| 10 | 10 | 109271 | MCQ371B | 1 | BMXWT | 98.8 |
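
The data are in long format: one row per participant and condition, with the weight repeated on every row of a cluster. Before fitting, it can help to confirm that structure. The following is a minimal sketch on a toy `DataFrame` that mimics the two participants shown above; the names `toy`, `cluster_sizes`, and `weight_spread` are illustrative and do not appear in the original notebook.

```julia
using DataFrames

# Toy long-format data mimicking the first two participants shown above.
toy = DataFrame(
    SEQN = repeat(["109266", "109271"], inner = 5),
    Condition = repeat(["MCQ010", "MCQ080", "MCQ160A", "MCQ160P", "MCQ371B"], outer = 2),
    Value = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1],
    weight = repeat([97.1, 98.8], inner = 5),
)

# Count rows per cluster; every participant should contribute exactly 5 outcomes.
cluster_sizes = combine(groupby(toy, :SEQN), nrow => :n)
all_size_five = all(cluster_sizes.n .== 5)

# The covariate should be constant within a cluster (one weight per participant).
weight_spread = combine(groupby(toy, :SEQN),
                        :weight => (w -> length(unique(w))) => :n_weights)
weight_constant = all(weight_spread.n_weights .== 1)

# Outcomes must be coded 0/1 for a Bernoulli base distribution.
binary_outcomes = all(in((0, 1)), toy.Value)
```

Running the same three checks on the full `df` should be consistent with the "cluster size min, max: 5, 5" line reported in the model summary.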


In [9]:
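y = :Value              # binary response column
grouping = :SEQN        # cluster (participant) identifier
covariates = [:weight]  # covariate; the summary below reports 2 fixed effects (intercept + weight)

d = Bernoulli()         # base distribution
link = LogitLink()      # link function
model = VC_model(df, y, grouping, covariates, d, link)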

Quasi-Copula Variance Component Model
  * base distribution: Bernoulli
  * link function: LogitLink
  * number of clusters: 8402
  * cluster size min, max: 5, 5
  * number of variance components: 1
  * number of fixed effects: 2

In [10]:
@time QuasiCopula.fit!(model)

initializing β using Newton's Algorithm under Independence Assumption
gcm.β = [-2.116541325648103, 0.015478924283567065]
initializing variance components using MM-Algorithm
gcm.θ = [0.01141672501325964]
Total number of variables............................:        3
                     variables with only lower bounds:        1
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0


Number of Iterations....: 16

                                   (scaled)                 (unscaled)
Objective...............:   4.1507467478198083e+02    2.5408247200435046e+04
Dual infeasibility......:   2.65816281418

-25408.247200435046

In [11]:
@show model.β
@show model.θ;

model.β = [-2.1498197674367825, 0.01573587485174485]
model.θ = [0.012480931117962406]
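
To read the fixed effects on a more familiar scale, the logit link can be inverted by hand. This is a minimal sketch, assuming that β describes the mean of the Bernoulli base distribution and that a single β is shared by all five conditions, so the probability is an average over conditions rather than condition-specific. The helper names `invlogit` and `prob_yes` are illustrative and are not part of QuasiCopula.jl.

```julia
# Estimates copied from the QuasiCopula.jl output above.
β = [-2.1498197674367825, 0.01573587485174485]

# Inverse logit link: maps a linear predictor to a probability.
invlogit(η) = 1 / (1 + exp(-η))

# Base-distribution probability of a "Yes" response at a given body weight (kg).
prob_yes(weight_kg) = invlogit(β[1] + β[2] * weight_kg)

# Multiplicative change in the odds of "Yes" per additional kilogram.
odds_ratio_per_kg = exp(β[2])

# Weight of participant 109266 in the table above.
p_97 = prob_yes(97.1)
```

A positive `β[2]` means the odds of answering "Yes" rise with body weight, by roughly 1.6% per kilogram under these estimates.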


### Fit using MixedModels.jl

Now we fit the corresponding random intercept logistic model using MixedModels.jl with 25 adaptive Gauss-Hermite quadrature points (`nAGQ = 25`).
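
For reference, the formula `Value ~ 1 + weight + (1|SEQN)` with a Bernoulli response and logit link corresponds to the standard random intercept logistic model. The indices $i$, $j$ and the symbols $b_i$, $\sigma^2$ are notation introduced here for illustration and are not taken from the notebook:

$$
\operatorname{logit} P(y_{ij} = 1 \mid b_i) = \beta_0 + \beta_1 \, \text{weight}_i + b_i, \qquad b_i \sim N(0, \sigma^2),
$$

where $i$ indexes participants (`SEQN`) and $j = 1, \dots, 5$ indexes the conditions. In the cell that follows, `GLMM_β` estimates $(\beta_0, \beta_1)$ and `GLMM_θ`, computed as the squared standard deviation of the random intercept, estimates $\sigma^2$.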

In [12]:
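# Random intercept per participant (SEQN), fixed effects for intercept and weight.
glmm_formula = @formula(Value ~ 1 + weight + (1|SEQN));
mdl = GeneralizedLinearMixedModel(glmm_formula, df, d, link)
@time MixedModels.fit!(mdl; nAGQ = 25);
GLMM_β = mdl.beta
GLMM_θ = mdl.σs[1][1]^2  # square the random-intercept standard deviation to get a variance
@show GLMM_β
@show GLMM_θ;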

Minimizing 94 	 Time: 0:00:03 (42.18 ms/it)


  3.976660 seconds (35.06 k allocations: 1.790 MiB)
GLMM_β = [-2.147465147943177, 0.01570120614446389]
GLMM_θ = 0.06418738295383387
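
The two sets of estimates can be put side by side numerically. This is a minimal sketch using the values printed above; the variable names `qc_β`, `glmm_β`, `rel_diff_β`, and `θ_ratio` are illustrative only.

```julia
# Estimates copied from the two fits above.
qc_β   = [-2.1498197674367825, 0.01573587485174485]   # QuasiCopula.jl
glmm_β = [-2.147465147943177, 0.01570120614446389]    # MixedModels.jl
qc_θ, glmm_θ = 0.012480931117962406, 0.06418738295383387

# Relative difference of each fixed effect, measured against the GLMM estimate.
rel_diff_β = abs.(qc_β .- glmm_β) ./ abs.(glmm_β)

# Ratio of the two variance-type parameters.
θ_ratio = glmm_θ / qc_θ
```

Both entries of `rel_diff_β` fall below 0.003, while `θ_ratio` is a little above 5, so the agreement is in the fixed effects rather than in the variance parameters.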


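# Takeaways

 - The fixed-effect estimates agree closely: β = [-2.1498, 0.01574] from QuasiCopula.jl vs. [-2.1475, 0.01570] from MixedModels.jl, a relative difference below 0.3% for both coefficients.
 - The variance parameter estimates differ (θ ≈ 0.0125 from QuasiCopula.jl vs. 0.0642 from MixedModels.jl). The two models likely parameterize within-cluster dependence differently, so these values are not necessarily on the same scale.
 - QuasiCopula.jl fits faster than MixedModels.jl, which took about 3.98 seconds with `nAGQ = 25`.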