# NHEFS: Bivariate Count Real Data Example

In this notebook, I use data from the [National Health and Nutrition Examination Survey I (NHANES I) Epidemiologic Follow-up Study (NHEFS)](https://wwwn.cdc.gov/nchs/nhanes/nhefs/).

The data are available in the R package [causaldata](https://cran.r-project.org/web/packages/causaldata/causaldata.pdf).

We will compare the estimates from a random intercept model with a Poisson base distribution fitted with QuasiCopula.jl against those from MixedModels.jl.

- GROUPING: We will cluster by the ID variable (seqn).
- COVARIATES: Average price of tobacco in the state of residence (price), which takes a separate value for each of the two measurements of a subject.
- OUTCOMES: Each outcome vector is a bivariate vector of the following counts:
    1. Number of cigarettes smoked per day in 1971
    2. Number of cigarettes smoked per day in 1982
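The data file used below is in long format: two consecutive rows per `seqn`, one per outcome. As a sketch of how such a layout relates to a one-row-per-subject (wide) layout, the plain-Julia snippet below stacks two hypothetical wide columns per variable (`cigs71`, `cigs82`, `price71`, `price82` are assumed names, not columns of the CSV). The values are the first two subjects of the table below, assuming the first row of each pair is 1971 and the second is 1982.

```julia
# Hypothetical wide layout: one entry per subject (column names are assumptions)
seqn    = ["233", "235"]
cigs71  = [30, 20]              # cigarettes per day in 1971
cigs82  = [20, 10]              # cigarettes per day in 1982
price71 = [2.18359, 2.34668]
price82 = [1.73999, 1.79736]

# Interleave two per-subject vectors: a1, b1, a2, b2, ...
# hcat gives a subjects × 2 matrix; permutedims makes it 2 × subjects,
# and vec reads it column by column.
interleave(a, b) = vec(permutedims(hcat(a, b)))

# Long layout: two consecutive rows per subject, 1971 first
long_seqn  = repeat(seqn, inner = 2)
long_count = interleave(cigs71, cigs82)
long_price = interleave(price71, price82)

for i in eachindex(long_seqn)
    println(long_seqn[i], "  ", long_count[i], "  ", long_price[i])
end
```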

In [1]:
versioninfo()

Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8


In [2]:
using QuasiCopula, CSV, LinearAlgebra, DataFrames, GLM, MixedModels



In [3]:
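BLAS.set_num_threads(1)   # single-threaded BLAS, to avoid oversubscribing cores alongside Julia threads
Threads.nthreads()        # number of Julia threads (set via JULIA_NUM_THREADS)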

8

### Read in the dataset

In [4]:
df = CSV.read("nhefs_bivariate_count.csv", DataFrame)
# convert the subject ID to String so it is handled as a categorical label rather than a number
df[!, :seqn] .= string.(df[!, :seqn])
df

| Row | Column1 (Int64) | seqn (String) | count (Int64) | price (Float64) |
|---|---|---|---|---|
| 1 | 1 | 233 | 30 | 2.18359 |
| 2 | 2 | 233 | 20 | 1.73999 |
| 3 | 3 | 235 | 20 | 2.34668 |
| 4 | 4 | 235 | 10 | 1.79736 |
| 5 | 5 | 244 | 20 | 1.56958 |
| 6 | 6 | 244 | 6 | 1.51343 |
| 7 | 7 | 245 | 3 | 1.50659 |
| 8 | 8 | 245 | 7 | 1.4519 |
| 9 | 9 | 252 | 20 | 2.34668 |
| 10 | 10 | 252 | 20 | 1.79736 |
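The model summary further down reports that every cluster has exactly two observations. A quick way to check cluster sizes, sketched in plain Julia without DataFrames: count rows per `seqn` with a `Dict`. It is run here on the `seqn` values of the 10 displayed rows only; `cluster_sizes` is a hypothetical helper, not part of QuasiCopula.jl.

```julia
# Count how many rows belong to each cluster ID
function cluster_sizes(ids::AbstractVector)
    sizes = Dict{eltype(ids), Int}()
    for id in ids
        sizes[id] = get(sizes, id, 0) + 1
    end
    return sizes
end

# seqn values of the 10 rows displayed above
seqn = ["233", "233", "235", "235", "244", "244", "245", "245", "252", "252"]
sizes = cluster_sizes(seqn)
println("clusters: ", length(sizes),
        ", min size: ", minimum(values(sizes)),
        ", max size: ", maximum(values(sizes)))
```

Applied to the full column (`cluster_sizes(df.seqn)`), this should agree with the 1537 clusters of size 2 reported by `VC_model`.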


### Form the random intercept model and fit it using QuasiCopula.jl

In [9]:
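y = :count             # response: cigarettes smoked per day
grouping = :seqn       # cluster (subject) identifier
covariates = [:price]  # fixed-effect covariates; an intercept is included as well

d = Poisson()          # base distribution
link = LogLink()
model = VC_model(df, y, grouping, covariates, d, link; penalized = true)  # L2 ridge penalty on variance components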

Quasi-Copula Variance Component Model
  * base distribution: Poisson
  * link function: LogLink
  * number of clusters: 1537
  * cluster size min, max: 2, 2
  * number of variance components: 1
  * number of fixed effects: 2
  * L2 ridge penalty on variance components: true
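To give some intuition for what θ controls, here is a sketch of a quasi-copula log-density for a single bivariate cluster. It is written from the form `f(y) = ∏ f_j(y_j) · (1 + ½ rᵀΓr) / (1 + ½ tr Γ)`, with Poisson marginals `f_j`, standardized residuals `r_j = (y_j − μ_j)/√μ_j`, and the assumption that a random intercept corresponds to `Γ = θ·11ᵀ` (one variance component). This is an illustration under those assumptions, not QuasiCopula.jl's internal code, and the ridge penalty from `penalized = true` is left out. Because the residuals have mean 0 and variance 1 under independence, the correction term averages to `1 + ½ tr Γ`, so the density still sums to 1.

```julia
using LinearAlgebra

# log(y!) computed as a sum of logs (avoids a SpecialFunctions dependency)
function logfactorial(y::Integer)
    s = 0.0
    for k in 2:y
        s += log(k)
    end
    return s
end

# Poisson log-pmf with mean μ
poisson_logpdf(y::Integer, μ::Real) = y * log(μ) - μ - logfactorial(y)

# Quasi-copula log-density of one cluster, ASSUMING Γ = θ * ones(n, n)
# and Poisson marginals (variance equals mean). No ridge penalty included.
function qc_logdensity(y::AbstractVector{<:Integer}, μ::AbstractVector{<:Real}, θ::Real)
    n = length(y)
    Γ = θ .* ones(n, n)
    r = (y .- μ) ./ sqrt.(μ)                # standardized residuals
    logindep = sum(poisson_logpdf.(y, μ))   # independence log-likelihood
    return logindep + log(1 + 0.5 * dot(r, Γ * r)) - log(1 + 0.5 * tr(Γ))
end

# Example: subject 233 at the fitted values printed further down (model.β, model.θ)
β = [2.0440750847817437, 0.4299903917427388]
θ = 0.5299623044726194
price = [2.18359, 1.73999]
μ = exp.(β[1] .+ β[2] .* price)
println(qc_logdensity([30, 20], μ, θ))
```

With θ = 0 the correction term vanishes and the expression reduces to the independence Poisson log-likelihood.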

In [10]:
@time QuasiCopula.fit!(model)

initializing β using Newton's Algorithm under Independence Assumption
gcm.β = [2.1159279480362807, 0.39787703892248655]
initializing variance components using MM-Algorithm
gcm.θ = [5.569804005046703]
Total number of variables............................:        3
                     variables with only lower bounds:        1
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0


Number of Iterations....: 18

                                   (scaled)                 (unscaled)
Objective...............:   2.5561938641713874e+02    2.1883035067236062e+04
Dual infeasibility......:   1.02879226974178

-21883.035067236062
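The log above starts by initializing β with Newton's algorithm under an independence assumption, i.e. an ordinary Poisson regression that ignores the clustering. A minimal sketch of Newton's method for a log-link Poisson regression in plain Julia. This is a generic textbook version, not QuasiCopula.jl's code. It is run on the 10 displayed rows only, so its estimates are not expected to match `gcm.β` above.

```julia
using LinearAlgebra

# Newton's method for Poisson regression with a log link (independence model):
#   μ = exp.(Xβ),  score g = X'(y - μ),  negative Hessian H = X' diag(μ) X
# For the canonical log link this coincides with Fisher scoring / IRLS.
function poisson_newton(X::AbstractMatrix, y::AbstractVector; maxiter = 50, tol = 1e-10)
    β = zeros(size(X, 2))
    β[1] = log(sum(y) / length(y))   # start at the intercept-only fit (column 1 = intercept)
    for _ in 1:maxiter
        μ = exp.(X * β)
        g = X' * (y .- μ)
        H = X' * (μ .* X)            # μ .* X scales row i of X by μ[i]
        step = H \ g
        β += step
        norm(step) < tol && break
    end
    return β
end

# The 10 rows displayed in the data table above
cigs  = [30, 20, 20, 10, 20, 6, 3, 7, 20, 20]
price = [2.18359, 1.73999, 2.34668, 1.79736, 1.56958, 1.51343, 1.50659, 1.4519, 2.34668, 1.79736]
X = hcat(ones(length(price)), price)   # intercept + price
βhat = poisson_newton(X, cigs)
println("β under independence (10 rows only): ", βhat)
```

At convergence the score `X'(y − μ)` is zero. In particular, the fitted means sum to the observed total because the model has an intercept.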

In [11]:
@show model.β
@show model.θ;

model.β = [2.0440750847817437, 0.4299903917427388]
model.θ = [0.5299623044726194]
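The fixed effects enter through the log link, `μ = exp(β₀ + β₁·price)`, so `exp(β₁)` is the multiplicative change in μ for a one-unit increase in `price`. A small sketch using the estimates printed above. The prices 1.5 and 2.5 are arbitrary illustration points.

```julia
# model.β printed above (intercept, price)
β = [2.0440750847817437, 0.4299903917427388]

# Mean parameter under the log link at a given price
mean_count(price) = exp(β[1] + β[2] * price)

# Multiplicative change in the mean for a one-unit increase in price
rate_ratio = exp(β[2])

println("rate ratio per unit of price: ", rate_ratio)
println("μ at price 1.5: ", mean_count(1.5))
println("μ at price 2.5: ", mean_count(2.5))
```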


### Fit using MixedModels.jl

Now we fit the corresponding Poisson random intercept model with MixedModels.jl, using adaptive Gauss-Hermite quadrature with 25 points (`nAGQ = 25`).
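Quadrature is needed because each subject's GLMM likelihood integrates the Poisson likelihood over the random intercept `b ~ N(0, σ²)`, and that integral has no closed form. The sketch below uses plain (non-adaptive) Gauss-Hermite quadrature for one cluster, with nodes and weights from the Golub-Welsch eigenvalue method. MixedModels.jl uses an adaptive variant that recenters and rescales the nodes around each cluster's conditional mode, so this shows the idea rather than its implementation. The β and σ² values are the GLMM estimates printed below.

```julia
using LinearAlgebra

# Gauss-Hermite rule for E[f(Z)], Z ~ N(0, 1), via Golub-Welsch: nodes are the
# eigenvalues of the Jacobi matrix of the probabilists' Hermite polynomials;
# weights are the squared first components of the normalized eigenvectors.
function gauss_hermite_normal(n::Integer)
    J = SymTridiagonal(zeros(n), sqrt.(1.0:(n - 1)))
    F = eigen(J)
    return F.values, F.vectors[1, :] .^ 2
end

function logfactorial(y::Integer)
    s = 0.0
    for k in 2:y
        s += log(k)
    end
    return s
end

# Marginal likelihood of one cluster under a Poisson random-intercept model:
#   ∫ ∏_j Pois(y_j; exp(η_j + b)) N(b; 0, σ²) db ≈ Σ_k w_k ∏_j Pois(y_j; exp(η_j + σ z_k))
function cluster_likelihood(y, η, σ, nodes, weights)
    total = 0.0
    for (z, w) in zip(nodes, weights)
        loglik = 0.0
        for j in eachindex(y)
            lin = η[j] + σ * z              # log of the conditional mean
            loglik += y[j] * lin - exp(lin) - logfactorial(y[j])
        end
        total += w * exp(loglik)
    end
    return total
end

# Subject 233 (first two table rows) at the GLMM estimates printed below
β = [1.5148402125927902, 0.5984329913583817]   # GLMM_β
σ = sqrt(0.4823489292459855)                    # sqrt(GLMM_θ)
y = [30, 20]
price = [2.18359, 1.73999]
η = β[1] .+ β[2] .* price

for n in (5, 25, 100)
    nodes, weights = gauss_hermite_normal(n)
    println(n, " nodes: ", cluster_likelihood(y, η, σ, nodes, weights))
end
```

Comparing the printed values across `n` gives a sense of how many fixed nodes a subject with large counts may need. The adaptive scheme avoids this by placing its nodes where the integrand is concentrated.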

In [12]:
glmm_formula = @formula(count ~ 1 + price + (1|seqn));
mdl = GeneralizedLinearMixedModel(glmm_formula, df, d, link)
@time MixedModels.fit!(mdl; nAGQ = 25);   # 25-point adaptive Gauss-Hermite quadrature
GLMM_β = mdl.beta
GLMM_θ = mdl.σs[1][1]^2                    # random-intercept variance (squared standard deviation)
@show GLMM_β
@show GLMM_θ;



  0.270687 seconds (33.05 k allocations: 1.257 MiB)
GLMM_β = [1.5148402125927902, 0.5984329913583817]
GLMM_θ = 0.4823489292459855
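The GLMM's β are subject-specific: `exp(β₀ + β₁·price)` is the expected count for a subject whose random intercept is `b = 0`. For a normal `b ~ N(0, σ²)`, `E[exp(b)] = exp(σ²/2)`. Averaging over subjects therefore gives `E[Y | price] = exp(β₀ + σ²/2 + β₁·price)`: the intercept shifts by `σ²/2` and the price slope is unchanged. A short sketch computing this population-averaged intercept from the printed estimates, with a Monte Carlo check. The seed and number of draws are arbitrary choices.

```julia
using Random, Statistics

GLMM_β = [1.5148402125927902, 0.5984329913583817]   # fixed effects printed above
GLMM_θ = 0.4823489292459855                         # random-intercept variance σ²

# E[exp(b)] = exp(σ²/2) for b ~ N(0, σ²), so averaging exp(β₀ + β₁ price + b)
# over subjects shifts the intercept by σ²/2 and leaves the price slope unchanged.
marginal_intercept = GLMM_β[1] + GLMM_θ / 2
println("subject-specific intercept:    ", GLMM_β[1])
println("population-averaged intercept: ", marginal_intercept)

# Monte Carlo check of E[exp(b)] = exp(σ²/2)
rng = MersenneTwister(2021)
b = sqrt(GLMM_θ) .* randn(rng, 1_000_000)
println("E[exp(b)] ≈ ", mean(exp.(b)), "  vs  exp(σ²/2) = ", exp(GLMM_θ / 2))
```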
