<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Dyestuff-Data" data-toc-modified-id="Dyestuff-Data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Dyestuff Data</a></span></li><li><span><a href="#Dyestuff2-data" data-toc-modified-id="Dyestuff2-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Dyestuff2 data</a></span></li><li><span><a href="#Sleepstudy-Data" data-toc-modified-id="Sleepstudy-Data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Sleepstudy Data</a></span></li></ul></div>

# Model Fit Comparison: LMM vs GLMCopula

In [1]:
versioninfo()

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


In [2]:
#include("../src/GLMCopula.jl")

In [3]:
using GLMCopula, RData, MixedModels, GLM
using LinearAlgebra, Random


## Dyestuff Data

This is a random intercept model, `Y ~ 1 + (1|G)`, for the `Dyestuff` data set from R. The model fitted by the [MixedModels.jl package](https://dmbates.github.io/MixedModels.jl/stable/constructors/#Models-with-simple,-scalar-random-effects-1) yields a log-likelihood of **-163.66353**. We will see that the copula model fits slightly better, with a log-likelihood of **-163.35545**.
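To make the LMM side of the comparison concrete, here is a minimal sketch of the marginal Gaussian log-likelihood of a random intercept model, using only `LinearAlgebra`. The helper names `mvn_logpdf` and `random_intercept_loglik` are hypothetical and belong to neither MixedModels.jl nor GLMCopula. The parameter values are illustrative, and the toy input is just two batches of five yields taken from the first ten rows of `Dyestuff`, so the result is not comparable to the log-likelihoods quoted above.

```julia
using LinearAlgebra

# Gaussian log-density of y ~ N(μ, Ω), computed through a Cholesky factorization.
function mvn_logpdf(y::AbstractVector, μ::AbstractVector, Ω::AbstractMatrix)
    n = length(y)
    C = cholesky(Symmetric(Ω))
    r = C.L \ (y - μ)                      # whitened residual
    return -0.5 * (n * log(2π) + logdet(C) + dot(r, r))
end

# Marginal log-likelihood of Y ~ 1 + (1|G): within a group of size ni,
# Cov(y) = σ2 * I + σu2 * 11ᵀ, and groups are independent.
function random_intercept_loglik(groups, β, σ2, σu2)
    ll = 0.0
    for y in groups
        ni = length(y)
        Ω = σ2 * Matrix{Float64}(I, ni, ni) + σu2 * ones(ni, ni)
        ll += mvn_logpdf(y, fill(β, ni), Ω)
    end
    return ll
end

# Toy input: batches A and B only (first ten rows of Dyestuff).
groups = [[1545.0, 1440.0, 1440.0, 1520.0, 1580.0],
          [1540.0, 1555.0, 1490.0, 1560.0, 1495.0]]

# Illustrative parameter values, not fitted estimates.
ll = random_intercept_loglik(groups, 1527.5, 2500.0, 1400.0)
```

Setting `σu2` to zero removes the within-group correlation, so the function then reduces to a sum of independent normal log-densities, which is a convenient sanity check.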

In [4]:
dat = Dict(Symbol(k)=>v for (k,v) in 
    load(joinpath(dirname(pathof(MixedModels)), "..", "test", "dat.rda")))
dyestuff = dat[:Dyestuff]

|  | G | Y |
|---|---|---|
|  | Categorical… | Float64 |
| 1 | A | 1545.0 |
| 2 | A | 1440.0 |
| 3 | A | 1440.0 |
| 4 | A | 1520.0 |
| 5 | A | 1580.0 |
| 6 | B | 1540.0 |
| 7 | B | 1555.0 |
| 8 | B | 1490.0 |
| 9 | B | 1560.0 |
| 10 | B | 1495.0 |


In [5]:
# Build a Gaussian copula variance-component model with one observation
# object per level of the grouping factor G.
function create_gcm1(data)
    groups = unique(data[!, :G])
    n = length(groups)
    d = Normal()
    D = typeof(d)
    gcs = Vector{GaussianCopulaVCObs{Float64, D}}(undef, n)
    for (i, grp) in enumerate(groups)
        gidx = data[!, :G] .== grp      # rows belonging to this group
        ni = count(gidx)                # group size
        y = Float64.(data[gidx, :Y])
        X = ones(ni, 1)                 # intercept-only design matrix
        V = [ones(ni, ni)]              # single variance component (random intercept)
        gcs[i] = GaussianCopulaVCObs(y, X, V, d)
    end
    gcm = GaussianCopulaVCModel(gcs)
    return gcm
end

gcm = create_gcm1(dyestuff);

In [6]:
# initialize β and τ from the least squares solution
init_β!(gcm)
@show gcm.β
@show gcm.τ
# update σ2 and τ from β using MM algorithm
fill!(gcm.Σ, 1)
update_Σ!(gcm)
@show gcm.Σ;

gcm.β = [1527.4999999999998]
gcm.τ = [0.0002604449267498643]
gcm.Σ = [0.8636149140056555]
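The comment in the cell above says that β and τ start from the least squares solution. A minimal sketch of such a start is shown here, under the assumption that β is the ordinary least squares estimate and τ is the reciprocal of the mean squared residual. `ls_init` is a hypothetical helper, not a GLMCopula function, and the internals of `init_β!` are not shown in this notebook. The input is only the first ten rows of `Dyestuff`, so the numbers differ from the printed output.

```julia
using LinearAlgebra

# Least squares start values (hypothetical helper, not part of GLMCopula):
# β minimizes ‖y - Xβ‖², and τ = n / RSS is the reciprocal of the
# maximum likelihood estimate of the residual variance.
function ls_init(X::AbstractMatrix, y::AbstractVector)
    β = X \ y
    rss = sum(abs2, y - X * β)
    τ = length(y) / rss
    return β, τ
end

# First ten rows of Dyestuff only (batches A and B).
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0,
     1540.0, 1555.0, 1490.0, 1560.0, 1495.0]
X = ones(length(y), 1)                  # intercept-only design matrix

β, τ = ls_init(X, y)
```

With an intercept-only design, the least squares estimate `β[1]` is simply the sample mean of `y`.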


In [7]:
@show loglikelihood2!(gcm, true, false)
@show gcm.∇β
@show gcm.∇τ
@show gcm.∇Σ;

loglikelihood2!(gcm, true, false) = -160.88911694859192
gcm.∇β = [0.06561148856048625]
gcm.∇τ = [-0.004475727521338513]
gcm.∇Σ = [-8.467730615246971e-6]


In [8]:
# @time GLMCopula.fit!(gcm, NLopt.NLoptSolver(algorithm=:LD_SLSQP, maxeval=4000))
@time GLMCopula.fit!(gcm, IpoptSolver(print_level=5))
@show gcm.β
@show gcm.τ
@show gcm.Σ;


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        1
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equ

In [9]:
@show loglikelihood2!(gcm, true, false)
@show gcm.∇β
@show gcm.∇τ
@show gcm.∇Σ;

loglikelihood2!(gcm, true, false) = -159.66121162685212
gcm.∇β = [-4.762760741350291e-10]
gcm.∇τ = [6.678692443529144e-8]
gcm.∇Σ = [6.648207245643926e-11]
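Printing the gradient before and after fitting is a quick stationarity check: entries close to zero suggest that the optimizer stopped at a critical point. The same idea, together with a finite-difference check of an analytic gradient, can be sketched on a plain normal log-likelihood. This is not the copula objective, and every function name below is hypothetical.

```julia
# Normal log-likelihood of an intercept-only model with mean β and precision τ.
function normal_loglik(y, β, τ)
    n = length(y)
    return 0.5 * n * (log(τ) - log(2π)) - 0.5 * τ * sum(abs2, y .- β)
end

# Analytic gradient of normal_loglik with respect to β and τ.
function normal_grad(y, β, τ)
    n = length(y)
    r = y .- β
    return τ * sum(r), 0.5 * n / τ - 0.5 * sum(abs2, r)
end

# Central finite difference of a scalar function f at x with step h.
central_diff(f, x, h) = (f(x + h) - f(x - h)) / (2h)

# First ten rows of Dyestuff only.
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0,
     1540.0, 1555.0, 1490.0, 1560.0, 1495.0]

# 1. Away from the optimum, the analytic gradient should match finite differences.
β0, τ0 = 1500.0, 1e-3
gβ0, gτ0 = normal_grad(y, β0, τ0)
fdβ = central_diff(b -> normal_loglik(y, b, τ0), β0, 1e-3)
fdτ = central_diff(t -> normal_loglik(y, β0, t), τ0, 1e-6)

# 2. At the closed-form maximizer, both gradient entries should be near zero.
βhat = sum(y) / length(y)
τhat = length(y) / sum(abs2, y .- βhat)
gβ, gτ = normal_grad(y, βhat, τhat)
```

When a fitted model reports gradient entries that are far from zero, that is a sign the optimizer did not reach a stationary point and the reported log-likelihood should be read with caution.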


## Dyestuff2 data

`Dyestuff2` is a data set simulated from an LMM. [The model fitted by the MixedModels.jl package](https://github.com/dmbates/MixedModelsinJulia/blob/master/SimpleLMM.ipynb) has a log-likelihood of **-81.436518**, while the copula model yields **-81.436519**.

In [10]:
dyestuff2 = dat[:Dyestuff2]

|  | G | Y |
|---|---|---|
|  | Categorical… | Float64 |
| 1 | A | 7.298 |
| 2 | A | 3.846 |
| 3 | A | 2.434 |
| 4 | A | 9.566 |
| 5 | A | 7.99 |
| 6 | B | 5.22 |
| 7 | B | 6.556 |
| 8 | B | 0.608 |
| 9 | B | 11.788 |
| 10 | B | -0.892 |


In [11]:
gcm2 = create_gcm1(dyestuff2);

In [12]:
# initialize β and τ from the least squares solution
init_β!(gcm2)
@show gcm2.β
@show gcm2.τ
# update σ2 and τ from β using MM algorithm
fill!(gcm2.Σ, 1)
update_Σ!(gcm2)
@show gcm2.Σ;

gcm2.β = [5.6655999999999995]
gcm2.τ = [0.07492826008723599]
gcm2.Σ = [8.002522079028351e-7]


In [13]:
@show loglikelihood2!(gcm2, true, false)
@show gcm2.∇β
@show gcm2.∇τ
@show gcm2.∇Σ;

loglikelihood2!(gcm2, true, false) = -79.3976395254606
gcm2.∇β = [-7.017719738655614e-13]
gcm2.∇τ = [-7.681370063394866e-5]
gcm2.∇Σ = [-7.192151129476404]


In [14]:
@time GLMCopula.fit!(gcm2, IpoptSolver(print_level=5))
@show gcm2.β
@show gcm2.τ
@show gcm2.Σ;

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        1
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 

In [15]:
@show loglikelihood2!(gcm2, true, false)
@show gcm2.∇β
@show gcm2.∇τ
@show gcm2.∇Σ;

loglikelihood2!(gcm2, true, false) = -79.39763458163671
gcm2.∇β = [-1.2878587085651816e-14]
gcm2.∇τ = [-1.0833236917306976e-5]
gcm2.∇Σ = [-7.192167502739009]


## Sleepstudy Data

The `sleepstudy` data are modeled with a random intercept and a random slope within each group (individual): `Y ~ 1 + U + (1+U|G)`. The LMM fitted by the [MixedModels.jl package](https://dmbates.github.io/MixedModels.jl/stable/constructors/#Models-with-vector-valued-random-effects-1) has a log-likelihood of **-875.96967**.
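Under the LMM, the formula `Y ~ 1 + U + (1+U|G)` implies that the responses of one individual have marginal covariance `Z * Σ * Z' + σ2 * I`, where `Z = [1 U]` and `Σ` is the 2×2 covariance of the random intercept and slope. A minimal sketch of that construction follows. `subject_covariance` is a hypothetical helper, and the values of `Σ` and `σ2` are illustrative rather than fitted estimates.

```julia
using LinearAlgebra

# Marginal covariance of one individual's responses under Y ~ 1 + U + (1+U|G):
# Ω = Z Σ Zᵀ + σ2 I, with Z = [1 U] and Σ the random-effects covariance.
function subject_covariance(U::AbstractVector, Σ::AbstractMatrix, σ2::Real)
    ni = length(U)
    Z = [ones(ni) U]                    # ni × 2 random-effects design matrix
    return Symmetric(Z * Σ * Z' + σ2 * Matrix{Float64}(I, ni, ni))
end

U = collect(0.0:9.0)                    # ten days of observation, 0 through 9
Σ = [600.0 10.0; 10.0 35.0]             # illustrative values, not fitted estimates
Ω = subject_covariance(U, Σ, 650.0)
```

Because the slope enters through `U`, the variance of a response grows with `U` whenever the slope variance is positive, which is the main difference from the constant-variance random intercept model used for `Dyestuff`.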

In [16]:
sleepstudy = dat[:sleepstudy]

|  | Y | U | G |
|---|---|---|---|
|  | Float64 | Float64 | Categorical… |
| 1 | 249.56 | 0.0 | 308 |
| 2 | 258.705 | 1.0 | 308 |
| 3 | 250.801 | 2.0 | 308 |
| 4 | 321.44 | 3.0 | 308 |
| 5 | 356.852 | 4.0 | 308 |
| 6 | 414.69 | 5.0 | 308 |
| 7 | 382.204 | 6.0 | 308 |
| 8 | 290.149 | 7.0 | 308 |
| 9 | 430.585 | 8.0 | 308 |
| 10 | 466.353 | 9.0 | 308 |


In [17]:
# Build a Gaussian copula LMM with one observation object per level of G.
# The same matrix [1 U] serves as fixed-effects and random-effects design.
function create_gcmLMM(data)
    groups = unique(data[!, :G])
    n = length(groups)
    d = Normal()
    D = typeof(d)
    gcs = Vector{GaussianCopulaLMMObs{Float64, D}}(undef, n)
    for (i, grp) in enumerate(groups)
        gidx = data[!, :G] .== grp      # rows belonging to this individual
        ni = count(gidx)                # number of observations
        yi = Float64.(data[gidx, :Y])
        Ui = Float64.(data[gidx, :U])
        Xi = [ones(ni, 1) Ui]           # intercept and slope columns
        gcs[i] = GaussianCopulaLMMObs(yi, Xi, Xi, d)
    end
    gcm = GaussianCopulaLMMModel(gcs)
    return gcm
end

gcm3 = create_gcmLMM(sleepstudy);

In [18]:
# initialize β and τ from the least squares solution
init_β!(gcm3)
@show gcm3.β
@show gcm3.τ[1]
gcm3.Σ .= diagm(ones(2))
@show gcm3.Σ
@show loglikelihood2!(gcm3, true, false)
@show gcm3.∇β
@show gcm3.∇τ
@show gcm3.∇Σ;

gcm3.β = [251.40510484848488, 10.46728595959596]
gcm3.τ[1] = 0.0004441684924519196
gcm3.Σ = [1.0 0.0; 0.0 1.0]
loglikelihood2!(gcm3, true, false) = -901.0863608729611
gcm3.∇β = [-0.582840426898384, -3.686754757369668]
gcm3.∇τ = [37690.66749687556]
gcm3.∇Σ = [0.7263046709981121 -0.22102101114869332; -0.22102101114869332 -1.86408558819219]


In [19]:
@show loglikelihood2!(gcm3, true, false)
@show gcm3.∇β
@show gcm3.∇τ
@show gcm3.∇Σ;

loglikelihood2!(gcm3, true, false) = -901.0863608729611
gcm3.∇β = [-0.582840426898384, -3.686754757369668]
gcm3.∇τ = [37690.66749687556]
gcm3.∇Σ = [0.7263046709981121 -0.22102101114869332; -0.22102101114869332 -1.86408558819219]


In [20]:
# @time GLMCopula.fit!(gcm3, NLopt.NLoptSolver(algorithm=:LD_SLSQP, maxeval=4000))
@time GLMCopula.fit!(gcm3, IpoptSolver(print_level=5))
@show gcm3.β
@show gcm3.τ
@show gcm3.Σ;

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        6
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 

                                  6.026077 seconds (3.34 M allocations: 388.517 MiB, 0.71% gc time)
gcm3.β = [250.8222644215865, 6.780531202226976]
gcm3.τ = [8280.845713831737]
gcm3.Σ = [18.269238405391302 -1.889399424282315; -1.889399424282315 0.19597891505081796]


└ @ GLMCopula /Users/sarahji/.julia/dev/GLMCopula/src/gaussian_lmm.jl:70


In [21]:
@show loglikelihood2!(gcm3, true, false)
@show gcm3.∇β
@show gcm3.∇τ
@show gcm3.∇Σ;

loglikelihood2!(gcm3, true, false) = -621.839257534785
gcm3.∇β = [2.559760545820388e7, 1.6052545537223977e8]
gcm3.∇τ = [-239260.78067938777]
gcm3.∇Σ = [4.733838637728748 37.86154576065062; 37.86154576065062 291.3485566881399]
