# Logistic Regression Random Intercept Model (VerbAgg data)


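In this notebook we use the VerbAgg dataset shipped with MixedModels to test the fit of our copula model on a binary (logistic regression) outcome, and compare it with the random-intercept model fitted by MixedModels. The final log-likelihoods and estimates are: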

    (1) MixedModels Logl = -4749.666193096799
    (1) Our Logl = -4966.938142508903
    
    (2) MixedModels Sigma (random-intercept variance) = 1.0555687
    (2) Our Sigma (variance component) = 0.3736168505572097
    
    (3) MixedModels Beta = [-1.0314, 0.0458053]
    (3) Our Beta = [-0.8064836595836495, 0.0367491910284294]
    

In [1]:
using MixedModels, DataFrames, LinearAlgebra
using GLM, RData, GeneralizedCopula, Test
using LinearAlgebra: BlasReal, copytri!

datf = joinpath(dirname(pathof(MixedModels)), "..", "test", "dat.rda")
dat = Dict(Symbol(k)=> v for (k,v) in load(datf));
data = dat[:VerbAgg]
# transform the yes/no outcome to 1/0 for logistic regression
out = map(x -> strip(String(x)) == "N" ? 0.0 : 1.0, data[!, :r2])

# assemble the analysis data: binary outcome y, covariate a, and subject id
Df = DataFrame(y = out, a = data[:, :a], id = data[:, :id])
Df

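| Row | y (Float64) | a (Int32) | id (Cat…) |
|-----|-------------|-----------|-----------|
| 1 | 0.0 | 20 | 1 |
| 2 | 0.0 | 11 | 2 |
| 3 | 1.0 | 17 | 3 |
| 4 | 1.0 | 21 | 4 |
| 5 | 1.0 | 17 | 5 |
| 6 | 1.0 | 21 | 6 |
| 7 | 1.0 | 39 | 7 |
| 8 | 0.0 | 21 | 8 |
| 9 | 0.0 | 24 | 9 |
| 10 | 1.0 | 16 | 10 |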
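
The recoding step in the cell above can be checked in isolation. The following is a minimal sketch on made-up toy values (not rows of VerbAgg); the helper name `recode` is hypothetical and only wraps the same anonymous function used for `out`.

```julia
# Recode a yes/no response to 1.0/0.0, mirroring the anonymous function used for `out`.
# `recode` is a hypothetical helper name introduced only for this illustration.
recode(x) = strip(String(x)) == "N" ? 0.0 : 1.0

responses = ["N", "Y", " N ", "Y"]   # toy values; surrounding whitespace is stripped
y = map(recode, responses)
```

Anything that is not exactly `"N"` after stripping is mapped to `1.0`, so unexpected labels would silently count as a "yes".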


In [2]:
verbaggform = @formula(y ~ 1 + a + (1|id));
@show gm_all = fit(GeneralizedLinearMixedModel, verbaggform, Df, Bernoulli())
@show loglikelihood(gm_all) #  -4749.666193096799

gm_all = fit(GeneralizedLinearMixedModel, verbaggform, Df, Bernoulli()) = Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  y ~ 1 + a + (1 | id)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 9499.3324

Variance components:
      Column    Variance  Std.Dev. 
id (Intercept)  1.0555687 1.0274088

 Number of obs: 7584; levels of grouping factors: 316

Fixed-effects parameters:
─────────────────────────────────────────────────────
               Estimate  Std.Error   z value  P(>|z|)
─────────────────────────────────────────────────────
(Intercept)  -1.0314     0.26913    -3.83233   0.0001
a             0.0458053  0.0130787   3.50229   0.0005
─────────────────────────────────────────────────────
loglikelihood(gm_all) = -4749.666193096799


-4749.666193096799
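
As a quick sanity check on the printed model, the reported deviance in this output agrees with minus twice the log-likelihood. The sketch below only reuses the two numbers shown above; the tolerance is an assumption that allows for the four printed decimals.

```julia
ll = -4749.666193096799          # loglikelihood(gm_all) from the output above
deviance_reported = 9499.3324    # "Deviance" line of the printed model
deviance_from_ll = -2 * ll       # deviance implied by the log-likelihood

# assumed tolerance: the deviance is printed with four decimals
agrees = isapprox(deviance_from_ll, deviance_reported; atol = 1e-4)
```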

In [3]:
groups = unique(Df[!, :id])             # one copula observation per subject
n = length(groups)
d = Bernoulli()
D = typeof(d)
gcs = Vector{GLMCopulaVCObs{Float64, D}}(undef, n)
for (i, grp) in enumerate(groups)
    gidx = Df[!, :id] .== string(grp)   # rows belonging to subject grp
    ni = count(gidx)                    # number of observations for this subject
    y = Float64.(out[gidx, 1])
    anger = Float64.(Df[gidx, :a])
    X = [ones(ni, 1) anger]             # design matrix: intercept and a
    V = [ones(ni, ni)]                  # single variance component (random intercept)
    gcs[i] = GLMCopulaVCObs(y, X, V, d)
end
gcm_all = GLMCopulaVCModel(gcs);

## MM update for Sigma

The part of the log-likelihood that depends on the variance components is

$$
  \begin{eqnarray}
\mathcal{L}(\sigma) = - \sum_{i=1}^n \ln \left(1 + \sum_{k=1}^m \sigma_k^2 t_{ik}\right) + \sum_{i=1}^n \ln \left(1 + \sum_{k=1}^m \sigma_k^2 q_{ik}\right)
\end{eqnarray}
$$

where
$$
\begin{eqnarray*}
	t_{ik} &=& \frac 12 \text{tr}(\mathbf{V}_{ik}), \quad \mathbf{t}_i = (t_{i1}, \ldots, t_{im})^T \\
	q_{ik} &=& \frac{1}{2}  \mathbf{r}_i( \boldsymbol{\beta})^T \mathbf{V}_{ik}  \mathbf{r}_i (\boldsymbol{\beta}), \quad \mathbf{q}_i = (q_{i1}, \ldots, q_{im})^T.
\end{eqnarray*}
$$

The MM update for the $k^{th}$ element of the $m \times 1$ vector of variance components is:

$$\begin{eqnarray*}
\sigma_{k}^{2(t+1)} & = & \sigma_{k}^{2(t)} \left(\frac{\sum_{i=1}^n
\frac{q_{ik}}{1+ q_i^{(t)}}}{\sum_{i=1}^n \frac{t_{ik}}{1+ t_i^{(t)}}} \right).
\end{eqnarray*}
$$

where $q_i^{(t)} = \sum_{k=1}^m \sigma_k^{2(t)} q_{ik}$ and $t_i^{(t)} = \sum_{k=1}^m \sigma_k^{2(t)} t_{ik}$. 
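
The update above can be sketched in a few lines of Julia. This is an illustration under stated assumptions, not the code behind `update_Σ!`: the function name `mm_update` and the layout of `q` and `t` as $n \times m$ matrices holding $q_{ik}$ and $t_{ik}$ are hypothetical, and the numbers in the toy call are made up.

```julia
# One MM step for the variance components (hypothetical helper, not the package API).
# q and t are n × m matrices holding q_ik and t_ik; σ2 is the current length-m iterate.
function mm_update(σ2::AbstractVector, q::AbstractMatrix, t::AbstractMatrix)
    qi = q * σ2                                  # q_i^{(t)} = Σ_k σ_k^2 q_ik
    ti = t * σ2                                  # t_i^{(t)} = Σ_k σ_k^2 t_ik
    num = vec(sum(q ./ (1 .+ qi), dims = 1))     # Σ_i q_ik / (1 + q_i^{(t)}) for each k
    den = vec(sum(t ./ (1 .+ ti), dims = 1))     # Σ_i t_ik / (1 + t_i^{(t)}) for each k
    return σ2 .* num ./ den                      # multiplicative update, one entry per k
end

# Toy call with n = 2 groups and m = 1 variance component (made-up numbers).
q = reshape([0.5, 2.0], :, 1)
t = reshape([1.0, 1.5], :, 1)
σ2_new = mm_update([1.0], q, t)
```

Because the update is multiplicative and every term is nonnegative, a positive starting value stays nonnegative without any explicit constraint.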


In this case $m = 1$: a single variance component models the random intercept.
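
As a worked special case (a sketch, assuming $\mathbf{V}_{i1} = \mathbf{1}_{n_i}\mathbf{1}_{n_i}^T$ as built by `V = [ones(ni, ni)]` in the cell above, with $n_i$ the number of observations for subject $i$), the quantities and the update reduce to:

$$
\begin{eqnarray*}
t_{i1} &=& \frac{1}{2}\text{tr}\left(\mathbf{1}_{n_i}\mathbf{1}_{n_i}^T\right) = \frac{n_i}{2}, \quad q_{i1} = \frac{1}{2}\left(\mathbf{1}_{n_i}^T \mathbf{r}_i(\boldsymbol{\beta})\right)^2, \\
\sigma_1^{2(t+1)} &=& \sigma_1^{2(t)} \left( \frac{\sum_{i=1}^n \frac{q_{i1}}{1 + \sigma_1^{2(t)} q_{i1}}}{\sum_{i=1}^n \frac{t_{i1}}{1 + \sigma_1^{2(t)} t_{i1}}} \right).
\end{eqnarray*}
$$

So $q_{i1}$ depends on the data only through the squared sum of the residuals of subject $i$.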

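Initialize β and σ2. The line that copies β from the MixedModels.jl solution is commented out below, so β comes from `initialize_model!`; Σ is set to 1 and then updated from β with the MM algorithm.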

In [4]:
initialize_model!(gcm_all);
# gcm_all.β .= [-1.0314; 0.0458053] # from the mixed models package

fill!(gcm_all.Σ, 1.0)
# update σ2 from β using the MM algorithm
update_Σ!(gcm_all)

@show gcm_all.β
@show gcm_all.Σ

1 0.0 -5303.33883714196 1263
2 -5303.33883714196 -5221.383546700103 315
3 -5221.383546700103 -5221.3445167404 315
4 -5221.3445167404 -5221.344516711747 315
gcm_all.β = [-0.7954736728413148, 0.034971709990645014]
gcm_all.Σ = [0.36107907014459284]


1-element Array{Float64,1}:
 0.36107907014459284

In [5]:
@show loglikelihood3!(gcm_all, true, true)

loglikelihood3!(gcm_all, true, true) = -4967.48248349655


-4967.48248349655

In [6]:
@time fit2!(gcm_all, IpoptSolver(print_level = 5, derivative_test = "first-order"))


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Starting derivative checker for first derivatives.


No errors detected by derivative checker.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bound

GLMCopulaVCModel{Float64,Bernoulli{Float64}}(GLMCopulaVCObs{Float64,Bernoulli{Float64}}[GLMCopulaVCObs{Float64,Bernoulli{Float64}}([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0  …  1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0], [1.0 20.0; 1.0 20.0; … ; 1.0 20.0; 1.0 20.0], [[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]], [1.2763641672736594, 25.527283345473208], [-0.48244078068485574 -9.648815613697115; -0.48244078068485574 -9.648815613697115; … ; -0.5181983157499847 -10.363966314999693; -0.5181983157499847 -10.363966314999693], [8.487983166e-314], [0.0], [-20.795962273073208 -119.84676377035146; -415.9192454614642 -8318.384909229284], [2.5342201503e-314], [-0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, 1.0363966314999693, 1.0363966314999693  …  1.0363966314999693, 1.0363966314999693, -0.9648815613697115, -0.9648815613697115

In [7]:
@show loglikelihood3!(gcm_all, true, true)
@show gcm_all.β
@show gcm_all.∇β

loglikelihood3!(gcm_all, true, true) = -4966.938142508903
gcm_all.β = [-0.8064836595836494, 0.03674919102842939]
gcm_all.∇β = [1.9649013971445584e-8, 3.8375646838062494e-7]


2-element Array{Float64,1}:
 1.9649013971445584e-8
 3.8375646838062494e-7
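
The gradient printed above is close to zero in both coordinates, which is what one expects at a stationary point of the log-likelihood in β. A minimal sketch of such a check follows; the tolerance `tol` is an assumed value chosen for illustration, not a setting taken from Ipopt or GeneralizedCopula.

```julia
using LinearAlgebra

∇β = [1.9649013971445584e-8, 3.8375646838062494e-7]   # gcm_all.∇β from the output above
tol = 1e-6                                             # assumed tolerance, for illustration only

# the largest absolute gradient entry should fall below the tolerance
converged = norm(∇β, Inf) < tol
```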

In [8]:
gcm_all = GLMCopulaVCModel(gcs);

initialize_model!(gcm_all)
@show gcm_all.β
fill!(gcm_all.Σ, 1.0)
update_Σ!(gcm_all)
@show GeneralizedCopula.loglikelihood3!(gcm_all, true, true)
@time fit2!(gcm_all, IpoptSolver(print_level = 5, max_iter = 100, hessian_approximation = "exact"))

1 0.0 -5303.33883714196 1263
2 -5303.33883714196 -5221.383546700103 315
3 -5221.383546700103 -5221.3445167404 315
4 -5221.3445167404 -5221.344516711747 315
gcm_all.β = [-0.7954736728413148, 0.034971709990645014]
GeneralizedCopula.loglikelihood3!(gcm_all, true, true) = -4967.48248349655
This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:   

GLMCopulaVCModel{Float64,Bernoulli{Float64}}(GLMCopulaVCObs{Float64,Bernoulli{Float64}}[GLMCopulaVCObs{Float64,Bernoulli{Float64}}([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0  …  1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0], [1.0 20.0; 1.0 20.0; … ; 1.0 20.0; 1.0 20.0], [[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]], [1.2763641672736594, 25.527283345473208], [-0.48244078068485574 -9.648815613697115; -0.48244078068485574 -9.648815613697115; … ; -0.5181983157499847 -10.363966314999693; -0.5181983157499847 -10.363966314999693], [8.487983166e-314], [0.0], [-20.795962273073204 -119.84676377035144; -415.9192454614642 -8318.384909229284], [2.5342201503e-314], [-0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, -0.9648815613697115, 1.0363966314999693, 1.0363966314999693  …  1.0363966314999693, 1.0363966314999693, -0.9648815613697115, -0.9648815613697115

In [9]:
@show loglikelihood3!(gcm_all, true, true)
@show gcm_all.β
@show gcm_all.∇β
@show gcm_all.Σ

loglikelihood3!(gcm_all, true, true) = -4966.938142508903
gcm_all.β = [-0.8064836595836495, 0.0367491910284294]
gcm_all.∇β = [1.96489606807404e-8, 3.837563546937872e-7]
gcm_all.Σ = [0.3736168505572097]


1-element Array{Float64,1}:
 0.3736168505572097
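
To put the two fits side by side, the sketch below reuses only the numbers reported in this notebook. The named tuples `mm` and `cop` are hypothetical containers introduced for the comparison; the two models are different, so the gap is descriptive rather than a formal test.

```julia
# Values copied from the outputs above; `mm` and `cop` are hypothetical names.
mm  = (logl = -4749.666193096799, σ2 = 1.0555687,
       β = [-1.0314, 0.0458053])
cop = (logl = -4966.938142508903, σ2 = 0.3736168505572097,
       β = [-0.8064836595836495, 0.0367491910284294])

logl_gap = mm.logl - cop.logl    # positive: MixedModels attains the higher log-likelihood
β_diff   = cop.β .- mm.β         # difference in fixed-effect estimates
σ2_ratio = cop.σ2 / mm.σ2        # copula variance component relative to the GLMM variance
```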