## Simulated Logistic Data (Random Intercept)

In this notebook we compare the fit of the copula model with the fit from MixedModels.jl:

    (1) MixedModels Logl = -581.2623152580586
    (1) Our Logl = -580.9424179798858
    
    (2) MixedModels Sigma = 0.027436916
    (2) Our Sigma = 0.009318450420500316
    
    (3) MixedModels Beta = [-0.0153369, 1.8638]
    (3) Our Beta = [-0.00790385604489672, 1.894567813275244]
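
To put the three comparisons on one scale, the short sketch below recomputes the gaps from the numbers listed above. The variable names are illustrative only and do not come from either package.

```julia
# Values copied from the summary above; all names here are illustrative.
logl_mm, logl_copula = -581.2623152580586, -580.9424179798858
sigma_mm, sigma_copula = 0.027436916, 0.009318450420500316
beta_mm = [-0.0153369, 1.8638]
beta_copula = [-0.00790385604489672, 1.894567813275244]

logl_gap = logl_copula - logl_mm          # positive when the copula fit has the higher log-likelihood
sigma_ratio = sigma_copula / sigma_mm     # copula variance component relative to MixedModels
beta_gap = abs.(beta_copula .- beta_mm)   # elementwise absolute difference in the fixed effects

println((logl_gap = logl_gap, sigma_ratio = sigma_ratio, beta_gap = beta_gap))
```

On these numbers the log-likelihoods differ by about 0.32, the fixed effects agree to within about 0.03, and the variance component is where the two fits differ most.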

In [1]:
using DataFrames, MixedModels, Random, GeneralizedCopula, GLM
using ForwardDiff, Test, LinearAlgebra
using LinearAlgebra: BlasReal, copytri!

Random.seed!(1234)
p = 10   # number of genes

n = 100  # number of cells

df = DataFrame(gene = repeat('A':'J', outer = n), normal = rand(n * p), nresp = ones(n * p))

#lmm1 = lmm(@formula(nresp ~ 1 + (1|gene)), df);
lmm1 = LinearMixedModel(@formula(nresp ~ 1 +  normal + (1|gene)), df);
# simulate the linear mixed model response and fit it (as a sanity check)

refit!(simulate!(lmm1, β = [0.2, 1.5], σ = 0.2, θ = [2.]))

Linear mixed model fit by maximum likelihood
 nresp ~ 1 + normal + (1 | gene)
   logLik   -2 logLik     AIC        BIC    
  158.38107 -316.76215 -308.76215 -289.13113

Variance components:
            Column    Variance   Std.Dev.  
gene     (Intercept)  0.07291637 0.27003031
Residual              0.04049292 0.20122852
 Number of obs: 1000; levels of grouping factors: 10

  Fixed-effects parameters:
───────────────────────────────────────────────────
             Estimate  Std.Error   z value  P(>|z|)
───────────────────────────────────────────────────
(Intercept)  0.211444  0.0863288   2.44929   0.0143
normal       1.44991   0.022097   65.6157    <1e-99
───────────────────────────────────────────────────

In [2]:
# simulate the logistic (Bernoulli) response from the simulated linear predictor
df[!, :counts] = rand.(Bernoulli.(GLM.linkinv.(canonicallink(Bernoulli()), response(lmm1))))
df

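| Row | gene | normal | nresp | counts |
|-----|------|--------|-------|--------|
|     | Char | Float64 | Float64 | Bool |
| 1 | 'A' | 0.590845 | 1.0 | 0 |
| 2 | 'B' | 0.766797 | 1.0 | 1 |
| 3 | 'C' | 0.566237 | 1.0 | 1 |
| 4 | 'D' | 0.460085 | 1.0 | 1 |
| 5 | 'E' | 0.794026 | 1.0 | 0 |
| 6 | 'F' | 0.854147 | 1.0 | 1 |
| 7 | 'G' | 0.200586 | 1.0 | 1 |
| 8 | 'H' | 0.298614 | 1.0 | 0 |
| 9 | 'I' | 0.246837 | 1.0 | 1 |
| 10 | 'J' | 0.579672 | 1.0 | 1 |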
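
The simulation step in In [2] pushes a linear predictor through the inverse logit link and draws one Bernoulli outcome per row. A minimal sketch of that idea with Base Julia only is shown here; the `logistic` helper, the seed, and the predictor values are made up for illustration and are not part of GLM.jl or the notebook.

```julia
using Random

# Inverse of the logit link: maps a linear predictor η to a probability in (0, 1).
logistic(η) = 1 / (1 + exp(-η))

rng = MersenneTwister(1234)      # hypothetical seed, used only to make this sketch reproducible
η = [-2.0, 0.0, 0.2, 1.7]        # made-up linear predictor values
μ = logistic.(η)                 # success probabilities
y = rand(rng, length(η)) .< μ    # one Bernoulli draw per probability (uniform draw below μ)

println(μ)
println(y)
```

Comparing a uniform draw against `μ` is equivalent to sampling from `Bernoulli(μ)`, which is what the broadcasted `rand.(Bernoulli.(...))` call in the notebook does.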


In [3]:
glmm1 = MixedModels.fit!(GeneralizedLinearMixedModel(@formula(counts ~ 1 + normal + (1|gene)), df, Bernoulli()), fast = true)

Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  counts ~ 1 + normal + (1 | gene)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 1162.5246

Variance components:
        Column     Variance   Std.Dev.  
gene (Intercept)  0.027436916 0.16564092

 Number of obs: 1000; levels of grouping factors: 10

Fixed-effects parameters:
─────────────────────────────────────────────────────
               Estimate  Std.Error   z value  P(>|z|)
─────────────────────────────────────────────────────
(Intercept)  -0.0153369   0.141446  -0.10843   0.9137
normal        1.8638      0.253509   7.35202   <1e-12
─────────────────────────────────────────────────────

In [4]:
loglikelihood(glmm1)

-581.2623152580586
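
As a quick consistency check, the reported deviance and standard deviation can be recovered from the other printed values. The sketch below only uses numbers copied from the output of In [3] and In [4]; the variable names are illustrative.

```julia
logl = -581.2623152580586        # loglikelihood(glmm1) printed above
deviance_reported = 1162.5246    # "Deviance" line in the fit summary
variance = 0.027436916           # gene (Intercept) variance
stddev_reported = 0.16564092     # gene (Intercept) Std.Dev.

@show -2 * logl          # agrees with the reported deviance to the printed precision
@show sqrt(variance)     # agrees with the reported Std.Dev.
```

The printed numbers are consistent with the deviance being reported as minus twice the log-likelihood.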

In [5]:
groups = unique(df[!, :gene])
n, p, m = length(groups), 1, 1
d = Bernoulli()
D = typeof(d)
gcs = Vector{GLMCopulaVCObs{Float64, D}}(undef, n)
for (i, grp) in enumerate(groups)
    gidx = df[!, :gene] .== grp
    ni = count(gidx)
    y = Float64.(df[gidx, :counts])
    normal = Float64.(df[gidx, :normal])
    X = [ones(ni, 1) normal]
    V = [ones(ni, ni)]
    gcs[i] = GLMCopulaVCObs(y, X, V, d)
end
gcm = GLMCopulaVCModel(gcs);

initialize_model!(gcm)
@show gcm.β

1 0.0 -666.9799404289483 39
2 -666.9799404289483 -586.0818070360872 9
3 -586.0818070360872 -581.8035396454056 9
4 -581.8035396454056 -581.801314034753 9
5 -581.801314034753 -581.8013140270714 9
gcm.β = [-0.012968507265783212, 1.8546148866373162]


2-element Array{Float64,1}:
 -0.012968507265783212
  1.8546148866373162
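
The loop in In [5] splits the data by gene and builds one response vector, design matrix, and all-ones covariance block per group. The sketch below reproduces that bookkeeping on toy data with Base Julia only. It does not call GeneralizedCopula, and the toy vectors and the named-tuple layout are assumptions made for illustration.

```julia
# Toy data standing in for df[!, :gene], df[!, :normal], df[!, :counts].
gene   = ['A', 'B', 'A', 'B', 'A']
normal = [0.1, 0.2, 0.3, 0.4, 0.5]
counts = [false, true, true, false, true]

blocks = map(unique(gene)) do g
    idx = gene .== g                 # rows belonging to this group
    ni = count(idx)                  # group size
    y = Float64.(counts[idx])        # response converted to floats
    X = [ones(ni) normal[idx]]       # intercept column plus the covariate
    V = ones(ni, ni)                 # all-ones block shared by every pair of rows in the group
    (group = g, y = y, X = X, V = V)
end

println(size(blocks[1].X))
```

In the notebook each of these pieces is passed to `GLMCopulaVCObs(y, X, V, d)`, with `V` wrapped in a one-element vector because there is a single variance component.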

Initialize β and σ2. In the cell below, β keeps the estimate printed by `initialize_model!` above, while Σ is filled with ones and then refined with `update_Σ!`.

In [6]:
fill!(gcm.Σ, 1.0)
update_Σ!(gcm)
@show gcm.Σ
GeneralizedCopula.loglikelihood3!(gcm, true, true)

gcm.Σ = [0.008387787645184741]


-580.980177097489

In [7]:
@time fit2!(gcm, IpoptSolver(print_level = 5, max_iter = 100, derivative_test = "first-order", hessian_approximation = "limited-memory"))


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Starting derivative checker for first derivatives.

* grad_f[          1] = -6.9082772786589840e+02    ~ -2.1431677318256029e+03  [ 6.777e-01]
* grad_f[          2] = -3.7422763319189471e+02    ~ -1.5373882979474729e+04  [ 9.757e-01]

Derivative checker detected 2 error(s).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0



GLMCopulaVCModel{Float64,Bernoulli{Float64}}(GLMCopulaVCObs{Float64,Bernoulli{Float64}}[GLMCopulaVCObs{Float64,Bernoulli{Float64}}([0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0  …  0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0], [1.0 0.5908446386657102; 1.0 0.6488819502093455; … ; 1.0 0.9280266587797947; 1.0 0.6017241156288353], [[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]], [-0.46343144810221837, -0.3336409567807186], [-0.8716202081586084 -0.5149921269432042; -0.27147894825271857 -0.176157789383006; … ; -1.1996209239867608 -1.1132801978897637; -0.28388137449831097 -0.17081826901349437], [2.6353460363e-314], [2.638871404e-314], [-23.285473078427316 -8.790074034540593; -10.504086034708939 -6.2300206851735025], [2.655332399e-314], [-1.7432404163172173, 0.5429578965054369, 0.40801966126646216, 0.41558069573653583, 0.9608887405125223, 0.5378730495397693, -1.8327522122297417, 0.8269724336241433, 0.667306512981481, 0.5509652016697016  …  -1.0051100
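
Each derivative checker line in the Ipopt log above shows two gradient values separated by `~`, followed by a bracketed relative difference. To the best of my understanding, the first value is the user-supplied gradient and the second is Ipopt's finite-difference estimate. The sketch below checks that the bracketed numbers can be reproduced from the two printed values. The formula is an assumption inferred from these numbers, not quoted from the Ipopt documentation.

```julia
# (value printed first, value printed after "~", bracketed value) copied from the log above.
rows = [(-6.9082772786589840e+02, -2.1431677318256029e+03, 6.777e-01),
        (-3.7422763319189471e+02, -1.5373882979474729e+04, 9.757e-01)]

# Assumed formula: absolute gap divided by the magnitude of the second value.
reldiff(a, b) = abs(a - b) / abs(b)

for (a, b, printed) in rows
    println((computed = reldiff(a, b), printed = printed))
end
```

Both entries are flagged because the relative differences (about 0.68 and 0.98) are far above any reasonable tolerance, so the gradient evaluated at the starting point is worth a second look even though the optimizer still reaches a stationary point afterwards.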

In [8]:
@show loglikelihood3!(gcm, true, true)
@show gcm.β
@show gcm.∇β

loglikelihood3!(gcm, true, true) = -580.9424179798858
gcm.β = [-0.00790385604489672, 1.894567813275244]
gcm.∇β = [-4.154814270407314e-9, -2.2657369314060816e-9]


2-element Array{Float64,1}:
 -4.154814270407314e-9
 -2.2657369314060816e-9
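
A gradient this close to zero indicates that the solver stopped at a stationary point in β. A small sketch of that check, using the printed gradient and an illustrative tolerance that is not a default of either package:

```julia
using LinearAlgebra

grad = [-4.154814270407314e-9, -2.2657369314060816e-9]   # gcm.∇β printed above
tol = 1e-6                                                # illustrative tolerance only

println(norm(grad))          # Euclidean norm of the gradient
println(norm(grad) < tol)    # first-order optimality check
```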

In [9]:
gcm = GLMCopulaVCModel(gcs);

initialize_model!(gcm)

@show gcm.β
fill!(gcm.Σ, 1.0)
update_Σ!(gcm)
GeneralizedCopula.loglikelihood3!(gcm, true, true)
@time fit2!(gcm, IpoptSolver(print_level = 5, max_iter = 100, hessian_approximation = "exact"))

1 0.0 -666.9799404289483 39
2 -666.9799404289483 -586.0818070360872 9
3 -586.0818070360872 -581.8035396454056 9
4 -581.8035396454056 -581.801314034753 9
5 -581.801314034753 -581.8013140270714 9
gcm.β = [-0.012968507265783212, 1.8546148866373162]
This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints wit

GLMCopulaVCModel{Float64,Bernoulli{Float64}}(GLMCopulaVCObs{Float64,Bernoulli{Float64}}[GLMCopulaVCObs{Float64,Bernoulli{Float64}}([0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0  …  0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0], [1.0 0.5908446386657102; 1.0 0.6488819502093455; … ; 1.0 0.9280266587797947; 1.0 0.6017241156288353], [[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]], [-0.4634313999972788, -0.3336409340343549], [-0.871620208447289 -0.5149921271137696; -0.2714789481578448 -0.1761577893214441; … ; -1.1996209245114124 -1.1132801983766543; -0.28388137440331734 -0.1708182689563344], [2.6353460363e-314], [2.638871404e-314], [-25.0334110081309 -8.64827311101766; -11.308375596498724 -6.582912180138164], [2.655332399e-314], [-1.7432404168945779, 0.5429578963156896, 0.40801966108512944, 0.4155806955543787, 0.9608887403590028, 0.5378730493501169, -1.832752212867247, 0.8269724334507712, 0.667306512794009, 0.550965201479836  …  -1.005110093433922, 

In [10]:
@show loglikelihood3!(gcm, true, true)
@show gcm.β
@show gcm.∇β
@show gcm.Σ;

loglikelihood3!(gcm, true, true) = -580.9424179798858
gcm.β = [-0.007903855754504537, 1.8945678139048627]
gcm.∇β = [9.20409970461833e-9, 4.455006141768081e-9]
gcm.Σ = [0.009318450420500316]
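
The limited-memory run (In [7] and In [8]) and the exact-Hessian run (In [9] and In [10]) can be compared directly from their printed results. The sketch below uses only values copied from the two outputs; the variable names are illustrative.

```julia
β_lbfgs = [-0.00790385604489672, 1.894567813275244]     # In [8], limited-memory Hessian approximation
β_exact = [-0.007903855754504537, 1.8945678139048627]   # In [10], exact Hessian
logl_lbfgs = -580.9424179798858
logl_exact = -580.9424179798858

println(maximum(abs.(β_lbfgs .- β_exact)))   # largest coordinate-wise gap in β
println(logl_lbfgs == logl_exact)            # the printed log-likelihoods are identical
```

Both settings arrive at the same log-likelihood and at β estimates that agree to roughly nine decimal places, which suggests the result does not depend on the Hessian option chosen here.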
