# Logistic Regression Random Intercept Model: Simulated Dataset

In this notebook we use a dataset simulated with MixedModels.jl to test the fit of our copula model to a logistic regression (Bernoulli) outcome.

For a single observation (group) $i = 1$, we use ForwardDiff.jl to check the following calculations:

1. Loglikelihood
2. Gradient with respect to $\beta$

In [1]:
using DataFrames, MixedModels, Random, GeneralizedCopula, GLM
using ForwardDiff, Test, LinearAlgebra
using LinearAlgebra: BlasReal, copytri!

Random.seed!(1235)
p = 10   # number of genes

n = 100  # number of cells

df = DataFrame(gene = repeat('A':'J', outer=n), normal = rand(n * p), nresp = ones(n * p))

lmm1 = LinearMixedModel(@formula(nresp ~ 1 + normal + (1|gene)), df);

# simulate the linear mixed model response and refit it (as a sanity check)
refit!(simulate!(lmm1, β=[0.2, 1.5], σ=0.01, θ=[5.]))

# simulate the logistic response: treat the simulated LMM response as the
# linear predictor η and draw y ~ Bernoulli(logistic(η))
df[!, :counts] = rand.(Bernoulli.(GLM.linkinv.(canonicallink(Bernoulli()), response(lmm1))))
df

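First 10 rows of `df`:

| | gene | normal | nresp | counts |
|---|---|---|---|---|
| | Char | Float64 | Float64 | Bool |
| 1 | 'A' | 0.0951588 | 1.0 | 0 |
| 2 | 'B' | 0.270898 | 1.0 | 0 |
| 3 | 'C' | 0.906315 | 1.0 | 0 |
| 4 | 'D' | 0.833585 | 1.0 | 1 |
| 5 | 'E' | 0.945055 | 1.0 | 1 |
| 6 | 'F' | 0.443669 | 1.0 | 1 |
| 7 | 'G' | 0.904577 | 1.0 | 1 |
| 8 | 'H' | 0.941598 | 1.0 | 0 |
| 9 | 'I' | 0.0375897 | 1.0 | 0 |
| 10 | 'J' | 0.996082 | 1.0 | 1 |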
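Written out, the simulation in In [1] corresponds to the following data-generating process. This is a sketch that assumes MixedModels.jl's parameterization, in which $\theta$ is the random-intercept standard deviation relative to the residual $\sigma$, so the gene-level standard deviation is $\theta\sigma = 5 \times 0.01 = 0.05$:

$$
\begin{eqnarray*}
\eta_{ij} &=& 0.2 + 1.5\, x_{ij} + b_i + \epsilon_{ij}, \qquad b_i \sim N\big(0, (\theta\sigma)^2\big) = N(0, 0.05^2), \qquad \epsilon_{ij} \sim N(0, 0.01^2),\\
y_{ij} &\sim& \text{Bernoulli}\left(\frac{e^{\eta_{ij}}}{1+e^{\eta_{ij}}}\right), \qquad i = 1, \ldots, 10, \quad j = 1, \ldots, 100,
\end{eqnarray*}
$$

where $x_{ij}$ is the `normal` covariate and $y_{ij}$ is `counts`.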


In [2]:
glmm1 = MixedModels.fit!(GeneralizedLinearMixedModel(@formula(counts ~ 1 + normal + (1|gene)), df, Bernoulli()), fast=true)

Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  counts ~ 1 + normal + (1 | gene)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 1138.5538

Variance components:
        Column     Variance     Std.Dev.  
gene (Intercept)  0.0019004487 0.043594136

 Number of obs: 1000; levels of grouping factors: 10

Fixed-effects parameters:
──────────────────────────────────────────────────
             Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────
(Intercept)  0.240532   0.135665  1.77299   0.0762
normal       1.52029    0.252689  6.01645   <1e-8
──────────────────────────────────────────────────

In [3]:
loglikelihood(glmm1)

-569.2768789333542

In [4]:
groups = unique(df[!, :gene])
n, p, m = length(groups), 2, 1   # n groups (genes), p fixed effects (intercept + normal), m variance components
d = Bernoulli()
D = typeof(d)
gcs = Vector{GLMCopulaVCObs{Float64, D}}(undef, n)
for (i, grp) in enumerate(groups)
    gidx = df[!, :gene] .== grp
    ni = count(gidx)
    y = Float64.(df[gidx, :counts])
    normal = Float64.(df[gidx, :normal])
    X = [ones(ni, 1) normal]   # intercept + covariate
    V = [ones(ni, ni)]         # random intercept: V_i = 11ᵀ
    gcs[i] = GLMCopulaVCObs(y, X, V, d)
end
gcm = GLMCopulaVCModel(gcs);

initialize_model!(gcm)
@show gcm.β

1 0.0 -626.5957471791157 39
2 -626.5957471791157 -571.7584788144428 9
3 -571.7584788144428 -569.2814939049927 9
4 -569.2814939049927 -569.2800318847973 9
5 -569.2800318847973 -569.2800318829738 9
gcm.β = [0.24007269498653974, 1.5211675941696152]


2-element Array{Float64,1}:
 0.24007269498653974
 1.5211675941696152

## MM update for Sigma

$$
  \begin{eqnarray}
\mathcal{L}(\sigma) = - \sum_{i=1}^n \ln \left(1 + \sum_{k=1}^m \sigma_k^2 t_{ik}\right) + \sum_{i=1}^n \ln \left(1 + \sum_{k=1}^m \sigma_k^2 q_{ik}\right)
\end{eqnarray}
$$

where
$$
\begin{eqnarray*}
	t_{ik} &=& \frac 12 \text{tr}(\mathbf{V}_{ik}), \quad \mathbf{t}_i = (t_{i1}, \ldots, t_{im})^T \\
	q_{ik} &=& \frac{1}{2}  \mathbf{r}_i( \boldsymbol{\beta})^T \mathbf{V}_{ik}  \mathbf{r}_i (\boldsymbol{\beta}), \quad \mathbf{q}_i = (q_{i1}, \ldots, q_{im})^T.
\end{eqnarray*}
$$

The MM update for the $k^{th}$ element of the $m \times 1$ vector of variance components is:

$$\begin{eqnarray*}
\sigma_{k}^{2(t+1)} & = & \sigma_{k}^{2(t)} \left(\frac{\sum_{i=1}^n
\frac{q_{ik}}{1+ q_i^{(t)}}}{\sum_{i=1}^n \frac{t_{ik}}{1+ t_i^{(t)}}} \right).
\end{eqnarray*}
$$

where $q_i^{(t)} = \sum_{k=1}^m \sigma_k^{2(t)} q_{ik}$ and $t_i^{(t)} = \sum_{k=1}^m \sigma_k^{2(t)} t_{ik}$. 


In this case $m = 1$: a single variance component models the random intercept.
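A minimal sketch of one MM update for general $m$, assuming the per-group quantities $t_{ik}$ and $q_{ik}$ have already been computed and stored as $n \times m$ matrices `t` and `q`. The names `mm_update_sigma2` and `mm_objective` are hypothetical helpers for illustration, not GeneralizedCopula.jl functions (the notebook itself calls `update_Σ!`). Because each step maximizes a minorizing function, the objective should not decrease from one iteration to the next.

```julia
# Hypothetical helpers (not the GeneralizedCopula.jl API).
# t[i, k] = tr(V_ik) / 2 and q[i, k] = r_iᵀ V_ik r_i / 2, both n × m.

# Objective L(σ²) = -Σ_i log(1 + Σ_k σ²_k t_ik) + Σ_i log(1 + Σ_k σ²_k q_ik)
mm_objective(σ2, t, q) = -sum(log.(1 .+ t * σ2)) + sum(log.(1 .+ q * σ2))

# One multiplicative MM update of the variance components σ²
function mm_update_sigma2(σ2::AbstractVector, t::AbstractMatrix, q::AbstractMatrix)
    ti = t * σ2                                  # t_i^(t), one entry per group
    qi = q * σ2                                  # q_i^(t), one entry per group
    num = vec(sum(q ./ (1 .+ qi), dims = 1))     # Σ_i q_ik / (1 + q_i^(t))
    den = vec(sum(t ./ (1 .+ ti), dims = 1))     # Σ_i t_ik / (1 + t_i^(t))
    return σ2 .* num ./ den
end

# Example with n = 2 groups and m = 1 variance component
t = reshape([50.0, 50.0], 2, 1)   # n_i = 100 and V_i = 11ᵀ give t_i = 50
q = reshape([5.0, 200.0], 2, 1)   # made-up quadratic forms
σ2 = [1.0]
for iter in 1:5
    global σ2
    σ2 = mm_update_sigma2(σ2, t, q)
    println(iter, "  σ² = ", σ2[1], "  L = ", mm_objective(σ2, t, q))
end
```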

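Initialize β and σ²: β comes from `initialize_model!` above (close to, but not copied from, the MixedModels.jl estimates), and Σ is set to 1 and then updated with the MM algorithm in `update_Σ!`.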

In [5]:
fill!(gcm.Σ, 1.0)
update_Σ!(gcm)
GeneralizedCopula.loglikelihood3!(gcm, true, true)

-569.178024831856

## A Closer Look at Observation $i = 1$

In [6]:
gc = gcm.data[1]
β  = gcm.β
Σ  = gcm.Σ
τ  = gcm.τ

@show β
@show Σ
@show τ

n_i  = length(gc.y)

β = [0.24007269498653974, 1.5211675941696152]
Σ = [0.002721018657053964]
τ = [1.0]


100

## Loglikelihood for observation i = 1, j in [1, n_1]
$$\mathcal{L}(\mathbf{\beta})_1 =  - \ln \Big[1\! +\! \frac{1}{2}\text{tr}(\mathbf{\Gamma_{1}})\Big] +
\ln \Big\{1\!+\!\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^T \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})\Big\} +  \sum_{j=1}^{n_1}\Big[y_{1j}\ln \mu_{1j}(\mathbf{\beta}) + (1 - y_{1j})\ln\big(1 - \mu_{1j}(\mathbf{\beta})\big)\Big]$$
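As a self-contained reference point, the per-group loglikelihood above can be sketched in plain Julia without GeneralizedCopula.jl, assuming a logit link, Bernoulli margins, and $\mathbf{\Gamma}_i = \sum_k \sigma_k^2 \mathbf{V}_{ik}$. `group_loglik` is a hypothetical name; it is meant to mirror the three terms checked in the cells below, not to reproduce the package's implementation.

```julia
using LinearAlgebra

# Hypothetical helper (not the GeneralizedCopula.jl API): copula loglikelihood of
# one group with Bernoulli margins, logit link, and Γ = Σ_k σ²_k V_k.
function group_loglik(β, y, X, V, σ2)
    η = X * β                                   # linear predictor
    μ = @. 1 / (1 + exp(-η))                    # inverse logit
    r = @. (y - μ) / sqrt(μ * (1 - μ))          # standardized residuals r(β)
    Γ = sum(σ2[k] * V[k] for k in eachindex(V))
    term1 = -log(1 + tr(Γ) / 2)
    term2 = log(1 + dot(r, Γ * r) / 2)
    term3 = sum(@. y * log(μ) + (1 - y) * log(1 - μ))   # Bernoulli log-density
    return term1 + term2 + term3
end

# Tiny example: two observations, intercept only, β = 0 so μ = 1/2 and r = (1, -1)
y = [1.0, 0.0]
X = ones(2, 1)
V = [ones(2, 2)]
group_loglik([0.0], y, X, V, [1.0])   # -3 log(2) ≈ -2.0794
```

With $\sigma^2 = 0$ the copula terms vanish and the function reduces to the ordinary logistic regression loglikelihood.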

### First I want to check if the mean and residuals are updated and standardized at this point

In [7]:
@test gc.η == gc.X*β                        # systematic linear component
@test gc.μ == exp.(gc.η)./(1 .+ exp.(gc.η)) # mu = ginverse of XB = mean component for GLM = [p]
@test gc.varμ == gc.μ .*(1 .- gc.μ)         # variance of the Bernoulli response as a function of mean mu [p(1-p)]
@test gc.res ≈ (gc.y - gc.μ)./sqrt.(gc.varμ)# standardized residual for GLM [(y - p)/sqrt(p(1-p))]

[32m[1mTest Passed[22m[39m

### Next I want to check if the hard coded terms in the loglikelihood are correct

$$\text{Term 1 }= - \ln \Big[1\! +\! \frac{1}{2}tr(\mathbf{\Gamma_{1}})\Big]$$

In [8]:
# term 1:
trace_gamma = Σ[1]*tr(gc.V[1])
@test trace_gamma ≈ n_i*Σ[1]

trace_gamma_half = trace_gamma/2
@show trace_gamma_half

term1 = -log(1 + trace_gamma_half)
@show term1;

trace_gamma_half = 0.1360509328526982
term1 = -0.12755815455154634


$$\text{Term 2} = \ln \Big\{1\!+\!\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^t \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})\Big\}$$
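Because the random-intercept design uses $\mathbf{V}_1 = \mathbf{1}\mathbf{1}^T$ (the `ones(ni, ni)` matrix built in In [4]), with $\sigma^2$ = `Σ[1]` both copula terms simplify:

$$
\begin{eqnarray*}
\frac{1}{2}\text{tr}(\mathbf{\Gamma}_1) &=& \frac{1}{2}\sigma^2\, \text{tr}(\mathbf{1}\mathbf{1}^T) = \frac{1}{2}\sigma^2 n_1,\\
\frac{1}{2}\mathbf{r}_1^T \mathbf{\Gamma}_1 \mathbf{r}_1 &=& \frac{1}{2}\sigma^2 (\mathbf{1}^T\mathbf{r}_1)^2 = \frac{1}{2}\sigma^2 \Big(\sum_{j=1}^{n_1} r_{1j}\Big)^2.
\end{eqnarray*}
$$

With $n_1 = 100$ and $\sigma^2 \approx 0.00272$, the first line gives $\frac{1}{2}\text{tr}(\mathbf{\Gamma}_1) \approx 0.136$, matching `trace_gamma_half` above.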

In [9]:
# term 2:
quad_form_standardized_res_half = (Σ[1]*transpose(gc.res)*gc.V[1]*gc.res)/2
@show quad_form_standardized_res_half

term2 = log(1 + quad_form_standardized_res_half) 
@show term2

quad_form_standardized_res_half = 0.10027895248141495
term2 = 0.095563740819936


0.095563740819936

### In the loglikelihood function I have:

$$\text{Term1 + Term2} =  - \ln \Big[1\! +\! \frac{1}{2}tr(\mathbf{\Gamma_{1}})\Big] +
\ln \Big\{1\!+\!\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^t \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})\Big\}$$

In [10]:
logl_hard_coded_obs1 = term1 + term2
copula_logl_function = GeneralizedCopula.copula_loglikelihood_addendum(gc, Σ)
@show logl_hard_coded_obs1
@show copula_logl_function
@test copula_logl_function ≈ logl_hard_coded_obs1

logl_hard_coded_obs1 = -0.03199441373161034
copula_logl_function = -0.03199441373161034


[32m[1mTest Passed[22m[39m

### Part of the loglikelihood that comes from the component density (via GLM.jl)

$$\text{Term 3} = \sum_{j=1}^{n_1} y_{1j}\ln \mu_{1j}(\mathbf{\beta}) + (1 - y_{1j})\ln\big(1 - \mu_{1j}(\mathbf{\beta})\big)$$
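For the logit link, $\ln \mu = \eta - \ln(1+e^{\eta})$ and $\ln(1-\mu) = -\ln(1+e^{\eta})$, so Term 3 can equivalently be written as $\sum_j y_{1j}\eta_{1j} - \ln(1 + e^{\eta_{1j}})$. A minimal sketch of that form, using a hypothetical overflow-safe helper `stable_log1pexp` (LogExpFunctions.jl ships an equivalent `log1pexp`); the small data vectors are made up for illustration:

```julia
# Hypothetical helper: log(1 + e^η) without overflow for large η
stable_log1pexp(η) = η > 0 ? η + log1p(exp(-η)) : log1p(exp(η))

# Bernoulli (logit-link) loglikelihood written in terms of η = Xβ:
# Σ_j y_j log μ_j + (1 - y_j) log(1 - μ_j) = Σ_j y_j η_j - log(1 + e^{η_j})
bernoulli_logl_eta(y, η) = sum(@. y * η - stable_log1pexp(η))

# Direct form, as in logistic_density below, for comparison
function bernoulli_logl_mu(y, η)
    μ = @. 1 / (1 + exp(-η))
    return sum(@. y * log(μ) + (1 - y) * log(1 - μ))
end

y = [1.0, 0.0, 1.0, 0.0]
η = [0.3, -1.2, 2.0, 0.7]
bernoulli_logl_eta(y, η) ≈ bernoulli_logl_mu(y, η)   # true
bernoulli_logl_eta([0.0], [800.0])                     # -800.0; the direct form gives -Inf
```

The η form avoids evaluating $\ln(1 - \mu)$ when $\mu$ rounds to 1, which is why it is a common choice inside optimizers.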

In [11]:
function logistic_density(y, μ)
    logl = 0.0
    for j in 1:length(y)
        logl += y[j]*log(μ[j]) + (1 - y[j])*log(1 - μ[j])
    end
    logl
end

term3 = logistic_density(gc.y, gc.μ)

-64.39918956480813

In [12]:
logl_component_logistic = 0.0
logl_component_logistic += component_loglikelihood(gc, τ[1], logl_component_logistic)

-64.39918956480813

In [13]:
@test logl_component_logistic == term3

[32m[1mTest Passed[22m[39m

$$\mathcal{L}(\mathbf{\beta})_1 =  - \ln \Big[1\! +\! \frac{1}{2}\text{tr}(\mathbf{\Gamma_{1}})\Big] +
\ln \Big\{1\!+\!\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^T \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})\Big\} +  \sum_{j=1}^{n_1}\Big[y_{1j}\ln \mu_{1j}(\mathbf{\beta}) + (1 - y_{1j})\ln\big(1 - \mu_{1j}(\mathbf{\beta})\big)\Big]$$

In [14]:
logl_hard = term1 + term2 + term3

-64.43118397853974

In [15]:
function copula_loglikelihood(gc::Union{GLMCopulaVCObs{T, D}, GaussianCopulaVCObs{T, D}}, β::Vector{T}, τ::T,
    Σ::Vector{T}) where {T<: BlasReal, D}
    # update the mean and standardized residuals at the current β
    update_res!(gc, β)
    standardize_res!(gc)
    logl = 0.0
    # copula-specific terms (term 1 + term 2)
    logl += GeneralizedCopula.copula_loglikelihood_addendum(gc, Σ)
    # component density from GLM.jl (term 3)
    logl += GeneralizedCopula.component_loglikelihood(gc, τ, zero(T))
    logl
end

logl_functions = copula_loglikelihood(gc, β, τ[1], Σ)

-64.43118397853974

In [16]:
@test logl_hard == logl_functions

[32m[1mTest Passed[22m[39m

# A Closer Look at the Gradient for observation i=1

$$\begin{eqnarray*}
\nabla_\beta &=& \sum_{i=1}^n \sum_{j=1}^{n_i} \nabla \ln f_{ij}(y_{ij} \mid \mathbf{\beta}) + \sum_{i=1}^n
\frac{\nabla \mathbf{r_i}(\mathbf{\beta})^T\mathbf{\Gamma_i}\mathbf{r_i}(\mathbf{\beta})}{1+\frac{1}{2}\mathbf{r_i}(\mathbf{\beta})^T \mathbf{\Gamma_i} \mathbf{r_i}(\mathbf{\beta})}
\end{eqnarray*}
$$

The gradient is made of two terms. The first is from the GLM component loglikelihood that corresponds to the Logistic Regression density. The second part is specific to our copula model. We start with Term 1 for observation 1:

$$\begin{eqnarray*}
    \text{Term 1} &=& \sum_{j=1}^{n_1} \frac{(y_{1j}-\mu_{1j}) \mu_{1j}'(\eta_{1j})}{\sigma_{1j}^2} \mathbf{x}_{1j}= \mathbf{X_1}^T \mathbf{W_1}(\mathbf{Y_1}-\boldsymbol{\mu_1}), \quad \mathbf{W_1} = \text{diag}\left(\frac{\mu_{1j}'(\eta_{1j})}{\sigma_{1j}^2}\right)_{j=1}^{n_1}
\end{eqnarray*}
$$
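For the logit (canonical) link, $\mu'(\eta) = \mu(1-\mu) = \sigma^2$, so every diagonal entry of $\mathbf{W_1}$ equals one and Term 1 reduces to $\mathbf{X_1}^T(\mathbf{Y_1} - \boldsymbol{\mu_1})$, which is what cell In [19] computes. A short numerical sketch of that cancellation on made-up data; `score_weighted` and `score_canonical` are hypothetical names:

```julia
using Random

# Term 1 with explicit weights μ'(η_j) / σ_j², as in logistic_gradient below
function score_weighted(y, X, β)
    η = X * β
    μ = @. 1 / (1 + exp(-η))
    dμ = @. exp(η) / (1 + exp(η))^2      # μ'(η) for the logit link
    σ2 = @. μ * (1 - μ)                  # Bernoulli variance
    return X' * ((y .- μ) .* dμ ./ σ2)
end

# Canonical-link shortcut: the weights are all one, so Term 1 = Xᵀ(y - μ)
function score_canonical(y, X, β)
    η = X * β
    μ = @. 1 / (1 + exp(-η))
    return X' * (y .- μ)
end

rng = MersenneTwister(2020)              # made-up data for illustration
X = [ones(20) rand(rng, 20)]
y = Float64.(rand(rng, 20) .< 0.5)
β = [0.24, 1.52]
score_weighted(y, X, β) ≈ score_canonical(y, X, β)   # true up to rounding
```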

We check whether the field `gc.dμ`, i.e. $\mu_{1j}'(\eta_{1j})$ as computed with GLM.jl's `mueta`, matches its theoretical value $e^{\eta_{1j}}/(1+e^{\eta_{1j}})^2$.

In [17]:
# the two computations agree only up to floating-point rounding, so compare with ≈
@test gc.dμ ≈ exp.(gc.η)./(1 .+ exp.(gc.η)).^2        # derivative of mean with respect to systematic component

[32m[1mTest Passed[22m[39m

In [18]:
function logistic_gradient(y, X, dμ, σ2, μ)
    grad = zeros(size(X, 2))
    for j in 1:length(y)
        grad += (y[j] - μ[j])*dμ[j]/σ2[j] * X[j, :]
    end
    grad
end

term1_gradient = logistic_gradient(gc.y, gc.X, gc.dμ, gc.varμ, gc.μ)

2-element Array{Float64,1}:
 -3.1486382135439355
 -3.5362506402911693

In [19]:
transpose(gc.X)*(gc.y - gc.μ)

2-element Array{Float64,1}:
 -3.1486382135439337
 -3.5362506402911693

In [20]:
term1_grad_fctn = GeneralizedCopula.glm_gradient(gc, β, τ) 

2-element Array{Float64,1}:
 -3.1486382135439355
 -3.53625064029117

### Copula density specific gradient portion

$$ \text{Term 2} = \frac{\nabla \mathbf{r_1}(\mathbf{\beta})^T\mathbf{\Gamma_1}\mathbf{r_1}(\mathbf{\beta})}{1+\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^T \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})}
$$

Notice that the second term involves a quantity, $\nabla \mathbf{r_1}(\mathbf{\beta})$, that will come up in the Hessian as well. For the first observation, $i = 1$, we have $n_1 = 100$ and $p = 2$ in this simulated dataset, so $\nabla \mathbf{r_1}(\mathbf{\beta})$ is a $100 \times 2$ matrix of differentials.

Each column of $\nabla \mathbf{r_1}(\mathbf{\beta})^\top$ is a $p \times 1$ vector $\nabla r_{1j}(\mathbf{\beta})$:

$$
\begin{eqnarray*}
\nabla r_{1j}(\mathbf{\beta}) &=&  -\frac{1}{\sigma_{1j}(\mathbf{\beta})} \nabla \mu_{1j}(\mathbf{\beta})- \frac{1}{2} \frac{y_{1j}-\mu_{1j}(\mathbf{\beta})}{\sigma_{1j}^3(\mathbf{\beta})} \nabla \sigma_{1j}^2(\mathbf{\beta})\\
&=&  -\frac{1}{\sigma_{1j}(\mathbf{\beta})} \nabla \mu_{1j}(\mathbf{\beta})- \frac{1}{2\sigma_{1j}^2(\mathbf{\beta})}r_{1j}(\mathbf{\beta}) \nabla \sigma_{1j}^2(\mathbf{\beta})
\end{eqnarray*}
$$


where 

$$
\begin{eqnarray*}
\nabla \mu_{1j}(\mathbf{\beta}) &=& \frac{d\mu_{1j}(\mathbf{\beta})}{d\eta_{1j}(\mathbf{\beta})} \frac{d\eta_{1j}(\mathbf{\beta})}{d\mathbf{\beta}} = \frac{(1+e^{\eta_{1j}(\mathbf{\beta})})e^{\eta_{1j}(\mathbf{\beta})} -  e^{2\eta_{1j}(\mathbf{\beta})}}{(1 + e^{\eta_{1j}(\mathbf{\beta})})^2}* \mathbf{x_{1j}} \\
&=& \frac{ e^{\eta_{1j}(\mathbf{\beta})}}{(1 + e^{\eta_{1j}(\mathbf{\beta})})^2} * \mathbf{x_{1j}}
\end{eqnarray*}
$$

In [21]:
# for j = 1, j = 2, ..., j = end: compute the first two columns and the last one
∇μβ1 = exp.(gc.η[1]) / (1 + exp.(gc.η[1]))^2 .* gc.X[1, :]
∇μβ2 = exp.(gc.η[2]) / (1 + exp.(gc.η[2]))^2 .* gc.X[2, :]
# ...
∇μβend = exp.(gc.η[end]) / (1 + exp.(gc.η[end]))^2 .* gc.X[end, :]

@show ∇μβ1
@show ∇μβ2
@show ∇μβend

∇μβ1 = [0.2409680921859315, 0.022930235776909426]
∇μβ2 = [0.2285602950344211, 0.054584020645153025]
∇μβend = [0.12625431142102753, 0.1252044789777172]


2-element Array{Float64,1}:
 0.12625431142102753
 0.1252044789777172

In [22]:
∇μη = exp.(gc.η) ./ (1 .+ exp.(gc.η)).^2
# for the logit link dμ/dη = μ(1 - μ), which is also the Bernoulli variance
@show gc.varμ ≈ ∇μη
@test gc.dμ ≈ gc.varμ
∇ηβ = gc.X

∇μβ = transpose(∇ηβ)*Diagonal(∇μη)

gc.varμ ≈ ∇μη = true


2×100 Array{Float64,2}:
 0.240968   0.22856   0.23924    0.127628  …  0.211891   0.225209   0.126254
 0.0229302  0.054584  0.0284615  0.125273     0.0814239  0.0609794  0.125204

and  
$$    \begin{eqnarray*}
    \nabla \sigma_{1j}^2(\mathbf{\beta}) &=&
    \frac{d\sigma_{1j}^2(\mathbf{\beta})}{d\mu_{1j}(\mathbf{\beta})} \frac{d\mu_{1j}(\mathbf{\beta})}{d\eta_{1j}(\mathbf{\beta})} \frac{d\eta_{1j}(\mathbf{\beta})}{d\mathbf{\beta}} =
    [1 - 2\mu_{1j}(\mathbf{\beta})]* \frac{ e^{\eta_{1j}(\mathbf{\beta})}}{(1 + e^{\eta_{1j}(\mathbf{\beta})})^2} * \mathbf{x_{1j}}\\
    &=& \frac{1-e^{\eta_{1j}}}{1 + e^{\eta_{1j}}} *\frac{ e^{\eta_{1j}(\mathbf{\beta})}}{(1 + e^{\eta_{1j}(\mathbf{\beta})})^2} * \mathbf{x_{1j}}\\
    &=& \frac{e^{\eta_{1j}(\mathbf{\beta})}(1 -  e^{\eta_{1j}(\mathbf{\beta})})}{(1 + e^{\eta_{1j}(\mathbf{\beta})})^3} * \mathbf{x_{1j}}
\end{eqnarray*}$$
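Combining $\nabla \mu_{1j}$ and $\nabla \sigma_{1j}^2$ with the formula for $\nabla r_{1j}$, the whole $n_1 \times p$ matrix of differentials can be built in one vectorized step. The sketch below does that for a logit link on a made-up design and checks it against central finite differences; `std_residuals`, `std_residual_jacobian`, and `fd_jacobian` are hypothetical helpers, not package functions (the notebook uses `std_res_differential!`).

```julia
# Hypothetical helpers (not the GeneralizedCopula.jl API), logit link throughout.

# Standardized residuals r_j(β) = (y_j - μ_j) / σ_j
function std_residuals(β, y, X)
    η = X * β
    μ = @. 1 / (1 + exp(-η))
    return @. (y - μ) / sqrt(μ * (1 - μ))
end

# Analytic n × p matrix ∇r(β); row j is ∇r_j(β)ᵀ built from
# ∇μ_j = μ'(η_j) x_j and ∇σ²_j = (1 - 2μ_j) μ'(η_j) x_j
function std_residual_jacobian(β, y, X)
    η = X * β
    μ = @. 1 / (1 + exp(-η))
    σ2 = @. μ * (1 - μ)          # Bernoulli variance; equals μ'(η) for the logit link
    r = @. (y - μ) / sqrt(σ2)
    dμ = σ2                       # μ'(η)
    dσ2 = @. (1 - 2μ) * dμ        # dσ²/dη
    coef = @. -dμ / sqrt(σ2) - r * dσ2 / (2σ2)
    return coef .* X              # scales row j of X by coef[j]
end

# Central finite-difference Jacobian, used only as an independent check
function fd_jacobian(f, β; h = 1e-6)
    J = zeros(length(f(β)), length(β))
    for k in eachindex(β)
        e = zeros(length(β))
        e[k] = h
        J[:, k] = (f(β .+ e) .- f(β .- e)) ./ (2h)
    end
    return J
end

X = [ones(6) [0.1, 0.4, 0.9, 0.3, 0.7, 0.5]]   # made-up design
y = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
β = [0.24, 1.52]
J = std_residual_jacobian(β, y, X)
Jfd = fd_jacobian(b -> std_residuals(b, y, X), β)
isapprox(J, Jfd; rtol = 1e-6)
```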

In [23]:
# for j = 1, j = 2, ..., j = end: compute the first two columns and the last one
∇σ2β1 = (1 - 2*gc.μ[1]) * gc.dμ[1] .* gc.X[1, :]
∇σ2β2 = (1 - 2*gc.μ[2]) * gc.dμ[2] .* gc.X[2, :]
# ...
∇σ2βend = (1 - 2*gc.μ[end]) * gc.dμ[end] .* gc.X[end, :]

@show ∇σ2β1
@show ∇σ2β2
@show ∇σ2βend

∇σ2β = transpose(gc.X)* Diagonal((1 .- 2*gc.μ) .* gc.dμ)

∇σ2β1 = [-0.04580145587723626, -0.004358411824003551]
∇σ2β2 = [-0.06693298382633617, -0.015984715851317114]
∇σ2βend = [-0.08882623442218783, -0.08808762469342755]


2×100 Array{Float64,2}:
 -0.0458015   -0.066933   -0.0496338   …  -0.0827287  -0.0709188  -0.0888262
 -0.00435841  -0.0159847  -0.00590476     -0.0317904  -0.0192025  -0.0880876

$$
\begin{eqnarray*}
\nabla r_{1j}(\mathbf{\beta}) &=&  -\frac{1}{\sigma_{1j}(\mathbf{\beta})} \nabla \mu_{1j}(\mathbf{\beta})- \frac{1}{2} \frac{y_{1j}-\mu_{1j}(\mathbf{\beta})}{\sigma_{1j}^3(\mathbf{\beta})} \nabla \sigma_{1j}^2(\mathbf{\beta})\\
&=&  -\frac{1}{\sigma_{1j}(\mathbf{\beta})} \nabla \mu_{1j}(\mathbf{\beta})- \frac{1}{2\sigma_{1j}^2(\mathbf{\beta})}r_{1j}(\mathbf{\beta}) \nabla \sigma_{1j}^2(\mathbf{\beta})
\end{eqnarray*}
$$


In [24]:
# j = 1, j = 2, ..., j = end: ∇r_j = -∇μ_j / σ_j - r_j ∇σ²_j / (2σ²_j)
∇resβ1 = -1 / sqrt(gc.varμ[1]) .* ∇μβ1 .- (1 / (2 * gc.varμ[1]) * gc.res[1]) .* ∇σ2β1
∇resβ2 = -1 / sqrt(gc.varμ[2]) .* ∇μβ2 .- (1 / (2 * gc.varμ[2]) * gc.res[2]) .* ∇σ2β2
# ...
∇resβend = -1 / sqrt(gc.varμ[end]) .* ∇μβend .- (1 / (2 * gc.varμ[end]) * gc.res[end]) .* ∇σ2βend

@show ∇resβ1
@show ∇resβ2
@show ∇resβend


∇resβ1 = [-0.6060852734326417, -0.05767435055260186]
∇resβ2 = [-0.676061856683234, -0.16145487709070677]
∇resβend = [-1.198593207512256, -1.1886266406567167]


2-element Array{Float64,1}:
 -1.198593207512256
 -1.1886266406567167

In [25]:
update_res!(gc, β)
standardize_res!(gc)
std_res_differential!(gc)

In [26]:
@test gc.∇resβ[1, :] ≈ ∇resβ1

[32m[1mTest Passed[22m[39m

### Gradient portion of Copula specific model

$$\begin{eqnarray*}
\text{Term 2} &=& \sum_{i=1}^n
\frac{\nabla \mathbf{r_i}(\mathbf{\beta})^T\mathbf{\Gamma_i}\mathbf{r_i}(\mathbf{\beta})}{1+\frac{1}{2}\mathbf{r_i}(\mathbf{\beta})^T \mathbf{\Gamma_i} \mathbf{r_i}(\mathbf{\beta})}
\end{eqnarray*}
$$

Again for observation $ i = 1$ we have:  

$$ \text{Term 2} = \frac{\nabla \mathbf{r_1}(\mathbf{\beta})^T\mathbf{\Gamma_1}\mathbf{r_1}(\mathbf{\beta})}{1+\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^T \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})}
$$
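The same Term 2 can be sketched without GeneralizedCopula.jl, assuming a logit link and $\mathbf{\Gamma}_1 = \sigma^2 \mathbf{1}\mathbf{1}^T$ as in this random-intercept model. Because $\mathbf{\Gamma}_1\mathbf{r}_1 = \sigma^2 (\sum_j r_{1j}) \mathbf{1}$, the numerator collapses to $\sigma^2 (\sum_j r_{1j})\, \nabla \mathbf{r}_1^T \mathbf{1}$, which avoids forming the $n_1 \times n_1$ matrix. For the logit link, row $j$ of $\nabla\mathbf{r}_1$ also simplifies to $\big(-\sigma_{1j} - \tfrac12 r_{1j}(1 - 2\mu_{1j})\big)\mathbf{x}_{1j}^T$. All function names are hypothetical and the data are made up; the finite-difference gradient serves only as an independent check.

```julia
using LinearAlgebra

# Hypothetical helpers (not the GeneralizedCopula.jl API), logit link throughout.

# Standardized residuals r(β) and the n × p matrix ∇r(β); for the logit link
# row j of ∇r is (-σ_j - r_j (1 - 2μ_j) / 2) x_jᵀ
function residuals_and_jacobian(β, y, X)
    η = X * β
    μ = @. 1 / (1 + exp(-η))
    σ = @. sqrt(μ * (1 - μ))
    r = @. (y - μ) / σ
    coef = @. -σ - r * (1 - 2μ) / 2
    return r, coef .* X
end

# Copula part of the group loglikelihood: -log(1 + tr(Γ)/2) + log(1 + rᵀΓr/2)
function copula_addendum(β, y, X, Γ)
    r, _ = residuals_and_jacobian(β, y, X)
    return -log(1 + tr(Γ) / 2) + log(1 + dot(r, Γ * r) / 2)
end

# Its gradient ∇rᵀΓr / (1 + rᵀΓr/2); the trace term does not depend on β
function copula_addendum_gradient(β, y, X, Γ)
    r, ∇r = residuals_and_jacobian(β, y, X)
    Γr = Γ * r
    return (∇r' * Γr) ./ (1 + dot(r, Γr) / 2)
end

# Random-intercept shortcut for Γ = σ² 11ᵀ: Γr = σ² (Σ_j r_j) 1, no n × n matrix needed
function copula_addendum_gradient_ri(β, y, X, σ2)
    r, ∇r = residuals_and_jacobian(β, y, X)
    s = sum(r)
    return σ2 * s .* vec(sum(∇r, dims = 1)) ./ (1 + σ2 * s^2 / 2)
end

# Central finite differences, used only as an independent check
function fd_gradient(f, β; h = 1e-6)
    g = zeros(length(β))
    for k in eachindex(β)
        e = zeros(length(β))
        e[k] = h
        g[k] = (f(β .+ e) - f(β .- e)) / (2h)
    end
    return g
end

X = [ones(6) [0.1, 0.4, 0.9, 0.3, 0.7, 0.5]]   # made-up design
y = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
β = [0.24, 1.52]
σ2 = 0.5
Γ = σ2 * ones(6, 6)

g_full = copula_addendum_gradient(β, y, X, Γ)
g_ri = copula_addendum_gradient_ri(β, y, X, σ2)
g_fd = fd_gradient(b -> copula_addendum(b, y, X, Γ), β)
```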

In [27]:
Γ1 = Σ[1]*gc.V[1]

grad_t2_numerator = transpose(gc.∇resβ) * Γ1 * gc.res       # new term ∇resβ^t * Γ * res
@show grad_t2_numerator

quadratic_form_half = (transpose(gc.res) * Γ1 * gc.res)/2
@show quadratic_form_half 
@test quadratic_form_half ≈ quad_form_standardized_res_half # from the loglikelihood 'qsum'

grad_t2_denominator = inv(1 + quadratic_form_half)
@show grad_t2_denominator

gradient_term2 = grad_t2_numerator * grad_t2_denominator

grad_t2_numerator = [1.1124618538057356, 0.510843346495259]
quadratic_form_half = 0.10027895248141497
grad_t2_denominator = 0.908860428298424


2-element Array{Float64,1}:
 1.0110725569155394
 0.46428530268908125

In [28]:
gradient_term2_function = GeneralizedCopula.copula_gradient_addendum(gc, β, τ[1], Σ)

2-element Array{Float64,1}:
 1.0110725569155394
 0.46428530268908147

In [29]:
@test gradient_term2 ≈ gradient_term2_function

[32m[1mTest Passed[22m[39m

In [30]:
gradient_hard_code = term1_gradient + gradient_term2

2-element Array{Float64,1}:
 -2.137565656628396
 -3.071965337602088

In [31]:
function copula_gradient(gc::GLMCopulaVCObs{T, D}, β, τ, Σ)  where {T<:BlasReal, D}
    fill!(gc.∇β, 0.0)
    gc.∇β .= GeneralizedCopula.glm_gradient(gc, β, τ) .+ GeneralizedCopula.copula_gradient_addendum(gc, β, τ[1], Σ)
end

copula_gradient (generic function with 1 method)

In [32]:
full_gradient_function = copula_gradient(gc, β, τ, Σ)

2-element Array{Float64,1}:
 -2.137565656628396
 -3.0719653376020886

In [33]:
@test full_gradient_function ≈ gradient_hard_code

[32m[1mTest Passed[22m[39m

In [34]:
@show loglikelihood3!(gcm, true, true)

loglikelihood3!(gcm, true, true) = -569.178024831856


-569.178024831856

In [35]:
@show loglikelihood(glmm1) 

loglikelihood(glmm1) = -569.2768789333542


-569.2768789333542

## Let's now use the ForwardDiff.jl package to check if our matrix calculus is correct.

I start by checking my calculation of the gradient:

1. Rewrite the functions that compute parts of the loglikelihood above as plain functions of β, so that ForwardDiff.jl can differentiate them.
2. Compare the automatic-differentiation gradients with those from the hand-coded gradient functions above.


In [36]:
gc = gcm.data[1]
β  = gcm.β
Σ  = gcm.Σ
τ  = gcm.τ

@show β
@show Σ
@show τ

n_i  = length(gc.y)

update_res!(gc, β)

β = [0.24007269498653974, 1.5211675941696152]
Σ = [0.002721018657053964]
τ = [1.0]


100-element Array{Float64,1}:
 -0.595036349961836
 -0.6464230342725452
  0.3962676943522152
  0.1501828652115309
 -0.7634550747052807
 -0.6948180074313923
  0.4009137339610708
  0.150650372983696
  0.3359999259306664
  0.43361527127351573
 -0.6702281848040941
 -0.6687389940639799
 -0.8176259038216608
  ⋮
  0.2785824293065273
  0.38053739157712396
  0.4201936089229257
 -0.7255552104349672
  0.28927631401067855
  0.32959154945645686
 -0.6233760412963762
  0.43815890444333794
  0.15530085275777494
 -0.6952152038815471
 -0.6574510607320434
 -0.8517750539463713

### Test whether the gradient of the component loglikelihood matches our gradient function

In [37]:
# gc is captured as a global; only β varies, so ForwardDiff can differentiate this
function logistic_density2(β::Vector)
    η = gc.X*β                        # systematic linear component
    μ = exp.(η)./(1 .+ exp.(η))       # mean μ = logistic(η)
    sum(gc.y .* log.(μ) .+ (1 .- gc.y).*log.(1 .- μ))   # Bernoulli loglikelihood (term 3)
end

logl_term3 = logistic_density2(β)
@show logl_term3

g = x -> ForwardDiff.gradient(logistic_density2, x)

gradientmagictest = g(β)
@show gradientmagictest

@test term1_grad_fctn ≈ gradientmagictest

logl_term3 = -64.39918956480815
gradientmagictest = [-3.148638213543933, -3.5362506402911675]


[32m[1mTest Passed[22m[39m

# Now we check the matrix of differentials of the standardized residual vector

Let's start with $i = 1, j = 1$, the first observation in the first group, and differentiate its standardized residual $r_{11}(\mathbf{\beta})$ with ForwardDiff.jl.

In [38]:
function standardized_residual_firstobs(β::Vector)
    η = gc.X*β                        # systematic linear component
    μ = exp.(η)./(1 .+ exp.(η)) # mu = ginverse of XB = mean component for GLM = [p]
    varμ = exp.(η)./(1 .+ exp.(η)).^2
    res = (gc.y[1] - μ[1]) / sqrt(varμ[1])
end

g2 = x -> ForwardDiff.gradient(standardized_residual_firstobs, x)
gradientmagictest2 = g2(β)
@show gradientmagictest2

@test gc.∇resβ[1, :] ≈ gradientmagictest2

gradientmagictest2 = [-0.6060852734326414, -0.057674350552601825]


[32m[1mTest Passed[22m[39m

## Now I will check the part of the Loglikelihood that is specific to our density

In [39]:
function copula_loglikelihood_addendum1(β::Vector)
  η = gc.X*β                        # systematic linear component
  μ = exp.(η)./(1 .+ exp.(η))       # mean μ = logistic(η)
  varμ = exp.(η)./(1 .+ exp.(η)).^2 # Bernoulli variance μ(1 - μ)
  res = (gc.y .- μ) ./ sqrt.(varμ)  # standardized residuals
  trace_gamma = Σ[1]*tr(gc.V[1])
  trace_gamma_half = trace_gamma/2

  term1 = -log(1 + trace_gamma_half)   # does not depend on β
  quad_form_standardized_res_half = (Σ[1]*transpose(res)*gc.V[1]*res)/2
  term2 = log(1 + quad_form_standardized_res_half)
  logl_hard_coded_obs1 = term1 + term2
  logl_hard_coded_obs1
end

g3 = x -> ForwardDiff.gradient(copula_loglikelihood_addendum1, x)

@show copula_loglikelihood_addendum1(β) 

gradientmagictest3 = g3(β)
@show gradientmagictest3

@test gradient_term2_function ≈ gradientmagictest3

copula_loglikelihood_addendum1(β) = -0.03199441373161034
gradientmagictest3 = [1.0110725569155394, 0.46428530268908125]


[32m[1mTest Passed[22m[39m

## Putting both parts of the loglikelihood and both parts of the gradient together

In [40]:
function full_loglikelihood(β::Vector)
    logistic_density2(β) + copula_loglikelihood_addendum1(β)
end

@show full_loglikelihood(β)
@test logl_functions ≈ full_loglikelihood(β)

g4 = x -> ForwardDiff.gradient(full_loglikelihood, x)

gradientmagictest4 = g4(β)
@show gradientmagictest4

full_gradient_function = copula_gradient(gc, β, τ, Σ)
@test full_gradient_function ≈ gradientmagictest4

full_loglikelihood(β) = -64.43118397853976
gradientmagictest4 = [-2.1375656566283934, -3.0719653376020863]


[32m[1mTest Passed[22m[39m

In [41]:
@show loglikelihood3!(gcm, true, true)
@show gcm.∇β

loglikelihood3!(gcm, true, true) = -569.178024831856
gcm.∇β = [0.8868746581526032, 0.5097552048265013]


2-element Array{Float64,1}:
 0.8868746581526032
 0.5097552048265013

In [42]:
@time fit2!(gcm, IpoptSolver(print_level = 5, max_iter = 25, derivative_test = "first-order", hessian_approximation = "limited-memory"))


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Starting derivative checker for first derivatives.

* grad_f[          1] = -7.1481473871480591e+02    ~ -2.2083320040581098e+03  [ 6.763e-01]
* grad_f[          2] = -3.8837192386906469e+02    ~ -1.5388133397209458e+04  [ 9.748e-01]

Derivative checker detected 2 error(s).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0



GLMCopulaVCModel{Float64,Bernoulli{Float64}}(GLMCopulaVCObs{Float64,Bernoulli{Float64}}[GLMCopulaVCObs{Float64,Bernoulli{Float64}}([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0], [1.0 0.09515880533766441; 1.0 0.23881672289989253; … ; 1.0 0.2707677310894794; 1.0 0.9916847794622285], [[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]], [-2.180722203335005, -3.0972225588765303], [-0.6071068614699008 -0.057771563649774677; -0.6775763076804018 -0.16181655331484282; … ; -0.6943295433093811 -0.18800203507027552; -1.2047676293103786 -1.1947497207758948], [0.0], [2.5945152577e-314], [-20.819020437329797 -8.102088512052198; -8.571514640180354 -5.009603856102177], [2.579252384e-314], [-1.2142137229398013, -1.3551526153608031, 0.8087250237814103, 0.41824710029533574, -1.8031500654157, -1.5131053759632052, 0.8166403961176865, 0.4190167336104238, 0.7096237673384648, 0.8737648728273818  …  0.8500039431363726, -1.6

In [43]:
@show loglikelihood3!(gcm, true, true)
@show gcm.∇β

loglikelihood3!(gcm, true, true) = -569.1749163635732
gcm.∇β = [-4.2159520319273724e-10, 7.840386118118658e-10]


2-element Array{Float64,1}:
 -4.2159520319273724e-10
  7.840386118118658e-10

In [44]:
loglikelihood(glmm1)

-569.2768789333542

In [45]:
gcm = GLMCopulaVCModel(gcs);

initialize_model!(gcm)
@show gcm.β
fill!(gcm.Σ, 1.0)
update_Σ!(gcm)
GeneralizedCopula.loglikelihood3!(gcm, true, true)
@time fit2!(gcm, IpoptSolver(print_level = 5, max_iter = 25, derivative_test = "first-order"))

1 0.0 -626.5957471791157 39
2 -626.5957471791157 -571.7584788144428 9
3 -571.7584788144428 -569.2814939049927 9
4 -569.2814939049927 -569.2800318847973 9
5 -569.2800318847973 -569.2800318829738 9
gcm.β = [0.24007269498653974, 1.5211675941696152]
This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Starting derivative checker for first derivatives.

* grad_f[          1] = -7.1481473871480591e+02    ~ -2.2083320040581098e+03  [ 6.763e-01]
* grad_f[          2] = -3.8837192386906469e+02    ~ -1.5388133397209458e+04  [ 9.748e-01]

Derivative checker detected 2 error(s).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        2
                     variables with only lower bounds:        0
              


In [46]:
@show loglikelihood3!(gcm, true, true)
@show gcm.∇β

loglikelihood3!(gcm, true, true) = -569.1749163635732
gcm.∇β = [-4.2159520319273724e-10, 7.840386118118658e-10]


2-element Array{Float64,1}:
 -4.2159520319273724e-10
  7.840386118118658e-10