# Form the Normal Regression Random Intercept Model


In this notebook we use the Dyestuff dataset from the MixedModels.jl examples (loaded here from the lme4 collection in RDatasets) to test the fit of our copula model.

For a single observation (batch), $i = 1$, we will check the following quantities:

1. Loglikelihood
2. Gradient with respect to $\beta$

In [29]:
using RDatasets, Test, GLM, LinearAlgebra, GeneralizedCopula
using LinearAlgebra: BlasReal, copytri!
# Dataframe with columns: Batch (Categorical), Yield (Int32)
dyestuff = dataset("lme4", "Dyestuff")
groups = unique(dyestuff[!, :Batch])
n, p, m = length(groups), 1, 1
d = Normal()
D = typeof(d)
gcs = Vector{GLMCopulaVCObs{Float64, D}}(undef, n)
for (i, grp) in enumerate(groups)
    gidx = dyestuff[!, :Batch] .== grp
    ni = count(gidx)
    y = Float64.(dyestuff[gidx, :Yield])
    X = ones(ni, 1)
    V = [ones(ni, ni)]
    gcs[i] = GLMCopulaVCObs(y, X, V, d)
end
gcm = GLMCopulaVCModel(gcs);
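Each batch becomes one `GLMCopulaVCObs` holding the response, an intercept-only design, and a single all-ones variance-component matrix; that all-ones matrix is what makes this a random-intercept model. A minimal plain-Julia sketch of those inputs for Batch A, without calling GeneralizedCopula. The yields are copied from the fitted-model printout near the end of the notebook rather than re-read from RDatasets.

```julia
using LinearAlgebra

# Batch A yields (copied from the fitted-model printout, not re-read from RDatasets)
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0]
ni = length(y)

X = ones(ni, 1)        # intercept-only design matrix, so p = 1
V = [ones(ni, ni)]     # one variance component: the all-ones n_i × n_i matrix

# The all-ones matrix is the rank-one outer product 1 1ᵀ:
# every pair of yields in a batch shares the same random intercept
one_vec = ones(ni)
@assert V[1] == one_vec * one_vec'
@assert rank(V[1]) == 1
@assert tr(V[1]) == ni
```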

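Initialize β, τ, and Σ. `initialize_model!` sets β and τ from the least-squares solution (the commented-out line shows how to copy β from the MixedModels.jl fit instead); Σ is then set to 1 and updated from β with `update_Σ_jensen!`.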

In [30]:
# initialize β and τ from the least-squares solution
@info "Initial point:"
initialize_model!(gcm);
#gcm.β .= [1527.4999999999998]
@show gcm.β
# update σ2 and τ from β using the MM algorithm
fill!(gcm.Σ, 1)
# update_Σ!(gcm, 500, 1e-6, GurobiSolver(OutputFlag=0), true)
update_Σ_jensen!(gcm)
@show gcm.τ
@show gcm.Σ;

gcm.β = [1527.4999999999998]
gcm.τ = [0.00033164503710934656]
gcm.Σ = [0.8636149140056555]


┌ Info: Initial point:
└ @ Main In[30]:2


# A Closer Look at Observation $i = 1$

In [31]:
gc = gcm.data[1]
β  = gcm.β
Σ  = gcm.Σ
τ  = gcm.τ

n_i  = length(gc.y)

5

In [32]:
loglikelihood2!(gcm, true, true)

-164.00082379403617

## Loglikelihood for observation $i = 1$, $j = 1, \dots, n_1$
$$\mathcal{L}_1(\mathbf{\beta}) =  - \ln \Big[1\! +\! \frac{1}{2}tr(\mathbf{\Gamma_{1}})\Big] +
    \ln \Big\{1\!+\!\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^t \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})\Big\} +  \sum_{j=1}^{n_1} \ln f(y_{1j} \mid \mu_{1j}, \sigma_{1j}^2)$$

where $f$ is the normal density with mean $\mu_{1j}$ and variance $\sigma_{1j}^2 = 1/\tau$.

### First I want to check if the mean and residuals are updated and standardized at this point

In [33]:
@test gc.η == gc.X*β                        # systematic linear component
@test gc.μ == gc.η                          # μ = g⁻¹(η); the identity link gives μ = η
@show gc.varμ
@show gc.dμ
@test gc.res ≈ (gc.y - gc.μ)                # raw residuals, not yet standardized for the normal

gc.res .*= sqrt(τ[1])                       # standardize: divide by σ = 1/√τ (varμ = 1 for the normal)
@show gc.res 

gc.varμ = [1.0, 1.0, 1.0, 1.0, 1.0]
gc.dμ = [1.0, 1.0, 1.0, 1.0, 1.0]
gc.res = [0.31869466988755873, -1.5934733494377689, -1.5934733494377689, -0.13658342995180497, 0.9560840096626679]


5-element Array{Float64,1}:
  0.31869466988755873
 -1.5934733494377689
 -1.5934733494377689
 -0.13658342995180497
  0.9560840096626679
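The same standardized residuals can be reproduced outside the package. A minimal sketch, assuming (as the comments above state) that for the normal distribution the standardized residual is $(y_{1j}-\mu_{1j})\sqrt{\tau}$; β and τ are copied from the outputs above and Batch A's yields from the fitted-model printout near the end.

```julia
# Inputs copied from the notebook outputs (observation i = 1, Batch A)
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0]
β = 1527.4999999999998
τ = 0.00033164503710934656

μ = fill(β, length(y))            # identity link: μ = η = Xβ with X = ones
raw_res = y .- μ                  # unstandardized residuals y - μ
std_res = raw_res .* sqrt(τ)      # divide by σ = 1/√τ

expected = [0.31869466988755873, -1.5934733494377689, -1.5934733494377689,
            -0.13658342995180497, 0.9560840096626679]
@assert isapprox(std_res, expected; rtol = 1e-10)
```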

### Next I want to check that the hard-coded terms in the loglikelihood are correct

$$\text{Term 1 }= - \ln \Big[1\! +\! \frac{1}{2}tr(\mathbf{\Gamma_{1}})\Big]$$

In [34]:
# term 1:
trace_gamma = Σ[1]*tr(gc.V[1])
@test trace_gamma ≈ n_i*Σ[1]

trace_gamma_half = trace_gamma/2
@show trace_gamma_half

term1 = -log(1 + trace_gamma_half)
@show term1;

trace_gamma_half = 2.1590372850141386
term1 = -1.150267324540463


$$\text{Term 2} = \ln \Big\{1\!+\!\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^t \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})\Big\}$$

In [35]:
# term 2:
quad_form_standardized_res_half = (Σ[1]*transpose(gc.res)*gc.V[1]*gc.res)/2
@show quad_form_standardized_res_half

term2 = log(1 + quad_form_standardized_res_half) 
@show term2

quad_form_standardized_res_half = 1.812461063788311
term2 = 1.0340599234483314


1.0340599234483314

### In the loglikelihood function I have:

$$\text{Term1 + Term2} =  - \ln \Big[1\! +\! \frac{1}{2}tr(\mathbf{\Gamma_{1}})\Big] +
\ln \Big\{1\!+\!\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^t \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta})\Big\}$$
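Because the only variance component is the all-ones matrix, $\mathbf{\Gamma_1} = \Sigma_1 \mathbf{1}\mathbf{1}^t$ (with $\Sigma_1$ = `Σ[1]`), and both terms have closed forms. A short derivation, plugging in the values printed above (rounded): $n_1 = 5$, $\Sigma_1 \approx 0.86361$, and $\sum_j r_{1j} = -112.5\sqrt{\tau} \approx -2.04875$.

$$
\begin{eqnarray*}
tr(\mathbf{\Gamma_1}) &=& n_1 \Sigma_1, \qquad
\mathbf{r_1}(\mathbf{\beta})^t \mathbf{\Gamma_1} \mathbf{r_1}(\mathbf{\beta}) = \Sigma_1 \Big(\sum_{j=1}^{n_1} r_{1j}\Big)^2 \\
\text{Term 1} &=& -\ln\Big(1 + \tfrac{1}{2} n_1 \Sigma_1\Big) = -\ln(1 + 2.15904) \approx -1.15027 \\
\text{Term 2} &=& \ln\Big(1 + \tfrac{1}{2}\Sigma_1 \big(\textstyle\sum_j r_{1j}\big)^2\Big) = \ln(1 + 1.81246) \approx 1.03406 \\
\text{Term 1} + \text{Term 2} &\approx& -0.11621
\end{eqnarray*}
$$

This is why the check `trace_gamma ≈ n_i*Σ[1]` holds exactly for this model.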

In [36]:
logl_hard_coded_obs1 = term1 + term2
copula_logl_function = GeneralizedCopula.copula_loglikelihood_addendum(gc, Σ)

@show logl_hard_coded_obs1
@show copula_logl_function

@test copula_logl_function == logl_hard_coded_obs1

logl_hard_coded_obs1 = -0.11620740109213168
copula_logl_function = -0.11620740109213168


[32m[1mTest Passed[22m[39m

### Part of the Loglikelihood from the Component Density (via GLM.jl)

$$\text{Term 3} = \sum_{j=1}^{n_1} \ln f(y_{1j} \mid \mu_{1j}, \sigma_{1j}^2)$$
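As a hand check of `component_loglikelihood`, a minimal sketch that writes the normal log-density out explicitly. It assumes τ is the precision $1/\sigma^2$ (consistent with standardizing by $\sqrt{\tau}$ above) and copies the inputs, including the Term 1 + Term 2 value, from the outputs in this notebook.

```julia
# Inputs copied from the notebook outputs (observation i = 1, Batch A)
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0]
β = 1527.4999999999998
τ = 0.00033164503710934656    # precision: σ² = 1/τ

# ln f(y | μ, σ² = 1/τ) = -½ ln(2π/τ) - τ (y - μ)² / 2
normal_logpdf(yj, μj, τ) = -0.5 * log(2π / τ) - τ * (yj - μj)^2 / 2

term3 = sum(normal_logpdf(yj, β, τ) for yj in y)   # μ_1j = β for every j

# add Term 1 + Term 2 (copied from the copula_loglikelihood_addendum output)
copula_terms = -0.11620740109213168
logl_obs1 = term3 + copula_terms
println((term3, logl_obs1))
```

The first number should match `component_loglikelihood(gc, τ[1], 0.0)` and the second the `copula_loglikelihoods` result, up to rounding.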

In [37]:
logl_component_normal = 0.0
logl_component_normal += component_loglikelihood(gc, τ[1], 0.0)

-27.67962227699931

In [38]:
function copula_loglikelihoods(gc::Union{GLMCopulaVCObs{T, D}, GaussianCopulaVCObs{T, D}}, β::Vector{T}, τ::T,
        Σ::Vector{T}) where {T <: Real, D}
    logl = zero(T)
    # update the residuals from β, then standardize them
    update_res!(gc, β)
    if GLM.dispersion_parameter(gc.d)
        # distributions with a dispersion parameter (e.g. Normal): store 1/τ in varμ
        fill!(gc.varμ, inv(τ))
    end
    standardize_res!(gc)
    # Term 1 + Term 2: copula-specific part of the loglikelihood
    logl += GeneralizedCopula.copula_loglikelihood_addendum(gc, Σ)
    # Term 3: component-density loglikelihood, computed with GLM.jl
    logl += GeneralizedCopula.component_loglikelihood(gc, τ, zero(T))
    logl
end

logl_functions = copula_loglikelihoods(gc, β, τ[1], Σ)

@test logl_functions == -27.795829678091444

[32m[1mTest Passed[22m[39m

# A Closer Look at the Gradient for observation i=1

$$\begin{eqnarray*}
\nabla_\beta &=& \sum_{i=1}^n \sum_{j=1}^{n_i} \nabla \ln f_{ij}(y_{ij} \mid \mathbf{\beta}) + \sum_{i=1}^n
\frac{\nabla \mathbf{r_i(\mathbf{\beta})}^\top\mathbf{\Gamma_i}\mathbf{r_i(\mathbf{\beta})}}{1+\frac{1}{2}\mathbf{r_i}(\mathbf{\beta})^t \mathbf{\Gamma_i} \mathbf{r_i(\mathbf{\beta})}}
\end{eqnarray*}
$$

The gradient has two terms. The first comes from the GLM component loglikelihood, which here is the normal (linear regression) density. The second is specific to our copula model. We start with Term 1 for observation 1:

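$$\begin{eqnarray*}
    \text{Term 1} &=& \sum_{j=1}^{n_1} \frac{(y_{1j}-\mu_{1j}) \mu_{1j}'(\eta_{1j})}{\sigma_{1j}^2} \mathbf{x}_{1j}= \mathbf{X_1}^T \mathbf{W_1}(\mathbf{Y_1}-\boldsymbol{\mu_1}), \qquad \mathbf{W_1} = \text{diag}\Big(\frac{\mu_{1j}'(\eta_{1j})}{\sigma_{1j}^2}\Big)
\end{eqnarray*}
$$

For the normal distribution with the identity link, $\mu_{1j}'(\eta_{1j}) = 1$ (`mueta` in GLM.jl, stored in `gc.dμ`) and $\sigma_{1j}^2 = 1/\tau$, so $\mathbf{W_1} = \tau \mathbf{I}$ and Term 1 reduces to $\tau\, \mathbf{X_1}^T(\mathbf{Y_1}-\boldsymbol{\mu_1})$. We check `glm_gradient` against this value.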
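The sum form and the matrix form of Term 1 can be compared directly. A minimal plain-Julia sketch; the inputs are copied from the outputs above, and `mueta` and `σ2` are filled in from the identity link and the normal variance rather than read from GLM.jl.

```julia
using LinearAlgebra

# Inputs copied from the notebook outputs (observation i = 1, Batch A)
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0]
X = ones(5, 1)
β = [1527.4999999999998]
τ = 0.00033164503710934656
n = length(y)

μ = X * β                       # identity link: μ = η = Xβ
mueta = ones(n)                 # dμ/dη = 1 for the identity link
σ2 = fill(1 / τ, n)             # normal variance 1/τ, free of μ
W = Diagonal(mueta ./ σ2)       # W_1 = diag(μ'(η) / σ²) = τ I

# matrix form: X_1ᵀ W_1 (Y_1 - μ_1)
grad_matrix = transpose(X) * W * (y - μ)

# sum form: sum over j of (y_1j - μ_1j) μ'_1j / σ²_1j * x_1j
grad_sum = sum((y[j] - μ[j]) * mueta[j] / σ2[j] * X[j, :] for j in 1:n)
```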

In [39]:
term1_grad_fctn = GeneralizedCopula.glm_gradient(gc, β, τ) 

1-element Array{Float64,1}:
 -0.037310066674801114

In [40]:
score = transpose(gc.X)*(gc.y - gc.μ)
@show score
@show score * τ[1]

score = [-112.49999999999886]
score * τ[1] = [-0.037310066674801114]


1-element Array{Float64,1}:
 -0.037310066674801114

### Copula density specific gradient portion

$$ \text{Term 2} = \frac{\nabla \mathbf{r_1(\mathbf{\beta})}^\top\mathbf{\Gamma_1}\mathbf{r_1(\mathbf{\beta})}}{1+\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^t \mathbf{\Gamma_1} \mathbf{r_1(\mathbf{\beta})}}
$$

Notice that the second term uses a key quantity, $\nabla \mathbf{r_1(\mathbf{\beta})}$, that will come up in the Hessian as well. For the first observation, $i = 1$, the Dyestuff data give $n_1 = 5$ and $p = 1$, so $\nabla \mathbf{r_1(\mathbf{\beta})}$ is a $5 \times 1$ matrix of differentials.

Each column of $\nabla \mathbf{r_1(\mathbf{\beta})}^\top$ is a $p \times 1$ vector $\nabla r_{1j}(\mathbf{\beta})$:

$$
\begin{eqnarray*}
\nabla r_{1j}(\mathbf{\beta}) &=&  -\frac{1}{\sigma_{1j}(\mathbf{\beta})} \nabla \mu_{1j}(\mathbf{\beta})- \frac{1}{2} \frac{y_{1j}-\mu_{1j}(\mathbf{\beta})}{\sigma_{1j}^3(\mathbf{\beta})} \nabla \sigma_{1j}^2(\mathbf{\beta})\\
&=&  -\frac{1}{\sigma_{1j}(\mathbf{\beta})} \nabla \mu_{1j}(\mathbf{\beta})- \frac{1}{2\sigma_{1j}^2(\mathbf{\beta})}r_{1j}(\mathbf{\beta}) \nabla \sigma_{1j}^2(\mathbf{\beta})
\end{eqnarray*}
$$
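For the normal distribution with the identity link, $\nabla \mu_{1j} = \mathbf{x_{1j}}$ and $\nabla \sigma_{1j}^2 = \mathbf{0}$, so with $\sigma_{1j} = 1/\sqrt{\tau}$ the formula reduces to $\nabla r_{1j}(\mathbf{\beta}) = -\sqrt{\tau}\,\mathbf{x_{1j}}$. A minimal sketch that confirms this with a central finite difference; the inputs are copied from the outputs above and the step size `h` is an arbitrary choice.

```julia
# Inputs copied from the notebook outputs (observation i = 1, Batch A)
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0]
τ = 0.00033164503710934656
β0 = 1527.4999999999998

# standardized residuals r_1j(β) = (y_1j - β) / σ with σ = 1/√τ and x_1j = 1
std_res(β) = (y .- β) .* sqrt(τ)

# analytic gradient: ∇r_1j = -x_1j / σ, because ∇σ² = 0 for the normal
analytic = fill(-sqrt(τ), length(y))

# central finite difference in β
h = 1e-3
numeric = (std_res(β0 + h) .- std_res(β0 - h)) ./ (2h)
```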


where 
$$    \begin{eqnarray*}
    \nabla \mu_{1j}(\mathbf{\beta}) &=& \frac{d\mu_{1j}(\mathbf{\beta})}{d\eta_{1j}(\mathbf{\beta})} \frac{d\eta_{1j}(\mathbf{\beta})}{d\mathbf{\beta}} = 1 \cdot \mathbf{x_{1j}} \\
    &=& \mathbf{x_{1j}}
\end{eqnarray*}$$

In [41]:
# columns j = 1, 2, ..., end of ∇μβ; let's look at the first two and the last one
∇μβ1 = gc.X[1, :]
∇μβ2 = gc.X[2, :]
# ...
∇μβend = gc.X[end, :]

@show ∇μβ1
@show ∇μβ2
@show ∇μβend

∇μβ1 = [1.0]
∇μβ2 = [1.0]
∇μβend = [1.0]


1-element Array{Float64,1}:
 1.0

In [42]:
∇μη = ones(n_i)
∇ηβ = gc.X

∇μβ = transpose(∇ηβ)*Diagonal(∇μη)

1×5 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0

and, because the normal variance $\sigma_{1j}^2 = 1/\tau$ does not depend on the mean,
$$    \begin{eqnarray*}
    \nabla \sigma_{1j}^2(\mathbf{\beta}) &=&
    \frac{d\sigma_{1j}^2(\mathbf{\beta})}{d\mu_{1j}(\mathbf{\beta})} \frac{d\mu_{1j}(\mathbf{\beta})}{d\eta_{1j}(\mathbf{\beta})} \frac{d\eta_{1j}(\mathbf{\beta})}{d\mathbf{\beta}} =
    0 \cdot 1 \cdot \mathbf{x_{1j}} = \mathbf{0}
\end{eqnarray*}$$

In [43]:
# columns j = 1, 2, ..., end of ∇σ2β; let's look at the first two and the last one
∇σ2β1 = 0 .* gc.X[1, :]
∇σ2β2 = 0 .* gc.X[2, :]
# ...
∇σ2βend = 0 .* gc.X[end, :]

@show ∇σ2β1
@show ∇σ2β2
@show ∇σ2βend

∇σ2β = transpose(gc.X)* Diagonal(zeros(n_i))

∇σ2β1 = [0.0]
∇σ2β2 = [0.0]
∇σ2βend = [0.0]


1×5 Array{Float64,2}:
 0.0  0.0  0.0  0.0  0.0

In [44]:
gc = std_res_differential!(gc)
gc.∇resβ

5×1 Array{Float64,2}:
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0

Note that `gc.∇resβ` equals $-1$ here rather than $-\sqrt{\tau}$: the factor $\sqrt{\tau}$ from standardizing the residuals is applied separately, which is why the notebook's Term 2 computation multiplies by `sqrt(τ[1])`.

### Gradient portion of the copula-specific model

$$\begin{eqnarray*}
\text{Term 2} &=& \sum_{i=1}^n
\frac{\nabla \mathbf{r_i(\mathbf{\beta})}^\top\mathbf{\Gamma_i}\mathbf{r_i(\mathbf{\beta})}}{1+\frac{1}{2}\mathbf{r_i}(\mathbf{\beta})^t \mathbf{\Gamma_i} \mathbf{r_i(\mathbf{\beta})}}
\end{eqnarray*}
$$

Again, for observation $i = 1$ we have:  

$$ \text{Term 2} = \frac{\nabla \mathbf{r_1(\mathbf{\beta})}^\top\mathbf{\Gamma_1}\mathbf{r_1(\mathbf{\beta})}}{1+\frac{1}{2}\mathbf{r_1}(\mathbf{\beta})^t \mathbf{\Gamma_1} \mathbf{r_1(\mathbf{\beta})}}
$$
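The same quantity can be computed without the library objects. A minimal sketch that builds $\mathbf{\Gamma_1} = \Sigma_1 \mathbf{1}\mathbf{1}^t$ explicitly and uses the full standardized differential $-\sqrt{\tau}\,\mathbf{x_{1j}}$, so no separate factor $\sqrt{\tau}$ is needed at the end; the inputs are copied from the outputs above.

```julia
# Inputs copied from the notebook outputs (observation i = 1, Batch A)
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0]
β = 1527.4999999999998
τ = 0.00033164503710934656
Σ1 = 0.8636149140056555         # Σ[1], the random-intercept variance component
n = length(y)

r = (y .- β) .* sqrt(τ)          # standardized residuals r_1(β)
∇r = fill(-sqrt(τ), n)           # ∇r_1(β) is n × p; p = 1, so store it as a vector
Γ = Σ1 .* ones(n, n)             # Γ_1 = Σ[1] · 1 1ᵀ

q_half = (r' * Γ * r) / 2        # ½ r_1ᵀ Γ_1 r_1, shared with the loglikelihood
grad_term2 = (∇r' * Γ * r) / (1 + q_half)
```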

In [45]:
Γ1 = Σ[1]*gc.V[1]

grad_t2_numerator = transpose(gc.∇resβ) * Γ1 * gc.res       # new term ∇resβ^t * Γ * res
@show grad_t2_numerator

quadratic_form_half = (transpose(gc.res) * Γ1 * gc.res)/2
@show quadratic_form_half 
@test quadratic_form_half ≈ quad_form_standardized_res_half # from the loglikelihood 'qsum'

grad_t2_denominator = inv(1 + quadratic_form_half)
@show grad_t2_denominator

gradient_term2 = grad_t2_numerator * grad_t2_denominator * sqrt(τ[1])

grad_t2_numerator = [8.846661533432094]
quadratic_form_half = 1.812461063788311
grad_t2_denominator = 0.3555604779299687


1-element Array{Float64,1}:
 0.05728351307289227

In [46]:
gradient_term2_function = GeneralizedCopula.copula_gradient_addendum(gc, β, τ[1], Σ)

1-element Array{Float64,1}:
 0.05728351307289226

In [47]:
gradient_hard_code = term1_grad_fctn + gradient_term2

1-element Array{Float64,1}:
 0.019973446398091156

In [48]:
function copula_gradient(gc::GLMCopulaVCObs{T, D}, β, τ, Σ)  where {T<:BlasReal, D}
    fill!(gc.∇β, 0.0)
    gc.∇β .= GeneralizedCopula.glm_gradient(gc, β, τ) .+ GeneralizedCopula.copula_gradient_addendum(gc, β, τ[1], Σ)
end

full_gradient_function = copula_gradient(gc, β, τ, Σ)

1-element Array{Float64,1}:
 0.01997344639809115
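Ipopt's `derivative_test = "first-order"` option, used in the fit further on, compares the analytic gradient with finite differences. A hand-rolled version of the same idea for observation 1 alone, as a minimal sketch assuming $\mathbf{\Gamma_1} = \Sigma_1\mathbf{1}\mathbf{1}^t$ and the normal density with precision τ; the inputs are copied from the outputs above and the step `h` is an arbitrary choice.

```julia
# Inputs copied from the notebook outputs (observation i = 1, Batch A)
y = [1545.0, 1440.0, 1440.0, 1520.0, 1580.0]
τ = 0.00033164503710934656
Σ1 = 0.8636149140056555
β0 = 1527.4999999999998
n = length(y)

# L_1(β) = Term 1 + Term 2 + Term 3, using Γ_1 = Σ[1] · 1 1ᵀ so that
# tr(Γ_1) = n Σ[1] and r_1ᵀ Γ_1 r_1 = Σ[1] * sum(r_1)^2
function logl_obs1(β)
    r = (y .- β) .* sqrt(τ)
    term1 = -log(1 + n * Σ1 / 2)
    term2 = log(1 + Σ1 * sum(r)^2 / 2)
    term3 = sum(-0.5 * log(2π / τ) .- τ .* (y .- β) .^ 2 ./ 2)
    return term1 + term2 + term3
end

# central finite difference of L_1 with respect to the intercept β
h = 1e-3
fd_grad = (logl_obs1(β0 + h) - logl_obs1(β0 - h)) / (2h)
```

The finite-difference value should agree with `full_gradient_function` to several digits; a large discrepancy would point to an error in one of the two gradient terms.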

In [49]:
@show loglikelihood2!(gcm, true, true)

loglikelihood2!(gcm, true, true) = -164.00082379403617


-164.00082379403617

In [50]:
@time GeneralizedCopula.fit2!(gcm, IpoptSolver(print_level = 5, derivative_test = "first-order"))

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Starting derivative checker for first derivatives.


No errors detected by derivative checker.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        1
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:  

GLMCopulaVCModel{Float64,Normal{Float64}}(GLMCopulaVCObs{Float64,Normal{Float64}}[GLMCopulaVCObs{Float64,Normal{Float64}}([1545.0, 1440.0, 1440.0, 1520.0, 1580.0], [1.0; 1.0; … ; 1.0; 1.0], [[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]], [-0.009332475479925655], [-1.0; -1.0; … ; -1.0; -1.0], [2.2470135217e-314], [1.5e-323], [-14.894316220056414], [2.1219957924e-314], [0.05966153756787458, -1.894018152216939, -1.894018152216939, -0.40550029333327153, 0.7108881008294792], [2.5], [5.858419862007839], [5.0], [-3.4229869593697955, -3.4229869593697955, -3.4229869593697955, -3.4229869593697955, -3.4229869593697955], [NaN], [1541.7935063882878, 1541.7935063882878, 1541.7935063882878, 1541.7935063882878, 1541.7935063882878], [1541.7935063882878, 1541.7935063882878, 1541.7935063882878, 1541.7935063882878, 1541.7935063882878], [1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0], Normal{Float64}(μ=0.0, σ=1.0), [1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0

In [51]:
@show loglikelihood2!(gcm, true, true)
@show gcm.∇β
@show gcm.∇τ
@show gcm.∇Σ;

loglikelihood2!(gcm, true, true) = -163.3554542330147
gcm.∇β = [5.433611893757018e-12]
gcm.∇τ = [2.279980277e-314]
gcm.∇Σ = [0.0]


In [52]:
@show gcm.β
@show gcm.τ
@show gcm.Σ;

gcm.β = [1541.7935063882878]
gcm.τ = [0.0003462008462836904]
gcm.Σ = [52109.25787010273]
