# Form the NB Regression Random Intercept Model: Simulated Data Set

We use a block update of β together with an MM update for the variance components, as implemented in the file `fit_old.jl`.

Without having to compute the gradient and Hessian of the variance-component vector, the fit finishes in 67 iterations and 9 seconds.

Next, we may try joint estimation using Newton's method with Ipopt.

In [45]:
using Revise
using DataFrames, Random, GLM, GLMCopula
using ForwardDiff, Test, LinearAlgebra
using LinearAlgebra: BlasReal, copytri!
using SpecialFunctions

Random.seed!(1234)

# sample size
N = 10000
# observations per subject
n = 5

variance_component_1 = 0.9
variance_component_2 = 0.1

r = 10
p = 0.7
μ = r * (1-p) * inv(p)

#var = r * (1-p) * inv(p^2)

# true beta
β_true = log(μ)

dist = NegativeBinomial

Γ = variance_component_1 * ones(n, n) + variance_component_2 * Matrix(I, n, n)
vecd = [dist(r, p) for i in 1:n]
nonmixed_multivariate_dist = NonMixedMultivariateDistribution(vecd, Γ)

Y_Nsample = simulate_nobs_independent_vectors(nonmixed_multivariate_dist, N)

10000-element Vector{Vector{Float64}}:
 [2.0, 1.0, 2.0, 3.0, 1.0]
 [8.0, 5.0, 3.0, 5.0, 7.0]
 [7.0, 4.0, 4.0, 11.0, 7.0]
 [4.0, 3.0, 7.0, 4.0, 9.0]
 [10.0, 11.0, 10.0, 1.0, 8.0]
 [4.0, 4.0, 3.0, 4.0, 3.0]
 [9.0, 6.0, 5.0, 4.0, 4.0]
 [6.0, 1.0, 3.0, 6.0, 0.0]
 [4.0, 7.0, 3.0, 3.0, 6.0]
 [5.0, 5.0, 3.0, 6.0, 2.0]
 [7.0, 2.0, 3.0, 2.0, 3.0]
 [3.0, 2.0, 2.0, 1.0, 4.0]
 [7.0, 6.0, 2.0, 7.0, 6.0]
 ⋮
 [2.0, 3.0, 1.0, 4.0, 2.0]
 [8.0, 3.0, 6.0, 2.0, 7.0]
 [2.0, 1.0, 3.0, 5.0, 7.0]
 [2.0, 3.0, 1.0, 1.0, 5.0]
 [7.0, 3.0, 4.0, 4.0, 11.0]
 [2.0, 3.0, 6.0, 2.0, 6.0]
 [2.0, 3.0, 1.0, 1.0, 2.0]
 [8.0, 9.0, 6.0, 3.0, 5.0]
 [5.0, 6.0, 2.0, 3.0, 2.0]
 [5.0, 6.0, 6.0, 9.0, 5.0]
 [0.0, 2.0, 1.0, 4.0, 3.0]
 [6.0, 6.0, 8.0, 3.0, 6.0]
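
The simulation settings above determine both the true intercept and the structure of Γ. The following is a minimal sketch that recomputes them using only the standard library; it assumes the usual parametrization in which `NegativeBinomial(r, p)` has mean r(1-p)/p, and the names `vc1` and `vc2` are shorthand introduced here for the two variance components.

```julia
using LinearAlgebra

# Simulation settings copied from the cell above
r, p, n = 10, 0.7, 5
vc1, vc2 = 0.9, 0.1      # shorthand for variance_component_1 and _2

# Marginal mean of NegativeBinomial(r, p), assuming mean = r(1-p)/p
μ = r * (1 - p) / p
β_true = log(μ)          # intercept on the log-link scale

# Γ = vc1 * 11ᵀ + vc2 * I: constant off-diagonal, vc1 + vc2 on the diagonal
Γ = vc1 * ones(n, n) + vc2 * Matrix(I, n, n)

@show μ β_true
@show Γ[1, 1] Γ[1, 2]
```

With an intercept-only design (`X = ones(n, 1)`) and a log link, β_true is simply log(μ), which is the value the fitted `gcm.β` is compared against later.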

In [4]:
d = NegativeBinomial(10) # set r here
link = LogLink()
D = typeof(d)
Link = typeof(link)
T = Float64
gcs = Vector{GLMCopulaVCObs{T, D, Link}}(undef, N)
for i in 1:N
    y = Float64.(Y_Nsample[i])
    X = ones(n, 1)
    V = [ones(n, n), Matrix(I, n, n)]
    gcs[i] = GLMCopulaVCObs(y, X, V, d, link)
end
gcm = GLMCopulaVCModel(gcs);

initialize_model!(gcm)
@show gcm.β
@show gcm.Σ
@show gcm.data[1].d.r

initializing β using Newton's Algorithm under Independence Assumption
1 0.0 -121027.88891374407 39999
2 -121027.88891374407 -121027.88891374407 39999
initializing variance components using MM-Algorithm
gcm.β = [1.5151931648552381]
gcm.Σ = [0.6979270527960945, 0.08741044176671597]
(gcm.data[1]).d.r = 10.0


10.0

Initialize β and σ². Here I simply copy over the solution for β and σ² from MixedModels.jl.

In [11]:
GLMCopula.loglikelihood!(gcm, true, true)

-118480.31871515416

## fit_old (not updating r) works

If the true $r$ is not 1, setting the correct $r$ in the initialization function makes the fit work as well.

In [12]:
@time fit2!(gcm, IpoptSolver(print_level = 5, max_iter = 500, mehrotra_algorithm="yes", warm_start_init_point="yes", hessian_approximation = "exact"))

gcm.β = [1.5151931648552381]
gcm.Σ = [0.6979257382428836, 0.08740945654307972]
gcm.β = [1.5151931648552381]
gcm.Σ = [0.6979244525871305, 0.0874084929729934]
This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        1

Total number of variables............................:        1
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:    

In [13]:
@show β_true
@show gcm.β
@show gcm.Σ
@show variance_component_1, variance_component_2
@show gcm.∇β
@show gcm.data[1].d
@show GLMCopula.loglikelihood!(gcm, true, true);

β_true = 1.4552872326068422
gcm.β = [1.45381327887095]
gcm.Σ = [0.8537921429784839, 0.12204795378602154]
(variance_component_1, variance_component_2) = (0.9, 0.1)
gcm.∇β = [-2.7448491024095745e-5]
(gcm.data[1]).d = NegativeBinomial{Float64}(r=10.0, p=0.5)
GLMCopula.loglikelihood!(gcm, true, true) = -118247.93818277148
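
To read the output above more easily, the estimates can be put on a relative scale. This is a small sketch with the numbers copied by hand from the output; `relerr` is a hypothetical helper defined here and is not part of GLMCopula.

```julia
# Values copied from the output above
β_true, β_hat = 1.4552872326068422, 1.45381327887095
Σ_true = [0.9, 0.1]
Σ_hat  = [0.8537921429784839, 0.12204795378602154]

# Signed relative error (hypothetical helper, not part of GLMCopula)
relerr(est, truth) = (est - truth) / truth

β_err = relerr(β_hat, β_true)
Σ_err = relerr.(Σ_hat, Σ_true)

println("β relative error (%): ", round(100 * β_err; digits = 2))
println("Σ relative errors (%): ", round.(100 .* Σ_err; digits = 1))
```

With $r$ fixed at its true value, β is off by roughly 0.1%, the first variance component by roughly 5%, and the second by roughly 22%.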


## fit_old (updating r)

In [56]:
d = NegativeBinomial() # r is 1
link = LogLink()
D = typeof(d)
Link = typeof(link)
T = Float64
gcs = Vector{GLMCopulaVCObs{T, D, Link}}(undef, N)
for i in 1:N
    y = Float64.(Y_Nsample[i])
    X = ones(n, 1)
    V = [ones(n, n), Matrix(I, n, n)]
    gcs[i] = GLMCopulaVCObs(y, X, V, d, link)
end
gcm = GLMCopulaVCModel(gcs);

initialize_model!(gcm)
@show gcm.β
@show gcm.Σ
@show gcm.data[1].d.r

initializing β using Newton's Algorithm under Independence Assumption
1 0.0 -130890.45942520844 39999
2 -130890.45942520844 -130890.45942520844 39999
gcm.β = [1.5151931648552381]
(gcm.data[1]).res = [-2.5503, -3.5503, -2.5503, -1.5503, -3.5503]
(gcm.data[1]).η = [1.5151931648552381, 1.5151931648552381, 1.5151931648552381, 1.5151931648552381, 1.5151931648552381]
(gcm.data[1]).μ = [4.5503, 4.5503, 4.5503, 4.5503, 4.5503]
(gcm.data[1]).y = [2.0, 1.0, 2.0, 3.0, 1.0]
initializing variance components using MM-Algorithm
gcm.Σ = [0.34918505305084624, 1.113346719664689e-6]
gcm.β = [1.5151931648552381]
gcm.Σ = [0.34918505305084624, 1.113346719664689e-6]
(gcm.data[1]).d.r = 5.517724656138809


5.517724656138809

In [40]:
@time fit2!(gcm, IpoptSolver(print_level = 5, max_iter = 50, 
        mehrotra_algorithm="yes", warm_start_init_point="yes", hessian_approximation = "exact"))

gcm.β = [1.5151931648552381]
gcm.Σ = [0.35426253306863426, 1.0665540934122337e-6]
new_d = NegativeBinomial{Float64}(r=5.581683440027854, p=0.5)
gcm.β = [1.5151931648552381]
gcm.Σ = [0.35426158745591785, 9.296805525164864e-7]
new_d = NegativeBinomial{Float64}(r=5.581684753636537, p=0.5)
new_d = NegativeBinomial{Float64}(r=5.581686067247769, p=0.5)
new_d = NegativeBinomial{Float64}(r=5.581687380854176, p=0.5)
This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        1

Total number of variables............................:        1
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:      

└ @ GLMCopula /Users/biona001/.julia/dev/GLMCopula/src/parameter_estimation/fit_old.jl:12


164.849868 seconds (1.64 G allocations: 111.107 GiB, 6.36% gc time, 0.01% compilation time)


In [43]:
@show β_true
@show gcm.β
@show gcm.Σ
@show variance_component_1, variance_component_2
@show gcm.∇β
@show gcm.∇Σ
@show gcm.data[1].d
@show GLMCopula.loglikelihood!(gcm, true, true);

β_true = 1.4552872326068422
gcm.β = [1.439683976548931]
gcm.Σ = [0.4607756211463058, 2.222252481133919e-21]
(variance_component_1, variance_component_2) = (0.9, 0.1)
gcm.∇β = [0.001688797885080362]
gcm.∇Σ = [0.0026587176025714143, -1218.6411631134538]
(gcm.data[1]).d = NegativeBinomial{Float64}(r=5.581881731513947, p=0.5)
GLMCopula.loglikelihood!(gcm, true, true) = -118655.9419407156
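
The two fits in this notebook can be compared directly from their printed results. The sketch below only restates numbers copied from the outputs; the boundary tolerance `1e-8` is an arbitrary value chosen here for illustration.

```julia
# Values copied from the two result cells of this notebook
ll_fixed_r   = -118247.93818277148   # fit with r held at 10
ll_updated_r = -118655.9419407156    # fit with r updated
r_true, r_hat = 10.0, 5.581881731513947
Σ_hat_updated = [0.4607756211463058, 2.222252481133919e-21]

# Gap in log-likelihood between the two fits
ll_gap = ll_fixed_r - ll_updated_r

# Estimated r as a fraction of the value used in the simulation
r_ratio = r_hat / r_true

# Flag a variance component that has collapsed to (numerically) zero;
# the tolerance is an assumption made for this sketch
at_boundary = Σ_hat_updated[2] < 1e-8

@show ll_gap r_ratio at_boundary
```

When $r$ is updated, the final log-likelihood is about 408 units lower than with $r$ fixed at 10, the estimated $r$ is a little over half of the simulated value, and the second variance component sits at zero. Note that this run used `max_iter = 50`, compared with `max_iter = 500` in the fixed-$r$ run.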
