# Mixed variance component model 

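In the variance component model, $\mathbf{\Gamma}_i$ for sample $i$ is parameterized as a nonnegative combination of $m$ fixed matrices $\mathbf{V}_{ik}$:

$$\mathbf{\Gamma}_i(\boldsymbol{\theta}) = \sum_{k=1}^m \theta_k\mathbf{V}_{ik}, \quad \theta_k \ge 0$$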
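
As a minimal sketch of how this sum is assembled, the snippet below builds $\mathbf{\Gamma}_i$ from two basis matrices. The sizes, the choice of basis matrices, and the values in `θ` are assumptions made only for illustration; they are not taken from the package.

```julia
using LinearAlgebra

# Hypothetical toy setup, chosen only for illustration
d = 4
V = [ones(d, d), Matrix{Float64}(I, d, d)]  # two basis matrices V_{i1}, V_{i2}
θ = [0.5, 0.2]                               # nonnegative variance components

# Γ_i(θ) = Σ_k θ_k V_{ik}
Γ = sum(θ[k] * V[k] for k in eachindex(θ))

@assert issymmetric(Γ)
@assert Γ[1, 1] ≈ 0.7   # diagonal entries: θ_1 + θ_2
@assert Γ[1, 2] ≈ 0.5   # off-diagonal entries: θ_1 only
```

With $m = 1$ and $\mathbf{V}_{i1}$ equal to the all-ones matrix, the sum reduces to `θ[1] * ones(d, d)`, which is the structure used in the simulations in this notebook.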

In [1]:
using Revise
using DataFrames, Random, GLM, QuasiCopula
using ForwardDiff, Test, LinearAlgebra
using LinearAlgebra: BlasReal, copytri!
using ToeplitzMatrices
using BenchmarkTools

BLAS.set_num_threads(1)
Threads.nthreads()


1

## Mixed marginal distributions with Newton

In [2]:
# simulate data
p = 3    # number of fixed effects, including intercept
m = 1    # number of variance components
samplesize = 10000 # number of samples
d = 25   # number of observations per sample
possible_distributions = [Normal, Bernoulli, Poisson]

# simulate true regression coefficients and variance component parameters
Random.seed!(2022)
βtrue = rand(Uniform(-2, 2), p)
θtrue = [0.5] # 1 variance component

# randomly choose marginal distributions for each observation within samples
Random.seed!(2022)
vecdist = rand(possible_distributions, d)
T = Float64

# simulate design matrix
Random.seed!(2022)
X_samplesize = [randn(d, p - 1) for i in 1:samplesize]
gcs = Vector{MixedCopulaVCObs{T}}(undef, samplesize)
Γ = θtrue[1] * ones(d, d)

for i in 1:samplesize
    X = [ones(d) X_samplesize[i]]
    η = X * βtrue
    vecd_tmp = Vector{UnivariateDistribution}(undef, d)
    for j in 1:d
        dist = vecdist[j]
        μj = GLM.linkinv(canonicallink(dist()), η[j])
        vecd_tmp[j] = dist(μj)
    end
    multivariate_dist = MultivariateMix(vecd_tmp, Γ)
    y = Vector{Float64}(undef, d)
    res = Vector{Float64}(undef, d)
    rand(multivariate_dist, y, res)
    V = [ones(d, d)]
    gcs[i] = MixedCopulaVCObs(y, X, V)
end
veclink = [canonicallink(vecdist[j]()) for j in 1:d]

gcm = MixedCopulaVCModel(gcs, vecdist, veclink);

In [5]:
# fit model
fittime = @elapsed QuasiCopula.fit!(gcm, IpoptSolver(print_level = 5, 
        max_iter = 100, tol = 10^-4))
@show fittime
@show gcm.β
@show gcm.θ
@show gcm.∇β
@show gcm.∇θ
@show gcm.Hβ

This is Ipopt version 3.13.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        8

Total number of variables............................:        4
                     variables with only lower bounds:        1
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  80  3.5793292e+05 0.00e+00 1.66e-01  -2.5 1.36e-04   2.9 1.00e+00 1.00e+00f  1
  81  3.5793173e+05 0.00e+00 1.62e-01  -2.5 5.94e-05   3.4 1.00e+00 1.00e+00f  1
  82  3.5792905e+05 0.00e+00 1.61e-01  -2.5 1.38e-04   2.9 1.00e+00 1.00e+00f  1
  83  3.5792783e+05 0.00e+00 1.60e-01  -2.5 6.53e-05   3.3 1.00e+00 1.00e+00f  1
  84  3.5792516e+05 0.00e+00 1.58e-01  -2.5 1.54e-04   2.8 1.00e+00 1.00e+00f  1
  85  3.5792392e+05 0.00e+00 1.58e-01  -2.5 7.17e-05   3.3 1.00e+00 1.00e+00f  1
  86  3.5792126e+05 0.00e+00 1.55e-01  -2.5 1.71e-04   2.8 1.00e+00 1.00e+00f  1
  87  3.5791999e+05 0.00e+00 1.55e-01  -2.5 7.83e-05   3.2 1.00e+00 1.00e+00f  1
  88  3.5791731e+05 0.00e+00 1.51e-01  -2.5 1.93e-04   2.7 1.00e+00 1.00e+00f  1
  89  3.5791602e+05 0.00e+00 1.51e-01  -2.5 8.52e-05   3.2 1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  90  3.5791332e+05 0.00e+00

└ @ QuasiCopula /Users/benjaminchu/.julia/dev/QuasiCopula/src/parameter_estimation/mixed_VC.jl:241


fittime = 210.052441456
gcm.β = [1.2005879974500522, 1.5980409484881646, -1.396655948951027]
gcm.θ = [1.003800111789738]
gcm.∇β = [-7751.899167304302, -434.0413768138386, -6880.079985903067]
gcm.∇θ = [-132.89710294762324]
gcm.Hβ = [-6.533859154710575e6 -8.075702164957257e6 6.841360882496561e6; -8.075702164957257e6 -1.578138252212335e7 9.45212662416275e6; 6.841360882496561e6 9.45212662416275e6 -1.2131874053389376e7]


3×3 Matrix{Float64}:
 -6.53386e6  -8.0757e6    6.84136e6
 -8.0757e6   -1.57814e7   9.45213e6
  6.84136e6   9.45213e6  -1.21319e7

In [4]:
# show true parameters
@show βtrue
@show θtrue;

βtrue = [1.1930358188812686, 1.5993942032216824, -1.3995760477711494]
θtrue = [0.5]


## Mixed marginal distributions with Quasi-Newton

In [2]:
# simulate data
p = 3    # number of fixed effects, including intercept
m = 1    # number of variance components
samplesize = 10000 # number of samples
d = 25   # number of observations per sample
possible_distributions = [Normal, Bernoulli, Poisson]

# simulate true regression coefficients and variance component parameters
Random.seed!(2022)
βtrue = rand(Uniform(-2, 2), p)
θtrue = [0.5] # 1 variance component

# randomly choose marginal distributions for each observation within samples
Random.seed!(2022)
vecdist = rand(possible_distributions, d)
T = Float64

# simulate design matrix
Random.seed!(2022)
X_samplesize = [randn(d, p - 1) for i in 1:samplesize]
gcs = Vector{MixedCopulaVCObs{T}}(undef, samplesize)
Γ = θtrue[1] * ones(d, d)

for i in 1:samplesize
    X = [ones(d) X_samplesize[i]]
    η = X * βtrue
    vecd_tmp = Vector{UnivariateDistribution}(undef, d)
    for j in 1:d
        dist = vecdist[j]
        μj = GLM.linkinv(canonicallink(dist()), η[j])
        vecd_tmp[j] = dist(μj)
    end
    multivariate_dist = MultivariateMix(vecd_tmp, Γ)
    y = Vector{Float64}(undef, d)
    res = Vector{Float64}(undef, d)
    rand(multivariate_dist, y, res)
    V = [ones(d, d)]
    gcs[i] = MixedCopulaVCObs(y, X, V)
end
veclink = [canonicallink(vecdist[j]()) for j in 1:d]

gcm = MixedCopulaVCModel(gcs, vecdist, veclink);

In [3]:
# fit model
fittime = @elapsed QuasiCopula.fit!(gcm, IpoptSolver(print_level = 5, 
        max_iter = 100, tol = 10^-4, limited_memory_max_history = 16, 
        hessian_approximation = "limited-memory"))
@show fittime
@show gcm.β
@show gcm.θ
@show gcm.∇β
@show gcm.∇θ


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.13.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        4
                     variables with only lower bounds:        1
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equal

1-element Vector{Float64}:
 -0.03910816012440321

In [4]:
# show true parameters
@show βtrue
@show θtrue;

βtrue = [1.1930358188812686, 1.5993942032216824, -1.3995760477711494]
θtrue = [0.5]


# Check Correctness

If all marginal distributions are identical (e.g. all Poisson), then a `MixedCopulaVCModel` should return the same log-likelihood as a `GLMCopulaVCModel`.

**Note:** `initialize_model` must be turned off for `GLMCopulaVCModel`, because `MixedCopulaVCModel` currently initializes everything with 0 or 1 and the two fits should start from the same point to be comparable.

In [14]:
using Revise
using DataFrames, Random, GLM, QuasiCopula
using ForwardDiff, Test, LinearAlgebra
using LinearAlgebra: BlasReal, copytri!
using ToeplitzMatrices
using BenchmarkTools

BLAS.set_num_threads(1)
Threads.nthreads()

1

In [15]:
p = 3    # number of fixed effects, including intercept
m = 1    # number of variance components
Random.seed!(12345)
βtrue = rand(Uniform(-2, 2), p)
θtrue = [0.5]
samplesize = 10000
d = 25
dist = Poisson
link = LogLink
# dist = Bernoulli
# link = LogitLink
T = Float64
Γ = θtrue[1] * ones(d, d)
vecdist = [dist() for _ in 1:d]
veclink = [link() for _ in 1:d]

gcs_glm = Vector{GLMCopulaVCObs{T, dist{T}, link}}(undef, samplesize)
gcs_mixed = Vector{MixedCopulaVCObs{T}}(undef, samplesize)

# for reproducibility I will simulate all the design matrices here
Random.seed!(12345)
X_samplesize = [randn(d, p - 1) for i in 1:samplesize]

for i in 1:samplesize
    X = [ones(d) X_samplesize[i]]
    μ = GLM.linkinv.(link(), X * βtrue)
    vecd = Vector{DiscreteUnivariateDistribution}(undef, d)
    for j in 1:d
        vecd[j] = dist(μ[j])
    end
    nonmixed_multivariate_dist = NonMixedMultivariateDistribution(vecd, Γ)
    # simulate a single response vector y
    y = Vector{Float64}(undef, d)
    res = Vector{Float64}(undef, d)
    rand(nonmixed_multivariate_dist, y, res)
    V = [ones(d, d)]
    gcs_glm[i] = GLMCopulaVCObs(y, X, V, dist(), link())
    gcs_mixed[i] = MixedCopulaVCObs(y, X, V)
end

gcm_glm = GLMCopulaVCModel(gcs_glm)
gcm_mixed = MixedCopulaVCModel(gcs_mixed, vecdist, veclink);

In [105]:
# Fit GLMCopulaVC with all the same marginals
QuasiCopula.fit!(gcm_glm, IpoptSolver(print_level = 5, 
    max_iter = 100, tol = 10^-8, limited_memory_max_history = 50, 
    accept_after_max_steps = 4, hessian_approximation = "limited-memory"))

This is Ipopt version 3.13.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        4
                     variables with only lower bounds:        1
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

-377367.75738999934

In [106]:
# Fit MixedCopulaVC on the same data
QuasiCopula.fit!(gcm_mixed, IpoptSolver(print_level = 5, 
    max_iter = 100, tol = 10^-8, limited_memory_max_history = 50, 
    accept_after_max_steps = 4, hessian_approximation = "limited-memory"))

This is Ipopt version 3.13.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        4
                     variables with only lower bounds:        1
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

-377367.75738999835

In [107]:
@show βtrue
@show θtrue

@show gcm_glm.β
@show gcm_glm.θ
@show gcm_glm.∇β
@show gcm_glm.∇θ

@show gcm_mixed.β
@show gcm_mixed.θ
@show gcm_mixed.∇β
@show gcm_mixed.∇θ;

βtrue = [0.250855540422787, 1.3997579145162504, -0.5135785925430074]
θtrue = [0.5]
gcm_glm.β = [0.25191084984222173, 1.3996424557990568, -0.5131796224630202]
gcm_glm.θ = [0.5234336103117214]
gcm_glm.∇β = [-6.231637428300019e-11, -4.223751126630759e-9, -2.743858473763794e-9]
gcm_glm.∇θ = [-2.5807962478197055e-7]
gcm_mixed.β = [0.2519108497194852, 1.3996424558254064, -0.513179622540665]
gcm_mixed.θ = [0.5234336105830091]
gcm_mixed.∇β = [6.709103225288615e-5, 5.3386114069908785e-5, 4.9410723127252254e-5]
gcm_mixed.∇θ = [-5.195367152044383e-7]
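
The agreement between the two models can also be checked programmatically. The sketch below hard-codes the numbers printed in the outputs above (the two values returned after the fits, presumably the final log-likelihoods, and the estimated `β` and `θ`) and compares them with `isapprox`. The variable names and the tolerances are assumptions chosen for illustration; in a live session one would compare the fitted model fields directly.

```julia
using Test

# Values copied from the notebook output above
ll_glm   = -377367.75738999934
ll_mixed = -377367.75738999835
β_glm    = [0.25191084984222173, 1.3996424557990568, -0.5131796224630202]
β_mixed  = [0.2519108497194852, 1.3996424558254064, -0.513179622540665]
θ_glm    = [0.5234336103117214]
θ_mixed  = [0.5234336105830091]

# Tolerances are illustrative assumptions, not package defaults
@test isapprox(ll_glm, ll_mixed; rtol = 1e-10)
@test isapprox(β_glm, β_mixed; atol = 1e-8)
@test isapprox(θ_glm, θ_mixed; atol = 1e-8)
```

The estimates agree far more tightly than they match `βtrue` and `θtrue`, which is expected: both models are fit to the same simulated data, so they share the same sampling error.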
