# Longitudinal QuasiCopula GWAS with Mixed Marginals

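Here we adopt the variance component model framework, in which the matrix $\mathbf{\Gamma}_i$ for sample $i$ is a nonnegative linear combination of $m$ known matrices $\mathbf{V}_{ik}$ weighted by the variance components $\theta_k$: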

$$\mathbf{\Gamma}_i(\mathbf{\theta}) = \sum_{k=1}^m \theta_k\mathbf{V}_{ik}, \quad \theta_k \ge 0$$
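
As a minimal sketch in plain Julia (standard library only; the helper `vc_gamma` is hypothetical and not part of QuasiCopula), $\mathbf{\Gamma}_i$ can be assembled from $\mathbf{\theta}$ and the $\mathbf{V}_{ik}$. With $m = 1$ and $\mathbf{V}_{i1}$ the $d \times d$ all-ones matrix, as in the simulation, this reduces to $\mathbf{\Gamma}_i = \theta_1 \mathbf{1}\mathbf{1}^T$.

```julia
using LinearAlgebra

# Hypothetical helper (not a QuasiCopula function): Γ = Σ_k θ[k] * V[k] with θ[k] ≥ 0
function vc_gamma(θ::AbstractVector{<:Real}, V::AbstractVector{<:AbstractMatrix})
    length(θ) == length(V) || throw(DimensionMismatch("need one θ per V"))
    all(>=(0), θ) || throw(ArgumentError("variance components must be nonnegative"))
    Γ = zeros(promote_type(eltype(θ), eltype(first(V))), size(first(V)))
    for (θk, Vk) in zip(θ, V)
        Γ .+= θk .* Vk    # accumulate θ_k * V_ik
    end
    return Γ
end

d = 5
V = [ones(d, d)]          # one variance component: the all-ones matrix
Γ = vc_gamma([0.5], V)    # same as θtrue[1] * ones(d, d) in the simulation

# eigenvalues should be ≥ 0 up to rounding error
λ = eigvals(Symmetric(Γ))
println("min eigenvalue ≥ 0 (up to rounding): ", minimum(λ) > -1e-10)
```

The constraint $\theta_k \ge 0$ matters because, when each $\mathbf{V}_{ik}$ is positive semidefinite, a nonnegative combination of them is also positive semidefinite, which the eigenvalue check confirms numerically for this case.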

```julia
using Revise
using DataFrames, Random, GLM, QuasiCopula
using ForwardDiff, Test, LinearAlgebra
using LinearAlgebra: BlasReal, copytri!
using ToeplitzMatrices
using BenchmarkTools
using SnpArrays
using MendelIHT

# use a single BLAS thread and check how many Julia threads are available
BLAS.set_num_threads(1)
Threads.nthreads()
```



```
1
```

## Simulate data

```julia
# simulate data
p = 3             # number of fixed effects, including intercept
m = 1             # number of variance components
samplesize = 1000 # number of samples
d = 5             # number of observations per sample
q = 100           # number of SNPs
k = 10            # number of causal SNPs
T = Float64

# simulate non-genetic coefficients and variance component parameters
Random.seed!(2022)
βtrue = rand(Uniform(-2, 2), p)
θtrue = [0.5] # 1 variance component
Γ = θtrue[1] * ones(d, d)

# randomly choose a marginal distribution for each of the d observations;
# the same choice is shared by all samples
Random.seed!(2022)
possible_distributions = [Normal, Bernoulli, Poisson]
vecdist = rand(possible_distributions, d)

# simulate non-genetic design matrices
Random.seed!(2022)
X_samplesize = [randn(d, p - 1) for i in 1:samplesize]
gcs = Vector{MixedCopulaVCObs{T}}(undef, samplesize)

# simulate a random SnpArray with q = 100 SNPs and randomly choose k = 10 of them to be causal
Random.seed!(2022)
G = simulate_random_snparray(undef, samplesize, q)
Gfloat = convert(Matrix{T}, G, center=true, scale=true)
γtrue = zeros(q)
γtrue[1:k] .= rand(Uniform(-0.5, 0.5), k)
shuffle!(γtrue)
η_G = Gfloat * γtrue

for i in 1:samplesize
    X = [ones(d) X_samplesize[i]]
    η = X * βtrue
    η .+= η_G[i] # add genetic effects
    vecd_tmp = Vector{UnivariateDistribution}(undef, d)
    for j in 1:d
        dist = vecdist[j]
        μj = GLM.linkinv(canonicallink(dist()), η[j])
        vecd_tmp[j] = dist(μj)
    end
    multivariate_dist = MultivariateMix(vecd_tmp, Γ)
    y = Vector{Float64}(undef, d)
    res = Vector{Float64}(undef, d)
    rand(multivariate_dist, y, res)
    V = [ones(d, d)]
    gcs[i] = MixedCopulaVCObs(y, X, V)
end
veclink = [canonicallink(vecdist[j]()) for j in 1:d]
gcm = MixedCopulaVCModel(gcs, vecdist, veclink);
```
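
The call `GLM.linkinv(canonicallink(dist()), η[j])` maps each linear predictor to the mean of its marginal through that distribution's canonical link: identity for `Normal`, logit for `Bernoulli`, and log for `Poisson`. A standalone sketch of the three inverse links, using hypothetical function names rather than GLM's link types, for checking values by hand:

```julia
# Hypothetical stand-ins for the canonical inverse links that GLM.linkinv applies
inv_identity(η) = η                   # Normal:    μ = η
inv_logit(η)    = 1 / (1 + exp(-η))   # Bernoulli: μ = 1 / (1 + e^(-η)), in (0, 1)
inv_log(η)      = exp(η)              # Poisson:   μ = e^η > 0

canonical_inverse = Dict(
    :Normal    => inv_identity,
    :Bernoulli => inv_logit,
    :Poisson   => inv_log,
)

η = 0.7
for name in (:Normal, :Bernoulli, :Poisson)
    println(name, ": μ = ", canonical_inverse[name](η))
end
```

Because the logistic and exponential inverses always land in the valid parameter range, any real η gives a valid Bernoulli probability or Poisson rate, so the simulation can add genetic effects to η without clipping.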

## Fit null model

TODO:
+ If using Newton's method, the calculation of $H_\theta$ needs to be fixed.
+ If using a quasi-Newton method, we still need to work out how to obtain $H_\beta$.

```julia
# fit the null model (no SNP effects) with Ipopt
fittime = @elapsed QuasiCopula.fit!(gcm, IpoptSolver(print_level = 5,
        max_iter = 100, tol = 1e-4))
@show fittime
@show gcm.β
@show gcm.θ
@show gcm.∇β
@show gcm.∇θ
@show gcm.Hβ
```



```
This is Ipopt version 3.13.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        8

Total number of variables............................:        4
                     variables with only lower bounds:        1
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equal

  75  3.9518009e+06 0.00e+00 2.18e+00  -1.0 4.00e-03   2.6 1.00e+00 1.00e+00f  1
  76  3.9512161e+06 0.00e+00 2.18e+00  -1.0 1.61e-03   3.0 1.00e+00 1.00e+00f  1
  77  3.9499744e+06 0.00e+00 2.11e+00  -1.0 4.78e-03   2.6 1.00e+00 1.00e+00f  1
  78  3.9493657e+06 0.00e+00 2.12e+00  -1.0 1.72e-03   3.0 1.00e+00 1.00e+00f  1
  79  3.9480752e+06 0.00e+00 2.03e+00  -1.0 5.92e-03   2.5 1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  80  3.9474316e+06 0.00e+00 2.04e+00  -1.0 1.79e-03   2.9 1.00e+00 1.00e+00f  1
  81  3.9461502e+06 0.00e+00 2.09e+00  -1.0 8.12e-03   2.5 1.00e+00 1.00e+00f  1
  82  3.9454288e+06 0.00e+00 1.96e+00  -1.0 1.80e-03   2.9 1.00e+00 1.00e+00f  1
  83  3.9451105e+06 0.00e+00 1.94e+00  -1.0 8.05e-04   3.3 1.00e+00 1.00e+00f  1
  84  3.9443869e+06 0.00e+00 1.93e+00  -1.0 1.98e-03   2.8 1.00e+00 1.00e+00f  1
  85  3.9440607e+06 0.00e+00 1.94e+00  -1.0 9.03e-04   3.3 1.00e+00 1.00e+00f  1
  86  3.9433306e+06 0.00e+00

└ @ QuasiCopula /Users/benjaminchu/.julia/dev/QuasiCopula/src/parameter_estimation/mixed_VC.jl:245
```


```
fittime = 175.113557012
gcm.β = [1.6707113948354058, 1.5807185869256557, -1.4674095580337179]
gcm.θ = [1.0173353422682279]
gcm.∇β = [145043.2875270763, 67299.1409601921, 235396.61417666727]
gcm.∇θ = [434.6182063733732]
gcm.Hβ = [-7.335222336720487e6 -1.0765000611570114e7 9.540695621992571e6; -1.0765000611570114e7 -2.3359462011068907e7 1.4377739638287297e7; 9.540695621992571e6 1.4377739638287297e7 -1.923445256857499e7]
```

```
3×3 Matrix{Float64}:
 -7.33522e6  -1.0765e7    9.5407e6
 -1.0765e7   -2.33595e7   1.43777e7
  9.5407e6    1.43777e7  -1.92345e7
```
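
At an interior maximum of the log-likelihood the gradient should be close to zero and the Hessian negative definite. The sketch below copies the printed values and checks both conditions with the standard library. It assumes that `gcm.∇β`, `gcm.∇θ`, and `gcm.Hβ` are the gradient and β-block Hessian of the log-likelihood being maximized, which the output suggests but the notebook does not state; the Newton step also ignores the β-θ cross terms, since only `Hβ` is printed.

```julia
using LinearAlgebra

# values copied from the @show output above (assumed: log-likelihood gradient and Hessian)
∇β = [145043.2875270763, 67299.1409601921, 235396.61417666727]
∇θ = [434.6182063733732]
Hβ = [-7.335222336720487e6  -1.0765000611570114e7  9.540695621992571e6;
      -1.0765000611570114e7 -2.3359462011068907e7  1.4377739638287297e7;
       9.540695621992571e6   1.4377739638287297e7 -1.923445256857499e7]

# first-order condition: the full gradient norm should be ≈ 0 at a stationary point
grad_norm = norm(vcat(∇β, ∇θ))

# second-order condition for a maximum: -Hβ positive definite
is_neg_def = isposdef(Symmetric(-Hβ))

# Newton step for β from the β block only (ignores β-θ cross terms): δ = -Hβ⁻¹ ∇β
δβ = -(Hβ \ ∇β)

println("‖∇‖ = ", grad_norm)
println("Hβ negative definite: ", is_neg_def)
println("Newton step for β: ", δβ)
```

Here the printed `Hβ` passes the definiteness check, but gradient entries of order 10^5 suggest the returned point is not yet stationary, and the objective in the visible part of the Ipopt log is still decreasing at every iteration. Since the score test is derived at the null maximum-likelihood estimate, a non-converged null fit can distort the score-test p-values.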

```julia
@show βtrue
@show θtrue;
```

```
βtrue = [1.1930358188812686, 1.5993942032216824, -1.3995760477711494]
θtrue = [0.5]
```
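
To put the fitted null-model estimates beside the simulation truth, a short sketch using the values printed above (copied by hand; `β̂` and `θ̂` are only local names):

```julia
# values copied from the printed output above
β̂ = [1.6707113948354058, 1.5807185869256557, -1.4674095580337179]   # gcm.β
θ̂ = [1.0173353422682279]                                             # gcm.θ
βtrue = [1.1930358188812686, 1.5993942032216824, -1.3995760477711494]
θtrue = [0.5]

# absolute error per parameter (β[1] is the intercept)
println("β abs. error: ", abs.(β̂ .- βtrue))
println("θ abs. error: ", abs.(θ̂ .- θtrue))
```

With these numbers the two slope estimates are within about 0.07 of the truth, while the intercept is off by about 0.48 and θ is estimated near 1.02 against a true value of 0.5.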


## Score test

```julia
# score test of each SNP in G against the fitted null model; returns one p-value per SNP
@time pvals = QuasiCopula.GWASCopulaVCModel(gcm, G)
```

```
  4.211233 seconds (5.00 M allocations: 1.147 GiB, 39.92% gc time)
```


```
100-element Vector{Float64}:
 0.004337542727853779
 2.1452880888061157e-120
 1.8186185524609303e-200
 0.0
 0.0
 0.22827815699093829
 7.794229446423376e-145
 8.622503090518413e-57
 0.0
 0.0
 2.499630194724355e-56
 0.0
 0.0
 ⋮
 3.0311828041483e-228
 0.2699892024188981
 0.0
 0.0
 1.701935579003e-312
 2.698389471827338e-134
 0.0
 4.9877817839589804e-260
 0.0
 8.863054667621717e-39
 9.7800182886342e-137
 0.0
```
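
For intuition about what a score test computes: for each SNP it evaluates the derivative of the log-likelihood with respect to that SNP's effect $\gamma$ at the null fit ($\gamma = 0$) and standardizes it by the efficient information, which adjusts for estimating the nuisance parameters $\beta$. Under the null the statistic is asymptotically $\chi^2_1$. The sketch below shows this textbook form in a Gaussian working model with known variance; the helper `score_stat` and the toy data are hypothetical, and this is not necessarily how `GWASCopulaVCModel` computes its statistic internally.

```julia
using LinearAlgebra

# Hypothetical helper (not part of QuasiCopula): score statistic for H0: γ = 0 in the
# Gaussian working model y = Xβ + gγ + ε, ε ~ N(0, σ²I), evaluated at the null fit.
function score_stat(y::AbstractVector, X::AbstractMatrix, g::AbstractVector; σ2::Real = 1.0)
    β̂ = X \ y                          # null-model least-squares fit (γ = 0)
    r = y - X * β̂                      # null residuals
    U = dot(g, r) / σ2                 # score for γ at γ = 0
    Iγγ = dot(g, g) / σ2               # information for γ
    Iβγ = (X' * g) / σ2                # cross-information between β and γ
    Iββ = (X' * X) / σ2                # information for β
    Ieff = Iγγ - dot(Iβγ, Iββ \ Iβγ)   # efficient information after adjusting for β
    return U^2 / Ieff                  # approximately χ²₁ under H0
end

n = 8
X = [ones(n) collect(1.0:n)]          # intercept and one covariate
g = [0.0, 1, 2, 1, 0, 2, 1, 0]        # genotype dosages for a single SNP
y_null = X * [1.0, 0.5]               # response with no SNP effect
y_alt  = y_null + 1.5 * g             # response with a SNP effect of 1.5

T0 = score_stat(y_null, X, g)
T1 = score_stat(y_alt, X, g)
println("T (no effect) = ", T0)
println("T (effect)    = ", T1, "; exceeds 3.841 (χ²₁ 95% quantile): ", T1 > 3.841)
```

Because the statistic only needs the null fit, a single null model serves every SNP, which is what makes score tests cheap for GWAS: in this notebook the 100 p-values took about 4 seconds, compared with about 175 seconds for the null fit.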

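```julia
# SNPs with a nonzero simulated effect are the true causal SNPs
correct_snps = findall(!iszero, γtrue)
# Bonferroni-corrected significance threshold: 0.05 / (number of tests)
signif_snps = findall(x -> x < 0.05/length(pvals), pvals)
power = length(correct_snps ∩ signif_snps) / length(correct_snps)

@show length(signif_snps)
@show power
```

```
length(signif_snps) = 95
power = 1.0
```

```
1.0
```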
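
Power alone does not show how many significant SNPs are false positives. A minimal sketch, with a hypothetical helper `summarize_hits`, that reports true positives, false positives, power, and the false discovery proportion from the same kind of index vectors used above; the toy inputs are chosen only to match the reported counts (10 causal SNPs, 95 significant SNPs, all causal SNPs detected), not the actual SNP indices.

```julia
# Hypothetical helper: compare significant SNP indices against the true causal set
function summarize_hits(causal::AbstractVector{<:Integer},
                        signif::AbstractVector{<:Integer})
    tp = length(intersect(causal, signif))              # causal SNPs that were detected
    fp = length(setdiff(signif, causal))                # detected SNPs with no simulated effect
    power = tp / length(causal)
    fdp = isempty(signif) ? 0.0 : fp / length(signif)   # false discovery proportion
    return (; tp, fp, power, fdp)
end

# toy indices matching the reported counts: 10 causal, 95 significant, all causal detected
causal = collect(1:10)
signif = collect(1:95)
println(summarize_hits(causal, signif))
```

With the counts reported above, 10 of the 95 significant SNPs are causal and 85 have no simulated effect, a false discovery proportion of about 0.89 despite a power of 1.0.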