# Example of Random M Variance Components Simulation

Authors: Sarah Ji, Hua Zhou, Janet Sinsheimer, Kenneth Lange

In this notebook we demonstrate how to simulate traits from the LMM/VCM framework with the parameters given below. We then benchmark against the same simulation process using the MatrixNormal distribution in the Distributions.jl package.

### Double check that you are using Julia version 1.0 or higher by checking the machine information

In [1]:
versioninfo()

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


In [2]:
using DataFrames, Random, LinearAlgebra, TraitSimulation, Distributions, BenchmarkTools
Random.seed!(1234);

# Generating Random Design Matrix, Coefficient Vector and Variance Component Matrices

Here, for m = 10 variance components, we generate m random covariance matrices, a random n-by-p design matrix, and a p-by-d matrix of regression coefficients to illustrate the simulation of an n-by-d response matrix for a sample of n = 1000 people.
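For reference, the setup above can be written as one distributional statement. This is the standard multivariate variance component model, given here as an assumption about what the simulation draws from rather than as a quotation from the package documentation. $Y$ is the $n \times d$ response matrix, $X$ the $n \times p$ design matrix, $B$ the $p \times d$ coefficient matrix, each $V_i$ an $n \times n$ covariance matrix, and each $\Sigma_i$ a $d \times d$ variance component:

$$
\text{vec}(Y) \sim N\left(\text{vec}(XB), \; \sum_{i=1}^{m} \Sigma_i \otimes V_i\right)
$$

With m = 1 this reduces to a single matrix normal distribution with mean $XB$, row covariance $V_1$, and column covariance $\Sigma_1$.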


In [3]:
n = 1000   # no. observations
d = 2      # dimension of responses
m = 10      # no. variance components
p = 2;      # no. covariates


    function generateSPDmatrix(n)
        A = rand(n)
        m = 0.5 * (A * A')
        PDmat = m + (n * Diagonal(ones(n)))
    end


    function RVCModel(n::Int64, p::Int64, d::Int64, m::Int64)
        # n-by-p design matrix
        X = randn(n, p)

        # p-by-d mean component regression coefficient for each trait
        B = hcat(ones(p, 1), rand(p))  

        V = ntuple(x -> zeros(n, n), m) 
        for i = 1:m-1
          copy!(V[i], generateSPDmatrix(n))
        end
        copy!(V[end], Diagonal(ones(n))) # last covariance matrix is the identity

        # a tuple of m d-by-d variance component parameters
        Σ = ntuple(x -> zeros(d, d), m) 
        for i in 1:m
          copy!(Σ[i], generateSPDmatrix(d))
        end

        return(X, B, Σ, V)
    end
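The construction in `generateSPDmatrix` gives a symmetric positive definite matrix because, for any nonzero vector $x$, $x^T(0.5\,aa^T + nI)x = 0.5\,(a^Tx)^2 + n\|x\|^2 > 0$. A minimal sketch to confirm this numerically on a small case, with the function repeated so the block runs on its own (the size 5 is an arbitrary choice for illustration):

```julia
using LinearAlgebra, Random

# Same construction as generateSPDmatrix above, repeated for self-containment.
function generateSPDmatrix(n)
    A = rand(n)                        # random vector of length n
    m = 0.5 * (A * A')                 # rank-one positive semidefinite matrix
    return m + n * Diagonal(ones(n))   # shifting by n*I makes it positive definite
end

Random.seed!(1234)
S = generateSPDmatrix(5)

@show issymmetric(S)   # the outer product is exactly symmetric
@show isposdef(S)      # a Cholesky factorization succeeds
# Every eigenvalue is at least n (here 5), up to floating-point rounding.
@show minimum(eigvals(Symmetric(S)))
```

The same check applies to both the n-by-n matrices stored in `V` and the d-by-d matrices stored in `Σ`.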

In [4]:
X, B, Σ, V = RVCModel(n, p, d, m);
Random_VCM_Trait = DataFrame(VCM_simulation(X, B, Σ, V), [:SimTrait1, :SimTrait2])

Row,SimTrait1,SimTrait2
,Float64,Float64
1,-69.5177,-125.128
2,-184.975,179.507
3,62.2697,-29.6645
4,-77.7197,-143.363
5,215.216,-109.543
6,45.6826,42.9817
7,-47.6316,-128.685
8,-43.8541,-191.055
9,95.7323,-119.535
10,89.6551,-251.84


# Benchmarking against the Distributions.jl MatrixNormal distribution

In our VarianceComponent type, we store the Cholesky decomposition of each $\Sigma_i$ and $V_i$. These are computed once, outside of the simulation, when the vc_vector of VarianceComponent objects is built. This matters because, more often than not, users have to run the simulation many times to reach their goal. The benchmarks below show that when we simulate traits n_reps times, using the VarianceComponent type is much faster and more memory efficient than calling the Julia MatrixNormal distribution from Distributions.jl m times, for the following parameters:
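To illustrate why caching the factors helps, here is a minimal sketch using only LinearAlgebra. It is not TraitSimulation's implementation: the helpers `cache_factors`, `draw`, and `simulate_sketch` are hypothetical names introduced for this example. The idea is that if $V = L_V L_V^T$ and $\Sigma = L_\Sigma L_\Sigma^T$, then $E = L_V Z L_\Sigma^T$ with i.i.d. standard normal $Z$ has $\text{vec}(E)$ with covariance $\Sigma \otimes V$, so each replicate needs only matrix products and no new factorization.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not part of TraitSimulation): factor Σ (d-by-d) and
# V (n-by-n) once and keep the lower Cholesky factors for reuse.
cache_factors(Σ, V) = (LΣ = cholesky(Symmetric(Σ)).L, LV = cholesky(Symmetric(V)).L)

# One draw E = LV * Z * LΣ' with Z i.i.d. N(0, 1), so vec(E) has covariance Σ ⊗ V.
draw(vc) = vc.LV * randn(size(vc.LV, 1), size(vc.LΣ, 1)) * vc.LΣ'

# Mean plus one independent draw per variance component.
simulate_sketch(μ, vcs) = μ + sum(draw(vc) for vc in vcs)

Random.seed!(1234)
V1 = [2.0 0.5 0.0; 0.5 2.0 0.5; 0.0 0.5 2.0]   # 3-by-3 covariance across people
V2 = Matrix{Float64}(I, 3, 3)                  # identity, as for the last component above
Σ1 = [1.0 0.3; 0.3 1.0]                        # 2-by-2 covariance across traits
Σ2 = [0.5 0.0; 0.0 0.5]

vcs = [cache_factors(Σ1, V1), cache_factors(Σ2, V2)]   # factorize once
μ = zeros(3, 2)                                        # stands in for X*B
Y = [simulate_sketch(μ, vcs) for _ in 1:100]           # reuse the factors for every replicate
```

In this sketch the Cholesky cost is paid once when `vcs` is built, whereas a fresh distribution object built inside the replicate loop has to process the covariance matrices again on every iteration.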

In [5]:
@show n # sample size
@show m # number of random variance components
@show d # number of traits
@show p; # number of fixed effects

n = 1000
m = 10
d = 2
p = 2


## Compare for m = 1 variance component

With only one variance component, we are roughly 2x more memory efficient (7.66 MiB versus 15.38 MiB) and roughly 3.7x faster in median time (3.221 ms versus 12.091 ms) at simulating this bivariate trait.

In [6]:
LMMtraitobj = LMMTrait(X*B, VarianceComponent(Σ[1], V[1]))
@benchmark simulate(LMMtraitobj)

BenchmarkTools.Trial: 
  memory estimate:  7.66 MiB
  allocs estimate:  4
  --------------
  minimum time:     2.854 ms (0.00% GC)
  median time:      3.221 ms (0.00% GC)
  mean time:        4.109 ms (21.52% GC)
  maximum time:     9.386 ms (32.66% GC)
  --------------
  samples:          1213
  evals/sample:     1

In [7]:
function MN_J(X, B, Σ, V; n_reps = 1)
    n, p = size(X*B)
    sim = [zeros(n, p) for i in 1:n_reps]
    for i in 1:n_reps
        sim[i] = rand(MatrixNormal(X*B, V, Σ))
    end
    return(sim)
end

@benchmark MN_J($X, $B, $Σ[1], $V[1])

BenchmarkTools.Trial: 
  memory estimate:  15.38 MiB
  allocs estimate:  25
  --------------
  minimum time:     9.429 ms (0.00% GC)
  median time:      12.091 ms (16.40% GC)
  mean time:        12.994 ms (12.56% GC)
  maximum time:     22.056 ms (9.21% GC)
  --------------
  samples:          385
  evals/sample:     1

## Compare simulation for m = 10 variance components

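With ten variance components, we are still about 2x more memory efficient (76.34 MiB versus 153.70 MiB), and now about 3.2x faster in median time (32.193 ms versus 104.067 ms) than the Distributions package.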

In [8]:
vc_vector = [VarianceComponent(Σ[i], V[i]) for i in eachindex(V)]
LMMtraitobjm = LMMTrait(X*B, vc_vector);
@benchmark simulate(LMMtraitobjm)

BenchmarkTools.Trial: 
  memory estimate:  76.34 MiB
  allocs estimate:  23
  --------------
  minimum time:     30.426 ms (13.17% GC)
  median time:      32.193 ms (12.91% GC)
  mean time:        33.317 ms (13.97% GC)
  maximum time:     50.199 ms (14.93% GC)
  --------------
  samples:          150
  evals/sample:     1

In [9]:
function MN_Jm(X, B, Σ, V; n_reps = 1)
    n, p = size(X*B)
    m = length(V)
    sim = [zeros(n, p) for i in 1:n_reps]
    for i in 1:n_reps
        for j in 1:m
            # Note: each draw is centered at X*B, so the sum over m components has mean m * X*B.
            dist = MatrixNormal(X*B, V[j], Σ[j])
            sim[i] += rand(dist)
        end
    end
    return(sim)
end

@benchmark vecs = MN_Jm($X, $B, $Σ, $V)

BenchmarkTools.Trial: 
  memory estimate:  153.70 MiB
  allocs estimate:  233
  --------------
  minimum time:     100.515 ms (7.89% GC)
  median time:      104.067 ms (9.01% GC)
  mean time:        107.199 ms (8.69% GC)
  maximum time:     149.985 ms (7.86% GC)
  --------------
  samples:          47
  evals/sample:     1
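
The ratios quoted in the two comparisons can be recomputed from the printed BenchmarkTools output. The sketch below copies the median times and memory estimates by hand. The names `results`, `speedup`, and `memratio` are introduced here for illustration only.

```julia
# Figures copied from the benchmark output above (median time in ms, memory in MiB).
results = (
    m1  = (lmm = (time = 3.221,  mem = 7.66),  mn = (time = 12.091,  mem = 15.38)),
    m10 = (lmm = (time = 32.193, mem = 76.34), mn = (time = 104.067, mem = 153.70)),
)

# Ratio of the MatrixNormal figure to the LMMTrait figure, rounded to two decimals.
speedup(r)  = round(r.mn.time / r.lmm.time, digits = 2)
memratio(r) = round(r.mn.mem / r.lmm.mem, digits = 2)

for (name, r) in pairs(results)
    println(name, ": ", speedup(r), "x faster, ", memratio(r), "x less memory")
end
```

Median times are used because they are less sensitive to garbage collection pauses than the mean. Timings will differ on other machines.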

## Citations: 

[1] Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM (2013) Mendel: The Swiss army knife of genetic analysis programs. Bioinformatics 29:1568-1570.


[2] OPENMENDEL: a cooperative programming project for statistical genetics.
[Hum Genet. 2019 Mar 26. doi: 10.1007/s00439-019-02001-z](https://www.ncbi.nlm.nih.gov/pubmed/?term=OPENMENDEL).
