# Example of Random M Variance Components Simulation

Authors: Sarah Ji, Hua Zhou, Janet Sinsheimer, Kenneth Lange

Here, for m = 10 variance components, we generate m covariance matrices, a random n-by-p design matrix, and a p-by-d matrix of regression coefficients to illustrate the simulation of an n-by-d response matrix (d traits per person) for a sample of n = 1000 people.
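
For orientation, the simulation target can be written in matrix form. The display below is the standard multivariate variance component model, using the same symbols as the code that follows (`X`, `B`, `Σ`, `V`). That TraitSimulation uses exactly this parameterization is an assumption, not something stated in this notebook.

$$
Y = XB + \sum_{i=1}^{m} E_i, \qquad \operatorname{vec}(E_i) \sim \mathcal{N}\left(0,\; \Sigma_i \otimes V_i\right),
$$

so that

$$
\operatorname{vec}(Y) \sim \mathcal{N}\left(\operatorname{vec}(XB),\; \sum_{i=1}^{m} \Sigma_i \otimes V_i\right).
$$

Here $Y$ is n-by-d, $X$ is n-by-p, $B$ is p-by-d, each $V_i$ is an n-by-n covariance matrix across people, and each $\Sigma_i$ is a d-by-d covariance matrix across traits.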


### Double check that you are using Julia version 1.0 or higher by checking the machine information

In [1]:
versioninfo()

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


In [2]:
using DataFrames, Random, LinearAlgebra, TraitSimulation
Random.seed!(1234);
n = 100   # no. observations
d = 1      # dimension of responses
m = 10      # no. variance components
p = 2      # no. covariates

# n-by-p design matrix
X = randn(n, p)

# p-by-d mean component regression coefficient
B = ones(p, d)  

# a tuple of m covariance matrices
V = ntuple(x -> zeros(n, n), m) 
for i = 1:m-1
  Vi = [j ≥ i ? i * (n - j + 1) : j * (n - i + 1) for i in 1:n, j in 1:n]
  copy!(V[i], Vi * Vi')
end
copy!(V[m], Diagonal(ones(n))) # last covariance matrix is the identity

# a tuple of m d-by-d variance component parameters
Σ = ntuple(x -> zeros(d, d), m) 
for i in 1:m
  Σi = [j ≥ i ? i * (d - j + 1) : j * (d - i + 1) for i in 1:d, j in 1:d]
  copy!(Σ[i], Σi' * Σi)
end


In [3]:
Random.seed!(1234);

In [4]:
using BenchmarkTools
Random.seed!(1234);
n = 1000   # no. observations
d = 2      # dimension of responses
m = 10      # no. variance components
p = 2      # no. covariates

# n-by-p design matrix
X = randn(n, p)

# p-by-d mean component regression coefficient
B = ones(p, d)  

# a tuple of m covariance matrices
V = ntuple(x -> zeros(n, n), m) 
for i = 1:m-1
  Vi = [j ≥ i ? i * (n - j + 1) : j * (n - i + 1) for i in 1:n, j in 1:n]
  copy!(V[i], Vi * Vi')
end
copy!(V[m], Diagonal(ones(n))) # last covariance matrix is the identity

# a tuple of m d-by-d variance component parameters
Σ = ntuple(x -> zeros(d, d), m) 
for i in 1:m
  Σi = [j ≥ i ? i * (d - j + 1) : j * (d - i + 1) for i in 1:d, j in 1:d]
  copy!(Σ[i], Σi' * Σi)
end

Random_VCM_Trait = DataFrame(VCM_simulation(X, B, V, Σ), [:SimTrait1, :SimTrait2])


# Compared with MatrixNormal from the Distributions package for m variance components, TraitSimulation uses less memory but is not faster.

In [5]:
@benchmark VCM_simulation($X, $B, $Σ, $V)

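
A memory-versus-speed comparison like the one claimed above can be read directly from the `Trial` objects that `@benchmark` returns. The sketch below uses two toy functions as stand-ins (`sum_alloc` and `sum_noalloc` are hypothetical names, not TraitSimulation code) and only shows how the minimum time, bytes allocated, and allocation count might be extracted with BenchmarkTools.

```julia
using BenchmarkTools

# Two toy implementations of the same sum of squares (illustrative stand-ins only)
sum_alloc(x) = sum(x .^ 2)        # builds a temporary array before summing
sum_noalloc(x) = sum(abs2, x)     # sums without a temporary array

x = randn(10_000)
t_alloc = @benchmark sum_alloc($x)
t_noalloc = @benchmark sum_noalloc($x)

# minimum(trial) is a TrialEstimate; time is in nanoseconds, memory in bytes
for (name, t) in (("alloc", t_alloc), ("noalloc", t_noalloc))
    est = minimum(t)
    println(name, ": ", time(est), " ns, ", memory(est), " bytes, ",
            allocs(est), " allocations")
end
```

Reporting time and memory side by side makes it clear when one method wins on allocations but not on run time.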

# Cholesky factors computed once, outside the simulation, and stored in a vector of VarianceComponent objects

This shows that when traits are simulated repeatedly, computing the Cholesky decompositions of the covariance matrices once, outside the simulation, and reusing them n_reps times is much faster and more memory efficient than calling Julia's MatrixNormal distribution from the Distributions package.
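
The idea of factoring once and reusing the factors can be sketched with the standard library alone. The helpers `cache_factors` and `simulate_cached` are hypothetical names introduced for illustration; they are not the TraitSimulation API, and the toy inputs are small, well-conditioned matrices rather than the ones built above. The sketch relies on the identity that if `LV * LV' == V` and `LΣ * LΣ' == Σ`, then `LV * Z * LΣ'` with i.i.d. standard normal `Z` has covariance `Σ ⊗ V` after vectorization.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not part of TraitSimulation): factor each pair once.
# Returns a vector of (LΣ, LV) lower-triangular Cholesky factors.
function cache_factors(Σ, V)
    return [(cholesky(Symmetric(Σ[i])).L, cholesky(Symmetric(V[i])).L)
            for i in eachindex(V)]
end

# Draw one n-by-d response: Y = μ + sum over i of LV_i * Z_i * LΣ_i',
# where each Z_i is an n-by-d matrix of i.i.d. N(0, 1) entries.
function simulate_cached(μ, factors)
    n, d = size(μ)
    Y = copy(μ)
    for (LΣ, LV) in factors
        Y .+= LV * randn(n, d) * LΣ'
    end
    return Y
end

# Small, well-conditioned toy inputs (illustrative only)
Random.seed!(1234)
n, d, m, p = 50, 2, 3, 2
X = randn(n, p)
B = ones(p, d)
V = ntuple(i -> (A = randn(n, n); A * A' + n * I), m)
Σ = ntuple(i -> (S = randn(d, d); S * S' + I), m)

factors = cache_factors(Σ, V)                            # cubic-cost work, done once
sims = [simulate_cached(X * B, factors) for _ in 1:10]   # factors reused per replicate
```

Each replicate then costs only matrix multiplications, whereas constructing a new distribution object inside the loop would repeat the factorization every time.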

In [6]:
vc_vector = [VarianceComponent(Σ[i], V[i]) for i in eachindex(V)]
@benchmark VCM_simulation($X, $B, $vc_vector)


## Using Julia's MatrixNormal distribution in the Distributions package for simulation

In [7]:
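using Distributions  # provides MatrixNormal

function MN_J(X, B, V, Σ)
    μ = X * B              # n-by-d mean, computed once
    n, d = size(μ)
    Y = copy(μ)
    for i in eachindex(V)
        # zero-mean draw with row covariance V[i] and column covariance Σ[i];
        # the mean is added once above, not once per variance component
        Y += rand(MatrixNormal(zeros(n, d), V[i], Σ[i]))
    end
    return Y
end

@benchmark vecs = MN_J($X, $B, $V, $Σ)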

# Compared with MatrixNormal from the Distributions package for m = 10 variance components, TraitSimulation wins.

In [8]:
LMMtraitobj = LMMTrait(X*B, VarianceComponent(Σ[1], V[1]))
@benchmark simulate(LMMtraitobj, 10)


In [9]:
using Distributions  # provides MatrixNormal

function MN_J(X, B, V, Σ; n_reps = 1)
    n, d = size(X * B)
    sim = [zeros(n, d) for _ in 1:n_reps]   # one n-by-d matrix per replicate
    for i in 1:n_reps
        sim[i] = rand(MatrixNormal(X * B, V, Σ))
    end
    return sim
end

@benchmark MN_J($X, $B, $V[1], $Σ[1], n_reps = 10)

## Citations: 

[1] Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM (2013) Mendel: The Swiss army knife of genetic analysis programs. Bioinformatics 29:1568-1570.


[2] OPENMENDEL: a cooperative programming project for statistical genetics.
[Hum Genet. 2019 Mar 26. doi: 10.1007/s00439-019-02001-z](https://www.ncbi.nlm.nih.gov/pubmed/?term=OPENMENDEL).
