# TraitSimulation for Variance Component Models (VCM):

Authors: Sarah Ji, Janet Sinsheimer, Ken Lange, Eric Sobel

In this notebook, we use the GAW 2019 data to demonstrate how to simulate phenotypic traits from the Variance Component Model (VCM) framework under different scenarios. We let the user specify the effect sizes for non-genetic covariates, and we provide the user with an option to simulate their own effect sizes, given a known distribution. In both models $\mu$ is the mean, $x_i$ is the allele count for snp $i$, GRM is kinship matrix as derived from the genetic relationship matrix (GRM) across only the common snps with minor allele frequency $\ge 0.05$, and I is the identity matrix. At the end of each example, we demonstrate how to write the results of each simulation to a file on the users own machine. The notebook is organized as follows: <br>

**Example 1: User specified mixed effects model with Interaction in the Fixed Effects**

**Example 2: Rare Variant Model with effect sizes generated off minor allele frequency**

**a) Univariate Trait Simulation**

**b) Bivariate Trait Simulation**

### Double check that you are using Julia version 1.0 or higher by checking the machine information

In [1]:
using Random, SnpArrays, TraitSimulation, VarianceComponentModels, StatsBase, DataFrames
using LinearAlgebra, Distributions, CSV, Plots
Random.seed!(1234);
versioninfo()

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


In [2]:
# Load genotype data in PLINK format
data = SnpArray("simphen_849.bed")

849×8348674 SnpArray:
 0x03  0x02  0x03  0x03  0x03  0x03  …  0x03  0x03  0x03  0x03  0x02  0x03
 0x03  0x02  0x03  0x03  0x03  0x03     0x03  0x02  0x03  0x03  0x03  0x03
 0x03  0x02  0x02  0x02  0x02  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x03  0x03  0x03     0x03  0x02  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x03  0x03  0x03     0x03  0x03  0x03  0x03  0x02  0x03
 0x03  0x01  0x03  0x03  0x03  0x03  …  0x03  0x03  0x03  0x03  0x02  0x03
 0x03  0x03  0x02  0x02  0x02  0x02     0x03  0x03  0x03  0x03  0x02  0x03
 0x03  0x03  0x03  0x03  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x02  0x03  0x03  0x03  0x03     0x02  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x03  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x00  0x02  0x02  0x02  0x02  …  0x03  0x03  0x03  0x03  0x02  0x03
 0x03  0x03  0x03  0x03  0x03  0x03     0x03  0x03  0x03  0x03  0x02  0x03
 0x03  0x01  0x03  0x03  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
   

# GAW19 Example Data

In this GAW dataset there are 849 individuals in 20 families and 8,348,674 SNP's. We used data from the Genetic Analysis Workshop (GAW) 19, which are provided for 1943 Hispanic samples that have been whole-exome sequenced. Our analyses are based on the genotype calls for 959 individuals (464 directly sequenced and the rest imputed) provided in the chrNN-geno.csv.gz files, of which we excluded 110 individuals based on genotyping success rates below 98 %. The largest family contains 107 individuals, the smallest, 27.

1. Genetic Analysis Workshop 19: methods and strategies for analyzing human sequence and gene expression data in extended families and unrelated individuals, Corinne D. EngelmanEmail author, Celia M. T. Greenwood, Julia N. Bailey, Rita M. Cantor, Jack W. KentJr, Inke R. König, Justo Lorenzo Bermejo, Phillip E. Melton, Stephanie A. Santorico, Arne Schillert, Ellen M. Wijsman, Jean W. MacCluer and Laura Almasy, BMC Proceedings201610(Suppl 7):19
https://doi.org/10.1186/s12919-016-0007-z 

2. Genome-wide QTL and eQTL analyses using Mendel, Hua Zhou, Jin Zhou, Tao Hu, Eric M. Sobel, Kenneth Lange
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133530/

Information on the individual can be found in the .fam file. We retrieve the `familyid` (Pedigree ID) along with the `personid` for each of the 849 individuals in the sample. 

In [3]:
snpdata = SnpData("simphen_849");

In [4]:
# rows are people; columns are SNPs
people, snps = size(data)

(849, 8348674)

We obtain summary statistics for later use, mainly the minor allele frequency (maf).

In [5]:
minor_allele_frequency = maf(data)

8348674-element Array{Float64,1}:
 0.014201183431952646 
 0.4032258064516129   
 0.07116788321167888  
 0.07099514563106801  
 0.07776427703523692  
 0.05110837438423643  
 0.051724137931034475 
 0.000589622641509413 
 0.0011778563015312216
 0.07334963325183375  
 0.0017667844522968323
 0.0011778563015312216
 0.0011778563015312216
 ⋮                    
 0.0011778563015312216
 0.011189634864546494 
 0.006478209658421719 
 0.0017667844522968323
 0.004711425206124886 
 0.0005889281507656108
 0.02414605418138982  
 0.01531213191990577  
 0.0223792697290931   
 0.002944640753828054 
 0.49469339622641506  
 0.0017667844522968323

In [6]:
# columns are: :chrom, :snpid,
snpdefbimfile = snpdata.snp_info # store the snp_info with the snp names

Unnamed: 0_level_0,chromosome,snpid,genetic_distance,position,allele1,allele2
Unnamed: 0_level_1,String,String,Float64,Int64,Categorical…,Categorical…
1,1,c1_54490,0.0,54490,A,G
2,1,c1_55550,0.0,55550,T,A
3,1,c1_57033,0.0,57033,C,T
4,1,c1_57064,0.0,57064,A,G
5,1,c1_57818,0.0,57818,A,C
6,1,c1_58432,0.0,58432,C,T
7,1,c1_58448,0.0,58448,A,G
8,1,c1_58814,0.0,58814,A,G
9,1,c1_59492,0.0,59492,G,A
10,1,c1_60829,0.0,60829,T,C


We subset the SNP names into a vector called `snpid`

In [7]:
snpid  = snpdefbimfile[!, :snpid] # store the snp names in the snpid vector;

# Example 1: User Specified Model

Say, for instance, we know which SNP's we want to have an effect on the Trait. 

To save computing time and memory, we can simply call them by their names in `snpid`, and subset only those specified SNP's for the analysis.

In this example, we choose three snp's on three different chromosomes to prevent Linkage disequilibrium (LD). Say I know the associated SNP's are located on the tenth, eleventh and twenty first respective chromosomes to have an effect on the trait. We include the three SNP's and the interaction of the SNP on chromosome 11 and 21.

We first get the indices of the specified snps to subset the dataset. The snps of interest in this example are in the 10th, 5348671th and 8348112th respective columns of the dataset.

Note: While prior knowledge of which SNP's have an effect on the trait can be helpful, it is important to consider the proper assumptions of the SNP's before performing any analyses. In particular, we will later check to make sure that all the minor allele frequencies are large enough so that a fixed effect model makes sense, and the SNP's are not monomorphic. 

In [8]:
specified_indices = findall(x -> (x == "c1_57033" || x == "c11_2993197"  || x == "c21_48019937"), snpid)
# Use can change this SNP if you would like to assess another's snps effect on the trait, e.g.:
# specified_indices = findall(x -> x == "c1_61920", snpid)

3-element Array{Int64,1}:
       3
 5348671
 8348112

In [9]:
loci = convert(Matrix{Float64}, @view(data[:, specified_indices]), impute = true)
X_gen = DataFrame(loci)
rename!(X_gen, Symbol.(snpid[specified_indices]))
X_gen[!, :intxn] = X_gen[!, :c11_2993197].*X_gen[!, :c21_48019937]
X_gen

Unnamed: 0_level_0,c1_57033,c11_2993197,c21_48019937,intxn
Unnamed: 0_level_1,Float64,Float64,Float64,Float64
1,2.0,2.0,2.0,4.0
2,2.0,2.0,2.0,4.0
3,1.0,2.0,1.0,2.0
4,2.0,2.0,1.0,2.0
5,2.0,2.0,2.0,4.0
6,2.0,2.0,2.0,4.0
7,1.0,2.0,2.0,4.0
8,2.0,0.0,2.0,0.0
9,2.0,2.0,2.0,4.0
10,2.0,1.0,2.0,2.0


# Polymorphic Loci 

We check that the minor allele frequencies for our three specified SNP's are greater than 0.05 so we will proceed with just these SNP's in our fixed effect model. 

In [10]:
minor_allele_frequency[specified_indices] .> 0.05

3-element BitArray{1}:
 1
 1
 1

For the three specified SNP's, we simulate effect sizes based off of their minor allele frequencies. These effect sizes will be used throughout example 1. 

# Simulate or Specify Non-genetic Covariates

In [11]:
n = people
famfile = snpdata.person_info
IndividualID = famfile[:, 1:2];

# map sex from M/F to 1/0
sex = map(x -> strip(x) == "F" ? 0.0 : 1.0, famfile[!, :sex]);

# pdf_sex = Bernoulli(0.51)
pdf_age = Normal(45, 8)

# sex = rand(pdf_sex, n)
# # simulate age under the specified pdf_age and standardize to be ~ N(0, 1)
age = zscore(rand(pdf_age, n))
intercept = ones(n)
X_non_gen = DataFrame(intercept = intercept, age = age, sex = sex)
sim_effectsize = simulate_effect_size(minor_allele_frequency[specified_indices])
intxn_effect = 0.3

β_cov = [1.0, 0.0002, 0.2]
β = vcat(β_cov, sim_effectsize, intxn_effect)
X_design = [X_non_gen X_gen]

Unnamed: 0_level_0,intercept,age,sex,c1_57033,c11_2993197,c21_48019937,intxn
Unnamed: 0_level_1,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,1.0,0.883123,1.0,2.0,2.0,2.0,4.0
2,1.0,-0.878451,1.0,2.0,2.0,2.0,4.0
3,1.0,-0.472917,1.0,1.0,2.0,1.0,2.0
4,1.0,-0.879617,1.0,2.0,2.0,1.0,2.0
5,1.0,0.88019,1.0,2.0,2.0,2.0,4.0
6,1.0,2.22194,1.0,2.0,2.0,2.0,4.0
7,1.0,0.550011,1.0,1.0,2.0,2.0,4.0
8,1.0,-0.25112,1.0,2.0,0.0,2.0,0.0
9,1.0,0.519661,1.0,2.0,2.0,2.0,4.0
10,1.0,-0.495326,1.0,2.0,1.0,2.0,2.0


# Genetic Relationship Matrix (GRM)

We estimate the kinship using the grm function in `SnpArrays` from only the common snps with minor allele frequency(maf) > 0.05.

We will use the same values of GRM, I and Σ for all 3 examples. 

This example simulates an infinitesmal effect model in which each SNP contributes a small amount to the variance of the trait. In this case, methods to identify SNPs that contribute to the trait value will be unsuccessful and any SNPs identified are false positives.  As such, this simulation can be when the null hypothesis is desired. 

Another issue of importance $E(GRM) = \Phi$ and $V_p = 2V_a \Phi + V_e I$ , so we are in fact simulating a trait that has $V_p = 1.0, h^2 = 0.30$.

We take a look at the Genetic Relationship Matrix computed through `SnpArrays`. 
As a rule of thumb, we should see values close to a half on the diagonal of the GRM.

In [12]:
# Compute GRM using the grm function in SnpArrays
GRM = grm(data)

849×849 Array{Float64,2}:
  0.439528      0.00578792   -0.00513657   …  -0.0102565    -0.00735002 
  0.00578792    0.470776      0.000283674      0.00113738    0.00338513 
 -0.00513657    0.000283674   0.494281         0.00753482    0.000493164
  0.00397425    0.00685784   -0.00111751      -0.00632067   -0.0036096  
 -0.00947635    0.0052004     0.00138248       0.0120027     0.00402512 
  0.0122564     0.00843351    0.00381375   …  -0.00097642   -0.00135375 
  3.40091e-5    2.97977e-5    0.00906778      -0.00214236    0.000921063
 -0.0198685    -0.00111119    0.00533031       0.0141974     0.00952904 
 -0.00775167   -0.00142722   -0.0017737        0.00681067    0.00854752 
 -0.00505832    0.00025334    0.00538903       0.00195095   -0.000926909
 -0.00379525   -0.00118807    0.00577121   …   0.000668809  -0.00292677 
 -0.00840363   -0.00287342    0.00480007       0.00490979    0.00382081 
 -0.00593582   -0.00042752    0.0052472       -0.0031738     0.000446541
  ⋮                      

In [13]:
I_n = Matrix{Float64}(I, size(GRM));
totalvc = @vc [0.1][:, :] ⊗ (GRM + I_n) + [0.9][:, :] ⊗ I_n
# # # Create the simulation model 
vcm_model1 =  VCMTrait(Matrix(X_design), β, totalvc)

Variance Component Model
  * number of traits: 1
  * number of variance components: 2
  * sample size: 849

In [14]:
# simulate the trait
y_1 = DataFrame(Trait = simulate(vcm_model1)[:])

Unnamed: 0_level_0,Trait
Unnamed: 0_level_1,Float64
1,-0.461096
2,0.436411
3,0.981048
4,-1.06051
5,0.244939
6,2.03927
7,0.261482
8,-1.66674
9,0.415643
10,0.841028


## Saving Simulation Results to Local Machine

Next we output the SNPs and the coefficients used to simulate this trait along with the simulated trait values and corresponding design matrix for each of the 849 individuals, labeled by their pedigree ID and person ID.

In addition, we output the genotypes for the variants used to simulate this trait. Note that we can impute missing genotypes by turning the argument `impute = true`.


In [15]:
Coefficients = DataFrame(Coefficients = β)
Covariates = DataFrame(covariates = names(X_design))
Trait1_SNPs = hcat(Coefficients, Covariates)

Unnamed: 0_level_0,Coefficients,covariates
Unnamed: 0_level_1,Float64,Symbol
1,1.0,intercept
2,0.0002,age
3,0.2,sex
4,-0.388946,c1_57033
5,-0.244657,c11_2993197
6,-0.376091,c21_48019937
7,0.3,intxn


In [16]:
Trait1_data = [IndividualID X_design]

Unnamed: 0_level_0,fid,iid,intercept,age,sex,c1_57033,c11_2993197
Unnamed: 0_level_1,Abstract…,Abstract…,Float64,Float64,Float64,Float64,Float64
1,2,T2DG0200001,1.0,0.883123,1.0,2.0,2.0
2,2,T2DG0200002,1.0,-0.878451,1.0,2.0,2.0
3,2,T2DG0200003,1.0,-0.472917,1.0,1.0,2.0
4,2,T2DG0200004,1.0,-0.879617,1.0,2.0,2.0
5,2,T2DG0200005,1.0,0.88019,1.0,2.0,2.0
6,2,T2DG0200006,1.0,2.22194,1.0,2.0,2.0
7,2,T2DG0200007,1.0,0.550011,1.0,1.0,2.0
8,2,T2DG0200008,1.0,-0.25112,1.0,2.0,0.0
9,2,T2DG0200009,1.0,0.519661,1.0,2.0,2.0
10,2,T2DG0200012,1.0,-0.495326,1.0,2.0,1.0


In [17]:
CSV.write("Trait1_data.csv", Trait1_data)
CSV.write("Trait1_SNPs.csv", Trait1_SNPs)

"Trait1_SNPs.csv"

# Example 2: Rare Variant Model -- Genotypes of the uncommon SNP's 

In this example we first subset only the rare SNP's with minor allele frequency greater than 0.0005 but less than 0.02, then we simulate traits on the first 20 rare SNP's as fixed effects.

In [18]:
rare_snp_indx = 0.001 .≤ minor_allele_frequency .≤ 0.02
maf_rare = minor_allele_frequency[rare_snp_indx]

4603083-element Array{Float64,1}:
 0.014201183431952646 
 0.0011778563015312216
 0.0017667844522968323
 0.0011778563015312216
 0.0011778563015312216
 0.0011778563015312216
 0.0023612750885477762
 0.002355712603062443 
 0.0017751479289941363
 0.0035335689045936647
 0.0017751479289941363
 0.0011778563015312216
 0.0011778563015312216
 ⋮                    
 0.0017667844522968323
 0.018256772673733823 
 0.002951593860684776 
 0.002355712603062443 
 0.0011778563015312216
 0.011189634864546494 
 0.006478209658421719 
 0.0017667844522968323
 0.004711425206124886 
 0.01531213191990577  
 0.002944640753828054 
 0.0017667844522968323

## Generating Effect Sizes 

Below we demonstrate how to simulate effect sizes for each SNP, conditional on its minor allele frequency and a known distribution.

We include these distributions to model realistic scenarios where the rarest snps have the largest effect size.  


## Chisquared(df = 1)

We want to use allele frequency as x and find f(x) where f is the pdf for the chisquare (df=1) density, so that the rarest snps have the biggest effect sizes.

In [19]:
# Generating Effect Sizes from Chisquared(df = 1) density
using StatsFuns
chisq_coeff = zeros(20)
for i in 1:20
chisq_coeff[i] = chisqpdf(1, maf_rare[i])
end

Take a look at the simulated coefficients on the left, next to the corresponding minor allele frequency. Notice how the more rare SNP's have the largest effect sizes.

In [20]:
simulated_effectsizes_chisq = round.(chisq_coeff, digits = 2)
DataFrame(chisq_coeff = simulated_effectsizes_chisq, maf = maf_rare[1:20])

Unnamed: 0_level_0,chisq_coeff,maf
Unnamed: 0_level_1,Float64,Float64
1,3.32,0.0142012
2,11.62,0.00117786
3,9.48,0.00176678
4,11.62,0.00117786
5,11.62,0.00117786
6,11.62,0.00117786
7,8.2,0.00236128
8,8.21,0.00235571
9,9.46,0.00177515
10,6.7,0.00353357


## Exponential

For demonstration purposes, we use simulated_effectsizes3 = 3*exp.(-200*maf_rare\[1:20\]), rounded to the second digit, throughout this example. However, named distribution can also be used to simulate effect sizes. 

In [21]:
simulated_effectsizes_exp = round.(3*exp.(-200*maf_rare[1:20]), digits = 2)

20-element Array{Float64,1}:
 0.18
 2.37
 2.11
 2.37
 2.37
 2.37
 1.87
 1.87
 2.1 
 1.48
 2.1 
 2.37
 2.37
 1.31
 0.06
 0.08
 0.07
 2.37
 1.87
 2.37

In [22]:
rare_indices = findall(x-> x==1, rare_snp_indx)[1:20]
rare_loci2 = convert(Matrix{Float64}, @view(data[:, rare_indices]), impute = true)

X_gen2 = DataFrame(rare_loci2)
rename!(X_gen2, Symbol.(snpid[rare_snp_indx][1:20]))

X_rare_design = [X_non_gen X_gen2]

Unnamed: 0_level_0,intercept,age,sex,c1_54490,c1_59492,c1_61462,c1_61920,c1_62162
Unnamed: 0_level_1,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,1.0,0.883123,1.0,2.0,2.0,2.0,2.0,2.0
2,1.0,-0.878451,1.0,2.0,2.0,2.0,2.0,2.0
3,1.0,-0.472917,1.0,2.0,2.0,2.0,2.0,2.0
4,1.0,-0.879617,1.0,2.0,2.0,2.0,2.0,2.0
5,1.0,0.88019,1.0,2.0,2.0,2.0,2.0,2.0
6,1.0,2.22194,1.0,2.0,2.0,2.0,2.0,2.0
7,1.0,0.550011,1.0,2.0,2.0,2.0,2.0,2.0
8,1.0,-0.25112,1.0,2.0,2.0,2.0,2.0,2.0
9,1.0,0.519661,1.0,2.0,2.0,2.0,2.0,2.0
10,1.0,-0.495326,1.0,2.0,2.0,2.0,2.0,2.0


## Simulate Trait

The following code simulates a trait under the mixed effect model with two variance components as well as a fixed effect μ.


In [23]:
β_exp = vcat(β_cov, simulated_effectsizes_exp)
# # # Create the simulation model (:T is the name of the simulated trait)
vcm_model2 = VCMTrait(Matrix(X_rare_design), β_exp, totalvc)
sigma, v = vcobjtuple(totalvc)
Σ = [sigma...]
V = [v...]
vcmOBJ2 = VCMTrait(Matrix(X_non_gen), β_cov, rare_loci2, simulated_effectsizes_exp, Σ, V)

Variance Component Model
  * number of traits: 1
  * number of variance components: 2
  * sample size: 849

In [24]:
# Generate the simulations
y_2 = DataFrame(Trait = simulate(vcmOBJ2)[:])

Unnamed: 0_level_0,Trait
Unnamed: 0_level_1,Float64
1,66.3349
2,68.7386
3,69.6754
4,67.1783
5,68.6255
6,67.8581
7,67.8991
8,66.2767
9,69.3029
10,65.6474


## Saving Simulation Results to Local Machine

Write the newly simulated trait into a comma separated (csv) file for later use. Note that the user can specify the separator to '\t' for tab separated, or another separator of choice. 

Here we output the simulated trait values for each of the 849 individuals, labeled by their pedigree ID and person ID.

In addition, we output the genotypes for the variants used to simulate this trait. Note that we can impute missing genotypes by turning the `impute` argument = true.

In [25]:
Trait2_data = [IndividualID hcat(y_2, X_rare_design)]

Unnamed: 0_level_0,fid,iid,Trait,intercept,age,sex,c1_54490,c1_59492
Unnamed: 0_level_1,Abstract…,Abstract…,Float64,Float64,Float64,Float64,Float64,Float64
1,2,T2DG0200001,66.3349,1.0,0.883123,1.0,2.0,2.0
2,2,T2DG0200002,68.7386,1.0,-0.878451,1.0,2.0,2.0
3,2,T2DG0200003,69.6754,1.0,-0.472917,1.0,2.0,2.0
4,2,T2DG0200004,67.1783,1.0,-0.879617,1.0,2.0,2.0
5,2,T2DG0200005,68.6255,1.0,0.88019,1.0,2.0,2.0
6,2,T2DG0200006,67.8581,1.0,2.22194,1.0,2.0,2.0
7,2,T2DG0200007,67.8991,1.0,0.550011,1.0,2.0,2.0
8,2,T2DG0200008,66.2767,1.0,-0.25112,1.0,2.0,2.0
9,2,T2DG0200009,69.3029,1.0,0.519661,1.0,2.0,2.0
10,2,T2DG0200012,65.6474,1.0,-0.495326,1.0,2.0,2.0


Next we output the coefficients and SNP's used to simulate this trait.

In [26]:
Coefficients = DataFrame(Coefficients = simulated_effectsizes_exp)
SNPs = DataFrame(SNPs = snpid[rare_snp_indx][1:20])
Trait2_SNPs = hcat(Coefficients, SNPs)

Unnamed: 0_level_0,Coefficients,SNPs
Unnamed: 0_level_1,Float64,String
1,0.18,c1_54490
2,2.37,c1_59492
3,2.11,c1_61462
4,2.37,c1_61920
5,2.37,c1_62162
6,2.37,c1_67580
7,1.87,c1_74902
8,1.87,c1_76838
9,2.1,c1_86329
10,1.48,c1_86331


In [27]:
CSV.write("Trait2_data.csv", Trait2_data)
CSV.write("Trait2_SNPs.csv", Trait2_SNPs)

"Trait2_SNPs.csv"

# Example 3: Bivariate Rare Variant Model -- 

We extend the univariate variance component model above to simulate a bivariate trait. We will use the chi-squared effect sizes for the second trait.

In [28]:
β_chisq = vcat(β_cov, simulated_effectsizes_chisq)
fixed_effects = [β_exp β_chisq]

23×2 Array{Float64,2}:
 1.0      1.0   
 0.0002   0.0002
 0.2      0.2   
 0.18     3.32  
 2.37    11.62  
 2.11     9.48  
 2.37    11.62  
 2.37    11.62  
 2.37    11.62  
 1.87     8.2   
 1.87     8.21  
 2.1      9.46  
 1.48     6.7   
 2.1      9.46  
 2.37    11.62  
 2.37    11.62  
 1.31     6.18  
 0.06     2.79  
 0.08     2.93  
 0.07     2.88  
 2.37    11.6   
 1.87     8.2   
 2.37    11.6   

In [29]:
using LinearAlgebra
Σ_A = [4 1; 1 4]
Σ_E = [2.0 0.0; 0.0 2.0];
variance_formula = @vc Σ_A ⊗ (GRM + I_n)  + Σ_E ⊗ I_n
Σ_2 = [Σ_A, Σ_E]

# # Create the simulation model 
vcm_model =  VCMTrait(Matrix(X_rare_design), fixed_effects, variance_formula)

Variance Component Model
  * number of traits: 2
  * number of variance components: 2
  * sample size: 849

In [30]:
# Generate the simulations
y_3 = DataFrame(simulate(vcm_model))
rename!(y_3, [Symbol("Trait$i") for i in 1:length(variance_formula)])

Unnamed: 0_level_0,Trait1,Trait2
Unnamed: 0_level_1,Float64,Float64
1,68.1778,335.633
2,67.763,343.467
3,62.153,341.028
4,70.5878,341.497
5,71.8811,345.071
6,74.0223,340.659
7,71.7453,343.38
8,67.4223,337.422
9,67.5488,339.466
10,72.6452,344.51


## Saving Simulation Results to Local Machine

In [31]:
Trait3_data = [IndividualID hcat(y_3, X_rare_design)]

Unnamed: 0_level_0,fid,iid,Trait1,Trait2,intercept,age,sex,c1_54490
Unnamed: 0_level_1,Abstract…,Abstract…,Float64,Float64,Float64,Float64,Float64,Float64
1,2,T2DG0200001,68.1778,335.633,1.0,0.883123,1.0,2.0
2,2,T2DG0200002,67.763,343.467,1.0,-0.878451,1.0,2.0
3,2,T2DG0200003,62.153,341.028,1.0,-0.472917,1.0,2.0
4,2,T2DG0200004,70.5878,341.497,1.0,-0.879617,1.0,2.0
5,2,T2DG0200005,71.8811,345.071,1.0,0.88019,1.0,2.0
6,2,T2DG0200006,74.0223,340.659,1.0,2.22194,1.0,2.0
7,2,T2DG0200007,71.7453,343.38,1.0,0.550011,1.0,2.0
8,2,T2DG0200008,67.4223,337.422,1.0,-0.25112,1.0,2.0
9,2,T2DG0200009,67.5488,339.466,1.0,0.519661,1.0,2.0
10,2,T2DG0200012,72.6452,344.51,1.0,-0.495326,1.0,2.0


In [32]:
Coefficients = DataFrame(Coefficients_Trait1 = simulated_effectsizes_exp, Coefficients_Trait2 = simulated_effectsizes_chisq)
SNPs = DataFrame(SNPs = snpid[rare_snp_indx][1:20])
Trait3_SNPs = hcat(Coefficients, SNPs)

Unnamed: 0_level_0,Coefficients_Trait1,Coefficients_Trait2,SNPs
Unnamed: 0_level_1,Float64,Float64,String
1,0.18,3.32,c1_54490
2,2.37,11.62,c1_59492
3,2.11,9.48,c1_61462
4,2.37,11.62,c1_61920
5,2.37,11.62,c1_62162
6,2.37,11.62,c1_67580
7,1.87,8.2,c1_74902
8,1.87,8.21,c1_76838
9,2.1,9.46,c1_86329
10,1.48,6.7,c1_86331


In [33]:
CSV.write("Trait3_data.csv", Trait3_data)
CSV.write("Trait3_SNPs.csv", Trait3_SNPs)

"Trait3_SNPs.csv"

## Citations: 

[1] Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM (2013) Mendel: The Swiss army knife of genetic analysis programs. Bioinformatics 29:1568-1570.`


[2] OPENMENDEL: a cooperative programming project for statistical genetics.
[Hum Genet. 2019 Mar 26. doi: 10.1007/s00439-019-02001-z](https://www.ncbi.nlm.nih.gov/pubmed/?term=OPENMENDEL).