# Estimating Heritability and Testing SNP Association using Maximum Likelihoods of Variance Component Models

Authors: Sarah Ji, Janet Sinsheimer and Hua Zhou

We will use a variance component model (also called a linear mixed model) to estimate the heritability of a trait and then test for association with specified markers. This is the equivalent of a replication or candidate SNP approach. Normally, if we had no prior hypothesis regarding particular loci, we would first test markers genome-wide using a GWAS approach that can handle pedigree data appropriately (see later lesson). In that case, we would probably use a fast score test rather than maximum likelihood. However, maximum likelihood provides more accurate inference and parameter estimates and can be used to refine inference after screening. We will also use the variance component framework with maximum likelihood tests when conducting Mendelian randomization with families, so it is useful to understand how the Julia package VarianceComponentModels.jl works and how to load large data sets into Julia using the package SnpArrays.jl.



## Data files

In the GAW19 dataset there are 849 individuals in 20 families and 8,348,674 loci located on the odd chromosomes. All of these individuals were genotyped and a fraction of them also have sequence data. The majority of the sequenced loci are imputed for the remaining individuals. The largest family contains 107 individuals, the smallest, 27.

For illustration purposes, we changed the names of the loci to fit the Julia Style Guide.

Genetic Analysis Workshop 19: methods and strategies for analyzing human sequence and gene expression data in extended families and unrelated individuals. Corinne D. Engelman, Celia M. T. Greenwood, Julia N. Bailey, Rita M. Cantor, Jack W. Kent Jr, Inke R. König, Justo Lorenzo Bermejo, Phillip E. Melton, Stephanie A. Santorico, Arne Schillert, Ellen M. Wijsman, Jean W. MacCluer and Laura Almasy. BMC Proceedings 2016, 10(Suppl 7):19. https://doi.org/10.1186/s12919-016-0007-z

Genome-wide QTL and eQTL analyses using Mendel. Hua Zhou, Jin Zhou, Tao Hu, Eric M. Sobel, Kenneth Lange. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133530/

Typing ";" at the start of a cell lets us use Unix commands. Before starting the analysis, we use the Unix command "ls" to check that the files are in our working directory.


In [1]:
# if on the server, we create symbolic links to the actual data in current folder
if !(isfile("PEDdataLMM.txt") && isfile("SNPnamesLMM.txt") && isfile("SNPdataLMM.bed"))
    run(`ln -s /home/jsmdata/gaw18-chr1-chr13/SNPdataLMM.bed ./SNPdataLMM.bed`)
    run(`ln -s /home/jsmdata/gaw18-chr1-chr13/PEDdataLMM.txt ./PEDdataLMM.txt`)
    run(`ln -s /home/jsmdata/gaw18-chr1-chr13/SNPnamesLMM.txt ./SNPnamesLMM.txt`)
end

In [3]:
;ls -l PEDdataLMM.txt SNPnamesLMM.txt SNPdataLMM.bed

-rw-r--r--@ 1 huazhou  staff      34398 Jul 23 09:37 PEDdataLMM.txt
-rw-r--r--@ 1 huazhou  staff  407279007 Jul 23 06:33 SNPdataLMM.bed
-rw-r--r--@ 1 huazhou  staff   24316376 Jul 23 09:48 SNPnamesLMM.txt


## Read in the data

Take a look at the first 10 lines of the pedigree file, PEDdataLMM.txt. The columns are comma separated. This file is in the classic Mendel format: Family ID, Person ID, Father ID, Mother ID, sex as F (female) or M (male), monozygotic twin indicator, simtrait1 and simtrait2. The traits were simulated with the Julia package TraitSimulation.jl, using genotyped and imputed variants, to have main effects from two SNPs, one on chromosome 1 and the other on chromosome 13, as well as an interaction between these two SNPs. If you look at the file carefully, you will note that the parent fields are empty, so we don't know how individuals within a pedigree are related. That is acceptable when using variance component models because we can use the genetic relationship matrix (GRM) as a proxy for the kinship matrix.

In [4]:
;head PEDdataLMM.txt

2,200001,,,M,,0.870472967,1.245755458
2,200002,,,F,,-1.686156772,0.435395909
2,200003,,,F,,0.666343843,-0.150266906
2,200004,,,F,,0.083925361,-0.724815477
2,200005,,,M,,0.786505745,0.830477935
2,200006,,,M,,1.563371074,0.172045237
2,200007,,,F,,1.088919384,-0.395792086
2,200008,,,F,,-1.151107782,2.11290715
2,200009,,,F,,-0.36535092,0.759092316
2,200012,,,M,,-0.630388002,1.81987199


Read the pedigree file into an array.

In [5]:
# columns are: :famid, :id, :moid, :faid, :sex, :twin, :simtrait1,:simtrait2
pedLMM = readcsv("PEDdataLMM.txt", Any; header = false)

849×8 Array{Any,2}:
  2   200001  ""  ""  "M"  ""   0.870473    1.24576 
  2   200002  ""  ""  "F"  ""  -1.68616     0.435396
  2   200003  ""  ""  "F"  ""   0.666344   -0.150267
  2   200004  ""  ""  "F"  ""   0.0839254  -0.724815
  2   200005  ""  ""  "M"  ""   0.786506    0.830478
  2   200006  ""  ""  "M"  ""   1.56337     0.172045
  2   200007  ""  ""  "F"  ""   1.08892    -0.395792
  2   200008  ""  ""  "F"  ""  -1.15111     2.11291 
  2   200009  ""  ""  "F"  ""  -0.365351    0.759092
  2   200012  ""  ""  "M"  ""  -0.630388    1.81987 
  2   200013  ""  ""  "M"  ""  -0.277382    1.46866 
  2   200018  ""  ""  "M"  ""   1.34288     0.417684
  2   200023  ""  ""  "F"  ""   1.88722    -0.819818
  ⋮                        ⋮                        
 47  4701128  ""  ""  "F"  ""   1.76189     2.17118 
 47  4701129  ""  ""  "M"  ""   1.35565     0.151142
 47  4701130  ""  ""  "M"  ""  -0.0690388   2.2869  
 47  4701131  ""  ""  "F"  ""   0.419444    2.81945 
 47  4701132  ""  ""  "F" 

We don't need to retain the IDs, so we extract the two phenotypes and put them in an array Y.

In [6]:
simtrait1 = convert(Vector{Float64}, pedLMM[:, 7])
simtrait2 = convert(Vector{Float64}, pedLMM[:, 8])
Y = [simtrait1 simtrait2]

849×2 Array{Float64,2}:
  0.870473    1.24576 
 -1.68616     0.435396
  0.666344   -0.150267
  0.0839254  -0.724815
  0.786506    0.830478
  1.56337     0.172045
  1.08892    -0.395792
 -1.15111     2.11291 
 -0.365351    0.759092
 -0.630388    1.81987 
 -0.277382    1.46866 
  1.34288     0.417684
  1.88722    -0.819818
  ⋮                   
  1.76189     2.17118 
  1.35565     0.151142
 -0.0690388   2.2869  
  0.419444    2.81945 
  1.47836    -0.276582
 -0.274552   -0.453944
  0.177762   -0.350015
 -0.464458    5.33595 
  0.0336769   0.11002 
  1.26861    -0.2     
 -0.319195    1.37059 
  0.114271    6.70812 

We retrieve sex data coded as 0 (male) or 1 (female), which means male is the reference group. You can change the code to `sex = map(x -> strip(x) == "M" ? 1.0 : 0.0, pedLMM[:, 5])` if you want female to be the reference group.

In [7]:
sex = map(x -> strip(x) == "F" ? 1.0 : 0.0, pedLMM[:, 5])

849-element Array{Float64,1}:
 0.0
 1.0
 1.0
 1.0
 0.0
 0.0
 1.0
 1.0
 1.0
 0.0
 0.0
 0.0
 1.0
 ⋮  
 1.0
 0.0
 0.0
 1.0
 1.0
 1.0
 1.0
 1.0
 0.0
 0.0
 0.0
 1.0

Take a look at the first 10 lines of the SNP definition file, using a Unix command, before we read it into an array.

In [8]:
;head SNPnamesLMM.txt

c1_54490
c1_55550
c1_57033
c1_57064
c1_57818
c1_58432
c1_58448
c1_58814
c1_59492
c1_60829


Read the SNP definition file into a Julia array. The file has no header, so no lines are skipped.

In [9]:
# the file has a single column: the SNP id
snpLMM = readcsv("SNPnamesLMM.txt", Any; header = false)

1912108×1 Array{Any,2}:
 "c1_54490"     
 "c1_55550"     
 "c1_57033"     
 "c1_57064"     
 "c1_57818"     
 "c1_58432"     
 "c1_58448"     
 "c1_58814"     
 "c1_59492"     
 "c1_60829"     
 "c1_61462"     
 "c1_61920"     
 "c1_62162"     
 ⋮              
 "c13_115108268"
 "c13_115108306"
 "c13_115108610"
 "c13_115108793"
 "c13_115108796"
 "c13_115108812"
 "c13_115108856"
 "c13_115108859"
 "c13_115108993"
 "c13_115109775"
 "c13_115109781"
 "c13_115109787"

We don't need the relative positions of the SNPs in this case, so we just retrieve the SNP IDs.

In [10]:
snpid = map(x -> strip(string(x)), snpLMM[:, 1])

1912108-element Array{SubString{String},1}:
 "c1_54490"     
 "c1_55550"     
 "c1_57033"     
 "c1_57064"     
 "c1_57818"     
 "c1_58432"     
 "c1_58448"     
 "c1_58814"     
 "c1_59492"     
 "c1_60829"     
 "c1_61462"     
 "c1_61920"     
 "c1_62162"     
 ⋮              
 "c13_115108268"
 "c13_115108306"
 "c13_115108610"
 "c13_115108793"
 "c13_115108796"
 "c13_115108812"
 "c13_115108856"
 "c13_115108859"
 "c13_115108993"
 "c13_115109775"
 "c13_115109781"
 "c13_115109787"

Read the SNP binary file using the SnpArrays.jl package.

In [11]:
using SnpArrays
snpbinLMM = SnpArray("SNPdataLMM"; people = size(pedLMM, 1), snps = size(snpLMM, 1))

849×1912108 SnpArrays.SnpArray{2}:
 (true, true)   (false, true)   …  (true, true)   (true, true) 
 (true, true)   (false, true)      (true, true)   (true, true) 
 (true, true)   (false, true)      (true, true)   (true, true) 
 (true, true)   (true, true)       (true, true)   (true, true) 
 (true, true)   (true, true)       (true, true)   (true, true) 
 (true, true)   (true, false)   …  (true, true)   (true, true) 
 (true, true)   (true, true)       (true, true)   (true, true) 
 (true, true)   (true, true)       (true, true)   (true, true) 
 (true, true)   (false, true)      (true, true)   (true, true) 
 (true, true)   (true, true)       (true, true)   (true, true) 
 (true, true)   (false, false)  …  (false, true)  (false, true)
 (true, true)   (true, true)       (true, true)   (true, true) 
 (true, true)   (true, false)      (false, true)  (false, true)
 ⋮                              ⋱                              
 (true, true)   (true, false)      (true, true)   (true, true) 
 (tru

INFO: v1.0 BED file detected

### Filtering the variant data to improve the quality of the GRM

First we check the minor allele frequencies (MAF). The quantiles show that many of the loci are invariant or rare. By default the GRM function uses only variants with MAF greater than 0.01, but we impose stricter criteria, MAF ≥ 0.05 and a genotyping success rate of at least 98%, to avoid potential biases.

In [12]:
maf, minor_allele, missings_by_snp, missings_by_person = summarize(snpbinLMM)

([0.0142012, 0.403226, 0.0711679, 0.0709951, 0.0777643, 0.0511084, 0.0517241, 0.000589623, 0.00117786, 0.0733496  …  0.0200946, 0.00117925, 0.00117925, 0.00117925, 0.0125149, 0.0125149, 0.0560191, 0.00593824, 0.00593824, 0.00475059], Bool[true, true, true, true, true, true, true, true, true, true  …  true, true, true, true, true, true, true, true, true, true], [4, 43, 27, 25, 26, 37, 37, 1, 0, 31  …  3, 1, 1, 1, 10, 10, 10, 7, 7, 7], [317, 317, 487, 3460, 407, 210, 483, 488, 413, 1100  …  588, 824, 667, 1352, 1183, 3403, 3411, 3473, 138, 2328])

In [13]:
quantile(maf, [0.0 .25 .5 .75 1.0])

1×5 Array{Float64,2}:
 0.0  0.00294464  0.0100118  0.115566  0.5

In [14]:
# first we flag the SNPs (and people) that meet the 98% genotyping success rate threshold; snp_idx is a Boolean mask over SNPs
snp_idx, _ = filter(snpbinLMM, 0.98)

(Bool[true, false, false, false, false, false, false, true, true, false  …  true, true, true, true, true, true, true, true, true, true], Bool[true, true, true, true, true, true, true, true, true, true  …  true, true, true, true, true, true, true, true, true, true])

In [15]:
#now we find the index of the common snps (MAF greater than or equal to 0.05) with success rates >0.98
common_index = snp_idx .& (0.05 .≤ maf);
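The two filtering criteria can be illustrated on a toy matrix without any packages. This is only a sketch under stated assumptions: `G`, `maf_of`, `success`, `maf_toy` and `keep` are hypothetical names, genotypes are stored as minor-allele counts, and `NaN` stands in for a missing genotype. It is not how SnpArrays.jl stores or filters data.

```julia
# Toy dosage matrix: rows are people, columns are SNPs, NaN marks a missing genotype.
G = [0.0 1.0 2.0;
     0.0 NaN 1.0;
     0.0 1.0 0.0;
     1.0 NaN 1.0]
n = size(G, 1)

# Genotyping success rate per SNP: fraction of people with an observed genotype.
success = [count(!isnan, G[:, k]) / n for k in 1:size(G, 2)]

# Minor allele frequency per SNP, computed from the observed genotypes only.
function maf_of(col)
    obs = filter(!isnan, col)
    p = sum(obs) / (2 * length(obs))   # each person carries two alleles
    return min(p, 1 - p)
end
maf_toy = [maf_of(G[:, k]) for k in 1:size(G, 2)]

# Keep a SNP only if it passes both thresholds, mirroring the Boolean mask above.
keep = (success .>= 0.98) .& (maf_toy .>= 0.05)
```

In this toy example the second SNP is dropped because half of its genotypes are missing, even though its allele frequency is high.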

In [16]:
# now we put these snps into an array for use with the GRM function. 
data_common = snpbinLMM[ : , common_index]

849×635166 SnpArrays.SnpArray{2}:
 (true, true)   (true, true)  (true, true)  …  (true, true)    (true, true) 
 (true, true)   (true, true)  (true, true)     (true, true)    (true, true) 
 (true, true)   (true, true)  (true, true)     (true, true)    (true, true) 
 (true, false)  (true, true)  (true, true)     (true, true)    (true, true) 
 (true, true)   (true, true)  (true, true)     (true, true)    (true, true) 
 (false, true)  (true, true)  (true, true)  …  (false, true)   (false, true)
 (true, true)   (true, true)  (true, true)     (true, true)    (true, true) 
 (false, true)  (true, true)  (true, true)     (true, true)    (true, true) 
 (true, true)   (true, true)  (true, true)     (false, true)   (true, true) 
 (false, true)  (true, true)  (true, true)     (true, true)    (true, true) 
 (true, true)   (true, true)  (true, true)  …  (true, true)    (true, true) 
 (true, true)   (true, true)  (true, true)     (false, true)   (true, true) 
 (false, true)  (true, true)  (true, true)

## Kinship via Genetic Relationship Matrix (GRM)

Recall that in using variance components (linear mixed models) we need a measure of the relatedness among individuals. Under the GRM formulation, the estimate of the global kinship coefficient of individuals $i$ and $j$ is
$$ \widehat\Phi_{GRMij} = \frac{1}{2S} \sum_{k=1}^S \frac{(x_{ik} -2p_k)(x_{jk} - 2p_k)}{2 p_k (1-p_k)}, $$
where $k$ ranges over the selected $S$ SNPs, $p_k$ is the minor allele frequency of SNP $k$, and $x_{ik}$ is the number of minor alleles in individual $i$'s genotype at SNP $k$.
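The formula can be transcribed almost literally into plain Julia. The sketch below assumes a dense matrix of minor-allele counts with no missing genotypes and estimates $p_k$ from the sample; `grm_sketch` is a hypothetical helper written for illustration, not the SnpArrays.jl implementation.

```julia
# Hypothetical helper for illustration; not the SnpArrays.jl implementation.
# X: n-by-S matrix of minor-allele counts (0, 1, or 2), no missing genotypes.
function grm_sketch(X)
    n, S = size(X)
    Φ = zeros(n, n)
    used = 0                                  # number of polymorphic SNPs
    for k in 1:S
        p = sum(X[:, k]) / (2n)               # sample allele frequency p_k
        (p == 0 || p == 1) && continue        # monomorphic SNP: denominator is 0
        z = (X[:, k] .- 2p) ./ sqrt(2p * (1 - p))
        Φ .+= z * z'                          # adds z_i * z_j to every entry (i, j)
        used += 1
    end
    return Φ ./ (2 * used)                    # the 1/(2S) factor, over the SNPs used
end

# Three people, two SNPs
X = [0 2; 1 1; 2 0]
Φ = grm_sketch(X)
```

With thousands of SNPs the diagonal entries settle near 0.5 for non-inbred individuals, which is why the model cells multiply the GRM by 2.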

## Calculate the GRM matrix

By default, `grm` excludes SNPs with MAF < 0.01, but we will use only the common SNPs with good success rates.

In [17]:
Φgrm = grm(data_common)

849×849 Array{Float64,2}:
  0.491392      0.00459472   -0.0103735   …  -0.0225547    -0.0127283  
  0.00459472    0.500652      0.00993115      0.0068817     0.00967119 
 -0.0103735     0.00993115    0.506954        0.0255708     0.00607559 
  0.0168716     0.00996551   -0.00516524     -0.012727     -0.00295819 
 -0.0191581     0.0109871     0.00293376      0.0178821     0.004559   
  0.0192688     0.0185449    -0.00111927  …  -0.0120271    -0.000526727
 -0.00254853   -0.000830856   0.00685967     -0.00452714    0.000533043
 -0.0477185    -0.00525981    0.0222129       0.0424068     0.00935023 
 -0.0240338    -0.00327511    0.0223792       0.0230544     0.0218116  
  0.000385893   0.00657127    0.00521743     -0.000500503   0.000277666
 -0.00394476    0.0054108     0.00015929  …   0.00790859    0.0163665  
 -0.00147462    0.00466637    0.00365234      0.00413579   -0.00129428 
 -0.0131411    -0.00201507    0.00271812      0.00357805    0.00471774 
  ⋮                                   

## Fit the null variance component model

Recall that we are using a variance component model with simtrait1 as the outcome. Under the null hypothesis, simtrait1 is associated with sex (as a fixed effect) but not with any SNP. We also need to account for the relatedness among individuals. To do that, we include a random effect and use the GRM to describe the covariance structure.
$$ Y_{1i} = \mu +\beta_{sex} sex_i + A_i + e_i$$
$$ A_i \sim N(0,\sigma^2_a)$$ $$e_i \sim N(0,\sigma^2_e)$$
$$ Cov(Y_{1i},Y_{1j})=2\Phi_{ij} \sigma^2_a + 1_{i = j}\sigma^2_e$$
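
In matrix form, the same model can be sketched as follows. The notation is introduced here for illustration only: $Y$ is the vector of trait values being analyzed, $X$ is the design matrix with a column of ones and a sex column, $\beta$ is the vector of fixed effects, and $I$ is the identity matrix.
$$ Y \sim N\left(X\beta, \; 2\sigma^2_a \Phi + \sigma^2_e I\right) $$
The two matrices $2\Phi$ and $I$ are the covariance components, each scaled by its own variance parameter.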

In [18]:
using VarianceComponentModels
# Null data model has two variance components but no SNP fixed effects

# form data as VarianceComponentVariate matrix 
#change the next commands if you want to run trait 2 or both traits (Y)
X = [ones(length(simtrait1)) sex]
#X = [ones(length(simtrait2)) sex]
nulldata = VarianceComponentVariate(Y[:,1], X, (2Φgrm, eye(length(simtrait1))))
#nulldata = VarianceComponentVariate(Y[:,2], X, (2Φgrm, eye(length(simtrait2))))

VarianceComponentModels.VarianceComponentVariate{Float64,2,Array{Float64,1},Array{Float64,2},Array{Float64,2}}([0.870473, -1.68616, 0.666344, 0.0839254, 0.786506, 1.56337, 1.08892, -1.15111, -0.365351, -0.630388  …  -0.0690388, 0.419444, 1.47836, -0.274552, 0.177762, -0.464458, 0.0336769, 1.26861, -0.319195, 0.114271], [1.0 0.0; 1.0 1.0; … ; 1.0 0.0; 1.0 1.0], ([0.982784 0.00918943 … -0.0451094 -0.0254565; 0.00918943 1.0013 … 0.0137634 0.0193424; … ; -0.0451094 0.0137634 … 1.01484 0.131558; -0.0254565 0.0193424 … 0.131558 0.994595], [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))

When we run an alternative model with the additional effects of the SNPs, it can be helpful to start from our best estimates from the null model. Initialize the variance component model parameters.

In [19]:
nullmodel = VarianceComponentModel(nulldata)

VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0; 0.0], ([1.0], [1.0]), Array{Float64}(0,2), Char[], Float64[], -Inf, Inf)

In [20]:
@time nulllogl, nullmodel, = fit_mle!(nullmodel, nulldata; algo = :FS)

(-1234.4161934338194, VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.375435; 0.127781], ([0.545733], [0.658928]), Array{Float64}(0,2), Char[], Float64[], -Inf, Inf), ([0.0890036], [0.0620838]), [0.00792164 -0.00367441; -0.00367441 0.0038544], [0.0488514; 0.0692372], [0.00238645 -0.0027781; -0.0027781 0.00479379])


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.8, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equa

In [21]:
# null model log-likelihood for no SNP effects
nulllogl

-1234.4161934338194

In [22]:
# null model mean effects - in this case a grand mean and a sex effect
nullmodel.B

2×1 Array{Float64,2}:
 0.375435
 0.127781

In [23]:
# null model additive genetic variance
nullmodel.Σ[1]

1×1 Array{Float64,2}:
 0.545733

In [24]:
# null model environmental variance
nullmodel.Σ[2]

1×1 Array{Float64,2}:
 0.658928

### Heritability 
Calculate the proportion of the variance that can be attributed to additive genetic effects, the narrow-sense heritability. We calculate it here without any SNPs included.

In [25]:
her_null = nullmodel.Σ[1]/(nullmodel.Σ[1]+nullmodel.Σ[2])

1×1 Array{Float64,2}:
 0.453018
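
As a check on the arithmetic, the narrow-sense heritability is the additive genetic variance divided by the total variance. Plugging in the null-model estimates printed above (rounded as displayed):
$$ h^2 = \frac{\hat\sigma^2_a}{\hat\sigma^2_a + \hat\sigma^2_e} = \frac{0.545733}{0.545733 + 0.658928} \approx 0.453 $$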

# Fit variance component model with the causal SNPs

## Processing the SNP data
These data were simulated under a scenario in which two snps have large main effects and they interact. First we find the indexes of the two SNPs. 

In [26]:
#ind_c1_1360580 = find(x -> x == "c1_1360580", snpid)[1]
# You can change this SNP if you would like to assess another SNP's effect on the trait, e.g.:
#ind_c13_56233373 = find(x -> x == "c13_56233373", snpid)[1]
#ind_c1_754121 = find(x -> x == "c1_754121", snpid)[1]
ind_c1_1235710 = find(x -> x == "c1_1235710", snpid)[1]
ind_c13_56233373 = find(x -> x == "c13_56233373", snpid)[1]


1542838
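
When several SNPs need to be looked up among almost two million ids, one alternative to scanning the whole vector each time is to build a dictionary from id to column index once. This is a minimal sketch using only built-in Julia functions; `snpid_toy`, `snp_index`, `idx` and `absent_idx` are hypothetical names, and the toy vector stands in for the full `snpid` array.

```julia
# Toy stand-in for the full snpid vector (the first three ids printed earlier)
snpid_toy = ["c1_54490", "c1_55550", "c1_57033"]

# Map each SNP id to its column index; built once, then each lookup is a hash lookup
snp_index = Dict(id => i for (i, id) in enumerate(snpid_toy))

idx = snp_index["c1_55550"]               # column index of this SNP

# get returns a default (here 0) instead of throwing when an id is absent
absent_idx = get(snp_index, "c1_0", 0)
```

The returned index can then be used to pull the corresponding column out of the genotype array, exactly as with the index found by the search above.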

Now we convert the SNP data into 0, 1, or 2 copies of the minor allele and form the data for the interaction of the two SNPs. 

In [27]:
#snp_c1_1360580 = convert(Vector{Float64}, snpbinLMM[:, ind_c1_1360580])
#snp_c13_56233373 = convert(Vector{Float64}, snpbinLMM[:, ind_c13_56233373])
#snp_c1_754121 = convert(Vector{Float64}, snpbinLMM[:, ind_c1_754121])
snp_c1_1235710 = convert(Vector{Float64}, snpbinLMM[:, ind_c1_1235710])
snp_c13_56233373 = convert(Vector{Float64}, snpbinLMM[:, ind_c13_56233373])
interact = snp_c1_1235710 .* snp_c13_56233373

849-element Array{Float64,1}:
 0.0
 0.0
 1.0
 0.0
 0.0
 0.0
 0.0
 0.0
 1.0
 0.0
 0.0
 0.0
 0.0
 ⋮  
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

## Look at the effect of a Single SNP snp_c1_1235710

In [28]:
# form data as VarianceComponentVariate - put the data in a form that VarianceComponentModels can use
Xalt = [ones(length(simtrait1)) sex snp_c1_1235710]
#Xalt = [ones(length(simtrait1)) sex snp_c13_56233373]
altdata = VarianceComponentVariate(Y[:,1], Xalt, (2Φgrm, eye(length(simtrait1))))
#altdata = VarianceComponentVariate(Y[:,2], Xalt, (2Φgrm, eye(length(simtrait2))))

VarianceComponentModels.VarianceComponentVariate{Float64,2,Array{Float64,1},Array{Float64,2},Array{Float64,2}}([0.870473, -1.68616, 0.666344, 0.0839254, 0.786506, 1.56337, 1.08892, -1.15111, -0.365351, -0.630388  …  -0.0690388, 0.419444, 1.47836, -0.274552, 0.177762, -0.464458, 0.0336769, 1.26861, -0.319195, 0.114271], [1.0 0.0 2.0; 1.0 1.0 0.0; … ; 1.0 0.0 0.0; 1.0 1.0 0.0], ([0.982784 0.00918943 … -0.0451094 -0.0254565; 0.00918943 1.0013 … 0.0137634 0.0193424; … ; -0.0451094 0.0137634 … 1.01484 0.131558; -0.0254565 0.0193424 … 0.131558 0.994595], [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))

In [29]:
altmodel = VarianceComponentModel(altdata)

VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0; 0.0; 0.0], ([1.0], [1.0]), Array{Float64}(0,3), Char[], Float64[], -Inf, Inf)

### Set the starting values for the maximum likelihood estimation
Use the null model estimates as start values for the alternative model.

In [30]:
altmodel.B[1:2, :] = nullmodel.B
altmodel.B

3×1 Array{Float64,2}:
 0.375435
 0.127781
 0.0     

In [31]:
copy!(altmodel.Σ[1], nullmodel.Σ[1])
copy!(altmodel.Σ[2], nullmodel.Σ[2])
altmodel.Σ

([0.545733], [0.658928])

In [32]:
@time altlogl1, altmodel, = fit_mle!(altmodel, altdata; algo = :FS)

(-1181.9536281052985, VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([-0.0702014; 0.158022; 0.597808], ([0.409866], [0.628686]), Array{Float64}(0,3), Char[], Float64[], -Inf, Inf), ([0.0739826], [0.0557404]), [0.00547343 -0.00271693; -0.00271693 0.00310699], [0.0628492; 0.06562; 0.0562849], [0.00395002 -0.00261483 -0.00236535; -0.00261483 0.00430598 0.000166773; -0.00236535 0.000166773 0.003168])

This is Ipopt version 3.12.8, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

In [33]:
# alt model log-likelihood for the single SNP, snp_c1_1235710
altlogl1

-1181.9536281052985

In [34]:
# alt model mean effects
altmodel.B

3×1 Array{Float64,2}:
 -0.0702014
  0.158022 
  0.597808 

In [35]:
# alt model additive genetic variance
altmodel.Σ[1]

1×1 Array{Float64,2}:
 0.409866

In [36]:
# alt model environmental variance
altmodel.Σ[2]

1×1 Array{Float64,2}:
 0.628686

Notice that the additive genetic variance and the environmental variance have both decreased.

To test the significance of the SNP, we use a likelihood ratio test (LRT).

In [37]:
using Distributions
LRT1=2(altlogl1 - nulllogl)

104.92513065704179

In [38]:
#change the degrees of freedom if running a bivariate outcome
pval_snp_c1_1235710 = ccdf(Chisq(1), LRT1)

1.2683901431398707e-24

SNP, c1_1235710, has a significant effect. 
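
Written out, the likelihood ratio statistic is twice the difference between the maximized log-likelihoods of the alternative and null models (the symbols $\Lambda$ and $\ell$ are introduced here for illustration). With the values printed above:
$$ \Lambda = 2(\ell_{alt} - \ell_{null}) = 2\,(-1181.9536 - (-1234.4162)) \approx 104.925 $$
Under the null hypothesis, $\Lambda$ is referred to a $\chi^2$ distribution whose degrees of freedom equal the number of extra free parameters in the alternative model, here one (the coefficient of the single SNP).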

## Check for an interaction.  
### First calculate the log likelihood for additive effects of the two snps c1_1235710 and c13_56233373 without the interaction
Repeat the steps we took above:

In [39]:
# form data as VarianceComponentVariate
Xalt2 = [ones(length(simtrait1)) sex snp_c1_1235710 snp_c13_56233373]
altdata2 = VarianceComponentVariate(Y[:,1], Xalt2, (2Φgrm, eye(length(simtrait1))))

VarianceComponentModels.VarianceComponentVariate{Float64,2,Array{Float64,1},Array{Float64,2},Array{Float64,2}}([0.870473, -1.68616, 0.666344, 0.0839254, 0.786506, 1.56337, 1.08892, -1.15111, -0.365351, -0.630388  …  -0.0690388, 0.419444, 1.47836, -0.274552, 0.177762, -0.464458, 0.0336769, 1.26861, -0.319195, 0.114271], [1.0 0.0 2.0 0.0; 1.0 1.0 0.0 0.0; … ; 1.0 0.0 0.0 0.0; 1.0 1.0 0.0 0.0], ([0.982784 0.00918943 … -0.0451094 -0.0254565; 0.00918943 1.0013 … 0.0137634 0.0193424; … ; -0.0451094 0.0137634 … 1.01484 0.131558; -0.0254565 0.0193424 … 0.131558 0.994595], [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))

In [40]:
altmodel2 = VarianceComponentModel(altdata2)

VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0; 0.0; 0.0; 0.0], ([1.0], [1.0]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)

In [41]:
altmodel2.B[1:2, :] = nullmodel.B
altmodel2.B

4×1 Array{Float64,2}:
 0.375435
 0.127781
 0.0     
 0.0     

In [42]:
copy!(altmodel2.Σ[1], nullmodel.Σ[1])
copy!(altmodel2.Σ[2], nullmodel.Σ[2])
altmodel2.Σ

([0.545733], [0.658928])

In [43]:
@time altlogl2, altmodel2, = fit_mle!(altmodel2, altdata2; algo = :FS)

(-1175.347072731107, VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([-0.156858; 0.16176; 0.607078; 0.391102], ([0.373105], [0.639098]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf), ([0.0707492], [0.0552127]), [0.00500545 -0.00256215; -0.00256215 0.00304845], [0.066827; 0.0653129; 0.0555974; 0.106377], [0.00446585 -0.00261422 -0.00236734 -0.00252455; -0.00261422 0.00426578 0.000166618 0.000114563; -0.00236734 0.000166618 0.00309107 0.00028717; -0.00252455 0.000114563 0.00028717 0.011316])

This is Ipopt version 3.12.8, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

In [44]:
altlogl2

-1175.347072731107

In [45]:
# alt model mean effects
altmodel2.B

4×1 Array{Float64,2}:
 -0.156858
  0.16176 
  0.607078
  0.391102

In [46]:
# alt model additive variance
altmodel2.Σ[1]

1×1 Array{Float64,2}:
 0.373105

In [47]:
# alt model environmental variance
altmodel2.Σ[2]

1×1 Array{Float64,2}:
 0.639098

### Test whether the addition of the second SNP improves the model fit by comparing the loglikelihood with just snp_c1_1235710 to the loglikelihood with both snp_c1_1235710 and snp_c13_56233373

In [48]:
using Distributions
LRT2=2(altlogl2 - altlogl1)

13.213110748383315

In [49]:
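# NOTE: this comparison adds a single fixed effect, so for a univariate outcome Chisq(1) is the matching
# reference distribution; Chisq(2), as used here, gives a larger (more conservative) p-value.
#change the degrees of freedom if running a bivariate outcome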
pval_snp_c13_56233373 = ccdf(Chisq(2), LRT2)

0.0013514794817585925

## Check for evidence of an interaction between the two SNPs

In [50]:
# form data as VarianceComponentVariate
Xalt3 = [ones(length(simtrait1)) sex snp_c13_56233373 snp_c1_1235710 interact]
altdata3 = VarianceComponentVariate(Y[:,1], Xalt3, (2Φgrm, eye(length(simtrait1))))

VarianceComponentModels.VarianceComponentVariate{Float64,2,Array{Float64,1},Array{Float64,2},Array{Float64,2}}([0.870473, -1.68616, 0.666344, 0.0839254, 0.786506, 1.56337, 1.08892, -1.15111, -0.365351, -0.630388  …  -0.0690388, 0.419444, 1.47836, -0.274552, 0.177762, -0.464458, 0.0336769, 1.26861, -0.319195, 0.114271], [1.0 0.0 … 2.0 0.0; 1.0 1.0 … 0.0 0.0; … ; 1.0 0.0 … 0.0 0.0; 1.0 1.0 … 0.0 0.0], ([0.982784 0.00918943 … -0.0451094 -0.0254565; 0.00918943 1.0013 … 0.0137634 0.0193424; … ; -0.0451094 0.0137634 … 1.01484 0.131558; -0.0254565 0.0193424 … 0.131558 0.994595], [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))

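Use the results of the two-SNP additive model as the starting point for the interaction model. Note that the SNP columns of Xalt3 are in the opposite order from Xalt2, so the two SNP starting values copied below are swapped; this only changes the starting point and should not change the maximum likelihood estimates.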

In [51]:
altmodel3 = VarianceComponentModel(altdata3)

VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0; 0.0; … ; 0.0; 0.0], ([1.0], [1.0]), Array{Float64}(0,5), Char[], Float64[], -Inf, Inf)

In [52]:
altmodel3.B[1:4, :] = altmodel2.B
altmodel3.B

5×1 Array{Float64,2}:
 -0.156858
  0.16176 
  0.607078
  0.391102
  0.0     

In [53]:
copy!(altmodel3.Σ[1], altmodel2.Σ[1])
copy!(altmodel3.Σ[2], altmodel2.Σ[2])
altmodel3.Σ

([0.373105], [0.639098])

In [54]:
@time altlogl3, altmodel3, = fit_mle!(altmodel3, altdata3; algo = :FS)

(-1170.5669963336966, VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([-0.0979883; 0.151745; … ; 0.539129; 0.390367], ([0.380276], [0.624391]), Array{Float64}(0,5), Char[], Float64[], -Inf, Inf), ([0.0707463], [0.0544729]), [0.00500504 -0.00253206; -0.00253206 0.00296729], [0.0690825; 0.0649542; … ; 0.0595761; 0.125716], [0.00477239 -0.00264052 … -0.00276467 0.00238124; -0.00264052 0.00421905 … 0.000235547 -0.00040567; … ; -0.00276467 0.000235547 … 0.00354931 -0.00275334; 0.00238124 -0.00040567 … -0.00275334 0.0158045])

This is Ipopt version 3.12.8, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

In [55]:
altlogl3

-1170.5669963336966

In [56]:
# alt model mean effects, in the column order of Xalt3: intercept, sex, snp_c13_56233373, snp_c1_1235710, interaction
altmodel3.B

5×1 Array{Float64,2}:
 -0.0979883
  0.151745 
  0.161075 
  0.539129 
  0.390367 

In [57]:
# alt model additive variance
altmodel3.Σ[1]

1×1 Array{Float64,2}:
 0.380276

In [58]:
# alt model environmental variance
altmodel3.Σ[2]

1×1 Array{Float64,2}:
 0.624391

Test whether the interaction improves the model fit over the effects of the two SNPs alone.

In [59]:
using Distributions
LRT3=2(altlogl3 - altlogl2)

9.560152794820624

In [60]:
#change the degrees of freedom if running a bivariate outcome
pval_snp_interact = ccdf(Chisq(1), LRT3)

0.0019884655663902845

Residual heritability: the proportion of the remaining variance that is attributable to additive genetic effects after including the SNPs and their interaction in the model.

In [61]:
# ignore if running a bivariate outcome
her_alt=altmodel3.Σ[1]/(altmodel3.Σ[1]+altmodel3.Σ[2])

1×1 Array{Float64,2}:
 0.378509

The proportion of the additive genetic variance explained by the SNPs (here the two SNPs and their interaction) is a measure of their effect on a single trait. Note that in this simulated example the SNP effect is very large indeed.

In [62]:
add_proport=(nullmodel.Σ[1]-altmodel3.Σ[1])/nullmodel.Σ[1]

1×1 Array{Float64,2}:
 0.303183

The proportion of the total variance explained by the SNPs is an alternative to the above. Again, typically the effects are not nearly so large.

In [63]:
pheno_proport=(nullmodel.Σ[1]+nullmodel.Σ[2]-altmodel3.Σ[1]-altmodel3.Σ[2])/(nullmodel.Σ[1]+nullmodel.Σ[2])

1×1 Array{Float64,2}:
 0.166017
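
Both proportions follow directly from the variance estimates printed earlier. Using the rounded null-model values ($\hat\sigma^2_a = 0.545733$, $\hat\sigma^2_e = 0.658928$) and the interaction-model values ($\hat\sigma^2_a = 0.380276$, $\hat\sigma^2_e = 0.624391$):
$$ \frac{0.545733 - 0.380276}{0.545733} \approx 0.303, \qquad \frac{1.204661 - 1.004667}{1.204661} \approx 0.166 $$
Here $1.204661$ and $1.004667$ are the total variances (additive plus environmental) under the null and interaction models, respectively.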