<h1>Table of Contents<span class="tocSkip"></span></h1>

<div class="toc"><ul class="toc-item"><li><span><a href="#VarianceComponentModels.jl" data-toc-modified-id="VarianceComponentModels.jl-1">VarianceComponentModels.jl</a></span><ul class="toc-item"><li><span><a href="#Package-Features" data-toc-modified-id="Package-Features-1.1">Package Features</a></span></li></ul></li><li><span><a href="#Heritability-Analysis" data-toc-modified-id="Heritability-Analysis-2">Heritability Analysis</a></span><ul class="toc-item"><li><span><a href="#Data-files" data-toc-modified-id="Data-files-2.1">Data files</a></span></li><li><span><a href="#Read-in-binary-SNP-data" data-toc-modified-id="Read-in-binary-SNP-data-2.2">Read in binary SNP data</a></span></li><li><span><a href="#EUR_subset" data-toc-modified-id="EUR_subset-2.3"><code>EUR_subset</code></a></span></li><li><span><a href="#Empirical-kinship-matrix" data-toc-modified-id="Empirical-kinship-matrix-2.4">Empirical kinship matrix</a></span></li><li><span><a href="#Simulating-phenotypes" data-toc-modified-id="Simulating-phenotypes-2.5">Simulating phenotypes</a></span></li><li><span><a href="#Phenotypes" data-toc-modified-id="Phenotypes-2.6">Phenotypes</a></span></li><li><span><a href="#Pre-processing-data-for-heritability-analysis" data-toc-modified-id="Pre-processing-data-for-heritability-analysis-2.7">Pre-processing data for heritability analysis</a></span></li><li><span><a href="#Heritability-of-single-trait" data-toc-modified-id="Heritability-of-single-trait-2.8">Heritability of single trait</a></span></li><li><span><a href="#Multivariate-trait-analysis" data-toc-modified-id="Multivariate-trait-analysis-2.9">Multivariate trait analysis</a></span></li><li><span><a href="#Exercise" data-toc-modified-id="Exercise-2.10">Exercise</a></span></li></ul></li><li><span><a href="#Testing-SNP-association-using-maximum-likelihoods-of-variance-component-models" data-toc-modified-id="Testing-SNP-association-using-maximum-likelihoods-of-variance-component-models-3">Testing SNP association using 
maximum likelihoods of variance component models</a></span><ul class="toc-item"><li><span><a href="#Fit-the-null-model" data-toc-modified-id="Fit-the-null-model-3.1">Fit the null model</a></span></li><li><span><a href="#Fit-the-alternative-model" data-toc-modified-id="Fit-the-alternative-model-3.2">Fit the alternative model</a></span></li><li><span><a href="#Likelihood-ratio-test" data-toc-modified-id="Likelihood-ratio-test-3.3">Likelihood ratio test</a></span></li><li><span><a href="#Exercise" data-toc-modified-id="Exercise-3.4">Exercise</a></span></li></ul></li></ul></div>

# Heritability analysis and testing SNP association using maximum likelihoods of variance component models 

**ASHG OpenMendel Workshop**

**Juhyun Kim, juhkim111@ucla.edu**

**Department of Biostatistics, UCLA**

**Oct 2020**

Machine information:

In [305]:
versioninfo()

Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9920X CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)


## VarianceComponentModels.jl



### Package Features 
* [Heritability analysis for quantitative traits](https://openmendel.github.io/VarianceComponentModels.jl/latest/man/heritability/#Heritability-Analysis-1) in genetics
* SNP association testing
* Maximum likelihood estimation (MLE) and restricted maximum likelihood estimation (REML) of mean parameters $B$ and variance component parameters $Σ$
* Allow constraints in the mean parameters $B$
* Choice of optimization algorithms: [Fisher scoring](https://books.google.com/books?id=QYqeYTftPNwC&lpg=PP1&pg=PA142#v=onepage&q&f=false) and [minorization-maximization algorithm](http://hua-zhou.github.io/media/pdf/ZhouHuZhouLange19VCMM.pdf)

[`VarianceComponentModels.jl`](https://github.com/OpenMendel/VarianceComponentModels.jl/) is a package in the [OpenMendel](https://github.com/OpenMendel) ecosystem. It implements computation routines for fitting and testing variance component models of the form 

$$Y \sim \text{Normal}(XB, \sigma_1^2 V_1 + \cdots + \sigma_m^2 V_m)$$


In this model, the **data** are 

* $Y$: continuously varying quantitative trait(s)
* $X$: covariates (e.g. sex, age, principal components)
* $V_i$: structuring matrix corresponding to $i$-th variance component $(i=1,\ldots, m)$

and **parameters** are

* $B$: fixed-effect (mean) coefficients
* $\sigma_i^2$: $i$-th variance component $(i=1,\ldots, m)$

## Heritability Analysis

Variance component estimation can be used to estimate heritability of a quantitative trait. 
Here we decompose the overall phenotypic variance into genetic and environmental components:

$$\Omega = \sigma_a^2 (2\Phi) + \sigma_e^2 I $$

where 

- $\sigma_a^2$: variance of the aggregate additive genetic effects of an unspecified number of loci at unknown locations in the genome 
- $\sigma_e^2$: variance of the unique, unshared environmental effects 
- $2\Phi$: genetic relationship matrix, equal to twice the kinship matrix $\Phi$
- $I$: identity matrix, with ones on the diagonal and zeros everywhere else.


Then the (narrow-sense) heritability is defined as 

$$h^2 = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}.$$


### Data files

For this analysis, we use the sample data set [`EUR_subset`](https://openmendel.github.io/SnpArrays.jl/latest/#Example-data-1) from `SnpArrays.jl`, which is available in the `data` folder of the package. We use a small data set to keep this tutorial fast, but the same steps can be applied to a much larger data set. 




In [306]:
using SnpArrays

In [307]:
datapath = normpath(SnpArrays.datadir())

"/home/juhkim111/.julia/packages/SnpArrays/YOSme/data"

`EUR_subset.bed`, `EUR_subset.bim`, and `EUR_subset.fam` are a set of Plink files in binary format.

In [308]:
using Glob
readdir(glob"EUR_subset.*", datapath)

3-element Array{String,1}:
 "/home/juhkim111/.julia/packages/SnpArrays/YOSme/data/EUR_subset.bed"
 "/home/juhkim111/.julia/packages/SnpArrays/YOSme/data/EUR_subset.bim"
 "/home/juhkim111/.julia/packages/SnpArrays/YOSme/data/EUR_subset.fam"

### Read in binary SNP data 

We use the [`SnpArrays.jl`](https://openmendel.github.io/SnpArrays.jl/latest) package to read in binary SNP data and compute the empirical kinship matrix. 

In [309]:
# read in genotype data from Plink binary file
const EUR_subset = SnpArray(SnpArrays.datadir("EUR_subset.bed"))



379×54051 SnpArray:
 0x03  0x03  0x03  0x02  0x02  0x03  …  0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x02  0x03  0x02  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x03  0x03  0x03     0x02  0x02  0x02  0x03  0x03  0x02
 0x03  0x03  0x03  0x00  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x00  0x03  0x03     0x02  0x02  0x02  0x03  0x03  0x03
 0x02  0x03  0x03  0x03  0x03  0x03  …  0x03  0x03  0x03  0x03  0x03  0x02
 0x02  0x03  0x03  0x02  0x02  0x03     0x03  0x03  0x02  0x02  0x03  0x03
 0x02  0x03  0x03  0x03  0x02  0x02     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x00  0x02  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x02  0x03  0x03  0x02  0x03  0x02     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x02  0x03  0x03  …  0x03  0x03  0x02  0x02  0x03  0x03
 0x03  0x03  0x03  0x02  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x02
 0x03  0x02  0x03  0x02  0x02  0x03     0x03  0x03  0x03  0x03  0x03  0x03
    ⋮

### `EUR_subset` 

`EUR_subset` contains **379** individuals and **54,051** SNPs. There are no missing genotypes in `EUR_subset`.

Minor allele frequencies (MAF) for each SNP:

In [310]:
maf_EUR = maf(EUR_subset)

54051-element Array{Float64,1}:
 0.09762532981530347
 0.01319261213720313
 0.04485488126649073
 0.48944591029023743
 0.32189973614775724
 0.09102902374670185
 0.3733509234828496
 0.05277044854881263
 0.0554089709762533
 0.11345646437994727
 0.20448548812664913
 0.16226912928759896
 0.27176781002638517
 ⋮
 0.341688654353562
 0.13192612137203164
 0.24802110817941958
 0.21240105540897103
 0.12532981530343013
 0.13192612137203164
 0.07387862796833777
 0.07783641160949872
 0.13588390501319259
 0.0554089709762533
 0.01319261213720313
 0.02638522427440637

Histogram of minor allele frequency:

![image info](./hist_MAF.png)

The above plot can be generated by using the command commented out below: 

In [342]:
# using Plots
# hist_maf = histogram(maf_EUR, xlab = "Minor Allele Frequency (MAF)", 
#                    ylab = "Number of SNPs", label="")
# png(hist_maf, "hist_MAF.png") 

Note that about 29% of the SNPs have MAF < 0.05. 

In [311]:
count(<(0.05), maf_EUR) / length(maf_EUR)

0.2914839688442397

### Empirical kinship matrix

As a measure of relatedness, we compute the empirical kinship matrix from the SNP data using the genetic relationship matrix (GRM) method. If there are missing genotypes, they are imputed on the fly by drawing according to the minor allele frequencies.

Kinship coefficients summarize genetic similarity between pairs of individuals. The GRM estimate of the kinship coefficient $\Phi_{ij}$ between individuals $i$ and $j$ is

$$\widehat{\Phi}_{GRMij} = \frac{1}{2S} \sum_{k=1}^S \frac{(G_{ik}-2p_k)(G_{jk}-2p_k)}{2p_k(1-p_k)},$$

where 

* $S$: number of SNPs in this set
* $p_k$: minor allele frequency of SNP $k$
* $G_{ik} \in \{0,1,2\}$: number of copies of minor alleles at the $k$-th SNP of the $i$-th individual. 

This matrix is known as the classical kinship matrix. 
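
To make the formula concrete, here is a minimal sketch that evaluates it directly on a tiny dense matrix of allele counts. The matrix `G` and the helper `toy_grm` are hypothetical and used only for illustration; this is not how `SnpArrays.grm` is implemented, and it ignores missing genotypes and the minor allele frequency filter.

```julia
using Statistics

# Hypothetical toy data: 4 individuals (rows) × 3 SNPs (columns);
# each entry counts the copies (0, 1, or 2) of one allele.
G = Float64[0 1 2;
            1 1 0;
            2 0 1;
            1 2 1]

# Direct evaluation of the GRM formula (illustrative helper, not part of SnpArrays.jl).
function toy_grm(G::AbstractMatrix{<:Real})
    S = size(G, 2)                                   # number of SNPs
    p = mean(G; dims=1) ./ 2                         # 1×S row of allele frequencies
    Z = (G .- 2 .* p) ./ sqrt.(2 .* p .* (1 .- p))   # center and scale each SNP column
    return (Z * Z') ./ (2S)                          # average over SNPs, halve for kinship
end

Φtoy = toy_grm(G)
```

Because every SNP column is centered at its own sample mean, each row of `Φtoy` sums to zero up to rounding error, which is why the off-diagonal entries of an empirical kinship matrix are often slightly negative.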

The following command constructs the empirical kinship matrix using the above formula. Note that, by default, the `grm` function excludes SNPs with minor allele frequency below 0.01. This threshold can be changed with the keyword argument `minmaf`. For more information, see [SnpArrays.jl documentation](https://openmendel.github.io/SnpArrays.jl/latest/#Genetic-relationship-matrix-(GRM)).

In [312]:
## GRM using SNPs with maf > 0.01 (default) 
Φgrm = grm(EUR_subset; method = :GRM) # classical genetic relationship matrix

379×379 Array{Float64,2}:
  0.526913     -0.010026     -0.0012793    …   0.00536883    0.00713397
 -0.010026      0.500049      0.00147092      -0.00178778   -0.00344277
 -0.0012793     0.00147092    0.521904        -0.0109387    -0.00262695
 -0.00239381    0.00550462    0.00755985      -0.00265867   -0.000141742
 -0.00391296    0.00422806    0.0222034       -0.0107694    -0.00248895
 -0.000555581   0.000696874   0.0125771    …  -0.0100831    -0.00575495
 -0.0095376     0.00231344   -0.00259641      -0.00282701    0.000732385
 -0.00823869    0.00556861    0.0060825       -0.00911662   -0.00638629
  0.00117402   -0.00444907   -0.0029182       -0.00244795    0.00634087
 -0.0111617     0.00436269    0.000537307     -0.00483523   -0.00621726
 -0.00252813   -0.000626719   0.00753937   …  -0.00180836    0.00714953
  0.0112036    -0.0024306     0.00446458      -0.00983116   -0.00296109
 -0.000451414   0.00707358   -0.00620136      -0.00473171   -0.00720874
  ⋮                                 

You can compute the empirical kinship matrix using other methods: the method of moments, `grm(A, method=:MoM)`, or the robust method, `grm(A, method=:Robust)`. Uncomment the following commands and try them for yourself!

In [313]:
# Φgrm = grm(EUR_subset; method = :MoM) # method of moment method
# Φgrm = grm(EUR_subset; method = :Robust) # robust method

### Simulating phenotypes 


We simulate a phenotype vector from

$$\mathbf{y} \sim \text{Normal}(\mathbf{1}, 0.1 \widehat{\Phi}_{GRM} + 0.9 \mathbf{I})$$

where $\widehat{\Phi}_{GRM}$ is the estimated empirical kinship matrix `Φgrm`. 

The data should be available in `pheno.txt`.
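
One way to draw from such a model is through a Cholesky factor of the covariance matrix. The sketch below is illustrative only: it uses a small random positive semidefinite matrix `Φ` as a hypothetical stand-in for `Φgrm`, and the seed and sample size are arbitrary. It is not necessarily the procedure that generated `pheno.txt`.

```julia
using LinearAlgebra, Random

Random.seed!(2020)               # arbitrary seed, for reproducibility only
n = 5                            # toy sample size
A = randn(n, 2n)
Φ = A * A' / (2n)                # hypothetical stand-in for Φgrm (symmetric, PSD)
Ω = Symmetric(0.1 * Φ + 0.9I)    # covariance from the model stated above
L = cholesky(Ω).L                # lower-triangular factor with Ω = L * L'
y = ones(n) + L * randn(n)       # one draw from Normal(1, Ω)
```

Since `L * randn(n)` has covariance `L * L'`, which equals `Ω`, adding the mean vector of ones gives a draw with the stated mean and covariance.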

### Phenotypes 

Read in the phenotype data and plot a histogram.

In [314]:
using DelimitedFiles 
pheno = readdlm("pheno.txt")

379×1 Array{Float64,2}:
  1.846582104608307
  0.12019614558345848
  0.5172368025545149
  0.11933401051509973
  1.8407354203053767
  3.1553094044176166
  1.518422163488851
  0.737544574135081
  1.4904102203720164
  0.4942945743765428
  0.456648703052165
  0.9830094325553045
  1.1241872723791884
  ⋮
  3.2200310377598873
  2.581586233004705
  0.6636455378448316
  1.807461571000498
  1.067894878511225
 -0.06137646537271868
  0.016827921246634125
  0.5801978239606949
  0.08351113077031458
  0.04408173384902825
  0.7937573165253659
  0.9279379783077415

Histogram of phenotype values:

![image info](./hist_pheno.png)

The above plot can be generated and saved using the command commented out below: 

In [315]:
# using Plots
# hist_pheno = histogram(pheno, xlab="Phenotype", ylab="Frequency", label="") # generate histogram
# png(hist_pheno, "hist_pheno.png") # save as png file 

### Pre-processing data for heritability analysis

First load the VarianceComponentModels package.

In [316]:
using VarianceComponentModels

Our covariate matrix includes a column of ones (the intercept), sex, and age. Both the sex and age vectors are randomly generated.  

In [317]:
# load package for random number generation
using Random
Random.seed!(123)
# no. of observations 
nobs = size(pheno, 1)
# randomly generate covariates 
age = 9*randn(nobs) .+ 30
sex = bitrand(nobs) .+ 1
# covariate matrix 
X = [ones(nobs) sex age]

379×3 Array{Float64,2}:
 1.0  2.0  40.7124
 1.0  1.0  48.4336
 1.0  1.0  40.2839
 1.0  1.0  34.1347
 1.0  2.0  26.4299
 1.0  1.0  24.0176
 1.0  2.0  38.8287
 1.0  1.0  29.3207
 1.0  1.0  32.4643
 1.0  2.0  28.2519
 1.0  1.0  26.9457
 1.0  1.0  22.4051
 1.0  1.0  21.9996
 ⋮         
 1.0  1.0  38.2156
 1.0  2.0  20.3147
 1.0  2.0  19.8511
 1.0  2.0  20.2451
 1.0  2.0  22.1222
 1.0  2.0  25.6851
 1.0  1.0  22.2179
 1.0  2.0  21.5676
 1.0  1.0  31.895
 1.0  1.0  29.1916
 1.0  2.0  42.7991
 1.0  2.0  32.5419

Now form an instance of `VarianceComponentVariate` by gathering our "data".  We plug in the phenotype vector `pheno`, the covariate matrix `X`, and the two matrices corresponding to the variance components $(2\Phi, I)$: `(2Φgrm, Matrix(1.0I, nobs, nobs))`.

In [318]:
# form instance of VarianceComponentVariate
using LinearAlgebra
EURdata = VarianceComponentVariate(pheno, X, (2Φgrm, Matrix(1.0I, nobs, nobs)))

VarianceComponentVariate{Float64,2,Array{Float64,2},Array{Float64,2},Array{Float64,2}}([1.846582104608307; 0.12019614558345848; … ; 0.7937573165253659; 0.9279379783077415], [1.0 2.0 40.712410928876494; 1.0 1.0 48.43361737010316; … ; 1.0 2.0 42.79905841925565; 1.0 2.0 32.541938843137075], ([1.0538262213132035 -0.02005203927901296 … 0.010737654900514978 0.014267936018676418; -0.02005203927901296 1.0000975487266064 … -0.00357555339105328 -0.006885530334701236; … ; 0.010737654900514978 -0.00357555339105328 … 0.9859187797469418 0.026340992959466743; 0.014267936018676418 -0.006885530334701236 … 0.026340992959466743 1.0243852451056221], [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))

### Heritability estimation of single trait 

Before fitting the variance component model, we pre-compute the eigen-decomposition of $2\Phi_{\text{GRM}}$, the rotated responses, and the constant part of the log-likelihood, and store them as a `TwoVarCompVariateRotate` instance, which is re-used in the various variance component estimation procedures.
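
The benefit of this rotation can be illustrated with a small sketch. It assumes the second structuring matrix is the identity, so an ordinary eigen-decomposition suffices; the matrix `V`, the residual vector `r`, and the variance values are hypothetical, and the code is not the package's implementation. After rotating by the eigenvectors of `V`, the covariance becomes diagonal, so the log-likelihood can be re-evaluated for new variance components with only vector operations.

```julia
using LinearAlgebra, Random

Random.seed!(1)                     # arbitrary seed
n = 6
A = randn(n, n)
V = A * A' / n                      # hypothetical stand-in for 2Φ
r = randn(n)                        # hypothetical residual y - Xβ
σ2a, σ2e = 0.2, 0.8                 # illustrative variance components

# Direct evaluation: works with the full n×n covariance matrix.
Ω = σ2a * V + σ2e * I
logl_direct = -0.5 * (n * log(2π) + logdet(Ω) + dot(r, Ω \ r))

# Rotated evaluation: decompose V = U * Diagonal(d) * U' once, then reuse d and U.
d, U = eigen(Symmetric(V))
rrot = U' * r                       # rotated residual
w = σ2a .* d .+ σ2e                 # eigenvalues of Ω
logl_rotated = -0.5 * (n * log(2π) + sum(log, w) + sum(abs2.(rrot) ./ w))
```

The two values agree up to rounding error, but only the rotated form avoids refactorizing an n×n matrix each time the variance components change during optimization.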

We use the Fisher scoring algorithm to fit the variance component model for our trait. 

In [319]:
# pre-compute eigen-decomposition 
@time EURdata_rotated = TwoVarCompVariateRotate(EURdata)
fieldnames(typeof(EURdata_rotated))

# form data set for trait 
trait_data = TwoVarCompVariateRotate(EURdata_rotated.Yrot, 
    EURdata_rotated.Xrot, EURdata_rotated.eigval, EURdata_rotated.eigvec, 
    EURdata_rotated.logdetV2)

# initialize model parameters
trait_model = VarianceComponentModel(trait_data)

# estimate variance components
_, _, _, Σcov, = mle_fs!(trait_model, trait_data; solver=:Ipopt, verbose=false)
σ2a = trait_model.Σ[1][1] # additive genetic variance 
σ2e = trait_model.Σ[2][1] # environmental variance 
@show σ2a, σ2e

  0.034314 seconds (18 allocations: 3.443 MiB)
(σ2a, σ2e) = (0.16153996925054054, 0.7622642974157461)


(0.16153996925054054, 0.7622642974157461)

Additive genetic variance:

In [320]:
σ2a

0.16153996925054054

Environmental/non-genetic variance:

In [321]:
σ2e

0.7622642974157461

In [322]:
# heritability and its standard error 
h, hse = heritability(trait_model.Σ, Σcov)
[h[1], hse[1]]

2-element Array{Float64,1}:
 0.1748638484140006
 0.5116600593153141
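
As a quick check, the point estimate can be reproduced by hand from the two variance components reported by the Fisher scoring fit, using the definition $h^2 = \sigma_a^2 / (\sigma_a^2 + \sigma_e^2)$. This sketch covers only the point estimate, not the standard error.

```julia
# Variance component estimates copied from the Fisher scoring output above
σ2a = 0.16153996925054054   # additive genetic variance
σ2e = 0.7622642974157461    # environmental variance

# Narrow-sense heritability: share of total variance that is additive genetic
h2 = σ2a / (σ2a + σ2e)
println(round(h2; digits=4))   # prints 0.1749
```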

We can also run the MM algorithm. 

In [323]:
trait_model = VarianceComponentModel(trait_data)
@time _, _, _, Σcov, = mle_mm!(trait_model, trait_data; verbose=false)
σ2a = trait_model.Σ[1][1]
σ2e = trait_model.Σ[2][1]
@show σ2a, σ2e

  1.171228 seconds (111.83 k allocations: 9.029 MiB, 1.60% gc time)
(σ2a, σ2e) = (0.17312796952899193, 0.7507276711209097)


(0.17312796952899193, 0.7507276711209097)

Heritability and its standard error.

In [324]:
h, hse = heritability(trait_model.Σ, Σcov)
[h[1], hse[1]]

2-element Array{Float64,1}:
 0.18739721003078164
 0.5052219467970395

### Multivariate trait analysis

For the joint analysis of multiple traits, see [`VarianceComponentModels.jl` documentation](https://openmendel.github.io/VarianceComponentModels.jl/latest/man/heritability/) and  [`OpenMendel` tutorial](https://github.com/OpenMendel/Tutorials/blob/master/Heritability/HERITABILITY-VCexample.ipynb).

## Testing SNP association using maximum likelihoods of variance component models
credit: [Tutorial by Sarah Ji, Janet Sinsheimer and Hua Zhou](https://github.com/OpenMendel/Tutorials/blob/master/Heritability/HERITABILITY-VCexample.ipynb)

Suppose we want to see whether a group of SNPs has an effect on a given phenotype after accounting for relatedness among individuals. Here we fit a variance component model with the SNPs as fixed effects. 

$$\hspace{5em}  \mathbf{y} = \mathbf{X}\mathbf{\beta} + \mathbf{G}_s \gamma + \mathbf{g} + \mathbf{\epsilon} \hspace{5em} (1)$$

\begin{equation}
\begin{array}{ll}
\mathbf{g} \sim N(\mathbf{0}, \sigma_g^2\mathbf{\Phi}) \\
\mathbf{\epsilon} \sim N(\mathbf{0}, \sigma_e^2\mathbf{I})
\end{array}
\end{equation}


where 

* $\mathbf{y}$: phenotype 
* Fixed effects:
    * $\mathbf{X}$: matrix of covariates including intercept
    * $\beta$: vector of covariate effects, including intercept
    * $\mathbf{G}_s$: genotype matrix of SNPs of interest 
    * $\gamma$: association parameter of interest, measuring the effect of genotype on phenotype  
* Random effects:
    * $\mathbf{g}$: random vector of polygenic effects with $\mathbf{g} \sim N(\mathbf{0}, \sigma_g^2 \mathbf{\Phi})$
        * $\sigma_g^2$: additive genetic variance
        * $\mathbf{\Phi}$: matrix of pairwise measures of genetic relatedness 
    * $\epsilon$: random vector with $\epsilon \sim N(\mathbf{0}, \sigma_e^2\mathbf{I})$
        * $\sigma_e^2$: variance due to non-genetic effects assumed to act independently on individuals



To test whether our SNPs are associated with the phenotype, we fit two models. First consider the model without SNPs as fixed effects (the null model): 

$$\hspace{5em}  \mathbf{y} = \mathbf{X}\mathbf{\beta} + \mathbf{g} + \mathbf{\epsilon} \hspace{5em} (2)$$

and then the model with SNPs as fixed effects (1). We can then compare the two log-likelihoods to see whether including our SNPs of interest improves the model fit. 

### Fit the null model

In [325]:
using VarianceComponentModels

In [326]:
# null data model has two variance components but no SNP fixed effects
X = [ones(nobs) sex age]
nulldata = VarianceComponentVariate(pheno, X, (2Φgrm, Matrix(1.0I, nobs, nobs)))
nullmodel = VarianceComponentModel(nulldata)
@time nulllogl, nullmodel, = fit_mle!(nullmodel, nulldata; algo=:FS)

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

(-522.7995032788442, VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([1.0982528445105832; -0.049790965461479236; -0.0011173387859553952], ([0.16153996925054054], [0.7622642974157461]), Array{Float64}(undef,0,3), Char[], Float64[], -Inf, Inf), ([0.28637160399826644], [0.29037628696484363]), [0.08200869557653992 -0.08091120209956988; -0.0809112020995699 0.08431838803148921], [0.22756326068408744; 0.09903256896133299; 0.005218403921070817], [0.05178503761317394 -0.01572358847762115 -0.0008603839232000639; -0.015723588477620322 0.009807449715081174 2.7208545465831223e-5; -0.000860383923200106 2.7208545465857677e-5 2.7231739483447272e-5])

The null model log-likelihood (no SNP effects)

In [327]:
nulllogl

-522.7995032788442

The null model mean effects (intercept, sex, and age)

In [328]:
nullmodel.B

3×1 Array{Float64,2}:
  1.0982528445105832
 -0.049790965461479236
 -0.0011173387859553952

The null model additive genetic variance

In [329]:
nullmodel.Σ[1]

1×1 Array{Float64,2}:
 0.16153996925054054

The null model environmental variance

In [330]:
nullmodel.Σ[2]

1×1 Array{Float64,2}:
 0.7622642974157461

### Fit the alternative model

For the alternative model, we include genotypes of SNPs of interest in the covariate matrix.

In [331]:
snp_mat = convert(Matrix{Float64}, EUR_subset[:, 10:20]) 
Xalt = [ones(nobs) sex age snp_mat] 
altdata = VarianceComponentVariate(pheno, Xalt, (2Φgrm, Matrix(1.0I, nobs, nobs)))
altmodel = VarianceComponentModel(altdata)
@time altlogl, altmodel, = fit_mle!(altmodel, altdata; algo=:FS)

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

(-514.9307100515854, VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([-0.5650772534154904; -0.04175952202257888; … ; 0.24813567380735943; -0.22240223729988207], ([0.36160628542343937], [0.5266911593044307]), Array{Float64}(undef,0,14), Char[], Float64[], -Inf, Inf), ([0.282704223783742], [0.2795753961755862]), [0.07992167814516807 -0.07694976655753723; -0.07694976655753723 0.07816240214673596], [1.3865711843978776; 0.09836535838981278; … ; 0.2505911842018573; 0.25472360006392125], [1.9225796494025333 -0.01681528153698582 … -0.13499832245206594 -0.12824841110772908; -0.016815281537015045 0.009675743731156311 … 0.0019277350041501063 -0.0018897539417217569; … ; -0.13499832245239285 0.0019277350041509665 … 0.06279594159968915 -0.022119153796151553; -0.12824841110814747 -0.001889753941720035 … -0.022119153796144195 0.06488411242952451])

The alternative model log-likelihood:

In [332]:
altlogl

-514.9307100515854

The alternative model mean effects 

In [333]:
altmodel.B

14×1 Array{Float64,2}:
 -0.5650772534154904
 -0.04175952202257888
 -0.0020631936965069325
  0.13798788088282812
 -0.06458855409089669
  0.02775601692768474
  0.15120382884135317
  0.06392636975031145
  0.04805343647025799
  0.05097779763588679
 -0.01656719306048999
  0.18457721192458673
  0.24813567380735943
 -0.22240223729988207

The alternative model additive genetic variance

In [334]:
altmodel.Σ[1]

1×1 Array{Float64,2}:
 0.36160628542343937

The alternative model environmental variance

In [335]:
altmodel.Σ[2]

1×1 Array{Float64,2}:
 0.5266911593044307

### Likelihood ratio test 

We use a likelihood ratio test (LRT) to compare the fit of the two nested models. 

Under the null hypothesis of no SNP effects, the LRT statistic is approximately chi-squared distributed with 11 degrees of freedom, one for each SNP added to the model. Here the statistic is about 15.7.
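
As a worked check, assuming the usual asymptotic chi-squared approximation for nested models, the statistic follows from the two maximized log-likelihoods reported above, and the degrees of freedom equal the number of extra mean parameters (14 columns in `Xalt` versus 3 in `X`):

$$\text{LRT} = 2\,\{\ell_{\text{alt}} - \ell_{\text{null}}\} = 2\,\{-514.9307 - (-522.7995)\} \approx 15.74, \qquad \text{df} = 14 - 3 = 11.$$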

In [336]:
using Distributions
LRT = 2(altlogl - nulllogl)

15.737586454517668

The associated p-value:

In [337]:
pval = ccdf(Chisq(11), LRT)

0.15115585964284117

With a p-value of about 0.15, the model that includes the SNPs as covariates does not fit significantly better than the null model at the 0.05 level. In other words, we find no evidence that these 11 SNPs explain additional variation in our trait.

### Exercise

Use the minorization-maximization algorithm (`algo=:MM`) to find the MLEs of both the null model and the alternative model. Then conduct the likelihood ratio test.