In [1]:
using Revise
using MendelIHT
using SnpArrays
using Random
using GLM
using DelimitedFiles
using Test
using Distributions
using LinearAlgebra
using CSV
using DataFrames
using StatsBase
BLAS.set_num_threads(1) # remember to set the number of BLAS threads to 1
#     using TraitSimulation, OrdinalMultinomialModels, VarianceComponentModels

┌ Info: Precompiling MendelIHT [921c7187-1484-5754-b919-5d3ed9ac03c4]
└ @ Base loading.jl:1317


# Univariate Gaussian trait

In [2]:
n = 1000  # number of samples
p = 10000 # number of SNPs
k = 10    # number of causal SNPs per trait
d = Normal
l = canonicallink(d())

# set random seed for reproducibility
Random.seed!(2021)

# simulate `.bed` file with no missing data
x = simulate_random_snparray(undef, n, p)
xla = SnpLinAlg{Float64}(x, model=ADDITIVE_MODEL, center=true, scale=true) 

# intercept is the only nongenetic covariate
z = ones(n)
intercept = 1.0

# simulate response y, true model b, and the positions of the nonzero entries of b
y, true_b, correct_position = simulate_random_response(xla, k, d, l, Zu=z*intercept);
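
For intuition, the Gaussian simulation above can be imitated with plain arrays. The following is a minimal sketch, not the implementation of `simulate_random_response`: it assumes a dense standard-normal matrix as a stand-in for the standardized genotypes, standard-normal effect sizes, and unit-variance noise, and it uses smaller toy dimensions. All names in it are illustrative.

```julia
using Random

rng = MersenneTwister(2021)             # local RNG; values differ from the notebook's
n, p, k = 200, 500, 10                  # toy dimensions (assumption)

X = randn(rng, n, p)                    # dense stand-in for centered, scaled genotypes
causal = sort(randperm(rng, p)[1:k])    # positions of the nonzero effects
b = zeros(p)
b[causal] = randn(rng, k)               # effect sizes drawn from N(0, 1) (assumption)

intercept = 1.0
y = X * b .+ intercept .+ randn(rng, n) # linear predictor plus unit-variance noise
```

Here `causal` plays the role of `correct_position` and `b` the role of `true_b`.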

## Run IHT

In [21]:
@time result = fit_iht(y, xla, z, k=10, init_beta=true)

****                   MendelIHT Version 1.4.1                  ****
****     Benjamin Chu, Kevin Keys, Chris German, Hua Zhou       ****
****   Jin Zhou, Eric Sobel, Janet Sinsheimer, Kenneth Lange    ****
****                                                            ****
****                 Please cite our paper!                     ****
****         https://doi.org/10.1093/gigascience/giaa044        ****

Initializing β to univariate regression values...
...completed in 0.1 seconds.

Running sparse linear regression
Number of threads = 1
Link functin = IdentityLink()
Sparsity parameter (k) = 10
Prior weight scaling = off
Doubly sparse projection = off
Debias = off
Max IHT iterations = 200
Converging when tol < 0.0001 and iteration ≥ 5:

Iteration 1: loglikelihood = -2486.6803542089983, backtracks = 0, tol = 0.5168146007666207
Iteration 2: loglikelihood = -1576.084706218103, backtracks = 0, tol = 0.3348001846027596
Iteration 3: loglikelihood = -1482.661472264235, backtracks = 0, t


IHT estimated 10 nonzero SNP predictors and 1 non-genetic predictors.

Compute time (sec):     0.03817605972290039
Final loglikelihood:    -1472.3905616484403
SNP PVE:                0.8426997403655612
Iterations:             8

Selected genetic predictors:
10×2 DataFrame
 Row │ Position  Estimated_β
     │ Int64     Float64
─────┼───────────────────────
   1 │      782    -0.437816
   2 │      901     0.747927
   3 │     1204     0.691428
   4 │     1306    -1.425
   5 │     1655    -0.194702
   6 │     3160    -0.86171
   7 │     3936    -0.147419
   8 │     4201     0.338541
   9 │     4402    -0.126501
  10 │     6879    -1.21893

Selected nongenetic predictors:
1×2 DataFrame
 Row │ Position  Estimated_β
     │ Int64     Float64
─────┼───────────────────────
   1 │        1      1.02016
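
The core idea behind iterative hard thresholding can be sketched in a few lines: take a gradient step, then keep only the `k` largest-magnitude coefficients. The toy version below is for least squares only and uses a fixed step size of 1/‖X‖₂². It is an illustration under those assumptions, not MendelIHT's algorithm; the `backtracks` field in the log suggests the package adapts its step size, which this sketch does not do. The names `project_topk!` and `toy_iht` are hypothetical.

```julia
using LinearAlgebra, Random

# Zero all but the k largest-magnitude entries of v (hard-thresholding projection).
function project_topk!(v::AbstractVector, k::Integer)
    keep = partialsortperm(v, 1:k, by=abs, rev=true)
    mask = trues(length(v))
    mask[keep] .= false          # entries to keep are excluded from the zeroing mask
    v[mask] .= 0
    return v
end

# Toy IHT for least squares with a fixed step size (assumption: no backtracking).
function toy_iht(X::AbstractMatrix, y::AbstractVector, k::Integer; iters::Integer=200)
    b = zeros(size(X, 2))
    step = 1 / opnorm(X)^2       # inverse of the gradient's Lipschitz constant
    for _ in 1:iters
        grad = X' * (X * b - y)  # gradient of ½‖y − Xb‖²
        b .-= step .* grad       # gradient step
        project_topk!(b, k)      # project back onto k-sparse vectors
    end
    return b
end

# Small synthetic example with three nonzero effects.
rng = MersenneTwister(1)
X = randn(rng, 100, 50)
b_true = zeros(50)
b_true[[3, 17, 40]] = [2.0, -1.5, 1.0]
y = X * b_true + 0.1 * randn(rng, 100)
b_hat = toy_iht(X, y, 3)
```

With this step size each iteration cannot increase the residual sum of squares, and `b_hat` never has more than `k` nonzero entries.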

## Check answer

In [9]:
# true effects (left column) vs. IHT estimates (right column) at the causal positions
[true_b[correct_position] result.beta[correct_position]]

10×2 Matrix{Float64}:
 -0.402269   -0.437816
  0.758756    0.747927
  0.729135    0.691428
 -1.47163    -1.425
 -0.172668   -0.194702
 -0.847906   -0.86171
  0.296183    0.338541
 -0.0034339   0.0
  0.125965    0.0
 -1.24972    -1.21893
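
The comparison can be summarized numerically. The sketch below copies the two columns of the matrix above as literals (first column true effects, second column estimates) so that it runs on its own; the variable names are illustrative.

```julia
# Columns copied from the comparison matrix above.
true_vals = [-0.402269, 0.758756, 0.729135, -1.47163, -0.172668,
             -0.847906, 0.296183, -0.0034339, 0.125965, -1.24972]
est_vals  = [-0.437816, 0.747927, 0.691428, -1.425, -0.194702,
             -0.86171, 0.338541, 0.0, 0.0, -1.21893]

recovered = est_vals .!= 0                       # causal SNPs with a nonzero estimate
n_recovered = count(recovered)                   # number of causal SNPs selected
missed_effects = true_vals[.!recovered]          # true effects of the missed SNPs
max_abs_err = maximum(abs.(est_vals[recovered] .- true_vals[recovered]))

println("recovered ", n_recovered, " of ", length(true_vals), " causal SNPs")
println("true effects of missed SNPs: ", missed_effects)
println("largest absolute error among recovered SNPs: ", max_abs_err)
```

In this run the two causal SNPs that were not selected are the two with the smallest true effect sizes.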

In [10]:
# nongenetic covariates: estimated intercept (left) vs. true intercept (right)
[result.c intercept]

1×2 Matrix{Float64}:
 1.02016  1.0

## Test cross-validation

In [6]:
Threads.nthreads()

1

In [19]:
Random.seed!(2020)
@time mses = cv_iht(y, xla, z, path=0:20, init_beta=true);

****                   MendelIHT Version 1.4.1                  ****
****     Benjamin Chu, Kevin Keys, Chris German, Hua Zhou       ****
****   Jin Zhou, Eric Sobel, Janet Sinsheimer, Kenneth Lange    ****
****                                                            ****
****                 Please cite our paper!                     ****
****         https://doi.org/10.1093/gigascience/giaa044        ****



Cross validating...100%|████████████████████████████████| Time: 0:00:13




Crossvalidation Results:
	k	MSE
	0	1097.9822679456634
	1	639.948435659851
	2	639.948435659851
	3	491.31091160253413
	4	414.7982590934014
	5	307.63048180391786
	6	266.95374125278414
	7	242.0572267640577
	8	236.23211459381542
	9	243.58873069893227
	10	245.6937243742038
	11	246.44391892259432
	12	254.05026743790478
	13	256.2080803045909
	14	260.3857536819678
	15	260.5867303978344
	16	270.1163325764251
	17	275.10575084293146
	18	276.73604590979255
	19	280.43703442081045
	20	272.60846960582063

Best k = 8

 14.263994 seconds (30.21 M allocations: 7.031 GiB, 5.04% gc time)
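
The reported best `k` is the sparsity level with the smallest cross-validated MSE. As a self-contained check, the sketch below copies the table above into literals and picks the minimizer; `ks` and `mse_table` are illustrative names, not objects returned by the package.

```julia
# Sparsity levels tested (path=0:20) and the MSE values printed above.
ks = 0:20
mse_table = [1097.9822679456634, 639.948435659851, 639.948435659851,
             491.31091160253413, 414.7982590934014, 307.63048180391786,
             266.95374125278414, 242.0572267640577, 236.23211459381542,
             243.58873069893227, 245.6937243742038, 246.44391892259432,
             254.05026743790478, 256.2080803045909, 260.3857536819678,
             260.5867303978344, 270.1163325764251, 275.10575084293146,
             276.73604590979255, 280.43703442081045, 272.60846960582063]

best_k = ks[argmin(mse_table)]   # sparsity level with the smallest MSE
println("Best k = ", best_k)
```

The selected value can then be passed as the `k` keyword to `fit_iht`, as in the earlier cell, to refit on the full data.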
