# Let's profile the `fit_iht` function for univariate and multivariate responses

In [1]:
using Revise
using MendelIHT
using SnpArrays
using Random
using GLM
using DelimitedFiles
using Test
using Distributions
using LinearAlgebra
using CSV
using DataFrames
using StatsBase
using Profile
# using ProfileView
BLAS.set_num_threads(1) # remember to set BLAS threads to 1 !!!

┌ Info: Precompiling MendelIHT [921c7187-1484-5754-b919-5d3ed9ac03c4]
└ @ Base loading.jl:1317


## Univariate response with SnpLinAlg

In [2]:
n = 1000  # number of samples
p = 10000 # number of SNPs
k = 10    # number of causal SNPs per trait
d = Normal
l = canonicallink(d())

# set random seed for reproducibility
Random.seed!(2021)

# simulate `.bed` file with no missing data
x = simulate_random_snparray(undef, n, p)
xla = SnpLinAlg{Float64}(x, model=ADDITIVE_MODEL, center=true, scale=true) 

# intercept is the only nongenetic covariate
z = ones(n)
intercept = 1.0

# simulate response y, true model b, and the correct non-0 positions of b
Y, true_b, correct_position = simulate_random_response(xla, k, d, l, Zu=z*intercept);

In [9]:
Random.seed!(2020)
@time result = fit_iht(Y, xla, z, init_beta=true);
speed_per_iter = result.time / result.iter

****                   MendelIHT Version 1.4.1                  ****
****     Benjamin Chu, Kevin Keys, Chris German, Hua Zhou       ****
****   Jin Zhou, Eric Sobel, Janet Sinsheimer, Kenneth Lange    ****
****                                                            ****
****                 Please cite our paper!                     ****
****         https://doi.org/10.1093/gigascience/giaa044        ****

Initializing β to univariate regression values...
...completed in 0.1 seconds.

Running sparse linear regression
Number of threads = 1
Link functin = IdentityLink()
Sparsity parameter (k) = 10
Prior weight scaling = off
Doubly sparse projection = off
Debias = off
Max IHT iterations = 200
Converging when tol < 0.0001 and iteration ≥ 5:

Iteration 1: loglikelihood = -2466.2371277080556, backtracks = 0, tol = 0.5295306543287241
Iteration 2: loglikelihood = -1579.16025278937, backtracks = 0, tol = 0.34661836758541936
Iteration 3: loglikelihood = -1489.9150515502452, backtracks = 0, 

0.004685987125743519

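Univariate IHT takes about 0.005 seconds per iteration. Let's profile the fit function. The first, unprofiled call compiles `fit_iht`, which keeps most of the compilation time out of the profile.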

In [10]:
fit_iht(Y, xla, z, init_beta=true, verbose=false)
Profile.clear()
@profile fit_iht(Y, xla, z, init_beta=true, verbose=false);
Profile.print()

Overhead ╎ [+additional indent] Count File:Line; Function
  ╎57  @Base/task.jl:406; (::IJulia.var"#15#18")()
  ╎ 57  @IJulia/src/eventloop.jl:8; eventloop(socket::ZMQ.Socket)
  ╎  57  @Base/essentials.jl:706; invokelatest
  ╎   57  @Base/essentials.jl:708; #invokelatest#2
  ╎    57  .../execute_request.jl:67; execute_request(socket::ZMQ.Soc...
  ╎     57  .../SoftGlobalScope.jl:65; softscope_include_string(m::Mo...
  ╎    ╎ 57  @Base/loading.jl:1094; include_string(mapexpr::type...
 8╎    ╎  57  @Base/boot.jl:360; eval
  ╎    ╎   6   ...iler/typeinfer.jl:921; typeinf_ext_toplevel(mi::Cor...
  ╎    ╎    6   ...iler/typeinfer.jl:925; typeinf_ext_toplevel(interp...
  ╎    ╎     6   ...iler/typeinfer.jl:892; typeinf_ext(interp::Core.Co...
  ╎    ╎    ╎ 6   ...ler/typeinfer.jl:209; typeinf(interp::Core.Compi...
  ╎    ╎    ╎  5   ...ler/typeinfer.jl:214; _typeinf(interp::Core.Com...
  ╎    ╎    ╎   5   ...terpretation.jl:1520; typeinf_nocycle(interp::...
  ╎    ╎    ╎    2   ...terpretation
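The tree view above is deep because every frame hangs off the IJulia event loop. Julia's `Profile.print` also accepts `format = :flat` with `sortedby = :count`, which lists each function once, ordered by sample count. Below is a sketch of the same warm-up, clear, profile sequence on a toy matrix-vector workload. The function `matvec_workload` is a hypothetical stand-in for `fit_iht`, not part of MendelIHT.

```julia
using LinearAlgebra, Profile

# Hypothetical stand-in for `fit_iht`: repeated dense matrix-vector products.
function matvec_workload(A, v, iters)
    out = similar(v, size(A, 1))
    acc = 0.0
    for _ in 1:iters
        mul!(out, A, v)   # out = A*v, a matrix-vector product like the one in `score!`
        acc += sum(out)
    end
    return acc
end

A = randn(500, 500)
v = randn(500)

matvec_workload(A, v, 1)          # warm-up call: compile before profiling
Profile.clear()                   # drop samples left over from earlier runs
@profile matvec_workload(A, v, 5000)

# One line per function, ordered by sample count; frames with < 5 samples are hidden.
Profile.print(format = :flat, sortedby = :count, mincount = 5)
```

Applied to the notebook, the last line would simply replace `Profile.print()` after `@profile fit_iht(...)`.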

**Conclusion:** 
+ `init_beta` took 4/43 samples (line 96 of `fit.jl`). 
+ `score!` took 33/43 samples (line 247 of `fit.jl`). Inside `score!`, the time goes to the line `mul!(v.df, Transpose(x), v.r)`, which computes the gradient and therefore multiplies the full genotype matrix by a dense vector. This is expected.
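The `mul!(v.df, Transpose(x), v.r)` hot spot is the gradient of the log-likelihood. The sketch below uses a plain `Matrix{Float64}` instead of `SnpLinAlg` and assumes a Gaussian model with unit error variance and no nongenetic covariates. The helper `gaussian_gradient!` is hypothetical and is not MendelIHT's `score!`. It shows why this line dominates: computing `Xβ` only needs the k active columns, but `Xᵀ r` must read every entry of `X`.

```julia
using LinearAlgebra, Random

# Gradient of the unit-variance Gaussian log-likelihood for y = X*β + ε:
#     ∇ℓ(β) = Xᵀ (y - X*β)
# Buffers `df` (length p) and `r` (length n) are preallocated and overwritten.
function gaussian_gradient!(df, r, X, y, β)
    idx = findall(!iszero, β)                  # support of the sparse β (k entries)
    mul!(r, view(X, :, idx), view(β, idx))     # r = X*β using only the k active columns
    r .= y .- r                                # r = y - X*β, the residual vector
    mul!(df, transpose(X), r)                  # df = Xᵀ r: full n×p matrix times dense vector
    return df
end

Random.seed!(1)
n, p = 100, 40
X = randn(n, p)
β = zeros(p); β[[3, 17, 25]] = [1.0, -2.0, 0.5]   # k = 3 causal predictors
y = X * β + 0.1 * randn(n)
df = gaussian_gradient!(zeros(p), zeros(n), X, y, β)
```

The residual step costs O(nk), while the transposed product costs O(np). With p much larger than k, the transposed product is the line a profiler will point at.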

## Multivariate response with SnpLinAlg (2 traits)

In [11]:
n = 1000  # number of samples
p = 10000 # number of SNPs
k = 10    # number of causal SNPs per trait
r = 2     # number of traits

# set random seed for reproducibility
Random.seed!(2021)

# simulate `.bed` file with no missing data
x = simulate_random_snparray(undef, n, p)
xla = SnpLinAlg{Float64}(x, model=ADDITIVE_MODEL, center=true, scale=true) 

# intercept is the only nongenetic covariate
z = ones(n, 1)
intercepts = randn(r)' # each trait has a different intercept

# simulate response y, true model b, and the correct non-0 positions of b
Y, true_Σ, true_b, correct_position = simulate_random_response(xla, k, r, Zu=z*intercepts, overlap=2);

In [15]:
Random.seed!(2020)
Yt = Matrix(Y')
Zt = Matrix(z')
@time result = fit_iht(Yt, Transpose(xla), Zt, init_beta=true);
speed_per_iter = result.time / result.iter

Initializing β to univariate regression values...
...completed in 0.3 seconds.

Running sparse Multivariate Gaussian regression
Number of threads = 1
Link functin = IdentityLink()
Sparsity parameter (k) = 10
Prior weight scaling = off
Doubly sparse projection = off
Debias = off
Max IHT iterations = 200
Converging when tol < 0.0001 and iteration ≥ 5:

Iteration 1: loglikelihood = -201.6215159825589, backtracks = 0, tol = 0.0
Iteration 2: loglikelihood = 333.5651708910011, backtracks = 0, tol = 0.11348309227050864
Iteration 3: loglikelihood = 372.19186911431996, backtracks = 0, to

0.010966845921107702

Multivariate IHT takes about 0.01 seconds per iteration with 2 traits. Let's profile the fit function.

In [16]:
fit_iht(Yt, Transpose(xla), Zt, init_beta=true, verbose=false)
Profile.clear()
@profile fit_iht(Yt, Transpose(xla), Zt, init_beta=true, verbose=false);
Profile.print()

Overhead ╎ [+additional indent] Count File:Line; Function
  ╎73  @Base/task.jl:406; (::IJulia.var"#15#18")()
  ╎ 73  @IJulia/src/eventloop.jl:8; eventloop(socket::ZMQ.Socket)
  ╎  73  @Base/essentials.jl:706; invokelatest
  ╎   73  @Base/essentials.jl:708; #invokelatest#2
  ╎    73  .../execute_request.jl:67; execute_request(socket::ZMQ.Soc...
  ╎     73  .../SoftGlobalScope.jl:65; softscope_include_string(m::Mo...
  ╎    ╎ 73  @Base/loading.jl:1094; include_string(mapexpr::type...
  ╎    ╎  73  @Base/boot.jl:360; eval
  ╎    ╎   73  @MendelIHT/src/fit.jl:72; (::MendelIHT.var"#fit_iht##k...
  ╎    ╎    9   ...delIHT/src/fit.jl:96; fit_iht(y::Matrix{Float64},...
  ╎    ╎     9   @Base/timing.jl:287; macro expansion
  ╎    ╎    ╎ 9   ...ta_structures.jl:116; initialize
  ╎    ╎    ╎  9   ...ta_structures.jl:123; #initialize#1
  ╎    ╎    ╎   9   ...multivariate.jl:393; init_iht_indices!(v::Mend...
  ╎    ╎    ╎    9   ...multivariate.jl:62; score!(v::MendelIHT.mIHTV...
  ╎    ╎    ╎     

**Conclusion**: 
+ `score!` consumes 61/73 samples within `fit_iht!`, specifically on the line `SnpArrays.mul!(v.p_by_r, v.X, v.n_by_r)`, which computes the `SnpLinAlg` matrix multiplication. This is expected.
+ `init_beta` consumes 9/73 samples within `fit_iht`, which is acceptable.
+ `iht_stepsize!` (which inverts the covariance matrix) does not appear in the profile at all.
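The sketch below shows why the `SnpArrays.mul!` line dominates. It assumes a multivariate Gaussian model `Y = B*X + E` with noise covariance `Σ` and uses a plain `Matrix{Float64}` in place of `SnpLinAlg`. The helper `mv_gradient!` is hypothetical and is not MendelIHT's `score!`. Under this model, the log-likelihood gradient with respect to `B` is `Γ (Y - B X) Xᵀ` with `Γ = Σ⁻¹`. Stored transposed, its expensive step is a p×n by n×r product, the same shape as `mul!(v.p_by_r, v.X, v.n_by_r)`. Inverting `Σ` only touches an r×r matrix.

```julia
using LinearAlgebra, Random

# Multivariate Gaussian model Y = B*X + E, each column of E ~ N(0, Σ).
# Transposed layout, as in the notebook:
#   Y :: r×n (traits × samples), X :: p×n (predictors × samples), B :: r×p.
# ∂ℓ/∂B = Γ (Y - B X) Xᵀ with Γ = inv(Σ); it is stored transposed, as p×r.
function mv_gradient!(p_by_r, n_by_r, Y, X, B, Σ)
    Γ = inv(Σ)                                   # r×r inverse: O(r^3), tiny for few traits
    active = findall(j -> any(!iszero, view(B, :, j)), 1:size(B, 2))  # nonzero columns of B
    R = Y - B[:, active] * X[active, :]          # r×n residuals, using only the active predictors
    mul!(n_by_r, transpose(R), Γ)                # n×r: Rᵀ Γ
    mul!(p_by_r, X, n_by_r)                      # p×r: X Rᵀ Γ, the O(npr) dominant product
    return p_by_r
end

Random.seed!(1)
r, n, p = 2, 50, 30
X = randn(p, n)
B = zeros(r, p); B[1, 4] = 1.5; B[2, 9] = -0.8  # sparse coefficient matrix
Σ = [1.0 0.3; 0.3 2.0]                          # positive-definite noise covariance
Y = B * X + randn(r, n)
g = mv_gradient!(zeros(p, r), zeros(n, r), Y, X, B, Σ)
```

The residuals use only the active columns of `B`, mirroring the sparsity that IHT maintains. As a result, the one full pass over `X` is the final product.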

## Multivariate response with SnpLinAlg (10 traits)

In [19]:
n = 1000  # number of samples
p = 10000 # number of SNPs
k = 10    # number of causal SNPs per trait
r = 10    # number of traits

# set random seed for reproducibility
Random.seed!(2021)

# simulate `.bed` file with no missing data
x = simulate_random_snparray(undef, n, p)
xla = SnpLinAlg{Float64}(x, model=ADDITIVE_MODEL, center=true, scale=true) 

# intercept is the only nongenetic covariate
z = ones(n, 1)
intercepts = randn(r)' # each trait has a different intercept

# simulate response y, true model b, and the correct non-0 positions of b
Y, true_Σ, true_b, correct_position = simulate_random_response(xla, k, r, Zu=z*intercepts, overlap=0);

In [21]:
Random.seed!(2020)
Yt = Matrix(Y')
Zt = Matrix(z')
@time result = fit_iht(Yt, Transpose(xla), Zt, init_beta=true);
speed_per_iter = result.time / result.iter

Initializing β to univariate regression values...
...completed in 1.0 seconds.

Running sparse Multivariate Gaussian regression
Number of threads = 1
Link functin = IdentityLink()
Sparsity parameter (k) = 10
Prior weight scaling = off
Doubly sparse projection = off
Debias = off
Max IHT iterations = 200
Converging when tol < 0.0001 and iteration ≥ 5:

Iteration 1: loglikelihood = -6804.448066023764, backtracks = 0, tol = 0.0
Iteration 2: loglikelihood = -6777.540091483411, backtracks = 0, tol = 0.058246770583647925
Iteration 3: loglikelihood = -6772.864909530544, backtracks = 0, 

0.03809549411137899

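Multivariate IHT takes about 0.04 seconds per iteration with 10 traits. Let's profile the fit function.

In [22]: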
fit_iht(Yt, Transpose(xla), Zt, init_beta=true, verbose=false)
Profile.clear()
@profile fit_iht(Yt, Transpose(xla), Zt, init_beta=true, verbose=false);
Profile.print()

Overhead ╎ [+additional indent] Count File:Line; Function
  2╎2    @Base/reduce.jl:233; mapreduce_impl(f::typeof(abs2),...
   ╎409  @Base/task.jl:406; (::IJulia.var"#15#18")()
   ╎ 409  ...lia/src/eventloop.jl:8; eventloop(socket::ZMQ.Socket)
   ╎  409  @Base/essentials.jl:706; invokelatest
   ╎   409  @Base/essentials.jl:708; #invokelatest#2
   ╎    409  .../execute_request.jl:67; execute_request(socket::ZMQ.So...
   ╎     409  ...SoftGlobalScope.jl:65; softscope_include_string(m::M...
   ╎    ╎ 409  @Base/loading.jl:1094; include_string(mapexpr::typ...
   ╎    ╎  409  @Base/boot.jl:360; eval
   ╎    ╎   409  ...delIHT/src/fit.jl:72; (::MendelIHT.var"#fit_iht##...
   ╎    ╎    33   ...elIHT/src/fit.jl:96; fit_iht(y::Matrix{Float64}...
   ╎    ╎     33   @Base/timing.jl:287; macro expansion
   ╎    ╎    ╎ 33   ...a_structures.jl:116; initialize
   ╎    ╎    ╎  33   ...a_structures.jl:123; #initialize#1
   ╎    ╎    ╎   1    ...multivariate.jl:354; init_iht_indices!(v::Men...
   ╎    ╎ 

**Conclusion**: 
+ `init_beta` consumes 33/409 samples within `fit_iht`, which is acceptable.
+ `score!` consumes 356/409 samples within `fit_iht!`, specifically on the line `SnpArrays.mul!(v.p_by_r, v.X, v.n_by_r)`, which computes the `SnpLinAlg` matrix multiplication. This is expected.
+ `iht_stepsize!` (which inverts the covariance matrix) does not appear in the profile at all.
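A rough count of multiply-adds, ignoring constant factors, shows why the covariance inversion is not sampled next to the gradient product. This uses the n = 1000 samples, p = 10000 SNPs and r = 10 traits of this run:

$$
\underbrace{n\,p\,r}_{\text{gradient product}} = 1000 \times 10000 \times 10 = 10^{8}
\qquad\text{vs.}\qquad
\underbrace{r^{3}}_{\text{inverting } \Sigma} = 10^{3}.
$$

With r = 2, as in the previous section, the same comparison is 2 × 10^7 versus 8.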

# Profile numeric matrices

## Univariate case

In [17]:
n = 1000  # number of samples
p = 10000 # number of SNPs
k = 10    # number of causal SNPs per trait
d = Normal
l = canonicallink(d())

# set random seed for reproducibility
Random.seed!(2021)

# simulate a dense matrix of standard-normal predictors
x = randn(n, p)

# intercept is the only nongenetic covariate
z = ones(n)
intercept = 1.0

# simulate response y, true model b, and the correct non-0 positions of b
Y, true_b, correct_position = simulate_random_response(x, k, d, l, Zu=z*intercept);

# run IHT
Random.seed!(2020)
@time result = fit_iht(Y, x, z);
speed_per_iter = result.time / result.iter

****                   MendelIHT Version 1.4.0                  ****
****     Benjamin Chu, Kevin Keys, Chris German, Hua Zhou       ****
****   Jin Zhou, Eric Sobel, Janet Sinsheimer, Kenneth Lange    ****
****                                                            ****
****                 Please cite our paper!                     ****
****         https://doi.org/10.1093/gigascience/giaa044        ****

Running sparse linear regression
Link functin = IdentityLink()
Sparsity parameter (k) = 10
Prior weight scaling = off
Doubly sparse projection = off
Debias = off
Max IHT iterations = 200
Converging when tol < 0.0001:

Iteration 1: loglikelihood = -1496.713961529228, backtracks = 0, tol = 0.42407482809494373
Iteration 2: loglikelihood = -1440.396428956392, backtracks = 0, tol = 0.11860044142480099
Iteration 3: loglikelihood = -1437.2177092287425, backtracks = 0, tol = 0.02302770129518664
Iteration 4: loglikelihood = -1437.1722070780304, backtracks = 0, tol = 0.0027889369108425118

0.008611162503560385

## Multivariate case

In [46]:
n = 1000  # number of samples
p = 10000 # number of SNPs
k = 10    # number of causal SNPs per trait
r = 2     # number of traits

# set random seed for reproducibility
Random.seed!(2022)

# simulate a dense genotype-like matrix with entries 0.0, 1.0 or 2.0
x = rand(0.:2., n, p)

# intercept is the only nongenetic covariate
z = ones(n, 1)
intercepts = randn(r)' # each trait have different intercept

# simulate response y, true model b, and the correct non-0 positions of b
Y, true_Σ, true_b, correct_position = simulate_random_response(x, k, r, Zu=z*intercepts, overlap=2);

# run IHT
Random.seed!(2020)
Yt = Matrix(Y')
Zt = Matrix(z')
Xt = Matrix(x')
@time result = fit_iht(Yt, Transpose(x), Zt, k = 20, max_iter=500);
speed_per_iter = result.time / result.iter

Running sparse Multivariate Gaussian regression
Link functin = IdentityLink()
Sparsity parameter (k) = 20
Prior weight scaling = off
Doubly sparse projection = off
Debias = off
Max IHT iterations = 500
Converging when tol < 0.0001:

Iteration 1: loglikelihood = 191.73602048640578, backtracks = 0, tol = 0.2292371125117518
Iteration 2: loglikelihood = 762.9277376172406, backtracks = 0, tol = 0.021360037643612592
Iteration 3: loglikelihood = 884.2331365763932, backtracks = 0, tol = 0.02091394090715768
Iteration 4: loglikelihood = 971.5768695868555, backtracks = 0, tol = 0.025650042

0.024981919833070017

In [47]:
# first beta
β1 = result.beta[1, :]
true_b1_idx = findall(!iszero, true_b[:, 1])
[β1[true_b1_idx] true_b[true_b1_idx, 1]]

3×2 Array{Float64,2}:
 -1.32085   -1.50987
 -0.457323  -0.619427
 -1.80573   -1.84578

In [48]:
# second beta
β2 = result.beta[2, :]
true_b2_idx = findall(!iszero, true_b[:, 2])
[β2[true_b2_idx] true_b[true_b2_idx, 2]]

7×2 Array{Float64,2}:
 -0.257583  -0.308054
 -0.665576  -0.54334
  0.629238   0.586241
  0.0       -0.0183675
 -1.49299   -1.51484
 -0.329601  -0.360181
  2.04007    2.0501
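The tables above compare the estimates with the true effects at the causal positions. A small helper can reduce such a table to one number: the fraction of causal positions that IHT selected. The helper is hypothetical and not part of MendelIHT. Here it is applied to the values printed for the second trait:

```julia
# Hypothetical helper (not part of MendelIHT): fraction of truly nonzero
# coefficients that also receive a nonzero estimate.
function recovery_rate(β_est::AbstractVector, β_true::AbstractVector)
    idx = findall(!iszero, β_true)                 # positions of the causal predictors
    return count(!iszero, view(β_est, idx)) / length(idx)
end

# Values copied from the 7×2 table printed for the second trait.
β2_est  = [-0.257583, -0.665576, 0.629238, 0.0, -1.49299, -0.329601, 2.04007]
β2_true = [-0.308054, -0.54334, 0.586241, -0.0183675, -1.51484, -0.360181, 2.0501]
rate = recovery_rate(β2_est, β2_true)   # 6 of the 7 causal positions are selected
```

In the notebook, the same call on the full vectors would be `recovery_rate(result.beta[2, :], true_b[:, 2])`.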

In [49]:
# non genetic covariates
[result.c intercepts']

2×2 Array{Float64,2}:
 -0.326767   0.900301
  0.0       -0.151044

In [50]:
# covariance matrix
[vec(result.Σ) vec(true_Σ)]

4×2 Array{Float64,2}:
  1.73172   1.69161
 -1.62934  -1.57315
 -1.62934  -1.57315
  1.77269   1.69077