# Test SnpArray Linear Algebra

According to the [SnpArray documentation](https://openmendel.github.io/SnpArrays.jl/dev/#Linear-Algebra), there are at least two ways to perform linear algebra on a `SnpArray`: `SnpLinAlg` and `SnpBitMatrix`. This notebook benchmarks both and compares them to standard BLAS operations on a dense `Matrix{Float64}` (default 8 BLAS threads).

In [1]:
using Revise
using SnpArrays
using BenchmarkTools
using LinearAlgebra


In [2]:
# load test data (no missing)
const EUR = SnpArray(SnpArrays.datadir("EUR_subset.bed"));

# convert to SnpLinAlg and SnpBitMatrix
const EURsla = SnpLinAlg{Float64}(EUR, model=ADDITIVE_MODEL, center=true, scale=true);
const EURsla_ = SnpLinAlg{Float64}(EUR, model=ADDITIVE_MODEL, center=true, scale=true, impute=false);
const EURbm = SnpBitMatrix{Float64}(EUR, model=ADDITIVE_MODEL, center=true, scale=true);

In [3]:
Threads.nthreads()

1
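Note that `Threads.nthreads()` counts Julia threads, which are separate from the BLAS thread pool mentioned in the introduction. The sketch below shows one way to inspect both and to pin BLAS to a single thread for a like-for-like baseline. It assumes Julia 1.6 or later, where `BLAS.get_num_threads` is available; the variable names are illustrative only.

```julia
using LinearAlgebra

julia_threads = Threads.nthreads()       # threads available to Julia code
blas_threads  = BLAS.get_num_threads()   # threads used by the BLAS library (assumes Julia >= 1.6)
println("Julia threads: ", julia_threads, ", BLAS threads: ", blas_threads)

# For a single-threaded BLAS baseline, temporarily restrict the BLAS pool ...
BLAS.set_num_threads(1)
# ... run the dense-matrix benchmark here ...
BLAS.set_num_threads(blas_threads)       # restore the previous setting
```

The timings in this notebook were collected with the default BLAS setting, so the BLAS results below are multithreaded while the `SnpLinAlg` and `SnpBitMatrix` results use a single Julia thread.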

## Matrix vector multiplication (Xv)

In [4]:
v1 = randn(size(EUR, 1))
v2 = randn(size(EUR, 2))
A = convert(Matrix{Float64}, EUR, model=ADDITIVE_MODEL, center=true, scale=true);

SnpLinAlg

In [11]:
@benchmark mul!($v1, $EURsla, $v2)

BenchmarkTools.Trial: 
  memory estimate:  8.39 KiB
  allocs estimate:  161
  --------------
  minimum time:     7.924 ms (0.00% GC)
  median time:      8.662 ms (0.00% GC)
  mean time:        8.621 ms (0.00% GC)
  maximum time:     11.010 ms (0.00% GC)
  --------------
  samples:          581
  evals/sample:     1

SnpBitMatrix

In [12]:
@benchmark mul!($v1, $EURbm, $v2)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     27.062 ms (0.00% GC)
  median time:      29.694 ms (0.00% GC)
  mean time:        29.321 ms (0.00% GC)
  maximum time:     32.492 ms (0.00% GC)
  --------------
  samples:          171
  evals/sample:     1

BLAS

In [13]:
@benchmark mul!($v1, $A, $v2)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.672 ms (0.00% GC)
  median time:      12.260 ms (0.00% GC)
  mean time:        12.167 ms (0.00% GC)
  maximum time:     22.166 ms (0.00% GC)
  --------------
  samples:          411
  evals/sample:     1

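By median time, `SnpLinAlg` is fastest (8.7 ms, versus 12.3 ms for BLAS and 29.7 ms for `SnpBitMatrix`), although BLAS reaches the lowest minimum time (6.7 ms) and shows a much wider spread.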

## Transpose matrix vector multiplication (X'v)

SnpLinAlg

In [16]:
@benchmark mul!($v2, $EURsla', $v1)

BenchmarkTools.Trial: 
  memory estimate:  4.53 KiB
  allocs estimate:  83
  --------------
  minimum time:     7.257 ms (0.00% GC)
  median time:      7.777 ms (0.00% GC)
  mean time:        8.040 ms (0.00% GC)
  maximum time:     12.704 ms (0.00% GC)
  --------------
  samples:          622
  evals/sample:     1

SnpBitMatrix

In [17]:
@benchmark mul!($v2, $EURbm', $v1)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     22.155 ms (0.00% GC)
  median time:      22.694 ms (0.00% GC)
  mean time:        23.002 ms (0.00% GC)
  maximum time:     32.510 ms (0.00% GC)
  --------------
  samples:          218
  evals/sample:     1

BLAS

In [18]:
@benchmark mul!($v2, $A', $v1)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.027 ms (0.00% GC)
  median time:      9.198 ms (0.00% GC)
  mean time:        9.539 ms (0.00% GC)
  maximum time:     26.159 ms (0.00% GC)
  --------------
  samples:          524
  evals/sample:     1

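Contrary to the [documentation](https://openmendel.github.io/SnpArrays.jl/dev/#Linear-Algebra), both $Xv$ and $X'v$ are faster with `SnpLinAlg` than with `SnpBitMatrix`, by roughly 3x in median time. Against BLAS, `SnpLinAlg` also has the lower median in both cases, though BLAS has the lower minimum.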
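To put the two sets of results on a common scale, the median times reported in the benchmark outputs can be expressed as multiples of the `SnpLinAlg` median. This is plain arithmetic on the numbers printed in this notebook; the names `xv`, `xtv`, and `relative` are illustrative helpers, not part of SnpArrays or BenchmarkTools.

```julia
# Median times in milliseconds, copied from the benchmark outputs in this notebook.
xv  = (SnpLinAlg = 8.662, SnpBitMatrix = 29.694, BLAS = 12.260)   # X * v
xtv = (SnpLinAlg = 7.777, SnpBitMatrix = 22.694, BLAS = 9.198)    # X' * v

# Express each median as a multiple of the SnpLinAlg median for the same operation.
relative(t) = map(ms -> round(ms / t.SnpLinAlg, digits = 2), t)

rel_xv  = relative(xv)
rel_xtv = relative(xtv)
println("Xv:  ", rel_xv)
println("X'v: ", rel_xtv)
```

Medians are used rather than minimums because the BLAS timings vary widely between samples.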

## Does SnpLinAlg require more memory?

The [SnpBitMatrix](https://github.com/OpenMendel/SnpArrays.jl/blob/master/src/linalg_bitmatrix.jl) implementation allocates two `BitMatrix`s, so memory usage doubles. A [SnpLinAlg](https://github.com/OpenMendel/SnpArrays.jl/blob/master/src/linalg_direct.jl), however, appears to be instantiated from the original `SnpArray`, so perhaps it does NOT require extra memory for the genotypes. Let's test this by writing through the `SnpLinAlg`'s `s` field and checking whether the original `SnpArray` changes.

In [26]:
X = SnpArray(undef, 100, 100)
Xsla = SnpLinAlg{Float64}(X, model=ADDITIVE_MODEL, center=true, scale=true);

@show X[1:10]
@show Xsla.s[1:10]
Xsla.s[1:10] .= 0x03
@show X[1:10]
@show Xsla.s[1:10];

X[1:10] = UInt8[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
Xsla.s[1:10] = UInt8[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
X[1:10] = UInt8[0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03]
Xsla.s[1:10] = UInt8[0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03]


Writing through `Xsla.s` changed `X` as well, so the two clearly refer to the same `SnpArray`. `SnpLinAlg` therefore seems better suited for MendelIHT, given the roughly 2x memory savings relative to `SnpBitMatrix`.
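The aliasing behind this conclusion can be illustrated without SnpArrays. The sketch below uses a hypothetical `Wrapper` struct as a stand-in for a type that stores a reference to its source array plus a small per-column vector. It is not the actual `SnpLinAlg` definition, only a minimal model of the behavior observed above.

```julia
# Hypothetical stand-in for a wrapper that keeps a reference to its source data.
struct Wrapper
    s::Matrix{UInt8}      # reference to the caller's matrix, not a copy
    μ::Vector{Float64}    # small per-column extras (e.g. column means)
end

raw = zeros(UInt8, 100, 100)
w = Wrapper(raw, zeros(100))

w.s[1:10] .= 0x03         # mutate through the wrapper, as done with Xsla.s above

println(w.s === raw)                 # identity check: same object, not a copy
println(raw[1:10])                   # the original sees the change
println(Base.summarysize(raw), " bytes raw, ", Base.summarysize(w), " bytes wrapped")
```

With SnpArrays loaded, the analogous identity check would presumably be `Xsla.s === X`; that is an assumption based on the mutation test above rather than something verified here.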