# Test SnpArray Linear Algebra

According to the [SnpArray documentation](https://openmendel.github.io/SnpArrays.jl/dev/#Linear-Algebra), there are at least two ways to perform linear algebra on a `SnpArray`. This notebook tests which method performs better and compares both to standard BLAS operations (default 8 BLAS threads).

In [1]:
using Revise
using SnpArrays
using BenchmarkTools
using LinearAlgebra

┌ Info: Precompiling SnpArrays [4e780e97-f5bf-4111-9dc4-b70aaf691b06]
└ @ Base loading.jl:1278
[ Info: Compiling VCF parser...
┌ Info: Precompiling BenchmarkTools [6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf]
└ @ Base loading.jl:1278


In [2]:
# load test data (no missing)
const EUR = SnpArray(SnpArrays.datadir("EUR_subset.bed"));

# convert to SnpLinAlg and SnpBitMatrix
const EURsla = SnpLinAlg{Float64}(EUR, model=ADDITIVE_MODEL, center=true, scale=true);
const EURsla_ = SnpLinAlg{Float64}(EUR, model=ADDITIVE_MODEL, center=true, scale=true, impute=false);
const EURbm = SnpBitMatrix{Float64}(EUR, model=ADDITIVE_MODEL, center=true, scale=true);

In [3]:
Threads.nthreads()

1
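Note that `Threads.nthreads()` reports Julia's own threads, which are configured separately from the BLAS thread pool mentioned in the introduction. A minimal sketch for checking both settings side by side, assuming Julia 1.6 or later (where `BLAS.get_num_threads()` is available):

```julia
using LinearAlgebra

# Julia-level threads (set with `julia -t N` or JULIA_NUM_THREADS)
println("Julia threads: ", Threads.nthreads())

# Threads used inside BLAS calls; assumes Julia >= 1.6 for get_num_threads
println("BLAS threads:  ", BLAS.get_num_threads())
```

If a single-threaded BLAS baseline is wanted for comparison, calling `BLAS.set_num_threads(1)` before benchmarking would be one way to get it.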

## Matrix vector multiplication (Xv)

In [4]:
v1 = randn(size(EUR, 1))
v2 = randn(size(EUR, 2))
A = convert(Matrix{Float64}, EUR, model=ADDITIVE_MODEL, center=true, scale=true);

SnpLinAlg

In [5]:
@benchmark mul!($v1, $EURsla, $v2)

BenchmarkTools.Trial: 
  memory estimate:  8.39 KiB
  allocs estimate:  161
  --------------
  minimum time:     8.461 ms (0.00% GC)
  median time:      9.418 ms (0.00% GC)
  mean time:        9.239 ms (0.00% GC)
  maximum time:     11.144 ms (0.00% GC)
  --------------
  samples:          542
  evals/sample:     1

SnpBitMatrix

In [6]:
@benchmark mul!($v1, $EURbm, $v2)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     28.202 ms (0.00% GC)
  median time:      31.474 ms (0.00% GC)
  mean time:        31.262 ms (0.00% GC)
  maximum time:     33.134 ms (0.00% GC)
  --------------
  samples:          160
  evals/sample:     1

BLAS

In [7]:
@benchmark mul!($v1, $A, $v2)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.606 ms (0.00% GC)
  median time:      12.410 ms (0.00% GC)
  mean time:        12.260 ms (0.00% GC)
  maximum time:     15.217 ms (0.00% GC)
  --------------
  samples:          408
  evals/sample:     1

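By median time, SnpLinAlg is the fastest (9.4 ms, versus 12.4 ms for BLAS and 31.5 ms for SnpBitMatrix). BLAS does have the lowest minimum time (6.6 ms versus 8.5 ms for SnpLinAlg), so the gap to BLAS is much narrower than the gap to SnpBitMatrix, which is about 3 times slower.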

## Transpose matrix vector multiplication (X'v)

SnpLinAlg

In [8]:
@benchmark mul!($v2, $EURsla', $v1)

BenchmarkTools.Trial: 
  memory estimate:  4.53 KiB
  allocs estimate:  83
  --------------
  minimum time:     7.374 ms (0.00% GC)
  median time:      8.068 ms (0.00% GC)
  mean time:        8.060 ms (0.00% GC)
  maximum time:     12.003 ms (0.00% GC)
  --------------
  samples:          621
  evals/sample:     1

SnpBitMatrix

In [9]:
@benchmark mul!($v2, $EURbm', $v1)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     23.022 ms (0.00% GC)
  median time:      24.236 ms (0.00% GC)
  mean time:        24.366 ms (0.00% GC)
  maximum time:     31.778 ms (0.00% GC)
  --------------
  samples:          206
  evals/sample:     1

BLAS

In [10]:
@benchmark mul!($v2, $A', $v1)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.023 ms (0.00% GC)
  median time:      9.551 ms (0.00% GC)
  mean time:        9.616 ms (0.00% GC)
  maximum time:     11.236 ms (0.00% GC)
  --------------
  samples:          520
  evals/sample:     1

Contrary to the [documentation](https://openmendel.github.io/SnpArrays.jl/dev/#Linear-Algebra), both $Ax$ and $A'x$ are faster on `SnpLinAlg` than on `SnpBitMatrix` in these benchmarks.
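To make the comparison concrete, the ratios can be computed from the timings reported above. This is only a sketch: the numbers are copied by hand from the `BenchmarkTools` output, and the names `median_ms`, `minimum_ms`, and `speedup` are introduced here for illustration.

```julia
# Times in ms, copied from the benchmark output above
# (sla = SnpLinAlg, bm = SnpBitMatrix, blas = dense Matrix{Float64})
median_ms = (
    Xv  = (sla = 9.418, bm = 31.474, blas = 12.410),
    Xtv = (sla = 8.068, bm = 24.236, blas = 9.551),
)
minimum_ms = (
    Xv  = (sla = 8.461, bm = 28.202, blas = 6.606),
    Xtv = (sla = 7.374, bm = 23.022, blas = 6.023),
)

# How many times faster `t` is than the reference time `ref`
speedup(t, ref) = round(ref / t, digits = 2)

for op in keys(median_ms)
    m, lo = median_ms[op], minimum_ms[op]
    println(op, ": SnpLinAlg vs SnpBitMatrix (median)  = ", speedup(m.sla, m.bm), "x")
    println(op, ": SnpLinAlg vs BLAS         (median)  = ", speedup(m.sla, m.blas), "x")
    println(op, ": SnpLinAlg vs BLAS         (minimum) = ", speedup(lo.sla, lo.blas), "x")
end
```

A ratio above 1 means SnpLinAlg is faster. The median and minimum rows can point in different directions for BLAS, which is worth keeping in mind when reading the results.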

## Does SnpLinAlg require more memory?

The [SnpBitMatrix](https://github.com/OpenMendel/SnpArrays.jl/blob/master/src/linalg_bitmatrix.jl) implementation requires allocating 2 `BitMatrix`s, so its memory usage is 2 bits per genotype. A [SnpLinAlg](https://github.com/OpenMendel/SnpArrays.jl/blob/master/src/linalg_direct.jl), however, appears to be instantiated from the original `SnpArray`. Does `SnpLinAlg` require more memory than the `SnpArray` alone?

In [12]:
@show Base.summarysize(EUR)
@show Base.summarysize(EURsla)
@show Base.summarysize(EURsla_)
@show Base.summarysize(EURbm);

Base.summarysize(EUR) = 6876757
Base.summarysize(EURsla) = 8177245
Base.summarysize(EURsla_) = 8177245
Base.summarysize(EURbm) = 6421960


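Going by these byte counts, `SnpLinAlg` needs about 19% more memory than the `SnpArray` alone and about 27% more than the `SnpBitMatrix`. Relative to a nominal 2 bits per genotype, that is roughly 2.5 bits per genotype.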
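The byte counts can also be put on a per-genotype scale. The sketch below assumes `EUR` is 379 × 54051 (the ranges used to index it elsewhere in this notebook); the helper `bits_per_genotype` is introduced here for illustration. Since `Base.summarysize` counts every field of each object, the figures come out above the nominal 2 bits, presumably because of padding and auxiliary vectors.

```julia
# Byte counts copied from the Base.summarysize output above
bytes = (EUR = 6876757, EURsla = 8177245, EURbm = 6421960)

# Assumption: dimensions of EUR (samples × SNPs)
n, p = 379, 54051

# Total reported bytes converted to bits, divided by the number of genotypes
bits_per_genotype(b) = 8b / (n * p)

for name in keys(bytes)
    println(rpad(string(name), 8),
            round(bits_per_genotype(bytes[name]), digits = 2),
            " bits per genotype")
end

println("SnpLinAlg / SnpArray     = ", round(bytes.EURsla / bytes.EUR, digits = 2))
println("SnpLinAlg / SnpBitMatrix = ", round(bytes.EURsla / bytes.EURbm, digits = 2))
```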

## mul! on a @view SnpLinAlg

`SnpLinAlg` behaves like a regular array, so we can take a `@view` of it. Let's test the performance of `mul!` on a viewed `SnpLinAlg`.

In [26]:
EURsla_sub = @view(EURsla[1:2:379, 1:2:54051]); # every other row and col

v1 = randn(size(EURsla, 1))
v2 = randn(size(EURsla_sub, 1))
v3 = randn(size(EURsla, 2))
v4 = randn(size(EURsla_sub, 2));

Full SnpLinAlg

In [27]:
@benchmark mul!($v1, $EURsla, $v3)

BenchmarkTools.Trial: 
  memory estimate:  8.39 KiB
  allocs estimate:  161
  --------------
  minimum time:     7.837 ms (0.00% GC)
  median time:      8.678 ms (0.00% GC)
  mean time:        8.964 ms (0.00% GC)
  maximum time:     10.882 ms (0.00% GC)
  --------------
  samples:          558
  evals/sample:     1

Viewed SnpLinAlg

In [28]:
@benchmark mul!($v2, $EURsla_sub, $v4)

LoadError: getindex not defined for SnpLinAlg{Float64}
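The error is consistent with how views work in Julia generally: creating a `SubArray` only needs the parent's `size`, but a `mul!` on the view falls back to the generic method, which reads elements one at a time through `getindex`. The sketch below reproduces this with a hypothetical toy type, `OpaqueMatrix`, which is not part of SnpArrays. It assumes, as the error message suggests, that the parent type has a specialized `mul!` but no `getindex`.

```julia
using LinearAlgebra

# Hypothetical toy type (not part of SnpArrays): an AbstractMatrix that defines
# `size` and a specialized `mul!`, but deliberately no `getindex`.
struct OpaqueMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(A::OpaqueMatrix) = size(A.data)

# Specialized product, dispatched on the concrete type only.
LinearAlgebra.mul!(y::AbstractVector, A::OpaqueMatrix, x::AbstractVector) =
    mul!(y, A.data, x)

A = OpaqueMatrix(randn(4, 6))
y = zeros(4)
mul!(y, A, randn(6))               # works: uses the specialized method

Asub = view(A, 1:2:4, 1:2:6)       # creating the view needs only `size`
err = try
    mul!(zeros(2), Asub, randn(3)) # generic fallback indexes element by element
    nothing
catch e
    e
end
println(sprint(showerror, err))
```

Under this reading, supporting views would take either a `getindex` method on the parent type or a `mul!` method specialized for the `SubArray` type.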