# HaploADMIXTURE.jl

This package is an open-source Julia implementation of HaploADMIXTURE, a method for ancestry inference that models haplotypes. Modeling haplotypes exploits the information shared between nearby SNPs and yields more accurate ancestry estimates.

It supports acceleration through multithreading and graphics processing units (GPUs). Because it operates directly on the PLINK BED data format, memory usage is highly efficient.
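
To make the memory claim concrete, here is a rough back-of-the-envelope sketch. It relies on the PLINK 1 BED layout (a 3-byte header followed by 2 bits per genotype, with each SNP padded to a whole byte) and compares it with a dense `Float64` matrix. The function names `bed_bytes` and `dense_bytes` are made up for illustration and are not part of the package; the dimensions are those of the example file used in the Basic Usage section.

```julia
# Hypothetical helpers, not part of HaploADMIXTURE.jl.
# PLINK 1 BED: 3 header bytes, then each SNP stores its genotypes at
# 2 bits per sample, padded up to a whole number of bytes.
bed_bytes(n_samples, n_snps) = 3 + cld(n_samples, 4) * n_snps

# Dense storage with one floating-point number per genotype.
dense_bytes(n_samples, n_snps; T=Float64) = sizeof(T) * n_samples * n_snps

n_samples, n_snps = 379, 54_051
println("BED:     ", bed_bytes(n_samples, n_snps), " bytes")
println("Float64: ", dense_bytes(n_samples, n_snps), " bytes")
```

For these dimensions the packed representation is roughly 32 times smaller than the dense `Float64` one.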

It estimates ancestry by maximum likelihood for large SNP genotype datasets in which individuals are assumed to be unrelated. The input is a binary PLINK 1 BED-formatted file (`.bed`). You will also need a value for $K$, the number of ancestral populations. One way to choose a good value of $K$ is the Akaike information criterion (AIC). If the number of SNPs is too large, you may choose to run on a subset of SNPs selected by their information content, using the blockwise [sparse $K$-means via feature ranking](https://github.com/kose-y/SKFR.jl) (SKFR) method. When SKFR is applied, it selects a given number of blocks of two nearby SNPs.
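
As a rough illustration of choosing $K$ by AIC, the sketch below computes $\mathrm{AIC} = 2m - 2\ell$ from a final loglikelihood $\ell$ and a parameter count $m$. The count used here ($I(K-1)$ free admixture proportions plus $3KJ$ free haplotype frequencies, since the four haplotype frequencies of a SNP pair sum to one) is an assumption about the model, and `n_free_params` and `aic` are hypothetical helper names, not package functions. The loglikelihood values are made-up placeholders for illustration only.

```julia
# Hypothetical helpers, not part of HaploADMIXTURE.jl.
# Assumed parameter count: each of the I samples has K proportions summing to 1,
# and each of the J SNP pairs has, per population, 4 haplotype frequencies summing to 1.
n_free_params(I, J, K) = I * (K - 1) + 3 * K * J

# Akaike information criterion; smaller is better.
aic(ll, I, J, K) = 2 * n_free_params(I, J, K) - 2 * ll

# Usage: fit the model for several values of K, record each final loglikelihood,
# and keep the K with the smallest AIC. These loglikelihoods are placeholders,
# not results of an actual run.
lls = Dict(2 => -1.40e7, 3 => -1.36e7, 4 => -1.33e7)
best_K = argmin(K -> aic(lls[K], 379, 27025, K), keys(lls))
println("K with the smallest AIC: ", best_K)
```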

## Installation

This package requires Julia v1.7 or later, which can be obtained from
<https://julialang.org/downloads/> or by building Julia from the sources in the
<https://github.com/JuliaLang/julia> repository.

The package can be installed by running the following code:
```julia
using Pkg
pkg"add https://github.com/kose-y/SKFR.jl"
pkg"add https://github.com/OpenMendel/OpenADMIXTURE.jl"
pkg"add https://github.com/OpenMendel/HaploADMIXTURE.jl"
```
For running the examples below, the following are also necessary. 
```julia
pkg"add SnpArrays DelimitedFiles StableRNGs"
```

For GPU support, an Nvidia GPU is required. Also, the following package has to be installed:
```julia
pkg"add CUDA"
```

## Basic Usage

We first import necessary packages:

```julia
using LinearAlgebra, Random, SnpArrays, StableRNGs
using HaploADMIXTURE
using DelimitedFiles
```

We will use the PLINK file included in the `SnpArrays` package, whose path is obtained by:

```julia
filename = SnpArrays.datadir("EUR_subset.bed");
```

This file contains information on 54,051 single nucleotide polymorphisms (SNPs) for 379 samples. The main driver function for admixture proportion estimation is called `run_admixture()`.

```julia
d, clusters, aims = HaploADMIXTURE.run_admixture(filename, 379, 27025, 4; T=Float64, use_gpu=false, rng=StableRNG(7856), admix_rtol=1e-5)
```

```
Using /home/kose/.julia/packages/SnpArrays/lx5Kb/src/../data/EUR_subset.bed as input.
Loading genotype data...
Loaded 379 samples and 27025 SNPs
[ Info: EM iter 1, ll: -1.4510327066059288e7
[ Info: EM iter 2, ll: -1.381644526909224e7
[ Info: EM iter 3, ll: -1.3630461295722805e7
[ Info: EM iter 4, ll: -1.3583905487448066e7
[ Info: EM iter 5, ll: -1.3560929046308124e7
initial ll: -1.3560929046308124e7
  0.850404 seconds (4 allocations: 64 bytes)
  0.000264 seconds (27 allocations: 12.156 KiB)
  0.998668 seconds (4 allocations: 64 bytes)
  0.134181 seconds (27 allocations: 12.344 KiB)
  0.849196 seconds (4 allocations: 64 bytes)
  0.000252 seconds (27 allocations: 12.156 KiB)
  0.997638 seconds (4 allocations: 64 bytes)
  0.132407 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 1: ll=-1.3442143002068464e7, reldiff = 0.008759432619551937, ll_basic=-1.3439421296893205e7, ll_qn=-1.3442143002068464e7
  4.927956 seconds (123.29 k allocations: 21.585 MiB, 0.53% compilation time)
  0.851355 seconds (4 allocations: 64 bytes)
  0.000242 seconds (27 allocations: 12.156 KiB)
  0.998224 seconds (4 allocations: 64 bytes)
  0.122307 seconds (27 allocations: 12.344 KiB)
  0.851050 seconds (4 allocations: 64 bytes)
  0.000246 seconds (27 allocations: 12.156 KiB)
  0.998467 seconds (4 allocations: 64 bytes)
  0.121218 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 2: ll=-1.3383453322742362e7, reldiff = 0.004366095444533735, ll_basic=-1.3388259907460274e7, ll_qn=-1.3383453322742362e7
  4.859687 seconds (116.22 k allocations: 24.416 MiB)
  0.848979 seconds (4 allocations: 64 bytes)
  0.000273 seconds (27 allocations: 12.156 KiB)
  0.998561 seconds (4 allocations: 64 bytes)
  0.112486 seconds (27 allocations: 12.344 KiB)
  0.848001 seconds (4 allocations: 64 bytes)
  0.000256 seconds (27 allocations: 12.156 KiB)
  0.998448 seconds (4 allocations: 64 bytes)
  0.110201 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 3: ll=-1.3363811617880553e7, reldiff = 0.0014676111156177924, ll_basic=-1.3367645714833554e7, ll_qn=-1.3363811617880553e7
  4.916361 seconds (116.22 k allocations: 27.727 MiB, 1.25% gc time)
  0.851307 seconds (4 allocations: 64 bytes)
  0.000274 seconds (27 allocations: 12.156 KiB)
  0.998512 seconds (4 allocations: 64 bytes)
  0.110176 seconds (27 allocations: 12.344 KiB)
  0.851169 seconds (4 allocations: 64 bytes)
  0.000260 seconds (27 allocations: 12.156 KiB)
  0.997667 seconds (4 allocations: 64 bytes)
  0.104784 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 4: ll=-1.3355847390914964e7, reldiff = 0.0005959547465435272, ll_basic=-1.3355426940109456e7, ll_qn=-1.3355847390914964e7
  4.845048 seconds (116.22 k allocations: 27.727 MiB)
  0.851652 seconds (4 allocations: 64 bytes)
  0.000275 seconds (27 allocations: 12.156 KiB)
  0.998577 seconds (4 allocations: 64 bytes)
  0.103790 seconds (27 allocations: 12.344 KiB)
  0.851105 seconds (4 allocations: 64 bytes)
  0.000274 seconds (27 allocations: 12.156 KiB)
  0.997456 seconds (4 allocations: 64 bytes)
  0.107239 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 5: ll=-1.3350276186759759e7, reldiff = 0.0004171359549222248, ll_basic=-1.3349978349386934e7, ll_qn=-1.3350276186759759e7
  4.839004 seconds (116.22 k allocations: 27.727 MiB)
  0.851309 seconds (4 allocations: 64 bytes)
  0.000283 seconds (27 allocations: 12.156 KiB)
  0.998756 seconds (4 allocations: 64 bytes)
  0.105777 seconds (27 allocations: 12.344 KiB)
  0.851073 seconds (4 allocations: 64 bytes)
  0.000346 seconds (27 allocations: 12.156 KiB)
  0.998307 seconds (4 allocations: 64 bytes)
  0.106074 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 6: ll=-1.3347625577409597e7, reldiff = 0.00019854340937084056, ll_basic=-1.3348060337232983e7, ll_qn=-1.3347625577409597e7
  4.872304 seconds (116.22 k allocations: 27.727 MiB, 0.65% gc time)
  0.851336 seconds (4 allocations: 64 bytes)
  0.000558 seconds (27 allocations: 12.156 KiB)
  0.998534 seconds (4 allocations: 64 bytes)
  0.105784 seconds (27 allocations: 12.344 KiB)
  0.851012 seconds (4 allocations: 64 bytes)
  0.000684 seconds (27 allocations: 12.156 KiB)
  0.997427 seconds (4 allocations: 64 bytes)
  0.102341 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 7: ll=-1.3346229272063218e7, reldiff = 0.00010461076678253966, ll_basic=-1.3346683678589227e7, ll_qn=-1.3346229272063218e7
  4.835209 seconds (116.22 k allocations: 27.727 MiB)
  0.851328 seconds (4 allocations: 64 bytes)
  0.000576 seconds (27 allocations: 12.156 KiB)
  0.998478 seconds (4 allocations: 64 bytes)
  0.101554 seconds (27 allocations: 12.344 KiB)
  0.851036 seconds (4 allocations: 64 bytes)
  0.000621 seconds (27 allocations: 12.156 KiB)
  0.998357 seconds (4 allocations: 64 bytes)
  0.104624 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 8: ll=-1.3345567893618163e7, reldiff = 4.95554535721753e-5, ll_basic=-1.3345705325338118e7, ll_qn=-1.3345567893618163e7
  4.871630 seconds (116.22 k allocations: 27.727 MiB, 0.64% gc time)
  0.851274 seconds (4 allocations: 64 bytes)
  0.000836 seconds (27 allocations: 12.156 KiB)
  0.997119 seconds (4 allocations: 64 bytes)
  0.101553 seconds (27 allocations: 12.344 KiB)
  0.851026 seconds (4 allocations: 64 bytes)
  0.000899 seconds (27 allocations: 12.156 KiB)
  0.998418 seconds (4 allocations: 64 bytes)
  0.100059 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 9: ll=-1.334485185488965e7, reldiff = 5.365367245669872e-5, ll_basic=-1.334492648906096e7, ll_qn=-1.334485185488965e7
  4.824198 seconds (116.22 k allocations: 27.727 MiB)
  0.847601 seconds (4 allocations: 64 bytes)
  0.000966 seconds (27 allocations: 12.156 KiB)
  0.998507 seconds (4 allocations: 64 bytes)
  0.100239 seconds (27 allocations: 12.344 KiB)
  0.851055 seconds (4 allocations: 64 bytes)
  0.000832 seconds (27 allocations: 12.156 KiB)
  0.998339 seconds (4 allocations: 64 bytes)
  0.105940 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 10: ll=-1.3344485904661845e7, reldiff = 2.7422577019496487e-5, ll_basic=-1.334448965648377e7, ll_qn=-1.3344485904661845e7
  4.838603 seconds (116.22 k allocations: 27.727 MiB)
  0.851206 seconds (4 allocations: 64 bytes)
  0.000893 seconds (27 allocations: 12.156 KiB)
  0.998625 seconds (4 allocations: 64 bytes)
  0.102893 seconds (27 allocations: 12.344 KiB)
  0.850957 seconds (4 allocations: 64 bytes)
  0.001028 seconds (27 allocations: 12.156 KiB)
  0.998285 seconds (4 allocations: 64 bytes)
  0.103653 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 11: ll=-1.3344174847778792e7, reldiff = 2.3309768939482905e-5, ll_basic=-1.3344216738217931e7, ll_qn=-1.3344174847778792e7
  4.856887 seconds (116.22 k allocations: 27.727 MiB, 0.29% gc time)
  0.851310 seconds (4 allocations: 64 bytes)
  0.001101 seconds (27 allocations: 12.156 KiB)
  0.998448 seconds (4 allocations: 64 bytes)
  0.100925 seconds (27 allocations: 12.344 KiB)
  0.849687 seconds (4 allocations: 64 bytes)
  0.000964 seconds (27 allocations: 12.156 KiB)
  0.998326 seconds (4 allocations: 64 bytes)
  0.099733 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 12: ll=-1.3344074094647754e7, reldiff = 7.550345539300909e-6, ll_basic=-1.3344002897805652e7, ll_qn=-1.3344074094647754e7
```


(HaploADMIXTURE.AdmixData2{Float64, Float64}(379, 27025, 4, 3, [0.0659305858944505 1.0e-5 … 1.0e-5 0.9216355557347662; 0.2931541469415451 0.3952243217202208 … 0.013364364647133342 0.8908287248221434; 0.28216338908849464 1.0e-5 … 0.02357940880522459 0.9515433681229478; 0.3587518780755096 0.604755678279779 … 1.0e-5 0.977597786283387], [0.06688156566931577 1.0e-5 … 1.0e-5 0.9197043769377913; 0.29604713908141755 0.42915279024486425 … 0.013827308546145119 0.8890911533650797; 0.2844384519030023 1.0e-5 … 0.023681551651886415 0.952683767031023; 0.35263284334626427 0.5708272097551357 … 1.0e-5 0.9785123488109867], [0.06676375953175287 1.0e-5 … 1.0e-5 0.9199367750015377; 0.29574761860381416 0.42509942042450694 … 0.013772084998557576 0.8893414727887852; 0.2841836627728369 1.0e-5 … 0.02367025726164887 0.9525174405640243; 0.35330495909159604 0.574880579575493 … 1.0e-5 0.9783844763769592], [0.0659305858944505 1.0e-5 … 1.0e-5 0.9216355557347662; 0.2931541469415451 0.3952243217202208 … 0.01336436464713

The first argument is the path to the PLINK 1 `.bed` file. The second through fourth arguments are:
- `I`: Number of individuals. The first `I` individuals in the PLINK file are used for the analysis.
- `J`: Number of pairs of SNPs to be used for analysis. The first `2J` SNPs in the PLINK file are used for the analysis.
- `K`: Number of populations.

After the semicolon are the keyword arguments:
- `T`: Precision of the estimation. `Float64` or `Float32`. Default `Float64`. 
- `use_gpu`: Whether to use GPU for estimation. Default `false`. 
- `rng`: Random number generator. Default `Random.GLOBAL_RNG`. 
- `prefix`: Prefix of the output PLINK file that contains only the SNPs selected by SKFR. The output file is named `$(prefix)_$(K)_$(sparsity)aims.bed`. 
- `sparsity`: Number of pairs of SNPs selected by SKFR. Default `nothing`, in which case SKFR is skipped. 
- `skfr_tries`: Runs SKFR this many times and chooses the best clustering. Default 1. 
- `skfr_max_inner_iter`: Runs each SKFR for up to this many iterations or until convergence. Default 50. 
- `admix_n_iter`: Maximum number of iterations for ADMIXTURE. Default 1000. 
- `admix_rtol`: Convergence criterion in terms of relative change in loglikelihood. Default 1e-7. 
- `admix_n_em_iter`: Number of EM iterations to get a good initial guess for estimation. Default 5. 
- `Q`: Number of steps to be used in quasi-Newton acceleration. Default 3. 
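
The role of `admix_rtol` can be sketched as follows. The formula below, the change in loglikelihood relative to the previous value, is inferred from the `reldiff` values printed in the log of the example run above; it is an assumption rather than the package's confirmed implementation, and `reldiff` and `converged` are hypothetical names.

```julia
# Hypothetical helpers illustrating a relative-change stopping rule.
reldiff(ll_new, ll_old) = abs((ll_new - ll_old) / ll_old)
converged(ll_new, ll_old; rtol=1e-7) = reldiff(ll_new, ll_old) < rtol

# Values copied from the log of the first example run.
ll_init   = -1.3560929046308124e7   # "initial ll"
ll_iter1  = -1.3442143002068464e7   # ll after iteration 1
ll_iter11 = -1.3344174847778792e7   # ll after iteration 11
ll_iter12 = -1.3344074094647754e7   # ll after iteration 12

println(reldiff(ll_iter1, ll_init))                   # close to the logged reldiff of iteration 1
println(converged(ll_iter1, ll_init; rtol=1e-5))      # false: keep iterating
println(converged(ll_iter12, ll_iter11; rtol=1e-5))   # true: the run stops after iteration 12
```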

The outputs are:
- `d`: the structure that stores the admixture data. In particular, `d.p` stores the haplotype frequencies and `d.q` stores the admixture proportions. 
- `clusters`: cluster label of each sample; `nothing` if `sparsity == nothing`.
- `aims`: indices of the selected SNPs in decreasing order of importance; `nothing` if `sparsity == nothing`. 

To see the admixture proportion of each sample:

```julia
d.q
```

```
4×379 Matrix{Float64}:
 0.0659306  1.0e-5    1.0e-5    0.139574  …  0.00558938  1.0e-5   1.0e-5
 0.293154   0.395224  0.244018  0.464112     1.0e-5      1.0e-5   1.0e-5
 0.282163   1.0e-5    1.0e-5    0.396304     0.511126    0.99997  0.99997
 0.358752   0.604756  0.755962  1.0e-5       0.483274    1.0e-5   1.0e-5
```

Each column represents a sample, and each row represents a population.
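
A quick sanity check on such a matrix is that every column sums to (approximately) one. The sketch below uses a small stand-in matrix `q` built from the first three columns displayed above, so the values are rounded; with a real fit one would use `d.q` instead. It also picks the population with the largest proportion for each sample and writes the matrix to a text file with `DelimitedFiles`. The variable names are illustrative.

```julia
using DelimitedFiles

# Stand-in for d.q: the first three columns displayed above (rounded values).
q = [0.0659306 1.0e-5   1.0e-5;
     0.293154  0.395224 0.244018;
     0.282163  1.0e-5   1.0e-5;
     0.358752  0.604756 0.755962]

# Each column holds one sample's proportions, so it should sum to about 1.
colsums = vec(sum(q; dims=1))
println(all(isapprox.(colsums, 1; atol=1e-4)))

# Population with the largest proportion for each sample.
dominant = [argmax(col) for col in eachcol(q)]
println(dominant)

# Save the proportions with one sample per row and one population per column.
outfile = tempname()
writedlm(outfile, permutedims(q))
```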

To see the haplotype frequencies of the first alleles listed in the `.bim` file accompanying the `.bed` file:

```julia
d.p
```

```
4×108100 Matrix{Float64}:
 0.00189887  0.150532   0.0251855  …  0.0739429  1.0e-5     0.921636
 1.0e-5      0.0597565  0.0138474     0.0957969  0.0133644  0.890829
 1.0e-5      0.0797001  0.0123388     0.0198118  0.0235794  0.951543
 1.0e-5      0.119816   1.0e-5        0.0223822  1.0e-5     0.977598
```

The number of columns here, `108100`, is `4 * 27025`. Again, each row represents a population. Each contiguous four-column block holds the frequencies of the four haplotypes of one SNP pair, which add up to 1. For example, 

```julia
d.p[1, 1:4]
```

```
4-element Vector{Float64}:
 0.0018988672076842915
 0.1505318314348172
 0.025185507235799612
 0.8223837941216988
```

represents the haplotype frequencies of the first pair of SNPs in the PLINK file for the first population, in the order `0|0`, `0|1`, `1|0`, and `1|1`, where `0` denotes "allele 1" and `1` denotes "allele 2".
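
To relate these haplotype frequencies to single-SNP allele frequencies, one can sum over the other SNP of the pair. The sketch below uses the four values shown above and assumes that in `a|b` the first symbol refers to the first SNP of the pair and the second symbol to the second SNP; that reading, and all variable names, are illustrative assumptions.

```julia
# The four haplotype frequencies shown above, ordered 0|0, 0|1, 1|0, 1|1.
h = [0.0018988672076842915, 0.1505318314348172,
     0.025185507235799612, 0.8223837941216988]
h00, h01, h10, h11 = h

println(sum(h))          # approximately 1

# Marginal frequency of allele "0" at each SNP (assumed reading of a|b).
f_first  = h00 + h01     # first SNP carries "0"
f_second = h00 + h10     # second SNP carries "0"

# Linkage disequilibrium coefficient between the two SNPs.
D = h00 - f_first * f_second
println((f_first, f_second, D))

# For a whole row of d.p, a 4 x J layout puts one SNP pair in each column:
# H = reshape(d.p[1, :], 4, :)   # requires a fitted `d`
```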

The following shows the final loglikelihood of the parameters.

In [7]:
d.ll_new

-1.3344074094647754e7

The following is an example with `sparsity` defined. This example uses 10000 pairs of SNPs, i.e., 20000 SNPs. 

```julia
d, clusters, aims = HaploADMIXTURE.run_admixture(filename, 379, 10000, 4; T=Float64, use_gpu=false, rng=StableRNG(7856), sparsity=10000, admix_rtol=1e-5, prefix="./EUR_subset")
```

```
Using /home/kose/.julia/packages/SnpArrays/lx5Kb/src/../data/EUR_subset.bed as input.
cnt of sparse1:6
./EUR_subset_4_20000aims
Loading genotype data...
Loaded 379 samples and 10000 SNPs
[ Info: EM iter 1, ll: -5.525851499970987e6
[ Info: EM iter 2, ll: -5.262097888763453e6
[ Info: EM iter 3, ll: -5.190244671819323e6
[ Info: EM iter 4, ll: -5.172440855986926e6
initial ll: -5.163819048892333e6
  0.316168 seconds (4 allocations: 64 bytes)
[ Info: EM iter 5, ll: -5.163819048892333e6
  0.000247 seconds (27 allocations: 12.156 KiB)
  0.370845 seconds (4 allocations: 64 bytes)
  0.047828 seconds (27 allocations: 12.344 KiB)
  0.316039 seconds (4 allocations: 64 bytes)
  0.000244 seconds (27 allocations: 12.156 KiB)
  0.370071 seconds (4 allocations: 64 bytes)
  0.049618 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 1: ll=-5.095234386428468e6, reldiff = 0.013281771071853693, ll_basic=-5.092908738967335e6, ll_qn=-5.095234386428468e6
  1.845925 seconds (58.75 k allocations: 9.360 MiB)
  0.316152 seconds (4 allocations: 64 bytes)
  0.000245 seconds (27 allocations: 12.156 KiB)
  0.376847 seconds (4 allocations: 64 bytes)
  0.045715 seconds (27 allocations: 12.344 KiB)
  0.316086 seconds (4 allocations: 64 bytes)
  0.000249 seconds (27 allocations: 12.156 KiB)
  0.370140 seconds (4 allocations: 64 bytes)
  0.044192 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 2: ll=-5.068209114690185e6, reldiff = 0.005304029155217499, ll_basic=-5.070587028335551e6, ll_qn=-5.068209114690185e6
  1.845089 seconds (58.63 k allocations: 10.583 MiB)
  0.316200 seconds (4 allocations: 64 bytes)
  0.000255 seconds (27 allocations: 12.156 KiB)
  0.370626 seconds (4 allocations: 64 bytes)
  0.044215 seconds (27 allocations: 12.344 KiB)
  0.316089 seconds (4 allocations: 64 bytes)
  0.000256 seconds (27 allocations: 12.156 KiB)
  0.370302 seconds (4 allocations: 64 bytes)
  0.042538 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 3: ll=-5.058483402916277e6, reldiff = 0.0019189641851433926, ll_basic=-5.05971591811844e6, ll_qn=-5.058483402916277e6
  1.846694 seconds (58.63 k allocations: 11.816 MiB)
  0.316215 seconds (4 allocations: 64 bytes)
  0.000272 seconds (27 allocations: 12.156 KiB)
  0.370857 seconds (4 allocations: 64 bytes)
  0.043866 seconds (27 allocations: 12.344 KiB)
  0.317508 seconds (4 allocations: 64 bytes)
  0.000321 seconds (27 allocations: 12.156 KiB)
  0.383580 seconds (4 allocations: 64 bytes)
  0.039848 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 4: ll=-5.054904507371137e6, reldiff = 0.0007075036646510756, ll_basic=-5.053989314272972e6, ll_qn=-5.054904507371137e6
  1.858627 seconds (58.64 k allocations: 11.816 MiB)
  0.316162 seconds (4 allocations: 64 bytes)
  0.000295 seconds (27 allocations: 12.156 KiB)
  0.370839 seconds (4 allocations: 64 bytes)
  0.040766 seconds (27 allocations: 12.344 KiB)
  0.314125 seconds (4 allocations: 64 bytes)
  0.000289 seconds (27 allocations: 12.156 KiB)
  0.370828 seconds (4 allocations: 64 bytes)
  0.040438 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 5: ll=-5.05089891715014e6, reldiff = 0.0007924165956361982, ll_basic=-5.050808890519301e6, ll_qn=-5.05089891715014e6
  1.839599 seconds (58.64 k allocations: 11.816 MiB)
  0.316174 seconds (4 allocations: 64 bytes)
  0.000286 seconds (27 allocations: 12.156 KiB)
  0.370773 seconds (4 allocations: 64 bytes)
  0.039519 seconds (27 allocations: 12.344 KiB)
  0.316038 seconds (4 allocations: 64 bytes)
  0.000288 seconds (27 allocations: 12.156 KiB)
  0.370753 seconds (4 allocations: 64 bytes)
  0.038628 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 6: ll=-5.048942278834922e6, reldiff = 0.0003873841760273358, ll_basic=-5.0492884996244935e6, ll_qn=-5.048942278834922e6
  1.867011 seconds (58.63 k allocations: 11.816 MiB, 1.19% gc time)
  0.316158 seconds (4 allocations: 64 bytes)
  0.000289 seconds (27 allocations: 12.156 KiB)
  0.370846 seconds (4 allocations: 64 bytes)
  0.038279 seconds (27 allocations: 12.344 KiB)
  0.316046 seconds (4 allocations: 64 bytes)
  0.000290 seconds (27 allocations: 12.156 KiB)
  0.370743 seconds (4 allocations: 64 bytes)
  0.039384 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 7: ll=-5.047770124405816e6, reldiff = 0.00023215841346012068, ll_basic=-5.048131962733513e6, ll_qn=-5.047770124405816e6
  1.838665 seconds (58.64 k allocations: 11.816 MiB)
  0.316175 seconds (4 allocations: 64 bytes)
  0.000292 seconds (27 allocations: 12.156 KiB)
  0.370865 seconds (4 allocations: 64 bytes)
  0.040651 seconds (27 allocations: 12.344 KiB)
  0.316094 seconds (4 allocations: 64 bytes)
  0.000291 seconds (27 allocations: 12.156 KiB)
  0.370057 seconds (4 allocations: 64 bytes)
  0.037658 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 8: ll=-5.047160059443727e6, reldiff = 0.0001208583091253336, ll_basic=-5.047304857276616e6, ll_qn=-5.047160059443727e6
  1.829104 seconds (58.63 k allocations: 11.816 MiB)
  0.316144 seconds (4 allocations: 64 bytes)
  0.000360 seconds (27 allocations: 12.156 KiB)
  0.370922 seconds (4 allocations: 64 bytes)
  0.037766 seconds (27 allocations: 12.344 KiB)
  0.316048 seconds (4 allocations: 64 bytes)
  0.000495 seconds (27 allocations: 12.156 KiB)
  0.370779 seconds (4 allocations: 64 bytes)
  0.038488 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 9: ll=-5.04672723859276e6, reldiff = 8.575532494904331e-5, ll_basic=-5.046742000316248e6, ll_qn=-5.04672723859276e6
  1.829108 seconds (58.64 k allocations: 11.816 MiB)
  0.316173 seconds (4 allocations: 64 bytes)
  0.000438 seconds (27 allocations: 12.156 KiB)
  0.370818 seconds (4 allocations: 64 bytes)
  0.037354 seconds (27 allocations: 12.344 KiB)
  0.316031 seconds (4 allocations: 64 bytes)
  0.000496 seconds (27 allocations: 12.156 KiB)
  0.370103 seconds (4 allocations: 64 bytes)
  0.038534 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 10: ll=-5.046544089427066e6, reldiff = 3.6290680481622426e-5, ll_basic=-5.046567686450397e6, ll_qn=-5.046544089427066e6
  1.829496 seconds (58.64 k allocations: 11.816 MiB)
  0.316184 seconds (4 allocations: 64 bytes)
  0.000553 seconds (27 allocations: 12.156 KiB)
  0.370877 seconds (4 allocations: 64 bytes)
  0.037344 seconds (27 allocations: 12.344 KiB)
  0.316039 seconds (4 allocations: 64 bytes)
  0.000614 seconds (27 allocations: 12.156 KiB)
  0.370728 seconds (4 allocations: 64 bytes)
  0.038737 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 11: ll=-5.046441766786618e6, reldiff = 2.0275784504142524e-5, ll_basic=-5.046455377429166e6, ll_qn=-5.046441766786618e6
  1.850166 seconds (58.64 k allocations: 11.816 MiB, 2.19% gc time)
  0.316152 seconds (4 allocations: 64 bytes)
  0.000494 seconds (27 allocations: 12.156 KiB)
  0.370889 seconds (4 allocations: 64 bytes)
  0.037222 seconds (27 allocations: 12.344 KiB)
  0.316035 seconds (4 allocations: 64 bytes)
  0.000628 seconds (27 allocations: 12.156 KiB)
  0.370773 seconds (4 allocations: 64 bytes)
  0.037103 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 12: ll=-5.046367830851275e6, reldiff = 1.4651102452073928e-5, ll_basic=-5.046392931260402e6, ll_qn=-5.046367830851275e6
  1.830888 seconds (58.64 k allocations: 11.816 MiB)
  0.316133 seconds (4 allocations: 64 bytes)
  0.000488 seconds (27 allocations: 12.156 KiB)
  0.370859 seconds (4 allocations: 64 bytes)
  0.037489 seconds (27 allocations: 12.344 KiB)
  0.316056 seconds (4 allocations: 64 bytes)
  0.000486 seconds (27 allocations: 12.156 KiB)
  0.370052 seconds (4 allocations: 64 bytes)
  0.038658 seconds (27 allocations: 12.344 KiB)
[ Info: Iteration 13: ll=-5.046319427299974e6, reldiff = 9.591760435113265e-6, ll_basic=-5.046334252027263e6, ll_qn=-5.046319427299974e6
```


(HaploADMIXTURE.AdmixData2{Float64, Float64}(379, 10000, 4, 3, [1.0e-5 0.33326404938642595 … 0.28344710518992194 0.6011491619239099; 0.07289855987900877 0.13390061705597037 … 0.30974203968287856 0.3409426181222113; 0.49510416556920983 0.5328253335576035 … 0.26111656730581445 0.5630591999972048; 0.43198727455178143 1.0e-5 … 0.31927997487145227 0.5188558386425784], [1.0e-5 0.32868233767898125 … 0.2825780563232349 0.6013998562349263; 0.07195774040801553 0.1318127689754065 … 0.3098696970225646 0.34016162990071086; 0.49657542007143524 0.5394948933456122 … 0.26180169813985926 0.563460682873202; 0.43145683952054925 1.0e-5 … 0.3193299535156763 0.5184337403822538], [1.0e-5 0.32945237159972873 … 0.2829728052186874 0.6014265038605614; 0.07201946000685983 0.13245916348592296 … 0.30987185992617233 0.3403871458457224; 0.496497278673548 0.5380784649143482 … 0.2614780482797504 0.5633225751689016; 0.4314732613195922 1.0e-5 … 0.3192931169583276 0.5184660541774551], [1.0e-5 0.33326404938642595 … 0.283447

The following shows the clustering result of the samples:

```julia
clusters
```

```
379-element Vector{Int64}:
 1
 2
 1
 4
 2
 1
 2
 3
 2
 1
 1
 1
 1
 ⋮
 1
 4
 1
 4
 1
 2
 4
 4
 1
 4
 4
 4
```

And the following shows the SNPs selected.

```julia
aims
```

```
20000-element Vector{Int64}:
  5717
  5718
 43817
 43818
 43821
 43822
 51813
 51814
  5835
  5836
  5651
  5652
  5813
     ⋮
 14511
 14512
 35113
 35114
 44757
 44758
  2073
  2074
 26917
 26918
  6533
  6534
```
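
Because SKFR is applied blockwise, the entries of `aims` come in pairs of neighboring SNPs. The sketch below uses only the first twelve indices displayed above as a stand-in for the full vector and arranges them as a 2-row matrix with one selected pair per column. The names `aims_head`, `snp_pairs`, and `by_index` are illustrative.

```julia
# Stand-in for aims: the first twelve indices displayed above.
aims_head = [5717, 5718, 43817, 43818, 43821, 43822,
             51813, 51814, 5835, 5836, 5651, 5652]

# One selected SNP pair per column, still in decreasing order of importance.
snp_pairs = reshape(aims_head, 2, :)

# In this excerpt each pair consists of two adjacent SNP indices.
println(all(snp_pairs[2, :] .- snp_pairs[1, :] .== 1))

# The same pairs reordered by SNP index instead of importance.
by_index = snp_pairs[:, sortperm(snp_pairs[1, :])]
println(by_index)
```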

The haplotype frequencies can be shown with:

```julia
d.p
```

```
4×40000 Matrix{Float64}:
 0.201256   0.0740125  0.0199012  0.70483   …  1.0e-5  0.283447  0.601149
 0.0827117  0.0387316  0.0107814  0.867775     1.0e-5  0.309742  0.340943
 0.153377   0.0370591  1.0e-5     0.809554     1.0e-5  0.261117  0.563059
 0.180305   0.0516882  1.0e-5     0.767997     1.0e-5  0.31928   0.518856
```

Note: the columns of `d.p` follow the index order of the selected SNPs (as in `sort(aims)`), not the order of importance. This can be verified by checking the `.bim` file generated along with the newly filtered `.bed` file. 

The admixture proportions can be viewed by:

```julia
d.q
```

```
4×379 Matrix{Float64}:
 1.0e-5     0.333264  0.276992  0.215578  …  0.934272   1.0e-5   1.0e-5
 0.0728986  0.133901  0.151148  0.116068     0.0259248  1.0e-5   1.0e-5
 0.495104   0.532825  0.57185   0.142498     0.0397931  1.0e-5   1.0e-5
 0.431987   1.0e-5    1.0e-5    0.525857     1.0e-5     0.99997  0.99997
```

The final loglikelihood is:

```julia
d.ll_new
```

```
-5.046319427299974e6
```

## Multithreading

If you have multiple CPU cores available, it is recommended to launch Julia with multiple threads, for example, by using the `-t` option from the terminal:
```bash
julia -t 8
```

You may also set up a multithreaded Jupyter kernel by following the instructions given [here](https://github.com/JuliaLang/IJulia.jl/issues/882). 
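
To confirm that the session actually has several threads, the count can be queried from within Julia. The sketch below uses only the standard `Threads` module; the small loop is just an illustration of a threaded computation and is unrelated to the package internals.

```julia
using Base.Threads

println("Running with ", nthreads(), " thread(s)")

# Toy threaded loop: each iteration writes to its own slot, so there is no data race.
squares = zeros(Int, 8)
@threads for i in eachindex(squares)
    squares[i] = i^2
end
println(squares)
```

If the first line reports a single thread, Julia was most likely started without the `-t` option.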

## GPU support
GPU acceleration is enabled by setting the keyword argument `use_gpu` to `true`. The parts that compute the gradients and Hessians of the loglikelihood are moved to the GPU.