# Minimizing allocations

It seems that [any allocation is detrimental to multithreading](https://discourse.julialang.org/t/poor-performance-on-cluster-multithreading/12248), since garbage collection is single-threaded (at least as of 2018). We cannot preallocate `M`, but all other intermediate matrices (e.g. `Xwork`, `Hwork`, `N`, and temporary vectors) can be preallocated and resized as needed.
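The sketch below shows the preallocation idea in a threaded loop. It is a minimal example, not MendelImpute's code. Each task owns one work matrix sized for the widest window, and `mul!` writes into a view of that buffer, so the loop body does not allocate new matrices. The kernel `window_gram_traces!` is hypothetical: it computes tr(H'H) for each window only because that result is easy to check. The sketch uses one buffer per task rather than a buffer indexed by `Threads.threadid()`, because tasks may migrate between threads in recent Julia versions.

```julia
using Base.Threads: @spawn, nthreads
using LinearAlgebra: mul!, tr

# Hypothetical kernel: out[w] = tr(H' * H) for each window matrix H = Hs[w],
# reusing one preallocated work matrix per task instead of allocating H' * H.
function window_gram_traces!(out::Vector{Float64}, Hs::Vector{Matrix{Float64}})
    maxcols = maximum(size(H, 2) for H in Hs)
    ntasks = min(nthreads(), length(Hs))
    # Strided split of the window indices: task t handles t, t + ntasks, ...
    chunks = [t:ntasks:length(Hs) for t in 1:ntasks]
    @sync for idx in chunks
        @spawn begin
            # Allocated once per task and sized for the widest window.
            work = Matrix{Float64}(undef, maxcols, maxcols)
            for w in idx
                H = Hs[w]
                k = size(H, 2)
                G = view(work, 1:k, 1:k)   # reuse the top-left k × k block
                mul!(G, transpose(H), H)   # in-place BLAS3 product
                out[w] = tr(G)
            end
        end
    end
    return out
end

# Usage: three windows of different widths.
Hs = [rand(50, k) for k in (10, 30, 20)]
out = zeros(length(Hs))
window_gram_traces!(out, Hs)
```

Since tr(H'H) is the sum of the squared entries of `H`, `out` can be checked against `[sum(abs2, H) for H in Hs]`.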

```julia
using Revise
using VCFTools
using MendelImpute
using GeneticVariation
using Random
using SparseArrays
using JLD2, FileIO, JLSO
using ProgressMeter
using GroupSlices
using ThreadPools
# using Plots
# using ProfileView
```

```
┌ Info: Precompiling MendelImpute [e47305d1-6a61-5370-bc5d-77554d143183]
└ @ Base loading.jl:1273
```


```julia
Threads.nthreads()
```

```
8
```

# Not yet optimized (7/11/2020)

### window by window intersection with global search

```julia
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false);

X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)
```

```
Importing reference haplotype data...

Importing genotype file...100%|█████████████████████████| Time: 0:00:06
Writing to file...100%|█████████████████████████████████| Time: 0:00:06

Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings:
    Data import                     = 7.59873 seconds
    Computing haplotype pair        = 2.23046 seconds
        BLAS3 mul! to get M and N      = 0.0580204 seconds per thread
        haplopair search               = 1.88659 seconds per thread
        finding redundant happairs     = 0.0253855 seconds per thread
    Phasing by win-win intersection = 0.18561 seconds
    Imputation                      = 6.8954 seconds

 17.168989 seconds (77.19 M allocations: 7.610 GiB, 6.38% gc time)
error_rate = 8.32693826254268e-5
```
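The timing line "BLAS3 mul! to get M and N" refers to in-place matrix multiplication. The sketch below shows that pattern under stated assumptions. It assumes `N` is a samples × haplotypes cross-product of a genotype window `X` and a haplotype window `H`. The notebook does not show how `M` and `N` are defined, so the sizes and orientation are illustrative only. The output buffer is allocated once. A window with fewer haplotypes writes into a view of the same buffer.

```julia
using LinearAlgebra: mul!

# Illustrative sizes: 100 samples, 512 SNPs per window, 600 haplotypes.
X = rand(Float32, 512, 100)   # SNPs × samples (illustrative genotypes)
H = rand(Float32, 512, 600)   # SNPs × haplotypes (illustrative reference)

# Allocating form: every call creates a new 100 × 600 matrix.
N_alloc = transpose(X) * H

# In-place form: allocate the output once, then overwrite it per window.
N = Matrix{Float32}(undef, size(X, 2), size(H, 2))
mul!(N, transpose(X), H)      # N = X' * H, computed by BLAS gemm

# A window with fewer haplotypes can write into a view of the same buffer.
k = 400
Nw = view(N, :, 1:k)
mul!(Nw, transpose(X), view(H, :, 1:k))

# Measure inside a function so the number reflects the call itself.
xtH!(out, X, H) = mul!(out, transpose(X), H)
xtH!(N, X, H)                 # first call triggers compilation
println("bytes allocated by mul!: ", @allocated(xtH!(N, X, H)))
```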


### haplotype thinning

```julia
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false, thinning_factor=100, max_haplotypes=100);

X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)
```

```
Importing reference haplotype data...

Importing genotype file...100%|█████████████████████████| Time: 0:00:06
Writing to file...100%|█████████████████████████████████| Time: 0:00:06

Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings:
    Data import                     = 7.80156 seconds
    Computing haplotype pair        = 3.30648 seconds
        screening for top haplotypes   = 0.402203 seconds per thread
        BLAS3 mul! to get M and N      = 2.35561 seconds per thread
        haplopair search               = 0.0949087 seconds per thread
        finding redundant happairs     = 0.0348157 seconds per thread
    Phasing by win-win intersection = 0.177686 seconds
    Imputation                      = 6.84377 seconds

 18.479679 seconds (77.72 M allocations: 7.541 GiB, 6.05% gc time)
error_rate = 8.550487693659426e-5
```
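The thinning run reports a separate "screening for top haplotypes" step, which is limited by `max_haplotypes`. How MendelImpute scores haplotypes is not shown here. Assuming some per-haplotype score vector exists, the selection of the top `k` can reuse one index buffer through Base's `partialsortperm!`. The helper `top_haplotypes!` and the scores below are hypothetical. The buffer is sized for the largest window, and a view of it is passed for each smaller window.

```julia
# Hypothetical: return the indices of the `k` highest-scoring haplotypes,
# sorting into a caller-supplied index buffer instead of a fresh vector.
function top_haplotypes!(ixbuf::Vector{Int}, scores::AbstractVector, k::Integer)
    d = length(scores)
    ix = view(ixbuf, 1:d)            # buffer is sized for the largest window
    k = min(k, d)
    # Returns a view of the first k entries of ix, in descending score order.
    return partialsortperm!(ix, scores, 1:k; rev = true)
end

ixbuf = Vector{Int}(undef, 1000)     # allocated once, reused across windows
scores = [0.2, 0.9, 0.1, 0.7, 0.5]   # illustrative per-haplotype scores
top = top_haplotypes!(ixbuf, scores, 2)
```

Here `top` is a view into `ixbuf`. Copy it if it must outlive the next call that reuses the buffer.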


### Lasso

```julia
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time hs, ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    lasso = 20, dynamic_programming=false, max_haplotypes=100);

# import imputed result and compare with true
X_mendel = convert_gt(Float32, outfile)
# X_complete (the true genotypes) was already loaded in an earlier cell:
# X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)
```

```
Importing reference haplotype data...

Importing genotype file...100%|█████████████████████████| Time: 0:00:06
Writing to file...100%|█████████████████████████████████| Time: 0:00:06

Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings:
    Data import                     = 7.80041 seconds
    Computing haplotype pair        = 0.683484 seconds
        BLAS3 mul! to get M and N      = 0.104721 seconds per thread
        haplopair search               = 0.307949 seconds per thread
        finding redundant happairs     = 0.0280975 seconds per thread
    Phasing by win-win intersection = 0.179585 seconds
    Imputation                      = 6.49275 seconds

 15.390669 seconds (77.19 M allocations: 7.610 GiB, 7.79% gc time)
error_rate = 8.685062226819259e-5
```
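The error-rate line repeated in every cell materializes a temporary `BitMatrix` for `X_mendel .!= X_complete`. That cost is negligible next to imputation, but the same count can be taken without the temporary. The sketch below shows one way; `error_rate` is a hypothetical helper, not part of MendelImpute or VCFTools. It divides by `length(A)`, which equals `n * p`, so it matches `sum(A .!= B) / n / p`.

```julia
# Hypothetical helper: fraction of mismatching entries, counted without
# allocating the BitMatrix that `A .!= B` would create.
function error_rate(A::AbstractMatrix, B::AbstractMatrix)
    size(A) == size(B) || throw(DimensionMismatch("A and B must have the same size"))
    mismatches = count(i -> A[i] != B[i], eachindex(A, B))
    return mismatches / length(A)
end

A = Float32[0 1 2; 2 1 0]
B = Float32[0 1 1; 2 2 0]
err = error_rate(A, B)   # 2 mismatches out of 6 entries
```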


# Optimized

```julia
# global search: use view for Xw_aligned
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false);

X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)
```

```
Importing reference haplotype data...

Importing genotype file...100%|█████████████████████████| Time: 0:00:07
Writing to file...100%|█████████████████████████████████| Time: 0:00:05

Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings:
    Data import                     = 7.98878 seconds
    Computing haplotype pair        = 2.2115 seconds
        BLAS3 mul! to get M and N      = 0.0589154 seconds per thread
        haplopair search               = 1.81845 seconds per thread
        finding redundant happairs     = 0.0241587 seconds per thread
    Phasing by win-win intersection = 0.181819 seconds
    Imputation                      = 6.31794 seconds

 16.931099 seconds (77.19 M allocations: 7.541 GiB, 7.82% gc time)
error_rate = 8.32693826254268e-5
```
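The change in this cell replaces a slice with a view. In Julia, `X[:, cols]` copies the selected block into a new array. `view(X, :, cols)` (or the `@views` macro) instead returns a `SubArray` that aliases the parent. The notebook does not show how `Xw_aligned` is built, so the sketch uses a generic matrix and a hypothetical 512-column window.

```julia
# Illustrative data: 100 samples × 10,000 SNPs, one 512-column window.
X = rand(Float32, 100, 10_000)
cols = 513:1024

window_sum_copy(X, cols) = sum(X[:, cols])        # X[:, cols] copies the block
window_sum_view(X, cols) = sum(view(X, :, cols))  # view(X, :, cols) aliases X

window_sum_copy(X, cols); window_sum_view(X, cols)   # compile both first
println("copy: ", @allocated(window_sum_copy(X, cols)), " bytes")
println("view: ", @allocated(window_sum_view(X, cols)), " bytes")
```

The copy allocates at least 100 × 512 × 4 bytes per call. The view shares memory with `X`, so writes through it modify `X`.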


```julia
# global search: Preallocate happair and hapscore
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false);

X_mendel = convert_gt(Float32, outfile)
# X_complete (the true genotypes) was already loaded in an earlier cell:
# X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)
```

```
Importing reference haplotype data...

Importing genotype file...100%|█████████████████████████| Time: 0:00:06
Writing to file...100%|█████████████████████████████████| Time: 0:00:06

Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings:
    Data import                     = 8.06199 seconds
    Computing haplotype pair        = 3.6892 seconds
        BLAS3 mul! to get M and N      = 0.0671328 seconds per thread
        haplopair search               = 2.91493 seconds per thread
        finding redundant happairs     = 0.0277278 seconds per thread
    Phasing by win-win intersection = 0.19773 seconds
    Imputation                      = 6.51697 seconds

 18.751781 seconds (77.19 M allocations: 7.540 GiB, 6.06% gc time)
error_rate = 8.32693826254268e-5
```
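The cell above preallocates `happair` and `hapscore`, the outputs of the global haplotype-pair search. The sketch below writes such results into caller-supplied vectors. It is not MendelImpute's implementation, and its scoring rule is an assumption: a least-squares fit of a sample's genotype x to h_i + h_j. Expanding ||x - h_i - h_j||^2 = x'x - 2x'h_i - 2x'h_j + ||h_i + h_j||^2 and dropping the constant x'x gives the score M[i, j] - 2N[k, i] - 2N[k, j]. Here M[i, j] = ||h_i + h_j||^2 and N = X'H. The function name and argument layout are hypothetical.

```julia
# Hypothetical exhaustive search over unordered haplotype pairs (i ≤ j).
# Assumed score: M[i, j] - 2N[k, i] - 2N[k, j], with M[i, j] = ||h_i + h_j||^2
# and N = X' * H (samples × haplotypes); results go into preallocated vectors.
function haplopair_search!(happair::Vector{Tuple{Int,Int}},
                           hapscore::Vector{Float32},
                           M::AbstractMatrix{Float32},
                           N::AbstractMatrix{Float32})
    n, d = size(N)
    @inbounds for k in 1:n
        best, bi, bj = Inf32, 1, 1
        for j in 1:d, i in 1:j
            s = M[i, j] - 2 * N[k, i] - 2 * N[k, j]
            if s < best
                best, bi, bj = s, i, j
            end
        end
        happair[k] = (bi, bj)    # written into preallocated storage
        hapscore[k] = best
    end
    return happair, hapscore
end

# Usage: 4 SNPs × 3 haplotypes; the single sample equals h1 + h3.
H = Float32[1 0 0; 0 1 0; 0 0 1; 1 1 0]
X = reshape(H[:, 1] .+ H[:, 3], :, 1)          # SNPs × samples
G = transpose(H) * H
M = [G[i, i] + G[j, j] + 2 * G[i, j] for i in 1:3, j in 1:3]
N = transpose(X) * H                            # 1 × 3
happair = Vector{Tuple{Int,Int}}(undef, size(N, 1))
hapscore = Vector{Float32}(undef, size(N, 1))
haplopair_search!(happair, hapscore, M, N)
```

Because the sample is exactly h1 + h3, the best pair is (1, 3). Its score is -x'x, which is -3 for this example.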


```julia
# haplotype thinning
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false, thinning_factor=100, max_haplotypes=100);

X_mendel = convert_gt(Float32, outfile)
# X_complete (the true genotypes) was already loaded in an earlier cell:
# X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)
```

```
Importing reference haplotype data...

Importing genotype file...100%|█████████████████████████| Time: 0:00:08
Writing to file...100%|█████████████████████████████████| Time: 0:00:06

Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings:
    Data import                     = 9.01578 seconds
    Computing haplotype pair        = 3.53233 seconds
        screening for top haplotypes   = 0.684178 seconds per thread
        BLAS3 mul! to get M and N      = 2.43814 seconds per thread
        haplopair search               = 0.0982254 seconds per thread
        finding redundant happairs     = 0.0371928 seconds per thread
    Phasing by win-win intersection = 0.192218 seconds
    Imputation                      = 7.38668 seconds

 20.508864 seconds (77.19 M allocations: 7.316 GiB, 5.95% gc time)
error_rate = 8.550487693659426e-5
```
