# Minimizing allocations

It seems that [any allocation is detrimental to multithreading](https://discourse.julialang.org/t/poor-performance-on-cluster-multithreading/12248), since garbage collection is single-threaded (at least as of 2018). We cannot preallocate `M`, but all other intermediate matrices (e.g. `Xwork`, `Hwork`, `N`, temporary vectors) can be preallocated and scaled.
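As a rough illustration of the preallocation idea, the sketch below fills two preallocated matrices with in-place `mul!` calls. The sizes and the products `H'H` and `X'H` are assumptions made for illustration (guessed from the dimensions of `M` and `N` used later in this notebook); they are not taken from the MendelImpute source.

```julia
using LinearAlgebra

# Hypothetical stand-ins for one window of data (sizes are arbitrary).
H = rand(Float32, 64, 20)   # reference haplotypes in columns
X = rand(Float32, 64, 10)   # target genotypes in columns

# Preallocate the outputs once, outside any hot loop.
M = zeros(Float32, size(H, 2), size(H, 2))
N = zeros(Float32, size(X, 2), size(H, 2))

# In-place BLAS level-3 products: results are written into M and N
# instead of into freshly allocated matrices.
mul!(M, transpose(H), H)    # M = H'H (assumed form)
mul!(N, transpose(X), H)    # N = X'H (assumed form)
```

In a multithreaded loop, each thread would need its own copy of such buffers, as in the `[zeros(...) for _ in 1:threads]` pattern used further down.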

In [1]:
using Revise
using VCFTools
using MendelImpute
using GeneticVariation
using Random
using SparseArrays
using JLD2, FileIO, JLSO
using ProgressMeter
using GroupSlices
using ThreadPools
using BenchmarkTools
using StatsBase
# using Plots
# using ProfileView

┌ Info: Precompiling MendelImpute [e47305d1-6a61-5370-bc5d-77554d143183]
└ @ Base loading.jl:1273


In [2]:
Threads.nthreads()

8

# Not yet optimized (7/11/2020)

### window by window intersection with global search

In [5]:
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false);

X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)

Importing reference haplotype data...


Importing genotype file...100%|█████████████████████████| Time: 0:00:06
Writing to file...100%|█████████████████████████████████| Time: 0:00:06


Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings: 
    Data import                     = 7.59873 seconds
    Computing haplotype pair        = 2.23046 seconds
        BLAS3 mul! to get M and N      = 0.0580204 seconds per thread
        haplopair search               = 1.88659 seconds per thread
        finding redundant happairs     = 0.0253855 seconds per thread
    Phasing by win-win intersection = 0.18561 seconds
    Imputation                      = 6.8954 seconds

 17.168989 seconds (77.19 M allocations: 7.610 GiB, 6.38% gc time)
error_rate = 8.32693826254268e-5


### haplotype thinning

In [6]:
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false, thinning_factor=100, max_haplotypes=100);

X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)

Importing reference haplotype data...


Importing genotype file...100%|█████████████████████████| Time: 0:00:06
Writing to file...100%|█████████████████████████████████| Time: 0:00:06


Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings: 
    Data import                     = 7.80156 seconds
    Computing haplotype pair        = 3.30648 seconds
        screening for top haplotypes   = 0.402203 seconds per thread
        BLAS3 mul! to get M and N      = 2.35561 seconds per thread
        haplopair search               = 0.0949087 seconds per thread
        finding redundant happairs     = 0.0348157 seconds per thread
    Phasing by win-win intersection = 0.177686 seconds
    Imputation                      = 6.84377 seconds

 18.479679 seconds (77.72 M allocations: 7.541 GiB, 6.05% gc time)
error_rate = 8.550487693659426e-5


### Lasso

In [10]:
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time hs, ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    lasso = 20, dynamic_programming=false, max_haplotypes=100);

# import imputed result and compare with true
X_mendel = convert_gt(Float32, outfile)
# X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)

Importing reference haplotype data...


Importing genotype file...100%|█████████████████████████| Time: 0:00:06
Writing to file...100%|█████████████████████████████████| Time: 0:00:06


Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings: 
    Data import                     = 7.80041 seconds
    Computing haplotype pair        = 0.683484 seconds
        BLAS3 mul! to get M and N      = 0.104721 seconds per thread
        haplopair search               = 0.307949 seconds per thread
        finding redundant happairs     = 0.0280975 seconds per thread
    Phasing by win-win intersection = 0.179585 seconds
    Imputation                      = 6.49275 seconds

 15.390669 seconds (77.19 M allocations: 7.610 GiB, 7.79% gc time)
error_rate = 8.685062226819259e-5


# Optimized

In [4]:
# global search: Preallocate all vectors/matrices
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false);

X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)

Importing reference haplotype data...


Importing genotype file...100%|█████████████████████████| Time: 0:00:07
Writing to file...100%|█████████████████████████████████| Time: 0:00:06


Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings: 
    Data import                     = 8.25352 seconds
    Computing haplotype pair        = 2.23927 seconds
        BLAS3 mul! to get M and N      = 0.0659687 seconds per thread
        haplopair search               = 1.84551 seconds per thread
        finding redundant happairs     = 0.0299791 seconds per thread
    Phasing by win-win intersection = 0.18616 seconds
    Imputation                      = 6.77303 seconds

 17.844942 seconds (77.19 M allocations: 7.420 GiB, 7.17% gc time)
error_rate = 8.32693826254268e-5


In [3]:
# thinning: preallocate all vectors/matrices
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false, thinning_factor=100, max_haplotypes=100);

X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)

Importing reference haplotype data...


Importing genotype file...100%|█████████████████████████| Time: 0:00:06
Writing to file...100%|█████████████████████████████████| Time: 0:00:06


Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings: 
    Data import                     = 7.63071 seconds
    Computing haplotype pair        = 3.21096 seconds
        screening for top haplotypes   = 0.62019 seconds per thread
        BLAS3 mul! to get M and N      = 2.25471 seconds per thread
        haplopair search               = 0.0934914 seconds per thread
        finding redundant happairs     = 0.0637935 seconds per thread
    Phasing by win-win intersection = 0.194177 seconds
    Imputation                      = 6.52209 seconds

 17.627988 seconds (77.20 M allocations: 7.317 GiB, 5.14% gc time)
error_rate = 8.550487693659426e-5


In [10]:
# lasso: preallocate all vectors/matrices
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time hs, ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    lasso = 20, dynamic_programming=false, max_haplotypes=100);

# import imputed result and compare with true
X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "./compare2/target.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)
rm(outfile, force=true)

Importing reference haplotype data...


Importing genotype file...100%|█████████████████████████| Time: 0:00:07
Writing to file...100%|█████████████████████████████████| Time: 0:00:06


Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings: 
    Data import                     = 8.60205 seconds
    Computing haplotype pair        = 3.22841 seconds
        BLAS3 mul! to get M and N      = 0.0361663 seconds per thread
        haplopair search               = 0.180822 seconds per thread
        finding redundant happairs     = 0.0371089 seconds per thread
    Phasing by win-win intersection = 0.216741 seconds
    Imputation                      = 7.02347 seconds

 19.152174 seconds (77.19 M allocations: 7.420 GiB, 6.80% gc time)
error_rate = 8.685062226819259e-5


# Let's do rigorous benchmarks

In [5]:
# first import all data, declare a bunch of (needed or not) variables, and look at 1 window
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"

loaded = JLSO.load(reffile)
compressed_Hunique = loaded[:compressed_Hunique]
X, X_sampleID, X_chr, X_pos, X_ids, X_ref, X_alt = VCFTools.convert_gt(UInt8, tgtfile, trans=true, save_snp_info=true, msg = "Importing genotype file...");

people = size(X, 2)
tgt_snps = size(X, 1)
ref_snps = length(compressed_Hunique.pos)
tot_windows = floor(Int, tgt_snps / width)
avg_num_unique_haps = round(Int, avg_haplotypes_per_window(compressed_Hunique))
max_windows_per_chunks = nchunks(avg_num_unique_haps, nhaplotypes(compressed_Hunique), width, people, Threads.nthreads(), Base.summarysize(X), compressed_Hunique)
chunks = ceil(Int, tot_windows / min(tot_windows, max_windows_per_chunks))
num_windows_per_chunks = round(Int, tot_windows / chunks)
snps_per_chunk = num_windows_per_chunks * width
last_chunk_windows = tot_windows - (chunks - 1) * num_windows_per_chunks

ph = [HaplotypeMosaicPair(ref_snps) for i in 1:people]
redundant_haplotypes = [OptimalHaplotypeSet(num_windows_per_chunks, nhaplotypes(compressed_Hunique)) for i in 1:people]

chunk = 1
windows = (chunk == chunks ? last_chunk_windows : num_windows_per_chunks)
w_start = (chunk - 1) * num_windows_per_chunks + 1
w_end = (chunk == chunks ? tot_windows : chunk * num_windows_per_chunks)

MendelImpute.initialize!(redundant_haplotypes)
chunk == chunks && MendelImpute.resize!(redundant_haplotypes, last_chunk_windows)

winrange = w_start:w_end
people = size(X, 2)
ref_snps = length(compressed_Hunique.pos)
width = compressed_Hunique.width
windows = length(winrange)
threads = Threads.nthreads()
tothaps = nhaplotypes(compressed_Hunique)

# working arrays
happair1 = [ones(Int32, people)           for _ in 1:threads]
happair2 = [ones(Int32, people)           for _ in 1:threads]
hapscore = [zeros(Float32, size(X, 2))    for _ in 1:threads]
Xwork    = [zeros(Float32, width, people) for _ in 1:threads]
redunhaps_bitvec1 = [falses(tothaps) for _ in 1:threads]
redunhaps_bitvec2 = [falses(tothaps) for _ in 1:threads]

# window 1
absolute_w = 1
Hw_aligned = compressed_Hunique.CW_typed[absolute_w].uniqueH
Xw_idx_start = (absolute_w - 1) * width + 1
Xw_idx_end = absolute_w * width
Xw_aligned = view(X, Xw_idx_start:Xw_idx_end, :)
id = Threads.threadid();

Importing genotype file...100%|█████████████████████████| Time: 0:00:06


In [3]:
# global search must allocate M, Hwork, N for each window, but there's no other allocation
M = zeros(Float32, size(Hw_aligned, 2), size(Hw_aligned, 2))
N = zeros(Float32, size(Xw_aligned, 2), size(Hw_aligned, 2))
Hwork = convert(Matrix{Float32}, Hw_aligned)
@benchmark haplopair!($Xw_aligned, $Hw_aligned, happair1=$(happair1[id]), 
    happair2=$(happair2[id]), hapscore=$(hapscore[id]), Xwork=$(Xwork[id]),
    M=$M, N=$N, Hwork=$Hwork)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     121.945 ms (0.00% GC)
  median time:      124.566 ms (0.00% GC)
  mean time:        124.980 ms (0.00% GC)
  maximum time:     131.547 ms (0.00% GC)
  --------------
  samples:          41
  evals/sample:     1

In [5]:
# thinning needs a number of work vectors/matrices. Only R and Hwork cannot be preallocated;
# all allocations are due to the distance computation.
keep    = 100
maxindx = zeros(Int, keep)
maxgrad = zeros(Float32, keep)
Xi      = zeros(Float32, size(Xw_aligned, 1))
N       = zeros(Float32, keep)
Hk    = zeros(Float32, size(Hw_aligned, 1), keep)
M     = zeros(Float32, keep, keep)
Xwork = zeros(Float32, size(Xw_aligned, 1), size(Xw_aligned, 2))
Hwork = convert(Matrix{Float32}, Hw_aligned)
R     = rand(Float32, size(Hw_aligned, 2), size(Xw_aligned, 2))
@benchmark haplopair_thin_BLAS2!($Xw_aligned, $Hw_aligned, allele_freq=nothing, 
    keep=$keep, happair1=$(happair1[id]), happair2=$(happair2[id]), 
    hapscore=$(hapscore[id]), maxindx=$maxindx, maxgrad=$maxgrad, Xi=$Xi, N=$N, 
    Hk=$Hk, M=$M, Xwork=$Xwork, Hwork=$Hwork, R=$R)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     211.456 ms (0.00% GC)
  median time:      225.450 ms (0.00% GC)
  mean time:        225.303 ms (0.00% GC)
  maximum time:     244.914 ms (0.00% GC)
  --------------
  samples:          23
  evals/sample:     1

In [4]:
# lasso: here Hwork, M, Nt cannot be preallocated, but there's no other allocation.
r       = 10
maxindx = zeros(Int, r)
maxgrad = zeros(Float32, r)
Xwork = zeros(Float32, size(Xw_aligned, 1), size(Xw_aligned, 2))
Hwork = convert(Matrix{Float32}, Hw_aligned)
M     = zeros(Float32, size(Hw_aligned, 2), size(Hw_aligned, 2))
Nt    = zeros(Float32, size(Hw_aligned, 2), size(Xw_aligned, 2))
@benchmark haplopair_lasso!($Xw_aligned, $Hw_aligned, r=$r, happair1=$(happair1[id]), 
    happair2=$(happair2[id]), hapscore=$(hapscore[id]), maxindx=$maxindx, 
    maxgrad=$maxgrad, Xwork=$Xwork, Hwork=$Hwork, M=$M, Nt=$Nt)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     26.048 ms (0.00% GC)
  median time:      27.389 ms (0.00% GC)
  mean time:        27.534 ms (0.00% GC)
  maximum time:     32.354 ms (0.00% GC)
  --------------
  samples:          182
  evals/sample:     1

In [12]:
# computing redundant haps allocates in the first chunk
w = something(findfirst(x -> x == absolute_w, winrange)) # window index of current chunk
redundant_haplotypes = [OptimalHaplotypeSet(num_windows_per_chunks, nhaplotypes(compressed_Hunique)) for i in 1:people]
@time compute_redundant_haplotypes!(redundant_haplotypes, compressed_Hunique, 
    (happair1[id]), (happair2[id]), w, absolute_w, (redunhaps_bitvec1[id]), 
    (redunhaps_bitvec2[id]))

  0.005366 seconds (4.00 k allocations: 9.461 MiB)


In [13]:
# computing redundant haps has 0 allocations in subsequent chunks
w = something(findfirst(x -> x == absolute_w, winrange)) # window index of current chunk
@benchmark compute_redundant_haplotypes!($redundant_haplotypes, $compressed_Hunique, 
    $(happair1[id]), $(happair2[id]), $w, $absolute_w, $(redunhaps_bitvec1[id]), 
    $(redunhaps_bitvec2[id]))

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     593.756 μs (0.00% GC)
  median time:      607.433 μs (0.00% GC)
  mean time:        617.867 μs (0.00% GC)
  maximum time:     1.646 ms (0.00% GC)
  --------------
  samples:          8068
  evals/sample:     1

# Writing routine

In [10]:
# create variables
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
outfile = "./compare2/mendel.imputed.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, width = width,
    dynamic_programming = false);

loaded = JLSO.load(reffile)
compressed_Hunique = loaded[:compressed_Hunique]

X, X_sampleID, X_chr, X_pos, X_ids, X_ref, X_alt = VCFTools.convert_gt(UInt8, tgtfile, trans=true, save_snp_info=true, msg = "Importing genotype file...")
XtoH_idx = indexin(X_pos, compressed_Hunique.pos) # X_pos[i] == H_pos[XtoH_idx[i]]
X_full = Matrix{Union{Missing, UInt8}}(missing, length(compressed_Hunique.pos), size(X, 2))
copyto!(@view(X_full[XtoH_idx, :]), X); # keep known entries
impute_discard_phase!(X_full, compressed_Hunique, ph)

Importing reference haplotype data...


Importing genotype file...100%|█████████████████████████| Time: 0:00:07
Writing to file...100%|█████████████████████████████████| Time: 0:00:06


Total windows = 72, averaging ~ 627 unique haplotypes per window.

Timings: 
    Data import                     = 8.4725 seconds
    Computing haplotype pair        = 3.33952 seconds
        BLAS3 mul! to get M and N      = 0.0983789 seconds per thread
        haplopair search               = 2.90371 seconds per thread
        finding redundant happairs     = 0.0345083 seconds per thread
    Phasing by win-win intersection = 0.287375 seconds
    Imputation                      = 7.33042 seconds

 19.804090 seconds (78.25 M allocations: 7.471 GiB, 5.16% gc time)


Importing genotype file...100%|█████████████████████████| Time: 0:00:07


In [None]:
@time write(outfile, X_full, compressed_Hunique, X_sampleID)

In [18]:
# original
@btime write($outfile, $X_full, $compressed_Hunique, $X_sampleID) seconds=30

Writing to file...100%|█████████████████████████████████| Time: 0:00:06

  5.933 s (414103 allocations: 362.05 MiB)


In [16]:
# copy
@btime write($outfile, $X_full, $compressed_Hunique, $X_sampleID) seconds=30

Writing to file...100%|█████████████████████████████████| Time: 0:00:06

  6.211 s (414104 allocations: 362.05 MiB)


# Optimize window by window intersection

# using array of int

It seems that `intersect!` in Base allocates a lot, even for tiny inputs, and its implementation is hard to follow.

In [4]:
@btime intersect!(x, y) setup=(x = [1, 2, 3]; y = [1, 4])

  311.573 ns (15 allocations: 1.05 KiB)


1-element Array{Int64,1}:
 1

In [11]:
@which intersect!([1, 2, 3], [1, 4])

## Try writing our own non-allocating intersect

In our application, the two integer vectors are sorted and their elements are unique. The code below does not assume this, so faster implementations may exist.

In [18]:
"""
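    intersect!(v::AbstractVector{<:Integer}, u::AbstractVector{<:Integer}, seen::AbstractSet)

Computes `v ∩ u` in place and stores the result in `v`. Returns `nothing`.

# Arguments
- `v`: An integer vector, overwritten with the intersection
- `u`: An integer vector
- `seen`: Preallocated storage container (e.g. a `BitSet`); it is emptied on entry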
"""
function Base.intersect!(
    v::AbstractVector{<:Integer}, 
    u::AbstractVector{<:Integer}, 
    seen::AbstractSet
    )
    empty!(seen)
    for i in u
        push!(seen, i)
    end
    for i in Iterators.reverse(eachindex(v))
        @inbounds v[i] ∉ seen && deleteat!(v, i)
    end
    nothing
end

"""
    intersect_size(v::AbstractVector{<:Integer}, u::AbstractVector{<:Integer}, seen::AbstractSet=BitSet())

Computes the size of `v ∩ u` without modifying `v` or `u`. Assumes `v` is usually
smaller than `u` and each element in `v` is unique.

# Arguments
- `v`: An integer vector
- `u`: An integer vector
- `seen`: Preallocated storage container (e.g. a `BitSet`); it is emptied on entry
"""
function intersect_size(
    v::AbstractVector{<:Integer}, 
    u::AbstractVector{<:Integer}, 
    seen::AbstractSet=BitSet()
    )
    empty!(seen)
    for i in u
        push!(seen, i)
    end
    s = 0
    for i in eachindex(v)
        @inbounds v[i] ∈ seen && (s += 1)
    end
    return s
end
intersect_size(v::AbstractVector, u::Integer, seen) = u in v

intersect_size (generic function with 4 methods)

In [15]:
# correctness
seen = BitSet()
sizehint!(seen, 10000)
x = [1, 3, 4, 5, 7, 9]
y = [2, 3, 5, 6]
@show intersect_size(x, y, seen)
intersect!(x, y, seen)
@show x
@show y;

intersect_size(x, y, seen) = 2
x = [3, 5]
y = [2, 3, 5, 6]
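Since the vectors in our application are sorted with unique elements, one possible faster variant is a two-pointer merge that needs no `seen` container at all. This is only a sketch: the name `intersect_sorted!` is hypothetical, it is not part of MendelImpute, and it silently gives wrong answers if either input is unsorted.

```julia
# Sketch: in-place intersection of two sorted vectors with unique elements.
# `intersect_sorted!` is a hypothetical name, not an existing function.
function intersect_sorted!(v::Vector{<:Integer}, u::Vector{<:Integer})
    i = 1   # read position in v
    j = 1   # read position in u
    k = 0   # number of common elements written back into v so far
    @inbounds while i <= length(v) && j <= length(u)
        if v[i] < u[j]
            i += 1          # v[i] cannot appear later in u
        elseif v[i] > u[j]
            j += 1          # u[j] cannot appear later in v
        else
            k += 1
            v[k] = v[i]     # keep the common element (k <= i, so no overwrite hazard)
            i += 1
            j += 1
        end
    end
    resize!(v, k)           # shrink v to the common elements
    return v
end

x = [1, 3, 4, 5, 7, 9]
y = [2, 3, 5, 6]
intersect_sorted!(x, y)
@show x
```

Each element of `v` and `u` is visited at most once, and survivors are compacted to the front of `v` rather than removed one by one with `deleteat!`.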


### Timings

In [4]:
# Julia built in
@btime intersect!(x, y) setup=(x = rand(1:10000, 1000); y = rand(1:10000, 1000));

  70.590 μs (35 allocations: 95.62 KiB)


In [5]:
seen = BitSet()
sizehint!(seen, 10000)
@btime intersect!(x, y, $seen) setup=(x = rand(1:10000, 1000); y = rand(1:10000, 1000));

  2.768 μs (0 allocations: 0 bytes)


In [13]:
seen = BitSet()
sizehint!(seen, 10000)
@btime intersect_size(x, y, $seen) setup=(x = rand(1:10000, 1000); y = rand(1:10000, 1000));

  2.457 μs (0 allocations: 0 bytes)
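One caveat when timing functions that mutate their input: BenchmarkTools runs `setup` once per sample, not once per evaluation, so with several evaluations per sample the later calls see an `x` that was already intersected. A hedged sketch of a fairer measurement passes `evals=1`. The helper `keep_common!` below is a hypothetical stand-in for the in-place intersection above, renamed so that the example does not extend `Base`.

```julia
using BenchmarkTools

# Hypothetical stand-in for the in-place intersection defined earlier.
function keep_common!(v::Vector{Int}, u::Vector{Int}, seen::BitSet)
    empty!(seen)
    for i in u
        push!(seen, i)
    end
    filter!(in(seen), v)   # drop every element of v that is not in u
    return v
end

seen = BitSet()
sizehint!(seen, 10000)

# evals=1 forces one evaluation per sample, so `setup` regenerates
# x and y before every timed call.
t = @belapsed keep_common!(x, y, $seen) setup=(x = rand(1:10000, 1000);
    y = rand(1:10000, 1000)) evals=1
println("seconds per call: ", t)
```

Timings measured this way may come out larger than the ones reported above, because every timed call starts from a fresh 1000-element `x`.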


In [3]:
# first import all data, declare a bunch of (needed or not) variables, and look at 1 window
cd("/Users/biona001/.julia/dev/MendelImpute/simulation")
Random.seed!(2020)
width   = 512
tgtfile = "./compare2/target.typedOnly.maf0.01.masked.vcf.gz"
reffile = "./compare2/ref.excludeTarget.w$width.jlso"
loaded = JLSO.load(reffile)
compressed_Hunique = loaded[:compressed_Hunique]
X, X_sampleID, X_chr, X_pos, X_ids, X_ref, X_alt = VCFTools.convert_gt(UInt8, tgtfile, 
    trans=true, save_snp_info=true, msg = "Importing genotype file...");

# first person's optimal haplotype in each window (complete index)
happair1_original = [9, 9, 30, 218, 31, 31, 86, 30, 86, 218, 163, 163, 45, 45, 163, 687, 
    3, 3, 6, 687, 3, 170, 212, 687, 328, 687, 48, 67, 7, 7, 7, 7, 7, 7, 169, 169, 156, 
    156, 169, 169, 336, 539, 34, 300, 300, 300, 260, 284, 284, 1, 91, 91, 14, 104, 131, 
    131, 548, 8, 8, 8, 8, 8, 8, 183, 8, 23, 6, 117, 754, 190, 16, 16]
happair2_original = [5509, 45, 218, 5509, 218, 173, 218, 218, 218, 687, 218, 218, 163, 163, 
    1837, 709, 32, 687, 128, 1312, 202, 687, 277, 709, 328, 709, 475, 687, 687, 98, 98, 274, 
    169, 169, 709, 601, 709, 709, 384, 709, 709, 687, 171, 687, 426, 426, 284, 300, 539, 
    76, 617, 104, 104, 131, 1837, 140, 687, 687, 144, 687, 687, 233, 70, 233, 23, 1837, 
    23, 899, 2392, 1538, 78, 754];

Importing genotype file...100%|█████████████████████████| Time: 0:00:07


In [9]:
happair1 = copy(happair1_original)
happair2 = copy(happair2_original)
seen = BitSet()
survivors1=Int32[]
survivors2=Int32[]
sizehint!(seen, 60000)
sizehint!(survivors1, 60000)
sizehint!(survivors2, 60000)

@time phase_sample!(happair1, happair2, compressed_Hunique, seen, survivors1, survivors2)

  0.002382 seconds (4 allocations: 160 bytes)


In [7]:
seen = BitSet()
survivors1=Int32[]
survivors2=Int32[]
sizehint!(seen, 60000)
sizehint!(survivors1, 60000)
sizehint!(survivors2, 60000)

@btime phase_sample!(happair1, happair2, $compressed_Hunique, $seen, $survivors1,
    $survivors2) setup=(happair1=copy(happair1_original);happair2 = 
    copy(happair2_original))

  1.521 ms (0 allocations: 0 bytes)
