# Test imputation of untyped SNPs on chromosome 20

In [1]:
using Revise
using VCFTools
using MendelImpute
using GeneticVariation
using Random
using StatsBase
using CodecZlib
using ProgressMeter
using JLD2, FileIO, JLSO
using BenchmarkTools
using GroupSlices
using TimerOutputs
using LinearAlgebra

BLAS.set_num_threads(1)

┌ Info: Precompiling MendelImpute [e47305d1-6a61-5370-bc5d-77554d143183]
└ @ Base loading.jl:1278


# MendelImpute error rate (window-window intersection)
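
The error rate reported in this section is the fraction of genotype entries where the imputed matrix differs from the true one. The sketch below reproduces that calculation on two small made-up matrices (`X_imputed`, `X_truth`, and `X_gap` are hypothetical names and values, not the chr20 data), and shows that `mean` over the mismatch mask gives the same number as dividing the sum by `n` and `p`.

```julia
using Statistics  # standard library; provides mean

# Toy genotype matrices (hypothetical values, not the chr20 results).
X_imputed = Float32[0 1 2; 1 1 0]
X_truth   = Float32[0 1 1; 1 2 0]

n, p = size(X_imputed)
err_sum  = sum(X_imputed .!= X_truth) / n / p  # sum of mismatches over n*p entries
err_mean = mean(X_imputed .!= X_truth)         # same quantity via mean

# If one matrix still holds `missing`, the elementwise comparison
# propagates `missing`, and so does the sum.
X_gap = Union{Missing,Float32}[0 missing 1; 1 2 0]
err_with_gap = sum(X_imputed .!= X_gap) / n / p

# Skipping missing entries gives an error rate over the observed entries only.
err_observed = mean(skipmissing(X_imputed .!= X_gap))

println("error_rate = ", err_mean)
```

The sum-based form assumes neither matrix contains `missing` after imputation. If that is not guaranteed, the `skipmissing` variant is one way to keep the result numeric, at the cost of changing the denominator.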

In [2]:
Threads.nthreads()

8
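
The notebook runs with 8 Julia threads while BLAS is pinned to a single thread, presumably so that the two thread pools do not compete for the same cores (that motivation is an assumption, the notebook does not state it). A minimal sketch for checking both settings in a session:

```julia
using LinearAlgebra  # standard library; provides the BLAS module

# Julia's thread count is fixed at startup, for example with
# `julia --threads 8` or the JULIA_NUM_THREADS environment variable.
julia_threads = Threads.nthreads()

# Restrict BLAS to one thread, as done in the first cell.
BLAS.set_num_threads(1)

# BLAS.get_num_threads requires Julia 1.6 or later.
blas_threads = BLAS.get_num_threads()

println("Julia threads = ", julia_threads, ", BLAS threads = ", blas_threads)
```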

In [3]:
# 8 threads
Random.seed!(2020)
d       = 1000
tgtfile = "target.chr20.typedOnly.maf0.01.masked.vcf.gz"
reffile = "ref.chr20.maxd$d.maf0.01.excludeTarget.jlso"
outfile = "mendel.chr20.imputed.target.vcf.gz"
@time ph = phase(tgtfile, reffile, outfile = outfile, max_d = d,
    dynamic_programming = false);

# import the imputed genotypes and compare them with the true genotypes
X_mendel = convert_gt(Float32, outfile)
X_complete = convert_gt(Float32, "target.chr20.full.vcf.gz")
n, p = size(X_mendel)
println("error_rate = ", sum(X_mendel .!= X_complete) / n / p)

Importing reference haplotype data...


Importing genotype file...100%|█████████████████████████| Time: 0:00:28
Computing optimal haplotypes...100%|████████████████████| Time: 0:01:01
Phasing...100%|█████████████████████████████████████████| Time: 0:00:40
Writing to file...100%|█████████████████████████████████| Time: 0:00:10


Total windows = 3252, averaging ~ 510 unique haplotypes per window.

Timings: 
    Data import                     = 64.1681 seconds
        import target data             = 30.9919 seconds
        import compressed haplotypes   = 33.1763 seconds
    Computing haplotype pair        = 61.4522 seconds
        BLAS3 mul! to get M and N      = 0.96825 seconds per thread
        haplopair search               = 51.7265 seconds per thread
        initializing missing           = 1.30172 seconds per thread
        allocating and viewing         = 0.0801829 seconds per thread
        index conversion               = 0.0754617 seconds per thread
    Phasing by win-win intersection = 40.4594 seconds
        Window-by-window intersection  = 34.6499 seconds per thread
        Breakpoint search              = 0.714774 seconds per thread
        Recording result               = 2.32517 seconds per thread
    Imputation                     = 13.871 seconds
        Imputing missing               = 2.6