Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read all ALT allele as 1 and REF allele as 0 in minor_allele_dosage!? #16

Closed
biona001 opened this issue May 16, 2021 · 1 comment
Closed

Comments

@biona001
Copy link
Member

Here is a test data (unphased VCF file converted to BGEN using qctool v2)
Archive 2.zip

I want to import all BGEN genotypes into numeric matrix, so I wrote the following function

using BGEN, VCFTools
function convert_gt(t::Type{T}, b::Bgen) where T <: Real
    n = n_samples(b)
    p = n_variants(b)
    G = Matrix{t}(undef, p, n)

    # loop over each variant
    i = 1
    for v in iterator(b; from_bgen_start=true)
        dose = minor_allele_dosage!(b, v; T=t)
        copyto!(@view(G[i, :]), dose)
        i += 1
    end
    return G
end

But imported genotype matrix does not agree with VCF file:

Gtest = convert_gt(Float64, Bgen("target.typedOnly.masked.bgen"))
Gtrue = VCFTools.convert_gt(Float64, "target.typedOnly.masked.vcf.gz", trans=true)
julia> all(Gtrue .== Gtest)
false

This seems to be because BGEN is checking which allele is the minor allele (so a 2 is swapped with 0 and 0 is swapped with 2 compared to VCF file)

idx = findall(skipmissing(Gtrue .!= Gtest)) # index where Gtrue and Gtest does not agree
julia> [Gtrue[idx] Gtest[idx]]
66811×2 Matrix{Union{Missing, Float64}}:
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 0.0  2.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 0.0  2.0
 0.0  2.0
 0.0  2.0
 0.0  2.0

Is there a way to instead read all ALT alleles as 1 and all REF allele as 0?

@biona001
Copy link
Member Author

biona001 commented May 17, 2021

Thanks for the tip of using ref_allele_dosage!. Now this works:

using BGEN, VCFTools
function convert_gt(t::Type{T}, b::Bgen) where T <: Real
    n = n_samples(b)
    p = n_variants(b)
    G = Matrix{t}(undef, p, n)

    # loop over each variant
    i = 1
    for v in iterator(b; from_bgen_start=true)
        dose = ref_allele_dosage!(b, v; T=t) # this reads REF allele as 1
        BGEN.alt_dosage!(dose, v.genotypes.preamble) # switch 2 and 0 (ie treat ALT as 1)
        copyto!(@view(G[i, :]), dose)
        i += 1
    end
    return G
end

Gtest = convert_gt(Float64, Bgen("target.typedOnly.masked.bgen"))
Gtrue = VCFTools.convert_gt(Float64, "target.typedOnly.masked.vcf.gz", trans=true)

julia> all(skipmissing(Gtrue .== Gtest))
true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant