Fix reversing seq with unused data #128

Merged
merged 1 commit into from
Jan 18, 2021

jakobnissen
Member

MWE to reproduce bug:

julia> using BioSequences

julia> seq = ungap!(dna"TAGC------------ACC")
7nt DNA Sequence:
TAGCACC

julia> reverse_complement(seq)
7nt DNA Sequence:
----GGT

What happened was that the reversing functions assumed that length(seq.data) == seq_data_len(typeof(Alphabet(seq)), length(seq)), which is not a valid guarantee (i.e., a LongSequence is allowed to have noncoding, unused elements at the end of its data vector).
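
The failure mode described above can be sketched in isolation. The following is a toy model, not the BioSequences implementation: `ToySeq`, `as_string`, `buggy_reverse`, and `fixed_reverse` are hypothetical names made up for illustration, and the buffer stores one byte per symbol rather than packed bits. It only shows why reversing the whole backing buffer, instead of the part in use, pulls stale trailing data to the front of the result.

```julia
# Toy stand-in for a packed sequence (hypothetical, not the BioSequences type).
struct ToySeq
    data::Vector{UInt8}  # backing buffer; may be longer than the sequence needs
    len::Int             # number of buffer elements actually in use
end

# Render only the used part of the buffer as text.
as_string(s::ToySeq) = String(s.data[1:s.len])

# Buggy: reverses the whole buffer, so stale trailing bytes move to the front.
buggy_reverse(s::ToySeq) = ToySeq(reverse(s.data), s.len)

# Fixed: reverses only the part of the buffer that is in use.
fixed_reverse(s::ToySeq) = ToySeq(reverse(s.data[1:s.len]), s.len)

# "TAGC" followed by two stale gap bytes left behind in the buffer.
s = ToySeq(collect(codeunits("TAGC--")), 4)

buggy = as_string(buggy_reverse(s))
fixed = as_string(fixed_reverse(s))
println(buggy)  # --CG
println(fixed)  # CGAT
```

In this sketch the stale gap bytes play the role of the data left behind by ungap! in the MWE, which is why gaps show up at the start of the buggy result.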

codecov bot commented Jan 14, 2021

Codecov Report

Merging #128 (76fcd87) into master (1617206) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   83.18%   83.21%   +0.03%     
==========================================
  Files          44       44              
  Lines        2949     2949              
==========================================
+ Hits         2453     2454       +1     
+ Misses        496      495       -1     
| Flag | Coverage Δ |
| unittests | 83.21% <100.00%> (+0.03%) ⬆️ |

| Impacted Files | Coverage Δ |
| src/longsequences/longsequence.jl | 77.77% <100.00%> (+2.77%) ⬆️ |
| src/longsequences/transformations.jl | 100.00% <100.00%> (ø) |
| src/biosequence/indexing.jl | 87.27% <0.00%> (+0.23%) ⬆️ |
| src/bit-manipulation/bitpar-compiler.jl | 62.40% <0.00%> (+0.79%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -69,7 +69,7 @@ Base.reverse(seq::LongSequence{<:Alphabet}) = _reverse(seq, BitsPerSymbol(seq))
 @inline function _reverse(seq::LongSequence{A}, B::BT) where {A <: Alphabet,
     BT <: Union{BitsPerSymbol{2}, BitsPerSymbol{4}, BitsPerSymbol{8}}}
     cp = LongSequence{A}(unsigned(length(seq)))
-    reverse_data_copy!(identity, cp.data, seq.data, B)
+    reverse_data_copy!(identity, cp.data, seq.data, seq_data_len(seq) % UInt, B)
Contributor

Taking the remainder like this is a clever way of converting an integer to an unsigned integer. I didn't realize what was going on here until I tried it:

julia> x = 1
1

julia> UInt(x)
0x0000000000000001

julia> x % UInt
0x0000000000000001

julia> UInt(x) == (x % UInt)
true

julia> BenchmarkTools.@benchmark UInt($(rand(Int)))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.059 ns (0.00% GC)
  median time:      0.070 ns (0.00% GC)
  mean time:        0.074 ns (0.00% GC)
  maximum time:     0.539 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> BenchmarkTools.@benchmark $(rand(Int)) % UInt
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.040 ns (0.00% GC)
  median time:      0.040 ns (0.00% GC)
  mean time:        0.045 ns (0.00% GC)
  maximum time:     15.710 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
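
The two conversions compared above agree for non-negative inputs such as a data length, but they are not interchangeable in general. As a small sketch using only Base Julia: the `UInt(x)` constructor is a checked conversion that throws an `InexactError` for negative values, while `x % UInt` wraps modulo the number of values representable in `UInt` and never throws. The variable names here are illustrative only.

```julia
x = -1

# Checked conversion: the constructor throws when the value does not fit.
constructor_threw = try
    UInt(x)
    false
catch err
    err isa InexactError
end

# Wrapping conversion: reduces modulo 2^(number of bits in UInt) and never throws.
wrapped = x % UInt

println(constructor_threw)         # true
println(wrapped == typemax(UInt))  # true
```

Because the wrapping form skips the range check, it suits values already known to be non-negative; for untrusted input the checked constructor is the safer choice.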

TransGirlCodes merged commit 1cfc0aa into BioJulia:master on Jan 18, 2021