Remove integer conversion and arithmetic #38
Codecov Report
@@ Coverage Diff @@
## master #38 +/- ##
===========================================
+ Coverage 98.41% 100.00% +1.58%
===========================================
Files 3 3
Lines 126 112 -14
===========================================
- Hits 124 112 -12
+ Misses 2 0 -2
|
Re-added logical operations for nucleotides. I guess it actually is useful and meaningful to have |
Concerning BioJulia/BioSequences.jl#102, I think "encode" means the compression of the 8-bit primitive. What about using a method structure like this instead, which neatly skips around the need to choose a name?

    function (s::Type{BioSymbol})(bits::UInt8)
        return reinterpret(s, bits)
    end

    function (s::Type{BioSymbol})(char::Char)
        return ...
    end

|
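For illustration, here is a minimal, self-contained sketch of the constructor-style method structure suggested above. `ToySymbol` and `ToyNuc` are hypothetical stand-ins, not types from BioSymbols, and the character table is an assumption covering only four bases. The sketch dispatches on `Type{T} where T<:ToySymbol` so that the method applies to concrete subtypes.

```julia
# Hypothetical stand-in types; not part of BioSymbols.
abstract type ToySymbol end
primitive type ToyNuc <: ToySymbol 8 end

# Construct any concrete ToySymbol subtype from its raw 8-bit encoding.
(::Type{T})(raw::UInt8) where {T<:ToySymbol} = reinterpret(T, raw)

# Construct a ToyNuc from a character (assumed table, four bases only).
function (::Type{ToyNuc})(c::Char)
    c == 'A' && return ToyNuc(0x01)
    c == 'C' && return ToyNuc(0x02)
    c == 'G' && return ToyNuc(0x04)
    c == 'T' && return ToyNuc(0x08)
    throw(ArgumentError("unsupported character: $c"))
end
```

With this structure the type itself acts as the conversion function, for example `ToyNuc(0x02)` or `ToyNuc('C')`, so no separate function name is needed.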
Yeah, I think keeping the logical and bitwise ops is a good idea. The docs describe how the symbols are encoded - DNA being "one-hot", and the logical ops then make implementing a lot of things like complements and set memberships and so on a breeze. |
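As a rough sketch of why bitwise operations are convenient on a one-hot encoding, the following uses a hypothetical `ToyDNA` type rather than the real `DNA` type. The bit layout (A = `0x01`, C = `0x02`, G = `0x04`, T = `0x08`) and the helper names `rawbits`, `shares_base` and `toy_complement` are assumptions made for this example only.

```julia
# Hypothetical stand-in for a nucleotide type; not the BioSymbols definition.
primitive type ToyDNA 8 end

ToyDNA(raw::UInt8) = reinterpret(ToyDNA, raw)
rawbits(x::ToyDNA) = reinterpret(UInt8, x)

# One bit per unambiguous base; ambiguity codes are unions of those bits.
const TOY_A = ToyDNA(0x01)
const TOY_C = ToyDNA(0x02)
const TOY_G = ToyDNA(0x04)
const TOY_T = ToyDNA(0x08)
const TOY_M = ToyDNA(0x03)  # A or C

Base.:|(x::ToyDNA, y::ToyDNA) = ToyDNA(rawbits(x) | rawbits(y))
Base.:&(x::ToyDNA, y::ToyDNA) = ToyDNA(rawbits(x) & rawbits(y))

# Set membership: do the two symbols share at least one base?
shares_base(x::ToyDNA, y::ToyDNA) = rawbits(x & y) != 0x00

# Complement: reverse the low four bits, swapping A<->T and C<->G.
function toy_complement(x::ToyDNA)
    b = rawbits(x)
    ToyDNA(((b & 0x01) << 3) | ((b & 0x02) << 1) |
           ((b & 0x04) >> 1) | ((b & 0x08) >> 3))
end
```

Under these assumptions, `TOY_A | TOY_C` gives `TOY_M`, and the complement of an ambiguity code falls out of the same bit reversal without a lookup table.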
@jakobnissen, apologies, today I realise it is the 8-bit symbol encoding that is compressed. |
Can we merge this, @benjward ? |
@jakobnissen I think this is fine, will be a major release though. I'm happy with use of |
As discussed on Slack.

Currently, BioSymbols can be converted to integers.

The problem with this is that `convert` is called automatically by Julia in many circumstances, which can lead to unexpected behaviour.

The fact that `DNA_M` is internally stored as `0x03` is a somewhat arbitrary implementation detail and should not be part of the API for BioSymbols or BioSequences. Similarly, it's not entirely sensical why

In this PR, integer/biosymbol conversion and arithmetic are removed. This includes addition, subtraction, binary operations (`|`, `~` and `&`), and conversion.

* To get the encoded data of a symbol `x`, use `encoded_data(x)`.
* To go the other way, use `encode(T, x)`.

The relevant methods are:

* `encoded_data_eltype(::Type{T})`
* `encoded_data(x::T)`
* `encode(::Type{T}, x)`

The two latter methods fall back to using `reinterpret`.

Edit: This is very breaking and will with almost 100% certainty create problems for other BioJulia packages. I can fix those, if we agree this should be merged.
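To make the `reinterpret` fallback concrete, here is a minimal sketch of how the three methods could fit together. It is not the code from this PR: `SketchSymbol` and `SketchDNA` are hypothetical types, and the function bodies are assumptions based only on the description above.

```julia
# Hypothetical stand-in types; not part of BioSymbols.
abstract type SketchSymbol end
primitive type SketchDNA <: SketchSymbol 8 end

# Each symbol type declares which integer type holds its encoding.
encoded_data_eltype(::Type{SketchDNA}) = UInt8

# Generic fallbacks built on reinterpret (assumed implementation).
encoded_data(x::T) where {T<:SketchSymbol} =
    reinterpret(encoded_data_eltype(T), x)

encode(::Type{T}, x) where {T<:SketchSymbol} =
    reinterpret(T, convert(encoded_data_eltype(T), x))

# Round trip through the explicit functions; no convert or arithmetic
# methods are defined on SketchDNA itself.
m = encode(SketchDNA, 0x03)
raw = encoded_data(m)
```

The point of the design is that access to the raw bits becomes an explicit call, so Julia's implicit `convert` can no longer turn a symbol into an integer behind the caller's back.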