-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update interface to version 3. #161
Conversation
Ah, of course. It fails because it requires the latest master branch version of BioSymbols. The latest master contains breaking changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments pertain to the documentation and nomenclature, the code appears solid. However, there are some inconsistencies in nomenclature. The following terms are used interchangeably:
- coding units/elements
- symbol lists/sets
- element/symbol/data
It is not always clear what is happening. Sometimes the general terms "element" or "data" are used in an abstract description. The problem with this is that we lose information about whether the element or data is a symbol that is encoded, or an encoding that is packed.
It may be easier to tighten the nomenclature with a separate pull request?
src/biosequence/biosequence.jl
Outdated
`BioSequence` is the main abstract type of `BioSequences`. | ||
Its subtypes are characterized by: | ||
* Being a linear container type with random access and indices `Base.OneTo(length(x))`. | ||
* Containing zero or more coding elements of type `encoded_data_eltype(typeof(x))`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Containing zero or more encoded `BioSymbol` of type `encoded_data_eltype(typeof(x))`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I think it's better to be explicit that the BioSequence contains the internal representation, so I've changed it to "* Containing zero or internal data elements of type encoded_data_eltype(typeof(x))
."
src/bit-manipulation/bitindex.jl
Outdated
@@ -59,7 +71,7 @@ end | |||
Base.show(io::IO, i::BitIndex) = print(io, '(', index(i), ", ", offset(i), ')') | |||
|
|||
"Extract the element stored in a packed bitarray referred to by bidx." | |||
@inline function extract_encoded_element(bidx::BitIndex{N,W}, data::AbstractArray{W}) where {N,W} | |||
@inline function extract_encoded_element(bidx::BitIndex{N,W}, data::Union{AbstractArray{W}, NTuple{T, W}}) where {N,W,T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a better function name be extract_packed_encoding
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, not sure. One of the salient things should be that it extracts the encoded version of a single element.
Thanks for the review! You are right. Except for the two points I commented, I have made the requested changes. |
I'm happy for this to get merged first and then if there's anything not made moot in #160 it can be merged afterwards. |
Set minimum required version of BioSymbols to 5.0.0
Codecov Report
@@ Coverage Diff @@
## v3 #161 +/- ##
==========================================
+ Coverage 80.12% 82.14% +2.01%
==========================================
Files 31 31
Lines 2179 2156 -23
==========================================
+ Hits 1746 1771 +25
+ Misses 433 385 -48
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@jakobnissen I've released BioSymbols v5, and took the liberty to make some commits to your interfaces branch to get make BioSymbols 5 a requirement and to get tests to work. One of the test failures was due to struct definitions in Tests work for julia 1 tests, but not for 1.0 now. The current reason seems to be a method ambiguity between the constructor on line 67 of longsequence/subseq.jl and the constructor on line 26 of longsequence/conversion.jl I don't know what you plan to do about this, one solution I've found is to change the sebseq.jl function to: function LongSequence{A}(seq::LongSubSeq{A}) where {A<:NucleicAcidAlphabet}
_copy_seqview(LongSequence{A}, seq)
end Which as far as I can tell achieves the same as the current function (::Type{T})(seq::LongSubSeq{<:NucleicAcidAlphabet{N}}) where
{N, T<:LongSequence{<:NucleicAcidAlphabet{N}}}
_copy_seqview(T, seq)
end But without the ambiguity. I'm happy to make the change here and see if CI passes if you're occupied. |
All tests are passing now for 1 and 1.0 |
@jakobnissen I'd like to merge this, this evening into v3 if that's alright. |
Ref #140 , #160 . Maybe needs to be merged with #160 .
This is a fairly large rewrite of BioSequences. The changes are breaking, and so needs to go into v3, but they appear bigger than they are. The large majority are internal changes.
API changes
Alphabet
andBioSequence
are now defined as interfaces. They have a small, minimal interface they must implement. You can find it in the docstrings in this PR for the two types - it's nearly identical to those discussed in #140 and #160. This is breaking, but unlikely to actually break anyone's code because it was difficult to actually implement these new types beforehand.inbounds_getindex
has been removed. This is part of a larger plan to make indexing more like vectors for v3.Internal changes only
[ to be added ]