Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update interface to version 3. #161

Merged
merged 13 commits into from
Jun 18, 2021
Merged

Conversation

jakobnissen
Copy link
Member

Ref #140 , #160 . Maybe needs to be merged with #160 .

This is a fairly large rewrite of BioSequences. The changes are breaking, and so needs to go into v3, but they appear bigger than they are. The large majority are internal changes.

API changes

Alphabet and BioSequence are now defined as interfaces. They have a small, minimal interface they must implement. You can find it in the docstrings in this PR for the two types - it's nearly identical to those discussed in #140 and #160. This is breaking, but unlikely to actually break anyone's code because it was difficult to actually implement these new types beforehand.

inbounds_getindex has been removed. This is part of a larger plan to make indexing more like vectors for v3.

Internal changes only

[ to be added ]

@jakobnissen
Copy link
Member Author

Ah, of course. It fails because it requires the latest master branch version of BioSymbols. The latest master contains breaking changes.

Copy link
Member

@CiaranOMara CiaranOMara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments pertain to the documentation and nomenclature, the code appears solid. However, there are some inconsistencies in nomenclature. The following terms are used interchangeably:

  • coding units/elements
  • symbol lists/sets
  • element/symbol/data

It is not always clear what is happening. Sometimes the general terms "element" or "data" are used in an abstract description. The problem with this is that we lose information about whether the element or data is a symbol that is encoded, or an encoding that is packed.

It may be easier to tighten the nomenclature with a separate pull request?

src/alphabet.jl Outdated Show resolved Hide resolved
src/alphabet.jl Outdated Show resolved Hide resolved
src/alphabet.jl Outdated Show resolved Hide resolved
src/alphabet.jl Outdated Show resolved Hide resolved
src/alphabet.jl Outdated Show resolved Hide resolved
src/alphabet.jl Outdated Show resolved Hide resolved
src/alphabet.jl Outdated Show resolved Hide resolved
`BioSequence` is the main abstract type of `BioSequences`.
Its subtypes are characterized by:
* Being a linear container type with random access and indices `Base.OneTo(length(x))`.
* Containing zero or more coding elements of type `encoded_data_eltype(typeof(x))`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* Containing zero or more encoded `BioSymbol` of type `encoded_data_eltype(typeof(x))`.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I think it's better to be explicit that the BioSequence contains the internal representation, so I've changed it to "* Containing zero or internal data elements of type encoded_data_eltype(typeof(x))."

test/alphabet.jl Outdated Show resolved Hide resolved
@@ -59,7 +71,7 @@ end
Base.show(io::IO, i::BitIndex) = print(io, '(', index(i), ", ", offset(i), ')')

"Extract the element stored in a packed bitarray referred to by bidx."
@inline function extract_encoded_element(bidx::BitIndex{N,W}, data::AbstractArray{W}) where {N,W}
@inline function extract_encoded_element(bidx::BitIndex{N,W}, data::Union{AbstractArray{W}, NTuple{T, W}}) where {N,W,T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a better function name be extract_packed_encoding?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, not sure. One of the salient things should be that it extracts the encoded version of a single element.

@jakobnissen
Copy link
Member Author

Thanks for the review! You are right. Except for the two points I commented, I have made the requested changes.

@TransGirlCodes
Copy link
Member

I'm happy for this to get merged first and then if there's anything not made moot in #160 it can be merged afterwards.

Set minimum required version of BioSymbols to 5.0.0
@codecov
Copy link

codecov bot commented Jun 17, 2021

Codecov Report

Merging #161 (48b236e) into v3 (d95fa67) will increase coverage by 2.01%.
The diff coverage is 90.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##               v3     #161      +/-   ##
==========================================
+ Coverage   80.12%   82.14%   +2.01%     
==========================================
  Files          31       31              
  Lines        2179     2156      -23     
==========================================
+ Hits         1746     1771      +25     
+ Misses        433      385      -48     
Flag Coverage Δ
unittests 82.14% <90.27%> (+2.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/BioSequences.jl 0.00% <ø> (ø)
src/biosequence/conversion.jl 76.47% <ø> (+11.76%) ⬆️
src/longsequences/copying.jl 97.45% <ø> (-0.11%) ⬇️
src/longsequences/counting.jl 91.30% <ø> (ø)
src/search/re.jl 85.20% <0.00%> (ø)
src/biosequence/copying.jl 45.45% <45.45%> (ø)
src/longsequences/seqview.jl 84.84% <69.23%> (-8.70%) ⬇️
src/alphabet.jl 81.66% <84.84%> (+14.01%) ⬆️
src/bit-manipulation/bitindex.jl 65.71% <85.71%> (+6.73%) ⬆️
src/longsequences/indexing.jl 93.54% <90.00%> (-6.46%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d95fa67...48b236e. Read the comment docs.

@TransGirlCodes
Copy link
Member

TransGirlCodes commented Jun 17, 2021

@jakobnissen I've released BioSymbols v5, and took the liberty to make some commits to your interfaces branch to get make BioSymbols 5 a requirement and to get tests to work.

One of the test failures was due to struct definitions in @testset blocks not being permitted in julia 1.0, so I tweaked that.

Tests work for julia 1 tests, but not for 1.0 now. The current reason seems to be a method ambiguity between the constructor on line 67 of longsequence/subseq.jl and the constructor on line 26 of longsequence/conversion.jl

I don't know what you plan to do about this, one solution I've found is to change the sebseq.jl function to:

function LongSequence{A}(seq::LongSubSeq{A}) where {A<:NucleicAcidAlphabet}
	_copy_seqview(LongSequence{A}, seq)
end

Which as far as I can tell achieves the same as the current

function (::Type{T})(seq::LongSubSeq{<:NucleicAcidAlphabet{N}}) where
{N, T<:LongSequence{<:NucleicAcidAlphabet{N}}}
	_copy_seqview(T, seq)
end

But without the ambiguity. I'm happy to make the change here and see if CI passes if you're occupied.

@TransGirlCodes
Copy link
Member

All tests are passing now for 1 and 1.0

@TransGirlCodes
Copy link
Member

TransGirlCodes commented Jun 17, 2021

@jakobnissen I'd like to merge this, this evening into v3 if that's alright.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants