Add sequence view #120

jakobnissen · 2020-07-26T14:36:54Z

This adds a sequence view, SeqView, as discussed in #102.

Because a LongSequence and a SeqView share the same underlying encoding, many of the old LongSequence methods have been changed to take a SeqOrView (which is just Union{LongSequence, SeqView}).

You can construct a view with:

- SeqView(s::LongSequence, ::UnitRange)
- @view my_seq[1:5]
- SeqView(::Vector{UInt64}, ::UnitRange) (this one does not perform bounds checking)

Mutating methods that change the length of the view are intentionally not implemented, such as:

- filter!
- map!
- resize!
- ungap!
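
For comparison, here is a minimal sketch of how Base's own array views behave. It uses only Base Julia (a plain Vector and a SubArray), not SeqView, so it illustrates the general convention rather than this PR's implementation: writes through a view reach the parent, while a length-changing call such as resize! has no method for the view type.

```julia
# Base-only illustration; `a` and `v` are hypothetical names for this sketch.
a = collect(1:10)
v = @view a[1:5]      # a SubArray sharing memory with `a`

v[1] = 100            # element-wise mutation writes through to the parent
@assert a[1] == 100

# resize! is defined for Vector, but not for SubArray,
# so changing the length of the view raises a MethodError.
threw = try
    resize!(v, 3)
    false
catch err
    err isa MethodError
end
@assert threw
```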

codecov · 2020-07-26T14:40:26Z

Codecov Report

Merging #120 (ea3201f) into master (1cfc0aa) will decrease coverage by 0.16%.
The diff coverage is 85.36%.

@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
- Coverage   83.21%   83.04%   -0.17%     
==========================================
  Files          44       45       +1     
  Lines        2949     3061     +112     
==========================================
+ Hits         2454     2542      +88     
- Misses        495      519      +24

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `83.04% <85.36%> (-0.17%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/BioSequences.jl | `50.00% <ø> (ø)` | |
| src/biosequence/biosequence.jl | `54.54% <ø> (ø)` | |
| src/longsequences/constructors.jl | `85.93% <0.00%> (+3.67%)` | ⬆️ |
| src/minhash.jl | `68.62% <33.33%> (-2.21%)` | ⬇️ |
| src/biosequence/transformations.jl | `79.16% <54.54%> (-4.44%)` | ⬇️ |
| src/longsequences/transformations.jl | `93.81% <72.72%> (-6.19%)` | ⬇️ |
| src/longsequences/hash.jl | `88.65% <75.00%> (-11.35%)` | ⬇️ |
| src/longsequences/counting.jl | `91.30% <88.23%> (ø)` | |
| src/longsequences/seqview.jl | `93.54% <93.54%> (ø)` | |
| src/geneticcode.jl | `73.00% <100.00%> (-0.41%)` | ⬇️ |
... and 13 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1cfc0aa...09d7e92.

TransGirlCodes · 2020-08-01T23:29:58Z

If a LongSequence is edited so that it becomes shorter, say as a result of ungap! or resize!, will that cause a problem for a previously created view such as @view seq[90:100] when the sequence is resized to only 90 bp?

src/longsequences/transformations.jl

jakobnissen · 2020-08-02T14:43:25Z

> If a LongSequence is edited so that it becomes shorter, say as a result of ungap! or resize!, will that cause a problem for a previously created view such as @view seq[90:100] when the sequence is resized to only 90 bp?

Yes, the view is completely unsafe. In theory it could even segfault if the underlying array is truncated. Perhaps I could add another bounds check when accessing a view, verifying that the underlying buffer is long enough? That would slow down bounds-checked access just a little.

Edit: On second thought, that would help with the segfaults, but it would still not guard against out-of-bounds indexing, since e.g. a 10-nt nucleotide sequence still has a full 64 bits in the buffer despite only using 20 or 40 of them. Perhaps it is better to just state that views are unsafe.
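
To make the 20/40-bit point concrete, here is a small arithmetic sketch. It assumes symbols are packed at 2 or 4 bits each into 64-bit words, as the example above implies; `words_needed` and `word_exists` are hypothetical helpers written for this illustration and are not part of BioSequences.

```julia
# Hypothetical helper: number of UInt64 words needed to store
# `n` symbols at `bits` bits per symbol.
words_needed(n, bits) = cld(n * bits, 64)

# Hypothetical buffer-length check of the kind discussed above:
# does the word holding symbol `i` exist in a buffer of `nwords` words?
word_exists(i, bits, nwords) = cld(i * bits, 64) <= nwords

for bits in (2, 4)
    n = 10                          # a 10-nt sequence
    nwords = words_needed(n, bits)  # one word in both cases
    used = n * bits                 # 20 or 40 bits actually in use
    # Index 11 is past the end of the sequence, yet its word exists,
    # so a buffer-length check alone would not reject it.
    println(bits, "-bit encoding: ", used, " of ", 64 * nwords,
            " bits used; index 11 passes the buffer check: ",
            word_exists(11, bits, nwords))
end
```

In other words, a check against the buffer length only catches accesses that fall outside the allocated words, not accesses that fall outside the sequence.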

An alternative is for the view to contain a reference not to the vector, but to the LongSequence itself.

Edit: On third thought, the views in Base do not bounds check this way, so let's just keep the unsafe behaviour.

TransGirlCodes · 2020-08-08T13:35:07Z

OK, I'm happy with whatever behaviour most closely resembles what Julians already expect from a view, since that will be most consistent. We'll just make sure to document any pitfalls properly.

jakobnissen · 2020-08-23T14:41:37Z

@CiaranOMara @benjward As far as I can see, this PR is ready to go. I'll leave this PR up for a few months if you want to give it a test run. Maybe there are some convenient constructors I've forgotten to implement, or a well-hidden bug or whatever.

CiaranOMara · 2020-08-23T15:37:05Z

I think it’s fair to say that each BioSequence would have its view counterpart. To set up a consistent naming pattern for other views and to be more specific about what view SeqView provides, what about renaming SeqView to SubLongSequence? This name would also be more in line with Julia’s naming of SubArray.

jakobnissen · 2020-08-23T16:04:21Z

That's a good point. Perhaps my name is too generic. LongSubSequence is probably better (though technically, it doesn't need to be long, but the same could be said of LongSequence).

We will probably never need views for e.g. kmers, but surely there are sequence types other than LongSequence. So yeah, we should rename it.

CiaranOMara · 2021-02-11T09:12:19Z

I was proposing "Sub" as a prefix convention.

jakobnissen · 2021-02-25T12:28:03Z

I think this has been hanging around long enough. I'm merging this now and will begin using the master branch as my daily driver. Hopefully I can hammer out any remaining bugs that way.

jakobnissen added 4 commits July 24, 2020 16:14

Remove COW from LongSequence

bfa1f49

Improve LongSequence hashing

000b96e

Simplify code further

aa20e87

Amend docs, simplify hashing code

07890da

jakobnissen added 3 commits July 27, 2020 11:50

Fix ambiguity error on Julia 1.0

d6907b6

Added SeqView

60fc0d9

Improve tests

edb4b04

jakobnissen force-pushed the seqview branch from 90e783e to edb4b04 Compare July 27, 2020 14:00

TransGirlCodes reviewed Aug 1, 2020

jakobnissen added 2 commits August 23, 2020 16:24

Add documentation; improve tests

5abe69d

Merge branch 'master' into seqview

c96ae89

Rename SeqView to LongSubSeq

7b83950

jakobnissen added 2 commits February 24, 2021 20:49

Merge branch 'master' into seqview

c5b475b

Fix tests for subseq

09d7e92

jakobnissen merged commit 2527208 into BioJulia:master Feb 25, 2021

This was referenced Feb 25, 2021

julia 1.5: Time for views of a sequence? #102

Closed

Future of subsequences #118

Closed

Implement broadcasting for biosequences #135

Open
