Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 797 #803

Merged
merged 2 commits into from
Jun 27, 2016
Merged

Issue 797 #803

merged 2 commits into from
Jun 27, 2016

Conversation

ctSkennerton
Copy link
Contributor

Same changes as #801 minus the change in doctest that I couldn't figure out how to fix

Implements a solution to #797 by checking for DBSeqRecord and explicitly returning SeqRecord

@codecov-io
Copy link

Current coverage is 78.71%

Merging #803 into master will not affect coverage as of 9e54d13

@@            master    #803   diff @@
======================================
  Files          315     315       
  Stmts        46619   46622     +3
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit          36694   36697     +3
  Partial          0       0       
  Missed        9925    9925       

Review entire Coverage Diff as of 9e54d13

Powered by Codecov. Updated on successful CI builds.

@peterjc
Copy link
Member

peterjc commented Jun 24, 2016

Alternative solution which does not change the SeqRecord but duplicates code in DBSeqRecord:
https://github.com/peterjc/biopython/commits/biosql_slicing

What do you think @ctSkennerton? Yours seems a better long term choice as avoids code duplication.

@ctSkennerton
Copy link
Contributor Author

I'm against duplicated code as the number of people that a looking at BioSQL is going to be less than for SeqRecord so there is a greater chance of things breaking.

Having said that, I think adding __getitem__ to BioSQL.BioSeq would be worth it if there are additional features that make use of the lazy loading from the database. For example instead of iterating over all of the features and then appending those that are inside the slice, a specific SQL query could be used to only return those features from the database, which would boost performance. (This is something I've thought about doing recently)

I see your comment in Bio.SeqRecord about keeping the api in sync - can this be enforced automatically using tests?

@peterjc
Copy link
Member

peterjc commented Jun 24, 2016

A more elegant fix would be to extend the DBSeqRecord.__init__ to take start and end parameters (defaulting to everything) and using these for the sequence and feature lookup. But that is more work.

@ctSkennerton
Copy link
Contributor Author

Yeah I think that is a nice approach but would require rewriting a bunch of the SQL statements. I think for now it would be best to keep it simple and not duplicate the code and then down the track think about adding it in.

@peterjc peterjc merged commit 48cd6bc into biopython:master Jun 27, 2016
MarkusPiotrowski pushed a commit to MarkusPiotrowski/biopython that referenced this pull request Oct 31, 2017
Squashed commit of pull request biopython#803, closes issue biopython#797.

Long term the DBSeqRecord could implement __getitem__,
perhaps by extending __init__ to include start/end options
(default to the full record) and acting as a windowed view
of the full record (e.g. loading only selected features on
demand).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants