Utilized __slots__ in Seq and SeqRecord classes, issue #2854 #3309
Codecov Report
@@            Coverage Diff             @@
##           master    #3309      +/-   ##
==========================================
- Coverage   83.98%   83.98%   -0.01%
==========================================
  Files         318      318
  Lines       51661    51677      +16
==========================================
+ Hits        43389    43402      +13
- Misses       8272     8275       +3
The flake8 tests failed; running black will fix most of them for you:
See the CONTRIBUTING file for details of how to install
Bio/SeqRecord.py
@@ -250,23 +274,10 @@ def __init__(
        self.features = features

        # TODO - Just make this a read only property?
This comment is obsolete with your proposed changes (but right now I don't see why you changed the SeqRecord, I thought you were only changing the Seq classes).
I will delete the comment.
I mentioned in #3309 that I wanted to try to implement __slots__ for both classes, Seq and SeqRecord. If this is a problem I can remove the changes for SeqRecord.
I think I have flake8/pre-commit installed incorrectly, because I do not get the same errors when checking locally. I will fix this.
Hmm, looks like the discussion on #2854 was unclear about if you would try the Then comes benchmarking... |
I did not write these descriptions so I am hesitant to change them since they were already part of the code and accepted. |
@peterjc I'm having trouble with pre-commit. I was hoping you could help me; I have never used this before and don't know how to fix my problem. It is blocking me from adding any changes now. This is the error I get when I try to commit my new changes:
I'm not working on my master branch but on the "slots" branch, so I'm not sure why it is trying to checkout.
@kaskales what OS do you have? Are you running |
I have Ubuntu 18.04.5 and I am using my system Python 2.7.17, I normally have to specify
|
See this issue: pre-commit/pre-commit#1036 It might be an issue with the pre-commit config? |
Regarding these:
The original property docstring was not style compliant, but since it was defined via the property function and not a decorated method, the flake8 plugin didn't catch it. You'll have to tweak the wording to get this to pass. While pre-commit is not working, I suggest installing flake8 and calling it directly - Update: In your case, |
Thanks @JoaoRodrigues ! I will take a look. I forgot to add this line which prints before the error occurs too.
@peterjc I was doing this and the only error I was getting was:
It wasn't catching the ones you commented on here. I am fixing them by hand, since I think the error was that I was using single quotes instead of double quotes.
Running That flake8 gave you a SyntaxError strongly suggests to me that flake8 is running under Python 2, not Python 3. |
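A quick way to check the Python 2 versus Python 3 suspicion is to ask the interpreter directly. The snippet below is only a sketch, written so that it also runs under Python 2; the variable names are illustrative and it assumes nothing about how flake8 or pre-commit was installed.

```python
import sys

# Which interpreter is actually executing this script?
print(sys.executable)
print(sys.version_info[:3])

# True only when running under Python 3 or later.
is_python3 = sys.version_info[0] >= 3

# Is flake8 importable from this same interpreter, and from where?
try:
    import flake8

    flake8_location = flake8.__file__
except ImportError:
    flake8_location = None

print(flake8_location)
```

Running this with both the system python and python3 shows which of them, if either, has flake8 installed.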
I would strongly suggest getting (mini)conda and creating a new Python3 environment for biopython where you install pre-commit. Makes it much easier to handle all these version issues than battling with system Python. |
@peterjc I fixed the flake8 errors by hand, I also used Thank you @JoaoRodrigues , I will look into getting miniconda too. |
Green lights - the tests passed. Do you want to run some FASTA or FASTQ parsing benchmarking now? Maybe try the example on #3188? P.S. It would be good if we can solve the pre-commit issue (to help avoid anyone else suffering from it). |
I looked at that PR and I used this to test:
Maybe it is because of the small test file I used, but I couldn't find much of a performance difference between with/without
That's disappointing, but how many sequences are there in We probably need to try a massive high throughput sequencing file to see much difference, with millions of short sequences. The overheads we're hoping to see an improvement on are per sequence. The example used on the other issue was from SRR12143416, which you can download from https://www.ebi.ac.uk/ena/browser/view/SRR12143416 with 35 million sequences in each of the paired FASTQ files. Converting either to FASTA would be fine for this test. Or, download from the SRA in their custom format and use their tool to convert to FASTA: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR12143416 (1.8GB)
I repeated the trials with the FASTA file you suggested, but I trimmed the file down to 1GB for my laptop. Without slots the average memory usage was 7446676 bytes over 36 seconds, and with slots it was 6462748 bytes over 33 seconds. The advantage of using slots is much more apparent when testing on larger files.
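For anyone wanting to see the per-object overhead in isolation, here is a minimal sketch using only the standard library. It is not the benchmark used above: PlainRecord, SlottedRecord and traced_bytes are hypothetical stand-ins, and the numbers it prints depend on the Python version.

```python
import tracemalloc


class PlainRecord:
    """Hypothetical record class that keeps a per-instance __dict__."""

    def __init__(self, seq, name):
        self.seq = seq
        self.name = name


class SlottedRecord:
    """Same attributes, declared in __slots__ so no per-instance __dict__."""

    __slots__ = ("seq", "name")

    def __init__(self, seq, name):
        self.seq = seq
        self.name = name


def traced_bytes(cls, count=100000):
    """Return the bytes allocated while building `count` instances of `cls`."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    records = [cls("ACGT", "read") for _ in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del records
    return after - before


plain_bytes = traced_bytes(PlainRecord)
slotted_bytes = traced_bytes(SlottedRecord)
print("without __slots__:", plain_bytes, "bytes")
print("with __slots__:   ", slotted_bytes, "bytes")
```

Both runs share the same string constants and build a list of the same length, so the difference comes from the instances themselves.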
FASTA or FASTQ? The percentage saving would be more obvious on FASTA without the quality scores. |
FASTA, I downloaded the SRA file from the link you sent me and used fastq-dump to convert it to a fasta format. |
Thank you for working on this. It was an interesting experiment even though the performance impact was minimal. Closing pull request. |
Added __slots__ to the Seq and SeqRecord classes. In SeqRecord the seq and letter_annotations attributes were changed into functions with property decorators.
I hereby agree to dual licence this and any previous contributions under both the Biopython License Agreement AND the BSD 3-Clause License.
I have read the CONTRIBUTING.rst file, have run pre-commit locally, and understand that AppVeyor and TravisCI will be used to confirm the Biopython unit tests and style checks pass with these changes.
I have added my name to the alphabetical contributors listings in the files NEWS.rst and CONTRIB.rst as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
Closes #2854
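For readers unfamiliar with the pattern in the description, the sketch below shows how __slots__ and property decorators can be combined. It is not the code from this pull request: MiniRecord, record_id and the underscore-prefixed slot names are assumptions made for illustration, and the validation is deliberately simplified.

```python
class MiniRecord:
    """Hypothetical, simplified record; not the actual Bio.SeqRecord code."""

    # Slot names must differ from the property names defined below,
    # so the stored values use underscore-prefixed names.
    __slots__ = ("_seq", "_per_letter_annotations", "record_id")

    def __init__(self, seq, record_id="<unknown id>"):
        self._seq = seq
        self._per_letter_annotations = {}
        self.record_id = record_id

    @property
    def seq(self):
        """The sequence itself."""
        return self._seq

    @seq.setter
    def seq(self, value):
        # Replacing the sequence drops annotations tied to the old letters.
        self._seq = value
        self._per_letter_annotations = {}

    @property
    def letter_annotations(self):
        """Dictionary of per-letter annotation for the sequence."""
        return self._per_letter_annotations

    @letter_annotations.setter
    def letter_annotations(self, value):
        if not isinstance(value, dict):
            raise TypeError("letter_annotations must be a dict")
        for key, per_letter in value.items():
            if len(per_letter) != len(self._seq):
                raise ValueError("wrong length for annotation %r" % key)
        self._per_letter_annotations = dict(value)


record = MiniRecord("ACGT", record_id="example")
record.letter_annotations = {"phred_quality": [40, 40, 38, 30]}

# With __slots__ there is no per-instance __dict__, so attributes that
# were not declared cannot be added.
try:
    record.comment = "not allowed"
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False
```

One consequence of this design is that any code which attaches ad hoc attributes to record objects would stop working, which is worth weighing against the memory saving.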