-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added first pass at BiologicalSequence object #53
Conversation
Before: ====================================================================== FAIL: complement functions as expected ---------------------------------------------------------------------- Traceback (most recent call last): File "tests/test_core/test_sequence.py", line 183, in test_complement self.assertEqual(self.b1.complement(),DNASequence("CTAATGT")) AssertionError: CTAATGT != CTAATGT After: ====================================================================== FAIL: complement functions as expected ---------------------------------------------------------------------- Traceback (most recent call last): File "tests/test_core/test_sequence.py", line 183, in test_complement self.assertEqual(self.b1.complement(),DNASequence("CTAATGT")) AssertionError: <NucleotideSequence: CTAATGT (length: 7)> != <DNASequence: CTAATGT (length: 7)>
used [this page](http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html) as a source for the maps, will use a different source for the tests.
pulled test data from [this page](http://droog.gs.washington.edu/parc/images/iupac.html) which is different then where the mapping in the library code came from (see previous commit).
includes additions based on review @rob-knight's sequence (and derived) classes in PyCogent.
OK, I've either address all comments with code changes or responses. Thanks again for the input. This is ready for final review for merge. |
def __reversed__(self): | ||
return reversed(self._sequence) | ||
|
||
def __str__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the join here isn't needed because __init__
guarantees it, but in general, __str__
is usually human readable and __repr__
as unambiguous as possible, so it seems that both implementations are swapped.
If the sequences can get long, it can be better to return a shortened version of _sequence
(you don't want to have your terminal flooded by a really long string).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this in place to ease transition to an array-based representation. I do want this to be a string representation of the full sequence (that would be the way to get the full sequence into a python string, if you wanted to do anything with it there (e.g., write it to file).
from bipy.core.exception import BiologicalSequenceError | ||
|
||
|
||
def get_iupac_nucleotide_bases(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three functions seem a little out of place. Could the strings be stored as class variables (one in each of their respective classes)? You'd still be able to construct the alphabet the same way that you're currently doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that we might want access to them from other places in the code base. One alternative might be to make them class methods so they could be accessed as:
DNA.get_iupac_characters()
what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just:
DNA.iupac_characters()
as I agree with your other comment about not using get_
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And without the function call? Ie, storing them as class variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just storing them as class variables is fine:
DNA.iupac_characters
Is there a reason to make them static methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added as class variables, thanks for the input
|
||
|
||
class BiologicalSequence(Sequence): | ||
""" Base class for biological sequences """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a note here that BiologicalSequence
s are meant to be immutable, similar to strings? At least that's how they appear to me, since you've made all instance vars "private" with read-only properties, and methods return new copies of BiologicalSequence
s when appropriate.
class NucleotideSequence(BiologicalSequence): | ||
""" class for representing nucleotide sequences | ||
|
||
all IUPAC DNA/RNA characters are supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a note here about case-insensitivity?
description: a description or comment about the sequence (e.g., | ||
"green fluorescent protein") | ||
validate: if True, runs the is_valid method after construction | ||
and raises BiologicalSequenceError if is_valid == False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trailing )
Thanks @gregcaporaso, will merge! |
added first pass at BiologicalSequence object
Thanks for the input everyone! |
NOT READY FOR MERGEOK, this one is ready for review for merge. @wasade or @jrrideout, are one of you available to take the lead on the review?