Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added first pass at BiologicalSequence object #53

Merged
merged 93 commits into from
Jan 27, 2014
Merged

added first pass at BiologicalSequence object #53

merged 93 commits into from
Jan 27, 2014

Conversation

gregcaporaso
Copy link
Contributor

NOT READY FOR MERGE

OK, this one is ready for review for merge. @wasade or @jrrideout, are one of you available to take the lead on the review?

Before:
======================================================================
FAIL: complement functions as expected
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_core/test_sequence.py", line 183, in test_complement
    self.assertEqual(self.b1.complement(),DNASequence("CTAATGT"))
AssertionError: CTAATGT != CTAATGT

After:
======================================================================
FAIL: complement functions as expected
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_core/test_sequence.py", line 183, in test_complement
    self.assertEqual(self.b1.complement(),DNASequence("CTAATGT"))
AssertionError: <NucleotideSequence: CTAATGT (length: 7)> != <DNASequence:
CTAATGT (length: 7)>
used [this
page](http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html)
as a source for the maps, will use a different source for the tests.
pulled test data from [this
page](http://droog.gs.washington.edu/parc/images/iupac.html) which is different
then where the mapping in the library code came from (see previous commit).
includes additions based on review @rob-knight's sequence (and derived) classes
in PyCogent.
@gregcaporaso
Copy link
Contributor Author

OK, I've either address all comments with code changes or responses. Thanks again for the input. This is ready for final review for merge.

def __reversed__(self):
return reversed(self._sequence)

def __str__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the join here isn't needed because __init__ guarantees it, but in general, __str__ is usually human readable and __repr__ as unambiguous as possible, so it seems that both implementations are swapped.

If the sequences can get long, it can be better to return a shortened version of _sequence (you don't want to have your terminal flooded by a really long string).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this in place to ease transition to an array-based representation. I do want this to be a string representation of the full sequence (that would be the way to get the full sequence into a python string, if you wanted to do anything with it there (e.g., write it to file).

from bipy.core.exception import BiologicalSequenceError


def get_iupac_nucleotide_bases():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three functions seem a little out of place. Could the strings be stored as class variables (one in each of their respective classes)? You'd still be able to construct the alphabet the same way that you're currently doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that we might want access to them from other places in the code base. One alternative might be to make them class methods so they could be accessed as:

DNA.get_iupac_characters()

what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just:

DNA.iupac_characters()

as I agree with your other comment about not using get_.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And without the function call? Ie, storing them as class variables?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just storing them as class variables is fine:

DNA.iupac_characters

Is there a reason to make them static methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added as class variables, thanks for the input



class BiologicalSequence(Sequence):
""" Base class for biological sequences """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a note here that BiologicalSequences are meant to be immutable, similar to strings? At least that's how they appear to me, since you've made all instance vars "private" with read-only properties, and methods return new copies of BiologicalSequences when appropriate.

@ghost ghost assigned jairideout Jan 27, 2014
class NucleotideSequence(BiologicalSequence):
""" class for representing nucleotide sequences

all IUPAC DNA/RNA characters are supported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a note here about case-insensitivity?

description: a description or comment about the sequence (e.g.,
"green fluorescent protein")
validate: if True, runs the is_valid method after construction
and raises BiologicalSequenceError if is_valid == False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing )

@jairideout
Copy link
Contributor

Thanks @gregcaporaso, will merge!

jairideout added a commit that referenced this pull request Jan 27, 2014
added first pass at BiologicalSequence object
@jairideout jairideout merged commit a6c3c8b into scikit-bio:master Jan 27, 2014
@gregcaporaso
Copy link
Contributor Author

Thanks for the input everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants