Simple counts() method for pairwise alignments #4221
Just two suggestions:
- It may be better to return a namedtuple instead of a plain tuple, as future versions of this function may return additional numbers (e.g. number of gap opens, number of gap extensions, number of upper-vs-lower matches, etc.). Then users can use the namedtuple keywords to ensure that they are getting the number they think they are getting.
- For multiple (> 2) alignments, it is trivial to extend this function to include all pairs, as in
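    # Running totals over all pairs of rows
    gaps = identities = mismatches = 0
    for i, seq1 in enumerate(alignment):
        for j, seq2 in enumerate(alignment):
            # Stop at the diagonal so only pairs with j < i are compared
            if i == j:
                break
            for a, b in zip(seq1, seq2):
                if a == "-" or b == "-":
                    gaps += 1
                elif a == b:
                    identities += 1
                else:
                    mismatches += 1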
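Both suggestions can be combined in a minimal sketch. It assumes the alignment is simply a list of equal-length gapped strings; `AlignmentCounts` and `pairwise_counts` are hypothetical names for illustration, not part of the Biopython API, and `itertools.combinations` is swapped in for the nested loop to visit each unordered pair once.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical result type: named fields let callers ask for counts.gaps
# rather than rely on tuple position, and new fields can be added later.
AlignmentCounts = namedtuple("AlignmentCounts", ["gaps", "identities", "mismatches"])


def pairwise_counts(alignment):
    """Sum gap, identity and mismatch columns over every unordered pair of rows."""
    gaps = identities = mismatches = 0
    for seq1, seq2 in combinations(alignment, 2):
        for a, b in zip(seq1, seq2):
            if a == "-" or b == "-":
                gaps += 1
            elif a == b:
                identities += 1
            else:
                mismatches += 1
    return AlignmentCounts(gaps, identities, mismatches)


counts = pairwise_counts(["ACG-T", "ACGTT", "A-GAT"])
print(counts.gaps, counts.identities, counts.mismatches)
```

With three rows of five columns there are three pairs, so the three fields always sum to 15 here.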
Good idea on using a named tuple, although it is possible a future optional argument might change "identities" into something else like "close matches". Go with the current variable names?

Do you think the counts defined that way would be useful for n>2? If so, should we exclude double counting, i.e. something equivalent to this:

    for i, seq1 in enumerate(alignment):
        for j, seq2 in enumerate(alignment):
            # Don't count seq1 vs seq2 and seq2 vs seq1
            if i >= j:
                break
            for a, b in zip(seq1, seq2):
                if a == "-" or b == "-":
                    gaps += 1
                elif a == b:
                    identities += 1
                else:
                    mismatches += 1
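One way to check whether a given break condition avoids double counting is to list the index pairs the nested loop actually reaches. The sketch below is only an illustration; `visited_pairs` is a hypothetical helper, and the row count of 3 is arbitrary.

```python
from itertools import combinations


def visited_pairs(n, stop):
    """Return the (i, j) pairs reached when the inner loop breaks once stop(i, j) is true."""
    pairs = []
    for i in range(n):
        for j in range(n):
            if stop(i, j):
                break
            pairs.append((i, j))
    return pairs


# Breaking on the diagonal reaches each unordered pair exactly once (j < i).
break_on_equal = visited_pairs(3, lambda i, j: i == j)

# Breaking on i >= j already triggers at j == 0, so no pair is reached.
break_on_ge = visited_pairs(3, lambda i, j: i >= j)

# Reference set of unordered pairs for comparison.
unordered = {frozenset(p) for p in combinations(range(3), 2)}

print(break_on_equal)
print(break_on_ge)
```

Because the inner loop uses `break` rather than `continue`, the condition is evaluated first at `j == 0`, which is why the exact comparison matters.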
Yes;
I have seen this used before in Biopython, though I can't remember now where. In any case, it does not hurt to allow that possibility.
I believe that works out to be the same as breaking on |
You're probably right about the loop. I'll update this shortly...
Thank you - that looks better than the first iteration.
- I hereby agree to dual licence this and any previous contributions under both the Biopython License Agreement AND the BSD 3-Clause License.
- I have read the CONTRIBUTING.rst file, have run pre-commit locally, and understand that continuous integration checks will be used to confirm the Biopython unit tests and style checks pass with these changes.
- I have added my name to the alphabetical contributors listings in the files NEWS.rst and CONTRIB.rst as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
Closes #3538, while leaving scope for future enhancement.