
Simple counts() method for pairwise alignments #4221

Merged (3 commits) on Jan 25, 2023


peterjc (Member) commented Jan 24, 2023

  • I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • I have read the CONTRIBUTING.rst file, have run pre-commit
    locally, and understand that continuous integration checks will be used to
    confirm the Biopython unit tests and style checks pass with these changes.

  • I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

Closes #3538, while leaving scope for future enhancement.

peterjc requested a review from mdehoon as a code owner on January 24, 2023 14:59

mdehoon (Contributor) left a comment

Just two suggestions:

  • It may be better to return a namedtuple instead of a plain tuple, as future versions of this function may return additional numbers (e.g. number of gap opens, number of gap extensions, number of upper-vs-lower matches, etc.). Users can then access each count by its field name and be sure they are getting the number they expect.
  • For multiple alignments (more than two sequences), it is trivial to extend this function to include all pairs, as in:
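```python
# alignment: a sequence of equal-length aligned strings, with "-" for gaps
gaps = identities = mismatches = 0
for i, seq1 in enumerate(alignment):
    for j, seq2 in enumerate(alignment):
        if i == j:
            break  # only j < i is visited, so each pair is counted once
        for a, b in zip(seq1, seq2):
            if a == "-" or b == "-":
                gaps += 1
            elif a == b:
                identities += 1
            else:
                mismatches += 1
```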
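A minimal sketch combining both suggestions, assuming a hypothetical `AlignmentCounts` namedtuple and a hypothetical `count_pairwise` helper that work on plain gapped strings. This is an illustration only, not Biopython's actual implementation. `itertools.combinations` is used here to visit each unordered pair of rows once; with exactly two rows it reduces to the plain pairwise case.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical result type; field names follow the discussion in this thread.
AlignmentCounts = namedtuple("AlignmentCounts", ["gaps", "identities", "mismatches"])


def count_pairwise(rows):
    """Count gaps, identities and mismatches over every pair of rows.

    Assumes rows is a sequence of equal-length strings with "-" for gaps.
    """
    gaps = identities = mismatches = 0
    # combinations() yields each unordered pair of rows exactly once.
    for seq1, seq2 in combinations(rows, 2):
        for a, b in zip(seq1, seq2):
            if a == "-" or b == "-":
                gaps += 1
            elif a == b:
                identities += 1
            else:
                mismatches += 1
    return AlignmentCounts(gaps, identities, mismatches)


counts = count_pairwise(["ACG-T", "ACGGT"])
print(counts)  # AlignmentCounts(gaps=1, identities=4, mismatches=0)
print(counts.identities)  # 4
```

Because the result is a namedtuple, callers who write `counts.identities` rather than relying on tuple position do not depend on the order of the fields.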

peterjc (Member, Author) commented Jan 25, 2023

Good idea on using a named tuple, although it is possible a future optional argument might change "identities" into something else like "close matches". Go with the current variable names?

Do you think the counts defined that way would be useful for n>2?

If so, should we exclude double counting - i.e. something equivalent to this:

```python
gaps = identities = mismatches = 0
for i, seq1 in enumerate(alignment):
    for j, seq2 in enumerate(alignment):
        # Don't count seq1 vs seq2 and seq2 vs seq1: only process j > i
        if i >= j:
            continue
        for a, b in zip(seq1, seq2):
            if a == "-" or b == "-":
                gaps += 1
            elif a == b:
                identities += 1
            else:
                mismatches += 1
```

mdehoon (Contributor) commented Jan 25, 2023

> Good idea on using a named tuple, although it is possible a future optional argument might change "identities" into something else like "close matches". Go with the current variable names?

Yes; "identities" is not ambiguous.

> Do you think the counts defined that way would be useful for n>2?

I have seen this used before in Biopython, though I can't remember where now. In any case, it does not hurt to allow that possibility.

> If so, should we exclude double counting - i.e. something equivalent to this:

I believe that works out to be the same as breaking on i == j: either way, each pair of sequences is counted only once.
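A small sketch checking the claim above by listing which index pairs each loop style visits. It works on plain indices rather than sequences, and the two function names are hypothetical. The `i >= j` variant here uses `continue` rather than `break`, since breaking on `i >= j` would leave the inner loop at its first iteration and skip every pair.

```python
from itertools import combinations


def pairs_break_on_equal(n):
    """Index pairs visited when the inner loop breaks as soon as j == i."""
    visited = []
    for i in range(n):
        for j in range(n):
            if i == j:
                break  # only j < i is ever reached
            visited.append(frozenset((i, j)))
    return visited


def pairs_skip_lower(n):
    """Index pairs visited when the inner loop skips every j <= i."""
    visited = []
    for i in range(n):
        for j in range(n):
            if i >= j:
                continue  # only j > i is processed
            visited.append(frozenset((i, j)))
    return visited


n = 4
a = pairs_break_on_equal(n)
b = pairs_skip_lower(n)
# Each list holds every unordered pair exactly once: n * (n - 1) // 2 == 6.
print(len(a), len(b), set(a) == set(b))  # 6 6 True
```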

peterjc (Member, Author) commented Jan 25, 2023

You're probably right about the loop. I'll update this shortly...

peterjc merged commit 7b112cf into biopython:master on Jan 25, 2023
peterjc deleted the align_counts branch on January 25, 2023 15:55
peterjc (Member, Author) commented Jan 25, 2023

Thank you - that looks better than the first iteration.

Linked issue: matches, mismatches and gaps from Bio.Align.PairwiseAlignment