support translation of ambiguity codes #30

Merged
merged 3 commits into from
Apr 14, 2021

@kyuhas

kyuhas commented Apr 13, 2021

No description provided.

@kyuhas
Author

kyuhas commented Apr 13, 2021

this is linked to this hgvs package issue: biocommons/hgvs#595

@cassiemk

@reece we need a solution for this bug immediately, please review ASAP

@reece
Member

reece commented Apr 13, 2021

Will look today. Thanks for the contribution.

@kyuhas
Author

kyuhas commented Apr 13, 2021

Also, assuming this PR is good to merge, could you publish a new version of bioutils to pypi? Thanks!

Member

@reece left a comment

Thanks for the contribution, Kaylee.

This implementation translates any codon containing an ambiguity code as X. However, it's often possible to translate such codons when the ambiguity is irrelevant to the outcome. Since we're adding ambiguity support, I think we should aim for fuller support eventually. I'll follow up on Slack to discuss options for how to proceed.

src/bioutils/cytobands.py (resolved)
@@ -441,7 +453,12 @@ def translate_cds(seq, full_codons=True, ter_symbol="*"):
     protein_seq = list()
     for i in range(0, len(seq) - len(seq) % 3, 3):
         try:
-            aa = dna_to_aa1_lut[seq[i:i + 3]]
+            the_seq = seq[i:i + 3]
Member

Can we rename the_seq to codon for clarity?

Author

sure thing!

-            aa = dna_to_aa1_lut[seq[i:i + 3]]
+            the_seq = seq[i:i + 3]
+            wildcard_nucleotides = ["B", "D", "H", "V", "N", "U", "W", "S", "M", "K", "R", "Y", "Z"]
+            if any([wildcard_base in the_seq for wildcard_base in wildcard_nucleotides]):
Member

As written, any wildcard causes the AA to be X. I think we can do better than that. For example, in a standard translation table, CUN ⇒ Leu, GCN ⇒ Ala, GGN ⇒ Gly, AAY ⇒ Asn, etc. See overall comments for discussion.
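One way the fuller support could look is sketched below. This is not the code merged in this PR and not the bioutils API: `translate_codon`, `IUPAC_EXPANSIONS`, and `CODON_TABLE` are hypothetical names, and the sketch assumes the standard genetic code and the standard IUPAC nucleotide ambiguity set. The idea is to expand an ambiguous codon into every concrete codon it could represent and return X only when those codons disagree.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the concrete bases they stand for.
IUPAC_EXPANSIONS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(bases): aa
    for bases, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_codon(codon, unknown="X"):
    """Translate one codon, resolving ambiguity codes where the outcome is unique.

    Hypothetical helper for illustration only.
    """
    if len(codon) != 3:
        raise ValueError("codon must be exactly three bases long")
    codon = codon.upper().replace("U", "T")  # treat RNA input as DNA
    try:
        expansions = [IUPAC_EXPANSIONS[base] for base in codon]
    except KeyError:
        return unknown  # not a recognized nucleotide code
    # Collect the amino acids of every concrete codon the input could represent.
    amino_acids = {CODON_TABLE["".join(bases)] for bases in product(*expansions)}
    # A single shared amino acid means the ambiguity is irrelevant.
    return amino_acids.pop() if len(amino_acids) == 1 else unknown
```

With this approach, `translate_codon("CUN")` gives `"L"` and `translate_codon("AAY")` gives `"N"`, while `translate_codon("AAN")` falls back to `"X"` because AAN covers both Asn and Lys codons.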

@@ -441,7 +453,12 @@ def translate_cds(seq, full_codons=True, ter_symbol="*"):
     protein_seq = list()
     for i in range(0, len(seq) - len(seq) % 3, 3):
         try:
-            aa = dna_to_aa1_lut[seq[i:i + 3]]
+            the_seq = seq[i:i + 3]
+            wildcard_nucleotides = ["B", "D", "H", "V", "N", "U", "W", "S", "M", "K", "R", "Y", "Z"]
Member

IUPAC calls these "ambiguity codes". Please use that name so that the intent is clearer.
(e.g., something like iupac_ambiguity_codes = "BDHVNUWSMKRYZ")

Also, a list of chars is better written as a string for readability. (Lists and strings are both Sequences and have the same interface for lookup, length, iterability, etc.)
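A minimal sketch of the string-versus-list point, assuming the constant name suggested above. `has_ambiguity_code` is a hypothetical helper, not part of bioutils. It also passes a generator to `any()` rather than building a list first, which is a small stylistic choice and not something the review asked for.

```python
# The reviewer's suggested constant: the same characters as the list, as one string.
iupac_ambiguity_codes = "BDHVNUWSMKRYZ"

# The list form used in the diff, reconstructed here for comparison.
wildcard_nucleotides = list(iupac_ambiguity_codes)


def has_ambiguity_code(codon):
    """Return True if any base in the codon is one of the ambiguity codes.

    Hypothetical helper for illustration only.
    """
    return any(base in iupac_ambiguity_codes for base in codon)


# For single-character lookups the string and the list behave the same way.
for codon in ("CTG", "CTN", "AAY"):
    assert has_ambiguity_code(codon) == any(
        base in wildcard_nucleotides for base in codon
    )

# Length and iteration match as well.
assert len(iupac_ambiguity_codes) == len(wildcard_nucleotides)
```

One difference to keep in mind: `in` on a string is a substring test, so `"NU" in iupac_ambiguity_codes` is true while `"NU" in wildcard_nucleotides` is false. That does not matter here because each lookup is a single base.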

@reece changed the title from "595 support wildcards" to "hgvs#595 support wildcards" Apr 14, 2021
@reece changed the title from "hgvs#595 support wildcards" to "support translation of wildcards" Apr 14, 2021
@reece changed the title from "support translation of wildcards" to "support translation of ambiguity codes" Apr 14, 2021
@reece reece merged commit c5ea54e into biocommons:main Apr 14, 2021

3 participants