This repository has been archived by the owner on Nov 18, 2023. It is now read-only.

Support Mitochondrial HGVS #4

Closed
korikuzma opened this issue Jul 4, 2023 · 7 comments
Labels
advanced (Project is good for those with advanced experience), hgvs (Project is for HGVS)

Comments

@korikuzma
Contributor

korikuzma commented Jul 4, 2023

Submitter Name

Andreas Prlic (@andreasprlic)

Submitter Affiliation

Invitae

Requested By

Invitae

Additional Submitter Details

No response

Lead(s)

@andreasprlic

biocommons Repo

hgvs

Project Details

The hgvs library uses the translate_cds method from bioutils, which already supports alternative translation tables (e.g., for selenoproteins). We need another translation table there for mitochondria, similar to biocommons/bioutils#36. It then needs to be enabled in AltTranscriptData / AltSeqBuilder somehow. See a related ticket here.

RefSeq has this data for MT genes; I think this should be sufficient to offer "m_to_p". This data needs to be loaded into seqrepo / UTA so that we can conveniently access it and so that it looks similar to the rest of the data used by hgvs.

Skill Level

Advanced

Required Skills

Python, Mitochondrial HGVS nomenclature

@korikuzma korikuzma added the advanced and hgvs labels Jul 4, 2023
@andreasprlic

The hgvs library uses the translate_cds method from bioutils, which already supports alternative translation tables (e.g., for selenoproteins). We need another translation table there for mitochondria, similar to biocommons/bioutils#36. It then needs to be enabled in AltTranscriptData / AltSeqBuilder somehow. See a related ticket here.

RefSeq has this data for MT genes; I think this should be sufficient to offer "m_to_p". This data needs to be loaded into seqrepo / UTA so that we can conveniently access it and so that it looks similar to the rest of the data used by hgvs.

@korikuzma
Contributor Author

@andreasprlic Thanks! I will just copy this into the Project Details section

@reece
Member

reece commented Aug 3, 2023

@andreasprlic: I don't have a good handle on exactly what it will take to implement this, but I think we're in for at least a new version of UTA, and we both know what that's like.

Would you please do the following?

  • Generate a small set of examples that we can use for test cases
  • Develop a plan for how this project should proceed and what materials we need

@veenarajaraman

veenarajaraman commented Aug 26, 2023

chrMT_test_variants.csv
Here are some test variants from ClinVar.
Coding regions:
NC_012920.1_coding_regions.csv

@veenarajaraman

We also need a PR into https://github.com/biocommons/bioutils to add the vertebrate mitochondrial translation table:

diff --git a/src/bioutils/sequences.py b/src/bioutils/sequences.py
index 1a2ce75..c67f966 100644
--- a/src/bioutils/sequences.py
+++ b/src/bioutils/sequences.py
@@ -221,6 +221,18 @@ dna_to_aa1_lut = {  # NCBI standard translation table
 dna_to_aa1_sec = dna_to_aa1_lut.copy()
 dna_to_aa1_sec["TGA"] = "U"
 
+# Vertebrate mitochondrial translation table
+# https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG2
+
+dna_to_aa1_vmito = dna_to_aa1_lut.copy()
+dna_to_aa1_vmito["AGA"] = "*"
+dna_to_aa1_vmito["AGG"] = "*"
+dna_to_aa1_vmito["ATA"] = "M"
+dna_to_aa1_vmito["TGA"] = "W"
+
+
+
+
 complement_transtable = bytes.maketrans(b"ACGT", b"TGCA")
 
 
@@ -506,6 +518,7 @@ class TranslationTable(StrEnum):
 
     standard = "standard"
     selenocysteine = "sec"
+    vertebrate_mitochondrial = 'vmito'
 
 
 def translate_cds(seq, full_codons=True, ter_symbol="*", translation_table=TranslationTable.standard):
@@ -596,6 +609,8 @@ def translate_cds(seq, full_codons=True, ter_symbol="*", translation_table=Trans
         trans_table = dna_to_aa1_lut
     elif translation_table == TranslationTable.selenocysteine:
         trans_table = dna_to_aa1_sec
+    elif translation_table == TranslationTable.vertebrate_mitochondrial:
+        trans_table = dna_to_aa1_vmito
     else:
         raise ValueError("Unsupported translation table {}".format(translation_table))
     seq = replace_u_to_t(seq)
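
The four codon reassignments in the diff above can be checked without patching bioutils. The following is a minimal standalone sketch, not the bioutils implementation: the names `standard_table`, `vmito_table`, and `translate_codons` are hypothetical and exist only for this illustration. It builds the NCBI standard table (table 1) from its published amino-acid string, applies the same four overrides as the diff (vertebrate mitochondrial code, NCBI table 2), and translates a short made-up sequence under both tables.

```python
from itertools import product

# NCBI standard genetic code (table 1). Codons are enumerated in the
# conventional TCAG x TCAG x TCAG order, matching the amino-acid string.
_BASES = "TCAG"
_STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

standard_table = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _STANDARD_AAS)
}

# Vertebrate mitochondrial code (NCBI table 2): same overrides as the diff.
vmito_table = dict(standard_table)
vmito_table["AGA"] = "*"  # Arg -> stop
vmito_table["AGG"] = "*"  # Arg -> stop
vmito_table["ATA"] = "M"  # Ile -> Met
vmito_table["TGA"] = "W"  # stop -> Trp


def translate_codons(seq, table):
    """Translate every complete codon in seq (hypothetical helper).

    Unlike a real CDS translator, this does no validation and does not
    stop at the first stop codon; trailing partial codons are ignored.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))


# Made-up sequence containing three of the reassigned codons: ATA, TGA, AGA.
example = "ATGATATGAAGA"
print(translate_codons(example, standard_table))  # MI*R
print(translate_codons(example, vmito_table))     # MMW*
```

The same sequence reads through TGA as tryptophan and terminates at AGA under the mitochondrial table, which is the behavior an "m_to_p" projection would depend on.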

@korikuzma
Contributor Author

This will not be worked on at the hackathon. @andreasprlic is going to merge some comments before closing.

@andreasprlic

We won't get to this issue as part of the hackathon this weekend, but we will continue on this topic afterwards as part of biocommons/hgvs#663

@andreasprlic andreasprlic closed this as not planned Sep 13, 2023