SO terms for polypeptide regions on nucleotides #484

Closed
murphyte opened this issue May 4, 2020 · 4 comments


murphyte commented May 4, 2020

There are several SO terms that are part_of polypeptide (SO:0000104):
mature_protein_region SO:0000419
propeptide SO:0001062
signal_peptide SO:0000418
transit_peptide SO:0000725

In GenBank, these 4 feature types can appear on both nucleotide and protein records. They are properly protein features, but for purposes of INSDC annotation they need to appear on nucleotides as well. It's also useful in other contexts (e.g. GFF3) to be able to designate a region on a nucleotide sequence that would correspond to one of these subregions of the CDS.

If I'm interpreting the ontology and specs correctly, polypeptide derives_from CDS SO:0000316, so technically those 4 child terms shouldn't be annotated on nucleotide sequences or have CDS as a Parent in GFF3 annotation. Do you agree?
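To make the nucleotide side of this concrete, here is a minimal sketch of how a polypeptide subregion could be projected onto the CDS that encodes it. It assumes a single-interval, plus-strand CDS with 1-based inclusive coordinates. The function name `peptide_region_to_cds_coords` and the example numbers are hypothetical, not taken from any spec.

```python
def peptide_region_to_cds_coords(cds_start, aa_start, aa_end):
    """Map a 1-based inclusive amino-acid range to 1-based inclusive
    nucleotide coordinates.

    Assumption: the CDS is one contiguous plus-strand interval that
    begins at cds_start with the first base of the first codon.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    # Each residue occupies three bases; residue n starts 3*(n-1) bases in.
    nt_start = cds_start + 3 * (aa_start - 1)
    nt_end = cds_start + 3 * aa_end - 1
    return nt_start, nt_end


# Hypothetical example: a 22-residue signal peptide on a CDS starting at 101.
signal_peptide_nt = peptide_region_to_cds_coords(101, 1, 22)
print(signal_peptide_nt)  # (101, 166)
```

A spliced or minus-strand CDS would need the mapping walked across its individual intervals, which this sketch deliberately leaves out.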

If so, I'd like to request 4 equivalent terms as children of CDS_region SO:0000851. I don't suppose there's a precedent for how to name them? Maybe something like:

  1. mature_protein_region_of_CDS - CDS region corresponding to a mature protein region of a polypeptide. Synonym: INSDC_feature:mat_peptide
  2. propeptide_of_CDS - CDS region corresponding to a propeptide of a polypeptide. Synonym: INSDC_feature:propeptide
  3. signal_peptide_of_CDS - CDS region corresponding to a signal peptide of a polypeptide. Synonym: INSDC_feature:sig_peptide
  4. transit_peptide_of_CDS - CDS region corresponding to a transit peptide of a polypeptide. Synonym: INSDC_feature:transit_peptide
    (I'm not particularly fond of that, but it's the best I could come up with. It has some similarity to RNA feature types and their corresponding genes, like rRNA vs rRNA_gene)

Those INSDC_feature synonyms could go either on both the new terms and the existing polypeptide terms, or only on the new terms. GenBank and GenPept records use the same feature names, but "INSDC" is only a nucleotide feature spec, so if we want a 1:1 mapping then the INSDC_feature synonyms should be only on the new terms.

-Terence

@davidwsant
Collaborator

Hi Terence,

How do you think codon, coding_end, and coding_start (the current children of CDS_region) would best fit when these are added? I am thinking perhaps they could stay as is_a CDS_region, but part_of mature_protein_region_of_CDS. Do you think that sounds correct?

Additionally, I am not sure if these would need any relationship with the protein equivalents. I am thinking perhaps relationships like signal_peptide relationship: derives_from ! signal_peptide_of_CDS. What do you think?

Thanks,
Dave

@murphyte
Author

> How do you think codon, coding_end, and coding_start (the current children of CDS_region) would best fit when these are added? I am thinking perhaps they could stay as is_a CDS_region, but part_of mature_protein_region_of_CDS. Do you think that sounds correct?

There's no obligation that the coding_start or coding_end is part of mature_protein_region_of_CDS. The start or end could be cleaved off during processing to make the final mature_protein. Codons in the CDS in general may or may not be part of mature_protein_region_of_CDS. My inclination is to not set a relationship between these.
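A small worked case may help show why no part_of relationship is safe here. The sketch below assumes a hypothetical 100-residue protein whose first 22 residues are a cleaved signal peptide, on a plus-strand CDS starting at base 101, and it treats coding_start as the first translated base. All numbers and names are illustrative.

```python
CDS_START = 101           # hypothetical first base of the start codon
SIGNAL_PEPTIDE_LEN = 22   # hypothetical residues removed during processing
PROTEIN_LEN = 100         # hypothetical total residues encoded


def first_base_of_residue(cds_start, residue):
    """1-based nucleotide position of the first base of a 1-based residue."""
    return cds_start + 3 * (residue - 1)


# The mature region starts at the first residue after the signal peptide
# and runs to the last base of the final residue.
mature_start = first_base_of_residue(CDS_START, SIGNAL_PEPTIDE_LEN + 1)
mature_end = first_base_of_residue(CDS_START, PROTEIN_LEN) + 2

coding_start = CDS_START
coding_start_in_mature = mature_start <= coding_start <= mature_end

print(mature_start, mature_end, coding_start_in_mature)  # 167 400 False
```

Here the coding start lies upstream of the mature region, so asserting coding_start part_of mature_protein_region_of_CDS would be wrong for this record.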

> relationships like signal_peptide relationship: derives_from ! signal_peptide_of_CDS

yes, that seems appropriate.
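For reference, a relationship like that would be written in OBO flat-file syntax as `relationship: <type> <target id> ! <target name>`. The sketch below renders an abbreviated stanza for signal_peptide. The helper names are hypothetical, the target ID is a placeholder because no ID had been assigned at this point in the thread, and a real stanza would carry further tags such as def and is_a.

```python
def obo_relationship(rel_type, target_id, target_name):
    """Format one OBO relationship tag; text after '!' is a comment."""
    return f"relationship: {rel_type} {target_id} ! {target_name}"


def obo_term_stanza(term_id, name, relationships):
    """Render a minimal [Term] stanza (id, name, relationships only)."""
    lines = ["[Term]", f"id: {term_id}", f"name: {name}"]
    lines.extend(obo_relationship(*rel) for rel in relationships)
    return "\n".join(lines)


# SO:XXXXXXX is a placeholder, not a real accession.
stanza = obo_term_stanza(
    "SO:0000418",
    "signal_peptide",
    [("derives_from", "SO:XXXXXXX", "signal_peptide_of_CDS")],
)
print(stanza)
```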

davidwsant added a commit that referenced this issue May 13, 2020
@davidwsant
Collaborator

Hi Terence,

The four new terms have been created:

id: SO:0002249; name: mature_protein_region_of_CDS
id: SO:0002250; name: propeptide_region_of_CDS
id: SO:0002251; name: signal_peptide_region_of_CDS
id: SO:0002252; name: transit_peptide_region_of_CDS
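As a rough illustration of how these terms might be used in GFF3, the sketch below emits a feature line whose type is one of the new terms and whose Parent is a CDS. The mapping from INSDC feature keys to the new terms follows the synonyms proposed earlier in this thread and is an assumption, as are the function name, the `source` value, and the example coordinates.

```python
# INSDC feature key -> (SO accession, SO term name) for the new terms.
# Assumption: the keys map 1:1 as proposed in the original request.
NEW_CDS_REGION_TERMS = {
    "mat_peptide": ("SO:0002249", "mature_protein_region_of_CDS"),
    "propeptide": ("SO:0002250", "propeptide_region_of_CDS"),
    "sig_peptide": ("SO:0002251", "signal_peptide_region_of_CDS"),
    "transit_peptide": ("SO:0002252", "transit_peptide_region_of_CDS"),
}


def gff3_cds_region_line(seqid, insdc_key, start, end, strand, parent_id,
                         source="example"):
    """Build one 9-column, tab-separated GFF3 line for a CDS subregion."""
    _so_id, so_name = NEW_CDS_REGION_TERMS[insdc_key]
    columns = [
        seqid,
        source,
        so_name,              # column 3: feature type as an SO term name
        str(start),
        str(end),
        ".",                  # score: not applicable
        strand,
        ".",                  # phase: left unset for this feature type
        f"Parent={parent_id}",
    ]
    return "\t".join(columns)


line = gff3_cds_region_line("chr1", "sig_peptide", 101, 166, "+", "cds1")
print(line)
```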

As usual, the terms should be updated in the SO Browser within about 24 hours. Have a nice day!

Dave

@murphyte
Author

awesome, thanks!
