SO terms for polypeptide regions on nucleotides #484
Comments
Hi Terence, How do you think codon, coding_end, and coding_start (the current children of CDS_region) would best fit when these are added? I am thinking perhaps they could stay as is_a CDS_region, but part_of mature_protein_region_of_CDS. Do you think that sounds correct? Additionally, I am not sure whether these would need any relationship with the protein equivalents. I am thinking perhaps relationships like signal_peptide relationship: derives_from ! signal_peptide_of_CDS. What do you think? Thanks,
There's no obligation that the coding_start or coding_end is part of mature_protein_region_of_CDS. The start or end could be cleaved off during processing to make the final mature_protein. Codons in the CDS in general may or may not be part of mature_protein_region_of_CDS. My inclination is to not set a relationship between these.
Yes, that seems appropriate.
…02252 in response to GitHub Issue #484
Hi Terence, The four new terms have been created: id: SO:0002249; name: mature_protein_region_of_CDS. As usual, the terms should be updated in the SO Browser within about 24 hours. Have a nice day! Dave
Awesome, thanks!
There are several SO terms that are part_of polypeptide (SO:0000104):
mature_protein_region SO:0000419
propeptide SO:0001062
signal_peptide SO:0000418
transit_peptide SO:0000725
In GenBank, these 4 feature types can appear on both nucleotide and protein records. They are properly protein features, but for purposes of INSDC annotation they need to appear on nucleotides as well. It's also useful in other contexts (e.g. GFF3) to be able to designate a region on a nucleotide sequence that would correspond to one of these subregions of the CDS.
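To make "a region on a nucleotide sequence that would correspond to one of these subregions" concrete, here is a minimal Python sketch of the coordinate arithmetic. It assumes a single-exon, plus-strand CDS with no frameshifts or partial codons. The function name, the `cds_start` parameter, and the residue ranges in the usage lines are hypothetical and only for illustration.

```python
from typing import Tuple


def protein_to_nucleotide_interval(
    aa_start: int, aa_end: int, cds_start: int = 1
) -> Tuple[int, int]:
    """Map a 1-based inclusive residue range on the polypeptide to a
    1-based inclusive nucleotide range.

    Assumption: the CDS is one contiguous plus-strand block that begins
    at position ``cds_start`` on the nucleotide record, so residue i is
    encoded by CDS bases 3*(i-1)+1 through 3*i.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    if cds_start < 1:
        raise ValueError("cds_start must be >= 1")
    # First base of the first codon, last base of the last codon,
    # both relative to the start of the CDS.
    nt_start = 3 * (aa_start - 1) + 1
    nt_end = 3 * aa_end
    # Shift from CDS-relative to record coordinates.
    offset = cds_start - 1
    return nt_start + offset, nt_end + offset


# Hypothetical protein: residues 1-22 are a signal peptide and
# residues 23-300 are the mature protein.
signal_region = protein_to_nucleotide_interval(1, 22)
mature_region = protein_to_nucleotide_interval(23, 300)
```

With a multi-exon or minus-strand CDS the mapping has to walk the individual CDS segments, which this sketch deliberately leaves out.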
If I'm interpreting the ontology and specs correctly, polypeptide derives_from CDS SO:0000316, so technically those 4 child terms shouldn't be annotated on nucleotide sequences or have CDS as a Parent in GFF3 annotation. Do you agree?
If so, I'd like to request 4 equivalent terms as children of CDS_region SO:0000851. I don't suppose there's a precedent for how to name them? Maybe something like:
(I'm not particularly fond of that, but it's the best I could come up with. It has some similarity to RNA feature types and their corresponding genes, like rRNA vs rRNA_gene)
Those INSDC_feature synonyms could either be on both the new terms and the existing polypeptide terms, or only on the new terms. GenBank and GenPept use the same feature names on both nucleotide and protein records, but "INSDC" is only a nucleotide feature spec, so if we want a 1:1 mapping then the INSDC_feature synonyms should be on the new terms.
-Terence