Document expectations about naming conventions in obo version of SO #465

Closed
cmungall opened this issue Feb 23, 2019 · 3 comments

@cmungall
Contributor

There are certain expectations about the OBO version of SO relied on for formats such as GFF3. These include:

  • All rdfs:labels (names in OBO) must be underscore-separated and free of special characters (e.g. five_prime_UTR rather than "5' UTR").
  • Names for sequence_features should not change, because GFF3 uses labels as IDs for typing.

These expectations are obviously highly constraining and do not map to current practice in existing ontologies. However, they are currently part of an unwritten contract between SO and a subset of its users. This contract must be written up and clearly documented, as well as enforced by Travis checks (#462).
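As a rough illustration of what such a check could look like, here is a minimal Python sketch. It is not the actual Travis check from #462. It assumes that "free of special characters" means ASCII letters, digits and underscores only, which this issue does not spell out, and the function names and the `EX:` term IDs are hypothetical.

```python
import re

# Assumption: a valid name contains only ASCII letters, digits and underscores.
# The exact allowed character set is not defined in this issue.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def invalid_names(names):
    """Return the names that break the assumed naming convention."""
    return [name for name in names if not VALID_NAME.fullmatch(name)]


def renamed_terms(old, new):
    """Compare two {term_id: name} mappings (e.g. two releases).

    Returns {term_id: (old_name, new_name)} for every ID present in both
    mappings whose name differs.
    """
    return {
        term_id: (old[term_id], new[term_id])
        for term_id in old
        if term_id in new and old[term_id] != new[term_id]
    }


if __name__ == "__main__":
    print(invalid_names(["five_prime_UTR", "5' UTR", "CDS"]))
    # Hypothetical IDs, used only to show a name change being detected.
    print(renamed_terms({"EX:1": "five_prime_UTR"}, {"EX:1": "5_prime_UTR"}))
```

A real check would read the names from the released OBO file and compare them against the previous release; how those files are loaded is left open here.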

In the future we may want a mechanism that lets us continue to support existing software while evolving the ontology in a modern fashion. This may or may not be coupled with the SO/MSO plan. It may involve creating a specific URL, such as for a "SO-GFF" product, while the main SO product follows normal OBO naming conventions. This is TBD; the first step is just documenting existing behavior.

@cmungall cmungall changed the title Document expectations about obo version of SO Document expectations about naming conventions obo version of SO Feb 23, 2019
@cmungall
Contributor Author

We should also document expectations about the graph of SO, e.g. that there should be a traceable path between transcript elements and genes.

@cmungall cmungall changed the title Document expectations about naming conventions obo version of SO Document expectations about naming conventions in obo version of SO Feb 23, 2019
@davidwsant
Collaborator

Hi Chris,

I believe SO still uses underscores for all terms and does not use special characters in names. As far as I am aware, the names have not changed, but I have only been working on SO for a short time. Is there a way to check whether this has caused any problems?

As for a traceable path between transcript elements and genes, I believe this is fine. For example, CDS is_a mRNA_region, which is_a mature_transcript_region, which is_a transcript_region, which is part_of a transcript, which is_a gene_member_region, which is member_of a gene. Does this fulfil that requirement?
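To make "traceable path" concrete, here is a minimal Python sketch of how it could be tested. The edge list is copied from the chain described above rather than loaded from the ontology, and the function name `find_path` is hypothetical. It uses a breadth-first search that follows every relationship type in the subject-to-object direction.

```python
from collections import deque

# Edges taken from the chain described in this comment, as
# (subject, relation, object). A real check would load them from SO itself.
EDGES = [
    ("CDS", "is_a", "mRNA_region"),
    ("mRNA_region", "is_a", "mature_transcript_region"),
    ("mature_transcript_region", "is_a", "transcript_region"),
    ("transcript_region", "part_of", "transcript"),
    ("transcript", "is_a", "gene_member_region"),
    ("gene_member_region", "member_of", "gene"),
]


def find_path(edges, start, goal):
    """Breadth-first search from start to goal.

    Returns the path as an alternating list of terms and relations,
    or None if goal cannot be reached from start.
    """
    graph = {}
    for subject, relation, obj in edges:
        graph.setdefault(subject, []).append((relation, obj))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [relation, neighbour]))
    return None


if __name__ == "__main__":
    print(find_path(EDGES, "CDS", "gene"))
```

Whether every relationship type should count towards a "traceable path" is a design decision that the documentation would need to settle; this sketch treats them all alike.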

I am not certain whether anything needs to be done at this point. Another issue you have created mentions developing an SOP, which could cover the requirement for underscores instead of spaces, the ban on special characters, and the rule that names should not change. Do you feel that this would satisfy the suggestions?

Thanks,

Dave

davidwsant added a commit that referenced this issue Apr 14, 2020
@davidwsant
Collaborator

Hi Chris,

The README.md has been updated to include information about naming conventions.

Thanks,

Dave
