Fix problems with obo file #26

Closed
cmungall opened this issue Mar 11, 2020 · 19 comments

Comments

@cmungall

cmungall commented Mar 11, 2020

  • imports:
    • imports to obo format files will not work in OWL toolchains
    • always use official OBO PURLs in imports. For example, you are using PATO, see products here: http://obofoundry.org/ontology/pato
  • xrefs
    • looks like you are using these in an unusual way, OWL may be better suited to your needs

There are some additional issues reported here:
http://obo-dashboard-test.ontodev.com/ms/dashboard.html

I recommend also

  • documenting your ontology editing practice in an editors guide. Which tools are used to edit the obo? It says generated by obo-edit, but looks hand edited?
  • documenting user expectations of the file. How are the imports used by tools? What would happen if these were omitted? How are the xrefs used?
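The point about imports can be checked mechanically. The following is a minimal sketch, not an official OBO tool: it scans the header of an obo file and lists any `import:` target that does not start with the OBO PURL prefix. The function name and the sample header are hypothetical, and the `example.org` address is a placeholder.

```python
OBO_PURL_PREFIX = "http://purl.obolibrary.org/obo/"


def find_non_purl_imports(obo_text):
    """Return header import targets that do not start with the OBO PURL prefix."""
    flagged = []
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # the first stanza marks the end of the header
        if line.startswith("import:"):
            # Drop any trailing "! comment" and surrounding whitespace.
            target = line[len("import:"):].split("!", 1)[0].strip()
            if not target.startswith(OBO_PURL_PREFIX):
                flagged.append(target)
    return flagged


# Hypothetical header used only for illustration.
sample_header = """format-version: 1.2
import: http://purl.obolibrary.org/obo/pato.owl
import: https://example.org/mirror/pato.obo
ontology: ms

[Term]
id: MS:9999001
"""

print(find_non_purl_imports(sample_header))
```

A check like this only looks at the prefix; it does not confirm that the PURL resolves or that the imported product is in a format the OWL toolchain accepts.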
@mwalzer
Contributor

mwalzer commented Mar 11, 2020

@cmungall Hi and thanks for the advice,
timing is (almost) fortunate, our psidev-ms-vocab coordinator recently left (due to a job-change).
We appreciate the advice. Maybe you can guide us in putting the right automated mechanisms in place to avoid (future) problems when we add new content?
As a follow-up question, is obofoundry the place to find the PURLs our obo desires and if not exclusively, where to look?
And
Regarding xrefs and owl, I think this is a legacy issue, since it was easiest to use a simple text format over a markup format, although the xref usage is inspired by the way it is used in xsd files. At least that is what I figured; I wasn't there when that began.
Is there documentation on how obo and/or owl are supposed to be used?
Again, any form of help is appreciated!

@cmungall
Author

Thanks @mwalzer!

I think the best thing to start on is documenting as much as possible (preferably as markdown files managed in this repo) as stated above.

It would also be good to have some technical person at the PSI end able to help with any directions I can give. This need not be too much. No ontology expertise necessary, just some basics of git, running things on the command line, and some github knowledge (e.g. travis).

is obofoundry the place to find the PURLs our obo desires and if not exclusively, where to look

Yes, but you have the right to have the OBO PURLs for MS direct wherever you like. Currently OBO central slurps your file and builds release products, and the PURLs redirect to our S3. See http://obofoundry.org/faq/what-is-the-build-field.html. I am confident that we can work towards a situation where you have a workflow you can run and be in control of this yourself.

Regarding xrefs and owl, I think this is a legacy issue

Would there be negative implications of removing these? What software depends on them?

Is there documentation on how obo and/or owl are supposed to be used?

There is a larger answer here, but I think we should focus on having your ontology be browsable in major ontology browsers and consumable by tool chains that provide services such as metadata annotation.

Some of these (the older ones) depend on obo format; newer ones typically depend on owl.

More later, I'm sure we can get this up and running

@chambm
Contributor

chambm commented Apr 13, 2020

@cmungall I think we still don't have an official maintainer, but I'd volunteer to get this repo in a state where it can be easily maintained by automatically validated PRs run with Travis (or whatever other CI is easiest). I read about ROBOT in the ODK. Is it actually necessary to use the full ODK repo template, or can we add validation CI with the existing layout, while forgoing the other ODK features like release/merging/editing?

In your repo you will see a README-editors.md file that has been customized for your project. Follow these instructions.

Generally the cycle is to:

branch
edit the edit.owl file
make test
git commit
git push
To make a release:

make prepare_release

Note that any make step can be preceded by run.sh if you have Docker installed:

sh run.sh make prepare_release

Until somebody volunteers to be a real maintainer, I think we'd rather stick with a simpler editing paradigm. Specifically, most edits will be simple addition of a few terms. Anybody should be able to get the hang of it, just editing with the GitHub text editor (another reason to continue using .obo format) and make a PR on this repo. GitHub handles the forking/editing/branching/merging. Travis will validate it, and if it passes, an admin can merge it. The raw obo file itself will effectively be the release (an owl conversion could be provided as well). Probably quite a few issues from http://obo-dashboard-test.ontodev.com/ms/dashboard.html should be addressed (fixed or muted) in the process so that true validation errors are more obvious.
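As a rough illustration of the kind of check a CI job could run on each PR, here is a minimal sketch of an obo sanity check. It is not ROBOT and not the dashboard's rule set; it only assumes that every `[Term]` stanza should have a unique `id` and a `name`. The function names and the sample terms (with made-up `MS:9999001` ids) are hypothetical.

```python
from collections import Counter


def parse_stanzas(obo_text):
    """Split OBO text into (stanza_type, [(tag, value), ...]) pairs, skipping the header."""
    stanzas = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment line
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1], [])
            stanzas.append(current)
        elif current is not None and ":" in line:
            tag, _, value = line.partition(":")
            current[1].append((tag.strip(), value.strip()))
    return stanzas


def validate(obo_text):
    """Return a list of human-readable problems found in the Term stanzas."""
    errors = []
    terms = [tags for kind, tags in parse_stanzas(obo_text) if kind == "Term"]
    ids = [value for tags in terms for tag, value in tags if tag == "id"]
    for term_id, count in Counter(ids).items():
        if count > 1:
            errors.append(f"duplicate id: {term_id}")
    for tags in terms:
        present = {tag for tag, _ in tags}
        term_id = next((value for tag, value in tags if tag == "id"), "<no id>")
        for required in ("id", "name"):
            if required not in present:
                errors.append(f"{term_id}: missing {required}")
    return errors


# Hypothetical input: the second stanza reuses an id and has no name.
sample = """format-version: 1.2

[Term]
id: MS:9999001
name: hypothetical example term

[Term]
id: MS:9999001
def: "A duplicate with no name." []
"""

for problem in validate(sample):
    print(problem)
```

A CI step could fail the build whenever the returned list is non-empty, which keeps the editing workflow to "edit the obo, open a PR, wait for the check".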

@chambm
Contributor

chambm commented Jan 5, 2022

Wasn't this addressed by automatic creation of OWL files by GitHub Actions (#88 )?

@edeutsch
Contributor

edeutsch commented Jan 5, 2022

I think the answer is yes.

@cmungall
Author

cmungall commented Jan 5, 2022

There are still the issues I highlighted above re imports: you should use the correct PURLs, because the existing OWL could fail at any time.

@chambm
Contributor

chambm commented Apr 25, 2022

I think the xrefs for values are indeed a legacy we left in to avoid breaking tools that already depended on it: #50

Does anybody know what tools actually do depend on it though, and can we get them to update to has_value_type?

@edeutsch
Contributor

I know that I have some in-house PeptideAtlas related code that parses and uses xref:value-type. I suspect there are various other tools that use it. But if it is useful, I do not mind forcing the issue and migrating xref:value-type to relationship: has_value_type in our code to use the new syntax. I have not done so yet, but could do that easily given some motivation.

I do not know any other specific software that uses it. I might guess that the various validators like the mzML validator and mzIdentML validators probably use it. They should use it, but I vaguely recall that maybe they did not validate value-type properly, so maybe they don't. It would be helpful if others could chime in with what they know.
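For consumers that want to migrate, the rewrite from the xref form to the relationship form is mechanical. The sketch below assumes the legacy line looks like `xref: value-type:xsd\:float "The allowed value-type for this CV term."`; that exact layout is an assumption and should be checked against the real file. The function name is hypothetical.

```python
import re

# Assumed legacy layout: xref: value-type:<type> "<optional description>"
XREF_VALUE_TYPE = re.compile(r'^xref:\s*value-type:(\S+)(?:\s+"([^"]*)")?\s*$')


def migrate_line(line):
    """Rewrite a value-type xref line as a has_value_type relationship line.

    Lines that do not match the assumed legacy layout are returned unchanged.
    """
    match = XREF_VALUE_TYPE.match(line)
    if match is None:
        return line
    value_type, description = match.group(1), match.group(2)
    new_line = f"relationship: has_value_type {value_type}"
    if description:
        # Carry the quoted description over as a trailing "! comment".
        new_line += " ! " + description.rstrip(".")
    return new_line


legacy = r'xref: value-type:xsd\:float "The allowed value-type for this CV term."'
print(migrate_line(legacy))
```

A parser that must read both old and new releases could apply the same normalization on load and then handle only the relationship form internally.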

@cmungall
Author

cmungall commented Apr 26, 2022 via email

@chambm
Contributor

chambm commented Apr 26, 2022

I think our use of xref is using a field with a well-defined purpose for a different purpose (i.e. misusing it). But relationship: has_value_type is extending a field that was designed to be extended. Like any extension, consumers must be explicitly designed to handle it since it's not part of the format specification, but it should also not break any consumers that have not been designed to handle it.

So while I'm sure there are other formats with built-in support for type and range properties, I very much doubt that freezing the OBO and switching future releases to other formats would be more expedient. On the other hand, we could continue with the OBO like it is, but add an automatic transformation to one of those formats with built-in support for types, where those built-in properties are populated by the has_value_type extension. How does that sound?

(I still support removing the misuse of xref if it's practical to have consumers switch over though)

@cmungall
Author

cmungall commented Apr 26, 2022 via email

@chambm
Contributor

chambm commented Apr 26, 2022

Ah, I didn't actually know that the target of the relationship tag is supposed to be a term. So to avoid the problems you mentioned in the short term, we'd need to make the XSD types into proper terms? Or is there another (small) ontology for computer data types we can import from? I think we used XSD out of convenience, but the main thing we were trying to capture was "string", "real", or "integer". I don't think we are too concerned with ranges, judging by how easy it is to find terms that are using "integer" when they almost certainly should be "nonNegativeInteger".

@cmungall
Author

cmungall commented Apr 27, 2022 via email

@edeutsch
Contributor

While I love the idea of the beautifully neat and organized MS2 ontology, I don't have the time to devote to it, and as far as I can tell, neither does anyone else in the collaboration that works on the current one. We barely scrape together enough interest to extend the current one in service of the formats that we develop and maintain. And I am under the impression that the MS ontology is not really useful to the greater ontology community at large. There's no point in Translator bothering to incorporate it, it's too niche, really. So, with no clamoring customers beyond what we already have and no one having time and expertise to devote to rebuilding it, I don't see it happening. I totally get what you're saying, I don't disagree. I just don't see us breaking out of our current mode as a practical matter. I'd be thrilled to help someone who wanted to do it. But it's a person-year of effort to rebuild a new MS2 with all new tooling and make it workable for all current uses. Not to mention all existing software that uses the OBO.

@chambm
Contributor

chambm commented Apr 27, 2022

Thanks for the insight and recommendations Chris. Note that this OBO started as (and arguably still is) a controlled vocabulary rather than an ontology. Eric is right that we don't have the time on either the producer or the consumer side to totally revamp this to a different format with a more formal structure. I think we can go with just adding the XSD types as terms so we're not violating the spec. Is there a way we can do that without changing the term ids we're currently referencing (xsd:float)? Something like:

[Term]
id: MS:xsd\:float
def: "32-bit floating point value." []

Because our default-namespace is MS, we should be ok to leave off the MS prefix with:
relationship: has_value_type xsd\:float ! The allowed value-type for this CV term

It might break some parsers that expect the ids to be numeric, but as far as I can tell that's not actually a requirement of OBO (whereas a relationship must refer to an id, not the name of a term; that's less clear for an xref though)
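The underlying requirement, that a relationship target must be the id of a stanza that exists, can be checked with a few lines. This is a minimal sketch under the assumption that only ids defined in the same file count as resolved (imports are ignored); the function name and the `MS:9999001` style ids are hypothetical.

```python
def find_dangling_targets(obo_text):
    """Return (source id, relationship type, target id) for targets with no stanza."""
    defined = set()
    references = []
    current_id = None
    for raw in obo_text.splitlines():
        line = raw.split(" !")[0].strip()  # drop a trailing "! comment"
        if line.startswith("["):
            current_id = None  # a new stanza begins
        elif line.startswith("id:"):
            current_id = line[len("id:"):].strip()
            defined.add(current_id)
        elif line.startswith("relationship:") and current_id:
            parts = line[len("relationship:"):].split()
            if len(parts) >= 2:
                references.append((current_id, parts[0], parts[1]))
    return [ref for ref in references if ref[2] not in defined]


# Hypothetical stanzas: xsd\:float is referenced but never defined as a term.
sample = r"""
[Term]
id: MS:9999001
name: hypothetical term with a float value
relationship: has_value_type xsd\:float ! The allowed value-type for this CV term

[Term]
id: MS:9999002
name: hypothetical child term
relationship: part_of MS:9999001 ! hypothetical term with a float value
"""

print(find_dangling_targets(sample))
```

Adding a stanza whose `id` matches the referenced value would make the reported list empty, whichever id convention is eventually chosen for the XSD types.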

@cmungall
Author

id: MS:xsd:float

the double quote will cause all kinds of issues and doesn't really have any advantages as it's only implicitly related to the actual xsd IRI

I would say if you do go this route just make this like any other term and just xref to xsd

Because our default-namespace is MS, we should be ok to leave off the MS prefix with

sorry! doesn't work like that

anyway it sounds like it doesn't really matter either way, MS uses its own internal conventions with obo files and as Eric says it's not really used outside of some proteomics tools

we should probably mark MS as inactive in OBO if that's OK?

@chambm
Contributor

chambm commented Apr 27, 2022

id: MS:xsd:float

the double quote will cause all kinds of issues and doesn't really have any advantages as it's only implicitly related to the actual xsd IRI

I would say if you do go this route just make this like any other term and just xref to xsd

Because our default-namespace is MS, we should be ok to leave off the MS prefix with

sorry! doesn't work like that

I'm confused, isn't https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2Fxsd_float trying to resolve the un-IDspace-prefixed term inside the MS IDspace? It's a dangling reference, right? If that's not the problem, what is the problem you were trying to point out with that link?
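To make the link above concrete, here is a simplified sketch of how a prefixed OBO id ends up as an OBO library PURL. It assumes the plain default mapping of `IDSPACE:LOCAL` to `http://purl.obolibrary.org/obo/IDSPACE_LOCAL` and ignores idspace declarations in the header and other special cases, so it is an approximation of what converters do, not their actual code. The function name is hypothetical.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def obo_id_to_iri(obo_id):
    """Map a prefixed OBO id to an OBO library PURL (simplified default mapping)."""
    unescaped = obo_id.replace("\\:", ":")  # xsd\:float is written escaped in obo files
    idspace, sep, local_id = unescaped.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed id: {obo_id!r}")
    return f"{OBO_BASE}{idspace}_{local_id}"


print(obo_id_to_iri(r"xsd\:float"))
print(obo_id_to_iri("MS:1000031"))
```

Under this mapping the reference is treated as a term in an `xsd` idspace of the OBO library rather than as the XML Schema datatype, which is how an IRI ending in `xsd_float` can appear.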

anyway it sounds like it doesn't really matter either way, MS uses its own internal conventions with obo files and as Eric says it's not really used outside of some proteomics tools

we should probably mark MS as inactive in OBO if that's OK?

What would be the consequences of that? It is sometimes convenient to use the OBO browsers to view or share the structure of the MS OBO, e.g. to share a hierarchy of terms with a collaborator. It's also useful for people who are looking for a CV or ontology for MS to be able to find it in the common ontology repositories.

@cmungall
Author

cmungall commented Apr 27, 2022 via email

@cmungall
Author

I'm confused, isn't https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2Fxsd_float trying to resolve the un-IDspace-prefixed term inside the MS IDspace? It's a dangling reference, right? If that's not the problem, what is the problem you were trying to point out with that link?

there are multiple technical issues. It's something that looks like xsd:float, but is not, it has an OBO library PURL that is not resolvable, it's a dangling reference with no information about it... but most of all I think it's just plain confusing to users

we should probably mark MS as inactive in OBO if that's OK?

What would be the consequences of that? It is sometimes convenient to use the OBO browsers to view or share the structure of the MS OBO, e.g. to share a hierarchy of terms with a collaborator. It's also useful for people who are looking for a CV or ontology for MS to be able to find it in the common ontology repositories.

It would still be browsable and show up in OLS, ontobee, etc. This would serve as a marker that it's not intended to be updated to follow standard interoperability rules and shouldn't be imported into other ontologies.
