-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix problems with obo file #26
Comments
@cmungall Hi and thanks for the advice, |
Thanks @mwalzer! I think the best thing to start on is documenting as much as possible (preferably as markdown files managed in this repo) as stated above. It would also be good to have some technical person at the PSI end able to help with any directions I can give. This need not be too much. No ontology expertise necessary, just some basics of git, running things on the command line, and some github knowledge (e.g. travis).
Yes, but you have the right to have the OBO PURLs for MS direct wherever you like. Currently OBO central slurps your file and builds release products, and the PURLs redirect to our S3. See http://obofoundry.org/faq/what-is-the-build-field.html. I am confident that we can work towards a situation where you have a workflow you can run and be in control of this yourself.
Would there be negative implications of removing these? What software depends on them?
There is a larger answer here but I think let's focus on having your ontology be browsable in major ontology browsers, be consumable by tool chains that provide services such as metadata annotation etc. Some of these (older) depend on obo format, newer ones typically depend on owl. More later, I'm sure we can get this up and running |
@cmungall I think we still don't have an official maintainer, but I'd volunteer to get this repo in a state where it can be easily maintained by automatically validated PRs run with Travis (or whatever other CI is easiest). I read about ROBOT in the ODK. Is it actually necessary to use the full ODK repo template, or can we add validation CI with the existing layout, while forgoing the other ODK features like release/merging/editing?
Until somebody volunteers to be a real maintainer, I think we'd rather stick with a simpler editing paradigm. Specifically, most edits will be simple addition of a few terms. Anybody should be able to get the hang of it, just editing with the GitHub text editor (another reason to continue using .obo format) and make a PR on this repo. GitHub handles the forking/editing/branching/merging. Travis will validate it, and if it passes, an admin can merge it. The raw obo file itself will effectively be the release (an owl conversion could be provided as well). Probably quite a few issues from |
Wasn't this addressed by automatic creation of OWL files by GitHub Actions (#88 )? |
I think the answer is yes. |
there are still the issues I highlighted above re imports, you should use the correct PURLs, the existing OWL could fail at any time |
I think the xrefs for values are indeed a legacy we left in to avoid breaking tools that already depended on it: #50 Does anybody know what tools actually do depend on it though, and can we get them to update to has_value_type? |
I know that I have some in-house PeptideAtlas related code that parses and uses xref:value-type. I suspect there are other various other tools that use it. But if it useful, I do not mind forcing the issue and migrating xref:value-type to relationship: has_value_type in our code to use the new syntax. I have not done so yet, but could do that easily given some motivation. I do not know any other specific software that uses it. I might guess that the various validators like the mzML validator and mzIdentML validators probably use it. they should use it, but I vague recall that maybe they did not validate value-type properly, so maybe they don't. It would be helpful if others could chime in with that they know. |
Thanks Eric, I know how much work it is to evaluate the impact of changes
on widely used standards on a diverse range of software tools, much
appreciated!
I think if there is willingness to change, I think it's worth doing it
right. Unfortunately the "relationship: has_value_type" fix is not really
much better, you are just going from overloading one field to a different
field.
For example:
https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2Fxsd_float
Once some information about how the existing MS standard is being used I
can make some proposals about how to do something that captures what it is
you need to capture without any overloading in a way that can be used in
validation. It may be the case that it's most expedient to freeze the
existing obo as a legacy artefact and build something using a more fit for
purpose framework
…On Mon, Apr 25, 2022 at 10:24 PM Eric Deutsch ***@***.***> wrote:
I know that I have some in-house PeptideAtlas related code that parses and
uses xref:value-type. I suspect there are other various other tools that
use it. But if it useful, I do not mind forcing the issue and migrating
xref:value-type to relationship: has_value_type in our code to use the new
syntax. I have not done so yet, but could do that easily given some
motivation.
I do not know any other specific software that uses it. I might guess that
the various validators like the mzML validator and mzIdentML validators
probably use it. they should use it, but I vague recall that maybe they did
not validate value-type properly, so maybe they don't. It would be helpful
if others could chime in with that they know.
—
Reply to this email directly, view it on GitHub
<#26 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAMMON5DYDDMEMYSNR3EPDVG54ZNANCNFSM4LFZTUNQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I think our use of xref using a field with a well-defined purpose for a different purpose (i.e. misusing). But So while I'm sure there are other formats with built-in support for type and range properties, I very much doubt that freezing the OBO and switching future releases to other formats would be more expedient. On the other hand, we could continue with the OBO like it is, but add an automatic transformation to one of those formats with built-in support for types, where those built-in properties are populated by the has_value_type extension. How does that sound? (I still support removing the misuse of xref if it's practical to have consumers switch over though) |
Hi Matt
I am the one to blame for obo format. The relationship tag wasn't designed
to be extended any more than xref. Its purpose is to link terms together by
a defined relationship type. So the current usage in MS is on the surface
not too far off.
The problem here is that
1. you are treating XSD datatypes as terms (classes)
2. obo format unfortunately doesn't handle URIs outside the OBO namespace
well
This causes all kinds of issues for normal uses of obo format, and in the
resulting OWL, see the OLS link
However, this may very well be moot, as PSI software seems to be using the
obo files according to specific conventions
On the other hand, we could continue with the OBO like it is, but add an
automatic transformation to one of those formats with built-in support for
types, where those built-in properties are populated by the has_value_type
extension. How does that sound?
I think this sounds quite practical in the short term. I do think that
longer term the source of truth should be a more appropriate format with
the translation happing in the other direction, to generate the legacy obo
format product
…On Tue, Apr 26, 2022 at 8:25 AM Matt Chambers ***@***.***> wrote:
I think our use of xref using a field with a well-defined purpose for a
different purpose (i.e. misusing). But relationship: has_value_type is
extending a field that was designed to be extended. Like any extension,
consumers must be explicitly designed to handle it since it's not part of
the format specification, but also it should not break any consumers that
have not been designed to handle it.
So while I'm sure there are other formats with built-in support for type
and range properties, I very much doubt that freezing the OBO and switching
future releases to other formats would be more expedient. On the other
hand, we could continue with the OBO like it is, but add an automatic
transformation to one of those formats with built-in support for types,
where those built-in properties are populated by the has_value_type
extension. How does that sound?
(I still support removing the misuse of xref if it's practical to have
consumers switch over though)
—
Reply to this email directly, view it on GitHub
<#26 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAMMOPYQFY37XX4TMMG5L3VHADE7ANCNFSM4LFZTUNQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Ah, I didn't actually know that the target of the relationship tag is supposed to be a term. So to avoid the problems you mentioned in the short term, we'd need to make the XSD types into proper terms? Or is there another (small) ontology for computer data types we can import from? I think we used XSD out of convenience, but the main thing we were trying to capture was "string", "real", or "integer". I don't think we are too concerned with ranges, judging by how easy it is to find terms that are using "integer" when they almost certainly should be "nonNegativeInteger". |
On Tue, Apr 26, 2022 at 9:58 AM Matt Chambers ***@***.***> wrote:
Ah, I didn't actually know that the target of the relationship tag is
supposed to be a term. So to avoid the problems you mentioned in the short
term, we'd need to make the XSD types into proper terms? Or is there
another (small) ontology for computer data types we can import from? I
think we used XSD out of convenience, but the main thing we were trying to
capture was "string", "real", or "integer". I don't think we are too
concerned with ranges, judging by how easy it is to find terms that are
using "integer" when they almost certainly should be "nonNegativeInteger".
This is on the right lines. However, we can't technically make the xsd
types into terms (classes), in OWL-DL (the semantics of OBO are in OWL-DL)
datatypes and classes are disjoint. This may seem arcane but there are some
good reasons
(note if we were using OWL then we would just model things like 'sample
mass' as datatype properties, explicitly assign xsd types as ranges -
unfortunately obo is less expressive than owl, and was never intended for
the kinds of things you're doing)
So a compromise would be terms that "shadow" xsd types. There are in fact
various ontologies that do this:
https://bioportal.bioontology.org/search?query=integer&button= -- not to
say it's a *good* solution, just probably the least worst given various
constraints of legacy tools, formats, etc. You could have xrefs that map
the shadow datatypes to the xsd datatypes (xrefs are semantically silent so
whilst unconventional it's not wrong)
I am not sure if there is apetite for an MS2 standard. I would be
interested in helping. I would not base this on obo format (but we could
have lossy transforms). OWL would be better. But better still would be a
language that is explicitly designed for this kind of thing such as LinkML (
https://linkml.io/linkml)... which I am a part of. This would allow you to
model things like:
- modeling metadata element such as "sample mass" in the correct way, as a
field/slot, with explicit range constraints, units formally annotated
- automated validators and checkers
- annotating metadata elements in an ISO-11179 conformant way, with
"sample mass" being described as a property of an OBI:sample, where the
quality measured is a PATO:mass, and the unit is the appropriate UO or UCUM
unit
- formally correct (but lossy) translation to ontologies for hosting on
ontology portals allowing for search and discoverablility etc
- various other benefits...
… —
Reply to this email directly, view it on GitHub
<#26 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAMMOLCBU4USILLUYCMPLLVHAOCPANCNFSM4LFZTUNQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
While I love the idea of the beautifully neat and organized MS2 ontology, I don't have the time to devote to it, and as far as I can tell, neither does anyone else in the collaboration that works on the current one. We barely scrape together enough interest to extend the current one in service of the formats that we develop and maintain. And I am under the impression that the MS ontology is not really useful to the greater ontology community at large. There's no point in Translator bothering to incorporate it, it's too niche, really. So, with no clamoring customers beyond what we already have and no one having time and expertise to devote time to rebuilding it, I don't see it happening. I totally get what you're saying, I don't disagree. I just see us breaking out of our current mode as a practical matter. I'd be thrilled to help someone who wanted to do it. But it's a person-year of effort to rebuild a new MS2 with all new tooling and make it workable for all current uses. Not to mention all existing software that uses the OBO. |
Thanks for the insight and recommendations Chris. Note that this OBO started as (and arguably still is) a controlled vocabulary rather than an ontology. Eric is right we don't have the time on both the producer and consumer sides to totally revamp this to a different format with a more formal structure. I think we can go with just adding the XSD types as terms so we're not violating the spec. Is there a way we can do that without changing the term ids we're currently referencing (
Because our default-namespace is MS, we should be ok to leave off the MS prefix with: It might break some parsers that expect the ids to be numeric, but as far as I can tell that's not actually a requirement of OBO (whereas a relationship must refer to an id, not the name of a term; that's less clear for an xref though) |
the double quote will cause all kinds of issues and doesn't really have any advantages as it's only implicitly related to the actual xsd IRI I would say if you do go this route just make this like any other term and just xref to xsd
sorry! doesn't work like that anyway it sounds like it doesn't really matter either way, MS uses it's own internal conventions with obo files and as Eric says t's not really used outside of some proteomics tools we should probably mark MS as inactive in OBO if that's OK? |
I'm confused, isn't https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2Fxsd_float trying to resolve the un-IDspace-prefixed term inside the MS IDspace? It's a dangling reference, right? If that's not the problem, what is the problem you were trying to point out with that link?
What would be the consequences of that? It is sometimes convenient to use the OBO browsers to view or share the structure of the MS OBO, e.g. to share a hierarchy of terms with a collaborator. It's also useful for people who are looking for a CV or ontology for MS to be able to find it in the common ontology repositories. |
On Wed, Apr 27, 2022 at 7:48 AM Matt Chambers ***@***.***> wrote:
Thanks for the insight and recommendations Chris. Note that this OBO
started as (and arguably still is) a controlled vocabulary rather than an
ontology.
Just a quick aside on this. All my comments are driven by pragmatic
concerns such as (1) can proteomics tools get the information they need in
a reliable and predictable way now and in the future (2) will existing more
generic tools that consume ms.obo and ms.owl (e.g. OLS, OBO Library,
various software libraries) do the right thing or at least fail gracefully
in ways that don't confuse users.
I'm not interested in being the ontology police or gatekeeping definitions
of ontology -- ontology, terminology, controlled vocabulary, metadata
standard, it's all a fuzzy continuum, what's useful is what matters.
Message ID: ***@***.***>
… |
there are multiple technical issues. It's something that looks like xsd:float, but is not, it has an OBO library PURL that is not resolvable, it's a dangling reference with no information about it... but most of all I think it's just plain confusing to users
It would still be browsable and show up in OLS, ontobee, etc, this would serve as a marker that it's not intended to be updated to follow standard interoperability rules and shouldn't be imported into other ontologies. |
There are some additional issues reported here:
http://obo-dashboard-test.ontodev.com/ms/dashboard.html
I recommend also
The text was updated successfully, but these errors were encountered: