
Consider using a JSON representation instead of the obo file format #2

Closed
cmungall opened this issue Oct 24, 2016 · 5 comments

@cmungall
Contributor

Note that we have a proposed JSON representation of OBO that would obviate the need for special-purpose parsers. Your comments as a developer would be most welcome:

See also: mschubert/python-obo#2
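To illustrate why a JSON representation removes the need for a special-purpose parser, here is a minimal sketch that reads an OBO Graphs document using only Python's standard `json` module. It assumes the field names used in obographs JSON files as published today (`graphs`, `nodes` with `id`/`lbl`, `edges` with `sub`/`pred`/`obj`); the 2016 draft may have differed, and the `EX_` identifiers and labels are made up for illustration.

```python
import json

# A tiny hand-written document shaped like an OBO Graphs JSON file.
# The EX_ identifiers and labels are hypothetical examples.
EXAMPLE = """
{
  "graphs": [{
    "id": "http://purl.obolibrary.org/obo/example.owl",
    "nodes": [
      {"id": "http://purl.obolibrary.org/obo/EX_0000001", "lbl": "cell", "type": "CLASS"},
      {"id": "http://purl.obolibrary.org/obo/EX_0000002", "lbl": "neuron", "type": "CLASS"}
    ],
    "edges": [
      {"sub": "http://purl.obolibrary.org/obo/EX_0000002", "pred": "is_a",
       "obj": "http://purl.obolibrary.org/obo/EX_0000001"}
    ]
  }]
}
"""


def load_obographs(text):
    """Return (labels, edges) from an OBO Graphs JSON string.

    labels maps node id -> label (None if the node has no "lbl");
    edges is a list of (subject, predicate, object) tuples.
    """
    doc = json.loads(text)
    labels = {}
    edges = []
    for graph in doc.get("graphs", []):
        for node in graph.get("nodes", []):
            labels[node["id"]] = node.get("lbl")
        for edge in graph.get("edges", []):
            edges.append((edge["sub"], edge["pred"], edge["obj"]))
    return labels, edges


labels, edges = load_obographs(EXAMPLE)
```

All of the format handling is delegated to a generic JSON library; what remains for the user is understanding the data model (what `lbl`, `pred`, and so on mean).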

@dhimmel
Owner

dhimmel commented Oct 25, 2016

Hey @cmungall -- this is great news! I fully support deprecating OBO in favor of a standard serialization format. Of the serialization formats with universal support, I can think of XML, JSON, YAML, and TSV. I think JSON is the best choice, and I have used it myself to encode Hetionet v1.0.

Users will still need to understand the data model (what the JSON attributes mean) but will no longer have to struggle with text parsing. The problem with OWL is that it's too complex for most users: you can't just look at the raw text and understand what to do.

I wrote this package and its minimal amount of tests based on my specific use cases. Given the vague and odd specification of the OBO format, I would like to rely on this package as little as possible. So one thing that would be great is if OBOFoundry started releasing their ontologies in the obographs JSON format.

Larger picture, I think there have been two main pain points for ontology adoption. First is documentation and second is readability/operability. The documentation issues are slowly getting better, but the terminology is still daunting for outsiders. For example, I still don't fully understand what an "axiom" is. obographs will do a great job addressing the readability issue. And regarding operability, my hope is that public Neo4j instances (see greenelab/hetontology#3) will allow easy execution of advanced ontology queries via Cypher.

So pardon my ramble, just wanted to jot down my thoughts. Thanks for keeping me in the loop!

@cmungall
Contributor Author

In defence of OWL, the intention is that you use Protege or a similar interface to explore it, rather than looking at raw text. But this is a bit of an ivory-tower attitude; in reality, most coders need fewer levels of abstraction between them and the information they are trying to get at. (The OWL core language is actually amazingly simple: it's just a notation for writing simple set-theoretic expressions over a domain of unary and binary relations. But there is a big gap between this and bioinformatics use cases, and a huge amount of complexity involved in layering this notation onto the RDF/XML concrete form.)
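To make the set-theoretic reading concrete, here is a toy sketch: classes are sets of individuals (unary relations), properties are sets of pairs (binary relations), `SubClassOf` is the subset test, and an existential restriction (`part_of some Neuron`) is the set of individuals related to some member of the filler class. The individuals, classes, and property below are hypothetical. Note that this only checks axioms against one hand-built interpretation; an OWL reasoner instead reasons over all interpretations that satisfy the ontology.

```python
# One hypothetical interpretation: individuals n1, n2 (neurons), c1 (another cell), a1 (an axon).
Neuron = {"n1", "n2"}
Cell = {"n1", "n2", "c1"}
Axon = {"a1"}

# A binary relation is a set of (subject, object) pairs.
part_of = {("a1", "n1")}


def subclass_of(sub, sup):
    """SubClassOf(sub, sup) holds when every member of sub is a member of sup."""
    return sub <= sup


def some_values_from(relation, filler):
    """ObjectSomeValuesFrom(R, C): individuals x with some (x, y) in R where y is in C."""
    return {x for (x, y) in relation if y in filler}


def intersection_of(*classes):
    """ObjectIntersectionOf(C1, C2, ...) is plain set intersection."""
    return set.intersection(*classes)


# "Neuron SubClassOf Cell" and "Axon SubClassOf part_of some Neuron" both hold here.
neuron_is_cell = subclass_of(Neuron, Cell)
axon_part_of_neuron = subclass_of(Axon, some_values_from(part_of, Neuron))
```

Each axiom reduces to an overlap or containment question between sets, which is exactly what a Venn diagram depicts.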

I wrote this package and its minimal amount of tests based on my specific use cases. Given the vague and odd specification of the OBO format, I would like to rely on this package as little as possible

Your implementation seems far more robust than other implementations I have seen. But as you point out, it's fundamentally hard to check for edge cases given the complexity of the OBO spec (complexity that arises out of the evolution of what was intended to be a simple format; there's a lesson in there).

So one thing that would be great is if OBOFoundry started releasing their ontologies in the obographs JSON format

Great. We have more work to do to make this a standard, but we can have some release pipelines include this, with the understanding the structure may change, to allow early adopters a chance to test.

Thanks for the comments on the documentation; I fully agree.

@dhimmel
Owner

dhimmel commented Oct 25, 2016

In defence of OWL, the intention is that you use Protege or a similar interface to explore it, rather than looking at raw text.

The people need programmatic ontology access in Python. Out of curiosity, do you know the best way to import OWLs into a Python data structure?

notation for writing simple set-theoretic expressions over a domain of unary and binary relations

Yes "amazingly simple" 😸 .

We have more work to do to make this a standard, but we can have some release pipelines include this, with the understanding the structure may change, to allow early adopters a chance to test.

Awesome, happy to be a tester. Just ping me wherever and whenever the time has come!

@cmungall
Contributor Author

On 25 Oct 2016, at 14:51, Daniel Himmelstein wrote:

In defence of OWL, the intention is that you use Protege or a similar interface to explore it, rather than looking at raw text.

The people need programmatic ontology access in Python. Out of curiosity, do you know the best way to import OWLs into a Python data structure?

rdflib is slow, and too low a level of abstraction. It depends on what you want to do. If you need to do serious ontology processing, then Jython plus the OWLAPI is the way to go. But if you just need lightweight OWL operations, then the idea is that obographs satisfies the need.
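As an example of the kind of lightweight operation that can be done directly on an obographs-style graph, here is a sketch that collects every superclass of a term by walking `is_a` edges breadth-first. It assumes the edges have already been loaded as `(subject, predicate, object)` triples; the short term names are hypothetical stand-ins for real identifiers.

```python
from collections import defaultdict, deque


def ancestors(edges, term, predicate="is_a"):
    """Return all terms reachable from `term` by repeatedly following `predicate` edges."""
    # Index the parents of each subject for the chosen predicate only.
    parents = defaultdict(set)
    for sub, pred, obj in edges:
        if pred == predicate:
            parents[sub].add(obj)

    # Breadth-first search; `seen` also guards against cycles.
    seen = set()
    queue = deque([term])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical edges, in the (sub, pred, obj) shape used by OBO Graphs JSON.
edges = [
    ("neuron", "is_a", "cell"),
    ("cell", "is_a", "anatomical entity"),
    ("axon", "part_of", "neuron"),
]
```

Here `ancestors(edges, "neuron")` gives `{"cell", "anatomical entity"}`, while `ancestors(edges, "axon")` is empty because the only edge from `axon` is `part_of`, not `is_a`. Restricting traversal to a single predicate is a deliberate choice; mixing relations without thought changes what "ancestor" means.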

notation for writing simple set-theoretic expressions over a domain of unary and binary relations

Yes "amazingly simple" 😸 .

Think Venn Diagrams!

We have more work to do to make this a standard, but we can have some release pipelines include this, with the understanding the structure may change, to allow early adopters a chance to test.

Awesome, happy to be a tester. Just ping me wherever and whenever the time has come!

OK, will do

@dhimmel
Owner

dhimmel commented Nov 2, 2021

e22bf7b adds a section to the README that mentions the nxontology.imports.pronto_to_multidigraph function. This allows users to first read an OBO Graphs JSON file using pronto.Ontology and then create a networkx.MultiDiGraph via pronto_to_multidigraph. Pronto uses fastobo to load OBO Graphs JSON and .obo files, and it has its own RdfXMLParser for .owl files. So users looking to parse formats other than .obo should check out Pronto and its nxontology integration.

I personally use OBO Graphs JSON whenever possible. The sad reality, however, is that for many ontologies you'll have to try parsing all three formats, and if you're lucky, at least one of them will work 😸 .
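The "try every format until one works" strategy can be sketched as a simple fallback loop. The two parsers below are hypothetical, deliberately naive stand-ins written with the standard library (they only extract term IDs); in practice you would plug in real loaders such as pronto.Ontology rather than these, and the sample `EX:` identifiers are made up.

```python
import json


def parse_obographs_json(text):
    """Hypothetical minimal parser: node ids from an OBO Graphs JSON string."""
    doc = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on non-JSON input
    return [node["id"] for graph in doc["graphs"] for node in graph.get("nodes", [])]


def parse_obo(text):
    """Hypothetical, naive parser: ids of [Term] stanzas in .obo text (ignores most of the spec)."""
    ids = []
    in_term = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            in_term = line == "[Term]"
        elif in_term and line.startswith("id:"):
            ids.append(line[len("id:"):].strip())
    if not ids:
        raise ValueError("no [Term] stanzas found")
    return ids


def load_first_working(text, parsers):
    """Try each parser in order; return (parser name, result) from the first that succeeds."""
    errors = {}
    for parser in parsers:
        try:
            return parser.__name__, parser(text)
        except (ValueError, KeyError, TypeError) as exc:
            errors[parser.__name__] = exc
    raise ValueError(f"all parsers failed: {errors}")


obo_text = "format-version: 1.2\n\n[Term]\nid: EX:0000001\nname: cell\n"
result = load_first_working(obo_text, [parse_obographs_json, parse_obo])
```

With the `.obo` sample, the JSON parser fails and the OBO fallback returns `("parse_obo", ["EX:0000001"])`. Collecting each failure in `errors` keeps the reasons available when every format fails.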

@dhimmel dhimmel closed this as completed Nov 2, 2021

2 participants