Consider using a JSON representation instead of the obo file format #2

cmungall · 2016-10-24T23:50:15Z

Note that we have a proposed JSON representation of OBO that would obviate the need for special-purpose parsers. Your comments as a developer would be most welcome:

See also: mschubert/python-obo#2

dhimmel · 2016-10-25T00:50:18Z

Hey @cmungall -- this is great news! I fully support the deprecation of OBO for a standard serialization format. Of serialization formats that have universal support, I can think of XML, JSON, YAML, and TSV. I think JSON is the best choice, and have myself used it to encode Hetionet v1.0.

Users will still need to understand the data model (what the JSON attributes mean) but will no longer have to struggle with text parsing. The problem with OWL is that it's too complex for most users: you can't just look at the raw text and understand what to do.
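As a rough illustration of that point, here is a minimal sketch that reads an obographs-style document with nothing but Python's standard `json` module. The document is a hand-written toy example: the field names (`graphs`, `nodes`, `edges`, `id`, `lbl`, `sub`, `pred`, `obj`) are an assumption about the proposed structure, which may still change, and the `EX:` identifiers are made up.

```python
import json

# Hand-written toy document in the assumed obographs shape.
# Field names and identifiers are illustrative, not taken from a real release.
OBOGRAPHS_TEXT = """
{
  "graphs": [
    {
      "id": "http://example.org/toy.json",
      "nodes": [
        {"id": "EX:0000001", "lbl": "cell"},
        {"id": "EX:0000002", "lbl": "neuron"}
      ],
      "edges": [
        {"sub": "EX:0000002", "pred": "is_a", "obj": "EX:0000001"}
      ]
    }
  ]
}
"""


def read_graph(text):
    """Return (labels, edges) for the first graph in an obographs-style document."""
    graph = json.loads(text)["graphs"][0]
    # Map each node identifier to its human-readable label (may be missing).
    labels = {node["id"]: node.get("lbl") for node in graph["nodes"]}
    # Each edge becomes a (subject, predicate, object) triple.
    edges = [(e["sub"], e["pred"], e["obj"]) for e in graph["edges"]]
    return labels, edges


labels, edges = read_graph(OBOGRAPHS_TEXT)
for sub, pred, obj in edges:
    print(f"{labels[sub]} {pred} {labels[obj]}")
```

The parsing itself is one `json.loads` call; what remains is knowing what `sub`, `pred`, and `obj` mean, which is the data-model knowledge mentioned above.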

I wrote this package and its minimal amount of tests based on my specific use cases. Given the vague and odd specification of the OBO format, I would like to rely on this package as little as possible. So one thing that would be great is if OBOFoundry started releasing their ontologies in the obographs JSON format.

Larger picture, I think there have been two main pain points for ontology adoption. First is documentation and second is readability/operability. The documentation issues are slowly getting better, but the terminology is still daunting for outsiders. For example, I still don't fully understand what an "axiom" is. obographs will do a great job addressing the readability issue. And regarding operability, my hope is that public Neo4j instances (see greenelab/hetontology#3) will allow easy execution of advanced ontology queries via Cypher.

So pardon my ramble, just wanted to jot down my thoughts. Thanks for keeping me in the loop!

cmungall · 2016-10-25T18:02:52Z

In defence of OWL, the intention is that you use Protege or a similar interface to explore it, rather than looking at raw text. But this is a bit of an ivory tower attitude; in reality most coders need fewer levels of abstraction between them and the information they are trying to get at. (The OWL core language is actually amazingly simple: it's just a notation for writing simple set-theoretic expressions over a domain of unary and binary relations. But there is a big gap between this and bioinformatics use cases, and a huge amount of complexity involved in layering this notation onto the RDF/XML concrete form.)
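The set-theoretic reading can be sketched with plain Python sets. This is a toy illustration only: all individuals and relation names are hypothetical, and it evaluates expressions against one fixed, fully known domain, whereas OWL reasoning is open-world and considers every interpretation consistent with the axioms.

```python
# Toy domain of individuals (all names are hypothetical).
# Unary relations (classes) are sets of individuals.
neuron = {"n1", "n2"}
cell = {"n1", "n2", "c3"}
brain_part = {"b1"}

# A binary relation (property) is a set of ordered pairs.
part_of = {("n1", "b1"), ("c3", "b1")}


def some_values_from(relation, filler):
    """Individuals with at least one `relation` link into `filler`.

    This mirrors the existential restriction `relation some filler`.
    """
    return {x for (x, y) in relation if y in filler}


# SubClassOf corresponds to the subset test: every neuron is a cell.
print(neuron <= cell)  # True

# Class intersection: neurons that are part of some brain part.
brain_neuron = neuron & some_values_from(part_of, brain_part)
print(brain_neuron)  # {'n1'}
```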

I wrote this package and its minimal amount of tests based on my specific use cases. Given the vague and odd specification of the OBO format, I would like to rely on this package as little as possible

Your implementation seems far more robust than other implementations I have seen. But as you point out, it's fundamentally hard to check for edge cases given the complexity of the OBO spec (complexity that arises out of the evolution of what was intended to be a simple format; there's a lesson in there).

So one thing that would be great is if OBOFoundry started releasing their ontologies in the obographs JSON format

Great. We have more work to do to make this a standard, but we can have some release pipelines include this, with the understanding the structure may change, to allow early adopters a chance to test.

Thanks for the comments on the documentation, fully agree

dhimmel · 2016-10-25T21:51:10Z

In defence of OWL, the intention is that you use Protege or a similar interface to explore it, rather than looking at raw text.

The people need programmatic ontology access in Python. Out of curiosity, do you know the best way to import OWLs into a Python data structure?

notation for writing simple set-theoretic expressions over a domain of unary and binary relations

Yes "amazingly simple" 😸 .

We have more work to do to make this a standard, but we can have some release pipelines include this, with the understanding the structure may change, to allow early adopters a chance to test.

Awesome, happy to be a tester. Just ping me wherever and whenever the time has come!

cmungall · 2016-10-26T15:45:27Z

On 25 Oct 2016, at 14:51, Daniel Himmelstein wrote:

In defence of OWL, the intention is that you use Protege or a
similar interface to explore it, rather than looking at raw text.

The people need programmatic ontology access in Python. Out of
curiosity, do you know the best way to import OWLs into a Python data
structure?

rdflib is slow, and too low a level of abstraction. It depends what you
want to do. If you need to do serious ontology processing, then
jython+the OWLAPI is the way to go. But if you just need lightweight OWL
operations then the idea is that obographs satisfies the need.

notation for writing simple set-theoretic expressions over a domain
of unary and binary relations

Yes "amazingly simple" 😸 .

Think Venn Diagrams!

We have more work to do to make this a standard, but we can have some
release pipelines include this, with the understanding the structure
may change, to allow early adopters a chance to test.

Awesome, happy to be a tester. Just ping me wherever and whenever the
time has come!

OK, will do

dhimmel · 2021-11-02T20:15:11Z

e22bf7b adds a section to the README that mentions the nxontology.imports.pronto_to_multidigraph function. This allows users to first read an OBO Graphs JSON file using pronto.Ontology and then create a networkx.MultiDiGraph via pronto_to_multidigraph. Pronto uses fastobo to load OBO Graphs JSON and .obo files. It has its own RdfXMLParser for .owl files. So users looking to parse formats other than .obo should check out Pronto and the nxontology integration.

I personally use OBO Graphs JSON whenever possible. The sad reality however is that for many ontologies, you'll have to try to parse all three formats and if lucky at least one format will work 😸 .
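The "try every format until one works" approach could be sketched as follows. The helper `load_first_parseable` is hypothetical (it is not part of this package, Pronto, or nxontology), and the parser is passed in as a plain callable so the sketch makes no assumptions about any library's API.

```python
def load_first_parseable(candidates):
    """Try each (path, parser) pair in order and return the first success.

    `candidates` is a list of (path, parser) tuples, where `parser` is any
    callable that takes a path and returns a parsed ontology.
    Returns (path, parsed). Raises RuntimeError listing every failure
    if no candidate could be parsed.
    """
    errors = []
    for path, parser in candidates:
        try:
            return path, parser(path)
        except Exception as exc:  # parsers fail in many different ways
            errors.append(f"{path}: {exc!r}")
    raise RuntimeError("no format could be parsed:\n" + "\n".join(errors))
```

With Pronto installed, one might pass `pronto.Ontology` as the parser for each of the `.json`, `.obo`, and `.owl` files of an ontology, listing the preferred format first.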

cmungall mentioned this issue Oct 31, 2016

Gather feedback from developers of bioinformatics tools that consume ontology files geneontology/obographs#9

Open

7 tasks

dhimmel mentioned this issue Mar 24, 2017

Upload the package to PyPI #3

Closed

dhimmel closed this as completed Nov 2, 2021
