OBO repository parsability with OWL API 4.2.8. #410

Closed
matentzn opened this issue Mar 27, 2017 · 20 comments

@matentzn
Contributor

matentzn commented Mar 27, 2017

In case you are interested, I just ran an analysis of an OBO snapshot from around 3 months ago (I know some stuff has already changed since then, and the errors might be obsolete by now). Out of the 140+ ontologies, 9 had unloadable imports, and 1 was not parsable. The rest were fine.

1 aero.owl UnloadableImportException
2 flu.owl UnloadableImportException
3 foodon.owl UnloadableImportException
4 MFOMD.owl UnloadableImportException
5 miapa.owl UnloadableImportException
6 omit.owl UnloadableImportException
7 omrse.owl UnloadableImportException
8 pro_reasoned.owl UnparsableOntologyException
9 rnao.owl UnloadableImportException
10 sep.owl UnloadableImportException

Some more details (exception messages etc, plus some basic info on all ontologies in the snapshot) can be found here: http://rpubs.com/matentzn/obo

@cmungall
Contributor

cmungall commented Mar 28, 2017 via email

@matentzn
Contributor Author

matentzn commented Mar 28, 2017

I redownloaded the ontologies above to check which ones are still broken (28.03.2017).

(As I commented in the respective issue, rnao and sep still do not have valid PURLs. In fact, rnao points to some overall OBO Foundry list.) What I find noteworthy are the two broken imports of ro.owl. Isn't this one of the central OBO ontologies?

Unloadable imports:
aero.owl (Missing: http://purl.obolibrary.org/obo/aero/OMREAEROImport.owl)
flu.owl (Missing: http://www.obofoundry.org/ro/ro.owl)
MFOMD.owl (Missing: http://ogms.googlecode.com/svn/trunk/src/ontology/ogms.owl)
miapa.owl (Ontology already exists: http://www.w3.org/ns/prov-aq)
omit.owl (Missing: http://purl.obolibrary.org/obo/ncro/2015-12-10/ncro-combined.owl)
rnao.owl (Missing: http://www.obofoundry.org/ro/ro.owl)
sep.obo (Unparsable import: http://obo.cvs.sourceforge.net/checkout/obo/obo/ontology/phenotype/unit.obo)

Working now:
omrse.owl
foodon.owl
pro_reasoned.owl

@matentzn
Contributor Author

I created a new OBO snapshot today (24 April 2017). Out of 154 ontologies overall, 142 have resolvable purls. Out of those, 136 are OWL API parseable. The following ontologies are not loadable:

Not parsable:

  • dinto.owl: Problem parsing file: dinto.owl (Could not parse ontology.)

Unloadable imports:

@cmungall
Contributor

MIAPA - @hlapp already reported this error here: evoinfo/miapa#25

@cmungall
Contributor

mf.owl now fixed

@matentzn
Contributor Author

matentzn commented Nov 23, 2017

I have checked a snapshot of the OBO Foundry ontologies from today. You can see the results of the analysis here. I looked at OWL API parseability, OWL 2 profile violations, ontology size distributions and the axiom types used. Highlights:

Out of the 144 that were downloadable today (either through Apache Commons IO or manually):

  • 134 were OWL API 4.2.8 parseable.
  • 10 were not parseable: dinto.owl, gaz.owl, ogg.owl, ohd.owl, ohmi.owl, omiabis.owl, omit.owl, omrse.owl, ontoneo.owl, pr.owl

Out of the 134 parseable, 7 were empty: cmf, cro, dpo, exo, rnao, sepio, xl. Of these, cmf, cro, dpo, sepio and xl fail because the PURL does not link to the ontology, but to some AWS storage space that returns an XML document with a ListBucketResult. Maybe that is not what is intended. It is not clear to me why the OWL API does not fail on well-formed XML that is clearly not in any of the known ontology syntaxes. I do not know what is wrong with rnao. The file looks ok.
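One way to catch the bucket-listing case before a file ever reaches the OWL API is to look at the XML root element of whatever the PURL returned. The following is only a minimal sketch: the function names and the category labels are made up for illustration, and the check is a rough heuristic, not a replacement for actually parsing the ontology.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://...}ListBucketResult'."""
    return tag.rsplit("}", 1)[-1]


def sniff_payload(payload: str) -> str:
    """Roughly classify a downloaded document by its XML root element.

    The returned labels are hypothetical names chosen for this sketch.
    """
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # Not well-formed XML: could be OBO, Turtle, an HTML page, ...
        return "not-xml"
    name = local_name(root.tag)
    if name == "ListBucketResult":
        # Storage bucket listing instead of an ontology document
        return "s3-bucket-listing"
    if name in ("RDF", "Ontology"):
        # Root element used by RDF/XML and OWL/XML respectively
        return "looks-like-owl"
    return "other-xml"
```

A download step could call `sniff_payload` on each file and flag anything that comes back as `s3-bucket-listing` as a broken PURL instead of reporting it as an empty ontology.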

Interestingly, I did not have a single "unloadable import" problem this time. I am worried about that. Odds are there should be at least one of those. I did not change the OWL API version since last time I checked!

Two more highlights:

  • OBO Foundry ontologies seem remarkably non-broken with respect to OWL 2 DL profile violations: Only a bit more than 10% of the ontologies contain some sort of (relevant) violation. That seems fixable!
  • There are more ontologies with transitive object properties than with property hierarchies! At least I find this interesting!

@mcourtot
Contributor

This is really interesting - if a bit sad...
I think we should have a commitment that the ontologies we list are at a minimum parseable. It'd be interesting to see if the parse errors are due to size (gaz, pr) or something else.

I would be willing to consider the following:

  • run checks on a regular basis (is this something we could run on a monthly basis? Is the code somewhere public if we wanted to call it from an automated check pipeline, @matentzn?)
  • send an email to offenders, with a warning that if no action is taken for 3 months we will deprecate the resource (at the same time offering to help with technical aspects if the issues are purely technical)

@matentzn
Contributor Author

matentzn commented Nov 24, 2017

I can produce a fully self-contained R Markdown file, but it will take some time. Would that be something of interest? At the moment, the whole pipeline is a bit disconnected.

I am also happy to run the analysis on request whenever necessary.

@simonjupp
Contributor

OLS runs this check every night. I get a report that I'd be happy to have shared somewhere. Maybe we could have a GitHub badge for ontologies that are failing to parse or not available.

@matentzn
Contributor Author

@simonjupp Awesome, can I get a copy of this report? Another important place where a badge like this would be useful is on the OBO Foundry pages directly. Maybe one badge that indicates URL health for the download and one that indicates OWL API parsability.

@simonjupp
Contributor

Here's the email I got from OLS last night.

`
OLS loading complete with the following messages:

The following ontologies were sucessfully updated

ddpheno
sepio
cro
sbo
dpo
foodon
mod
ddanat
fix
xl
dinto
emap
genepio
fma
exo
rnao

The following ontologies failed

sepio
Empty ontology found:null

cro
Empty ontology found:null

dpo
Empty ontology found:null

epo
Failed to download file: sun.net.www.protocol.https.HttpsURLConnectionImpl cannot be cast to sun.net.www.protocol.ftp.FtpURLConnection

mamo
Failed to download file: sun.net.www.protocol.https.HttpsURLConnectionImpl cannot be cast to sun.net.www.protocol.ftp.FtpURLConnection

idomal
Failed to download file: http://ontologies.berkeleybop.org/idomal.owl

eo
Failed to download file: sun.net.www.protocol.https.HttpsURLConnectionImpl cannot be cast to sun.net.www.protocol.ftp.FtpURLConnection

miro
Failed to download file: http://ontologies.berkeleybop.org/miro.owl

tads
Failed to download file: http://ontologies.berkeleybop.org/tads.owl

ogi
Failed to download file: sun.net.www.protocol.https.HttpsURLConnectionImpl cannot be cast to sun.net.www.protocol.ftp.FtpURLConnection

tgma
Failed to download file: http://ontologies.berkeleybop.org/tgma.owl

xl
Empty ontology found:null

co_356
Failed to download file: Read timed out

dinto
Failed to lazily instantiate collection for query:Initialization of ELKOWLOntologyLoader failed: Problem parsing file:/nfs/production3/spot/data/prod/ols/downloads/dinto
Could not parse ontology. Either a suitable parser could not be found, or parsing failed. See parser logs below for explanation.
The following parsers were tried:

  1. RDFXMLParser
  2. OWLXMLParser
  3. OWLFunctionalSyntaxOWLParser
  4. TurtleOntologyParser
  5. KRSS2OWLParser
  6. ManchesterOWLSyntaxOntologyParser
  7. OBOFormatOWLAPIParser
  8. OWLOBO12Parser

Detailed logs:

Parser: RDFXMLParser
org.semanticweb.owlapi.rdf.syntax.RDFParserException: [line=8:column=17] Expecting rdf:RDF element.


Parser: OWLXMLParser
org.xml.sax.SAXParseException; systemId: file:/nfs/production3/spot/data/prod/ols/downloads/dinto; lineNumber: 42; columnNumber: 91; Attribute name "data-pjax-transient" associated with an element type "meta" must be followed by the ' = ' character.


Parser: OWLFunctionalSyntaxOWLParser
Encountered " "< "" at line 7, column 1.
Was expecting:
"Ontology" ...
(Line 0)


Parser: TurtleOntologyParser
uk.ac.manchester.cs.owl.owlapi.turtle.parser.ParseException: Encountered "" at line 7, column 1.
Was expecting one of:


Parser: KRSS2OWLParser
de.uulm.ecs.ai.owlapi.krssparser.ParseException: Encountered " ">" " "" at line 7, column 1.
Was expecting:


Parser: ManchesterOWLSyntaxOntologyParser
Encountered '' at line 7 column 1. Expected either 'Ontology:' or 'Prefix:' (Line 7)


Parser: OBOFormatOWLAPIParser
LINENO: 7 - Could not find tag separator ':' in line.
LINE:


Parser: OWLOBO12Parser
org.coode.owlapi.obo12.parser.TokenMgrError: Lexical error at line 7, column 16. Encountered: "\n" (10), after : ""

co_333
Failed to download file: Read timed out

mirnao
Failed to download file: http://mirna-ontology.googlecode.com/svn/trunk/src/ontology/mirnao.owl

bto
Failed to download file: http://ontologies.berkeleybop.org/bto.owl

exo
Failed to lazily instantiate collection for query:Initialization of ELKOWLOntologyLoader failed: Problem parsing file:/nfs/production3/spot/data/prod/ols/downloads/exo
Could not parse ontology. Either a suitable parser could not be found, or parsing failed. See parser logs below for explanation.
The following parsers were tried:

  1. RDFXMLParser
  2. OWLXMLParser
  3. OWLFunctionalSyntaxOWLParser
  4. TurtleOntologyParser
  5. KRSS2OWLParser
  6. ManchesterOWLSyntaxOntologyParser
  7. OBOFormatOWLAPIParser
  8. OWLOBO12Parser

Detailed logs:

Parser: RDFXMLParser
org.semanticweb.owlapi.rdf.syntax.RDFParserException: [line=96:column=137] IRI 'http://www.geneontology.org/formats/oboInOwl#http://www.geneontology.org/formats/oboInOWL#xref' cannot be resolved against current base IRI http://purl.obolibrary.org/obo/exo.obo.owl reason is: Illegal character in fragment at index 89: http://www.geneontology.org/formats/oboInOwl#http://www.geneontology.org/formats/oboInOWL#xref


Parser: OWLXMLParser
Attribute not found: IRI (Line 34)


Parser: OWLFunctionalSyntaxOWLParser
Encountered " "< "" at line 1, column 1.
Was expecting:
"Ontology" ...
(Line 0)


Parser: TurtleOntologyParser
uk.ac.manchester.cs.owl.owlapi.turtle.parser.ParseException: Encountered "" at line 1, column 1.
Was expecting one of:


Parser: KRSS2OWLParser
de.uulm.ecs.ai.owlapi.krssparser.ParseException: Encountered " ">" " "" at line 1, column 1.
Was expecting:


Parser: ManchesterOWLSyntaxOntologyParser
Encountered '' at line 1 column 1. Expected either 'Ontology:' or 'Prefix:' (Line 1)


Parser: OBOFormatOWLAPIParser
LINENO: 1 - Could not find tag separator ':' in line.
LINE:


Parser: OWLOBO12Parser
org.coode.owlapi.obo12.parser.TokenMgrError: Lexical error at line 1, column 22. Encountered: "\n" (10), after : ""

ogsf
Failed to download file: sun.net.www.protocol.https.HttpsURLConnectionImpl cannot be cast to sun.net.www.protocol.ftp.FtpURLConnection

rnao
Empty ontology found:null

sep
Failed to download file: http://ontologies.berkeleybop.org/sep.owl
`
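The failure messages in a report like this fall into a few recurring shapes, so they can be bucketed mechanically before anyone reads the full parser logs. The sketch below matches on substrings that appear in the email above; the category names and both function names are invented for illustration and are not part of OLS.

```python
from collections import Counter
from typing import Mapping


def classify_failure(message: str) -> str:
    """Map one failure message to a coarse, made-up category label."""
    if message.startswith("Empty ontology found"):
        return "empty"
    if message.startswith("Failed to download file"):
        if "cannot be cast to" in message:
            # The HttpsURLConnectionImpl / FtpURLConnection cast error
            return "download-cast-error"
        if "Read timed out" in message:
            return "download-timeout"
        return "download-other"
    if "Problem parsing file" in message or "Could not parse ontology" in message:
        return "unparsable"
    return "unknown"


def summarise(failures: Mapping[str, str]) -> Counter:
    """Count failures per category, given {ontology id: first message line}."""
    return Counter(classify_failure(msg) for msg in failures.values())
```

Feeding it the first message line for each failed ontology gives a quick overview of how many failures are download problems and how many are genuine parse errors.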

@simonjupp
Contributor

You can get the status and error message from the API e.g. https://www.ebi.ac.uk/ols/api/ontologies/sep
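For anyone who wants to poll that endpoint from a script, a rough sketch follows. The base URL is the one given above; the JSON field names `status` and `message` are assumptions about the response shape and should be checked against a real response before relying on them. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Base URL taken from the comment above
OLS_API = "https://www.ebi.ac.uk/ols/api/ontologies/"


def load_status(document: dict) -> tuple:
    """Extract (status, message) from a parsed OLS ontology document.

    ASSUMPTION: the keys "status" and "message" are guesses at the
    response layout; adjust them to whatever the API actually returns.
    """
    return document.get("status", "UNKNOWN"), document.get("message") or ""


def fetch_load_status(ontology_id: str, timeout: float = 30.0) -> tuple:
    """Download the ontology document for one id and summarise it."""
    with urlopen(OLS_API + ontology_id, timeout=timeout) as response:
        return load_status(json.load(response))
```

Calling `fetch_load_status("sep")` would then return the pair for the ontology mentioned above, provided the service is reachable and the assumed keys exist.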

@matentzn
Contributor Author

Nice, I will take a look at it. :) Which OWL API version are you currently on?

@simonjupp
Contributor

3.5.2. We plan to update to 4 next year.

@mcourtot
Contributor

mcourtot commented Nov 27, 2017 via email

@matentzn
Contributor Author

I am thinking of permanently dealing with this issue using the following pipeline:

  1. Analysing ontologies given their deploy location using all three OWL API versions, plus logical consistency checking, plus an analysis of the "health" of the deploy location (does it resolve, etc.), at regular intervals.
  2. Establishing a badge system that awards ontologies for certain criteria. Our first draft can be found here.
  3. Offering the badges as a web service: deploy location in (for example http://ogms.googlecode.com/svn/trunk/src/ontology/ogms.owl) and relevant badges out. This allows interested parties to simply pull the latest badges and present them as part of a public listing (like obofoundry.org) or just as a way to send warning messages to maintainers at regular intervals.
  4. Offering a crude UI to explore the state of the ontologies, something like this:
    image
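The "check results in, badges out" idea from the list above could look roughly like the sketch below. Everything here is hypothetical: the `CheckResult` fields, the badge names and the rule that every listed OWL API version must parse the file are illustrative assumptions, not the criteria from the draft badge system.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass
class CheckResult:
    """Outcome of one analysis run for a single deploy location (hypothetical)."""
    resolves: bool                # did the deploy location return a file?
    parsed_by: FrozenSet[str]     # labels of OWL API versions that parsed it
    consistent: Optional[bool]    # None if consistency was not checked


def award_badges(result: CheckResult, required_versions: Iterable[str]) -> List[str]:
    """Return the (made-up) badges earned by one ontology."""
    badges: List[str] = []
    if not result.resolves:
        return badges             # nothing else can be checked
    badges.append("resolvable")
    required = set(required_versions)
    if required and required <= result.parsed_by:
        badges.append("parseable")
        if result.consistent is True:
            badges.append("consistent")
    return badges
```

A web service would run the checks for a given deploy location, build a `CheckResult`, and return the list from `award_badges` to the caller.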

The system is already in place in an experimental state, but it is not secured yet, so I don't want to post the link here. If anyone is interested, I am happy to supply a link via email. I started to write some very basic documentation here.

@cmungall
Contributor

cc @bill-baumgartner @balhoff
https://ucdenver-ccp.github.io/obo-ci/

@matentzn
Contributor Author

matentzn commented Mar 14, 2019

Recent attempt: only two unparsable ones!

Wow, we are getting close to perfect parseability! Gogogo :P

@nlharris nlharris added the attn: Technical WG Issues pertinent to technical activities, such as maintenance of website, PURLs, and tools label May 1, 2020
@nlharris
Contributor

nlharris commented May 1, 2020

Is this resolved?

@matentzn
Contributor Author

matentzn commented May 1, 2020

No more action items and largely replaced by dashboard. Can be closed.

@matentzn matentzn closed this as completed May 1, 2020