Issue 1286: remove exact synonyms that match label #1289

Merged
merged 109 commits into from Feb 3, 2022

Member

@wdduncan wdduncan commented Feb 2, 2022

Fixes #1286

Removed exact synonym violations discovered in PR #1240

ERROR	duplicate_label_synonym	garden	has_exact_synonym	garden
ERROR	duplicate_label_synonym	ocean	has_exact_synonym	ocean
ERROR	duplicate_label_synonym	sea	has_exact_synonym	sea
ERROR	duplicate_label_synonym	stream	has_exact_synonym	stream
ERROR	duplicate_label_synonym	reservoir	has_exact_synonym	reservoir
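To illustrate what a duplicate_label_synonym violation looks like, here is a minimal Python sketch that flags terms whose exact synonym repeats the label, and then drops those synonyms. It is not ROBOT's implementation: the dictionary layout, the function names, and the placeholder synonyms are assumptions made for illustration, and comparing by exact string equality is also an assumption.

```python
def find_duplicate_label_synonyms(terms):
    """Return (label, synonym) pairs where an exact synonym repeats the label."""
    violations = []
    for term in terms:
        label = term["label"]
        for synonym in term.get("exact_synonyms", []):
            # Assumption: a violation is an exact, case-sensitive string match.
            if synonym == label:
                violations.append((label, synonym))
    return violations


def remove_duplicate_label_synonyms(terms):
    """Return copies of the terms without exact synonyms equal to the label."""
    cleaned = []
    for term in terms:
        kept = [s for s in term.get("exact_synonyms", []) if s != term["label"]]
        cleaned.append({**term, "exact_synonyms": kept})
    return cleaned


# Toy data: the labels come from the report above; the synonyms are placeholders.
terms = [
    {"label": "garden", "exact_synonyms": ["garden"]},
    {"label": "ocean", "exact_synonyms": ["ocean", "placeholder synonym"]},
    {"label": "lake", "exact_synonyms": ["another placeholder synonym"]},
]

for label, synonym in find_duplicate_label_synonyms(terms):
    print(f"ERROR\tduplicate_label_synonym\t{label}\thas_exact_synonym\t{synonym}")
```

Running the finder again on the output of remove_duplicate_label_synonyms should report nothing, which is the state this PR aims for.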

@wdduncan wdduncan self-assigned this Feb 2, 2022
@wdduncan
Member Author

wdduncan commented Feb 2, 2022

@cmungall The PR is failing on this command:

owltools http://purl.obolibrary.org/obo/uberon.obo  --remove-axiom-annotations  --make-subset-by-properties -f BFO:0000050 BFO:0000051 RO:0002202 immediate_transformation_of RO:0002176 IAO:0000136 --remove-external-classes UBERON --remove-dangling-annotations --remove-annotation-assertions -l -s -d --set-ontology-id http://purl.obolibrary.org/obo/uberon.owl -o mirror/uberon.owl
[Fatal Error] :1:50: White spaces are required between publicId and systemId.

What do publicId and systemId refer to in that error message?

@cmungall
Member

cmungall commented Feb 2, 2022

Recall that the OWLAPI cycles through all possible parsers; this is a SAX error message produced while it tries the RDFA parser. This is extremely confusing behavior, and there are already issues on the owlapi issue tracker about it.

You should always look at the stack trace and find the section relevant to the format, in this case OBOFormat. This usually yields the correct diagnostic.

However, in this case it is more complex. There is an issue with the older version of the owlapi that owltools is using (we need to eliminate owltools from the build!): it does not handle redirects, and it attempts to parse the network error message as if it were obo format.

The quick workaround here is to download the file before processing.

This works:

wget http://purl.obolibrary.org/obo/uberon.obo
owltools uberon.obo  --remove-axiom-annotations  --make-subset-by-properties -f BFO:0000050 BFO:0000051 RO:0002202 immediate_transformation_of RO:0002176 IAO:0000136 --remove-external-classes UBERON --remove-dangling-annotations --remove-annotation-assertions -l -s -d --set-ontology-id http://purl.obolibrary.org/obo/uberon.owl -o mirror/uberon.owl

So I suggest doing this first; after that, we should modernize the whole pipeline to use a standard ODK approach.
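As a guard for the download-first workaround, a small check can confirm that the downloaded file looks like OBO format rather than an HTML or XML error page before it is handed to a parser. The following is a heuristic sketch only: the function name is hypothetical, the sample error page is invented (the actual network response is not shown in this thread), and it assumes that an OBO file opens with a tag: value header line such as format-version, optionally preceded by ! comment lines.

```python
import re

# A header line of the form "tag: value", e.g. "format-version: 1.2".
_TAG_VALUE = re.compile(r"^[A-Za-z][\w-]*:\s*\S")


def looks_like_obo(text: str) -> bool:
    """Heuristic: the first meaningful line is a 'tag: value' pair, not markup."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            # Skip blank lines and OBO comment lines.
            continue
        if stripped.startswith("<"):
            # Markup such as an HTML or XML error page.
            return False
        return bool(_TAG_VALUE.match(stripped))
    return False


obo_text = "format-version: 1.2\nontology: uberon\n"
# Hypothetical error page, for illustration only.
error_page = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html></html>\n'

print(looks_like_obo(obo_text))
print(looks_like_obo(error_page))
```

A check like this fails fast with a clear message, instead of surfacing an unrelated SAX error from whichever parser happened to be tried last.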

@wdduncan
Member Author

wdduncan commented Feb 3, 2022

@cmungall I have fixed all the duplicate_label_synonym errors.

When I run make test locally, some errors still remain. The infamous:

ERROR Input ontology contains 1 triple(s) that could not be parsed:
 - <https://www.wikidata.org/wiki/Q2306597> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> _:genid-nodeid-node1fr0hvbupx12691.

And the following missing_label errors:

envo-base-robot-report.tsv:ERROR	missing_label	CHEBI:53243	rdfs:label	
envo-base-robot-report.tsv:ERROR	missing_label	CHEBI:53550	rdfs:label	
envo-base-robot-report.tsv:ERROR	missing_label	CHEBI:60034	rdfs:label	
envo-base-robot-report.tsv:ERROR	missing_label	CHEBI:60737	rdfs:label	
envo-base-robot-report.tsv:ERROR	missing_label	CHEBI:61452	rdfs:label	
envo-base-robot-report.tsv:ERROR	missing_label	CHEBI:61642	rdfs:label

envo-src-robot-report.tsv:ERROR	missing_label	poly(vinyl chloride)	rdfs:label	
envo-src-robot-report.tsv:ERROR	missing_label	poly(propylene)	rdfs:label	
envo-src-robot-report.tsv:ERROR	missing_label	polyethylene polymer	rdfs:label	
envo-src-robot-report.tsv:ERROR	missing_label	polyurethane polymer	rdfs:label	
envo-src-robot-report.tsv:ERROR	missing_label	poly(ethylene terephthalate) polymer	rdfs:label	
envo-src-robot-report.tsv:ERROR	missing_label	polystyrene polymer	rdfs:label

I think we should merge and deal with the missing_label errors in a separate PR.
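To keep track of what remains for that follow-up PR, the report lines above can be tallied by file and rule. This is a rough sketch that assumes the lines have the shape filename:LEVEL, rule, subject, property, value, with tabs between the fields after the filename, as the excerpts above appear to; the function name is hypothetical.

```python
from collections import Counter


def summarize_report_lines(lines):
    """Count report rows per (file, level, rule)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        # Split on the first colon only: term IDs such as CHEBI:53243 contain colons too.
        filename, _, row = line.partition(":")
        fields = row.split("\t")
        if len(fields) < 2:
            continue  # not a report row
        level, rule = fields[0], fields[1]
        counts[(filename, level, rule)] += 1
    return counts


sample = [
    "envo-base-robot-report.tsv:ERROR\tmissing_label\tCHEBI:53243\trdfs:label\t",
    "envo-base-robot-report.tsv:ERROR\tmissing_label\tCHEBI:53550\trdfs:label\t",
    "envo-src-robot-report.tsv:ERROR\tmissing_label\tpolystyrene polymer\trdfs:label",
]

for (filename, level, rule), n in sorted(summarize_report_lines(sample).items()):
    print(f"{filename}\t{level}\t{rule}\t{n}")
```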

@wdduncan wdduncan merged commit ef26d3a into master Feb 3, 2022
@wdduncan wdduncan deleted the issue-1286 branch February 3, 2022 21:07
Development

Successfully merging this pull request may close these issues.

remove duplicate synonyms
2 participants