ISSUE-125 Use wikidata to provide skos:definition to owl:Class'es #205

Closed
wants to merge 1 commit into from

lewismc
Member

@lewismc lewismc commented Jul 21, 2020

This pull request supersedes #203

@graybeal, regarding the definitions: I simply removed the skos:definition annotations for those.

Only one entry required a skos:historyNote; it is as follows:

###  http://sweetontology.net/phenCryo/GlacierRetreat
:GlacierRetreat rdf:type owl:Class ;
                rdfs:subClassOf :GlacialProcess ,
                                <http://sweetontology.net/phenSystem/Retreat> ,
                                <http://sweetontology.net/procStateChange/Melting> ;
                rdfs:label "glacier retreat"@en ;
                rdfs:seeAlso <https://github.com/ESIPFed/sweet/issues/185> ;
                skos:closeMatch <http://purl.obolibrary.org/obo/ENVO_01001656> ;
                skos:definition _:genid5 ,
                                _:genid6 .

_:genid5 dcterms:created "2020-04-09T10:20:12-08:00Z"^^xsd:dateTimeStamp ;
          dcterms:creator <https://orcid.org/0000-0003-4091-6059> ;
          dcterms:source <https://orcid.org/0000-0003-4808-4736> ;
          rdfs:comment "The process of glacier ice loss."@en ;
          <http://www.w3.org/ns/prov#wasDerivedFrom> <http://purl.obolibrary.org/obo/ENVO_01001656> ;
          skos:historyNote "Native curated definition by ESIP Semantic Harmonization Committee."@en .

_:genid6 dcterms:created "2020-07-20T17:20:42.890"^^xsd:dateTime ;
          dcterms:creator <https://orcid.org/0000-0003-2185-928X> ;
          dcterms:source <http://www.wikidata.org/entity/Q94706497> ;
          rdfs:comment "shrinking of a glacier"@en ;

Finally, @graybeal @brandonnodnarb @rrovetto see the requested generated CSV file.

@lewismc
Member Author

lewismc commented Jul 21, 2020

There appear to be some files which are failing to write... I am investigating those right now. Also, I noticed that a few other files are failing due to host connection timeout issues... this may have to do with OWLAPI or the host or the process... I am not sure.

@lewismc
Member Author

lewismc commented Jul 21, 2020

Carried over from #203 from @dr-shorthair

Attempting to load in TopBraid so I can run the SPARQL: a lot of errors from mis-formatted xsd:dateTime and xsd:dateTimeStamp :-( in phenCryo and realmCryo ...

I'll go ahead and fix that. Good catch.

@lewismc
Member Author

lewismc commented Jul 21, 2020

@dr-shorthair right now the annotation looks as follows

_:genid1 dcterms:created "2020-07-20T17:03:26.420"^^xsd:dateTime ;

I'll go ahead and change these to the following

_:genid1 dcterms:created "2020-07-20T17:03:26.420-07:00"^^xsd:dateTimeStamp ;
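The change described above could be sketched roughly as follows. This is only an illustration, not the code used in this pull request: the function name `to_datetimestamp` and the `default_offset` parameter are hypothetical, and the `-07:00` default is simply the offset shown in the example.

```python
import re

# Matches a trailing zone designator: 'Z' or a +hh:mm / -hh:mm offset.
_ZONE = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def to_datetimestamp(lexical, default_offset="-07:00"):
    """Append a zone offset to a timestamp that lacks one.

    Hypothetical helper: the default offset is an assumption taken from
    the example in this thread, not from the generator itself.
    """
    value = lexical.strip()
    if _ZONE.search(value):
        # Already carries a zone designator, so leave it untouched.
        return value
    return value + default_offset


print(to_datetimestamp("2020-07-20T17:03:26.420"))
```

Note that this only adds a missing zone; it does not repair values that already end in a malformed designator.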

@brandonnodnarb
Member

brandonnodnarb commented Jul 21, 2020

Yes, but realmCryo was last edited 2 months ago...

Protege 5.5 doesn't throw an error, but the definition shows as blank brackets in the editor. Would the zulu time encoding from the Cryo group cause a clash? That's an odd error.

@brandonnodnarb
Member

brandonnodnarb commented Jul 21, 2020

Looking at the spreadsheet (many thanks @lewismc) it looks like abbreviations are matching to genetic elements. For example, sic, which is an equivalent class to Standard Industrial Classification (and, IMHO, should probably be a skos:altLabel), is finding a wikidata match as "genetic element in the species Drosophila melanogaster"@en

It looks like any definition starting with "genetic element in the species..." can probably be disregarded.
EDIT: there are 9. :)

@lewismc
Member Author

lewismc commented Jul 21, 2020

@brandonnodnarb thanks for taking a look. Regarding abbreviations, yes, I am +1 for adding clarifying axioms as you suggest. This is a bit difficult to implement automatically though... I don't know how I would do that.

It looks like any definition starting with "genetic element in the species..." can probably be disregarded.

I can implement this check pretty easily. I'll go ahead and do that. Essentially, it just means that these definitions will be dropped.
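A minimal sketch of such a check, assuming each candidate definition is available as a plain string. The names `GENETIC_PREFIX` and `keep_definition` are hypothetical and are not taken from the actual generator code.

```python
# Prefix quoted in this thread; matching is case-insensitive here,
# which is an assumption rather than a stated requirement.
GENETIC_PREFIX = "genetic element in the species"


def keep_definition(definition):
    """Return False for definitions that should be dropped."""
    return not definition.strip().lower().startswith(GENETIC_PREFIX)


candidates = [
    "genetic element in the species Drosophila melanogaster",
    "shrinking of a glacier",
]
kept = [d for d in candidates if keep_definition(d)]
print(kept)
```

With a filter like this, terms whose only Wikidata match is a genetic element would simply end up without a skos:definition.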

@dr-shorthair
Collaborator

I got a bunch of reports where

  • there was a trailing 'Z' after a time-zone offset - it is one or the other, not both!
  • there were some with spaces embedded - but I can't find them now.

TB might be finding some stuff in imports, but diagnostics a bit lacking.
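One way to locate values like these is a lexical pattern check over the Turtle text. The sketch below is an assumption-laden illustration: `find_bad_stamps` is a hypothetical helper, it only looks at literals typed with the `xsd:` prefix, and the regular expression checks the overall shape of an xsd:dateTimeStamp rather than full validity (month and day ranges are not checked).

```python
import re

# Shape of an xsd:dateTimeStamp: date, 'T', time with optional fraction,
# then a mandatory zone that is either 'Z' or an offset, never both.
_STAMP = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})"
)
# Literals typed as xsd:dateTimeStamp in Turtle text.
_LITERAL = re.compile(r'"([^"]*)"\^\^xsd:dateTimeStamp')


def find_bad_stamps(turtle_text):
    """Return lexical values that do not have the expected shape."""
    return [v for v in _LITERAL.findall(turtle_text) if not _STAMP.fullmatch(v)]


sample = '''
_:a dcterms:created "2020-04-09T10:20:12-08:00Z"^^xsd:dateTimeStamp .
_:b dcterms:created "2020-07-20T17:03:26.420-07:00"^^xsd:dateTimeStamp .
_:c dcterms:created "2020-04-09T10:20:12 -08:00"^^xsd:dateTimeStamp .
'''
print(find_bad_stamps(sample))
```

The first sample value has an offset followed by 'Z' and the third has an embedded space, so both are reported; the second passes.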

@lewismc
Member Author

lewismc commented Jul 21, 2020

@dr-shorthair

there was a trailing 'Z' after a time-zone offset - it is one or the other, not both!

I think this is a bug and should be addressed in a separate pull request. Are you able to submit that one?

there were some with spaces embedded - but I can't find them now.

OK, I've not experienced this one!

@rrovetto
Collaborator

Protege 5.5 doesn't throw an error, but the definition shows as blank brackets in the editor.

Likewise--it also only displayed blank brackets when I tried.

@brandonnodnarb
Member

Regarding abbreviations, yes, I am +1 for adding clarifying axioms as you suggest. This is a bit difficult to implement automatically though... I don't know how I would do that.

Apologies, this ^ was snark. I have mentioned this before but haven't had time to address it. (I need a sarcasm font.) I think I can write a simple filter to extract the subsets.

Anyway. Assuming #207 fixes the time stamp issue(s), I think this is good to go. There are definitely some definitions that don't seem correct, but they aren't obviously wrong, aside from the "genetic element" defs.

Assuming nothing else breaks, I think it's a good start. :)

@brandonnodnarb brandonnodnarb requested review from brandonnodnarb and removed request for brandonnodnarb July 21, 2020 07:22
@smrgeoinfo
Collaborator

incorrect mapping of geologic time intervals (from the csv dump)

sweet:stateTime/Age | age | http://www.wikidata.org/entity/Q185836 | "period of life of a human or organism"@en
should map to https://www.wikidata.org/wiki/Q568683

sweet:stateTime/Epoch | epoch | http://www.wikidata.org/entity/P6259 | "epoch of an astronomical object coordinate"@en
should map to https://www.wikidata.org/wiki/Q754897

sweet:stateTime/Period | period | http://www.wikidata.org/entity/Q101843 | "row in the periodic table of elements"@en
should map to https://www.wikidata.org/wiki/Q392928

@smrgeoinfo
Collaborator

smrgeoinfo commented Jul 21, 2020

lots of other incorrect mappings, particularly for commonly used words, some examples:

  • Bottom: "role in a BDSM relationship"@en
  • Rim: "The external flange that is machined, cast, molded, stamped or pressed around the bottom of a firearms cartridge"
  • Flank: "side of the body between the rib cage and the iliac bone of the hip"
  • Margin: "a type of financial collateral used to cover credit risk"
  • Validated: "badge being used as a Wikisource work status indicator"
  • Shell: "in solid mechanics" [didn't get all the text???]
  • Layer: "in electronics, a single thickness of some material covering a surface"

I spent about 45 minutes scanning through the csv file and found 138 definitions that are obviously wrong or need review; I looked at maybe a third of the rows. The really technical terms for the most part got reasonable matches. The marked-up spreadsheet is available; problem defs are highlighted in yellow.

@lewismc
Member Author

lewismc commented Jul 22, 2020

@brandonnodnarb no problems ;)

Great, @smrgeoinfo. Responding to some of your comments:

sweet:stateTime/Age | age | http://www.wikidata.org/entity/Q185836 | "period of life of a human or organism"@en |
should map to https://www.wikidata.org/wiki/Q568683

... I will update these 3 manually in the next iteration.
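If the manual corrections were ever moved into the generator, one option would be an override table keyed by term. This is a sketch under stated assumptions: the row keys (`term`, `wikidata`, `definition`) and the helper `apply_overrides` are hypothetical, and the entity IRIs are built from the Q identifiers suggested by @smrgeoinfo. The corrected definition text is deliberately left empty, since it would have to be fetched again from Wikidata.

```python
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"

# Q identifiers taken from the review comment in this thread.
OVERRIDES = {
    "sweet:stateTime/Age": "Q568683",
    "sweet:stateTime/Epoch": "Q754897",
    "sweet:stateTime/Period": "Q392928",
}


def apply_overrides(rows):
    """Return new rows with overridden Wikidata matches applied."""
    fixed = []
    for row in rows:
        qid = OVERRIDES.get(row["term"])
        if qid is not None:
            # Point at the reviewed entity and clear the stale definition.
            row = {**row, "wikidata": WIKIDATA_ENTITY + qid, "definition": None}
        fixed.append(row)
    return fixed


rows = [
    {
        "term": "sweet:stateTime/Age",
        "wikidata": "http://www.wikidata.org/entity/Q185836",
        "definition": "period of life of a human or organism",
    },
]
print(apply_overrides(rows))
```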

lots of other incorrect mappings, particularly for commonly used words, some examples... problem defs are highlighted in yellow.

I'll go ahead and manually remove these incorrect annotations. We can address them in future work.
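For reference, removing reviewer-flagged rows from the CSV could look roughly like the following. The column name `label`, the comma-separated layout, and the helper `drop_flagged` are all assumptions about the generated file; the flagged labels are the examples listed earlier in this thread, not the full set from the marked-up spreadsheet.

```python
import csv
import io

# Labels called out as bad matches in the review comments above.
FLAGGED = {"bottom", "rim", "flank", "margin", "validated", "shell", "layer"}


def drop_flagged(csv_text, label_column="label"):
    """Return the rows whose label was not flagged by a reviewer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row[label_column].strip().lower() not in FLAGGED
    ]


sample_csv = (
    "label,definition\n"
    "Rim,The external flange around the bottom of a firearms cartridge\n"
    "glacier retreat,shrinking of a glacier\n"
)
print(drop_flagged(sample_csv))
```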

5 participants