Use wikidata to provide skos:definition for each owl:Class #200

Closed
wants to merge 1 commit into from

lewismc
Member

@lewismc lewismc commented Jul 15, 2020

Hi folks, this is an updated attempt which extracts Wikidata schema:description values and maps them to rdfs:comment annotations, using OWLAPI instead of Jena to write the data.

I am looking for feedback here. I know the way that I've structured the annotations is not the way we want to do it, but evaluating the results gives us an idea of how useful the code I wrote is.

The initial results confirm that there are now 2077 occurrences of rdfs:comment ... this is not bad (assuming that they make sense)

cat * | grep -c "rdfs:comment"
cat: output: Is a directory
2077
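
The `cat: output: Is a directory` message appears because `*` also matches the `output` directory. As a minimal sketch of an equivalent count that only reads regular files, assuming the ontology files sit directly in one directory (the function name `count_matching_lines` is illustrative and not part of this PR):

```python
from pathlib import Path


def count_matching_lines(directory, needle="rdfs:comment"):
    """Count lines containing `needle` in the regular files of `directory`.

    Like `cat * | grep -c`, this counts matching lines rather than total
    occurrences, but it skips subdirectories instead of reporting an error.
    """
    total = 0
    for path in sorted(Path(directory).iterdir()):
        # The shell glob `*` ignores hidden files, so do the same here.
        if path.name.startswith(".") or not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if needle in line)
    return total
```

Because this counts lines, a line that mentions `rdfs:comment` twice still contributes one to the total, which matches the `grep -c` behaviour used above.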

ISSUES

  1. OWLAPI is screwing up the base prefix IRI for each file. I'm going to look into how I can prevent that.
  2. OWLAPI seems to hate using prefixes... this means that the prefix work we did a while back is not really being used IN SWEET anymore. I don't really like this and will try to address this as well.
  3. The owl:versionIRI <Optional[http://sweetontology.net/human]/3.6.0> ; is an issue. This should be owl:versionIRI <http://sweetontology.net/human/3.6.0> ;. I'll work on fixing that.
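
The `Optional[...]` wrapper in issue 3 looks like what is produced when a `java.util.Optional` is concatenated into a string instead of being unwrapped. That is an inference from the output, not something confirmed in this thread. Until the writer code is fixed, a post-processing workaround could strip the wrapper from the serialized files. A minimal sketch, with an illustrative function name:

```python
import re

# Matches an IRI whose leading part was serialized as "Optional[...]",
# e.g. <Optional[http://sweetontology.net/human]/3.6.0>.
_OPTIONAL_IRI = re.compile(r"<Optional\[([^\]\s>]+)\]([^>\s]*)>")


def unwrap_optional_iris(turtle_text):
    """Rewrite <Optional[base]/rest> as <base/rest>; leave other text alone."""
    return _OPTIONAL_IRI.sub(r"<\1\2>", turtle_text)


broken = "owl:versionIRI <Optional[http://sweetontology.net/human]/3.6.0> ;"
print(unwrap_optional_iris(broken))
```

This only repairs the symptom in the output. Unwrapping the optional value before building the version IRI would be the cleaner fix on the Java side.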

QUESTION
How do we review the new rdfs:comment annotations to ensure that they make logical sense? Some options:

  1. Split each one into a separate PR... meaning 2077 PRs!!!
  2. Split each file into a separate PR... meaning more than 225 PRs!!!
  3. Have folks sit for hours going through this huge PR and flagging any issues they see. Maybe we could split this work up...

What do you guys think?

Thanks

@dr-shorthair
Collaborator

I agree that - in the short term at least - Wikidata probably provides a good, succinct, source of definitions.
But reviewing every text is unrealistic on any useful timescale, and might trigger discussions that may belong better over in ENVO.

So I'd suggest looking at a way to adopt the Wikidata definitions transparently.
i.e. either

  • copy the descriptions over but into an annotation structure that allows the provenance to be recorded
  • just add a suitable SKOS mapping link skos:exactMatch or skos:closeMatch etc to the Wikidata entry

Then other definitions could be added alongside, which would reflect the rough/contested semantics that we are aiming for in SWEET?

I recognise that merely linking does not achieve the goal of getting a local text definition included, but maybe @carueda could do some magic in COR to fetch schema:description values from the link for display purposes?
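
To make the "fetch for display" idea concrete, here is a rough sketch of reading a description from a Wikidata entity document. It assumes the JSON layout served by Wikidata's `Special:EntityData` endpoint (`entities` → entity ID → `descriptions` → language → `value`). The function names and the User-Agent string are illustrative, and nothing here reflects how COR actually works.

```python
import json
import urllib.request

ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def extract_description(entity_document, qid, language="en"):
    """Return the description text for `qid` in `language`, or None."""
    entity = entity_document.get("entities", {}).get(qid, {})
    description = entity.get("descriptions", {}).get(language)
    return description["value"] if description else None


def fetch_description(qid, language="en"):
    """Download the entity document and extract its description.

    Performs a network request, so it is defined here but not called.
    """
    request = urllib.request.Request(
        ENTITY_DATA_URL.format(qid=qid),
        headers={"User-Agent": "sweet-definition-sketch/0.1"},  # illustrative
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_description(json.load(response), qid, language)
```

Keeping the parsing separate from the download means the extraction logic can be checked against a saved entity document without touching the network.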

@lewismc
Member Author

lewismc commented Jul 15, 2020

Thanks @dr-shorthair

> I agree that - in the short term at least - Wikidata probably provides a good, succinct, source of definitions.

+1

> But reviewing every text is unrealistic on any useful timescale, and might trigger discussions that may belong better over in ENVO.

+1

>   • copy the descriptions over but into an annotation structure that allows the provenance to be recorded
>   • just add a suitable SKOS mapping link skos:exactMatch or skos:closeMatch etc to the Wikidata entry

We have examples of the following

###  http://sweetontology.net/realmCryo/AlpineTundra
soreac:AlpineTundra rdf:type owl:Class ;
                  rdfs:subClassOf soreac:Tundra ;
                  rdfs:label "alpine tundra"@en ;
                  skos:closeMatch <http://purl.obolibrary.org/obo/ENVO_01001371> ;
                  skos:definition  [
                        rdfs:comment  "A tundra ecosystem which exists at high altitudes and where vegetation is stunted due to low temperatures and high winds."@en ;
                        dcterms:source <https://orcid.org/0000-0003-4808-4736> ;
                        dcterms:created "2019-12-10T06:11:13-08:00Z"^^xsd:dateTimeStamp ;
                        dcterms:creator <https://orcid.org/0000-0003-4091-6059> ;
                        prov:wasDerivedFrom <http://purl.obolibrary.org/obo/ENVO_01001371> ;
                      ] .

wdyt?
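
For illustration, the same annotation shape could be generated for a Wikidata-sourced description along these lines. This is a sketch under stated assumptions: the `skos:`, `rdfs:` and `prov:` prefixes are already declared in the target file, the helper names are hypothetical, and the `dcterms:` properties from the example above are left out because their values for Wikidata-derived text were not settled in this thread.

```python
def turtle_literal(text, language="en"):
    """Render `text` as a language-tagged Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{language}'


def definition_annotation(class_curie, description, wikidata_iri, language="en"):
    """Build a skos:definition blank node that records where the text came from."""
    return (
        f"{class_curie} skos:definition [\n"
        f"    rdfs:comment {turtle_literal(description, language)} ;\n"
        f"    prov:wasDerivedFrom <{wikidata_iri}>\n"
        f"] .\n"
    )
```

Calling `definition_annotation("soreac:AlpineTundra", text, iri)` with a description string and a Wikidata entity IRI of the form `http://www.wikidata.org/entity/Q…` yields a block that can be appended to the class's existing triples.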

@dr-shorthair
Collaborator

Yes - that is pretty much the direction I was looking for.

@lewismc
Member Author

lewismc commented Jul 15, 2020

Excellent. I'll go ahead and implement.

Collaborator

@smrgeoinfo smrgeoinfo left a comment

@lewismc looks like the current PR files haven't been updated to add the definitions using the skos:definition per #200 (comment). I think that is a good approach, and await seeing updated files with the proposed definition encoding.

@lewismc lewismc changed the title from "Provide rdfs:comment (and/or skos:definition or dct:description) text to all terms" to "Use wikidata to Provide skos:definition for each owl:Class" Jul 17, 2020
@lewismc lewismc changed the title from "Use wikidata to Provide skos:definition for each owl:Class" to "Use wikidata to provide skos:definition for each owl:Class" Jul 17, 2020
@lewismc lewismc closed this Jul 17, 2020
@lewismc lewismc deleted the ISSUE-125 branch July 17, 2020 18:59
3 participants