Use wikidata to provide skos:definition to owl:Class'es #201

Closed

lewismc
Member

@lewismc lewismc commented Jul 17, 2020

This is a new branch which fixes all of the issues identified in #200. Thanks for the feedback @dr-shorthair @smrgeoinfo so far.

I've resolved all of the cryo issues. This PR is ready to be reviewed, folks.

@rrovetto
Collaborator

I recommend (a) including also additional definitions, and (b) not declaring the WikiData ones as the preferred or definitive one, in part because Wiki may have inaccurate, incorrect or unreliable definitions.

@lewismc
Member Author

lewismc commented Jul 17, 2020

@rrovetto

including also additional definitions

From where?

not declaring the WikiData ones as the preferred or definitive one, in part because Wiki may have inaccurate, incorrect or unreliable definitions.

Which ones are inaccurate, incorrect or unreliable? If you have an example, please point it out.

@rrovetto
Collaborator

rrovetto commented Jul 17, 2020

It can depend, but like other resources, there can be a diversity of sources. To find out which are inaccurate would involve going through them on wiki, and for those that are specific to a discipline ideally with subject-matter experts.

Depending on the concept or term, definitions may come from textbooks, dictionaries, or some other publication. I've seen unique yet similar definitions for a common term, each of which provides insightful information not gleaned from the other. So it's certainly valuable for us to include other sources of definitions. Some may be more precise, technical, etc. than others.
We can also create definitions, have subject-matter experts provide input on subject matter concepts, etc.

Where the definitions come from, which ones, etc. are all questions on the topic of definitions/descriptions. I think that's a topic we should get into. We can also ask if it's clear what the original intention of SWEET was with respect to descriptions of its concepts (for example, is it clear that it was intended to have definitions/descriptions, and/or a definition for every term, or are there some concepts or terms that should not have a definition, e.g., due to their generality, variety of senses, or semantic drift) and use that as guidance.

Just as the structure under skos:definition has rdfs:comment for Wikidata descriptions/definitions, so it (or another structure) can list more than one rdfs:comment or more than one skos:definition for descriptions from elsewhere. I think that would be helpful.

There are also different types of definition and description that can be asserted, e.g., 'lexical def', 'description of...', etc.

@graybeal
Collaborator

I recommend (a) including also additional definitions, and (b) not declaring the WikiData ones as the preferred or definitive one, in part because Wiki may have inaccurate, incorrect or unreliable definitions.

I agree that we should not declare WikiData as preferred or definitive.

And I agree it would be good to have more sources, but:

I do not think we should hold up this extremely good change while we wait for someone to generate another set of annotations from another source. Let's not make perfect the enemy of the good.

@lewismc
Member Author

lewismc commented Jul 17, 2020

@rrovetto I'm unsure what to reply to you. What you've stated seems rather tangential to the contents of this pull request. I am looking for actionable input if you have any. Thanks

@graybeal

I do not think we should hold up ...

Agreed. This issue has already been a long, long time coming. Any review would be appreciated.

@lewismc lewismc linked an issue Jul 17, 2020 that may be closed by this pull request
@lewismc lewismc added this to the 3.6.0 milestone Jul 17, 2020
@cmungall
Collaborator

I think this proposed way of doing this is natural and coherent. Of course, I prefer the axiom annotation model used in OBO, but I won't push this further.

I would advocate for the principle of DRY: use dc:source or prov:wasDerivedFrom, but not both

I'm not totally sure about rdfs:comment to connect the blank node to the definition string. I'm not sure what else to suggest without doing a bit of further research to see what others have done, but I'd advise putting some thought into this.

it's not clear to me if you intend to allow>1 def per class (do you intend to use shex/shacl to constrain?). If so I would strongly recommend a mechanism to designate the preferred definition (or restricting to one definition per language, but allowing unlimited alternate descriptions), but my opinions here may be stronger than others.
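
A minimal sketch of how the "one definition per language" option could be expressed in SHACL, assuming the structure used in this PR (skos:definition pointing at a node whose rdfs:comment holds the language-tagged string). The `ex:` namespace and the shape name are hypothetical; nothing like this exists in the repository as far as this thread shows.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

# Hypothetical shape: a class may carry several definition nodes,
# but their rdfs:comment strings must not repeat a language tag.
ex:DefinitionShape a sh:NodeShape ;
    sh:targetClass owl:Class ;
    sh:property [
        sh:path ( skos:definition rdfs:comment ) ;
        sh:uniqueLang true ;
    ] .
```

If the stricter "exactly one definition node per class" rule were preferred instead, a property shape with `sh:path skos:definition` and `sh:maxCount 1` would be the corresponding constraint.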

as an aside you may want to consider a standard turtle serialization to eliminate spurious diffs, and unnecessary blank node renderings, e.g., as in:

:IndoorAirQuality rdf:type owl:Class ;
                  rdfs:subClassOf :AirQuality ;
                  rdfs:label "indoor air quality"@en ;
                  skos:definition _:genid6 .

_:genid6 dcterms:created "2020-07-17T10:55:59.639"^^xsd:dateTime ;
          dcterms:creator <https://orcid.org/0000-0003-2185-928X> ;
          dcterms:source <http://www.wikidata.org/entity/Q905504> ;
          rdfs:comment "air quality within and around buildings and structures"@en ;
          prov:wasDerivedFrom <http://www.wikidata.org/entity/Q905504> .

@lewismc
Member Author

lewismc commented Jul 17, 2020

I would advocate for the principle of DRY: use dc:source or prov:wasDerivedFrom, but not both

I agree here. I was trying to match what had been implemented in the recent cryospheric work. I would be happy to remove either one... any preferences folks?

it's not clear to me if you intend to allow>1 def per class

At the ESIP meeting in January we discussed only having one skos:definition. I am on board with that.

as an aside you may want to consider a standard turtle serialization to eliminate spurious diffs, and unnecessary blank node renderings

I don't particularly like the way OWLAPI Java wrote the data with blank nodes... is that the issue here?

Thanks @cmungall

@graybeal
Collaborator

it's not clear to me if you intend to allow>1 def per class

At the ESIP meeting in January we discussed only having one skos:definition. I am on board with that.

my recollection from an exchange, that I thought more recent than that (no idea where, sorry—Semantic Cluster, or GitHub ticket), was that we wanted the flexibility of allowing multiple definitions, so long as they were not seriously contradictory. I think there are good arguments for this, and no convincing counter-arguments that should rule out providing multiple definitions. I don't think the community has weighed in on this. And per my previous comment, I don't think we should delay this change in order to make a final decision on that.

In other words, don't preclude additional definitions. If we decide to provide multiple definitions, we can make the decision then about whether we want to consider one of them authoritative.

Multiple definitions can be in different languages. If they are from the same source, they will be embedded within the context above. If they are from a different source, they will have their own entry. This seems straightforward and intuitive to me.
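
A sketch of what that layout could look like in Turtle, assuming the same skos:definition / rdfs:comment structure as the PR. The default prefix is a placeholder rather than the real SWEET namespace, the French string is an illustrative translation rather than text taken from Wikidata, and the second source and its text are entirely hypothetical. The square brackets are Turtle's standard shorthand for an anonymous blank node.

```turtle
@prefix :        <http://example.org/sweet#> .   # placeholder, not the real SWEET namespace
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

:IndoorAirQuality
    # One entry per source; language variants from that source share the entry.
    skos:definition [
        dcterms:source <http://www.wikidata.org/entity/Q905504> ;
        rdfs:comment "air quality within and around buildings and structures"@en ,
                     "qualité de l'air à l'intérieur et autour des bâtiments"@fr   # illustrative translation
    ] ,
    # A second, hypothetical source gets its own entry.
    [
        dcterms:source <http://example.org/glossary/indoor-air-quality> ;
        rdfs:comment "placeholder text for a definition from another source"@en
    ] .
```

Because each entry carries its own dcterms:source, a reader can always tell which source a given string came from, and no entry is marked as preferred.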

If so [multiple definitions] I would strongly recommend a mechanism to designate the preferred definition (or restricting to one definition per language, but allowing unlimited alternate descriptions),

These strategies are important if the definition is meant to be normative (or authoritative, if you prefer). I claim (rather vigorously if challenged) these definitions are not and cannot be normative/authoritative; they are strictly informative and therefore have equal weight.

As inspection of parallel definitions will quickly establish, there are subtle differences in definitions from different sources that are both informative, while being complementary or subtly contradictory. The subtle contradictions are incredibly valuable for understanding the concept, and SWEET will never be a system used for heavy reasoning unless it takes on a totally different form. If there are major contradictions among definitions, then someone(s) will have to choose (or annotate, in some cases) one or more to clarify what SWEET means by the concept.

As a dictionary of definition sources, SWEET could prove immensely popular.

as an aside you may want to consider a standard turtle serialization to eliminate spurious diffs, and unnecessary blank node renderings, …

love how readable that example is! don't know what it means to 'consider' it—just that we start working with that as the standard format in order to gain the readability and diff improvements?

I was trying to match what had been implemented in the recent cryospheric work. I would be happy to remove either one... any preferences folks?

They seem subtly different to me, so my preference is to understand why the cryospheric people used both, then decide. Maybe it was that prov:wasDerivedFrom is a clear statement of provenance (useful), while dc:source feels more like a citation (differently useful). (It seems to me they could be different in some cases, but in your application they won't be, so dc:source feels a bit more precise if you choose one.)

@lewismc
Member Author

lewismc commented Jul 18, 2020

was that we wanted the flexibility of allowing multiple definitions

I wasn't aware of that/can't remember being part of those conversation(s). Maybe this is something we can raise with the Semantic Harmonization cluster?

I don't think we should delay this change in order to make a final decision on that.

+1

just that we start working with that as the standard format in order to gain the readability and diff improvements?

The way the Turtle is written is not configurable afaict. Writing the blank nodes like that is the only way I could find.

Again, RE: dcterms:source instead of prov:wasDerivedFrom in this case, I think that makes sense @graybeal.
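
For illustration, this is roughly how the annotation node from the example earlier in this thread would read with prov:wasDerivedFrom dropped and dcterms:source kept. The default prefix is again a placeholder, not the real SWEET namespace; the remaining values are copied from that example.

```turtle
@prefix :        <http://example.org/sweet#> .   # placeholder, not the real SWEET namespace
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

:IndoorAirQuality skos:definition _:genid6 .

# Single provenance link: dcterms:source only.
_:genid6 dcterms:created "2020-07-17T10:55:59.639"^^xsd:dateTime ;
         dcterms:creator <https://orcid.org/0000-0003-2185-928X> ;
         dcterms:source <http://www.wikidata.org/entity/Q905504> ;
         rdfs:comment "air quality within and around buildings and structures"@en .
```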

@dr-shorthair
Collaborator

dr-shorthair commented Jul 18, 2020

  1. I agree with pretty much all that @graybeal wrote above
  2. +1 to using Dublin Core where it has the right semantics, PROV only for the more complex cases.
    But note that DCMI recommends the dcterms: namespace in preference to the dc: one: "While the /elements/1.1/ namespace will be supported indefinitely, DCMI gently encourages use of the /terms/ namespace."

Note that I am a big fan of PROV, but in its place.
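
To make the namespace point concrete, here are the two prefix declarations side by side. The namespace IRIs are the published DCMI ones; the `_:def` node is only a stand-in for a definition node, and in practice only one of the two statements would be kept.

```turtle
# Legacy Dublin Core elements namespace (still supported).
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
# DCMI Metadata Terms namespace, the one DCMI encourages.
@prefix dcterms: <http://purl.org/dc/terms/> .

# The same link to the source, written against each namespace:
_:def dc:source      <http://www.wikidata.org/entity/Q905504> .
_:def dcterms:source <http://www.wikidata.org/entity/Q905504> .
```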

@lewismc
Member Author

lewismc commented Jul 18, 2020 via email
