Usefulness of E55_Type #37

Closed
Flutifioc opened this issue Nov 30, 2019 · 11 comments
Labels
conceptual This issue concerns a more theoretical question modeling This issue concerns how we organize the information semantically Update Documentation Improvements or additions to documentation

Comments

@Flutifioc

In many places in the target model, we use an instance of E55_Type. Cf. Issue #29 with the three levels of type:

JPRiopelle - P2 -> Painter - P2 -> Profession - P2 -> Occupation.

I feel that it would be semantically more correct to say that JPRiopelle - profession -> Painter, with Painter an instance of the class Profession. We could even put Profession as a subclass of E55 Type, this way we could still say that:

JPRiopelle - P2 -> Painter - rdf:type -> Profession - rdfs:subClassOf -> E55_Type.

Regardless of whether a third level is needed (I feel it is not, but that is a question for Issue #29), this seems more... RDF-esque? than using two or three levels of E55.

In general, I feel that an instance of E55_Type should be created when the line between class and instance is blurred (typically Painter: we can say that it is an instance of Profession, but also that JP Riopelle is an instance of Painter). On this understanding, Profession has no place being an instance of E55. And in many places in the target model, the instances of E55 could probably be classes, which would make the model far easier to understand. Am I wrong in my understanding?
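To make the alternative proposed above concrete, here is a minimal sketch in plain Python (triples as tuples, no RDF library) of the RDFS subclass rule it relies on: if Painter is an instance of Profession and Profession is a subclass of E55_Type, then Painter is also inferred to be an E55_Type. The short names (`JPRiopelle`, `Painter`, `Profession`) are illustrative placeholders, not identifiers from the target model.

```python
# Triples as plain (subject, predicate, object) tuples; all names are illustrative.
TRIPLES = {
    ("JPRiopelle", "crm:P2_has_type", "Painter"),
    ("Painter", "rdf:type", "Profession"),
    ("Profession", "rdfs:subClassOf", "crm:E55_Type"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def inferred_types(resource, triples):
    """RDFS rule: an instance of a class is also an instance of its superclasses."""
    result = set()
    for s, p, o in triples:
        if s == resource and p == "rdf:type":
            result |= superclasses(o, triples)
    return result


print(sorted(inferred_types("Painter", TRIPLES)))
```

Under this sketch, Painter remains usable as the object of P2 has type, because it is still (by inference) an E55_Type.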

@stephenhart8 stephenhart8 added the conceptual This issue concerns a more theoretical question label Dec 5, 2019
@stephenhart8 stephenhart8 changed the title Discussion #1 - Usefulness of E55_Type Issue #37 - Usefulness of E55_Type Dec 20, 2019
@Habennin

So... the overall issue is that E55 is used in many places because CRM is a general ontology and needs to be specialized. This means that you might attach several different kinds of category (E55) to the same entity, typically using P2 has type.

In order to solve this problem, LinkedArt proposes the pattern:

E1 -> p2 -> E55 -> p2 -> E55

Where the second E55 qualifies the first. This helps with reaching the classifications programmatically: you would look only for the E55 types classifying this entity that were themselves classified as X.

E.g.:

E21 p2 E55 Painter p2 E55 Profession
E21 p2 E55 Male p2 E55 Gender

If you want to query the E55 for gender, you don’t want to return the E55 for profession and this will allow you to clearly distinguish.
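As a rough illustration of that lookup, the sketch below stores the two example statements as plain Python tuples and keeps only the types of an entity that are themselves typed with a given metatype. The identifiers (`person:1`, `type:Painter`, `meta:Gender`, and so on) are made up for the example.

```python
# (subject, predicate, object) tuples; identifiers are hypothetical.
TRIPLES = [
    ("person:1", "P2_has_type", "type:Painter"),
    ("person:1", "P2_has_type", "type:Male"),
    ("type:Painter", "P2_has_type", "meta:Profession"),
    ("type:Male", "P2_has_type", "meta:Gender"),
]


def types_of(entity, triples):
    """All objects attached to the entity through P2_has_type."""
    return [o for s, p, o in triples if s == entity and p == "P2_has_type"]


def types_with_metatype(entity, metatype, triples):
    """Types of the entity that are themselves classified by the metatype."""
    return [t for t in types_of(entity, triples) if metatype in types_of(t, triples)]


print(types_with_metatype("person:1", "meta:Gender", TRIPLES))
```

Asking for `meta:Gender` returns only the gender type and leaves the profession out, which is the distinction the pattern is meant to provide.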

I think what you argue instead is to specialize the property p2

E21 has profession E55 Painter
has profession isA p2 has type

This is also a valid solution. It is one that means generating a number of new properties. Because of the isA hierarchy they would all be reached by p2. That being said, you have to manage all these new properties. The argument in LinkedArt was that the first solution is a more repeatable pattern.
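For comparison, here is a minimal sketch of the second solution: specialized properties declared as subproperties of P2, so that a query on P2 still reaches everything while a query on the specialized property reaches only its own values. The `ex:has_profession` and `ex:has_gender` properties are hypothetical; nothing in the thread defines them.

```python
# Hypothetical specialized properties declared under crm:P2_has_type.
TRIPLES = [
    ("person:1", "ex:has_profession", "type:Painter"),
    ("person:1", "ex:has_gender", "type:Male"),
    ("ex:has_profession", "rdfs:subPropertyOf", "crm:P2_has_type"),
    ("ex:has_gender", "rdfs:subPropertyOf", "crm:P2_has_type"),
]


def subproperties(prop, triples):
    """prop plus every property declared (transitively) as its subproperty."""
    found, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if p == "rdfs:subPropertyOf" and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def values(entity, prop, triples):
    """Values reachable through prop or any of its subproperties."""
    props = subproperties(prop, triples)
    return sorted(o for s, p, o in triples if s == entity and p in props)


print(values("person:1", "crm:P2_has_type", TRIPLES))    # everything typed via P2
print(values("person:1", "ex:has_profession", TRIPLES))  # professions only
```

The trade-off mentioned above shows up in the data: every new distinction needs a new property declaration, whereas the type-on-type pattern needs only a new E55 instance.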

I too am not sure when and how a third level of E55 would help.
Generally what is in E55 should be SKOSified vocabs, so the rest of the hierarchy should be handled within the vocab.

@Habennin

To put it another way

CIDOC CRM keeps the ontology small by not covering classification work. A flexible way of extending CIDOC CRM without creating endless new classes, while keeping the semantic richness users need, is to type classes. The business of classification is well handled in RDF by adopting SKOS. So wherever one sees a reference to E55 Type, it is functionally equivalent to skos:Concept. CIDOC CRM defers to SKOS as the well-known and accepted system for encoding thesauri/vocabularies in RDF.

So let's take the example of occupation/profession (there's an argument there too) from the start. If we want to indicate that a person has an occupation/profession, then we are going to be classifying that person. So the basic CIDOC CRM statement will apply:

E21 Person -> p2 has type -> E55 Type
Because of the above, we can also write:
E21 Person -> p2 has type -> skos:Concept

Following this pattern, we could instantiate :

<http://viaf.org/viaf/15873> a crm:E21_Person ;
    rdfs:label "Pablo Picasso" ;
    crm:P2_has_type <http://vocab.getty.edu/aat/300411314> .
<http://vocab.getty.edu/aat/300411314> a crm:E55_Type, skos:Concept ;
    rdfs:label "artist painter" .

or in 'colloquial' language:

Pablo Picasso has type Painter

Now the problem with the above is that we can have multiple reasons for classifying an instance of person. If we want to add to this and say Picasso has the state assigned sex of male, then we would want to say also:

<http://viaf.org/viaf/15873> a crm:E21_Person ;
    rdfs:label "Pablo Picasso" ;
    crm:P2_has_type <http://homosaurus.org/v2/assignedMale> .
<http://homosaurus.org/v2/assignedMale> a crm:E55_Type, skos:Concept ;
    rdfs:label "assignedMale" .

Now this is fine and good. But... now we want to support a query that helps understand who this Picasso fellow is. We want to query all the classifications that have been put on the individual and then display them somewhere, but in an ordered way: the profession information in a profession display area and the gender information in a gender area. But there is nothing in these concepts themselves that says what we are functionally using them for. They are in a SKOS hierarchy, but this does not tell us what we are functionally using them for here. We are missing a meta type to tell us this.

This is the argument for putting a type on these types, generating meta types in the modelling in a standard way, so that developers can easily retrieve just the information they want. So we add:

<http://viaf.org/viaf/15873> a crm:E21_Person ;
    rdfs:label "Pablo Picasso" ;
    crm:P2_has_type <http://homosaurus.org/v2/assignedMale> .
<http://homosaurus.org/v2/assignedMale> a crm:E55_Type, skos:Concept ;
    rdfs:label "assignedMale" ;
    crm:P2_has_type <http://vocab.getty.edu/aat/300411835> .
<http://vocab.getty.edu/aat/300411835> a crm:E55_Type, skos:Concept ;
    rdfs:label "gender (sociological concept)" .

This means we can put it all together and have:

<http://viaf.org/viaf/15873> a crm:E21_Person ;
    rdfs:label "Pablo Picasso" ;
    crm:P2_has_type <http://homosaurus.org/v2/assignedMale> ;
    crm:P2_has_type <http://vocab.getty.edu/aat/300411314> .
<http://homosaurus.org/v2/assignedMale> a crm:E55_Type, skos:Concept ;
    rdfs:label "assignedMale" ;
    crm:P2_has_type <http://vocab.getty.edu/aat/300411835> .
<http://vocab.getty.edu/aat/300411314> a crm:E55_Type, skos:Concept ;
    rdfs:label "artist painter" ;
    crm:P2_has_type <http://vocab.getty.edu/aat/300393201> .
<http://vocab.getty.edu/aat/300411835> a crm:E55_Type, skos:Concept ;
    rdfs:label "gender (sociological concept)" .
<http://vocab.getty.edu/aat/300393201> a crm:E55_Type, skos:Concept ;
    rdfs:label "professions" .

Which colloquially reads:

Picasso has type Painter.
Picasso has type Assigned Male.
Assigned Male has type Gender (sociological concept)
Painter has type Professions

and it should make it possible to write a SPARQL query that either retrieves all types on an entity and sorts them by kind, OR retrieves just one kind of type on an entity rather than all the type information, i.e. retrieves only the gender types on this entity.
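The retrieval described here can be sketched without a triple store. The Python below reuses the URIs from the example, holds the P2 statements as tuples, and groups an entity's types under the metatype that classifies them. It is only an approximation of what a SPARQL query would do; the constant names and the `types_by_metatype` helper are invented for the illustration.

```python
from collections import defaultdict

# Rough SPARQL shape this mimics (prefixes omitted, untested here):
#   SELECT ?type ?metatype WHERE {
#     <entity> crm:P2_has_type ?type . ?type crm:P2_has_type ?metatype }

P2 = "crm:P2_has_type"
PICASSO = "http://viaf.org/viaf/15873"
ASSIGNED_MALE = "http://homosaurus.org/v2/assignedMale"
PAINTER = "http://vocab.getty.edu/aat/300411314"
GENDER = "http://vocab.getty.edu/aat/300411835"
PROFESSIONS = "http://vocab.getty.edu/aat/300393201"

TRIPLES = [
    (PICASSO, P2, ASSIGNED_MALE),
    (PICASSO, P2, PAINTER),
    (ASSIGNED_MALE, P2, GENDER),
    (PAINTER, P2, PROFESSIONS),
]


def types_by_metatype(entity, triples):
    """Group the entity's types under the metatype(s) that classify them."""
    grouped = defaultdict(list)
    for s, p, t in triples:
        if s == entity and p == P2:
            metatypes = [m for s2, p2, m in triples if s2 == t and p2 == P2]
            for m in metatypes or [None]:  # None collects types with no metatype
                grouped[m].append(t)
    return dict(grouped)


grouped = types_by_metatype(PICASSO, TRIPLES)
print(grouped[GENDER])       # types to show in a gender display area
print(grouped[PROFESSIONS])  # types to show in a profession display area
```

Restricting the result to one key of the dictionary corresponds to the second use case, retrieving only the gender types.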

The declaration of the type on a type is not an attempt to go around or do something different than SKOS. If one knows the particular vocabulary that has been adopted, then there are probably much more interesting searches you can do exploiting the power of the broader/narrower of the particular vocabulary adopted. So the researcher could do fascinating gender analysis of artists through time (if they had the data) by a thorough understanding of homosaurus and its distinctions. At the basic level, however, within the CHIN data, the developer/researcher is able to pick out that some type/concept has been used for a particular function within the model (gender/profession and so on), thus facilitating their search and retrieval.

@KarineLeonardBrouillet
Collaborator

KarineLeonardBrouillet commented Feb 25, 2020

Notes on verbal meeting 2020-02-17

Flutifioc: If for example we write down all the tags needed for the knowledge base we have our vocab and there will be others as well. How are we to link these together?

Habennin: The specification of the Target Model would be similar to linked art: for meta types (anywhere the model points to a type where a discussion would be interesting). A statement of the materials, for example, might be interesting to standardize or normalize without strict enforcement. Having to go out and choose would be meta types. For the meta-type it would be better to use a single vocabulary to have a constant reference point for the model so we can search for that. On a more theoretical basis, the discussion happens a lot in Parthenos who tried to integrate a number of datasets and ideally the fields would have been normalized despite messy or bad data or competing standards. FORTH is developing a tool called VisTA. There is also the development of OpenTheso. That could be a long term strategy. If the data is well-curated. Original fields and Enriched fields were done in Ariadne so that everything is reversible.
Linked Conservation Data Project are also thinking of using CIDOC CRM but are thinking primarily of organizing vocabularies.

Illip: Should we develop our own vocabularies or translate others?

Habennin: Aligning with the most likely and reliable in terms of science, scholarship and durability is important.

Habennin: we might need in some cases to rely on our own vocabularies. How to manage those extensions wasn’t clear.

Flutifioc: should there be a mapping of metatypes?

Habennin: The data can be translated to a type and the data value can be dumped as the value of E57 node in the mapping and eventual RDF and the tag would be a rule of CHIN. That E57 has type -- static URI to organize types. It could be CHIN’s, the AAT's, etc. Whatever we can have as a specification for programmers.
Semantically E55 = SKOS concept.

Flutifioc: Is there a formal relation such as owl:sameAs? Or subclass of?

Habennin: Official advice is to treat E55 as SKOS concept and some aspects of CRM do not even have to be there because they were there before. It is a rare area where SKOS does not cover the domain correctly.

Stephen: the important thing is to link the metatype to a vocabulary, but the type does not have to be… is that right?
Habennin: No. Regardless of whether the data is good or poor, the value that comes across from the source material is the ground truth. It can be enriched, but the metatype is what makes it possible to find in the graph all instances that are about the metatype, regardless of which SKOS vocabulary they are in.

Flutifioc: Like this? (Drawing below)

[Image: drawing of the type and metatype pattern, not preserved]

Habennin: Yes. Even the occupation one can be contentious as not anyone can execute a profession. Occupation is a weaker term. Some of this is also discussed in linked.art.

Illip: We need to define precisely the metatype (e55) but we can be less restrictive for the types. It depends on the level of interoperability vs simplicity.
Stephen: Like illip says, it seems that the most important thing is to link the metatype to a structured vocabulary. For example, one museum could have a lot of terms for gender and another museum just “m” or “f”, but the metatype “gender” must be linked to a structured vocabulary.

@VladimirAlexiev

Quite agree with @Habennin : adding endless classes to CRM is not an option. Two considerations though:

  • another option for Persons is to say "member of group". It has been debated for ages ... in most cases I'd use P2, except for culture/nationality, because these groups themselves can be used as Creators (eg in ULAN you have "unknown Canadian" etc)
  • that second P2 (gender vs profession) is probably already present in the SKOS vocabulary as inScheme or broader. When it is not, you may be adding usage-dependent info to the concept. Eg is "lightbulb" an object type or a material?
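The second point can be sketched as follows: when the vocabulary already carries a broader hierarchy, the "kind" of a concept may be derivable by walking up that hierarchy instead of adding a second P2. The concepts and links below are invented, and the sketch assumes a single broader concept per term, which real SKOS vocabularies do not guarantee.

```python
# Hypothetical skos:broader links (child -> parent), one parent per concept.
BROADER = {
    "ex:painter": "ex:artists",
    "ex:artists": "ex:professions",
    "ex:assignedMale": "ex:gender",
}


def ancestors(concept, broader):
    """Follow skos:broader upward; the seen set guards against cycles."""
    chain, seen = [], {concept}
    while concept in broader and broader[concept] not in seen:
        concept = broader[concept]
        chain.append(concept)
        seen.add(concept)
    return chain


def falls_under(concept, top, broader):
    """True if top appears anywhere above concept in the hierarchy."""
    return top in ancestors(concept, broader)


print(ancestors("ex:painter", BROADER))
print(falls_under("ex:assignedMale", "ex:professions", BROADER))
```

This only works when the vocabulary's own hierarchy happens to match the function the concept serves in the data, which is the caveat raised in the same bullet.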

@illip
Collaborator

illip commented Mar 16, 2020

@VladimirAlexiev 1) Yes for culture and nationality we are reusing the groups patterns. 2) What do you mean by usage-dependent info? Do you see an issue if we use e55 -> p2 -> e55 statements to accomplish that?

@Flutifioc Are you still concerned with this issue of E55? Or do you understand the benefits of avoiding the endless classes, even if it might be a little bit less semantic?

@Flutifioc
Author

@VladimirAlexiev Why is it not an option ? What is the issue with complementing CRM with some classes that are relevant to our use-case ?

@illip Both. I understand the benefits of avoiding adding classes, even though we are not talking "endless", but some select few that seem very relevant to me, such as Profession or Work of art. But I'm still concerned, as, in my humble opinion, there are also benefits to adding such classes, and I'm definitely not as convinced as you concerning which option is best.

@illip illip added meeting needed modeling This issue concerns how we organize the information semantically labels Mar 20, 2020
@VladimirAlexiev

VladimirAlexiev commented Mar 26, 2020

@Flutifioc Adding a few classes is ok, provided they conform to CRM compatibility principles (which means make subClasses, subProps, or long-paths to complement existing CRM props used as short-cuts).

  • Adding a bunch of classes like Painting, Sculpture etc is nok. Because then you hit a birthday cake that's inkjet-painted to resemble a 3d sculpture (of Boots) and you are stumped. (Actual object from the YCBA).
  • how's a Work of Art different from a MMO?
  • Profession as subclass of Group is ok, but you could just use a P2="Profession" to distinguish it from other kinds of Groups.
    • (BTW George Bruseker over at SARI argues that professions like "Ontologist" are not groups while nationalities like "Bulgarian" are, citing some silly aspect of the scope note of Group. I felt offended ;-)

@illip

usage-dependent info to the concept. Eg is "lightbulb" an object type or a material?

  1. Lightbulb is in the AAT Objects>Components hierarchy but in one Artwork I saw it used as "Material". So if you map that to CRM "made of material", by RDFS inference it would get CRM class "Material". Getty AAT don't have subclasses of Concept, but they have the hierarchical structure, and they would take exception if you "reclassify" their Lightbulb into Material.
  2. Most of AAT's Styles and Periods Facet are cultures/nationalities, eg Bulgarian (culture or style). But sometimes other concepts are used as the Style of an artwork, eg Russian Orthodox. According to the Getty, that's not a Style/Period but Associated concept>Religion.
  3. For me and you, "transgender male" is a gender. But for someone else, that might be a religion or a political statement, who knows.

So there is a bit of danger of adding P2 to a concept because of the way it's used in other data. I don't mean the originator of that concept (eg Getty) is going to sue you: I'm just saying be careful that the added P2 makes sense universally
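The lightbulb case in item 1 follows from the RDFS range rule, which the small Python sketch below imitates: once a concept is used as the object of a property whose range is E57 Material, an RDFS reasoner types it as a Material wherever else it is used. Only one range is declared for brevity, and `ex:artwork1`, `ex:artwork2`, and `aat:lightbulb` are placeholder identifiers.

```python
# Declared range for one CRM property (P45 consists of -> E57 Material).
RANGES = {"crm:P45_consists_of": "crm:E57_Material"}

# Placeholder data: the same concept used as a material and as an object type.
TRIPLES = [
    ("ex:artwork1", "crm:P45_consists_of", "aat:lightbulb"),
    ("ex:artwork2", "crm:P2_has_type", "aat:lightbulb"),
]


def range_inferences(triples, ranges):
    """RDFS range rule: if (s p o) and p has range C, infer (o rdf:type C)."""
    return {(o, "rdf:type", ranges[p]) for s, p, o in triples if p in ranges}


print(range_inferences(TRIPLES, RANGES))
```

The inferred statement is about the shared concept itself, not about one artwork's use of it, which is why the added classification should make sense universally.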

@illip
Collaborator

illip commented Apr 6, 2020

@VladimirAlexiev Thanks for the advice concerning the conflicts that might arise from this P2_has_type categorization. IMO, this categorization is specific to the context of our model: we are not saying that the external vocabulary hierarchy should be rethought, but that we need a specific association between our types and metatypes in order to build relevant SPARQL queries. That said, we need to stay aligned with the meaning of the concept and to keep the open-world assumption in mind, you're right. I would also say that the role of P2_has_type is not to build a hierarchy like skos:broader. We use P2_has_type in order to specify another entity.

I would also like to keep track of some Linked.Art discussions regarding this topic:

Types of Types definition on their website
When to use Types of Types pattern
"How to make decision" proposal

Going back to the initial topic, CHIN would like to define a policy stating exactly when a situation requires the definition of new classes and properties.

@stephenhart8 stephenhart8 changed the title Issue #37 - Usefulness of E55_Type Usefulness of E55_Type Apr 6, 2020
@illip illip removed their assignment Apr 15, 2020
@illip illip removed this from To do in Version 2.2 (August 2021) Jul 7, 2020
@illip illip moved this from To do to In progress in Version 2.1 (December 2020) Dec 14, 2020
@illip illip added the Semantic Committee Issues to be discussed in an upcoming Semantic Committee meeting label Dec 14, 2020
@illip
Collaborator

illip commented Dec 16, 2020

CHIN's current proposal on this issue is:

  1. We will not create extra classes/properties when a P2_has_type -> E55_Type -> P2_has_type -> E55_Type pattern is sufficient. As @VladimirAlexiev mentioned, mapping some data might be even more complex if we create new subclasses/subproperties. In addition, CHIN prefers to reuse a more repeatable pattern, as @Habennin said, rather than expand the model.
  2. The metatypes will be used only when it is relevant in order to distinguish a E55_Type from another one attached to the same entity.
  3. CHIN will be careful in the metatype identification in order to accommodate different opinions about the targeted concept. We will also document our choices.
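One possible reading of item 2 can be turned into a small data check, sketched here in plain Python: when an entity carries two or more E55 types, each of them should have a metatype so they can be told apart, while a lone type needs none. The function name, the identifiers, and the "two or more" threshold are assumptions made for the illustration, not part of the approved proposal.

```python
def types_missing_metatype(entity, triples, p2="crm:P2_has_type"):
    """Types of an entity that lack a metatype although the entity has several types."""
    types = [o for s, p, o in triples if s == entity and p == p2]
    if len(types) < 2:
        return []  # a single type needs no metatype to be distinguished
    has_own_type = {s for s, p, o in triples if p == p2}
    return [t for t in types if t not in has_own_type]


# Hypothetical data: two types on one person, only one of them has a metatype.
DATA = [
    ("person:1", "crm:P2_has_type", "type:Painter"),
    ("person:1", "crm:P2_has_type", "type:Male"),
    ("type:Painter", "crm:P2_has_type", "meta:Profession"),
    ("object:1", "crm:P2_has_type", "type:Sculpture"),
]

print(types_missing_metatype("person:1", DATA))
print(types_missing_metatype("object:1", DATA))
```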

We will validate this proposal with our Semantic Committee on January 7th.

Our proposal regarding the three levels of E55_Type will be discussed in Issue #29

@illip illip added Update Documentation Improvements or additions to documentation and removed Semantic Committee Issues to be discussed in an upcoming Semantic Committee meeting labels Jan 7, 2021
@illip
Collaborator

illip commented Jan 7, 2021

All the aforementioned items have been approved by the Semantic Committee on 2021-01-07.

@illip illip moved this from In progress to Done in Version 2.1 (December 2020) Jan 8, 2021
@illip
Collaborator

illip commented Jan 8, 2021

A new section called Prioritization of E55_Type and P2_has_type over new classes and properties has been added to the Target Model.

@illip illip closed this as completed Jan 8, 2021
6 participants