Missing collection for objects #16

Closed
tobiashreiter opened this issue Jul 11, 2017 · 21 comments
@tobiashreiter
Collaborator

We were hoping to have our item records be associated with their parent collection records. This was discussed as possible when we had our meeting with Rob back in April.

Based on the diagram here: http://linked.art/cookbook/getty/photoarchive/ it looks like the mapping could be: p46i_forms_part_of -> E19_Collection; We had more details about how to do this in the cheatsheet:

  /* Not sure this is a legal designation, but there must be a way */
  /* Collection Record */
  "forms_part_of": {
      "@context": "https://linked.art/ns/context/1/full.jsonld",
      "id": "https://linked.art/example/collection/12",
      "type": "PhysicalObject",
      "label": "$Item_Collection:COLLECTION",
      "foaf_homepage": "$Item_Collection:COLLECTIONURL",
      /* Titles */
      "title": {
          "id": "https://linked.art/example/Title/1",
          "type": "Title",
          "value": "$Item_Collection:$COLLECTIONTITLE, Item:DISPLAYDATE",
          "classified_as": "aat:300404670"
      },
      "identified_by": {
          "id": "https://linked.art/example/identifier/3",
          "type": "Identifier",
          "label": "Local Database Item ID Number",
          "value": "$Item_Collection:COLLECTIONID",
          "classified_as": "aat:300404626"
      },
      /* Branch classification based on Item_Collection:MEDIUM */
      "classified_as": "aat:300027772 (corporation records)| aat:300028037 (personal papers)"
  }

@tobiashreiter
Collaborator Author

We consider this to be a blocker, although we recognize that no one else is using this.

@VladimirAlexiev
Member

VladimirAlexiev commented Jul 19, 2017

The following Item_Collections.xls fields are not mapped:

  • COLLCODE: mnemonic identifier, eg mccoesth. Map to another p1_has_identifier
  • FKCREATORPERSONID: person who created it.
  • FKCREATORINSTITUTIONID: institution who created it
  • FKCREATOREVENTID: event for which it was created. Only 4 values, and 3 of them are not present in Item_Events.xls (48 4 12)
  • FKCOCREATORPERSONID: secondary person who created it
  • FKCOCREATORINSTITUTIONID: secondary institution who created it
  • FKCOCREATOREVENTID

I think the collection type should be crm:E22_Man-Made_Object or crm:E78_Collection rather than crm:E19_Physical_Object, but won't argue it.

I propose to map the CREATOR fields like this:

<collection/(COLLECTIONID)> crm:P108i_was_produced_by <collection/(COLLECTIONID)/production>.
<collection/(COLLECTIONID)/production> a crm:E12_Production;
  # usually only 1-2 of these fields will be populated:
  crm:P14_carried_out_by <person/(FKCREATORPERSONID)>, <person/(FKCOCREATORPERSONID)>;
  crm:P14_carried_out_by <institution/(FKCREATORINSTITUTIONID)>, <institution/(FKCOCREATORINSTITUTIONID)>.
  # ignore FKCREATOREVENTID, FKCOCREATOREVENTID
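
As a rough illustration of the CREATOR mapping proposed above, here is a minimal Python sketch that turns one Item_Collections row into Turtle-like lines. The function name `production_triples`, the dict-based row, and the choice to emit nothing when no creator field is populated are assumptions made for this sketch, not part of the AAC mapping code.

```python
# Hypothetical helper: column names come from the list above,
# everything else is an assumption of this sketch.
CREATOR_FIELDS = [
    ("FKCREATORPERSONID", "person"),
    ("FKCOCREATORPERSONID", "person"),
    ("FKCREATORINSTITUTIONID", "institution"),
    ("FKCOCREATORINSTITUTIONID", "institution"),
    # FKCREATOREVENTID and FKCOCREATOREVENTID are deliberately ignored
]


def production_triples(row):
    """Return Turtle-like lines for the production of one collection row."""
    coll = f"<collection/{row['COLLECTIONID']}>"
    prod = f"<collection/{row['COLLECTIONID']}/production>"

    # Usually only 1-2 creator fields are populated, so skip empty ones.
    agents = []
    for column, segment in CREATOR_FIELDS:
        value = row.get(column)
        if value not in (None, ""):
            agents.append(f"<{segment}/{value}>")

    if not agents:
        return []  # sketch choice: no creators, no production node

    lines = [
        f"{coll} crm:P108i_was_produced_by {prod} .",
        f"{prod} a crm:E12_Production .",
    ]
    lines += [f"{prod} crm:P14_carried_out_by {agent} ." for agent in agents]
    return lines
```

Calling `production_triples({"COLLECTIONID": "12", "FKCREATORPERSONID": "7"})` would yield the production link, the type statement, and a single `P14_carried_out_by` line for `<person/7>`.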

@tobiashreiter
Collaborator Author

I definitely prefer E78. This is also what linked.art recommends: http://linked.art/cookbook/getty/photoarchive/ (see diagram)

@workergnome

workergnome commented Jul 19, 2017

linked.art recommends E19 for aggregates. CRM has E78, but @azaroth42 and I are both cranky about the scope notes, and that it doesn't really add much other than the ability to record the curator, which is only one of many possible roles for people who could be involved with a collection.

Again, this is about the physical aggregates of objects. The intellectual aggregation isn't something that linked.art has tackled yet. It's a bigger, more complicated issue that we should tackle at some point, but... it's a big, complicated issue.

@tobiashreiter
Collaborator Author

I have two concerns about using the E19. For one, it seems like an incredibly vague term. Second, from an ontological standpoint, it might be incorrect, because a collection isn't really a physical object -- it's a whole bunch of physical objects (items in a box, and then one or more boxes).

I think E19 makes sense when you're talking about the physical components of a man-made object, but the reverse seems weird.

How is the relationship between the item and the collection mapped? Is it a "forms part of" relationship or a "part" relationship? If it's a "part" relationship, that might make the mapping a little trickier.

@workergnome

Talked about this in person—I think that we're still looking for the entity that is "an intellectual collection of physical objects".

Summoning @azaroth42, once he's back from his leave.

@VladimirAlexiev
Member

@tobiashreiter CRM considers "a whole bunch of objects" also an object, whether those form a system (object parts) or are just put together (objects in a collection).
What difference do you see between "forms part of" and "part"?

@tobiashreiter
Collaborator Author

If the system really sees objects this way, then I think I'm comfortable with just having objects inside of objects, but I still somehow prefer using man-made object instead of a pure physical object (the collection is just as surely manmade as the individual items).

As far as forms part of vs part, I was basing this off of the linked.art documentation on this page: https://linked.art/model/base/

However, I think the cidoc-crm term is "has part". For us, it's probably easier to model items with a forms part of property, rather than collections with has part properties.

@workergnome

I think that the whole has_part part vs. object forms_part_of collection is something that's going to cause us headaches in the future, for sure. Given a system that allows for inferencing of inverses, of course, it's not going to be a problem, but many of the systems that people use don't support that, or don't instantiate it. So we have to figure out whether that's going in the model, in which case we should explicitly record both directions, or whether there's another way to do it.

Regarding the E22 vs. E19, please see this exhaustive (and perhaps exhausting) thread on the CRM-SIG mailing list: http://lists.ics.forth.gr/pipermail/crm-sig/2017-April/002994.html.
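
To make the "explicitly record both directions" option concrete, here is a small Python sketch that materializes inverse statements for consumers that do no inferencing. The triple representation and the function name `materialize_inverses` are assumptions of this sketch; the property names are simply the ones used in this thread.

```python
# Inverse pairs as discussed in this thread (plain strings, not real URIs).
INVERSES = {
    "has_part": "forms_part_of",
    "forms_part_of": "has_part",
}


def materialize_inverses(triples):
    """Return the input triples plus the inverse of every part/whole triple."""
    result = set(triples)
    for subject, predicate, obj in list(result):
        inverse = INVERSES.get(predicate)
        if inverse is not None:
            # Record the other direction explicitly for non-inferencing tools.
            result.add((obj, inverse, subject))
    return result
```

With this, a publisher could keep modeling items with `forms_part_of` only and still hand out data in which the collection carries the matching `has_part` statements.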

@tobiashreiter
Collaborator Author

I guess one question: linked.art seems to, in multiple cases, suggest a vocabulary for describing has_part and forms_part_of relationships. Is anyone else actually using that vocabulary to describe their objects? Would they be forbidden from doing so even though the documentation allows it?

If the concept of part is allowed, I don't see how our use of it differs substantially from the documented form, even if it might be on a larger scale than the default usage.

Thanks for sending the thread. Based on what I read, it seems to suggest that we could definitely describe our collections as E19 (physical object as an aggregate of other physical objects), and possibly E22 as a subclass of E19. However, the definition of E78, Curated Holding, specifically mentions archives as one of its instances. Since our individual collection records seem to conform to the particulars of the E78 definition, I don't think it would be a stretch for them to be considered that, even though I understand if the linked.art model doesn't want to take that on.

@VladimirAlexiev
Member

VladimirAlexiev commented Jul 24, 2017

@workergnome @azaroth42: Look at http://personal.sirma.bg/vladimir/crm-graphical/#cidoc_class_hierarchy: Things are split into Objects, Features (eg scratch or stamp on an object) and Collections. So Collection is implicitly considered not an Object: use E24 Thing if you don't want to use E78.

I'll read the exhausting discussion, but I wouldn't take too much to heart any arguments saying that a museum collection is not a CRM Collection (what the heck is it then?). The renaming of E78 to Curated Holding is still not official (may not happen), and AAC has regenerated its data many times, so I don't see this as a strong argument against using E78.

@workergnome "whole has_part part vs. object forms_part_of collection is something that's going to cause us headaches in the future, for sure": I agree with your "Inverses are bad" argument, the PROV people had similar arguments.

But in ResearchSpace we had another kind of trouble: conflating "sub_object forms_part_of object" with "object forms_part_of collection". When circumscribing the data of an object we walked links (we do this in GVP with a specific construct query, we didn't have such smarts at the time), and you can see that you need to walk the former but must not walk the latter. IMHO these are different kinds of parthood: a sub_object is an owned part of an object, but a collection is a more loose grouping.
Your not using E78 just aggravates such conflation.
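
The conflation described here can be sketched in a few lines of Python. This is not the ResearchSpace or GVP implementation, only a hypothetical traversal: `circumscribe`, the `(child, parent)` pair format, and the type strings are assumptions of this sketch. Links whose parent is typed as a collection are skipped, so walking an object's parts never spills over into its sibling objects.

```python
from collections import deque


def circumscribe(root, part_of, types, collection_types=frozenset({"E78_Collection"})):
    """Collect root and everything connected to it by object-level parthood.

    part_of: iterable of (child, parent) pairs, i.e. child forms_part_of parent
    types:   dict mapping node id to its class name
    """
    neighbours = {}
    for child, parent in part_of:
        if types.get(parent) in collection_types:
            continue  # object forms_part_of collection: must not be walked
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)

    # Plain breadth-first walk over the remaining (sub_object) links.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

If the collection were typed as an ordinary object instead, the same walk would climb into it and pull in every other item it holds, which is the kind of trouble described above.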

@azaroth42

Thanks, but the issue is not that Museum collections are not E78s, it's that other things that are sets of objects are not E78s. Art Dealers do not have collection plans but have sets of objects. A chess set is not a Collection, and so forth.

This is not something that we're going to change, unless the CRM changes the scope notes for E78. Given the almost certain name change (it's in the 6.2.2 documentation) towards emphasizing curation, that seems extremely unlikely.

There are two options: E19 or some non-CRM class. We went for E19.

@tobiashreiter
Collaborator Author

I do want to point out that within the definition of E78, it does identify archives as instances of E78 entities.

From the 6.2.2 version of the CRM:

Typical instances of curated holdings are museum collections, archives,
library holdings and digital libraries.
http://www.cidoc-crm.org/sites/default/files/2017-01-25%23CIDOC%20CRM_v6.2.2_esIP.pdf#page=78

Since this definition would work for most members of the AAC, I'm wondering what the resistance is to it. Is it that you don't want to model both sets (e.g. that chess set) and "collections" (the Romare Bearden Papers) within the linked.art model, and want to choose just one?

My understanding is that part of the challenge is that the CIDOC CRM doesn't want to model sets generally within the model, because they're interested primarily in individual things.

@azaroth42

Yes, exactly. If you don't need to worry about the indistinct line between E19 and E78, and you can't use E78 for everything ... there's only one CRM-specific way forwards.

For example, if a photograph collection is acquired as a whole from an art dealer by a museum, a set of documents of an artist by an archive, or just a collection of paintings from a collector, they start off being E19 (they're not "curated", they don't have a "collection development plan") and then magically and identity-destroying-ly become E78s. It's (in my opinion) completely ridiculous that they should have to significantly change classes just when the ownership changes. It's also ridiculous that some should be E19s and some E78s based on their provenance of whether they were acquired as a whole, or put together by the memory institution.

So not only do I disagree that it really does work for most members of the AAC, I don't think it works for anyone.

@tobiashreiter
Collaborator Author

a) I don't think there's any identity destruction happening here. The email thread clearly discussed that the CRM's potential for multiple-instantiation meant that you could have a set of objects described as either being part of a collection or part of a physical thing, and that these descriptions could live together side-by-side. The E78 designation would just create a new set of relationships, not necessarily kill the E19 -- unless I'm really misunderstanding the direction of that conversation.

b) I think whether a "collection" was acquired as a whole or put together by the institution, there is still a curatorial agency happening. At the Archives (and I assume this is true of other institutions), we make decisions about which parts of a collection to accept or reject when acquiring them. If we decided to keep an unabridged collection, down to duplicates of photos and photocopies of popular magazines, then that would reflect a curatorial act. And, even if an organization has a policy of always accepting every part of an acquired collection, that still reflects a curatorial decision (in my opinion). I guess one could argue that if someone snuck into a museum in the middle of the night, dropped off some art, and no one noticed for years, then that wouldn't count as a curated holding, but as soon as the museum knows it's there, choosing to keep it reflects curation on the part of someone. Maybe I'm using the word broadly, but I also don't know why the CRM would want it to be used in too narrow a sense. Its own definition mentions museum collections as being instances of E78 -- I don't think they would insist that only collections that were put together by the institution should count. If this has been discussed on the list, I'd be interested in seeing that thread.

c) Finally, if it doesn't work for anyone in the AAC, then it must be a non-existent entity. I can't think of a curatorial holding that is so distinct that it doesn't exist for the 14 members of our consortium, but that it does exist for some other museum somewhere. Again, I haven't specifically followed the discussions involved in the evolving nature of this entity, but if a museum collection or an archive are both instances of a curated holding according to its definition, then why couldn't these apply to our institutions?

@azaroth42

Multiple instantiation is significantly complicated in most languages, and makes it much harder to process an instance into an in-code object to work with. Thus it should be avoided whenever possible. If MI were a consistent pattern, then there would be many fewer "join" classes that have no properties, just two parents. Consider Production and Destruction, versus Beginning of Existence / End of Existence. E78 is not a descendent of E19, they join at E18. So it's not that E19 can be dispensed with, thus making the MI mandatory, for no good reason.
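
The "they join at E18" point can be checked mechanically. The fragment below is a trimmed, from-memory excerpt of the CRM 6.x class hierarchy (other superclasses are omitted) and should be verified against the official specification; the helper names are hypothetical.

```python
# Assumed, trimmed fragment of the CIDOC-CRM 6.x hierarchy: class -> direct superclasses.
SUPERCLASSES = {
    "E18_Physical_Thing": [],
    "E19_Physical_Object": ["E18_Physical_Thing"],
    "E24_Physical_Man-Made_Thing": ["E18_Physical_Thing"],
    # E22 is itself a "join" class with two parents.
    "E22_Man-Made_Object": ["E19_Physical_Object", "E24_Physical_Man-Made_Thing"],
    "E78_Collection": ["E24_Physical_Man-Made_Thing"],
}


def ancestors(cls):
    """Return cls together with all of its transitive superclasses."""
    found = {cls}
    stack = [cls]
    while stack:
        for parent in SUPERCLASSES[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_subclass(cls, other):
    """True if cls is other or a descendant of other."""
    return other in ancestors(cls)


def common_ancestors(a, b):
    """Classes that both a and b specialize."""
    return ancestors(a) & ancestors(b)
```

Under this fragment, `is_subclass("E78_Collection", "E19_Physical_Object")` is false and the only shared ancestor of the two is `E18_Physical_Thing`, so an entity that should be both needs two type statements.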

Regarding (b) this is all true at (most) memory institutions, but it is definitely not true universally. Even if the objects are part of a private "collection" (meaning E19) and on permanent or temporary loan to the museum, that's still an E19. It's not museum or archival "collections" that are the main concern, it's everything else. The CRM SIG agreed that an art dealer's collection is not an E78.

So for (c) E78 can apply to most sets of objects that are put together by memory organizations, but as soon as you look at provenance, they fall very very short and E19 is the only way to describe them. For example, there are many Photographic Archives (per the PHAROS project) where the photographs have come from the "collections" of individuals and dealers (being E19s), as well as the "collections" of museums and archives (being E19s or E78s).

@workergnome

I think the issue with E78 is not that it isn't a useful concept, but that it's an unnecessarily-privileged concept. An archive is NOT the same thing as a museum collection, which is not a library collection, which is not a gallery sale, which is not an exhibition. All of these are valid "groupings of objects into an intellectual whole, which can be spoken of as a collection", but only one of these has a specific CRM class.

The concept of an 'intellectual grouping' is a real data-and-use-case driven need that we have to deal with. @tobiashreiter, your arguments are valid in that an E78 is an appropriate class for your collection, but the issue that it creates is that any consuming tool that uses this data will then have to handle multiple patterns that address the same base need. I hear @azaroth42 asking for a single consistent way to do this, and I agree that a single pattern is absolutely what we need. @VladimirAlexiev thinks that we should just use E78 as a "generic collection", and ignore the scope notes. @azaroth42 thinks that it's a better pattern to use E19. I think that whatever we need to do has to handle more explicit levels of detail than either of those classes provide on their own, because things will be organized (and de-organized) into multiple types of collections (often simultaneously) over time.

I also think the multiple instantiation thing is something that we're going to have to deal with at some point. It does make certain software implementations much more complicated, but I think that CRM (and Linked Data as a whole) doesn't allow us to avoid it. These entities are going to be reused in many contexts, and are going to need to be considered different things in different contexts.

From a publishing standpoint, it doesn't matter to me if you add anything that you like—it's just that the consuming applications will ignore things (like E78) that they don't understand. From the point of view of the AAC, I'm strongly advocating for a standard, reusable pattern, and since we're also working with provenance, and archives, and private collections, and exhibition history, a joining 'entity' that can be typed is what I need, and I think tending to the most generic version that meets that need is the way to go.

Finally, I think that @VladimirAlexiev's point about:

"sub_object forms_part_of object" with "object forms_part_of collection".

is something that we really, really need to keep in mind. That's totally going to bite us if we're not careful.

@VladimirAlexiev
Member

not "curated", they don't have a "collection development plan"

IMHO, don't read too much into such stuff. The scope note mentions a museum collection, doesn't it?
There's more than enough wooden philosophers on the CRM mlist.
By the same token, Steven Stead said that "John" is not an Actor Appellation whereas a number may be. Which is complete mumbo-jumbo.

a private "collection" (meaning E19)

Why would a private collection mean E19? First, rich collectors have curators to advise them. Second, I don't think a curation plan needs to be in writing to exist, does it?

an art dealer's collection is not an E78

I hope you did not agree with them ;-) Surely he acquired them according to a plan: to make a profit?

E78 is not a descendent of E19, they join at E18. So it's not that E19 can be dispensed with, thus making the MI mandatory, for no good reason.

The reason is very simple: a Collection is not an Object, just like a Feature is not an Object. No MI is necessary: use E78 for Collections, and E22 for Objects.

@azaroth42 thinks that it's a better pattern to use E19

You cannot use E19 because E78 is not E19: you have to use E18. Look at the class hierarchy: Things break down into Objects, Features and Collections.

"sub_object forms_part_of object" conflated with "object forms_part_of collection" going to bite us if we're not careful.

Here's another nasty bug that makes silly relations between all objects of the same collection: https://confluence.ontotext.com/display/ResearchSpace/FR+Implementation#FRImplementation-BUG

Aside: CRM does include many unnecessary classes in a global hierarchy and would benefit a lot from Mixin classes. Eg see https://www.slideshare.net/valexiev1/largescale-reasoning-with-a-complex-cultural-heritage-ontology-cidoc-crm-slides#slide=14 for some volumetrics on the British Museum collection: 37% of all statements are type statements (after RDFS reasoning) and 2M museum objects have produced 17M E72_Legal_Object, and the BM didn't have a single rights statement!

Cheers!

@VladimirAlexiev
Member

erlangen-crm/ecrm#2: the ECRM people are imho rushing with making the E78 renaming before an official CRM version appears

@VladimirAlexiev
Member

And it says: "Currently there are some really rough developments (e.g. deletion of all subclasses of E41)".

@tobiashreiter
Collaborator Author

In the end, I implemented this as an E78, still not being quite convinced that our data didn't represent authentic collections or that this wasn't worth being specific about. If this needs to be modified to an E19, I can change this, and probably add an AAT classification for archival records.
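
For reference, the difference between the two outcomes is small when expressed as data. The Python sketch below builds a collection record either way; `collection_record`, the dict layout, and the use of class names as `type` values are assumptions of this sketch rather than the actual AAC implementation. The AAT codes are the two quoted in the cheatsheet excerpt at the top of this issue.

```python
# AAT codes taken from the cheatsheet excerpt in the opening comment.
AAT_BY_KIND = {
    "corporation records": "aat:300027772",
    "personal papers": "aat:300028037",
}


def collection_record(collection_id, label, kind, crm_class="crm:E78_Collection"):
    """Build a minimal collection record; only the class differs between options."""
    return {
        "id": f"https://linked.art/example/collection/{collection_id}",
        "type": crm_class,
        "label": label,
        # The AAT classification carries the "archival records" meaning
        # regardless of which CRM class is chosen.
        "classified_as": [AAT_BY_KIND[kind]],
    }
```

Switching to E19 would then amount to passing `crm_class="crm:E19_Physical_Object"`, with the classification left untouched.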
