We need to manage the updates of records (and Named Graphs?) #14

Open
chin-rcip opened this issue Oct 31, 2019 · 6 comments
Labels
legal (This issue implies legal aspects), modeling (This issue concerns how we organize the information semantically)

@chin-rcip
Owner

chin-rcip commented Oct 31, 2019

Update of records

In v1.5 of the Target Model, the history of a record (E73 Information Object) is only modeled by the E65 Creation event. There is no way to document the history of the different versions of this record, which is a problem.

With CIDOC CRM

With CIDOC CRM alone, there is no way to represent those updates, as the E11 Modification class applies only to the modification of an E24 Physical Man-Made Thing.

With PROV-O

The PROV-O ontology, already used to describe the named graph, can also be used to document updates to the entity: the property prov:wasRevisionOf links the updated version of the record to the version that was originally created.
[Diagram: Untitled Diagram]
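A minimal sketch of the prov:wasRevisionOf pattern, written in plain Python that prints N-Triples. The record IRIs under http://example.org/ and the helper names are hypothetical; only the PROV-O and CIDOC CRM terms come from those vocabularies. Note that the pattern needs one IRI per version of the record.

```python
# Sketch: link an updated record to its earlier version with prov:wasRevisionOf.
# IRIs under http://example.org/ and the helper names are hypothetical.
PROV = "http://www.w3.org/ns/prov#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"


def revision_triples(original: str, revised: str) -> list:
    """Triples stating that `revised` is a new version of `original`."""
    return [
        (original, RDF_TYPE, CRM + "E73_Information_Object"),
        (revised, RDF_TYPE, CRM + "E73_Information_Object"),
        # prov:wasRevisionOf points from the newer version to the older one.
        (revised, PROV + "wasRevisionOf", original),
    ]


def to_ntriples(triples: list) -> str:
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(revision_triples(EX + "record/123/v1", EX + "record/123/v2")))
```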

With CIDOC CRM-Dig

With CRMdig, if we instantiate the record and the named graph as digital objects, we could add a modification event. However, that would create two entities each for the record and the named graph: the original one and the modified one. I'm not sure this is the best way to model it.
[Diagram: Untitled Diagram-2]

Named Graphs

The generation of Named Graphs is documented with the pattern prov:Activity -> prov:generated -> prov:Entity (the graph). We could also document the creation and modification of the whole graph with a pattern similar to the one used for the record. But do we need to document the updates of the Named Graph at all?
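For reference, the generation pattern for a named graph can be sketched as a handful of N-Triples lines. The activity and graph IRIs, the timestamp and the helper name are hypothetical; prov:Activity, prov:Entity, prov:generated and prov:endedAtTime are PROV-O terms.

```python
# Sketch: the prov:Activity -> prov:generated -> prov:Entity pattern for a named graph.
# IRIs, the timestamp and the helper name are hypothetical.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def graph_generation(activity: str, graph: str, ended_at: str) -> list:
    """N-Triples lines describing the activity that generated a named graph."""
    return [
        f"<{activity}> <{RDF_TYPE}> <{PROV}Activity> .",
        f"<{graph}> <{RDF_TYPE}> <{PROV}Entity> .",
        f"<{activity}> <{PROV}generated> <{graph}> .",
        # Typed literal: when the generating activity finished.
        f'<{activity}> <{PROV}endedAtTime> "{ended_at}"^^<{XSD_DATETIME}> .',
    ]


for line in graph_generation(
    "http://example.org/activity/ingest-1",
    "http://example.org/graph/museum-a",
    "2019-10-31T12:00:00Z",
):
    print(line)
```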

chin-rcip added the modeling label (This issue concerns how we organize the information semantically) on Oct 31, 2019
@stephenhart8
Collaborator

stephenhart8 commented Jan 15, 2020

When looking at the AAT data, I found that they also use PROV-O to track the creation and modifications of each Concept.

They chose two simultaneous patterns: PROV-O refinements (https://www.w3.org/TR/2013/NOTE-prov-dc-20130430/#term_modified) and Dublin Core.

Looking at their approach, it seems I made a mistake in my earlier proposal: I linked the creation event to one E73 Information Object and the modification to a different E73, even though it is the same E73 (either the Named Graph or the Record).

Following what the Getty did with the AAT, I would propose the following:

[Diagram: Updates_Named_Graphs_Records]

I am not sure whether the property between prov:Modify and the E73 should be prov:wasGeneratedBy. The documentation seems to suggest it, but it looks a bit strange to me.
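A minimal sketch of the proposed dual pattern, assuming a single IRI for the E73 (Named Graph or Record), one prov:Create or prov:Modify activity per event linked with prov:wasGeneratedBy (the reading of the documentation questioned above), and the dct:created / dct:modified shortcuts alongside. IRIs and helper names are hypothetical. One point worth checking: PROV-Constraints treats the generation of an entity as unique, so attaching several generating activities to the same entity may be what makes this pattern feel strange.

```python
# Sketch: PROV-O activities plus Dublin Core dates on a single E73.
# Assumes prov:wasGeneratedBy links the E73 to each activity (the open question above).
PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value: str) -> str:
    return f"<{value}>"


def dt_literal(value: str) -> str:
    return f'"{value}"^^<{XSD_DATETIME}>'


def history_triples(entity: str, events: list) -> list:
    """events: (activity IRI, "Create" or "Modify", ISO timestamp) tuples."""
    triples = [(iri(entity), iri(RDF_TYPE), iri(CRM + "E73_Information_Object"))]
    for activity, kind, timestamp in events:
        dc_prop = "created" if kind == "Create" else "modified"
        triples += [
            # PROV-O side: a typed activity with its end time.
            (iri(activity), iri(RDF_TYPE), iri(PROV + kind)),
            (iri(activity), iri(PROV + "endedAtTime"), dt_literal(timestamp)),
            (iri(entity), iri(PROV + "wasGeneratedBy"), iri(activity)),
            # Dublin Core side: the same date as a simple shortcut.
            (iri(entity), iri(DCT + dc_prop), dt_literal(timestamp)),
        ]
    return triples


for s, p, o in history_triples(
    "http://example.org/record/123",
    [
        ("http://example.org/activity/1", "Create", "2019-10-31T12:00:00Z"),
        ("http://example.org/activity/2", "Modify", "2020-01-15T12:00:00Z"),
    ],
):
    print(s, p, o, ".")
```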

@KarineLeonardBrouillet
Collaborator

Notes from the verbal meeting of 2020-02-17

It might be sufficient for what we intend to track.

A distinction was made between a persistent resource and a volatile, dynamic resource (e.g. a software program with versions): you want to be able to refer to each version (an E73 Information Object, but not one created ex nihilo), yet everything is linked back to the volatile object (the software program). Will we keep copies of chunks of metadata, or do we only want to know when the last update happened?
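One way to express this persistent-versus-version distinction, sketched under the assumption that PROV-O is used: each version is its own entity, tied back to the persistent resource with prov:specializationOf and chained with prov:wasRevisionOf, so the latest version can still be found. This is one option, not the pattern agreed in the meeting; IRIs and helper names are hypothetical.

```python
# Sketch: versions tied to one persistent resource, plus a lookup of the latest one.
# Uses PROV-O terms; IRIs and helper names are hypothetical.
PROV = "http://www.w3.org/ns/prov#"
SPECIALIZATION_OF = PROV + "specializationOf"
WAS_REVISION_OF = PROV + "wasRevisionOf"


def version_chain(persistent: str, n_versions: int) -> list:
    """Triples for versions v1..vN of `persistent`, each revising the previous one."""
    versions = [f"{persistent}/v{i}" for i in range(1, n_versions + 1)]
    triples = [(v, SPECIALIZATION_OF, persistent) for v in versions]
    triples += [
        (newer, WAS_REVISION_OF, older) for older, newer in zip(versions, versions[1:])
    ]
    return triples


def latest_version(triples: list, persistent: str) -> str:
    """The version of `persistent` that no other version revises."""
    members = {s for s, p, o in triples if p == SPECIALIZATION_OF and o == persistent}
    superseded = {o for s, p, o in triples if p == WAS_REVISION_OF}
    (latest,) = members - superseded  # assumes a single linear chain of versions
    return latest


chain = version_chain("http://example.org/record/123", 3)
print(latest_version(chain, "http://example.org/record/123"))
```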

To retrieve all the information about a specific named graph, do we need links between all of its information objects?

The museums will certainly need the older versions of their data at some point. Whether these have to be in the LOD environment is another question.

Either the named graph is online with the modification events while the copies of the data stay in our repositories, or everything is documented in the model with multiple E73 instances, which would also multiply the number of triples.

illip: museums will need to query older versions.

Habennin: will share links to Parthenos. On managing massive integration in the long term, we can create meta-metadata for the data we are transforming. This would require a separate triplestore with pointers and policies to handle them.

@VladimirAlexiev

VladimirAlexiev commented Mar 3, 2020

In the Getty vocabs we favoured simplicity above all, because PROV is quite complicated; see e.g. http://vocab.getty.edu/doc/#dct_modified

This is a complex topic, so here are just a few considerations:

  • From other issues I understood that you intend to have one named graph per museum, but that granularity is too coarse. It is best to have one named graph per record (or unit of work), i.e. the individual pieces of data that are typically moved in an aggregation scenario.
    • then you can connect the individual entities to the museum dataset using some partOf relation
  • as for keeping older versions, it's not a simple question because of
    • rapidly increasing volume
    • the need to run queries, faceting and some inferences (e.g. influence, counting) on the latest versions only
  • regarding dedicated implementations,
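To illustrate the per-record granularity, here is a minimal sketch that keeps each record in its own named graph, links each graph to its museum dataset, and replaces one record's graph on update without touching the others. The IRI patterns and helper names are hypothetical, and dct:isPartOf is only one possible choice for the partOf relation.

```python
# Sketch: one named graph per record, linked to the museum dataset with dct:isPartOf.
# IRI patterns and helper names are hypothetical; g = None means the default graph.
DCT = "http://purl.org/dc/terms/"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"


def record_graph(museum: str, record_id: str) -> str:
    return f"{EX}graph/{museum}/{record_id}"


def record_quads(museum: str, record_id: str) -> set:
    """The record's data in its own graph, plus the graph's partOf link."""
    graph = record_graph(museum, record_id)
    return {
        (f"{EX}record/{museum}/{record_id}", RDF_TYPE, CRM + "E73_Information_Object", graph),
        (graph, DCT + "isPartOf", f"{EX}dataset/{museum}", None),
    }


def replace_graph(store: set, graph: str, new_quads: set) -> set:
    """Update one record: drop the quads in its graph, then add the new ones."""
    return {q for q in store if q[3] != graph} | new_quads


def to_nquads(store: set) -> str:
    """Serialize IRI-only quads as N-Quads, sorted for stable output."""
    ordered = sorted(store, key=lambda q: (q[3] or "", q[0], q[1], q[2]))
    return "\n".join(
        f"<{s}> <{p}> <{o}>" + (f" <{g}>" if g else "") + " ." for s, p, o, g in ordered
    )


store = record_quads("museum-a", "1") | record_quads("museum-a", "2")
print(to_nquads(store))
```

Because each record lives in its own graph, an update touches only that graph, which keeps queries on the latest data simple.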

stephenhart8 added the legal label (This issue implies legal aspects) on Mar 20, 2020
@stephenhart8
Collaborator

stephenhart8 commented Mar 30, 2020

Thank you @VladimirAlexiev for your input.

PROV-O is indeed a bit complicated and creates a lot of triples, but it is quite similar to CIDOC CRM. Would it be an option to have both PROV-O and dct?

On the question of the named graph granularity, I have created issue #45. I would very much like your input on that important subject.

During our latest discussion (on the 23rd of March), we concluded that it would be best not to publish older versions of the datasets, but to store them in a repository at CHIN, available on request for historical purposes.

We need to investigate those implementations; thank you for the information!

General question: does the pattern I proposed on the 15th of January make sense?

stephenhart8 changed the title from "Issue #14 - We need to manage the updates of records (and Named Graphs?)" to "We need to manage the updates of records (and Named Graphs?)" on Apr 6, 2020
@illip
Collaborator

illip commented Dec 17, 2020

Regarding #45, we have decided to go with one Named Graph per dataset. We now need to clearly define the update process.
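With one Named Graph per dataset, and given the earlier conclusion to keep older versions in a CHIN repository rather than publishing them, an update could archive the published snapshot before replacing it. A minimal sketch; the class and method names are hypothetical and storage is in memory, where a real deployment would use a triplestore.

```python
# Sketch: publish only the latest snapshot of each dataset graph, archive the rest.
# Class and method names are hypothetical; storage is in memory for illustration.
from datetime import datetime, timezone
from typing import Optional


class DatasetStore:
    """`published` is what goes online; `archive` stays in the repository."""

    def __init__(self) -> None:
        self.published = {}  # graph IRI -> set of triples currently published
        self.archive = []    # (graph IRI, superseded-at timestamp, frozenset of triples)

    def update(self, graph: str, triples: set, when: Optional[datetime] = None) -> None:
        """Replace a dataset graph, archiving the version it supersedes."""
        when = when or datetime.now(timezone.utc)
        if graph in self.published:
            self.archive.append((graph, when.isoformat(), frozenset(self.published[graph])))
        self.published[graph] = set(triples)

    def history(self, graph: str) -> list:
        """Archived snapshots of one graph, oldest first."""
        return [(ts, snapshot) for g, ts, snapshot in self.archive if g == graph]


store = DatasetStore()
store.update("http://example.org/graph/museum-a", {("ex:r1", "ex:title", "A")})
store.update("http://example.org/graph/museum-a", {("ex:r1", "ex:title", "B")})
print(len(store.history("http://example.org/graph/museum-a")))
```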

@illip
Collaborator

illip commented Jan 7, 2021

During our Semantic Committee meeting on 2021-01-07, while we were discussing issue #10, the question of updates came up: in some use cases, keeping track of more than two roles (creator and provider) could be necessary in order to document updates made, for instance, by an artist to his/her data in the museum's dataset.

This highlights the need for having two "categories" of updates:

  1. A way to take snapshots of the whole dataset to keep track of its evolution. Anyone interested in identifying the changes would have to compare these versions.
  2. When a stakeholder documents the reason for a data modification, we need a pattern to capture it.
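The two categories could be sketched as follows, treating a snapshot as a set of triples: category 1 is a set difference between two snapshots, and category 2 is a change record that carries the agent and the reason alongside the triples it adds or removes. The DocumentedChange structure and the example data are hypothetical; in RDF, the agent and reason would still need to be mapped to PROV-O or CIDOC CRM properties.

```python
# Sketch of the two update categories; names and data are hypothetical.
from dataclasses import dataclass, field


def diff_snapshots(old: set, new: set) -> tuple:
    """Category 1: compare two dataset snapshots; returns (added, removed)."""
    return new - old, old - new


@dataclass
class DocumentedChange:
    """Category 2: a modification whose reason is documented by a stakeholder."""
    record: str
    agent: str   # e.g. the artist or the museum
    reason: str
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)

    def apply(self, snapshot: set) -> set:
        """Return the snapshot with this change applied."""
        return (snapshot - self.removed) | self.added


old = {("ex:r1", "ex:title", "Untitled"), ("ex:r1", "ex:date", "1968")}
new = {("ex:r1", "ex:title", "Blue Study"), ("ex:r1", "ex:date", "1968")}
added, removed = diff_snapshots(old, new)
change = DocumentedChange("ex:r1", "ex:artist-1", "Title corrected by the artist", added, removed)
print(change.apply(old) == new)  # True
```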

5 participants