Preservation Metadata #843

Open
subotic opened this issue May 3, 2018 · 5 comments

@subotic
Collaborator

subotic commented May 3, 2018

  • We need to calculate and store fixity information for Knora resources. This is needed for the data repository side of Knora, so that we are able to check and prove that resources were not altered or corrupted.
  • We need to perform these fixity checks automatically and periodically.
@subotic subotic self-assigned this May 3, 2018
@lrosenth
Contributor

lrosenth commented May 3, 2018 via email

@benjamingeer

But we don't have versions of resources, only versions of values.

@subotic
Collaborator Author

subotic commented May 4, 2018

> But we don't have versions of resources, only versions of values.

Yes, we don't have explicit versions of resources, but implicitly (I think), any change to a value creates a new version of a resource.

Yesterday, I had a long conversation with @lrosenth. Here is a summary in very broad strokes. It is only a first draft, and we still need to discuss whether it is feasible:

  • Checksum: to calculate the checksum of a set of triples, calculate the checksum of each triple and then use a combining function. This is what I have used in my PhD based on this paper: https://pdfs.semanticscholar.org/e497/56a0bf7bcf6ce4b033c4f5261b283d0be394.pdf
  • On every value change, we calculate the checksum of the resource. The new checksum is the checksum of the previous version combined with the checksums of the triples of the new value.
  • At the same time, a new ARK id is generated (resource ID + timestamp).
  • The resource IRI, ARK id, and checksum are stored away somewhere (separate graph maybe).
  • This fixity information is also replicated to a separate server, where it is available read-only. We need to make sure that fixity information is not stored only together with the data, where both could be manipulated at the same time.
  • The goal is to have a gapless log of every change to a resource, backed by the checksum. We need to provide evidence that a resource didn't change inadvertently over time.
  • I'm not sure how this will work (if at all) if we make changes to the data model and need to change the data.
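
A minimal sketch of the checksum steps above, in Python. It is only an illustration under stated assumptions: SHA-256 as the per-triple hash, addition modulo 2^256 as the combining function (the paper linked above may use a different one), a simplified triple serialization, and a made-up "resource IRI + timestamp" identifier format. The names `triple_checksum`, `combine`, `checksum_of`, and `fixity_record` are hypothetical and not part of Knora.

```python
import hashlib
from datetime import datetime, timezone

MODULUS = 2 ** 256  # SHA-256 digests are treated as integers modulo 2^256


def triple_checksum(triple):
    """Hash one (subject, predicate, object) triple to an integer."""
    s, p, o = triple
    # Assumption: a simplified N-Triples-like line. A real implementation
    # needs a canonical serialization (datatypes, escaping, blank nodes).
    line = f"{s} {p} {o} .".encode("utf-8")
    return int.from_bytes(hashlib.sha256(line).digest(), "big")


def combine(a, b):
    """Commutative, associative combining function (assumed: modular addition)."""
    return (a + b) % MODULUS


def checksum_of(triples):
    """Checksum of a set of triples; the result does not depend on their order."""
    total = 0
    for t in triples:
        total = combine(total, triple_checksum(t))
    return total


def fixity_record(resource_iri, previous_checksum, new_value_triples, when):
    """Build the record that would be stored for one value change."""
    new_checksum = combine(previous_checksum, checksum_of(new_value_triples))
    return {
        "resource": resource_iri,
        # Hypothetical identifier format (resource IRI + timestamp), not a real ARK.
        "ark": f"{resource_iri}@{when.strftime('%Y%m%dT%H%M%SZ')}",
        "checksum": format(new_checksum, "064x"),
    }


# Usage: one existing triple, then a value change that adds a second triple.
t1 = ("<http://example.org/r1>", "<http://example.org/hasTitle>", '"first"')
t2 = ("<http://example.org/r1>", "<http://example.org/hasComment>", '"second"')
record = fixity_record(
    "http://example.org/r1",
    checksum_of([t1]),
    [t2],
    datetime(2018, 5, 4, tzinfo=timezone.utc),
)
print(record)
```

Because the combining function is commutative and associative, the incremental checksum after a value change equals the checksum computed over all triples from scratch, which is what a periodic check could recompute and compare. This sketch only covers adding triples; it does not handle the data-model migration problem raised in the last bullet.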

@subotic
Collaborator Author

subotic commented May 4, 2018

> I'm not sure how this will work (if at all) if we make changes to the data model and need to change the data.

OK, now I'm definitely sure that this will not work. Any change to the data model that requires changes to the data will render all checksums invalid.

@lrosenth Do we need to make our lives so hard by trying to build a system that is both a VRE and a long-term data archival repository at the same time? Can't we separate the two? We could add a read-only layer that stores the data and the checksums on every change and allows us to recreate the repository for any point in time: basically a "backup on steroids" solution. That way we could do whatever is needed to run the VRE in the upper VRE layer while preserving every change in the lower repository layer.
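
To make the layering idea a bit more concrete, here is a rough Python sketch of what the lower repository layer could look like: an append-only log of changes that can be replayed up to any point in time. Everything here is an assumption for discussion (the `Change` and `RepositoryLog` names, the in-memory list, the added/removed triple sets); it is not an existing Knora component.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    timestamp: datetime
    resource: str
    added: frozenset    # triples added by this change
    removed: frozenset  # triples removed by this change
    checksum: str       # fixity value recorded when the change was written


class RepositoryLog:
    """Append-only log: entries are never edited or deleted."""

    def __init__(self):
        self._entries = []

    def append(self, change):
        # Keep the log gapless and ordered so that replay is deterministic.
        if self._entries and change.timestamp < self._entries[-1].timestamp:
            raise ValueError("changes must be appended in chronological order")
        self._entries.append(change)

    def state_at(self, when):
        """Replay the log to rebuild each resource's triples as of `when`."""
        state = {}
        for c in self._entries:
            if c.timestamp > when:
                break
            triples = state.setdefault(c.resource, set())
            triples -= c.removed
            triples |= c.added
        return state


# Usage: a value is created on May 3 and replaced on May 4.
log = RepositoryLog()
old = ("<r1>", "<hasTitle>", '"old"')
new = ("<r1>", "<hasTitle>", '"new"')
log.append(Change(datetime(2018, 5, 3, tzinfo=timezone.utc), "r1",
                  frozenset([old]), frozenset(), "checksum-1"))
log.append(Change(datetime(2018, 5, 4, tzinfo=timezone.utc), "r1",
                  frozenset([new]), frozenset([old]), "checksum-2"))
print(log.state_at(datetime(2018, 5, 3, 12, tzinfo=timezone.utc)))
```

In this arrangement the VRE layer stays free to change its data model, while the log keeps the original changes and their checksums untouched.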

@subotic
Collaborator Author

subotic commented May 4, 2018

We also don't need to reinvent the wheel with regard to the data model for preservation metadata. The Library of Congress maintains a well-established standard called PREMIS, for which there is also an OWL ontology.

@subotic subotic changed the title from "Fixity information" to "Preservation Metadata" May 24, 2018
@subotic subotic added this to the Backlog milestone Feb 7, 2020
@irinaschubert irinaschubert removed this from the Backlog milestone Dec 9, 2021