Relax requirement for 'content' directory #341

Closed
julianmorley opened this issue Apr 19, 2019 · 16 comments

@julianmorley
Contributor

commented Apr 19, 2019

Moab requires each version directory to contain a 'data' and a 'manifests' directory, which makes an existing Moab object non-compliant with OCFL 0.2 and unable to become compliant.

I propose we relax the spec so that a version directory may contain any number of directories of any name, but only two files at its root: inventory.json and inventory.json.sha512. This would let existing Moab objects become OCFL-compliant by adding the inventory.json and its sidecar.

@julianmorley

Contributor Author

commented Apr 19, 2019

See draft PR #342

@zimeon

Contributor

commented Apr 19, 2019

Previous discussion where we arrived at content was #230 -- @julianmorley's proposal to allow any directories in v# would need the minor modification that they must be named so as to not conflict with inventory.json and inventory.json.sha512 files.

@rosy1280

Contributor

commented Apr 19, 2019

See the comment from @ahankinson on the pull request:

I would still like to see a recommendation for a content directory. The point of moving the version content into a sub-directory was to ensure we did not mix the administrative workings of OCFL with the content being managed.

This change is introduced to maintain backwards compatibility with Moab (which is desirable), but perhaps we can tighten this up to say they should only be 'content', or 'data' and 'manifests' for backwards-compatibility?

Opening it up to arbitrary directories may otherwise risk users misinterpreting the relaxing of this rule and storing their object content in the root of the version directory.

@rosy1280

Contributor

commented Apr 19, 2019

If I understand the comments above the language would be worded something like:

Version directories SHOULD contain one sub-directory called content. Other sub-directories MAY be present to assist with backward compatibility of other preservation structures. There MUST be no files contained in a version directory, other than an inventory file, and an inventory digest.

@rosy1280

Contributor

commented Apr 19, 2019

I also wonder whether @dbrower has any feedback...

@zimeon

Contributor

commented Apr 19, 2019

I think the problem I have with saying just three sub-directory names are allowed (content, data, manifests) is that either we are just dealing with backward compatibility for one system we happen to know about (Moab) or we have made some odd arbitrary choices that imply some meaning but don't really have it. I think if we open it up it should be more open. I would like something along the lines of:

Version directories MAY contain one or more sub-directories in which the version content is stored. These sub-directories may have any name but SHOULD NOT be present unless they include content files. In the absence of other constraints, the directory name content is recommended. There MUST be no files as children of a version directory, other than an inventory file, and an inventory digest. The version content sub-directory names MUST NOT conflict with the inventory and inventory digest names.
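
A minimal sketch of how a validator might check one version directory against the draft wording above. The function name and message strings are hypothetical, and the digest file name `inventory.json.sha512` is assumed from the original proposal in this issue; this is an illustration, not the spec's validation procedure.

```python
from pathlib import Path

# File names assumed from the original proposal in this issue.
ALLOWED_FILES = {"inventory.json", "inventory.json.sha512"}


def check_version_directory(version_dir: Path) -> list[str]:
    """Return the problems found in one version directory (hypothetical helper)."""
    problems = []
    for entry in sorted(version_dir.iterdir()):
        if entry.is_dir():
            if entry.name in ALLOWED_FILES:
                # MUST NOT: sub-directory names may not clash with the inventory files.
                problems.append(f"error: directory name conflicts with inventory file: {entry.name}")
            elif not any(p.is_file() for p in entry.rglob("*")):
                # SHOULD NOT: a sub-directory without content files is discouraged.
                problems.append(f"warning: empty content sub-directory: {entry.name}")
        elif entry.name not in ALLOWED_FILES:
            # MUST: no files other than the inventory and its digest.
            problems.append(f"error: unexpected file in version directory: {entry.name}")
    return problems
```

Under this reading, a Moab-style version directory with populated 'data' and 'manifests' directories plus the two inventory files would produce no errors.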

@zimeon added this to the Beta milestone on Apr 23, 2019

@ahankinson

Contributor

commented May 7, 2019

Version directories MAY contain one or more sub-directories in which the version content is stored. There MUST be no files as children of a version directory, other than an inventory file, and an inventory digest. These sub-directories may have any name but SHOULD NOT be present unless they include content files. In the absence of other constraints, the directory name content is recommended. The version content sub-directory names MUST NOT conflict with the inventory and inventory digest names.

@julianmorley

Contributor Author

commented May 8, 2019

OK - how about something like this:

  • The OCFL object must contain the inventory.json, inventory.checksum and one directory for preserved content if content is present.
  • The OCFL object should not contain other directories (the validator will ignore anything else) but it's not spec-breaking if other directories are present. This allows us to retain Moab's 'manifests' directory in the object root without breaking spec.
  • The OCFL object must not contain any other files in the object root (no change from current spec).
  • The name of the content directory may be defined in the inventory.json file, and defaults to 'content' if not explicitly set. This allows us to use 'data' as the content directory and do in-place migrations of content from existing Moab objects.
  • The content directory name must not change between versions of the same object.
  • The content directory name should be consistent across all objects in the repository.

I think this resolves @ahankinson's concerns about the difficulty of validating an object with an arbitrary number of directories in the object root - we explicitly ignore anything that's not either 'content' or whatever the custom content directory name is set to in inventory.json, and generally discourage other directories.

@ahankinson

Contributor

commented May 8, 2019

I would be OK with that -- any suggestions for how we add this to the inventory.json file?

@zimeon

Contributor

commented May 9, 2019

I'm basically OK with this but have two issues:

  1. I'm not very comfortable with saying that any other directories MAY be present but are ignored in the version directory. I understand that a validator MUST ignore such directories. What should other tooling do? Does a copy or sync operation ignore them, or copy them? Are they somehow part of the object or just cruft to be ignored?

  2. I don't think we need the last bullet at all -- I don't see a need for the content directory name to be consistent in every object across a storage root. If it is defined on a per-object basis then that is all we need say. Suggesting greater consistency might encourage a bad assumption of consistency.

@julianmorley

Contributor Author

commented May 9, 2019

@zimeon The 'may be present' bit is to allow existing Moabs to be converted to OCFL; they'll have a non-tracked 'manifests' directory in them. I'd say those directories are 'cruft to be ignored', and any OCFL-compliant copy/edit/sync tool must ignore them: if the spec explicitly says ignore them, OCFL tools must ignore, right?

I waffled a bit on whether or not the content dir should be mix-n-matched across the storage root; you make a great point about implying consistency where none exists, so I'm fine with dropping it.

How about this?

  • The OCFL object must contain the inventory.json, inventory.checksum and one directory designated for preserved content if content is present.
  • The OCFL object should not contain other directories.
  • The OCFL object must not contain any other files in the object root.
  • OCFL-compliant tools (including any validators) must ignore all directories in the object root except for the designated content directory. OCFL validators may log the presence of undesignated directories, but must not fail validation for that reason.
  • The name of the designated content directory may be defined in the inventory.json file. If not explicitly defined in the object's inventory.json, the name of this directory must be 'content'.
  • The designated content directory name must not change between versions of the same object.
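
A rough sketch of the "ignore, but optionally log" behaviour in the bullets above, assuming the rules are applied to each version directory. The function name and logger name are hypothetical, and the tool is handed the designated content directory name by its caller.

```python
import logging
from pathlib import Path

log = logging.getLogger("ocfl-sketch")  # hypothetical logger name


def content_paths(version_dir: Path, content_dir_name: str = "content") -> list[Path]:
    """List content files, ignoring every directory except the designated one."""
    files = []
    for entry in sorted(version_dir.iterdir()):
        if not entry.is_dir():
            continue  # inventory files are not content and are handled elsewhere
        if entry.name == content_dir_name:
            files.extend(sorted(p for p in entry.rglob("*") if p.is_file()))
        else:
            # Logged for information only; never a validation failure.
            log.info("ignoring undesignated directory: %s", entry)
    return files
```

For a converted Moab version directory, calling `content_paths(version_dir, "data")` would return only the files under 'data' and skip 'manifests'.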

@julianmorley

Contributor Author

commented May 9, 2019

@ahankinson I'd say an optional key. Possible names:

  • designatedContentDirectory
  • designatedDirectory
  • contentDirectory
  • directory

@zimeon

Contributor

commented May 9, 2019

I'd go with contentDirectory ... that specifies the name of the content directory (to be added to terminology) which has the default name content.
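
A small sketch of how a tool might resolve the name from an inventory, assuming the optional key ends up being called `contentDirectory` as suggested here. The helper names are hypothetical, and the inventories used with them are reduced to that single key for illustration.

```python
import json

DEFAULT_CONTENT_DIRECTORY = "content"


def content_directory_name(inventory_text: str) -> str:
    """Return the content directory declared in an inventory, or the default."""
    inventory = json.loads(inventory_text)
    return inventory.get("contentDirectory", DEFAULT_CONTENT_DIRECTORY)


def same_content_directory(inventory_texts: list[str]) -> bool:
    """Check that the name does not change between versions of one object."""
    names = {content_directory_name(text) for text in inventory_texts}
    return len(names) <= 1
```

With these helpers, `content_directory_name('{"contentDirectory": "data"}')` returns `"data"`, and an inventory without the key resolves to `"content"`.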

@birkland

Contributor

commented May 10, 2019

Relaxing the requirement that all objects use a consistent name eliminates the need for a top-level file that states the root's expectations (akin to ocfl_layout.json), so that seems good.

Would client implementations be expected to provide (in their API) an option for the desired name of the content directory as specified by the client at time of object creation? Likewise, would clients be expected to provide an API for creating and writing to non-content directories (as they might for logs)?

Although this is not the intent of such a feature, it does seem that this would allow clients to create directories inside version directories for their own purposes (e.g. using a subdirectory as a staging location, as part of a locking scheme, or for maintaining version-scoped logs).

Lastly, the language changed between comments. Is this an error?

The OCFL object must contain the inventory.json, inventory.checksum and one directory designated for preserved content if content is present.

At present, the OCFL object MUST contain an inventory.json and checksum, and version directories:

OCFL Object content must be stored as a sequence of one or more versions. Each object version is stored in a version directory under the object root.

I think this proposal meant to say that a version directory must contain one directory designated for preserved content, if content is present? Or am I reading it wrong?

@julianmorley

Contributor Author

commented May 10, 2019

I think this proposal meant to say that a version directory must contain one directory designated for preserved content, if content is present? Or am I reading it wrong?

You're correct; just sloppy writing on my part.

@rosy1280

Contributor

commented May 21, 2019

Closed by #349 and #350.
