Corruption Recovery #15
Comments
Perhaps the first question is how one can understand the state of the OCFL object? Then, what mechanisms might avoid the possibility that the corruption could affect the integrity of a version previous to the one being added? How could one revert the partial update to get back to a clean state in order to re-run the update?
One of the necessary tools for OCFL will be a validator, and so the state of an OCFL object would ultimately be one that is valid according to the spec. Of course, when writing a bunch of files to disk the possible failure states can range from "Connection to the NFS / S3 store failed" (i.e., relatively high-level) to "A disk array lost power and no battery was available to let it finish writing" (i.e., low-level). It may be that this is where the spec could specify a recommended order of operations for OCFL filesystems, e.g., take checksum, write file to disk, record checksum. This would let a validator know whether a) a file was written completely (matches recorded checksum); b) a checksum was recorded correctly (a file exists with a matching checksum recorded). I don't think we could enumerate all the possible failure states, but perhaps we could view the validation process as a bit like "fsck", where it could detect and alert the maintainer that something was wrong, giving them the ability to fix it.
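To make the "take checksum, write file, record checksum" ordering concrete, here is a rough sketch of what I have in mind. It is not from the spec: the `checksums.json` ledger, the `write_file` and `check` helpers, and the choice of SHA-512 are all assumptions made up for illustration, and a real OCFL implementation would record digests in the inventory instead.

```python
import hashlib
import json
import os
from pathlib import Path

LEDGER = "checksums.json"  # hypothetical ledger file, not an OCFL name


def sha512_of(path):
    """Digest a file on disk in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_file(root, rel_path, data, ledger_name=LEDGER):
    """Take checksum, write file to disk, then record checksum."""
    root = Path(root)
    rel = Path(rel_path).as_posix()
    digest = hashlib.sha512(data).hexdigest()  # 1. take checksum
    dest = root / rel
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as f:  # 2. write file to disk
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    ledger = root / ledger_name  # 3. record checksum
    entries = json.loads(ledger.read_text()) if ledger.exists() else {}
    entries[rel] = digest
    tmp = ledger.with_suffix(".tmp")
    tmp.write_text(json.dumps(entries))
    os.replace(tmp, ledger)  # swap the ledger in a single rename
    return digest


def check(root, ledger_name=LEDGER):
    """fsck-style report: detect problems, leave the fixing to the maintainer."""
    root = Path(root)
    ledger = root / ledger_name
    recorded = json.loads(ledger.read_text()) if ledger.exists() else {}
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p != ledger
    }
    report = {"missing": [], "mismatched": [], "unrecorded": []}
    for rel, digest in sorted(recorded.items()):
        if rel not in on_disk:
            report["missing"].append(rel)  # checksum recorded, no file
        elif sha512_of(root / rel) != digest:
            report["mismatched"].append(rel)  # file incomplete or altered
    report["unrecorded"] = sorted(on_disk - set(recorded))  # file, no checksum
    return report
```

With this ordering, an interrupted write shows up either as an "unrecorded" file (step 3 never ran) or a "mismatched" one (step 2 was cut short), which is the kind of signal a validator could surface.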
Yes, I think that given the critical place of the manifest/
F2F 2018.09.05: An object with a version directory and no record in the inventory is invalid, which, referencing #14, is not permitted. Specific automated or manual interventions are not prescribed and are not in scope.
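For anyone wanting to detect that condition, a minimal sketch follows. It assumes an `inventory.json` at the object root whose `versions` map is keyed by version directory names such as `v1`, and that version directories match `v` followed by digits; the `unrecorded_version_dirs` helper is hypothetical and is no substitute for a full validator.

```python
import json
import re
from pathlib import Path

# Assumed naming pattern for version directories (v1, v2, v0003, ...).
VERSION_DIR = re.compile(r"^v\d+$")


def unrecorded_version_dirs(object_root):
    """Return version directories on disk that the inventory does not list."""
    object_root = Path(object_root)
    inventory = json.loads((object_root / "inventory.json").read_text())
    recorded = set(inventory.get("versions", {}))
    on_disk = {
        p.name
        for p in object_root.iterdir()
        if p.is_dir() and VERSION_DIR.match(p.name)
    }
    return sorted(on_disk - recorded)
```

A non-empty result would flag the object as invalid; what to do next (delete the stray directory, re-run the update) is left to the application, in line with the decision above.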
A lot of this is now covered in the Implementation Notes on writing new versions.
Editors' meeting 2023-09-22: OCFL, by its nature, cannot provide a strong notion of transaction. An application writing an OCFL object must manage that process and ensure that it completes creation of a valid object. On failure, there must be some cleanup, and some ideas are documented in the Implementation Notes - Clean up. Closing as out-of-scope.
A power outage occurred when a software component was in the middle of writing an OCFL object, leaving the object in an ambiguous state. There should be mechanisms for recovering from various failure modes.