Merged
Conversation
don't squash all store errors to 400 status code - if the intention was to prevent information leak, it failed to do this because it included the error string in the response. do this by making DocumentStoreError a superclass of other store exception types so exception handlers can continue to listen for the general DocumentStoreError. return 410 for keys that have a delete marker, 404 for documents with no such key, 403 for documents that are blocked. notably also catch NoSuchKey from check_for_blocked_document's get_object_tagging calls and turn this into a DocumentNotFound error, because in practise this will be the first problem a request will encounter when trying to access a missing document.
…ation gracefully instead of causing a crash this should simply return the metadata with the expiration missing, allowing the client to make up its own mind what to do. in most normal cases this is only used for informational display, and in case of a maintenance issue or misconfiguration it's better to have this information missing rather than being unable to serve any documents to anyone (a certain P1)
… app logic and Expiration metadata this check was previously done in document-download-frontend, but it really belongs in the document-download-api if anywhere, if only because the api's download endpoints are publicly accessible and a document denied access by just the frontend could always just be downloaded via the api. note this check still relies on the s3 bucket's lifecycle policies being in place and implicitly the s3 object's last modified timestamp. this will only "rescue" us in cases where a delete marker has not yet been added by the lifecycle rile for some reason.
robinjam
approved these changes
May 7, 2025
robinjam
left a comment
There was a problem hiding this comment.
This is a massive improvement, nice one
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://trello.com/c/EmuRWKZy/1037-experiment-with-removing-mrap-from-document-downloads-bucket
Best reviewed commit-by-commit.
This gets document-download-api acting more-or-less sensibly when encountering missing documents and or documents that are supposed to be expired.
Will require some frontend changes to adapt to the endpoints returning anything other than 400 for these cases. (See alphagov/document-download-frontend#292)
The app-logic
Expirationhandling added here is a straight port of the logic that the frontend currently implements, though it could stand to be a lot stronger, potentially directly interpreting theretention-periodtags and comparing them against a tag-stored last-modified timestamp (which doesn't exist yet). Doing this would reduce our 100% dependence on the correct operation of the lifecycle rules and the s3 native last-modified timestamp (which isn't guaranteed to always be there if we e.g. use s3 backup). But as I say, that stuff isn't done yet - this PR just ports the existing logic across.