Audit Whitehall attachment metadata in Asset Manager #544
Comments
We ran a version of the script in integration earlier today requesting 1000 samples. It didn't complete because it got a …

I've described my investigation into the …

I've also investigated three of the apparent discrepancies where Whitehall responds with a …

This has been done now.
Given the number of problems we've run into in getting this metadata into Asset Manager, we think it would be sensible to audit the metadata for at least a significant subset of the Whitehall attachments to make sure there are no major systematic discrepancies, before switching the Nginx config over so that Asset Manager is serving the attachments.
We've made a start on writing a script to do this in this branch. The idea was to compare the responses to attachment requests served by Whitehall with the responses to attachment requests served by Asset Manager, e.g. whether the response codes are the same, whether the relevant response headers are the same, whether the bodies are the same length, etc.
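The comparison described above could be sketched in plain Ruby roughly as follows. This is a minimal sketch, not the code in the branch: `AuditResponse` and `discrepancies` are hypothetical names, the default header list (`Content-Type`, `Content-Disposition`) is only an assumption about which headers count as "relevant", and how each response is actually fetched (Mechanize or otherwise) is left out.

```ruby
# Hypothetical value object for one HTTP response captured by the audit
# script. Status is coerced to an Integer and header names are lower-cased
# so that the two apps can be compared regardless of header casing.
AuditResponse = Struct.new(:status, :headers, :body) do
  def initialize(status, headers, body)
    super(status.to_i, headers.transform_keys { |k| k.to_s.downcase }, body.to_s)
  end
end

# Compare the Whitehall and Asset Manager responses for one attachment path
# and return a list of human-readable discrepancies (empty when they match).
def discrepancies(whitehall, asset_manager, header_names: %w[content-type content-disposition])
  problems = []

  if whitehall.status != asset_manager.status
    problems << "status: #{whitehall.status} vs #{asset_manager.status}"
  end

  header_names.map(&:downcase).each do |name|
    wh = whitehall.headers[name]
    am = asset_manager.headers[name]
    problems << "#{name}: #{wh.inspect} vs #{am.inspect}" if wh != am
  end

  # Compare body sizes in bytes rather than full bodies, as in the plan above.
  if whitehall.body.bytesize != asset_manager.body.bytesize
    problems << "body length: #{whitehall.body.bytesize} vs #{asset_manager.body.bytesize}"
  end

  problems
end
```

Returning a list of strings rather than a boolean means the script can log every mismatch for a path in one go, which should make systematic discrepancies easier to spot across a large sample.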
Unfortunately, the script is not quite ready for use in production. Here are some of the outstanding issues:
1. It requires the password for a user who has signin permission for Asset Manager to be set in an environment variable. This would mean the password would be left in the shell history, which isn't ideal. It might be better to either (a) use `gets` to request the password interactively within the script; or (b) split the script into two (see point 2).
2. It might be simpler to split the script into two parts: one to generate the attachment URL paths by querying `AttachmentData` data from the Whitehall DB (which would need to be run in production); and one to actually make the attachment requests to Whitehall and Asset Manager (which could be run on a local developer machine).
3. It's not ideal that the attachment requests against Asset Manager are using `draft-assets`, but this is the only way we can currently get access to them until the Nginx config is switched over. This is the reason why the script is currently skipping attachments which are not publicly accessible.
4. By default Mechanize raises a `Mechanize::ResponseCodeError` if an unhandled response code is returned. I've added `404 Not Found` to the `allowed_error_codes` list, but it might be worth adding some others, e.g. `502 Bad Gateway`, or perhaps rescuing the `ResponseCodeError` exception to make the script more robust.
5. I think the script might not work against production, because of the 2-step verification process.
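If the script were split in two as suggested in the second point above, the half run on a developer machine might start out something like this sketch. `prompt_password` and `read_paths` are hypothetical helpers, and the one-path-per-line export format is an assumption; `IO#noecho` comes from Ruby's standard `io/console` library and stops the typed password from being echoed to the terminal.

```ruby
require "io/console"

# Hypothetical helper: ask for the Asset Manager password at run time rather
# than reading it from an environment variable, so it never reaches the shell
# history. Echo is only switched off when the input is a real terminal.
def prompt_password(input: $stdin, output: $stderr)
  output.print "Asset Manager password: "
  line =
    if input.respond_to?(:tty?) && input.tty?
      input.noecho(&:gets)
    else
      input.gets
    end
  output.puts # the newline typed by the user is not echoed under noecho
  line.to_s.chomp
end

# Hypothetical helper for the locally run half: read the attachment URL paths
# exported from production (assumed to be one path per line), drop blanks and
# duplicates, and optionally take a random sample of them.
def read_paths(io, sample: nil, random: Random.new)
  paths = io.each_line.map(&:strip).reject(&:empty?).uniq
  sample ? paths.sample(sample, random: random) : paths
end
```

For example, `File.open("attachment_paths.txt") { |f| read_paths(f, sample: 1000) }` would pick 1000 paths from a (hypothetically named) export file produced by the production half.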
We were imagining that this audit process would also compare more of the HTTP response headers to make sure there would be no change in behaviour when switching over the Nginx config.
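One way to extend the comparison to the full set of response headers is to diff the two header hashes, ignoring headers that are expected to differ on every request. This is a minimal sketch under assumptions: `header_diff` is a hypothetical helper, and the ignore list is a guess that would need adjusting after looking at real responses from both apps.

```ruby
# Headers assumed to differ legitimately between the two apps on every
# request; this list is an assumption and would need tuning.
DEFAULT_IGNORED_HEADERS = %w[date x-request-id x-runtime set-cookie].freeze

# Diff two header hashes (names compared case-insensitively) and report
# headers missing from Asset Manager, extra in Asset Manager, or changed.
def header_diff(whitehall_headers, asset_manager_headers, ignore: DEFAULT_IGNORED_HEADERS)
  normalise = lambda do |headers|
    headers.transform_keys { |k| k.to_s.downcase }.reject { |k, _| ignore.include?(k) }
  end
  wh = normalise.call(whitehall_headers)
  am = normalise.call(asset_manager_headers)

  changed_keys = (wh.keys & am.keys).reject { |k| wh[k] == am[k] }.sort
  {
    missing: (wh.keys - am.keys).sort,
    extra: (am.keys - wh.keys).sort,
    changed: changed_keys.map { |k| [k, [wh[k], am[k]]] }.to_h
  }
end
```

Reporting missing, extra and changed headers separately should make it easier to tell whether a difference would actually change client behaviour (e.g. a changed caching header) or is just noise.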