-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redirect Whitehall images and attachments #1568
Conversation
c67be0f
to
ab82e1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great Chris. Really meaty bit of work.
First off I have the most annoying comment that I think record is too general a name - I'm very sorry as I know it's propagated far in this PR and you're welcome to convince me otherwise as I'm sure it'll be a pain to change.
I've made a bunch of comments about code organisation but most important things to work out if we're on the same page about things like handling variants and asset_import states.
Happy to chat face-to-face about any comments if that's easier.
ab82e1b
to
7a112a4
Compare
dca6508
to
212c4cd
Compare
If a migration fails, we need to be able to re-rerun the migration task. We don't want to run into issues needlessly re-migrating assets that have already been processed. We had hoped to use Rails 'scopes' for this but it was far more complex than originally thought, as we need to scope on an association rather than a static method call. There is also some complexity to do with factories. See: #1568 (comment) As a result, I've opted for a `migratable_assets` method on the DocumentImport class which returns a subset of its `assets`.
212c4cd
to
5fe5e24
Compare
If a migration fails, we need to be able to re-rerun the migration task. We don't want to run into issues needlessly re-migrating assets that have already been processed. We had hoped to use Rails 'scopes' for this but it was far more complex than originally thought, as we need to scope on an association rather than a static method call. There is also some complexity to do with factories. See: #1568 (comment) As a result, I've opted for a `migratable_assets` method on the DocumentImport class which returns a subset of its `assets`.
5fe5e24
to
33458fb
Compare
34989ab
to
8b3710f
Compare
If a migration fails, we need to be able to re-rerun the migration task. We don't want to run into issues needlessly re-migrating assets that have already been processed. We had hoped to use Rails 'scopes' for this but it was far more complex than originally thought, as we need to scope on an association rather than a static method call. There is also some complexity to do with factories. See: #1568 (comment) As a result, I've opted for a `migratable_assets` method on the DocumentImport class which returns a subset of its `assets`.
8b3710f
to
fb7ea1e
Compare
Thanks @kevindew , I've incorporated your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one Chris 💯
If a migration fails, we need to be able to re-rerun the migration task. We don't want to run into issues needlessly re-migrating assets that have already been processed. We had hoped to use Rails 'scopes' for this but it was far more complex than originally thought, as we need to scope on an association rather than a static method call. There is also some complexity to do with factories. See: #1568 (comment) As a result, I've opted for a `migratable_assets` method on the DocumentImport class which returns a subset of its `assets`.
fb7ea1e
to
47cc793
Compare
This parameter replaces the `whitehall_document` parameter, which we can now access via the `document_import` parameter when needed. Passing a reference to WhitehallMigration::DocumentImport allows us to update the import record in real time, e.g. updating the document ID associated with the record, which we can only do once we've created the document. We will do this in the next commit.
We'll later write a custom validator to ensure only one of image_revision or file_attachment_revision is referenced by each asset.
`document_import` will be an instance of WhitehallMigration::DocumentImport. Currently we pass `nil` in place of this, but as there are a fair few code changes involved in the passing of this extra parameter to these classes, I've encapsulated them in one commit for now. In the next commit we'll change these classes to act upon the record. And in the commit after that, we'll pass the record in create_revision rather than passing 'nil'.
This meant the 'document' parameter now became redundant, as we can derive this from the 'document_import' parameter, so a lot of files have been touched. But the code itself is a relatively small change.
This factory is about to be used by a number of subsequent commits. For ease of rebasing/editing, I've encapsulated it in one commit. For the record, I attempted to name the factories `:image` and `:file_attachment`, but as these names clashed with existing factory names, it was trying to set `image`/`file_attachment` attributes instead. I've gone into more detail here: thoughtbot/factory_bot#528 (comment)
I've added a default "s960" variant of images to the image factory so that this behaviour can be tested. I've also had to specify an 'assets' association on DocumentImport so that I can properly test the new behaviour. This 'assets' association will come in handy later when we write the migration logic.
I've had to make the image_revision/file_attachment_revision associations optional as only one of them should exist per asset. In a future commit I intend to write a validator to ensure this is the case. As JSON doesn't support the concept of `nil`, and we don't anticipate encountering a variant of an asset that contains nils, I've tweaked the factory data for file attachments so that a content type and url are actually defined.
We only want to be associated with one Image::Revision or one FileAttachment::Revision. Prior to this commit, it was possible to associate a single WhitehallImportedAsset with both. Ideally this validation would happen at the database level, but we can't create the Image::Revision/FileAttachment::Revision ahead of time as we want to create the import record for it first, in case something goes wrong. Therefore both fields are optional at the database level.
This was copied from the 'asset_manager_id' method inside FileAttachment::Asset and Image::Asset, and tweaked slightly to reference the 'original_asset_url' key in the database. However, I've named this 'whitehall_asset_id' as this is the ID that Whitehall assigns the asset. This method will come in handy when we write the migration logic.
This method determines which content publisher asset URL to return, depending on whether it is a fileattachment or image revision, and whether or not it is the canonical asset or a variant of it. This decorator method will simplify the migration logic, as we can simply check if 'content_publisher_asset' is nil to determine whether or not the original whitehall asset should be redirected to it or removed altogether.
This service will take a `WhitehallMigration::DocumentImport` and attempt to tidy up all the old Whitehall assets associated with the imported document.
This will redirect 'master' image/file-attachments to their Content Publisher equivalents, and image variants to their equivalents. File attachments variants should be deleted, rather than redirected, but we will handle this in a subsequent commit. (Whitehall assets that have no Content Publisher equivalents will return 'nil' when their 'content_publisher_asset' method is called. In the next commit we will delete these.)
(Ideally these would be in two separate commits but by applying one, we get the other 'for free'.) All assets on the draft stack should be deleted, period. Some assets on the live stack should be deleted too; namely: - Any file attachment variants (there is no equivalent in Content Publisher for things like 'file attachment thumbnail') - Some image variants (there is no equivalent in Content Publisher for certain image sizes, e.g. "s216")
Images/attachments are allowed to have the same filenames - we increment the name by 1 every time - but the preceding prefix has to be different. If an identical URL is encountered, we've already processed the asset once and don't need to do it again. However, we do need to return the processed asset even if we've processed it in the past, otherwise the imported document will be missing the image/fileattachment revision. So this has become a 'find or create' method rather than simply 'create'.
Catches any errors (including GdsApi ones) we may encounter whilst migrating assets.
If a migration fails, we need to be able to re-rerun the migration task. We don't want to run into issues needlessly re-migrating assets that have already been processed. We had hoped to use Rails 'scopes' for this but it was far more complex than originally thought, as we need to scope on an association rather than a static method call. There is also some complexity to do with factories. See: #1568 (comment) As a result, I've opted for a `migratable_assets` method on the DocumentImport class which returns a subset of its `assets`.
47cc793
to
c194a6e
Compare
This PR has all the code for tidying up a Whitehall document's assets once the document has been imported into Content Publisher. After import, we can run the new
MigrateAssets
service to:How it works
At the import stage
Creates a
WhitehallMigration::AssetImport
record for every asset (Image/FileAttachment) referenced in the Whitehall export. It creates a record for each variant of each asset too. However, it avoids creating duplicate records, so that if an asset exists in multiple editions in a Whitehall export, it will only have one record created for it.At the migration stage
We've created a
MigrateAssets
service which is called from the WhitehallImportersync
/import_and_sync
methods. It iterates over eachWhitehallMigration::AssetImport
record, deleting the asset if its 'master' asset is not live, or deleting it if it is a 'variant' file attachment (e.g. thumbnail). Otherwise, it redirects the Whitehall asset to the main asset on Content Publisher.NOTE: if something goes wrong in the import_and_sync step, such that the master asset was not successfully synchronised with asset manager and does not have the 'live' state, there is a risk we will delete Whitehall assets and have no assets at all. However, in a dev discussion before Christmas we decided that the migration code should make the assumption that everything was imported correctly.
Background information
This work comes out of a spike in #1509, and further investigative work in #1530.
Trello card: https://trello.com/c/DMpTECux/1232-handle-redirecting-and-removing-old-assets-as-part-of-whitehall-migration