Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redirect Whitehall images and attachments #1568

Merged
merged 20 commits into from
Jan 9, 2020
Merged

Conversation

ChrisBAshton
Copy link
Contributor

@ChrisBAshton ChrisBAshton commented Dec 23, 2019

This PR has all the code for tidying up a Whitehall document's assets once the document has been imported into Content Publisher. After import, we can run the new MigrateAssets service to:

  • delete all Whitehall assets (images/file attachments) that only existed on the draft stack
  • redirect all 'master' assets (e.g. the 960 size image, rather than its thumbnails) to the new master version on Content Publisher, if the Whitehall assets were on the live stack
  • delete all variants of Whitehall file attachments (whether on live or draft stack), since they'll no longer appear anywhere
  • redirect all variants of Whitehall images to the new master image on Content Publisher, since the old images may be hotlinked from elsewhere
  • log any assets that fail to migrate, and fail the sync step at the end

How it works

At the import stage

Creates a WhitehallMigration::AssetImport record for every asset (Image/FileAttachment) referenced in the Whitehall export. It creates a record for each variant of each asset too. However, it avoids creating duplicate records, so that if an asset exists in multiple editions in a Whitehall export, it will only have one record created for it.

At the migration stage

We've created a MigrateAssets service which is called from the WhitehallImporter sync/import_and_sync methods. It iterates over each WhitehallMigration::AssetImport record, deleting the asset if its 'master' asset is not live, or deleting it if it is a 'variant' file attachment (e.g. thumbnail). Otherwise, it redirects the Whitehall asset to the main asset on Content Publisher.

NOTE: if something goes wrong in the import_and_sync step, such that the master asset was not successfully synchronised with asset manager and does not have the 'live' state, there is a risk we will delete Whitehall assets and have no assets at all. However, in a dev discussion before Christmas we decided that the migration code should make the assumption that everything was imported correctly.

Background information

This work comes out of a spike in #1509, and further investigative work in #1530.

Trello card: https://trello.com/c/DMpTECux/1232-handle-redirecting-and-removing-old-assets-as-part-of-whitehall-migration

Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great Chris. Really meaty bit of work.

First off I have the most annoying comment that I think record is too general a name - I'm very sorry as I know it's propagated far in this PR and you're welcome to convince me otherwise as I'm sure it'll be a pain to change.

I've made a bunch of comments about code organisation but most important things to work out if we're on the same page about things like handling variants and asset_import states.

Happy to chat face-to-face about any comments if that's easier.

lib/whitehall_importer/create_file_attachment_revision.rb Outdated Show resolved Hide resolved
spec/factories/whitehall_migration_asset_import_factory.rb Outdated Show resolved Hide resolved
lib/whitehall_importer/create_image_revision.rb Outdated Show resolved Hide resolved
app/models/whitehall_migration/asset_import.rb Outdated Show resolved Hide resolved
lib/whitehall_importer/migrate_assets.rb Show resolved Hide resolved
lib/whitehall_importer/create_revision.rb Outdated Show resolved Hide resolved
lib/whitehall_importer/migrate_assets.rb Outdated Show resolved Hide resolved
@ChrisBAshton ChrisBAshton force-pushed the redirect-assets-new branch 12 times, most recently from dca6508 to 212c4cd Compare January 3, 2020 10:28
ChrisBAshton added a commit that referenced this pull request Jan 3, 2020
If a migration fails, we need to be able to re-rerun the migration
task. We don't want to run into issues needlessly re-migrating
assets that have already been processed.

We had hoped to use Rails 'scopes' for this but it was far more
complex than originally thought, as we need to scope on an association
rather than a static method call. There is also some complexity to
do with factories. See:
#1568 (comment)

As a result, I've opted for a `migratable_assets` method on the
DocumentImport class which returns a subset of its `assets`.
ChrisBAshton added a commit that referenced this pull request Jan 3, 2020
If a migration fails, we need to be able to re-rerun the migration
task. We don't want to run into issues needlessly re-migrating
assets that have already been processed.

We had hoped to use Rails 'scopes' for this but it was far more
complex than originally thought, as we need to scope on an association
rather than a static method call. There is also some complexity to
do with factories. See:
#1568 (comment)

As a result, I've opted for a `migratable_assets` method on the
DocumentImport class which returns a subset of its `assets`.
ChrisBAshton added a commit that referenced this pull request Jan 8, 2020
If a migration fails, we need to be able to re-rerun the migration
task. We don't want to run into issues needlessly re-migrating
assets that have already been processed.

We had hoped to use Rails 'scopes' for this but it was far more
complex than originally thought, as we need to scope on an association
rather than a static method call. There is also some complexity to
do with factories. See:
#1568 (comment)

As a result, I've opted for a `migratable_assets` method on the
DocumentImport class which returns a subset of its `assets`.
@ChrisBAshton
Copy link
Contributor Author

Thanks @kevindew , I've incorporated your suggestions.

Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one Chris 💯

ChrisBAshton added a commit that referenced this pull request Jan 9, 2020
If a migration fails, we need to be able to re-rerun the migration
task. We don't want to run into issues needlessly re-migrating
assets that have already been processed.

We had hoped to use Rails 'scopes' for this but it was far more
complex than originally thought, as we need to scope on an association
rather than a static method call. There is also some complexity to
do with factories. See:
#1568 (comment)

As a result, I've opted for a `migratable_assets` method on the
DocumentImport class which returns a subset of its `assets`.
This parameter replaces the `whitehall_document` parameter, which
we can now access via the `document_import` parameter when needed.

Passing a reference to WhitehallMigration::DocumentImport allows
us to update the import record in real time, e.g. updating the
document ID associated with the record, which we can only do once
we've created the document. We will do this in the next commit.
We'll later write a custom validator to ensure only one of
image_revision or file_attachment_revision is referenced by each
asset.
`document_import` will be an instance of WhitehallMigration::DocumentImport.
Currently we pass `nil` in place of this, but as there are a fair
few code changes involved in the passing of this extra parameter to
these classes, I've encapsulated them in one commit for now.

In the next commit we'll change these classes to act upon the record.
And in the commit after that, we'll pass the record in create_revision
rather than passing 'nil'.
This meant the 'document' parameter now became redundant, as we
can derive this from the 'document_import' parameter, so a lot of files
have been touched. But the code itself is a relatively small change.
This factory is about to be used by a number of subsequent
commits. For ease of rebasing/editing, I've encapsulated it in
one commit.

For the record, I attempted to name the factories `:image` and
`:file_attachment`, but as these names clashed with existing
factory names, it was trying to set `image`/`file_attachment`
attributes instead. I've gone into more detail here:
thoughtbot/factory_bot#528 (comment)
I've added a default "s960" variant of images to the image factory
so that this behaviour can be tested.

I've also had to specify an 'assets' association on DocumentImport
so that I can properly test the new behaviour. This 'assets'
association will come in handy later when we write the migration
logic.
I've had to make the image_revision/file_attachment_revision
associations optional as only one of them should exist per asset.
In a future commit I intend to write a validator to ensure this is
the case.

As JSON doesn't support the concept of `nil`, and we don't
anticipate encountering a variant of an asset that contains nils,
I've tweaked the factory data for file attachments so that a
content type and url are actually defined.
We only want to be associated with one Image::Revision or one
FileAttachment::Revision. Prior to this commit, it was possible
to associate a single WhitehallImportedAsset with both.

Ideally this validation would happen at the database level, but
we can't create the Image::Revision/FileAttachment::Revision
ahead of time as we want to create the import record for it first,
in case something goes wrong. Therefore both fields are optional
at the database level.
This was copied from the 'asset_manager_id' method inside
FileAttachment::Asset and Image::Asset, and tweaked slightly to
reference the 'original_asset_url' key in the database.
However, I've named this 'whitehall_asset_id' as this is the ID
that Whitehall assigns the asset.

This method will come in handy when we write the migration logic.
This method determines which content publisher asset URL to return,
depending on whether it is a fileattachment or image revision, and
whether or not it is the canonical asset or a variant of it.

This decorator method will simplify the migration logic, as we can
simply check if 'content_publisher_asset' is nil to determine whether
or not the original whitehall asset should be redirected to it or
removed altogether.
This service will take a `WhitehallMigration::DocumentImport`
and attempt to tidy up all the old Whitehall assets associated with
the imported document.
This will redirect 'master' image/file-attachments to their
Content Publisher equivalents, and image variants to their
equivalents. File attachments variants should be deleted, rather
than redirected, but we will handle this in a subsequent commit.

(Whitehall assets that have no Content Publisher equivalents will
return 'nil' when their 'content_publisher_asset' method is called.
In the next commit we will delete these.)
(Ideally these would be in two separate commits but by applying one,
we get the other 'for free'.)

All assets on the draft stack should be deleted, period.
Some assets on the live stack should be deleted too; namely:

- Any file attachment variants (there is no equivalent in Content
  Publisher for things like 'file attachment thumbnail')
- Some image variants (there is no equivalent in Content Publisher
  for certain image sizes, e.g. "s216")
Images/attachments are allowed to have the same filenames - we
increment the name by 1 every time - but the preceding prefix
has to be different. If an identical URL is encountered, we've
already processed the asset once and don't need to do it again.

However, we do need to return the processed asset even if we've
processed it in the past, otherwise the imported document will
be missing the image/fileattachment revision. So this has become
a 'find or create' method rather than simply 'create'.
Catches any errors (including GdsApi ones) we may encounter whilst
migrating assets.
If a migration fails, we need to be able to re-rerun the migration
task. We don't want to run into issues needlessly re-migrating
assets that have already been processed.

We had hoped to use Rails 'scopes' for this but it was far more
complex than originally thought, as we need to scope on an association
rather than a static method call. There is also some complexity to
do with factories. See:
#1568 (comment)

As a result, I've opted for a `migratable_assets` method on the
DocumentImport class which returns a subset of its `assets`.
@ChrisBAshton ChrisBAshton merged commit aba29bc into master Jan 9, 2020
@ChrisBAshton ChrisBAshton deleted the redirect-assets-new branch January 9, 2020 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants