Add documentation about asset behaviour
Tuomas Nylund committed Dec 7, 2023
1 parent 98e2c21 commit aaceb36
Showing 11 changed files with 392 additions and 0 deletions.
160 changes: 160 additions & 0 deletions docs/assets.md
@@ -0,0 +1,160 @@

# Asset Management in Whitehall

*Assets, Attachments, Images, Logos, Featured Image, Lead Image, Response Form*

# What it is

There are several document types in Whitehall that utilize Images or
File Attachments.

For example, **News Articles** can have a list of attachments that
provide further information about the news, and a **Person** can have a picture
which is shown on the person's page and on any **Speech** page of that person.

# How it used to work

Historically, Whitehall stored all file uploads within the Whitehall service itself.
Carrierwave was used to manage the HTTP requests and related file storage within
Whitehall.

Later on, a separate service, Asset Manager, became the store for file uploads, taking ownership of every uploaded file.
Carrierwave was still used to manage the HTTP requests in Whitehall, and a good amount of Carrierwave configuration was
written to make Asset Manager behave as if it were just file storage.
This configuration rippled throughout the Whitehall codebase, made Whitehall a special case in
Asset Manager, locked in Asset Manager's design and increased the overhead of understanding Whitehall.

An associated concept, "legacy url path", heavily blocked any refactoring attempts. "Legacy url path" was the medium
through which Whitehall defined what the url of an asset should be, even though Asset Manager would actually own the asset.
The "legacy url path" was used as a *de facto* id, based on Whitehall internal class names and their ids. Unlike Whitehall,
other publishing applications would let Asset Manager define asset-manager-ID-based urls for assets.

The "legacy url path" was eventually removed and asset references in Whitehall became simple rows in the database,
just like with all the other publishing applications. In order to store the data required to talk to Asset Manager,
a new Asset model was proposed. Find the related ADR for the asset model [here](adr/0002-new-asset-model.md).


# Technical Design

### Relationship between documents and Asset models

![Model diagram](diagrams/asset_models.png)

### How attachments and images are provided for the user

![Asset Usage](diagrams/asset_usage.png)


### File attachments upload flow
*Editionable Content* types such as **News Articles** and **Consultations**, as well as **Policy Groups**, can have attachments.
Uploads and republishing actions are managed asynchronously in the background by a Sidekiq worker.

![Attachment Upload Flow](diagrams/asset_attachment_upload_flow.png)
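
The flow in the diagram can be approximated with a small in-memory sketch. Everything here is a hypothetical stand-in: `FakeAssetManager`, `AttachmentUploadFlow` and `UploadedAsset` are illustrative names rather than Whitehall classes, and a plain array takes the place of the Sidekiq queue. It only shows the shape of the idea: the request returns straight away, and the upload, the Asset row and the draft update happen later in the background.

```ruby
# Illustrative stand-ins only; these are not Whitehall's real classes.
UploadedAsset = Struct.new(:asset_manager_id, :filename)

class FakeAssetManager
  def initialize
    @next_id = 0
  end

  # Pretends to store the file and returns the id Asset Manager would assign.
  def upload(filename)
    @next_id += 1
    "asset-#{@next_id}"
  end
end

class AttachmentUploadFlow
  attr_reader :assets, :draft_updates

  def initialize(asset_manager)
    @asset_manager = asset_manager
    @queue = []         # stands in for the Sidekiq queue
    @assets = []        # stands in for Asset rows in the database
    @draft_updates = [] # records each draft update sent to Publishing API
  end

  # Request cycle: enqueue the upload and return immediately.
  def save_attachment(filename)
    @queue << [:create_asset, filename]
    :accepted
  end

  # Background cycle: run jobs until the queue is empty.
  def drain
    until @queue.empty?
      job, filename = @queue.shift
      case job
      when :create_asset
        id = @asset_manager.upload(filename)
        @assets << UploadedAsset.new(id, filename)
        @queue << [:update_draft, filename] # follow-up job, as in the diagram
      when :update_draft
        @draft_updates << filename
      end
    end
  end
end
```

Calling `save_attachment` leaves `assets` empty until `drain` runs, which mirrors the window in which a publisher sees the attachment listed before its upload has finished.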

### Images upload flow
*Editionable Content* types such as **News Articles** and **Consultations** can have images that are used within the document.
Images can have various sizes (versions) which are uploaded asynchronously by a background job.

![Image Upload Flow](diagrams/asset_image_upload_flow.png)

### Featured Image upload flow
*Immediately published content* types such as **Person**, **Organisation** or **Topical Event** can have an associated image.
For example, a **Person** has an image which is displayed both on the person's page and on the pages of their associated speeches.
When a new **Person** image finishes uploading, both the Person and related Speeches pages need to be republished.

![Featured Image Upload Flow](diagrams/asset_featured_image_upload_flow.png)

****

# Cross functional requirements for File Attachments

*FileAttachments are considered to be important data for the document quality. They can contain detailed
information which is crucial for the containing document.*


**Correctness** over **Maintainability**

> An attachment can be configured to redirect the user to a newer version of the attachment.
This helps ensure that anyone who follows an outdated url will be given the correct up-to-date
information. This comes at the cost of complexity in maintaining redirect-urls for attachments.

**Obtainability** over **code simplicity**

> An attachment can be replaced by another attachment. These replacements are tracked in the system
so that the user can always be provided with an up-to-date version of the data.
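
As a rough illustration of how tracked replacements can lead to a redirect target, the sketch below follows a chain of replacements to the newest attachment. `AttachmentRecord`, `replaced_by`, `latest_attachment` and `redirect_url_for` are hypothetical names chosen for this example; the real data model may store this differently.

```ruby
# Hypothetical record: `replaced_by` points at the newer attachment, or is nil.
class AttachmentRecord
  attr_reader :url
  attr_accessor :replaced_by

  def initialize(url)
    @url = url
    @replaced_by = nil
  end
end

# Follows the replacement chain to the newest attachment.
# The `seen` list guards against accidental cycles in the data.
def latest_attachment(attachment)
  seen = []
  current = attachment
  while current.replaced_by && !seen.include?(current)
    seen << current
    current = current.replaced_by
  end
  current
end

# Returns the url an outdated link should redirect to, or nil if the
# attachment has not been replaced.
def redirect_url_for(attachment)
  latest = latest_attachment(attachment)
  latest.equal?(attachment) ? nil : latest.url
end
```

With three versions chained together, a request for the oldest one resolves to the url of the newest, which is the behaviour the redirect is meant to guarantee.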

**Recoverability** over **code simplicity**

> Attachments are always soft-deleted. This makes it possible to recover the attachments both in Whitehall and
Asset Manager when needed. The cost of soft-deletion is code that is harder to maintain and understand.
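
A minimal sketch of soft deletion, assuming a `deleted_at` timestamp (the actual column and class names in Whitehall may differ; `SoftDeletableAttachment` and `visible_attachments` are made up for illustration). It also hints at where the maintenance cost comes from: every read path has to remember to filter deleted records out.

```ruby
class SoftDeletableAttachment
  attr_reader :title, :deleted_at

  def initialize(title)
    @title = title
    @deleted_at = nil
  end

  def deleted?
    !@deleted_at.nil?
  end

  # Marks the record as deleted instead of removing it.
  def soft_delete(now = Time.now)
    @deleted_at ||= now
    self
  end

  # Undoes a soft deletion, making the attachment visible again.
  def restore
    @deleted_at = nil
    self
  end
end

# Every query for "live" attachments has to exclude soft-deleted ones.
def visible_attachments(attachments)
  attachments.reject(&:deleted?)
end
```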


# Cross functional requirements for Images

*Images are considered mainly decorative and visually supportive items
for a document. They should not contain any crucial information required
in order for the document to be understood.*

**Accessibility**

> People who use screen readers should be able to get just as much information
from the document at hand, whether they can see the image or not. Alt texts are always available
to provide descriptive information about an image.

**Code simplicity** over **Recoverability**

> Images are not considered crucial information. They are generally hard-deleted from Whitehall database in favor of
> having simple and readable code.

**User experience** over **Maintainability**

> Different devices require different image sizes in order for the ui to look good.
> These image variants need to be created and the code for it needs support and maintenance.

# Common cross functional requirements

**Security and Safety** over **ease of support**

> All attachments and images are scanned for potential malware in Asset Manager. This
is done to ensure the safety of the users of GOV.UK. The cost of this is that
2nd line support sometimes has to fix attachments that the virus scanner has
flagged as false positives.

**User experience** over **code simplicity**

> File uploads are managed internally with asynchronous background jobs. This helps the publisher-user
have a fast user experience when uploading files. However, it comes at the cost of complicated code flows
that manage the successes and failures of the file uploads.

# Business Requirements

#### *Members of the public* need to see appropriate image sizes depending on the device they are using

> Whitehall creates up to 7 image sizes of each uploaded image. Urls to these images are published as part of the content
> so that rendering applications can then show the best image size for the use case.
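
To illustrate how a rendering application might use the published urls, here is a sketch that picks a variant for a given slot width. The variant names and pixel widths in `ILLUSTRATIVE_VARIANT_WIDTHS` are assumptions made up for this example, as is `best_variant_url`; the real set of sizes is defined in Whitehall and is not reproduced here.

```ruby
# Assumed variant names and widths in pixels, for illustration only.
ILLUSTRATIVE_VARIANT_WIDTHS = {
  "small" => 300,
  "medium" => 630,
  "large" => 960,
}.freeze

# Picks the narrowest published variant that is at least as wide as the slot,
# falling back to the widest published variant when none is wide enough.
# `published_urls` maps variant names to urls, as found in the content.
def best_variant_url(published_urls, slot_width)
  available = ILLUSTRATIVE_VARIANT_WIDTHS.select { |name, _width| published_urls.key?(name) }
  return nil if available.empty?

  fitting = available.select { |_name, width| width >= slot_width }
  name, _width = fitting.empty? ? available.max_by { |_n, w| w } : fitting.min_by { |_n, w| w }
  published_urls[name]
end
```

Because only published variants are considered, the same function keeps working for images that have fewer sizes than the maximum.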
#### *Members of the public* want to see a thumbnail of pdfs where an embedded link to the pdf is provided

> Whitehall generates thumbnails for pdf file attachments. Urls to both the pdf and its thumbnail are
> published as part of the content and then used by the rendering applications.
#### *Members of the public* should see document content with working image and attachment urls

> For *Editionable Content* types, such as **News Articles** or **Publications**, publishing is prevented for documents
> with images or attachments that have not yet finished uploading.
#### *Publishers* need to know when publishing is blocked because file or image attachment processing has not yet finished or the processing has failed

> Whitehall UI shows a validation error for *Editionable Content* types with image or file attachments, when publishing
> fails because of unfinished background jobs.
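
The publishing check can be sketched as a validation that compares the variants a file is expected to have against the ones already uploaded. `UploadableFile`, `DraftEdition` and the error wording are hypothetical and only illustrate the rule, not Whitehall's actual validators.

```ruby
# Hypothetical file record tracking which variants have reached Asset Manager.
UploadableFile = Struct.new(:filename, :expected_variants, :uploaded_variants) do
  # A file is ready once every expected variant has an uploaded asset.
  def all_uploaded?
    (expected_variants - uploaded_variants).empty?
  end
end

class DraftEdition
  attr_reader :errors

  def initialize(files)
    @files = files
    @errors = []
  end

  # Files whose background uploads have not all finished yet.
  def processing_files
    @files.reject(&:all_uploaded?)
  end

  # Returns true when publishing may proceed; otherwise records one
  # validation error per unfinished file and returns false.
  def publish
    @errors = processing_files.map { |file| "#{file.filename} is still being processed" }
    @errors.empty?
  end
end
```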
#### *Publishers* need to know when file or image attachments are being processed so that they know to wait before expecting the new image to be live

> Whitehall UI shows a "PROCESSING" label next to images and attachments that have not yet finished uploading to Asset Manager.
#### *Members of the public* need to be able to see the latest images on the document's page as soon as the attachments finish uploading in the background

> Whitehall will republish content and any dependent content when file processing finishes in the background.
> For example, when changing a **DefaultNewsImage** for an **Organisation**, once all image variants have finished uploading, the **Organisation** is
> republished and any **News Articles** that would use the organisation's image as a lead image are republished as well.
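
One way to picture the "republish once everything has uploaded" rule is a tracker that counts down the outstanding variants and republishes the content and its dependents when the last one arrives. `VariantUploadTracker` and its `republisher` callback are assumptions for the sake of the example; in Whitehall this is handled by background workers and the models themselves.

```ruby
class VariantUploadTracker
  # expected_variants: names of the variants still to be uploaded
  # dependents: the content to republish (e.g. the organisation and its news articles)
  # republisher: any callable that republishes one piece of content
  def initialize(expected_variants, dependents:, republisher:)
    @pending = expected_variants.dup
    @dependents = dependents
    @republisher = republisher
  end

  # Called by a (hypothetical) background job after each variant upload.
  # Returns true only when this call completed the set and triggered republishing.
  def variant_uploaded(name)
    return false unless @pending.delete(name) # unknown or already counted
    return false unless @pending.empty?       # other variants still uploading

    @dependents.each { |content| @republisher.call(content) }
    true
  end
end
```

Ignoring names that were already counted keeps a retried job from triggering a second round of republishing.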
Binary file added docs/diagrams/asset_attachment_upload_flow.png
58 changes: 58 additions & 0 deletions docs/diagrams/asset_attachment_upload_flow.puml
@@ -0,0 +1,58 @@
@startuml asset-attachment-upload-flow

title The sequence of uploading a file attachment in Whitehall (AttachmentData)
skinparam BoxPadding 20
skinparam ParticipantPadding 10

box "Whitehall" #f2fef8
actor publisher
participant Controller
participant AttachmentData
participant "AssetManager\nStorage" as AssetManagerStorage
participant "DraftEdition\nUpdater" as DraftEditionUpdater
end box

box Whitehall worker (Sidekiq)
participant "AssetManager\nCreateAssetWorker" as AssetManagerCreateAssetWorker
participant Asset
participant "PublishingApi\nDraftUpdateWorker" as PublishingApiDraftUpdateWorker
participant "DraftEdition\nUpdater" as bgDraftEditionUpdater
participant "AssetManager\nAttachmentMetadataWorker" as AssetManagerAttachmentMetadataWorker
end box

box "Asset Manager" #f2f8fe
participant AssetManager
end box

box "Publishing API" #f2f8fe
participant PublishingApi
end box

publisher -> Controller++ : upload an attachment
Controller -> AttachmentData : Save
AttachmentData -> AssetManagerStorage : Carrierwave invokes store!
AssetManagerStorage --> AssetManagerCreateAssetWorker : Enqueue
Controller -> DraftEditionUpdater : perform
DraftEditionUpdater -> PublishingApi : Update links, content and republish if necessary
DraftEditionUpdater --> AssetManagerAttachmentMetadataWorker: Enqueue
Controller -> publisher-- : Show attachments

activate AssetManagerCreateAssetWorker
AssetManagerCreateAssetWorker -> AssetManager : Upload the file to Asset Manager
AssetManagerCreateAssetWorker -> Asset: Save
AssetManagerCreateAssetWorker --> PublishingApiDraftUpdateWorker : Enqueue
deactivate AssetManagerCreateAssetWorker

activate PublishingApiDraftUpdateWorker
PublishingApiDraftUpdateWorker -> bgDraftEditionUpdater
bgDraftEditionUpdater -> PublishingApi : Update links, content and republish if necessary
bgDraftEditionUpdater --> AssetManagerAttachmentMetadataWorker: Enqueue
deactivate PublishingApiDraftUpdateWorker

group Happens 2 times because of DraftEditionUpdater
activate AssetManagerAttachmentMetadataWorker
AssetManagerAttachmentMetadataWorker -> AssetManager: Update metadata of the Asset\n:access_limited\n:draft\n:parent_document_url\n:redirect_url
deactivate AssetManagerAttachmentMetadataWorker
end group

@enduml
48 changes: 48 additions & 0 deletions docs/diagrams/asset_featured_image_upload_flow.puml
@@ -0,0 +1,48 @@
@startuml asset-featured-image-upload-flow

title The sequence of uploading an image to content types that are immediately republished (FeaturedImageData)
skinparam BoxPadding 20
skinparam ParticipantPadding 10

box "Whitehall" #f2fef8
actor publisher
participant Controller
participant Person
participant FeaturedImageData
participant "AssetManager\nStorage" as AssetManagerStorage
end box

box Whitehall-background-jobs
participant "AssetManager\nCreateAssetWorker" as AssetManagerCreateAssetWorker
participant Asset as bgAsset
participant FeaturedImageData as bgFeaturedImageData
participant Person as bgPerson
end box

box "Asset Manager" #f2f8fe
participant AssetManager
end box

box "Publishing API" #f2f8fe
participant PublishingApi
end box

publisher -> Controller++ : Upload a new image
Controller -> Person : Save
Person -> PublishingApi : Update content, patch links and republish
Person -> FeaturedImageData : Save
FeaturedImageData -> AssetManagerStorage : Carrierwave invokes store!
AssetManagerStorage --> AssetManagerCreateAssetWorker : Enqueue 7 times
Controller -> publisher-- : Show person

group 7 times, once per image variant
activate AssetManagerCreateAssetWorker
AssetManagerCreateAssetWorker -> AssetManager : Upload the image version to Asset Manager
AssetManagerCreateAssetWorker -> bgAsset: Save
AssetManagerCreateAssetWorker -> bgFeaturedImageData : Republish
bgFeaturedImageData -> bgPerson : Republish
bgPerson -> PublishingApi-- : Update content, patch links and republish
end group

@enduml
Binary file added docs/diagrams/asset_image_upload_flow.png
42 changes: 42 additions & 0 deletions docs/diagrams/asset_image_upload_flow.puml
@@ -0,0 +1,42 @@
@startuml asset-image-upload-flow

title The sequence of uploading an image in Whitehall to editions (ImageData)
skinparam BoxPadding 20
skinparam ParticipantPadding 10

box "Whitehall" #f2fef8
actor publisher
participant Controller
participant ImageData
participant "AssetManager\nStorage" as AssetManagerStorage
end box

box Whitehall-background-jobs
participant "AssetManager\nCreateAssetWorker" as AssetManagerCreateAssetWorker
participant Asset
end box

box "Asset Manager" #f2f8fe
participant AssetManager
end box

box "Publishing API" #f2f8fe
participant PublishingApi
end box

publisher -> Controller++ : upload an image
Controller -> ImageData : Save
ImageData -> AssetManagerStorage : Carrierwave invokes store!
AssetManagerStorage --> AssetManagerCreateAssetWorker++ : Enqueue 7 times
Controller -> publisher-- : Show images

group 7 times, once per image variant
AssetManagerCreateAssetWorker -> AssetManager : Upload the image variant to Asset Manager
AssetManagerCreateAssetWorker -> Asset--: Save
end group

publisher -> Controller++ : publish edition
Controller -> ImageData : Validate that images\nare ready to be published
Controller -> PublishingApi : update content, patch links and publish

@enduml
Binary file added docs/diagrams/asset_models.png
55 changes: 55 additions & 0 deletions docs/diagrams/asset_models.puml
@@ -0,0 +1,55 @@
@startuml asset-models
allowmixing
hide empty members

object ImmediatelyPublishedDocuments {
Organisation
WorldwideOrganisation
Person
TopicalEvent
PolicyGroup
...
}
note right: Some content types are published\nimmediately as the user saves the changes.

object EditionableDocuments {
NewsArticle
Speech
Consultation
DetailedGuide
...
}
note right: Some content types are editionable\nThese documents have a\nseparate workflow for publishing the content.

class FeaturedImageData
class Asset
note right: References the\nAsset in Asset Manager

Attachment -up-> EditionableDocuments
Attachment -up-> ImmediatelyPublishedDocuments
AttachmentData -up-> Attachment
Asset -up-> AttachmentData

Image -up-> EditionableDocuments
ImageData -up-> Image
Asset -up-> ImageData

FeaturedImageData -up-> ImmediatelyPublishedDocuments
Asset -up-> FeaturedImageData

CallForEvidenceResponseForm -up-> EditionableDocuments
CallForEvidenceResponseFormData -up-> CallForEvidenceResponseForm
Asset -up-> CallForEvidenceResponseFormData

ConsultationResponseForm -up-> EditionableDocuments
ConsultationResponseFormData -up-> ConsultationResponseForm
Asset -up-> ConsultationResponseFormData

TopicalEventFeaturing -up-> ImmediatelyPublishedDocuments
TopicalEventFeaturingImageData -up-> TopicalEventFeaturing
Asset -up-> TopicalEventFeaturingImageData

PromotionalFeature -up-> ImmediatelyPublishedDocuments
PromotionalFeatureItem -up-> PromotionalFeature
Asset -up-> PromotionalFeatureItem
@enduml
Binary file added docs/diagrams/asset_usage.png
29 changes: 29 additions & 0 deletions docs/diagrams/asset_usage.puml
@@ -0,0 +1,29 @@
@startuml
title Example flow of how an Organisation's image becomes available to the end user


frame "Publishing" {
actor "Publisher"
database MySQL
database "MongoDB"
circle "Asset Manager S3"

"Publisher" --> [Whitehall]: 1. Upload a new\nimage for an\nOrganisation
[Whitehall] -u-> [Asset Manager]: 2. Upload the\nimage
[Whitehall] -> "MySQL": 3. Store Asset\nreference
[Asset Manager] -> [Asset Manager]: Run Virus Scanner
[Asset Manager] --> "Asset Manager S3": Store file
[Whitehall] --> [Publishing Api]: 4. Publish\nOrganisation\ncontent
[Publishing Api] -l-> [Content Store]: 5. Store\nOrganisation\ncontent
[Content Store] -d-> "MongoDB": 6. Store\nOrganisation\ncontent
}

frame "Rendering" {
actor User
"User" -u-> [Collections]: 7. Request\nOrganisations\npage
[Collections] --> [Content Store]: 8. Fetch latest\nOrganisation\nContent
[Collections] --> "User": 9. Render page\nwith a url to\nthe image in\nAsset Manager
"User" --> [Asset Manager]: 10. Request Image
}

@enduml
