Skip to content

CP-13299 Fix lazy preview caching#64

Merged
spaulsandhu merged 1 commit into
mainfrom
CP-13299
May 11, 2026
Merged

CP-13299 Fix lazy preview caching#64
spaulsandhu merged 1 commit into
mainfrom
CP-13299

Conversation

@spaulsandhu
Copy link
Copy Markdown
Member

Overview

Fixes a bug in production where any page after 15 was rendering intermittently.

Pages beyond MAX_NUMBER_OF_PAGES_PROCESSED (15) are rendered on-demand
by PreviewDocumentPageController#show. Before regenerating, the
controller looks for an existing cached blob:

preview_image = attachment.preview_images.joins(:blob)
                          .find_by(blob: { filename: ["#{params[:id]}.png", "#{params[:id]}.jpg"] })

But Templates::ProcessDocument.generate_pdf_preview_from_file was
saving the blob as "#{page_number}.jpeg"  .jpeg  .jpg, so the
cache lookup never matched. Every request to /preview/<token>/<n>.jpg
re-downloaded the source PDF, re-rendered the page through Pdfium, and
created a duplicate ActiveStorage::Attachment.

Impact in production

- Editor shows broken images for pages 16+ when Pdfium fails under repeated load.
- Duplicate ActiveStorage::Attachment records pile up (no dedup on create!).
- Wasted S3 upload + Pdfium CPU on every editor render of pages 16+.

###  Change

One-line fix: save lazy-generated previews as .jpg so the
controller's cache lookup matches and redirects to the existing blob
on subsequent requests.

### Test

Added spec/lib/templates/process_document_spec.rb running the exact
controller-side find_by against a freshly generated preview. Locks
in the filename contract so a future regression to .jpeg (or any
other non-matching extension) fails immediately.

Manually verified the failure mode by reverting the fix locally  spec
fails with got: nil from the cache lookup.

Two notes if you want to tweak before pushing:
- The triple-backtick block in the description renders nicely on GitHub but you may want to drop it if your PR template is bare text.
- This fix doesn't address the other two issues we discussed (nil-blob 500s, partial-tempfile corruption). Worth a one-liner at the end of the description like *"Follow-ups (separate tickets): nil-blob handling in controller, tempfile completeness check"* — so reviewers don't think those are out of scope by omission.

## Summary

  Pages beyond `MAX_NUMBER_OF_PAGES_PROCESSED` (15) are rendered on-demand
  by `PreviewDocumentPageController#show`. Before regenerating, the
  controller looks for an existing cached blob:

  ```ruby
  preview_image = attachment.preview_images.joins(:blob)
                            .find_by(blob: { filename: ["#{params[:id]}.png", "#{params[:id]}.jpg"] })

  But Templates::ProcessDocument.generate_pdf_preview_from_file was
  saving the blob as "#{page_number}.jpeg" — .jpeg ≠ .jpg, so the
  cache lookup never matched. Every request to /preview/<token>/<n>.jpg
  re-downloaded the source PDF, re-rendered the page through Pdfium, and
  created a duplicate ActiveStorage::Attachment.

  Impact in production

  - Editor shows broken images for pages 16+ when Pdfium fails under
  repeated load.
  - Duplicate ActiveStorage::Attachment records pile up (no dedup on
  create!).
  - Wasted S3 upload + Pdfium CPU on every editor render of pages 16+.

  Change

  One-line fix: save lazy-generated previews as .jpg so the
  controller's cache lookup matches and redirects to the existing blob
  on subsequent requests.

  Test

  Added spec/lib/templates/process_document_spec.rb running the exact
  controller-side find_by against a freshly generated preview. Locks
  in the filename contract so a future regression to .jpeg (or any
  other non-matching extension) fails immediately.

  Manually verified the failure mode by reverting the fix locally — spec
  fails with got: nil from the cache lookup.

  Two notes if you want to tweak before pushing:
  - The triple-backtick block in the description renders nicely on GitHub but you may want to drop it if your PR template is bare text.
  - This fix doesn't address the other two issues we discussed (nil-blob 500s, partial-tempfile corruption). Worth a one-liner at the end of the
  description like *"Follow-ups (separate tickets): nil-blob handling in controller, tempfile completeness check"* — so reviewers don't think those
  are out of scope by omission.
def generate_pdf_preview_from_file(attachment, file_path, page_number)
doc = Pdfium::Document.open_file(file_path)

blob = build_and_upload_blob(doc, page_number, '.jpeg')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Non-blocking] Do we need to worry about any existing .jpeg blobs in production?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great call out, probably not since this still still in early labs, but I'll double check!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't since nothing will break, we'll have a small amount of duplicate .jpeg files sitting around in prod but since the number of people using this product is so low I don't mind letting those orphaned dupe images sit in S3, the number is probably insignificant and not worth the clean-up.

Great question, and thanks for the sanity check!

Copy link
Copy Markdown

@jewls618 jewls618 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 just left one non-blocking comment/concern.

@spaulsandhu spaulsandhu merged commit c848888 into main May 11, 2026
5 checks passed
@spaulsandhu spaulsandhu deleted the CP-13299 branch May 11, 2026 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants