Introduce print preview extraction #22612

Merged
merged 15 commits into from Apr 12, 2024

@darkdh (Member) commented Mar 14, 2024

Resolves brave/brave-browser#36649

In a nutshell:
This PR makes AIChatPanel a printing::mojom::PrintPreviewUI so that it can initiate print preview. We then compose the print preview result into a PDF, convert each page of the PDF into an image, run OCR on each image, and finally concatenate the OCR results. This feature will initially be rolled out only on docs.google.com; we will introduce it as a general fallback in a future PR.

Since two PrintPreviewUIs now share printing::mojom::PrintRenderFrame, we have to deal with:

  • Conflicts between PrintPreviewUI IDs and print preview request IDs.
  • PrintRenderFrame being double-bound if AIChatUI performs print preview extraction while the print dialog is open.
  • The print dialog showing up when AIChatUI initiates print preview extraction, because PrintManagerHost::RequestPrintPreview calls PrintPreviewDialogController::PrintPreview.

Unlike the print dialog, AIChatPanel disconnects PrintRenderFrame once print preview is done or has failed.
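
One way to avoid the double-binding problem, as a minimal sketch under stated assumptions: `SharedFrameEndpoint` and `PreviewExtraction` are hypothetical stand-ins (not Chromium or PR classes) for the PrintRenderFrame remote and the AIChatPanel side of the flow. The extraction session binds only when nobody else (for example an open print dialog) holds the endpoint, and when the preview is done or has failed it disconnects only a binding it created itself.

```cpp
#include <cassert>

// Hypothetical stand-in for a mojo endpoint that must never be bound twice.
class SharedFrameEndpoint {
 public:
  bool is_bound() const { return bound_; }
  void Bind() {
    assert(!bound_ && "endpoint is already bound");
    bound_ = true;
  }
  void Reset() { bound_ = false; }

 private:
  bool bound_ = false;
};

// Hypothetical extraction session: binds only when the endpoint is free and,
// once the preview is done or has failed, releases only a binding it created,
// so a binding owned by an open print dialog is left untouched.
class PreviewExtraction {
 public:
  explicit PreviewExtraction(SharedFrameEndpoint& endpoint)
      : endpoint_(endpoint) {}
  PreviewExtraction(const PreviewExtraction&) = delete;
  PreviewExtraction& operator=(const PreviewExtraction&) = delete;
  ~PreviewExtraction() { OnDoneOrFailed(); }

  void Start() {
    if (!endpoint_.is_bound()) {
      endpoint_.Bind();
      bound_by_us_ = true;
    }
  }

  void OnDoneOrFailed() {
    if (bound_by_us_) {
      endpoint_.Reset();
      bound_by_us_ = false;
    }
  }

  bool bound_by_us() const { return bound_by_us_; }

 private:
  SharedFrameEndpoint& endpoint_;
  bool bound_by_us_ = false;
};
```

Tracking who created the binding is what lets the session disconnect on completion without tearing down a connection the print dialog still relies on.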
We also introduce a new printing service (PdfToBitmapConverter) that converts the selected PDF page into an SkBitmap. Since large contexts will be truncated anyway, we short-circuit the per-page OCR process once the context limit or the page limit is reached.
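
The short-circuit can be sketched as a page-at-a-time loop. This is a minimal sketch, not the PR's code: `ExtractPreviewText` is a hypothetical helper, and `process_page(i)` is an assumed callback standing in for "render page i to a bitmap, run OCR on it, return the text".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical sketch: pages are handled strictly one at a time, and the loop
// stops once either limit is hit, so later pages are never rendered or
// recognized.
std::string ExtractPreviewText(
    std::size_t page_count,
    std::size_t max_pages,
    std::size_t max_content_length,
    const std::function<std::string(std::size_t)>& process_page) {
  std::string text;
  for (std::size_t i = 0; i < page_count; ++i) {
    text += process_page(i);
    // Stop after the last allowed page, or once enough text has been
    // collected to fill the context (anything beyond it would be truncated).
    if (i + 1 >= max_pages || text.length() >= max_content_length) {
      break;
    }
  }
  return text;
}
```

With a page limit of 20, a 25-page document invokes `process_page` only for pages 0 through 19.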

Submitter Checklist:

  • I confirm that no security/privacy review is needed and no other type of reviews are needed, or that I have requested them
  • There is a ticket for my issue
  • Used Github auto-closing keywords in the PR description above
  • Wrote a good PR/commit description
  • Squashed any review feedback or "fixup" commits before merge, so that history is a record of what happened in the repo, not your PR
  • Added appropriate labels (QA/Yes or QA/No; release-notes/include or release-notes/exclude; OS/...) to the associated issue
  • Checked the PR locally:
    • npm run test -- brave_browser_tests, npm run test -- brave_unit_tests wiki
    • npm run presubmit wiki, npm run gn_check, npm run tslint
  • Ran git rebase master (if needed)

Reviewer Checklist:

  • A security review is not needed, or a link to one is included in the PR description
  • New files have MPL-2.0 license header
  • Adequate test coverage exists to prevent regressions
  • Major classes, functions and non-trivial code blocks are well-commented
  • Changes in component dependencies are properly reflected in gn
  • Code follows the style guide
  • Test plan is specified in PR before merging

After-merge Checklist:

Test Plan: (Windows and macOS only)

Regression test on previous google doc support

  1. Open a Google Doc with only one page of content and summarize it
  2. The summary should be relevant

Test on full page google doc support

  1. Open a Google Doc with multiple pages of content and summarize it
  2. The summary should be relevant

Test on page limit (20)

  1. Open a Google Doc with 19 blank pages followed by 2 pages with completely different content
  2. Summarize the page
  3. The summary should be relevant only to the 20th page, not the 21st

Compatibility with print dialog

  1. Open a Google Doc and print it
  2. While the print dialog is open, open the Leo panel and summarize the page
  3. The summary should show up
  4. Close the print dialog and trigger print again
  5. The print dialog should still show up with the print preview available

@darkdh darkdh self-assigned this Mar 14, 2024
@darkdh darkdh marked this pull request as ready for review March 14, 2024 23:51
@darkdh darkdh requested review from a team as code owners March 14, 2024 23:51
browser/ui/webui/ai_chat/ai_chat_ui_page_handler.cc (outdated)
pdf_to_bitmap_converter_.BindNewPipeAndPassReceiver());
pdf_to_bitmap_converter_.set_disconnect_handler(
base::BindOnce(&AIChatUIPageHandler::BitmapConverterDisconnected,
base::Unretained(this)));
Contributor

reported by reviewdog 🐶
[semgrep] base::Unretained is usually unnecessary, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:

- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated


Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml


Cc @thypon @goodov @iefremov

Member Author

We own the mojo remote, so its callbacks cannot outlive this handler.
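
The reply relies on an ownership argument: the `mojo::Remote` is a member of the page handler, so its disconnect handler and any pending reply callbacks are destroyed together with the handler and never run on a dangling `this`. A plain C++ analogue, assuming a hypothetical `FakeRemote` in place of `mojo::Remote` (the real class is not used here):

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for mojo::Remote: it owns the disconnect handler and
// the pending reply callback, and simply drops them (without running them)
// when it is destroyed.
class FakeRemote {
 public:
  void set_disconnect_handler(std::function<void()> handler) {
    on_disconnect_ = std::move(handler);
  }
  void Call(std::function<void(int)> reply) { pending_reply_ = std::move(reply); }

  // Simulate the service replying, or the pipe being closed.
  void DeliverReply(int value) {
    if (!pending_reply_) return;
    auto reply = std::move(pending_reply_);
    pending_reply_ = nullptr;
    reply(value);
  }
  void SimulateDisconnect() {
    if (on_disconnect_) on_disconnect_();
  }

 private:
  std::function<void()> on_disconnect_;
  std::function<void(int)> pending_reply_;
};

// The remote is a member, so every callback it holds is destroyed together
// with the handler; capturing a raw `this` (the analogue of base::Unretained)
// therefore cannot dangle.
class PageHandler {
 public:
  PageHandler() {
    remote_.set_disconnect_handler([this] { disconnected_ = true; });
    remote_.Call([this](int value) { last_reply_ = value; });
  }
  PageHandler(const PageHandler&) = delete;
  PageHandler& operator=(const PageHandler&) = delete;

  FakeRemote& remote() { return remote_; }
  int last_reply() const { return last_reply_; }
  bool disconnected() const { return disconnected_; }

 private:
  int last_reply_ = 0;
  bool disconnected_ = false;
  FakeRemote remote_;  // Declared last, so destroyed first.
};
```

If a callback could outlive the object (for example, one posted to another sequence), a weak pointer (base::WeakPtrFactory in Chromium) would be the safer choice, as the semgrep rule suggests.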

pdf_to_bitmap_converter_->GetBitmap(
std::move(pdf_region.region),
base::BindOnce(&AIChatUIPageHandler::OnGetBitmaps,
base::Unretained(this)));
Contributor

reported by reviewdog 🐶
[semgrep] base::Unretained is usually unnecessary, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:

- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated


Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml


Cc @thypon @goodov @iefremov

Member Author

We own the mojo remote, so its callbacks cannot outlive this handler.

chromium_src/chrome/browser/printing/print_view_manager.h (outdated)
build/commands/lib/util.js (outdated)
const SkImageInfo info =
SkImageInfo::Make(size.width(), size.height(), kBGRA_8888_SkColorType,
kOpaque_SkAlphaType);
if (!bitmap.tryAllocPixels(info, info.minRowBytes())) {
Member

What's the actual memory footprint of these bitmaps? A single A4 page at 300 dpi might take ~30 MB in 8-bit RGBA. I'm afraid this will eat a lot of RAM as it stands; just think of 100+ page documents.

Please consider a queued conversion here; you're calling the text-recognition API on a per-page basis anyway.
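
The estimate above can be checked with simple arithmetic. A minimal sketch, assuming an A4 page (210 x 297 mm) and 4 bytes per pixel; `PageBitmapBytes` is a hypothetical helper. At 300 dpi a page is 2481 x 3508 pixels, about 34.8 MB (33.2 MiB), so holding every page of a 100-page document at once would need roughly 3.5 GB, whereas converting and recognizing one page at a time keeps the peak near a single page.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr double kMillimetersPerInch = 25.4;
constexpr std::uint64_t kBytesPerPixel = 4;  // 8 bits per channel, 4 channels

// Pixels needed to cover `mm` millimeters at `dpi`, rounded up.
std::uint64_t MillimetersToPixels(double mm, int dpi) {
  return static_cast<std::uint64_t>(std::ceil(mm / kMillimetersPerInch * dpi));
}

// Bytes for one uncompressed page bitmap.
std::uint64_t PageBitmapBytes(double width_mm, double height_mm, int dpi) {
  return MillimetersToPixels(width_mm, dpi) *
         MillimetersToPixels(height_mm, dpi) * kBytesPerPixel;
}
```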

Member Author

Ah yes, good point. We currently impose a page limit of 20 on the text-recognition process, so generating bitmaps beyond that limit would waste resources. I will add a max_pages parameter to the API and call it with the same limit.
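
A sketch of the capping described here, assuming a hypothetical helper `PagesToRender` (the PR's actual mojom signature is not shown in this thread): the converter renders at most min(page_count, max_pages) pages, so it never allocates a bitmap that the 20-page OCR limit would discard.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: indices of the pages the converter should render when
// the caller will only OCR the first `max_pages` pages.
std::vector<std::size_t> PagesToRender(std::size_t page_count,
                                       std::size_t max_pages) {
  const std::size_t n = std::min(page_count, max_pages);
  std::vector<std::size_t> pages;
  pages.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    pages.push_back(i);
  }
  return pages;
}
```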

Member Author

Addressed in cc93d16

Member Author

Recorded a new trace with the legacy UI.
[Screenshot 2024-03-19 at 17 29 41]

Member Author

@darkdh commented Apr 2, 2024

At 300 dpi, the dimensions for rendering a 560px x 795px image are 1.87in x 2.65in, and the size passed into chrome_pdf::RenderPDFPageToBitmap, gfx::Size size = gfx::ToCeiledSize(*page_size); (420pt x 596pt = 5.83in x 8.28in with CSS standard 96 dpi), should be large enough to contain the image.

Member

Okay, I think I've figured out what's going on and why things seem to be working fine, though it took me a while to connect the dots.

  1. GetPDFPageSizeByIndex returns the size in points, NOT pixels. Each point is 1/72 of an inch (see chrome_pdf::CalculatePosition and printing::kPointsPerInch).
  2. You allocate the bitmap using points you get from this call, NOT pixels.
  3. You call chrome_pdf::RenderPDFPageToBitmap with the bitmap you allocated and a requested dpi of 300, but the renderer can't really fit the 300 dpi image into the bitmap you passed.
  4. The renderer recalculates the width/height and renders the image at the point-to-pixel resolution (72 dpi, one pixel per point), ignoring the 300 dpi you pass.

So after these manipulations you get a 72 dpi image that happens to pass OCR. Will it work for documents with a smaller font size? Did you really want to run OCR on 72 dpi images?

Either way, please add comments explaining what's actually going on, and consider increasing the dpi to cover documents with smaller fonts.
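
The point above reduces to a unit conversion. A minimal sketch, assuming page sizes in PDF points (1/72 inch) as described in the list; `PageSizeInPixels` and `BitmapBytes` are hypothetical helpers, not chrome_pdf functions. Allocating the bitmap from the raw point size is equivalent to rendering at 72 dpi; at 300 dpi the 420pt x 596pt page mentioned earlier needs 1750 x 2484 pixels, which at 4 bytes per pixel is 17,388,000 bytes (about 16.58 MiB), consistent with the per-page figure reported later in the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// PDF page sizes are expressed in points; one point is 1/72 inch.
constexpr double kPointsPerInch = 72.0;

struct PixelSize {
  int width;
  int height;
};

// Pixel dimensions needed to render a page of `width_pt` x `height_pt` points
// at `dpi`, rounded up. Multiplying before dividing keeps whole-number results
// exact (e.g. 420 * 300 / 72 == 1750).
PixelSize PageSizeInPixels(double width_pt, double height_pt, int dpi) {
  return {static_cast<int>(std::ceil(width_pt * dpi / kPointsPerInch)),
          static_cast<int>(std::ceil(height_pt * dpi / kPointsPerInch))};
}

// Bytes for a 4-bytes-per-pixel bitmap of the given size.
std::uint64_t BitmapBytes(PixelSize size) {
  return static_cast<std::uint64_t>(size.width) *
         static_cast<std::uint64_t>(size.height) * 4u;
}
```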

Member Author

Yep, we definitely need to upscale to 300 dpi: when I changed the font size from 11 to 5, the OCR result was totally wrong.

Member Author

Thanks for the summary; the bitmap is upscaled to 300 dpi in f942df0. The bitmap for each page is now 16.58MB, which is also the maximum bitmap allocation in the utility process, since we do conversion and OCR on a per-page basis.

Member

awesome!


void AIChatUIPageHandler::OnGetBitmaps(
const std::optional<std::vector<SkBitmap>>& bitmaps) {
VLOG(3) << __func__ << ": bitmap size: " << (bitmaps ? bitmaps->size() : -1);
Collaborator

Do you manipulate the bitmaps on the UI thread?
If the bitmaps are huge, even simple operations could cause a short UI hang.

Member Author

Do you mean huge vectors in general, or a huge vector of SkBitmaps specifically?
I can remove this VLOG, though.

Member Author

Also, OCR on these bitmaps runs on a different thread, in PreviewPageTextExtractor.
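
One way to keep large bitmaps off the UI thread, sketched in standard C++ under stated assumptions: `PageBitmap` and `RecognizeText` are hypothetical placeholders, and `std::async` stands in for Chromium task posting (in Chromium code, something like `base::ThreadPool::PostTaskAndReplyWithResult` would be the usual tool). The bitmap is moved into the worker task, so the calling thread neither copies pixel data nor runs OCR.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical page bitmap: dimensions plus raw BGRA pixel bytes.
struct PageBitmap {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> pixels;
};

// Hypothetical placeholder for a real text-recognition call.
std::string RecognizeText(const PageBitmap& bitmap) {
  return "page " + std::to_string(bitmap.width) + "x" +
         std::to_string(bitmap.height);
}

// Moves the bitmap into a task on another thread; the calling thread neither
// copies the pixel buffer nor runs OCR, and collects the text from the future.
std::future<std::string> RecognizeOffThread(PageBitmap bitmap) {
  return std::async(std::launch::async,
                    [page = std::move(bitmap)] { return RecognizeText(page); });
}
```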

Member Author

With 4166f2a, this function no longer exists.

@darkdh darkdh force-pushed the preview-extraction branch 3 times, most recently from 6625979 to 3434c0f March 18, 2024 21:30
@darkdh darkdh requested a review from goodov March 19, 2024 20:19
@darkdh darkdh force-pushed the preview-extraction branch 3 times, most recently from 42bb043 to 5eb9df9 April 11, 2024 22:46
@darkdh darkdh requested a review from goodov April 11, 2024 23:19
@@ -1,4 +1,7 @@
include_rules = [
"+brave/services/printing/public/mojom",
Member

Nit: this should be added to webui/ai_chat/DEPS.

Member Author

fixed and squashed.

@@ -85,6 +84,10 @@ AIChatUIPageHandler::AIChatUIPageHandler(

favicon_service_ = FaviconServiceFactory::GetForProfile(
profile_, ServiceAccessType::EXPLICIT_ACCESS);
#if BUILDFLAG(ENABLE_PRINT_PREVIEW)
print_preview_extractor_ = std::make_unique<PrintPreviewExtractor>(
Member

This should only be created when it is actually used.
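
A sketch of the lazy creation requested here, assuming hypothetical `Handler` and `Extractor` types in place of AIChatUIPageHandler and PrintPreviewExtractor: the `std::unique_ptr` stays empty until the first extraction request, so the object is never constructed for pages that do not need it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the print preview extractor; counts constructions.
class Extractor {
 public:
  Extractor() { ++instances; }
  static inline int instances = 0;
};

// Hypothetical page handler that creates the extractor on first use rather
// than unconditionally in its constructor.
class Handler {
 public:
  Extractor& GetOrCreateExtractor() {
    if (!extractor_) {
      extractor_ = std::make_unique<Extractor>();
    }
    return *extractor_;
  }
  bool has_extractor() const { return extractor_ != nullptr; }

 private:
  std::unique_ptr<Extractor> extractor_;
};
```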

Member Author

fixed and squashed.

// Stop processing if we have reached the maximum number of pages or the
// maximum length of the content
if (current_page_index_ + 1 >= kMaxPreviewPages ||
preview_text_.str().length() >= max_page_content_length_) {
Member

preview_text_.str() always allocates a new string; please don't do that.

Do you really need std::stringstream? std::string and base::StrAppend should work just fine.
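
The suggestion in sketch form, assuming a hypothetical `PreviewTextAccumulator` class: `std::string::length()` is a constant-time read, while `std::stringstream::str()` returns a new copy of the whole buffer on every call. In Chromium code, `base::StrAppend` could replace the plain `append` used here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical accumulator: appending to a std::string and checking length()
// costs no extra allocation per check, unlike calling stringstream::str().
class PreviewTextAccumulator {
 public:
  PreviewTextAccumulator(std::size_t max_pages, std::size_t max_length)
      : max_pages_(max_pages), max_length_(max_length) {}

  // Appends one page of OCR text; returns true if processing should stop.
  bool AppendPage(std::string_view page_text) {
    text_.append(page_text);
    ++pages_;
    return pages_ >= max_pages_ || text_.length() >= max_length_;
  }

  const std::string& text() const { return text_; }

 private:
  std::size_t max_pages_;
  std::size_t max_length_;
  std::size_t pages_ = 0;
  std::string text_;
};
```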

Member Author

fixed and squashed.

bool IsPrintPreviewUIBound() const;
void SetPreviewUIId();
void ClearPreviewUIId();
void OnPrintPreviewRequest(int request_id);
Member

most of these methods should be private.

Member Author

fixed and squashed.

Contributor

[puLL-Merge] - brave/brave-core@22612

Description

This PR adds print preview support to the AI Chat feature in Brave. It allows extracting text from PDF documents using OCR in the print preview flow. The main motivation is to enable AI Chat to provide assistance on document-like websites such as Google Docs.

Changes

  • browser/ai_chat/BUILD.gn, browser/ai_chat/ai_chat_ui_browsertest.cc: Added new browser tests for AI Chat print preview functionality.
  • browser/ai_chat/page_content_fetcher_browsertest.cc: Removed print preview related tests as they were moved to ai_chat_ui_browsertest.cc.
  • browser/ui/BUILD.gn, browser/ui/webui/ai_chat/ai_chat_ui_page_handler.cc, browser/ui/webui/ai_chat/ai_chat_ui_page_handler.h, browser/ui/webui/ai_chat/print_preview_extractor.cc, browser/ui/webui/ai_chat/print_preview_extractor.h: Implemented the print preview extractor which creates print previews, converts PDF pages to bitmaps, and extracts text using OCR.
  • Several patches to hook into Chromium's print preview flow and expose necessary interfaces.
  • components/ai_chat/content/browser/ai_chat_tab_helper.cc, components/ai_chat/content/browser/ai_chat_tab_helper.h, components/ai_chat/content/browser/page_content_fetcher.cc: Added logic to trigger print preview based text extraction for certain document hosts.
  • components/ai_chat/core/browser/conversation_driver.cc, components/ai_chat/core/browser/conversation_driver.h: Added OnPrintPreviewRequested() observer method.
  • components/ai_chat/core/browser/constants.cc, components/ai_chat/core/browser/constants.h, components/ai_chat/core/browser/utils.cc, components/ai_chat/core/browser/utils.h: Moved some common constants and OCR utility functions.
  • services/printing/*: Added mojo interfaces and implementation for a PDF to bitmap converter service.
  • test/data/leo/*: Added test HTML files for print preview testing.

Security Considerations

  • The print preview extractor runs in the browser process and converts potentially untrusted web page content to PDF. Need to ensure the PDF library handles untrusted input securely. Low risk as Chromium's print preview does this conversion already.
  • OCR is performed on PDF page bitmaps. The OCR library needs to handle arbitrary image input securely. Low-medium risk depending on robustness of OCR implementation.
  • There are several new IPC interfaces added (e.g. PdfToBitmapConverter). Need to validate that the IPC bindings are secure and cannot be abused by the renderer. Low risk if using standard mojo binding security practices.

Let me know if you have any other questions! The PR looks good overall with some important new functionality. The main areas to double-check are around the new mojo IPC interfaces and handling of untrusted PDFs and images in the print preview extractor.

@darkdh darkdh merged commit 9d4cfea into master Apr 12, 2024
19 checks passed
@darkdh darkdh deleted the preview-extraction branch April 12, 2024 20:33
@github-actions github-actions bot added this to the 1.67.x - Nightly milestone Apr 12, 2024
7 participants