AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations #22334

Merged
petemill merged 8 commits into master from ai-chat-subresource-throttle on Mar 21, 2024

Member

@petemill petemill commented Feb 27, 2024

Resolves brave/brave-browser#34945

  • Creates throttle to intercept a specific subresource request
  • Parses the subresource and shares transcript decision with existing JS global extraction method (which is also used for the initial page in the history)
  • Creates a new mojom interface for the renderer to notify the browser that page-changing content was detected. The renderer only sends the actual content when the browser asks for it during a user-initiated AIChat event.
  • Tests for throttle and for transcript decision making
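
The "notify now, send content only when asked" flow described above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the code in this PR: `DeferredContent`, `OnBodySniffed`, and `GetContent` are hypothetical names, and the real implementation uses mojom interfaces rather than a local class.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical holder for a sniffed subresource body. Parsing is deferred
// until the content is actually requested.
class DeferredContent {
 public:
  using Parser = std::function<std::optional<std::string>(const std::string&)>;

  explicit DeferredContent(Parser parser) : parser_(std::move(parser)) {}

  // Called when a matching subresource response has been sniffed. Only stores
  // the raw body and invalidates any earlier result; nothing is parsed here.
  void OnBodySniffed(std::string raw_body) {
    raw_body_ = std::move(raw_body);
    parsed_.reset();
    is_parsed_ = false;
  }

  // Called on a user-initiated request. Parses at most once per sniffed body.
  const std::optional<std::string>& GetContent() {
    if (!is_parsed_) {
      if (raw_body_) {
        parsed_ = parser_(*raw_body_);
        ++parse_count_;
      }
      is_parsed_ = true;
    }
    return parsed_;
  }

  // Number of times the parser has run, exposed for illustration.
  int parse_count() const { return parse_count_; }

 private:
  Parser parser_;
  std::optional<std::string> raw_body_;
  std::optional<std::string> parsed_;
  bool is_parsed_ = false;
  int parse_count_ = 0;
};
```

With this shape, several same-tab navigations in a row cost only a string move each; the parser runs once, on the most recent body, when the user actually sends a message.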

Submitter Checklist:

  • I confirm that no security/privacy review is needed and no other type of reviews are needed, or that I have requested them
  • There is a ticket for my issue
  • Used Github auto-closing keywords in the PR description above
  • Wrote a good PR/commit description
  • Squashed any review feedback or "fixup" commits before merge, so that history is a record of what happened in the repo, not your PR
  • Added appropriate labels (QA/Yes or QA/No; release-notes/include or release-notes/exclude; OS/...) to the associated issue
  • Checked the PR locally:
    • npm run test -- brave_browser_tests, npm run test -- brave_unit_tests wiki
    • npm run presubmit wiki, npm run gn_check, npm run tslint
  • Ran git rebase master (if needed)

Reviewer Checklist:

  • A security review is not needed, or a link to one is included in the PR description
  • New files have MPL-2.0 license header
  • Adequate test coverage exists to prevent regressions
  • Major classes, functions and non-trivial code blocks are well-commented
  • Changes in component dependencies are properly reflected in gn
  • Code follows the style guide
  • Test plan is specified in PR before merging

After-merge Checklist:

Test Plan:

Contains unit and browser tests.

  • Navigate to youtube.com
  • Navigate to a video (same-tab)
  • optional: ask Leo to summarize
  • Navigate to another video (same-tab)
  • ask Leo to summarize
    Observe that with this PR, the correct video content is summarised. Without it, only the video content from the initially-navigated-to YouTube page is available to Leo.

@petemill petemill self-assigned this Feb 27, 2024
@petemill petemill requested a review from a team as a code owner February 27, 2024 10:11
@petemill petemill force-pushed the ai-chat-subresource-throttle branch 3 times, most recently from 4845e53 to ca507fa Compare February 28, 2024 08:54
@petemill
Member Author

Creating security review


namespace {

constexpr uint32_t kReadBufferSize = 37000; // average subresource size
Member Author

Wasn't 100% sure on what to use for this part...

DVLOG(4) << "Not binding extractor host to non-main frame";
return;
}
auto* sender = content::WebContents::FromRenderFrameHost(rfh);
Contributor

i know you love auto but i can't infer what the type of sender is here.

Member Author

Really? It's literally in the rhs - content::WebContents::From.... That's why I've used auto here. It's unnecessary repetition in the same line.

DVLOG(1) << "Cannot bind extractor host, no valid WebContents";
return;
}
auto* tab_helper = AIChatTabHelper::FromWebContents(sender);
Contributor

oh, it's webcontents*

components/ai_chat/renderer/page_content_extractor.cc (outdated)
// "page" change.
mojo::AssociatedRemote<mojom::PageContentExtractorHost> host;
render_frame()->GetRemoteAssociatedInterfaces()->GetInterface(&host);
if (host.is_bound()) {
Contributor

why do you need to check if it's bound? GetInterface guarantees a bind, right? it's internally calling BindNewEndpointAndPassReceiver on a remote.

Member Author

removed the condition

namespace ai_chat {

class AIChatResourceSnifferURLLoader
: public body_sniffer::BodySnifferURLLoader {
Contributor

there is an upcoming refactoring of the body sniffer, hopefully the new sniffer design is more logical than the current

// Parse YT metadata json string and choose the most appropriate caption track
// url.
std::optional<std::string> ParseAndChooseCaptionTrackUrl(std::string& body);
Contributor

i think the parameter should be either const reference or string_view
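
To illustrate the suggestion, here is a minimal sketch of a read-only body parameter taken as `std::string_view`. `FindFirstBaseUrl` and the `"baseUrl"` key are assumptions made for this example; the scan is a naive substring search, not the JSON parsing the PR performs.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Naive scan for the first "baseUrl" string value in a response body.
// Illustration only: it does not handle JSON escapes and is not a JSON parser.
// Taking std::string_view lets callers pass a const std::string, a string
// literal, or a slice of a larger buffer without copying.
std::optional<std::string> FindFirstBaseUrl(std::string_view body) {
  constexpr std::string_view kKey = "\"baseUrl\":\"";
  const std::size_t key_pos = body.find(kKey);
  if (key_pos == std::string_view::npos) {
    return std::nullopt;
  }
  const std::size_t value_start = key_pos + kKey.size();
  const std::size_t value_end = body.find('"', value_start);
  if (value_end == std::string_view::npos) {
    return std::nullopt;  // Unterminated string value.
  }
  return std::string(body.substr(value_start, value_end - value_start));
}
```

A non-const `std::string&` parameter would reject both a `const std::string` and a string literal at the call site, which is the practical reason to prefer a const reference or `std::string_view` when the function only reads the body.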

// Extract a caption url from an array of YT caption tracks, from the YT page
// API.
std::optional<std::string> ChooseCaptionTrackUrl(
base::Value::List* caption_tracks);
Contributor

const ?
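
A sketch of what a const-taking track chooser could look like. Everything here is hypothetical: the `CaptionTrack` struct stands in for the `base::Value::List` entries, and the preference order (human-authored English, then any English, then the first track with a URL) is an assumption for illustration, since this conversation does not state the PR's actual selection rules.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the caption track list.
struct CaptionTrack {
  std::string language_code;  // e.g. "en"
  bool auto_generated = false;
  std::string base_url;
};

// Returns the URL of the preferred track, or std::nullopt if no track has a
// URL. Assumed preference order: human-authored English, any English, then
// the first track with a URL. The list is only read, so it is taken by const
// reference.
std::optional<std::string> ChooseTrackUrl(
    const std::vector<CaptionTrack>& tracks) {
  const CaptionTrack* best = nullptr;
  int best_score = -1;
  for (const CaptionTrack& track : tracks) {
    if (track.base_url.empty()) {
      continue;
    }
    int score = 0;
    if (track.language_code == "en") {
      score = track.auto_generated ? 1 : 2;
    }
    // Strict comparison keeps the earliest track among equal scores.
    if (score > best_score) {
      best = &track;
      best_score = score;
    }
  }
  if (!best) {
    return std::nullopt;
  }
  return best->base_url;
}
```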

@@ -0,0 +1,319 @@
// Copyright (c) 2024 The Brave Authors. All rights reserved.
Contributor

I'm not entirely sure, but it looks like browser tests would be easier to write, also they can test some near to real api calls.

Member Author

I'm following upstream's example from MimeSniffingThrottle. And these tests seem successful in testing the parts we care about - whether the throttle was created and whether the delegate is called as expected. I think the team prefers unit tests to browser tests, in general - less flaky and more performant. Any issue with this?

@petemill petemill force-pushed the ai-chat-subresource-throttle branch from c416d95 to 0a7a360 Compare March 14, 2024 05:31
// |mojom::PageContent|.
if (url.SchemeIsHTTPOrHTTPS() && base::Contains(kYouTubeHosts, url.host()) &&
base::EqualsCaseInsensitiveASCII(url.path(), kYouTubePlayerAPIPath)) {
VLOG(1) << __func__ << " Creating throttle for url: " << url.spec();
Collaborator

this kind of logging should normally be removed before merge per chromium guidelines

Collaborator

NOLOG here, in prod builds +1
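
For readers without the Chromium tree at hand, the matching condition in the snippet under review can be approximated with the standard library. This is a sketch under assumptions: the PR uses `GURL` and `base::` helpers, whereas `ShouldSniff`, `kExampleHosts`, and `kExamplePlayerPath` below are hypothetical, and the host and path values are placeholders because the real `kYouTubeHosts` and `kYouTubePlayerAPIPath` are not shown in this conversation.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string_view>

// Placeholder values, assumed for illustration only.
constexpr std::array<std::string_view, 2> kExampleHosts = {"www.youtube.com",
                                                           "m.youtube.com"};
constexpr std::string_view kExamplePlayerPath = "/youtubei/v1/player";

// ASCII-only case-insensitive comparison.
bool EqualsCaseInsensitiveAscii(std::string_view a, std::string_view b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(),
                    [](unsigned char x, unsigned char y) {
                      return std::tolower(x) == std::tolower(y);
                    });
}

// Returns true only for http(s) requests to a listed host whose path equals
// the player API path, ignoring ASCII case.
bool ShouldSniff(std::string_view scheme,
                 std::string_view host,
                 std::string_view path) {
  const bool is_http = scheme == "http" || scheme == "https";
  const bool host_ok = std::find(kExampleHosts.begin(), kExampleHosts.end(),
                                 host) != kExampleHosts.end();
  return is_http && host_ok &&
         EqualsCaseInsensitiveAscii(path, kExamplePlayerPath);
}
```

Keeping the check this narrow means every other request skips throttle creation entirely, so the sniffing cost is paid only for the one response the feature needs.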

@petemill
Member Author

@iefremov @boocmp perhaps we can merge this before #21792 as this is a P2 issue and will need to be uplifted

@bcaller bcaller left a comment

We looked over the PR together

@iefremov
Contributor

in general the PR looks good to me, pls fix nits. Also the amount of logging probably could be decreased. It seems Pavel is ok to merge it before the body sniffer refactoring

@petemill petemill force-pushed the ai-chat-subresource-throttle branch from 6f42779 to 2780596 Compare March 21, 2024 04:42
Contributor

[puLL-Merge] - brave/brave-core@22334

Description

This PR makes changes to improve the content extraction for AI Chat conversations, particularly for YouTube videos. It introduces a new resource throttle that intercepts specific YouTube API requests and parses out caption track URLs. This allows AI Chat to get up-to-date caption data even when the page content doesn't change via navigation.

Changes

browser/brave_content_browser_client.cc:

  • Registers new Mojo interfaces for the AI Chat page content extractor host

chromium_src/chrome/renderer/chrome_content_renderer_client.cc:

  • Initializes the AI Chat page content extractor in the renderer process if AI Chat is enabled and not in incognito mode

components/ai_chat/content/browser/ai_chat_tab_helper.cc|h:

  • Adds ability to bind the page content extractor host
  • Handles intercepted page content change events from the renderer

components/ai_chat/content/browser/page_content_fetcher.cc:

  • Refactors the PageContentFetcher class to take the URLLoaderFactory in the constructor instead of Start methods

components/ai_chat/core/browser/conversation_driver.cc|h:

  • Adds OnPageContentUpdated method to handle out-of-band page content updates from the renderer

components/ai_chat/core/common/mojom/page_content_extractor.mojom:

  • Defines new PageContentExtractorHost interface for renderer->browser communication

components/ai_chat/renderer/*:

  • Implements the AI Chat resource throttle to intercept YouTube player API requests
  • Parses the caption track URLs out of the YouTube player API response
  • Sends the extracted content to the browser process via the PageContentExtractorHost interface

renderer/brave_url_loader_throttle_provider_impl.cc:

  • Instantiates the AI Chat resource throttle for YouTube player API requests if AI Chat is enabled

Security Hotspots

No major security risks identified. The main additions are:

  1. New Mojo interfaces for renderer->browser communication of extracted page content. These follow standard Chrome practices.
  2. Resource throttle to inspect YouTube API responses. This is limited to specific YouTube URLs to avoid unnecessary overhead. The content extraction doesn't involve untrusted inputs.

The changes look reasonable from a security perspective. As always, parsing of untrusted content like web pages and API responses should be done cautiously. The existing unit tests help validate the safety of the parsing logic.

@petemill petemill force-pushed the ai-chat-subresource-throttle branch 2 times, most recently from 55b8b55 to f36d8ff Compare March 21, 2024 05:21
@petemill petemill force-pushed the ai-chat-subresource-throttle branch from f36d8ff to 2e9a5bf Compare March 21, 2024 05:23
@petemill
Member Author

@iefremov nits fixed, please take a look. Logging decreased / changed to DVLOG where I can. I'll do a follow-up to change existing logs to DVLOG. They are very useful in general for debugging the endless stream of issues with web content since AI Chat is dependent on that, and a hassle to have to keep removing and adding.

@thypon
Collaborator

thypon commented Mar 21, 2024

good to go @petemill @diracdeltas

@petemill petemill merged commit 5f8f514 into master Mar 21, 2024
19 checks passed
@petemill petemill deleted the ai-chat-subresource-throttle branch March 21, 2024 21:46
@github-actions github-actions bot added this to the 1.66.x - Nightly milestone Mar 21, 2024
petemill added a commit that referenced this pull request Mar 25, 2024
… metadata for same-page navigations (#22334)

* AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations

* optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
@kjozwiak
Member

kjozwiak commented Mar 26, 2024

Verification PASSED on Win 11 x64 using the following build(s):

Brave | 1.66.33 Chromium: 123.0.6312.58 (Official Build) nightly (64-bit)
-- | --
Revision | 47f11b3f5c715a0d5d551adb1b4028fd12c8dcca
OS | Windows 11 Version 23H2 (Build 22631.3296)

Using 1.66.24 Chromium: 123.0.6312.58 and the STR/Cases outlined via #22334 (comment), reproduced the issue where Leo wouldn't summarize YT videos correctly when summarizing several in a row. For example, in this case, it didn't summarize the video and instead started describing Leo (the feature) rather than using the video's transcript to summarize what the user is currently watching:

[Screenshot: reproducedLeo]

Using the same STR/Cases mentioned above, verified that each YT video was being summarized correctly as per the following:

[Screenshots: workingLeo, workingLeo2]

Also ensured that the original issue that was described via brave/brave-browser#34945 (comment) wasn't occurring even though I couldn't reproduce the issue using 1.66.24 Chromium: 123.0.6312.58.

petemill added a commit that referenced this pull request Mar 26, 2024
… metadata for same-page navigations (#22334)

* AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations

* optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
kjozwiak pushed a commit that referenced this pull request Mar 26, 2024
…tect new content metadata for same-page navigations (#22744)

AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations (#22334)

* AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations

* optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
kjozwiak pushed a commit that referenced this pull request Mar 26, 2024
…tect new content metadata for same-page navigations (#22745)

AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations (#22334)

* AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations

* optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
9 participants