New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations #22334
Conversation
4845e53
to
ca507fa
Compare
Creating security review |
|
||
namespace { | ||
|
||
constexpr uint32_t kReadBufferSize = 37000; // average subresource size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't 100% sure on what to use for this part...
DVLOG(4) << "Not binding extractor host to non-main frame"; | ||
return; | ||
} | ||
auto* sender = content::WebContents::FromRenderFrameHost(rfh); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know you love auto
but i can't infer what the type of sender
is here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? It's literally in rhs - content::WebContents::From...
. That's why I've used auto
here. It's unneccessary repetition in the same line.
DVLOG(1) << "Cannot bind extractor host, no valid WebContents"; | ||
return; | ||
} | ||
auto* tab_helper = AIChatTabHelper::FromWebContents(sender); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, it's webcontents*
// "page" change. | ||
mojo::AssociatedRemote<mojom::PageContentExtractorHost> host; | ||
render_frame()->GetRemoteAssociatedInterfaces()->GetInterface(&host); | ||
if (host.is_bound()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to check if it's bound? GetInterface
gurantees a bind, right? it's internally calling BindNewEndpointAndPassReceiver
on a remote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the condition
namespace ai_chat { | ||
|
||
class AIChatResourceSnifferURLLoader | ||
: public body_sniffer::BodySnifferURLLoader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is an upcoming refactoring of the body sniffer, hopefully the new sniffer design is more logical than the current
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry i've forgotten to link https://github.com/brave/brave-core/pull/21792/files
|
||
// Parse YT metadata json string and choose the most appropriate caption track | ||
// url. | ||
std::optional<std::string> ParseAndChooseCaptionTrackUrl(std::string& body); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the parameter should be either const reference or string_view
// Extract a caption url from an array of YT caption tracks, from the YT page | ||
// API. | ||
std::optional<std::string> ChooseCaptionTrackUrl( | ||
base::Value::List* caption_tracks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const ?
@@ -0,0 +1,319 @@ | |||
// Copyright (c) 2024 The Brave Authors. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure, but it looks like browser tests would be easier to write, also they can test some near to real api calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm following upstreams example from MimeSniffingThrottle
. And these tests seem successful in testing the parts we care about - whether the throttle was created and whether the delegate is called as expected. I think the team prefers unit tests to browser tests, in general - less flakey and more performant. Any issue with this?
c416d95
to
0a7a360
Compare
// |mojom::PageContent|. | ||
if (url.SchemeIsHTTPOrHTTPS() && base::Contains(kYouTubeHosts, url.host()) && | ||
base::EqualsCaseInsensitiveASCII(url.path(), kYouTubePlayerAPIPath)) { | ||
VLOG(1) << __func__ << " Creating throttle for url: " << url.spec(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this kind of logging should normally be removed before merge per chromium guidelines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOLOG here, in prod builds +1
// |mojom::PageContent|. | ||
if (url.SchemeIsHTTPOrHTTPS() && base::Contains(kYouTubeHosts, url.host()) && | ||
base::EqualsCaseInsensitiveASCII(url.path(), kYouTubePlayerAPIPath)) { | ||
VLOG(1) << __func__ << " Creating throttle for url: " << url.spec(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOLOG here, in prod builds +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We looked over the PR together
in general the PR looks good to me, pls fix nits. Also the amount of logging probably could be decreased. It seems Pavel is ok to merge it before the body sniffer refactoring |
… metadata for same-page navigations
…ai chat message is sent by the user
6f42779
to
2780596
Compare
[puLL-Merge] - brave/brave-core@22334 DescriptionThis PR makes changes to improve the content extraction for AI Chat conversations, particularly for YouTube videos. It introduces a new resource throttle that intercepts specific YouTube API requests and parses out caption track URLs. This allows AI Chat to get up-to-date caption data even when the page content doesn't change via navigation. ChangesChangesbrowser/brave_content_browser_client.cc:
chromium_src/chrome/renderer/chrome_content_renderer_client.cc:
components/ai_chat/content/browser/ai_chat_tab_helper.cc|h:
components/ai_chat/content/browser/page_content_fetcher.cc:
components/ai_chat/core/browser/conversation_driver.cc|h:
components/ai_chat/core/common/mojom/page_content_extractor.mojom:
components/ai_chat/renderer/*:
renderer/brave_url_loader_throttle_provider_impl.cc:
Security HotspotsNo major security risks identified. The main additions are:
The changes look reasonable from a security perspective. As always, parsing of untrusted content like web pages and APIs responses should be done cautiously. The existing unit tests help validate the safety of the parsing logic. |
55b8b55
to
f36d8ff
Compare
f36d8ff
to
2e9a5bf
Compare
@iefremov nits fixed, please take a look. Logging decreased / changed to DVLOG where I can. I'll do a follow-up to change existing logs to DVLOG. They are very useful in general for debugging the endless stream of issues with web content since AI Chat is dependent on that, and a hassle to have to keep removing and adding. |
good to go @petemill @diracdeltas |
… metadata for same-page navigations (#22334) * AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations * optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
… metadata for same-page navigations (#22334) * AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations * optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
Verification PASSED on
Using Using the same STR/Cases mentioned above, verified that each YT video was being summarized correctly as per the following:
Also ensured that the original issue that was described via brave/brave-browser#34945 (comment) wasn't occurring even though I couldn't reproduce the issue using |
… metadata for same-page navigations (#22334) * AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations * optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
…tect new content metadata for same-page navigations (#22744) AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations (#22334) * AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations * optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
…tect new content metadata for same-page navigations (#22745) AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations (#22334) * AI Chat: sniff subresource content via throttle to detect new content metadata for same-page navigations * optimization: don't parse yt metadata (or fetch transcript) until an ai chat message is sent by the user
Resolves brave/brave-browser#34945
send the new content detected back to the browsernotify the browser that page-changing content was detected. Actual content sending from renderer will only occur when asked for by browser during a user-initiated AIChat event.Submitter Checklist:
QA/Yes
orQA/No
;release-notes/include
orrelease-notes/exclude
;OS/...
) to the associated issuenpm run test -- brave_browser_tests
,npm run test -- brave_unit_tests
wikinpm run presubmit
wiki,npm run gn_check
,npm run tslint
git rebase master
(if needed)Reviewer Checklist:
gn
After-merge Checklist:
changes has landed on
Test Plan:
Contains unit and browser tests.
Observe that with this PR, the correct video content is summarised and without then only the video content from the initially-navigated-to youtube page is available to Leo.