Skip to content

Commit

Permalink
[ZPS][Journeys] Utilize hybrid strategy for ZPS/Journeys integration.
Browse files Browse the repository at this point in the history
The initial iteration of the "related searches" ZPS/Journeys integration
logic (as implemented by crrev.com/c/4090705) was able to successfully
overcome some of the flakiness associated with the traditional DOM
extraction approach by sourcing "related searches" from prefetched ZPS
response data.

One consequence of this approach was that the "related searches"
annotations provided by the ZPS prefetch logic would often completely
overwrite those supplied by the SRP DOM extraction flow. Based on
feedback from others, it turned out that the SRP-derived "related
searches", which were now being overwritten by the integration logic,
were often qualitatively better than those provided by the ZPS prefetch
flow.

After some further discussion, it was determined that the ideal approach
would be to augment (rather than overwrite) the set of SRP-derived
"related searches" using data obtained via the ZPS prefetching infra. As
such, this CL updates the current ZPS/Journeys integration to implement
this new strategy, thereby resulting in increased "related searches"
coverage in Journeys without having to sacrifice any annotations
obtained via DOM extraction on SRP.

Furthermore, prior to this CL, the ZPS prefetch flow could attempt to
add "related searches" annotations before the relevant visit was added
to the history DB. Given the async nature of both history DB updates and
ZPS prefetching, this resulted in a potential race condition which could
trigger annotation failures due to a missing entry in the history DB.
This CL eliminates this race condition by ensuring that the ZPS-derived
annotations are only added AFTER any SRP-derived annotations.

BEFORE/AFTER screenshots:

  - Query = "tom cruise": https://drive.google.com/drive/folders/1va_4RP5NGby9ERQx0GjtUtEfh76NOVJ3?resourcekey=0-3MPRDgGnI_nPF3mbvnGzgQ&usp=sharing
  - Query = "turtle": https://drive.google.com/drive/folders/1rHOIPvub4_qU3EvHA8Cat3dA5I63lfFV?resourcekey=0-F0ljShpwKDFAXC3wPyHv_Q&usp=sharing

Bug: 1378776
Change-Id: I7720912d77980215b6489e7d6a0692a570d649e0
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4764885
Commit-Queue: Khalid Peer <khalidpeer@chromium.org>
Reviewed-by: Sophie Chang <sophiechang@chromium.org>
Code-Coverage: findit-for-me@appspot.gserviceaccount.com <findit-for-me@appspot.gserviceaccount.com>
Reviewed-by: Moe Ahmadi <mahmadi@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1187829}
  • Loading branch information
Khalid Peer authored and Chromium LUCI CQ committed Aug 24, 2023
1 parent 7757ff2 commit 6183d7f
Show file tree
Hide file tree
Showing 8 changed files with 207 additions and 104 deletions.
1 change: 1 addition & 0 deletions components/optimization_guide/content/browser/BUILD.gn
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ static_library("browser") {
"//components/optimization_guide/core",
"//components/optimization_guide/core:entities",
"//components/optimization_guide/proto:optimization_guide_proto",
"//components/search",
"//components/search_engines",
"//content/public/browser",
"//services/metrics/public/cpp:metrics_cpp",
Expand Down
1 change: 1 addition & 0 deletions components/optimization_guide/content/browser/DEPS
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ include_rules = [
"+components/omnibox/common",
"+components/optimization_guide/core",
"+components/optimization_guide/proto",
"+components/search",
"+components/search_engines",
"+components/ukm",
"+content/public/browser",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@

#include "base/barrier_closure.h"
#include "base/containers/adapters.h"
#include "base/i18n/case_conversion.h"
#include "base/metrics/histogram_functions.h"
#include "base/metrics/histogram_macros_local.h"
#include "base/ranges/algorithm.h"
Expand All @@ -29,6 +30,7 @@
#include "components/optimization_guide/core/optimization_guide_logger.h"
#include "components/optimization_guide/core/optimization_guide_model_provider.h"
#include "components/optimization_guide/core/optimization_guide_switches.h"
#include "components/search/search.h"
#include "services/metrics/public/cpp/metrics_utils.h"
#include "services/metrics/public/cpp/ukm_builders.h"
#include "services/metrics/public/cpp/ukm_recorder.h"
Expand Down Expand Up @@ -88,6 +90,13 @@ void LogRelatedSearchesExtracted(bool success) {
success);
}

void LogRelatedSearchesCacheHit(bool cache_hit) {
base::UmaHistogramBoolean(
"OptimizationGuide.PageContentAnnotationsService.RelatedSearchesCache."
"CacheHit",
cache_hit);
}

#if BUILDFLAG(BUILD_WITH_TFLITE_LIB)
// Record the visibility score of the provided visit as a RAPPOR-style record to
// UKM.
Expand Down Expand Up @@ -134,6 +143,30 @@ void MaybeRecordVisibilityUKM(
}
#endif /* BUILDFLAG(BUILD_WITH_TFLITE_LIB) */

// Generates the canonical URL associated with the the given search |url|.
// |template_url_service| must not be null.
//
// In the context of "related searches" annotation, the canonical
// search URL computed by this function is used as a cache key to ensure that
// the cache entry written by the ZPS prefetch flow can be properly read by the
// SRP DOM extraction flow. We cannot directly use the SRP URL as a cache key
// because the initial URL obtained during prefetch differs from the final URL
// obtained once navigation has been committed (i.e. it contains extraneous URL
// params), even though both URLs are referring to the same logical SRP visit.
std::string GetCanonicalSearchURL(const GURL& url,
TemplateURLService* template_url_service) {
const TemplateURL* default_provider =
template_url_service->GetDefaultSearchProvider();

GURL canonical_search_url;
default_provider->KeepSearchTermsInURL(
url, template_url_service->search_terms_data(),
/*keep_search_intent_params=*/true, /*normalize_search_terms=*/true,
&canonical_search_url);

return canonical_search_url.spec();
}

} // namespace

PageContentAnnotationsService::PageContentAnnotationsService(
Expand All @@ -154,6 +187,7 @@ PageContentAnnotationsService::PageContentAnnotationsService(
history_service_(history_service),
template_url_service_(template_url_service),
zero_suggest_cache_service_(zero_suggest_cache_service),
prefetched_related_searches_(features::MaxRelatedSearchesCacheSize()),
last_annotated_history_visits_(
features::MaxContentAnnotationRequestsCached()),
missing_title_visits_by_url_(
Expand Down Expand Up @@ -478,10 +512,6 @@ void PageContentAnnotationsService::RequestAndNotifyWhenModelAvailable(
void PageContentAnnotationsService::ExtractRelatedSearches(
const HistoryVisit& visit,
content::WebContents* web_contents) {
if (ShouldExtractRelatedSearchesFromZPSCache()) {
return;
}

search_result_extractor_client_.RequestData(
web_contents, {continuous_search::mojom::ResultType::kRelatedSearches},
base::BindOnce(&PageContentAnnotationsService::OnRelatedSearchesExtracted,
Expand Down Expand Up @@ -548,7 +578,9 @@ void PageContentAnnotationsService::OnPageContentAnnotated(
bool PageContentAnnotationsService::ShouldExtractRelatedSearchesFromZPSCache() {
return base::FeatureList::IsEnabled(
features::kExtractRelatedSearchesFromPrefetchedZPSResponse) &&
autocomplete_provider_client_ && zero_suggest_cache_service_;
autocomplete_provider_client_ &&
search::DefaultSearchProviderIsGoogle(template_url_service_) &&
zero_suggest_cache_service_;
}

void PageContentAnnotationsService::OnZeroSuggestResponseUpdated(
Expand All @@ -562,28 +594,9 @@ void PageContentAnnotationsService::OnZeroSuggestResponseUpdated(
return;
}

history_service_->QueryURL(
GURL(page_url), /*want_visits=*/true,
base::BindOnce(&PageContentAnnotationsService::
ExtractRelatedSearchesFromZeroSuggestResponse,
weak_ptr_factory_.GetWeakPtr(), response),
&history_service_task_tracker_);
}

void PageContentAnnotationsService::
ExtractRelatedSearchesFromZeroSuggestResponse(
const ZeroSuggestCacheService::CacheEntry& response,
history::QueryURLResult url_result) {
if (!url_result.success || url_result.visits.empty()) {
LogPageContentAnnotationsStorageStatus(
PageContentAnnotationsStorageStatus::kNoVisitsForUrl,
PageContentAnnotationsType::kRelatedSearches);
return;
}

AutocompleteInput input(u"", metrics::OmniboxEventProto::JOURNEYS,
autocomplete_provider_client_->GetSchemeClassifier());
auto suggest_results =
const auto suggest_results =
response.GetSuggestResults(input, *autocomplete_provider_client_);

std::vector<std::string> related_searches;
Expand All @@ -599,32 +612,43 @@ void PageContentAnnotationsService::
}

if (related_searches.empty()) {
LogRelatedSearchesExtracted(false);
return;
}

LogRelatedSearchesExtracted(true);

auto visit_id = url_result.visits.front().visit_id;
history_service_->AddRelatedSearchesForVisit(related_searches, visit_id);

LogPageContentAnnotationsStorageStatus(
PageContentAnnotationsStorageStatus::kSuccess,
PageContentAnnotationsType::kRelatedSearches);
prefetched_related_searches_.Put(
GetCanonicalSearchURL(GURL(page_url), template_url_service_),
related_searches);
}

void PageContentAnnotationsService::OnRelatedSearchesExtracted(
const HistoryVisit& visit,
continuous_search::SearchResultExtractorClientStatus status,
continuous_search::mojom::CategoryResultsPtr results) {
// Fetch any cached "related searches" data obtained via ZPS prefetch.
std::vector<std::string> related_searches_from_zps_prefetch;
if (ShouldExtractRelatedSearchesFromZPSCache()) {
bool found = false;
const auto it = prefetched_related_searches_.Get(
GetCanonicalSearchURL(visit.url, template_url_service_));
if (it != prefetched_related_searches_.end()) {
related_searches_from_zps_prefetch = it->second;
found = true;
prefetched_related_searches_.Erase(it);
}
LogRelatedSearchesCacheHit(found);
}

const bool success =
status == continuous_search::SearchResultExtractorClientStatus::kSuccess;
status ==
continuous_search::SearchResultExtractorClientStatus::kSuccess ||
!related_searches_from_zps_prefetch.empty();
LogRelatedSearchesExtracted(success);

if (!success) {
return;
}

// Construct `related_searches` using data obtained from SRP DOM extraction.
std::vector<std::string> related_searches;
for (const auto& group : results->groups) {
if (group->type != continuous_search::mojom::ResultType::kRelatedSearches) {
Expand All @@ -639,6 +663,11 @@ void PageContentAnnotationsService::OnRelatedSearchesExtracted(
break;
}

// Augment `related_searches` using data obtained via ZPS prefetch.
for (const auto& search_query : related_searches_from_zps_prefetch) {
related_searches.push_back(search_query);
}

if (related_searches.empty()) {
return;
}
Expand All @@ -647,6 +676,12 @@ void PageContentAnnotationsService::OnRelatedSearchesExtracted(
return;
}

AddRelatedSearchesForVisit(visit, related_searches);
}

void PageContentAnnotationsService::AddRelatedSearchesForVisit(
const HistoryVisit& visit,
const std::vector<std::string>& related_searches) {
QueryURL(visit,
base::BindOnce(&history::HistoryService::AddRelatedSearchesForVisit,
history_service_->AsWeakPtr(), related_searches),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,14 @@ class PageContentAnnotationsService : public KeyedService,
return optimization_guide_logger_;
}

protected:
// Callback invoked when related searches have been extracted for |visit|.
// protected instead of private for testing purposes.
void OnRelatedSearchesExtracted(
const HistoryVisit& visit,
continuous_search::SearchResultExtractorClientStatus status,
continuous_search::mojom::CategoryResultsPtr results);

private:
#if BUILDFLAG(BUILD_WITH_TFLITE_LIB)
// Callback invoked when a single |visit| has been annotated.
Expand Down Expand Up @@ -283,11 +291,13 @@ class PageContentAnnotationsService : public KeyedService,
virtual void ExtractRelatedSearches(const HistoryVisit& visit,
content::WebContents* web_contents);

// Callback invoked when related searches have been extracted for |visit|.
void OnRelatedSearchesExtracted(
// Annotates the provided `visit` in the history DB with the given list of
// `related_searches`.
//
// Virtualized for testing.
virtual void AddRelatedSearchesForVisit(
const HistoryVisit& visit,
continuous_search::SearchResultExtractorClientStatus status,
continuous_search::mojom::CategoryResultsPtr results);
const std::vector<std::string>& related_searches);

// Persist |page_entities_metadata| for |visit| in |history_service_|.
//
Expand Down Expand Up @@ -378,8 +388,15 @@ class PageContentAnnotationsService : public KeyedService,
base::ScopedObservation<ZeroSuggestCacheService,
PageContentAnnotationsService>
zero_suggest_cache_service_observation_{this};
// The client of continuous_search::mojom::SearchResultExtractor interface
// used for extracting data from the main frame of Google SRP |web_contents|.
// A LRU cache mapping each SRP URL to the associated set of "related
// searches" which were obtained via ZPS prefetch logic. This cache is used to
// coordinate the SRP DOM extraction and ZPS prefetch flows to ensure that the
// appropriate history visit is targeted for annotation.
base::HashingLRUCache<std::string, std::vector<std::string>>
prefetched_related_searches_;
// The client of continuous_search::mojom::SearchResultExtractor
// interface used for extracting data from the main frame of Google SRP
// |web_contents|.
continuous_search::SearchResultExtractorClient
search_result_extractor_client_;
// A LRU Cache keeping track of the visits that have been requested for
Expand Down

0 comments on commit 6183d7f

Please sign in to comment.