Skip to content

Commit

Permalink
[Merge 103] [journeys] Make SimilarVisitDeduperClusterFinalizer use u…
Browse files Browse the repository at this point in the history
…rl_for_deduping

In a previous patch, we made SimilarVisitDeduperClusterFinalizer use
url_for_display instead of url_for_deduping:
https://chromium-review.googlesource.com/c/chromium/src/+/3646808

That was not the way to go.

Instead, this CL restores the usage of url_for_deduping, as well as
increases the aggression of url_for_deduping to ALSO strip way the URL
query, so that URLs that differ only by the query part may also be
deduped by SimilarVisitDeduperClusterFinalizer, assuming that the page
title differs.

I added more commentary too, to explain that so long as
url_for_deduping is strictly more aggressive than url_for_display, we
should not display any identical rows in the UI.

(cherry picked from commit d60f7bf)

Bug: 1325154
Change-Id: I105f9e74e98924deddf3c2481517962cd12a6b15
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3645237
Reviewed-by: Sophie Chang <sophiechang@chromium.org>
Commit-Queue: Tommy Li <tommycli@chromium.org>
Cr-Original-Commit-Position: refs/heads/main@{#1003896}
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3652251
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Reviewed-by: Tommy Li <tommycli@chromium.org>
Auto-Submit: Tommy Li <tommycli@chromium.org>
Cr-Commit-Position: refs/branch-heads/5060@{#64}
Cr-Branched-From: b83393d-refs/heads/main@{#1002911}
  • Loading branch information
Tommy Li authored and Chromium LUCI CQ committed May 18, 2022
1 parent 3c9efd5 commit db97c93
Show file tree
Hide file tree
Showing 3 changed files with 20 additions and 11 deletions.
13 changes: 8 additions & 5 deletions components/history/core/browser/history_types.h
Original file line number Diff line number Diff line change
Expand Up @@ -843,11 +843,14 @@ struct ClusterVisit {
// should not be used by the UI.
float engagement_score = 0.0;

// The visit URL modified for better dupe finding. The result may not be
// navigable or even valid; it's only meant to be used for detecting
// duplicates. This is similar in intent to
// `AutocompleteMatch::stripped_destination_url`, but is not the same, as
// History Clusters and Omnibox have different deduping requirements.
// The visit URL stripped down for aggressive deduping. This GURL may not be
// navigable or even valid. The stripping on `url_for_deduping` must be
// strictly more aggressive than on `url_for_display`. This ensures that the
// UI never shows two visits that look completely identical.
//
// The stripping is so aggressive that the URL should not be used alone for
// deduping. See `SimilarVisitDeDeduperClusterFinalizer` for an example usage
// that combines this with the page title as a deduping key.
GURL url_for_deduping;

// The normalized URL for the visit (i.e. a SRP URL normalized based on the
Expand Down
8 changes: 7 additions & 1 deletion components/history_clusters/core/history_clusters_util.cc
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ void PromoteMatchingVisitsAboveNonMatchingVisits(

GURL ComputeURLForDeduping(const GURL& url) {
// Below is a simplified version of `AutocompleteMatch::GURLToStrippedGURL`
// that is thread-safe, stateless, never hits the disk, a bit more aggressive,
// that is thread-safe, stateless, never hits the disk, a lot more aggressive,
// and without a dependency on omnibox components.
if (!url.is_valid())
return url;
Expand All @@ -142,6 +142,12 @@ GURL ComputeURLForDeduping(const GURL& url) {
if (url_for_deduping.SchemeIs(url::kHttpScheme))
replacements.SetSchemeStr(url::kHttpsScheme);

// It's unusual to clear the query for deduping, because it's normally
// considered a meaningful part of the URL. However, the return value of this
// function isn't used naively. Details at `ClusterVisit::url_for_deduping`.
if (url.has_query())
replacements.ClearQuery();

if (url.has_ref())
replacements.ClearRef();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@ TEST(HistoryClustersUtilTest, ComputeURLForDeduping) {
<< "Normalizes scheme to https.";
EXPECT_EQ(
ComputeURLForDeduping(GURL("https://google.com/path?foo=bar#reftag")),
"https://google.com/path?foo=bar")
<< "Strips ref, leaves path and query.";
"https://google.com/path")
<< "Strips ref and query, leaves path.";
EXPECT_EQ(
ComputeURLForDeduping(GURL("http://www.google.com/path?foo=bar#reftag")),
"https://google.com/path?foo=bar")
"https://google.com/path")
<< "Does all of the above at once.";
EXPECT_EQ(ComputeURLForDeduping(GURL("https://google.com/path?foo=bar")),
"https://google.com/path?foo=bar")
EXPECT_EQ(ComputeURLForDeduping(GURL("https://google.com/path")),
"https://google.com/path")
<< "Sanity check when no replacements needed.";
}

Expand Down

0 comments on commit db97c93

Please sign in to comment.