-
Notifications
You must be signed in to change notification settings - Fork 6.7k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[RQ] Extracts Most Repeated Search Terms for use in Most Visited tiles
Segments and Keyword Search Terms which power the most visited sites and the zero-prefix suggestions in the omnibox respectively, are queried from different tables in the database and scored differently. In order to organically show the most repeated search terms along with the most visited segments/sites in the MV tiles, the search terms and the segments can continue to be queried differently, but scored and ranked consistently to be ultimately merged into a single list of tiles. Segments are currently ranked using a version of frecency score which is accumulated across timeslots (i.e., visit days) given the visit count for the segment in those timeslots. To achieve a consistent score for the search terms, this CL introduces, 1) A simple utility enumerator class which is initialized by the URLDatabase to enumerate KeywordSearchTermVisits ordered first by |normalized_search_term| then |last_visit_time| in ascending order. 2) A helper class that uses the enumerator to group the visits to unique search terms into timeslots and produce the final score for the search terms accumulated across the timeslots. Note that having the visits returned by the enumerator ordered first by the search term allows the helper class to efficiently aggregate the visits to unique search terms. Also having the visits ordered then from the oldest to the newest helps ensure the metadata associated with the most recent visit such as the |last_visit_time| are present in the aggregated search terms produced by the helper. In the future, the enumerator may be used by other helper classes to produce prefix and zero-prefix suggestions in the omnibox. Bug: 1317829 Change-Id: Id1aee371fb58a79bd000f388748ea330fe4ce57c Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3611078 Commit-Queue: Mohamad Ahmadi <mahmadi@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Cr-Commit-Position: refs/heads/main@{#1001651}
- Loading branch information
Moe Ahmadi
authored and
Chromium LUCI CQ
committed
May 10, 2022
1 parent
425151a
commit 0587998
Showing
8 changed files
with
404 additions
and
13 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
156 changes: 156 additions & 0 deletions
156
components/history/core/browser/keyword_search_term_util.cc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
// Copyright 2022 The Chromium Authors. All rights reserved. | ||
// Use of this source code is governed by a BSD-style license that can be | ||
// found in the LICENSE file. | ||
|
||
#include "components/history/core/browser/keyword_search_term_util.h" | ||
|
||
#include "base/time/time.h" | ||
#include "components/history/core/browser/keyword_search_term.h" | ||
|
||
namespace history { | ||
|
||
namespace { | ||
|
||
// Calculates the score for the given number of visits in a given day. | ||
// Recent visits count more than historical ones, so we multiply in a boost | ||
// depending on how long ago this day was. This boost is a curve that | ||
// smoothly goes through these values: Today gets 3x, a week ago 2x, three | ||
// weeks ago 1.5x, falling off to 1x at the limit of how far we reach into | ||
// the past. | ||
double GetMostVisitedFrecencyScore(int visit_count, | ||
base::Time day, | ||
base::Time now) { | ||
double day_score = 1.0 + log(static_cast<double>(visit_count)); | ||
int days_ago = (now - day).InDays(); | ||
double recency_boost = 1.0 + (2.0 * (1.0 / (1.0 + days_ago / 7.0))); | ||
return recency_boost * day_score; | ||
} | ||
|
||
// Returns whether two search terms are identical, i.e., have the same | ||
// normalized search terms. | ||
bool IsSameSearchTerm(const KeywordSearchTermVisit& search_term, | ||
const KeywordSearchTermVisit& other_search_term) { | ||
return search_term.normalized_term == other_search_term.normalized_term; | ||
} | ||
|
||
// Transforms a visit time to its timeslot, i.e., day of the viist. | ||
base::Time VisitTimeToTimeslot(base::Time visit_time) { | ||
return visit_time.LocalMidnight(); | ||
} | ||
|
||
// Returns whether two search term visits are in the same timeslot. | ||
bool IsSameTimeslot(const KeywordSearchTermVisit& search_term, | ||
const KeywordSearchTermVisit& other_search_term) { | ||
return VisitTimeToTimeslot(search_term.last_visit_time) == | ||
VisitTimeToTimeslot(other_search_term.last_visit_time); | ||
} | ||
|
||
} // namespace | ||
|
||
// MostRepeatedSearchTermHelper ------------------------------------------------ | ||
|
||
// A helper class to return keyword search terms with frecency scores | ||
// accumulated across days for use in the Most Visited tiles. | ||
class MostRepeatedSearchTermHelper { | ||
public: | ||
MostRepeatedSearchTermHelper() = default; | ||
|
||
MostRepeatedSearchTermHelper(const MostRepeatedSearchTermHelper&) = delete; | ||
MostRepeatedSearchTermHelper& operator=(const MostRepeatedSearchTermHelper&) = | ||
delete; | ||
|
||
~MostRepeatedSearchTermHelper() = default; | ||
|
||
// |enumerator| enumerates keyword search term visits from the URLDatabase. | ||
// |now| is the time used to score the search term. | ||
std::unique_ptr<KeywordSearchTermVisit> GetNextSearchTermFromEnumerator( | ||
KeywordSearchTermVisitEnumerator& enumerator, | ||
base::Time now) { | ||
// |next_search_term| acts as the fast pointer and |last_search_term_| acts | ||
// as the slow pointer accumulating the search term score across visits. | ||
while (auto next_search_term = enumerator.GetNextVisit()) { | ||
bool is_same_search_term = | ||
last_search_term_ && | ||
IsSameSearchTerm(*next_search_term, *last_search_term_); | ||
if (is_same_search_term && | ||
IsSameTimeslot(*next_search_term, *last_search_term_)) { | ||
// We are in the same timeslot for the same search term: | ||
// 1. Move |last_search_term_| forward. | ||
// 2. Add up the search term visit count in the timeslot. | ||
// 3. Carry over the search term score. | ||
int visit_count = last_search_term_->visit_count; | ||
double score = last_search_term_->score.value_or(0.0); | ||
last_search_term_ = std::move(next_search_term); | ||
last_search_term_->visit_count += visit_count; | ||
last_search_term_->score = | ||
last_search_term_->score.value_or(0.0) + score; | ||
|
||
} else if (is_same_search_term) { | ||
// We are in a new timeslot for the same search term: | ||
// 1. Update the search term score by adding the last timeslot's score. | ||
// 2. Move |last_search_term_| forward. | ||
// 3. Carry over the search term score. | ||
double score = | ||
last_search_term_->score.value_or(0.0) + | ||
GetMostVisitedFrecencyScore( | ||
last_search_term_->visit_count, | ||
VisitTimeToTimeslot(last_search_term_->last_visit_time), now); | ||
last_search_term_ = std::move(next_search_term); | ||
last_search_term_->score = score; | ||
|
||
} else if (last_search_term_) { | ||
// We encountered a new search term and |last_search_term_| has a value: | ||
// 1. Update the search term score by adding the last timeslot's score. | ||
// 2. Move |last_search_term_| forward. | ||
// 3. Return the old |last_search_term_|. | ||
double score = | ||
last_search_term_->score.value_or(0.0) + | ||
GetMostVisitedFrecencyScore( | ||
last_search_term_->visit_count, | ||
VisitTimeToTimeslot(last_search_term_->last_visit_time), now); | ||
last_search_term_->score = score; | ||
auto search_term_to_return = std::move(last_search_term_); | ||
last_search_term_ = std::move(next_search_term); | ||
return search_term_to_return; | ||
} else { | ||
// We encountered a new search term and |last_search_term_| has no | ||
// value: | ||
// 1. Move |last_search_term_| forward. | ||
last_search_term_ = std::move(next_search_term); | ||
} | ||
} | ||
|
||
// |last_search_term_| has a value: | ||
// 1. Update the search term score by adding the last timeslot's score. | ||
if (last_search_term_) { | ||
double score = | ||
last_search_term_->score.value_or(0.0) + | ||
GetMostVisitedFrecencyScore( | ||
last_search_term_->visit_count, | ||
VisitTimeToTimeslot(last_search_term_->last_visit_time), now); | ||
last_search_term_->score = score; | ||
} | ||
|
||
return last_search_term_ ? std::move(last_search_term_) : nullptr; | ||
} | ||
|
||
// The last seen search term. | ||
std::unique_ptr<KeywordSearchTermVisit> last_search_term_; | ||
}; | ||
|
||
void GetMostRepeatedSearchTermsFromEnumerator( | ||
KeywordSearchTermVisitEnumerator& enumerator, | ||
std::vector<std::unique_ptr<KeywordSearchTermVisit>>* search_terms) { | ||
MostRepeatedSearchTermHelper helper; | ||
const base::Time now = base::Time::Now(); | ||
while (auto search_term = | ||
helper.GetNextSearchTermFromEnumerator(enumerator, now)) { | ||
search_terms->push_back(std::move(search_term)); | ||
} | ||
// Order the search terms by descending frecency scores. | ||
std::stable_sort( | ||
search_terms->begin(), search_terms->end(), | ||
[](const auto& a, const auto& b) { return a->score > b->score; }); | ||
} | ||
|
||
} // namespace history |
27 changes: 27 additions & 0 deletions
27
components/history/core/browser/keyword_search_term_util.h
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
// Copyright 2022 The Chromium Authors. All rights reserved. | ||
// Use of this source code is governed by a BSD-style license that can be | ||
// found in the LICENSE file. | ||
|
||
#ifndef COMPONENTS_HISTORY_CORE_BROWSER_KEYWORD_SEARCH_TERM_UTIL_H_ | ||
#define COMPONENTS_HISTORY_CORE_BROWSER_KEYWORD_SEARCH_TERM_UTIL_H_ | ||
|
||
#include <memory> | ||
#include <vector> | ||
|
||
namespace history { | ||
|
||
class KeywordSearchTermVisitEnumerator; | ||
struct KeywordSearchTermVisit; | ||
|
||
// Returns keyword search terms ordered by descending frecency scores | ||
// accumulated across days for use in the Most Visited tiles. |enumerator| | ||
// enumerates keyword search term visits from the URLDatabase. It must return | ||
// visits ordered first by |normalized_term| and then by |last_visit_time| in | ||
// ascending order, i.e., from the oldest to the newest. | ||
void GetMostRepeatedSearchTermsFromEnumerator( | ||
KeywordSearchTermVisitEnumerator& enumerator, | ||
std::vector<std::unique_ptr<KeywordSearchTermVisit>>* search_terms); | ||
|
||
} // namespace history | ||
|
||
#endif // COMPONENTS_HISTORY_CORE_BROWSER_KEYWORD_SEARCH_TERM_UTIL_H_ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.