[Omnibox] Local history suggest revamp (part 1)
Based on the proposal in go/chrome-local-history-suggest-revamp, this
CL introduces a utility function along with a companion utility helper
class to return keyword search terms for use as (zero-)prefix suggestions
in the omnibox. It builds on top of the KeywordSearchTermVisitEnumerator
introduced in crrev.com/c/3611078 which is created by the URLDatabase to
enumerate KeywordSearchTermVisits (KSTVs) ordered first by
|normalized_search_term| then |last_visit_time| in ascending order.

The new utility function uses the enumerator to accumulate the visit
counts for the visits to unique normalized search terms and returns a
final list which is ranked either by frecency (for zero-prefix) or
recency (for prefix) suggestions. This utility function will be used
behind a flag in LocalHistoryZeroSuggestProvider and SearchProvider in
a follow-up CL. In order to make that integration more straightforward
and avoid incurring unnecessary cost as a result of copying the KSTVs,
this CL makes KSTV not copyable/movable and changes the data types
currently returned from the URLDatabase to vectors of unique pointers.
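
The accumulation step described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code from this CL: `Visit`, `AccumulateVisits`, and the plain-seconds timestamps are hypothetical stand-ins for KeywordSearchTermVisit, the helper class, and base::Time. It assumes the input is already sorted by normalized term and then by visit time, ascending, as the enumerator guarantees.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for KeywordSearchTermVisit; times are plain seconds.
struct Visit {
  std::string normalized_term;
  int visit_count;
  int64_t last_visit_sec;
};

// Collapses visits sorted by (normalized_term, last_visit_sec) ascending into
// one entry per term. Each entry carries the summed visit count and the time
// of the newest visit that was counted.
std::vector<Visit> AccumulateVisits(const std::vector<Visit>& sorted_visits,
                                    bool ignore_duplicate_visits,
                                    int64_t duplicate_threshold_sec) {
  std::vector<Visit> result;
  for (const Visit& next : sorted_visits) {
    if (!result.empty() &&
        result.back().normalized_term == next.normalized_term) {
      Visit& last = result.back();
      // A duplicate is a visit to the same term within the threshold of the
      // last counted visit; it is skipped without advancing |last|.
      const bool is_duplicate =
          next.last_visit_sec - last.last_visit_sec <= duplicate_threshold_sec;
      if (ignore_duplicate_visits && is_duplicate)
        continue;
      // Same term: add up the counts and move the last visit time forward.
      last.visit_count += next.visit_count;
      last.last_visit_sec = next.last_visit_sec;
    } else {
      // New term: start a fresh entry.
      result.push_back(next);
    }
  }
  return result;
}
```

With a 300-second threshold, three visits to the same term at 0, 100, and 1000 seconds would be counted as two visits when duplicates are ignored, since the visit at 100 seconds falls within the threshold of the first one.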

Bug: 1119654
Change-Id: I493a00e8435436a09eb50a0191013a77899b1456
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3640223
Reviewed-by: Justin Donnelly <jdonnelly@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Commit-Queue: Mohamad Ahmadi <mahmadi@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1002351}
Moe Ahmadi authored and Chromium LUCI CQ committed May 11, 2022
1 parent cf1637e commit 43b9054
Showing 12 changed files with 478 additions and 119 deletions.
@@ -613,7 +613,7 @@ class InMemoryHistoryBackendTest : public HistoryBackendTestBase {

size_t GetNumberOfMatchingSearchTerms(const int keyword_id,
const std::u16string& prefix) {
- std::vector<KeywordSearchTermVisit> matching_terms;
+ std::vector<std::unique_ptr<KeywordSearchTermVisit>> matching_terms;
mem_backend_->db()->GetMostRecentKeywordSearchTerms(
keyword_id, prefix, 1, &matching_terms);
return matching_terms.size();
15 changes: 0 additions & 15 deletions components/history/core/browser/keyword_search_term.cc
@@ -32,21 +32,6 @@ std::unique_ptr<KeywordSearchTermVisit> KeywordSearchTermVisitFromStatement(

} // namespace

KeywordSearchTermVisit::KeywordSearchTermVisit() = default;
KeywordSearchTermVisit::KeywordSearchTermVisit(
const KeywordSearchTermVisit& other) = default;
KeywordSearchTermVisit::~KeywordSearchTermVisit() = default;

double KeywordSearchTermVisit::GetFrecency(base::Time now,
int recency_decay_unit_sec,
double frequency_exponent) const {
const double recency_sec = base::TimeDelta(now - last_visit_time).InSeconds();
const double recency_decayed =
recency_decay_unit_sec / (recency_sec + recency_decay_unit_sec);
const double frequency_powered = pow(visit_count, frequency_exponent);
return frequency_powered * recency_decayed;
}

// KeywordSearchTermVisitEnumerator --------------------------------------------

std::unique_ptr<KeywordSearchTermVisit>
29 changes: 8 additions & 21 deletions components/history/core/browser/keyword_search_term.h
@@ -15,28 +15,15 @@

namespace history {

- // KeywordSearchTermVisit is returned from GetMostRecentKeywordSearchTerms()
- // and contains either the search term and the normalized search term. It also
- // contains the visit count, and the last visit time for either a single keyword
- // visit or a set of keyword visits, depending on the overloaded functions it is
- // returned from.
+ // Represents one or more visits to a keyword search term. It contains the
+ // search term and the normalized search term in addition to the visit count and
+ // the last visit time. An optional frecency score may be provided by the
+ // utility functions/helpers in keyword_search_term_util.h where applicable.
  struct KeywordSearchTermVisit {
- KeywordSearchTermVisit();
- KeywordSearchTermVisit(const KeywordSearchTermVisit& other);
- ~KeywordSearchTermVisit();
-
- // Returns the frecency score of the visit based on the following formula:
- // (frequency ^ frequency_exponent) * recency_decay_unit_in_seconds
- // frecency = ————————————————————————————————————————————————————————————————
- // recency_in_seconds + recency_decay_unit_in_seconds
- // This score combines frequency and recency of the visit favoring ones that
- // are more frequent and more recent (see go/local-zps-frecency-ranking).
- // `recency_decay_unit_sec` is the number of seconds until the recency
- // component of the score decays to half. `frequency_exponent` is factor by
- // which the frequency of the visit is exponentiated.
- double GetFrecency(base::Time now,
- int recency_decay_unit_sec,
- double frequency_exponent) const;
+ KeywordSearchTermVisit() = default;
+ KeywordSearchTermVisit(const KeywordSearchTermVisit&) = delete;
+ KeywordSearchTermVisit& operator=(const KeywordSearchTermVisit&) = delete;
+ ~KeywordSearchTermVisit() = default;

std::u16string term; // The search term that was used.
std::u16string normalized_term; // The search term, in lower case and with
116 changes: 116 additions & 0 deletions components/history/core/browser/keyword_search_term_util.cc
@@ -33,6 +33,16 @@ bool IsSameSearchTerm(const KeywordSearchTermVisit& search_term,
return search_term.normalized_term == other_search_term.normalized_term;
}

// Return whether a visit to a search term is a duplicative visit, i.e., a visit
// to the same search term in an interval smaller than
// kAutocompleteDuplicateVisitIntervalThreshold.
bool IsDuplicateVisit(const KeywordSearchTermVisit& search_term,
const KeywordSearchTermVisit& other_search_term) {
return IsSameSearchTerm(search_term, other_search_term) &&
(search_term.last_visit_time - other_search_term.last_visit_time <=
kAutocompleteDuplicateVisitIntervalThreshold);
}

// Transforms a visit time to its timeslot, i.e., day of the visit.
base::Time VisitTimeToTimeslot(base::Time visit_time) {
return visit_time.LocalMidnight();
@@ -47,6 +57,111 @@ bool IsSameTimeslot(const KeywordSearchTermVisit& search_term,

} // namespace

const base::TimeDelta kAutocompleteDuplicateVisitIntervalThreshold =
base::Minutes(5);

// Returns the frecency score of the visit based on the following formula:
// (frequency ^ kFrequencyExponent) * kRecencyDecayUnitSec
// frecency = ————————————————————————————————————————————————————————————————
// recency_in_seconds + kRecencyDecayUnitSec
double GetFrecencyScore(int visit_count,
base::Time visit_time,
base::Time now) {
// The number of seconds until the recency component decays by half.
constexpr base::TimeDelta kRecencyDecayUnitSec = base::Seconds(60);
// The factor by which the frequency component is exponentiated.
constexpr double kFrequencyExponent = 1.15;

const double recency_decayed =
kRecencyDecayUnitSec /
(base::TimeDelta(now - visit_time) + kRecencyDecayUnitSec);
const double frequency_powered = pow(visit_count, kFrequencyExponent);
return frequency_powered * recency_decayed;
}
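
To make the formula above concrete, here is a standalone restatement that uses plain doubles in place of base::Time and base::TimeDelta. The function name and the parameterized constants are illustrative assumptions; only the default values 60 and 1.15 come from the diff.

```cpp
#include <cassert>
#include <cmath>

// Illustrative restatement of the frecency formula with plain doubles.
// |recency_sec| is the age of the last visit in seconds.
double FrecencyScoreSketch(int visit_count,
                           double recency_sec,
                           double decay_unit_sec = 60.0,
                           double frequency_exponent = 1.15) {
  // Equals 1 for a visit that just happened and 0.5 once the visit is
  // |decay_unit_sec| seconds old.
  const double recency_decayed =
      decay_unit_sec / (recency_sec + decay_unit_sec);
  const double frequency_powered = std::pow(visit_count, frequency_exponent);
  return frequency_powered * recency_decayed;
}
```

For a single visit, the score is 1 at zero age and 0.5 at 60 seconds, which matches the "decays by half" comment on the recency constant. A higher visit count at the same age always yields a higher score.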

// SearchTermHelper ------------------------------------------------------------

// A helper class to return keyword search terms with visit counts accumulated
// across visits for use as prefix or zero-prefix suggestions in the omnibox.
class SearchTermHelper {
public:
SearchTermHelper() = default;

SearchTermHelper(const SearchTermHelper&) = delete;
SearchTermHelper& operator=(const SearchTermHelper&) = delete;

~SearchTermHelper() = default;

// |enumerator| enumerates keyword search term visits from the URLDatabase.
// |ignore_duplicate_visits| specifies whether duplicative visits to a search
// term should be ignored.
std::unique_ptr<KeywordSearchTermVisit> GetNextSearchTermFromEnumerator(
KeywordSearchTermVisitEnumerator& enumerator,
bool ignore_duplicate_visits) {
// |next_search_term| acts as the fast pointer and |last_search_term_| acts
// as the slow pointer accumulating the search term visit count across
// visits.
while (auto next_search_term = enumerator.GetNextVisit()) {
if (ignore_duplicate_visits && last_search_term_ &&
IsDuplicateVisit(*next_search_term, *last_search_term_)) {
continue;
}

if (last_search_term_ &&
IsSameSearchTerm(*next_search_term, *last_search_term_)) {
// We encountered the same search term:
// 1. Move |last_search_term_| forward.
// 2. Add up the search term visit count.
int visit_count = last_search_term_->visit_count;
last_search_term_ = std::move(next_search_term);
last_search_term_->visit_count += visit_count;
} else if (last_search_term_) {
// We encountered a new search term and |last_search_term_| has a value:
// 1. Move |last_search_term_| forward.
// 2. Return the old |last_search_term_|.
auto search_term_to_return = std::move(last_search_term_);
last_search_term_ = std::move(next_search_term);
return search_term_to_return;
} else {
// We encountered a new search term and |last_search_term_| has no
// value:
// 1. Move |last_search_term_| forward.
last_search_term_ = std::move(next_search_term);
}
}

return last_search_term_ ? std::move(last_search_term_) : nullptr;
}

private:
// The last seen search term.
std::unique_ptr<KeywordSearchTermVisit> last_search_term_;
};

void GetAutocompleteSearchTermsFromEnumerator(
KeywordSearchTermVisitEnumerator& enumerator,
bool ignore_duplicate_visits,
SearchTermRankingPolicy ranking_policy,
std::vector<std::unique_ptr<KeywordSearchTermVisit>>* search_terms) {
SearchTermHelper helper;
const base::Time now = base::Time::Now();
while (auto search_term = helper.GetNextSearchTermFromEnumerator(
enumerator, ignore_duplicate_visits)) {
if (ranking_policy == SearchTermRankingPolicy::kFrecency) {
search_term->score = GetFrecencyScore(search_term->visit_count,
search_term->last_visit_time, now);
}
search_terms->push_back(std::move(search_term));
}
// Order the search terms by descending recency or frecency.
std::stable_sort(search_terms->begin(), search_terms->end(),
[&](const auto& a, const auto& b) {
return ranking_policy == SearchTermRankingPolicy::kFrecency
? a->score > b->score
: a->last_visit_time > b->last_visit_time;
});
}
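
The ranking step above can be tried in isolation with a small sketch. `RankedTerm`, `RankingPolicy`, and `RankTerms` are hypothetical names that mirror the shape of the code in this hunk, with times reduced to plain seconds; this is an illustration under those assumptions, not the Chromium implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mirror of SearchTermRankingPolicy.
enum class RankingPolicy { kRecency, kFrecency };

// Hypothetical, reduced stand-in for KeywordSearchTermVisit.
struct RankedTerm {
  std::string term;
  double score;
  int64_t last_visit_sec;
};

// Orders |terms| by descending score or descending last visit time. Because
// std::stable_sort is used, entries that compare equal keep their input order.
void RankTerms(RankingPolicy policy, std::vector<RankedTerm>* terms) {
  std::stable_sort(terms->begin(), terms->end(),
                   [policy](const RankedTerm& a, const RankedTerm& b) {
                     return policy == RankingPolicy::kFrecency
                                ? a.score > b.score
                                : a.last_visit_sec > b.last_visit_sec;
                   });
}
```

The same list can therefore come out in a different order depending on the policy: an old but frequently used term may lead under frecency while a newer, rarely used term leads under recency.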

// MostRepeatedSearchTermHelper ------------------------------------------------

// A helper class to return keyword search terms with frecency scores
@@ -134,6 +249,7 @@ class MostRepeatedSearchTermHelper {
return last_search_term_ ? std::move(last_search_term_) : nullptr;
}

private:
// The last seen search term.
std::unique_ptr<KeywordSearchTermVisit> last_search_term_;
};
39 changes: 39 additions & 0 deletions components/history/core/browser/keyword_search_term_util.h
@@ -8,11 +8,50 @@
#include <memory>
#include <vector>

namespace base {
class Time;
class TimeDelta;
} // namespace base

namespace history {

class KeywordSearchTermVisitEnumerator;
struct KeywordSearchTermVisit;

enum class SearchTermRankingPolicy {
kRecency, // From the most recent to the least recent.
kFrecency // By descending frecency score calculated by |GetFrecencyScore|.
};

// The time interval within which a duplicate query is considered invalid for
// autocomplete purposes.
// These invalid duplicates are extracted from search query URLs which are
// identical or nearly identical to the original search query URL and issued too
// closely to it, i.e., within this time interval. They are typically recorded
// as a result of back/forward navigations or user interactions in the search
// result page and are likely not newly initiated searches.
extern const base::TimeDelta kAutocompleteDuplicateVisitIntervalThreshold;

// Returns a score combining frequency and recency of the visit favoring ones
// that are more frequent and more recent (see go/local-zps-frecency-ranking).
double GetFrecencyScore(int visit_count, base::Time visit_time, base::Time now);

// Returns keyword search terms ordered by descending recency or frecency scores
// for use as prefix or zero-prefix suggestions in the omnibox respectively.
// |enumerator| enumerates keyword search term visits from the URLDatabase. It
// must return visits ordered first by |normalized_term| and then by
// |last_visit_time| in ascending order, i.e., from the oldest to the newest.
// |ignore_duplicate_visits| specifies whether duplicative visits to a search
// term should be ignored. A duplicative visit is defined as a visit to the
// same search term in an interval smaller than
// kAutocompleteDuplicateVisitIntervalThreshold. |ranking_policy| specifies
// how the returned keyword search terms should be ordered.
void GetAutocompleteSearchTermsFromEnumerator(
KeywordSearchTermVisitEnumerator& enumerator,
bool ignore_duplicate_visits,
SearchTermRankingPolicy ranking_policy,
std::vector<std::unique_ptr<KeywordSearchTermVisit>>* search_terms);

// Returns keyword search terms ordered by descending frecency scores
// accumulated across days for use in the Most Visited tiles. |enumerator|
// enumerates keyword search term visits from the URLDatabase. It must return
72 changes: 55 additions & 17 deletions components/history/core/browser/url_database.cc
@@ -15,6 +15,7 @@
#include "base/time/time.h"
#include "components/database_utils/url_converter.h"
#include "components/history/core/browser/keyword_search_term.h"
#include "components/history/core/browser/keyword_search_term_util.h"
#include "components/url_formatter/url_formatter.h"
#include "sql/statement.h"
#include "url/gurl.h"
@@ -577,7 +578,7 @@ void URLDatabase::GetMostRecentKeywordSearchTerms(
KeywordID keyword_id,
const std::u16string& prefix,
int max_count,
- std::vector<KeywordSearchTermVisit>* visits) {
+ std::vector<std::unique_ptr<KeywordSearchTermVisit>>* visits) {
// NOTE: the keyword_id can be zero if on first run the user does a query
// before the TemplateURLService has finished loading. As the chances of this
// occurring are small, we ignore it.
@@ -607,21 +608,61 @@ void URLDatabase::GetMostRecentKeywordSearchTerms(
statement.BindString16(2, next_prefix);
statement.BindInt(3, max_count);

- KeywordSearchTermVisit visit;
  while (statement.Step()) {
- visit.term = statement.ColumnString16(0);
- visit.normalized_term = statement.ColumnString16(1);
- visit.visit_count = statement.ColumnInt(2);
- visit.last_visit_time =
+ auto visit = std::make_unique<KeywordSearchTermVisit>();
+ visit->term = statement.ColumnString16(0);
+ visit->normalized_term = statement.ColumnString16(1);
+ visit->visit_count = statement.ColumnInt(2);
+ visit->last_visit_time =
  base::Time::FromInternalValue(statement.ColumnInt64(3));
- visits->push_back(visit);
+ visits->push_back(std::move(visit));
}
}

std::unique_ptr<KeywordSearchTermVisitEnumerator>
URLDatabase::CreateKeywordSearchTermVisitEnumerator(
KeywordID keyword_id,
const std::u16string& prefix) {
// NOTE: the keyword_id can be zero if on first run the user does a query
// before the TemplateURLService has finished loading. As the chances of this
// occurring are small, we ignore it.
if (!keyword_id)
return nullptr;

auto enumerator = base::WrapUnique<KeywordSearchTermVisitEnumerator>(
new KeywordSearchTermVisitEnumerator());
enumerator->statement_.Assign(GetDB().GetCachedStatement(SQL_FROM_HERE,
R"(
SELECT
kst.term,
kst.normalized_term,
u.visit_count,
u.last_visit_time
FROM
keyword_search_terms kst JOIN urls u ON kst.url_id = u.id
WHERE
kst.keyword_id = ? AND
kst.normalized_term >= ? AND
kst.normalized_term < ?
ORDER BY kst.normalized_term, u.last_visit_time
)"));
// Keep CollapseWhitespace() and ToLower() in sync with search_provider.cc.
std::u16string normalized_prefix =
base::CollapseWhitespace(base::i18n::ToLower(prefix), false);
// This magic gives us a prefix search.
std::u16string next_prefix = normalized_prefix;
next_prefix.back() = next_prefix.back() + 1;
enumerator->statement_.BindInt64(0, keyword_id);
enumerator->statement_.BindString16(1, normalized_prefix);
enumerator->statement_.BindString16(2, next_prefix);
enumerator->initialized_ = enumerator->statement_.is_valid();
return enumerator;
}
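
The "magic" prefix search above works by turning a prefix into a half-open range: every string that starts with the prefix compares greater than or equal to the prefix and less than the prefix with its last character incremented. A standalone sketch of that idea follows. `PrefixUpperBound` is a hypothetical helper, and the empty-prefix check and the overflow check are assumptions of this sketch that do not appear in the hunk above.

```cpp
#include <cassert>
#include <string>

// Computes the exclusive upper bound of the range of strings starting with
// |prefix| by incrementing the last code unit. Returns false when no such
// bound can be formed this way (empty prefix, or last code unit is 0xFFFF).
bool PrefixUpperBound(const std::u16string& prefix, std::u16string* upper) {
  if (prefix.empty() || prefix.back() == 0xFFFF)
    return false;
  *upper = prefix;
  upper->back() = static_cast<char16_t>(upper->back() + 1);
  return true;
}
```

For the prefix "ca" the bound is "cb", so "cat" falls inside the range while "dog" does not. The SQL query applies the same comparison to `kst.normalized_term` using the two bound strings.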

void URLDatabase::GetMostRecentKeywordSearchTerms(
KeywordID keyword_id,
base::Time age_threshold,
- std::vector<KeywordSearchTermVisit>* visits) {
+ std::vector<std::unique_ptr<KeywordSearchTermVisit>>* visits) {
// NOTE: the keyword_id can be zero if on first run the user does a query
// before the TemplateURLService has finished loading. As the chances of this
// occurring are small, we ignore it.
@@ -672,13 +713,13 @@ void URLDatabase::GetMostRecentKeywordSearchTerms(
statement.BindInt64(2, age_threshold.ToInternalValue());

while (statement.Step()) {
- KeywordSearchTermVisit visit;
- visit.normalized_term = statement.ColumnString16(0);
- visit.term = statement.ColumnString16(1);
- visit.visit_count = statement.ColumnInt(2);
- visit.last_visit_time =
+ auto visit = std::make_unique<KeywordSearchTermVisit>();
+ visit->normalized_term = statement.ColumnString16(0);
+ visit->term = statement.ColumnString16(1);
+ visit->visit_count = statement.ColumnInt(2);
+ visit->last_visit_time =
  base::Time::FromInternalValue(statement.ColumnInt64(3));
- visits->push_back(visit);
+ visits->push_back(std::move(visit));
}
}

@@ -810,9 +851,6 @@ const int kLowQualityMatchTypedLimit = 1;
const int kLowQualityMatchVisitLimit = 4;
const int kLowQualityMatchAgeLimitInDays = 3;

- const base::TimeDelta kAutocompleteDuplicateVisitIntervalThreshold =
- base::Minutes(5);

base::Time AutocompleteAgeThreshold() {
return (base::Time::Now() - base::Days(kLowQualityMatchAgeLimitInDays));
}