Optimize SubstringSetMatcher [patch 3/5, replace flat_map]
Replace the base::flat_map<char, NodeID> with a somewhat more custom
data structure. This is significantly tighter on memory, while also
increasing performance.

The full details are in the comments, but in general, we pack the label
and a 23-bit node ID together in a 32-bit block, which immediately
halves the RAM used for edges (pair<char, NodeID> used 64 bits,
due to padding). Furthermore, we also reduce the size of each node,
by implementing our own smaller size and capacity counters; no node
can have more than 259 outgoing edges, so a uint32_t is meaningless
for this. Finally, since most nodes have very few edges, we add
a special case for when there are two edges or fewer, where we store
those edges inline in the node instead of on the heap. This saves on
RAM and memory allocation time, plus makes for less pointer chasing.
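
A minimal sketch of the size argument above. `PackedEdge`, `MapEntry` and `MakeEdge` are illustrative names that do not appear in the patch, and the sizes assume a mainstream ABI where a bit-field pair declared on `uint32_t` occupies exactly one 32-bit word:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the packed edge; the field widths follow the
// commit message (9-bit label, 23-bit node ID).
struct PackedEdge {
  uint32_t label : 9;     // Byte value 0..255, or a special label.
  uint32_t node_id : 23;  // Index of the destination node.
};

// Roughly what the old flat_map stored per edge.
using MapEntry = std::pair<char, uint32_t>;

// Assumption: typical ABI; these checks fail to compile where it does not hold.
static_assert(sizeof(PackedEdge) == 4, "label and node ID share one word");
static_assert(sizeof(MapEntry) == 8, "char is padded to uint32_t alignment");

// Illustrative helper: builds an edge by assigning the fields one by one.
inline PackedEdge MakeEdge(uint32_t label, uint32_t node_id) {
  PackedEdge edge;
  edge.label = label;
  edge.node_id = node_id;
  return edge;
}
```

Both fields round-trip as long as the label fits in 9 bits and the node ID in 23 bits; larger values would be silently truncated by the bit-field.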

Note that this changes node IDs to be 23-bit instead of 32-bit,
which means we can hold 8M nodes instead of 4B. This is still
tens of megabytes of data in practice, though; if it turns out
to be a problem for some applications, it would probably be possible
to have a template parameter for 31-bit IDs (causing Node to go
from 12 to 14 bytes in the process).
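
As a rough sketch of the ID budget described above: `kNodeIdBits`, `FitsInNodeID` and `kAssumedNodeBytes` are made-up names for illustration, and the 12-byte node size is simply taken from this message rather than measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr unsigned kNodeIdBits = 23;  // Hypothetical name; width from the commit.

// The all-ones 23-bit value is reserved as the "no such node" sentinel.
constexpr uint32_t kInvalidNodeID = (1u << kNodeIdBits) - 1;  // 8388607

// A trie with num_nodes nodes uses IDs 0..num_nodes-1, and every one of them
// must stay below the sentinel.
constexpr bool FitsInNodeID(std::size_t num_nodes) {
  return num_nodes <= kInvalidNodeID;
}

// Upper bound on the node array, assuming 12 bytes per node as stated above.
constexpr std::size_t kAssumedNodeBytes = 12;
constexpr std::size_t kMaxNodeArrayBytes =
    static_cast<std::size_t>(kInvalidNodeID) * kAssumedNodeBytes;
```

Under these assumptions the node array alone tops out at roughly 96 MiB before the ID space is exhausted.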

We stop doing binary search and replace it with a simple linear one
instead, which is just as fast (since we generally have few edges)
and allows us to get by without sorting nodes.
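
The lookup this paragraph describes can be sketched as a plain scan over an unsorted edge array. This is an illustration under assumptions, not the code from the patch: `Edge` and `FindEdge` are hypothetical names, and the sentinel value mirrors the 23-bit limit mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Edge {  // Hypothetical packed edge: 9-bit label, 23-bit node ID.
  uint32_t label : 9;
  uint32_t node_id : 23;
};

constexpr uint32_t kInvalidNodeID = (1u << 23) - 1;

// Scans the edges front to back. No ordering is required, so new edges can
// simply be appended at the end.
inline uint32_t FindEdge(const Edge* edges, std::size_t num_edges,
                         uint32_t label) {
  for (std::size_t i = 0; i < num_edges; ++i) {
    if (edges[i].label == label)
      return edges[i].node_id;
  }
  return kInvalidNodeID;
}
```

With only a handful of edges per node, the loop touches a few consecutive words and has no data-dependent branching beyond the equality test.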

SubstringSetMatcher.init_time:      60956 -> 36772 us (+65.8% perf)
SubstringSetMatcher.match_time:       138 ->   129 us (+ 7.0% perf)
SubstringSetMatcher.memory_usage:   25879 -> 13047 kB (-49.6% RAM)
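
For reference, the percentages above appear to follow two conventions: the "perf" figures are old/new - 1 (a throughput gain), while the RAM figure is new/old - 1. A small sketch, with hypothetical helper names, that reproduces the arithmetic:

```cpp
#include <cassert>
#include <cmath>

// Throughput gain in percent when a duration drops from `before` to `after`.
inline double PerfGainPercent(double before, double after) {
  return (before / after - 1.0) * 100.0;
}

// Relative change in percent of a quantity going from `before` to `after`.
inline double ChangePercent(double before, double after) {
  return (after / before - 1.0) * 100.0;
}
```

Feeding in the three before/after pairs gives values that round to +65.8, +7.0 and -49.6.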

Change-Id: I9ef6b24e5d20dff10a7736086584372d1c92c636
Bug: 1319422
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3596141
Commit-Queue: Steinar H Gunderson <sesse@chromium.org>
Reviewed-by: Dominic Battré <battre@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1001467}
Steinar H. Gunderson authored and Chromium LUCI CQ committed May 10, 2022
1 parent 010704b commit 2f0ae63
Showing 2 changed files with 198 additions and 47 deletions.
139 changes: 107 additions & 32 deletions components/url_matcher/substring_set_matcher.cc
@@ -93,14 +93,14 @@ bool SubstringSetMatcher::Match(const std::string& text,

   const AhoCorasickNode* current_node = root;
   for (const char c : text) {
-    NodeID child = current_node->GetEdge(c);
+    NodeID child = current_node->GetEdge(static_cast<unsigned char>(c));

     // If the child node can't be found, progressively iterate over the longest
     // proper suffix of the string represented by the current node. In a sense
     // we are pruning prefixes from the text.
     while (child == kInvalidNodeID && current_node != root) {
       current_node = &tree_[current_node->failure()];
-      child = current_node->GetEdge(c);
+      child = current_node->GetEdge(static_cast<unsigned char>(c));
     }

     if (child != kInvalidNodeID) {
@@ -127,14 +127,14 @@ bool SubstringSetMatcher::AnyMatch(const std::string& text) const {

   const AhoCorasickNode* current_node = root;
   for (const char c : text) {
-    NodeID child = current_node->GetEdge(c);
+    NodeID child = current_node->GetEdge(static_cast<unsigned char>(c));

     // If the child node can't be found, progressively iterate over the longest
     // proper suffix of the string represented by the current node. In a sense
     // we are pruning prefixes from the text.
     while (child == kInvalidNodeID && current_node != root) {
       current_node = &tree_[current_node->failure()];
-      child = current_node->GetEdge(c);
+      child = current_node->GetEdge(static_cast<unsigned char>(c));
     }

     if (child != kInvalidNodeID) {
@@ -207,11 +207,6 @@ void SubstringSetMatcher::BuildAhoCorasickTree(
   for (const StringPattern* pattern : patterns)
     InsertPatternIntoAhoCorasickTree(pattern);

-  // Trie creation is complete and edges are finalized. Shrink to fit each edge
-  // map to save on memory.
-  for (AhoCorasickNode& node : tree_)
-    node.ShrinkEdges();
-
   CreateFailureAndOutputEdges();
 }

@@ -226,7 +221,7 @@ void SubstringSetMatcher::InsertPatternIntoAhoCorasickTree(

   // Follow existing paths for as long as possible.
   while (i != text_end) {
-    NodeID child = current_node->GetEdge(*i);
+    NodeID child = current_node->GetEdge(static_cast<unsigned char>(*i));
     if (child == kInvalidNodeID)
       break;
     current_node = &tree_[child];
@@ -236,7 +231,7 @@ void SubstringSetMatcher::InsertPatternIntoAhoCorasickTree(
   // Create new nodes if necessary.
   while (i != text_end) {
     tree_.emplace_back();
-    current_node->SetEdge(*i, tree_.size() - 1);
+    current_node->SetEdge(static_cast<unsigned char>(*i), tree_.size() - 1);
     current_node = &tree_.back();
     ++i;
   }
@@ -262,8 +257,9 @@ void SubstringSetMatcher::CreateFailureAndOutputEdges() {

   NodeID root_output_link = root->IsEndOfPattern() ? kRootID : kInvalidNodeID;

-  for (const auto& edge : root->edges()) {
-    AhoCorasickNode* child = &tree_[edge.second];
+  for (unsigned edge_idx = 0; edge_idx < root->num_edges(); ++edge_idx) {
+    const AhoCorasickEdge& edge = root->edges()[edge_idx];
+    AhoCorasickNode* child = &tree_[edge.node_id];
     child->SetFailure(kRootID);
     child->SetOutputLink(root_output_link);
     queue.push(child);
@@ -278,18 +274,19 @@ void SubstringSetMatcher::CreateFailureAndOutputEdges() {

     // Compute the failure and output edges of children using the failure edges
     // of the current node.
-    for (const auto& edge : current_node->edges()) {
-      const char edge_label = edge.first;
-      AhoCorasickNode* child = &tree_[edge.second];
+    for (unsigned edge_idx = 0; edge_idx < current_node->num_edges();
+         ++edge_idx) {
+      const AhoCorasickEdge& edge = current_node->edges()[edge_idx];
+      AhoCorasickNode* child = &tree_[edge.node_id];

       const AhoCorasickNode* failure_candidate_parent =
           &tree_[current_node->failure()];
       NodeID failure_candidate_id =
-          failure_candidate_parent->GetEdge(edge_label);
+          failure_candidate_parent->GetEdge(edge.label);
       while (failure_candidate_id == kInvalidNodeID &&
              failure_candidate_parent != root) {
         failure_candidate_parent = &tree_[failure_candidate_parent->failure()];
-        failure_candidate_id = failure_candidate_parent->GetEdge(edge_label);
+        failure_candidate_id = failure_candidate_parent->GetEdge(edge.label);
       }

       if (failure_candidate_id == kInvalidNodeID) {
@@ -332,25 +329,98 @@ void SubstringSetMatcher::AccumulateMatchesForNode(
   }
 }

-SubstringSetMatcher::AhoCorasickNode::AhoCorasickNode() = default;
-SubstringSetMatcher::AhoCorasickNode::~AhoCorasickNode() = default;
+SubstringSetMatcher::AhoCorasickNode::AhoCorasickNode() {
+  static_assert(kNumInlineEdges == 2, "Code below needs updating");
+  edges_.inline_edges[0].label = kEmptyLabel;
+  edges_.inline_edges[1].label = kEmptyLabel;
+}
+
+SubstringSetMatcher::AhoCorasickNode::~AhoCorasickNode() {
+  if (edges_capacity_ != 0) {
+    delete[] edges_.edges;
+  }
+}

-SubstringSetMatcher::AhoCorasickNode::AhoCorasickNode(AhoCorasickNode&& other) =
-    default;
+SubstringSetMatcher::AhoCorasickNode::AhoCorasickNode(AhoCorasickNode&& other) {
+  *this = std::move(other);
+}

 SubstringSetMatcher::AhoCorasickNode&
-SubstringSetMatcher::AhoCorasickNode::operator=(AhoCorasickNode&& other) =
-    default;
+SubstringSetMatcher::AhoCorasickNode::operator=(AhoCorasickNode&& other) {
+  if (edges_capacity_ != 0) {
+    // Delete the old heap allocation if needed.
+    delete[] edges_.edges;
+  }
+  if (other.edges_capacity_ == 0) {
+    static_assert(kNumInlineEdges == 2, "Code below needs updating");
+    edges_.inline_edges[0] = other.edges_.inline_edges[0];
+    edges_.inline_edges[1] = other.edges_.inline_edges[1];
+  } else {
+    // Move over the heap allocation.
+    edges_.edges = other.edges_.edges;
+    other.edges_.edges = nullptr;
+  }
+  num_free_edges_ = other.num_free_edges_;
+  edges_capacity_ = other.edges_capacity_;
+  failure_ = other.failure_;
+  match_id_ = other.match_id_;
+  output_link_ = other.output_link_;
+  return *this;
+}

-SubstringSetMatcher::NodeID SubstringSetMatcher::AhoCorasickNode::GetEdge(
-    char c) const {
-  auto i = edges_.find(c);
-  return i == edges_.end() ? kInvalidNodeID : i->second;
+SubstringSetMatcher::NodeID
+SubstringSetMatcher::AhoCorasickNode::GetEdgeNoInline(uint32_t label) const {
+  DCHECK(edges_capacity_ != 0);
+  for (unsigned edge_idx = 0; edge_idx < num_edges(); ++edge_idx) {
+    const AhoCorasickEdge& edge = edges_.edges[edge_idx];
+    if (edge.label == label)
+      return edge.node_id;
+  }
+  return kInvalidNodeID;
 }

-void SubstringSetMatcher::AhoCorasickNode::SetEdge(char c, NodeID node) {
-  DCHECK_NE(kInvalidNodeID, node);
-  edges_[c] = node;
+void SubstringSetMatcher::AhoCorasickNode::SetEdge(uint32_t label,
+                                                   NodeID node) {
+  DCHECK_LT(node, kInvalidNodeID);
+
+#if DCHECK_IS_ON()
+  // We don't support overwriting existing edges.
+  for (unsigned edge_idx = 0; edge_idx < num_edges(); ++edge_idx) {
+    DCHECK_NE(label, edges()[edge_idx].label);
+  }
+#endif
+
+  if (edges_capacity_ == 0 && num_free_edges_ > 0) {
+    // Still space in the inline storage, so use that.
+    edges_.inline_edges[num_edges()] = AhoCorasickEdge{label, node};
+    --num_free_edges_;
+    return;
+  }
+
+  if (num_free_edges_ == 0) {
+    // We are out of space, so double our capacity. This can either be
+    // because we are converting from inline to heap storage, or because
+    // we are increasing the size of our heap storage.
+    unsigned old_capacity =
+        edges_capacity_ == 0 ? kNumInlineEdges : edges_capacity_;
+    unsigned new_capacity = old_capacity * 2;
+    AhoCorasickEdge* new_edges = new AhoCorasickEdge[new_capacity];
+    memcpy(new_edges, edges(), sizeof(AhoCorasickEdge) * old_capacity);
+    for (unsigned edge_idx = old_capacity; edge_idx < new_capacity;
+         ++edge_idx) {
+      new_edges[edge_idx].label = kEmptyLabel;
+    }
+    if (edges_capacity_ != 0) {
+      delete[] edges_.edges;
+    }
+    edges_.edges = new_edges;
+    edges_capacity_ = new_capacity;
+    num_free_edges_ = new_capacity - old_capacity;
+  }
+
+  // Insert the new edge at the end of our heap storage.
+  edges_.edges[num_edges()] = AhoCorasickEdge{label, node};
+  --num_free_edges_;
 }

 void SubstringSetMatcher::AhoCorasickNode::SetFailure(NodeID node) {
@@ -359,7 +429,12 @@ void SubstringSetMatcher::AhoCorasickNode::SetFailure(NodeID node) {
 }

 size_t SubstringSetMatcher::AhoCorasickNode::EstimateMemoryUsage() const {
-  return base::trace_event::EstimateMemoryUsage(edges_);
+  if (edges_capacity_ == 0) {
+    return 0;
+  } else {
+    return base::trace_event::EstimateMemoryUsage(edges_.edges,
+                                                  edges_capacity_);
+  }
 }

 }  // namespace url_matcher
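
The storage scheme in the .cc changes above (two inline slots, then a heap array that doubles, with a free-slot counter instead of a size) can be sketched in isolation as follows. This is a simplified illustration, not the Chromium class: `SmallEdgeList`, `Edge`, `Add`, `Find` and `Grow` are hypothetical names, failure and output links are left out, and the sketch assumes distinct byte labels so it never holds more than 256 edges.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Edge {  // Hypothetical packed edge: 9-bit label, 23-bit node ID.
  uint32_t label : 9;
  uint32_t node_id : 23;
};

constexpr uint32_t kEmptyLabel = 0x103;
constexpr uint32_t kInvalidNodeID = (1u << 23) - 1;
constexpr unsigned kNumInlineEdges = 2;

// Simplified edge container. Assumes at most 256 distinct labels, so the
// capacity never exceeds 256 and num_free_ always fits in a uint8_t.
class SmallEdgeList {
 public:
  SmallEdgeList() {
    for (unsigned i = 0; i < kNumInlineEdges; ++i) {
      storage_.inline_edges[i].label = kEmptyLabel;
      storage_.inline_edges[i].node_id = 0;
    }
  }
  ~SmallEdgeList() {
    if (capacity_ != 0) delete[] storage_.heap;
  }
  SmallEdgeList(const SmallEdgeList&) = delete;
  SmallEdgeList& operator=(const SmallEdgeList&) = delete;

  // capacity_ == 0 doubles as the "still using inline storage" flag.
  bool is_inline() const { return capacity_ == 0; }
  unsigned size() const {
    return (is_inline() ? kNumInlineEdges : capacity_) - num_free_;
  }

  // Appends an edge; the caller must not add the same label twice.
  void Add(uint32_t label, uint32_t node_id) {
    if (num_free_ == 0) Grow();
    Edge& slot = slots()[size()];
    slot.label = label;
    slot.node_id = node_id;
    --num_free_;
  }

  // Linear scan over the used slots.
  uint32_t Find(uint32_t label) const {
    const Edge* edges = is_inline() ? storage_.inline_edges : storage_.heap;
    for (unsigned i = 0; i < size(); ++i) {
      if (edges[i].label == label) return edges[i].node_id;
    }
    return kInvalidNodeID;
  }

 private:
  Edge* slots() { return is_inline() ? storage_.inline_edges : storage_.heap; }

  // Doubles the capacity, moving from inline to heap storage on first use.
  void Grow() {
    const unsigned old_capacity = is_inline() ? kNumInlineEdges : capacity_;
    const unsigned new_capacity = old_capacity * 2;
    Edge* grown = new Edge[new_capacity];
    std::memcpy(grown, slots(), sizeof(Edge) * old_capacity);
    for (unsigned i = old_capacity; i < new_capacity; ++i) {
      grown[i].label = kEmptyLabel;
      grown[i].node_id = 0;
    }
    if (!is_inline()) delete[] storage_.heap;
    storage_.heap = grown;
    capacity_ = static_cast<uint16_t>(new_capacity);
    num_free_ = static_cast<uint8_t>(new_capacity - old_capacity);
  }

  union {
    Edge* heap;                          // Used when capacity_ != 0.
    Edge inline_edges[kNumInlineEdges];  // Used when capacity_ == 0.
  } storage_;
  uint8_t num_free_ = kNumInlineEdges;
  uint16_t capacity_ = 0;
};
```

Adding a third edge is what triggers the first heap allocation (capacity 4); the fifth edge doubles it to 8. Storing the free count rather than the used count is what keeps the counter within a single byte in this sketch.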
106 changes: 91 additions & 15 deletions components/url_matcher/substring_set_matcher.h
@@ -12,7 +12,7 @@
 #include <string>
 #include <vector>

-#include "base/containers/flat_map.h"
+#include "base/check_op.h"
 #include "components/url_matcher/string_pattern.h"
 #include "components/url_matcher/url_matcher_export.h"

@@ -70,13 +70,14 @@ class URL_MATCHER_EXPORT SubstringSetMatcher {

  private:
   // Represents the index of the node within |tree_|. It is specifically
-  // uint32_t so that we can be sure it takes up 4 bytes. If the computed size
-  // of |tree_| is larger than what can be stored within an uint32_t,
-  // Build() will fail.
+  // uint32_t so that we can be sure it takes up 4 bytes when stored together
+  // with the 9-bit label (so 23 bits are allocated to the NodeID, even though
+  // it is exposed as uint32_t). If the computed size of |tree_| is
+  // larger than what can be stored within 23 bits, Build() will fail.
   using NodeID = uint32_t;

   // This is the maximum possible size of |tree_| and hence can't be a valid ID.
-  static constexpr NodeID kInvalidNodeID = std::numeric_limits<NodeID>::max();
+  static constexpr NodeID kInvalidNodeID = (1u << 23) - 1;

   static constexpr NodeID kRootID = 0;

@@ -111,21 +112,47 @@ class URL_MATCHER_EXPORT SubstringSetMatcher {
   // If your brain thinks "Forget it, let's go shopping.", don't worry.
   // Take a nap and read an introductory text on the Aho Corasick algorithm.
   // It will make sense. Eventually.
+
+  // An edge internal to the tree. We pack the label (character we are
+  // matching on) and the destination node ID into 32 bits, to save memory.
+  struct AhoCorasickEdge {
+    // char (unsigned, so [0..255]), or a special label below.
+    uint32_t label : 9;
+    NodeID node_id : 23;
+  };
+
+  // Used for uninitialized label slots; used so that we do not have to test for
+  // them in other ways, since we know the data will be initialized and never
+  // match any other labels.
+  static constexpr uint32_t kEmptyLabel = 0x103;
+  static constexpr uint32_t kFirstSpecialLabel = kEmptyLabel;
+
+  // A node in the trie.
   class AhoCorasickNode {
    public:
-    // Map from edge label to NodeID.
-    using Edges = base::flat_map<char, NodeID>;
-
     AhoCorasickNode();
     ~AhoCorasickNode();
     AhoCorasickNode(AhoCorasickNode&& other);
     AhoCorasickNode& operator=(AhoCorasickNode&& other);

-    NodeID GetEdge(char c) const;
-    void SetEdge(char c, NodeID node);
-    const Edges& edges() const { return edges_; }
-
-    void ShrinkEdges() { edges_.shrink_to_fit(); }
+    NodeID GetEdge(uint32_t label) const {
+      if (edges_capacity_ != 0) {
+        return GetEdgeNoInline(label);
+      }
+      static_assert(kNumInlineEdges == 2, "Code below needs updating");
+      if (edges_.inline_edges[0].label == label) {
+        return edges_.inline_edges[0].node_id;
+      }
+      if (edges_.inline_edges[1].label == label) {
+        return edges_.inline_edges[1].node_id;
+      }
+      return kInvalidNodeID;
+    }
+    NodeID GetEdgeNoInline(uint32_t label) const;
+    void SetEdge(uint32_t label, NodeID node);
+    const AhoCorasickEdge* edges() const {
+      return edges_capacity_ == 0 ? edges_.inline_edges : edges_.edges;
+    }

     NodeID failure() const { return failure_; }
     void SetFailure(NodeID failure);
@@ -150,14 +177,63 @@ class URL_MATCHER_EXPORT SubstringSetMatcher {
     NodeID output_link() const { return output_link_; }

     size_t EstimateMemoryUsage() const;
+    size_t num_edges() const {
+      if (edges_capacity_ == 0) {
+        return kNumInlineEdges - num_free_edges_;
+      } else {
+        return edges_capacity_ - num_free_edges_;
+      }
+    }

     bool has_outputs() const {
       return IsEndOfPattern() || output_link() != kInvalidNodeID;
     }

    private:
-    // Outgoing edges of current node.
-    Edges edges_;
+    // Outgoing edges of current node, including failure edge and output links.
+    // Most nodes have only one or two (or even zero) edges, not least
+    // because many of them are leaves. Thus, we make an optimization for this
+    // common case; instead of a pointer to an edge array on the heap, we can
+    // pack two edges inline where the pointer would otherwise be. This reduces
+    // memory usage dramatically, as well as saving us a cache-line fetch.
+    //
+    // Note that even though most nodes have few outgoing edges, most nodes
+    // that we actually traverse will have many of them. This apparent
+    // contradiction is because we tend to spend more of our time near the root
+    // of the trie, where it is wide. This means that another layout would be
+    // possible: If we wanted to, non-inline nodes could simply store an array
+    // of 259 (256 possible characters plus the three special label types)
+    // edges, indexed directly by label type. This would use 20–50% more RAM,
+    // but would also increase the speed of lookups by removing the search loop.
+    //
+    // The nodes are generally unordered; since we typically index text, even
+    // the root will rarely be more than 20–30 wide, and at that point, it's
+    // better to just do a linear search than a binary one (which fares poorly
+    // on branch predictors). However, as a special case, we put
+    // kFailureNodeLabel in the first slot if it exists (i.e., is not equal to
+    // kRootID), since we need to access that label during every single node we
+    // look at during traversal.
+    static constexpr int kNumInlineEdges = 2;
+    union {
+      // Out-of-line edge storage, having room for edges_capacity_ elements.
+      AhoCorasickEdge* edges;
+
+      // Inline edge storage, used if edges_capacity_ == 0.
+      AhoCorasickEdge inline_edges[kNumInlineEdges];
+    } edges_;
+
+    // Number of unused edges left in edges_. Edges are always allocated from
+    // the beginning and never deleted; those after num_edges() will be marked
+    // with kEmptyLabel (and have an undefined node_id). We store the number of
+    // free edges instead of the more common number of _used_ edges, to be
+    // sure that we are able to fit it in an uint8_t. num_edges() provides
+    // a useful abstraction over this.
+    uint8_t num_free_edges_ = kNumInlineEdges;
+
+    // How many edges we have allocated room for (can never be more than
+    // kEmptyLabel + 1). If equal to zero, we are not using heap storage,
+    // but instead are using inline_edges.
+    uint16_t edges_capacity_ = 0;

     // Node index that failure edge leads to. The failure node corresponds to
     // the node which represents the longest proper suffix (include empty