Commit 6c323dc

internal/base: add doc comment discussing TrySeekUsingNext
1 parent 2922807 commit 6c323dc

3 files changed: +161 -11 lines changed

internal/base/doc.go

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
1+
// Copyright 2024 The LevelDB-Go and Pebble Authors. All rights reserved. Use
2+
// of this source code is governed by a BSD-style license that can be found in
3+
// the LICENSE file.
4+
5+
// Package base defines fundamental types used across Pebble, including keys,
6+
// iterators, etc.
7+
//
8+
// # Iterators
9+
//
10+
// The [InternalIterator] interface defines the iterator interface implemented
11+
// by all iterators over point keys. Internal iterators are composed to form an
12+
// "iterator stack," resulting in a single internal iterator (see mergingIter in
13+
// the pebble package) that yields a merged view of the LSM.
14+
//
15+
// The SeekGE and SeekPrefixGE positioning methods take a set of flags
16+
// [SeekGEFlags] allowing the caller to provide additional context to iterator
17+
// implementations.
18+
//
19+
// ## TrySeekUsingNext
20+
//
21+
// The TrySeekUsingNext flag is set when the caller has knowledge that no action
22+
// has been performed to move this iterator beyond the first key that would be
23+
// found if this iterator were to honestly do the intended seek. This allows a
24+
// class of optimizations where an internal iterator may avoid a full naive
25+
// repositioning if the iterator is already at a proximate position.
26+
//
27+
// Let [s] be the seek key of an InternalIterator.Seek[Prefix]GE operation with
28+
// TrySeekUsingNext()=true on an internal iterator positioned at the key k_i
29+
// among k_0, k_1, ..., k_n keys known to the internal iterator. We maintain the
30+
// following universal invariants:
31+
//
32+
// U1: For all the internal iterator's keys k_j such that j < i [all keys before its
33+
// current key k_i], one or more of the following hold:
34+
//
35+
// - (a) k_j < s
36+
// - (b) k_j is invisible at the iterator's sequence number
37+
// - (c) k_j is deleted by a visible range tombstone
38+
// - (d) k_j is deleted by a visible point tombstone
39+
// - (e) k_j is excluded by a block property filter, range key masking, etc.
40+
//
41+
// This contract must hold for every call passing TrySeekUsingNext, including
42+
// calls within the interior of the iterator stack. It's the responsibility of
43+
// each caller to preserve this relationship. Intuitively, the caller is
44+
// promising that nothing behind the iterator's current position is relevant and
45+
// the callee may search in the forward direction only. Note that there is no
46+
// universal responsibility on the callee's behavior outside the ordinary seek
47+
// operation's contract, and the callee may freely ignore the flag entirely.
48+
//
49+
// In addition to the universal invariants, the merging iterator and level
50+
// iterator impose additional invariants on TrySeekUsingNext due to their
51+
// responsibilities of applying range deletions and surfacing files' range
52+
// deletions respectively.
53+
//
54+
// Let [s] be the seek key of a Seek[Prefix]GE operation on a merging iterator,
55+
// and [s2] be the seek key of the resulting Seek[Prefix]GE operation on a level
56+
// iterator at level l_i among levels l_0, l_1, ..., l_n, positioned at the file
57+
// f_i among files f_0, f_1, ..., f_n and the key k_i among keys k_0, k_1, ...,
58+
// k_n known to the internal iterator. We maintain the following merging
59+
// iterator invariants:
60+
//
61+
// M1: Cascading: If TrySeekUsingNext is propagated to the level iterator at
62+
// level l_i, TrySeekUsingNext must be propagated to all the merging iterator's
63+
// iterators at levels j > i.
64+
// M2: File monotonicity: If TrySeekUsingNext is propagated to a level iterator,
65+
// the level iterator must not return a key from a file f_j where j < i, even if
66+
// file f_j includes a key k_j such that s2 ≤ k_j < k_i.
67+
//
68+
// Together, these invariants ensure that any range deletions relevant to
69+
// lower-levelled keys are either in currently open files or future files.
70+
//
71+
// Description of TrySeekUsingNext mechanics across the iterator stack:
72+
//
73+
// As the top-level entry point of user seeks, the [pebble.Iterator] is
74+
// responsible for detecting when consecutive user-initiated seeks move
75+
// monotonically forward. It saves seek keys and compares consecutive seek keys
76+
// to decide whether to propagate the TrySeekUsingNext flag to its
77+
// [InternalIterator].
78+
//
79+
// The [pebble.Iterator] also has its own TrySeekUsingNext optimization in
80+
// SeekGE: Above the [InternalIterator] interface, the [pebble.Iterator]'s
81+
// SeekGE method detects consecutive seeks to monotonically increasing keys and
82+
// examines the current key. If the iterator is already positioned appropriately
83+
// (at a key ≥ the seek key), it elides the entire seek of the internal
84+
// iterator.
85+
//
86+
// The pebble mergingIter does not perform any TrySeekUsingNext optimization
87+
// itself, but it must preserve the universal U1 invariant, as well as the M1
88+
// invariant specific to the mergingIter. It does both by always translating
89+
// calls to its SeekGE and SeekPrefixGE methods as equivalent calls to every
90+
// child iterator. There are subtleties:
91+
//
92+
// - The mergingIter takes care to avoid ever advancing a child iterator
93+
// that's already positioned beyond the current iteration prefix. During
94+
// prefix iteration, some levels may omit keys that don't match the
95+
// prefix. Meanwhile the merging iterator sometimes skips keys (e.g., due to
96+
// visibility filtering). If we did not guard against iterating beyond the
97+
// iteration prefix, this key skipping could move some iterators beyond the
98+
// keys that were omitted due to prefix mismatch. A subsequent
99+
// TrySeekUsingNext could surface the omitted keys, but not relevant range
100+
// deletions that deleted them.
101+
//
102+
// The pebble levelIter makes use of the TrySeekUsingNext flag to avoid a naive
103+
// seek within the level's B-Tree of files. When TrySeekUsingNext is passed by
104+
// the caller, the relevant key must fall within the current file or a later
105+
// file. The search space is reduced from (-∞,+∞) to [current file, +∞). If the
106+
// current file's bounds overlap the key, the levelIter propagates the
107+
// TrySeekUsingNext to the current sstable iterator. If the levelIter must
108+
// advance to a new file, it drops the flag because the new file's sstable
109+
// iterator is still unpositioned.
110+
//
111+
// In-memory iterators arenaskl.Iterator and batchskl.Iterator make use of the
112+
// TrySeekUsingNext flag, attempting a fixed number of Nexts before falling back
113+
// to performing a seek using skiplist structures.
114+
//
115+
// The sstable iterators use the TrySeekUsingNext flag to avoid naive seeks
116+
// through a table's index structures. See the long comment in
117+
// sstable/reader_iter.go for more details:
118+
// - If an iterator is already exhausted, either because there are no
119+
// subsequent point keys or because the upper bound has been reached, the
120+
// iterator uses TrySeekUsingNext to avoid any repositioning at all.
121+
// - Otherwise, a TrySeekUsingNext flag causes the sstable Iterator to Next
122+
// forward a capped number of times, stopping as soon as a key ≥ the seek key
123+
// is discovered.
124+
// - The sstable iterator does not always position itself in response to a
125+
// SeekPrefixGE even when TrySeekUsingNext()=false, because bloom filters may
126+
// indicate the prefix does not exist within the file. The sstable iterator
127+
// takes care to remember when it didn't position itself, so that a
128+
// subsequent seek using TrySeekUsingNext does NOT try to reuse the current
129+
// iterator position.
130+
package base
131+
132+
// TODO(sumeer): Come back to this comment and incorporate some of the comments
133+
// from PR #3329.
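
As a rough illustration of the callee side described in the doc comment above (the in-memory and sstable iterators "Next forward a capped number of times" before falling back to a seek), here is a minimal sketch over a sorted slice. The `sliceIter` type, the `maxNexts` cap of 4, and the plain boolean parameter are assumptions made for this example; they are not Pebble's types or values, and real internal iterators receive a `SeekGEFlags` value instead.

```go
package main

import (
	"fmt"
	"sort"
)

// sliceIter is a hypothetical iterator over sorted string keys. It is not a
// Pebble type; it only illustrates how a callee might use the flag.
type sliceIter struct {
	keys []string // sorted ascending
	pos  int      // index of the current key; len(keys) means exhausted
}

// maxNexts caps the forward steps tried before falling back to a full seek.
// The value is an arbitrary assumption for this sketch.
const maxNexts = 4

// SeekGE positions the iterator at the first key >= key. If trySeekUsingNext
// is true, the caller promises that nothing before the current position is
// relevant, so the search only moves forward. usedNext reports whether the
// cheap forward scan answered the seek.
func (it *sliceIter) SeekGE(key string, trySeekUsingNext bool) (k string, ok, usedNext bool) {
	start := 0
	if trySeekUsingNext {
		// Inspect the current key, then take at most maxNexts steps forward.
		for n := 0; n <= maxNexts; n++ {
			i := it.pos + n
			if i >= len(it.keys) {
				// Exhausted: by the caller's promise there is nothing to find.
				it.pos = len(it.keys)
				return "", false, true
			}
			if it.keys[i] >= key {
				it.pos = i
				return it.keys[i], true, true
			}
		}
		// Cap reached: fall back to a search, still restricted to [pos, end).
		start = it.pos
	}
	it.pos = start + sort.SearchStrings(it.keys[start:], key)
	if it.pos == len(it.keys) {
		return "", false, false
	}
	return it.keys[it.pos], true, false
}

func main() {
	it := &sliceIter{keys: []string{"a", "c", "e", "g", "i", "k", "m", "o"}}
	seeks := []struct {
		key string
		try bool
	}{{"c", false}, {"d", true}, {"o", true}, {"z", true}}
	for _, s := range seeks {
		k, ok, fast := it.SeekGE(s.key, s.try)
		fmt.Printf("SeekGE(%q, try=%v) -> key=%q ok=%v usedNext=%v\n", s.key, s.try, k, ok, fast)
	}
}
```

Because the flag is only a hint, the callee in this sketch could ignore it and always run the binary search; the result would be the same whenever the caller's promise holds.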

internal/base/iterator.go

Lines changed: 22 additions & 11 deletions
@@ -246,17 +246,28 @@ const (
246 246
// SeekGEFlagsNone is the default value of SeekGEFlags, with all flags disabled.
247 247
const SeekGEFlagsNone = SeekGEFlags(0)
248 248

249-
// TrySeekUsingNext indicates whether a performance optimization was enabled
250-
// by a caller, indicating the caller has not done any action to move this
251-
// iterator beyond the first key that would be found if this iterator were to
252-
// honestly do the intended seek. For example, say the caller did a
253-
// SeekGE(k1...), followed by SeekGE(k2...) where k1 <= k2, without any
254-
// intermediate positioning calls. The caller can safely specify true for this
255-
// parameter in the second call. As another example, say the caller did do one
256-
// call to Next between the two Seek calls, and k1 < k2. Again, the caller can
257-
// safely specify a true value for this parameter. Note that a false value is
258-
// always safe. The callee is free to ignore the true value if its
259-
// implementation does not permit this optimization.
249+
// TODO(jackson): Rename TrySeekUsingNext to MonotonicallyForward or something
250+
// similar that avoids prescribing the implementation of the optimization but
251+
// instead focuses on the contract expected of the caller.
252+
253+
// TrySeekUsingNext is set when the caller has knowledge that it has performed
254+
// no action to move this iterator beyond the first key that would be found if
255+
// this iterator were to honestly do the intended seek. This enables a class of
256+
// performance optimizations within various internal iterator implementations.
257+
// For example, say the caller did a SeekGE(k1...), followed by SeekGE(k2...)
258+
// where k1 <= k2, without any intermediate positioning calls. The caller can
259+
// safely specify true for this parameter in the second call. As another
260+
// example, say the caller did do one call to Next between the two Seek calls,
261+
// and k1 < k2. Again, the caller can safely specify a true value for this
262+
// parameter. Note that a false value is always safe. If true, the callee should
263+
// not return a key less than the current iterator position even if a naive seek
264+
// would land there.
265+
//
266+
// The same promise applies to SeekPrefixGE: Prefixes of k1 and k2 may be
267+
// different. If the callee does not position itself for k1 (for example, an
268+
// sstable iterator that elides a seek due to bloom filter exclusion), the
269+
// callee must remember it did not position itself for k1 and that it must
270+
// perform the full seek.
260 271
//
261 272
// We make the caller do this determination since a string comparison of k1, k2
262 273
// is not necessarily cheap, and there may be many iterators in the iterator
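
The caller-side determination described in this comment can be sketched roughly as follows. This is a simplified illustration, not Pebble's implementation: `seekFlags`, `seekTracker`, `flagsFor`, and `invalidate` are hypothetical names, and the sketch conservatively handles only the "two seeks with k1 <= k2 and no intermediate positioning call" case, dropping the flag after any other positioning operation.

```go
package main

import (
	"bytes"
	"fmt"
)

// seekFlags is a hypothetical stand-in for a flags bitmask. The method names
// mirror those mentioned in the diff, but this is not Pebble's code.
type seekFlags uint8

const trySeekUsingNextBit seekFlags = 1 << 0

func (f seekFlags) TrySeekUsingNext() bool            { return f&trySeekUsingNextBit != 0 }
func (f seekFlags) EnableTrySeekUsingNext() seekFlags  { return f | trySeekUsingNextBit }
func (f seekFlags) DisableTrySeekUsingNext() seekFlags { return f &^ trySeekUsingNextBit }

// seekTracker remembers the previous seek key so the caller can compare keys
// once, instead of every iterator in the stack doing its own comparison.
type seekTracker struct {
	lastKey []byte
	valid   bool // true if the last positioning operation was a forward seek
}

// flagsFor returns the flags to pass for a SeekGE to key and records the key.
func (t *seekTracker) flagsFor(key []byte) seekFlags {
	var flags seekFlags
	if t.valid && bytes.Compare(t.lastKey, key) <= 0 {
		flags = flags.EnableTrySeekUsingNext()
	}
	t.lastKey = append(t.lastKey[:0], key...) // copy: the caller may reuse key
	t.valid = true
	return flags
}

// invalidate must be called after any operation that might break the promise
// (in this conservative sketch: anything other than a forward seek).
func (t *seekTracker) invalidate() { t.valid = false }

func main() {
	var tr seekTracker
	for _, k := range []string{"b", "b", "d", "a"} {
		fmt.Printf("seek %q: TrySeekUsingNext=%v\n", k, tr.flagsFor([]byte(k)).TrySeekUsingNext())
	}
	tr.invalidate()
	fmt.Printf("seek %q after invalidate: TrySeekUsingNext=%v\n", "z", tr.flagsFor([]byte("z")).TrySeekUsingNext())
}
```

Passing no flag is always safe, so a tracker like this can err on the side of dropping the flag whenever it is unsure.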

sstable/reader_iter_single_lvl.go

Lines changed: 6 additions & 0 deletions
@@ -904,6 +904,12 @@ func (i *singleLevelIterator[I, PI, D, PD]) seekPrefixGE(
904 904
	if i.useFilterBlock {
905 905
		if !i.lastBloomFilterMatched {
906 906
			// Iterator is not positioned based on last seek.
907+
			//
908+
			// TODO(jackson): Would it be worth keeping the
909+
			// TrySeekUsingNext optimization if the previous SeekPrefixGE call
910+
			// that hit the bloom filter exclusion case also had
911+
			// TrySeekUsingNext()=true (in which case the position from two
912+
			// operations ago transitively still holds)?
907 913
			flags = flags.DisableTrySeekUsingNext()
908 914
		}
909 915
		i.lastBloomFilterMatched = false
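
The pattern in this hunk (remember whether the previous prefix seek actually positioned the iterator, and drop the flag if it did not) can be sketched in isolation. Everything here is an assumption for illustration: `prefixIter` is hypothetical, an exact set stands in for the bloom filter, and the third return value exists only to make the decision observable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixIter is a hypothetical single-file iterator with a prefix filter.
type prefixIter struct {
	keys              []string        // sorted ascending
	prefixes          map[string]bool // exact set standing in for a bloom filter
	pos               int             // current position, valid only if positioned
	lastFilterMatched bool            // did the previous seek position the iterator?
}

// SeekPrefixGE returns the first key >= key that has the given prefix.
// reused reports whether the existing position was used as a starting point.
func (it *prefixIter) SeekPrefixGE(prefix, key string, trySeekUsingNext bool) (k string, ok, reused bool) {
	if !it.lastFilterMatched {
		// The previous seek was elided by the filter (or never happened), so
		// the current position says nothing about the previous seek key.
		trySeekUsingNext = false
	}
	it.lastFilterMatched = false
	if !it.prefixes[prefix] {
		// Filter exclusion: skip the seek and leave the position untouched.
		return "", false, false
	}
	it.lastFilterMatched = true
	start := 0
	if trySeekUsingNext {
		start = it.pos // search forward only
	}
	it.pos = start + sort.SearchStrings(it.keys[start:], key)
	if it.pos < len(it.keys) && strings.HasPrefix(it.keys[it.pos], prefix) {
		return it.keys[it.pos], true, trySeekUsingNext
	}
	return "", false, trySeekUsingNext
}

func main() {
	it := &prefixIter{
		keys:     []string{"a1", "a2", "c1", "c5"},
		prefixes: map[string]bool{"a": true, "c": true},
	}
	fmt.Println(it.SeekPrefixGE("a", "a1", false))
	fmt.Println(it.SeekPrefixGE("a", "a2", true)) // positioned: flag honored
	fmt.Println(it.SeekPrefixGE("b", "b1", true)) // excluded by the filter
	fmt.Println(it.SeekPrefixGE("c", "c1", true)) // flag dropped: full seek
	fmt.Println(it.SeekPrefixGE("c", "c5", true)) // positioned again: honored
}
```

The TODO in the diff asks whether the flag could be kept when the elided seek itself carried the flag; this sketch takes the conservative route shown in the existing code and always drops it.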
