Backport "HBASE-29039 Seek past delete markers instead of skipping one at a time (#8001)" to branch-2.6 #8038

Merged
junegunn merged 1 commit into apache:branch-2.6 from junegunn:HBASE-29039-branch-2.6
Apr 8, 2026

@junegunn junegunn commented Apr 8, 2026

No description provided.

apache#8001)

When a DeleteColumn or DeleteFamily marker is encountered during a normal
user scan, the matcher currently returns SKIP, forcing the scanner to
advance one cell at a time. This causes read latency to degrade linearly
with the number of accumulated delete markers for the same row or column.

Since these are range deletes that mask all remaining versions of the
column, seek past the entire column immediately via
columns.getNextRowOrNextColumn(). This is safe because cells arrive in
timestamp descending order, so any puts newer than the delete have
already been processed.
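
To make the skip-versus-seek difference concrete, here is a minimal sketch over a toy cell list. It is not HBase code: `ToyCell`, `cellsExamined`, and `sample` are hypothetical names, and the "seek" is modeled as an index jump to the next qualifier, assuming cells are sorted by qualifier ascending and timestamp descending.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a single row's cells. Not HBase's Cell or ScanQueryMatcher API. */
final class SeekPastDeleteSketch {
  enum Type { PUT, DELETE_COLUMN }

  /** Simplified cell (hypothetical, for illustration only). */
  static final class ToyCell {
    final String qualifier;
    final long ts;
    final Type type;

    ToyCell(String qualifier, long ts, Type type) {
      this.qualifier = qualifier;
      this.ts = ts;
      this.type = type;
    }
  }

  /**
   * Counts how many cells the matcher has to look at individually.
   * With seek=false every cell is visited (the SKIP behavior).
   * With seek=true the first DELETE_COLUMN of a qualifier jumps to the next qualifier.
   */
  static int cellsExamined(List<ToyCell> sorted, boolean seek) {
    int examined = 0;
    int i = 0;
    while (i < sorted.size()) {
      ToyCell c = sorted.get(i);
      examined++;
      if (seek && c.type == Type.DELETE_COLUMN) {
        // Stand-in for a real seek: move the index to the first cell of the
        // next qualifier without counting the cells in between as examined.
        String q = c.qualifier;
        i++;
        while (i < sorted.size() && sorted.get(i).qualifier.equals(q)) {
          i++;
        }
      } else {
        i++;
      }
    }
    return examined;
  }

  /** Column "a" with `markers` DeleteColumn markers and one masked put, then column "b". */
  static List<ToyCell> sample(int markers) {
    List<ToyCell> cells = new ArrayList<>();
    for (int k = 0; k < markers; k++) {
      cells.add(new ToyCell("a", 1000L - k, Type.DELETE_COLUMN));
    }
    cells.add(new ToyCell("a", 1L, Type.PUT)); // older than every marker, so masked
    cells.add(new ToyCell("b", 1L, Type.PUT));
    return cells;
  }
}
```

With 100 accumulated markers, `cellsExamined(sample(100), false)` visits all 102 cells, while `cellsExamined(sample(100), true)` visits only 2. The toy counts cells touched and deliberately ignores the fixed cost of a real seek.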

For DeleteFamily, also fix getKeyForNextColumn in ScanQueryMatcher to
bypass the empty-qualifier guard (HBASE-18471) when the cell is a
DeleteFamily marker. Without this, the seek barely advances past the
current cell instead of jumping to the first real qualified column.

The optimization is only applied with plain ScanDeleteTracker, and
skipped when:
- seePastDeleteMarkers is true (KEEP_DELETED_CELLS)
- newVersionBehavior is enabled (sequence IDs determine visibility)
- visibility labels are in use (delete/put label mismatch)
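
The three exclusions above amount to a single eligibility check. A minimal sketch follows, assuming the three conditions are available as booleans. The class, method, and parameter names are hypothetical, and the actual patch may express this differently (for example, by checking the delete tracker type).

```java
/** Hypothetical guard mirroring the three exclusions listed above. Not the patch's code. */
final class SeekEligibilitySketch {
  /**
   * Returns true only when none of the excluded modes is active, i.e. when
   * a range delete marker really does mask every remaining cell in the column.
   */
  static boolean canSeekPastDeleteMarker(
      boolean seePastDeleteMarkers,    // KEEP_DELETED_CELLS: deleted cells stay visible
      boolean newVersionBehavior,      // sequence IDs, not timestamps, decide visibility
      boolean visibilityLabelsInUse) { // delete and put labels may not match
    return !seePastDeleteMarkers && !newVersionBehavior && !visibilityLabelsInUse;
  }
}
```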

---

Seeking is more expensive than skipping. When each row has only one
DeleteFamily or DeleteColumn marker (the common case), the seek overhead
adds up across many rows, causing a performance regression.

Introduce a counter that tracks consecutive range delete markers per row.
Only switch from SKIP to SEEK after seeing SEEK_ON_DELETE_MARKER_THRESHOLD
(default 10) markers, indicating actual accumulation. This preserves skip
performance for the common case while still optimizing the accumulation
case.
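
The counter can be sketched as below. This is an illustration only: the class, the `MatchCode` values, and the method names are hypothetical. It assumes the switch happens on the marker that brings the count up to the threshold, and that any other cell or a new row resets the count. The commit message does not spell out either detail.

```java
/** Hypothetical per-row counter for consecutive range delete markers. Not the patch's code. */
final class RangeDeleteCounterSketch {
  /** Simplified stand-ins for the matcher's return codes. */
  enum MatchCode { SKIP, SEEK }

  /** Name and default taken from the commit message. */
  static final int SEEK_ON_DELETE_MARKER_THRESHOLD = 10;

  private int consecutiveRangeDeletes = 0;

  /** Call when the scanner moves to a new row. */
  void startNewRow() {
    consecutiveRangeDeletes = 0;
  }

  /** Call for any cell that is not a DeleteColumn/DeleteFamily marker (assumed to break the run). */
  void onOtherCell() {
    consecutiveRangeDeletes = 0;
  }

  /** Call for each DeleteColumn/DeleteFamily marker; decides between SKIP and SEEK. */
  MatchCode onRangeDeleteMarker() {
    consecutiveRangeDeletes++;
    // Assumption: ">=" so that the 10th consecutive marker triggers the seek.
    return consecutiveRangeDeletes >= SEEK_ON_DELETE_MARKER_THRESHOLD
        ? MatchCode.SEEK
        : MatchCode.SKIP;
  }
}
```

A row with a single marker therefore always gets SKIP, which keeps the common case as cheap as before, while a long run of markers pays for one seek instead of many skips.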

Signed-off-by: Charles Connell <cconnell@apache.org>
@junegunn junegunn added the backport label (This PR is a back port of some issue or issues already committed to master) Apr 8, 2026
@junegunn junegunn merged commit d88d244 into apache:branch-2.6 Apr 8, 2026
24 of 25 checks passed
Asmoday pushed a commit to arenadata/hbase that referenced this pull request Apr 15, 2026
apache#8001) (apache#8038)
