db: always estimate deletions via the table stats collector
When a new sstable is added to the LSM we need to populate its table stats,
including PointDeletionBytesEstimate and RangeDeletionBytesEstimate. These
statistics are typically calculated by the table stats collector
asynchronously, because their calculation requires walking the LSM's table
metadata beneath the file and sometimes performing I/O.
Previously there was a fast path for sstables for which unsized point
tombstones (DELs and SINGLEDELs, but not DELSIZEDs) made up less than 10% of
the table's entries. The previous logic reasoned that if there were not many
deletions within the file, we could use the average value size and compression
ratio of the file itself without introducing too much error overall. However,
an incorrect compression ratio can introduce significant error, including for
DELSIZED keys. Under the previous logic, we could take the fast path even if the
table consisted entirely of DELSIZED tombstones.
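For illustration, the old eligibility check can be sketched roughly as follows.
This is a simplified model of the condition described above, not the removed
code: the tableProps type, its field names and oldFastPathEligible are
hypothetical, and any handling of range deletions is left out.

```go
package main

// tableProps is a hypothetical stand-in for the sstable properties the
// heuristic consulted. The field names are illustrative assumptions.
type tableProps struct {
	NumEntries        uint64 // all entries in the table
	NumDeletions      uint64 // point tombstones of every kind (DEL, SINGLEDEL, DELSIZED)
	NumSizedDeletions uint64 // DELSIZED tombstones only
}

// oldFastPathEligible models the previous reasoning: take the fast path when
// unsized point tombstones make up less than 10% of the table's entries.
func oldFastPathEligible(p tableProps) bool {
	unsized := p.NumDeletions - p.NumSizedDeletions
	// Multiply rather than divide to avoid integer-division rounding.
	return unsized*10 < p.NumEntries
}
```

Under this model, a table with 100 entries that are all DELSIZED tombstones has
zero unsized tombstones, so oldFastPathEligible returns true even though every
entry is a deletion.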
This commit removes the fast path for all sstables except for those with no
deletions at all. The 'slow path' of allowing the asynchronous table stats
collector to calculate these stats is not significantly more resource
intensive, and does not need to perform any I/O for point tombstones. I suspect
the fast path was a premature optimization.
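As a rough sketch of the new condition, assuming "no deletions at all" covers
both point tombstones and range deletions. The deletionCounts type, its fields
and canSkipCollector are hypothetical names used only for illustration; they
are not taken from the change itself.

```go
package main

// deletionCounts is a hypothetical summary of a table's deletion-related
// properties. The field names are illustrative assumptions.
type deletionCounts struct {
	NumDeletions      uint64 // point tombstones of every kind, sized or unsized
	NumRangeDeletions uint64 // range deletion tombstones
}

// canSkipCollector reports whether both deletion estimates are trivially zero,
// so the stats could be populated without the asynchronous collector.
func canSkipCollector(c deletionCounts) bool {
	return c.NumDeletions == 0 && c.NumRangeDeletions == 0
}
```

Any table with at least one tombstone, including one made up entirely of
DELSIZED keys, now falls through to the table stats collector.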
Informs cockroachdb/cockroach#151633.