kv: abort span access is expensive #122719
Labels
A-kv-transactions: Relating to MVCC and the transactional model.
C-performance: Perf of queries or internals. Solution not expected to change functional behavior.
T-kv: KV Team
The abort span (pkg/kv/kvserver/abortspan) is a mechanism that sets markers for aborted transactions. It protects against an aborted but still active transaction failing to read values it wrote because those intents have been removed.

The "span" is a slice of the range-id-local keyspace that is read on each BatchRequest that is part of a read-write transaction. The logic for this is in cockroach/pkg/kv/kvserver/replica_evaluate.go, lines 208 to 227 at commit 55991cb.
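To make the per-batch cost concrete, here is a minimal in-memory sketch of the lookup pattern described above. It is not the code in replica_evaluate.go: the types AbortSpan and Batch and the methods Poison and Evaluate are hypothetical stand-ins, and the map lookup takes the place of what is an LSM read in the real system.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTxnAborted is returned when a batch belongs to a transaction that has
// an abort span marker on this range. Hypothetical name for illustration.
var ErrTxnAborted = errors.New("transaction aborted")

// AbortSpan is a toy model of a per-range abort span. The real one lives in
// the range-id-local keyspace; here a map stands in for that storage.
type AbortSpan struct {
	markers map[string]struct{}
	reads   int // number of lookups performed, to make the cost visible
}

// NewAbortSpan returns an empty abort span.
func NewAbortSpan() *AbortSpan {
	return &AbortSpan{markers: map[string]struct{}{}}
}

// Poison records a marker for an aborted transaction, as would happen when
// that transaction's intents are removed.
func (a *AbortSpan) Poison(txnID string) {
	a.markers[txnID] = struct{}{}
}

// Batch is a simplified, hypothetical stand-in for a BatchRequest.
type Batch struct {
	TxnID     string // empty for non-transactional batches
	ReadWrite bool   // true if the batch is part of a read-write transaction
}

// Evaluate performs one abort span lookup for every batch that is part of a
// read-write transaction, and none for other batches.
func (a *AbortSpan) Evaluate(b Batch) error {
	if b.TxnID == "" || !b.ReadWrite {
		return nil
	}
	a.reads++ // this is the extra read paid on every such batch
	if _, aborted := a.markers[b.TxnID]; aborted {
		return ErrTxnAborted
	}
	return nil
}

func main() {
	span := NewAbortSpan()
	fmt.Println(span.Evaluate(Batch{TxnID: "t1", ReadWrite: true}))
	span.Poison("t1")
	fmt.Println(span.Evaluate(Batch{TxnID: "t1", ReadWrite: true}))
	fmt.Println("lookups:", span.reads)
}
```

The point of the sketch is that the lookup is paid on every read-write batch, including the common case where the transaction was never aborted.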
This is an additional LSM read per BatchRequest, which can be seen prominently in CPU profiles under checkIfTxnAborted, accounting for 3.59% of CPU time on write-heavy workloads: profile_abort_span.pb.gz
Some basic experimentation with the sysbench workload (sysbench/oltp_write_only/nodes=7/cpu=16/conc=128) demonstrates about a 2% increase in throughput by disabling this abort span read (i.e. not calling checkIfTxnAborted). This testing reveals the cost of the mechanism. Optimizations (up to and including disabling it) could provide up to this much benefit to throughput.

Given how significant this cost is and how much of an edge case the scenarios that the abort span is protecting against are, we should reevaluate whether there's something better that we can do here. Are there simple optimizations that could make this mechanism perform better? Could we make it a little weaker to avoid most of the cost? These questions are worthwhile to explore.
At a minimum, we should expose an option to disable these abort span checks.
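One possible shape for such an option is a runtime toggle consulted before the lookup. The sketch below is only an illustration under assumptions: abortSpanChecksDisabled, lookupFn, and maybeCheckAborted are hypothetical names, and a real change would presumably go through the existing settings machinery rather than a package-level atomic.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var errTxnAborted = errors.New("transaction aborted")

// abortSpanChecksDisabled is a hypothetical toggle. Its zero value is false,
// so the check stays on unless it is explicitly turned off.
var abortSpanChecksDisabled atomic.Bool

// lookupFn stands in for the abort span read; it reports whether a marker
// exists for the given transaction.
type lookupFn func(txnID string) bool

// maybeCheckAborted skips the lookup entirely when the toggle is set, and
// otherwise returns errTxnAborted if a marker is found.
func maybeCheckAborted(txnID string, lookup lookupFn) error {
	if abortSpanChecksDisabled.Load() {
		return nil // check disabled: no read is performed
	}
	if lookup(txnID) {
		return errTxnAborted
	}
	return nil
}

func main() {
	// Pretend every transaction has a marker, to show both code paths.
	alwaysAborted := func(string) bool { return true }

	fmt.Println(maybeCheckAborted("t1", alwaysAborted))
	abortSpanChecksDisabled.Store(true)
	fmt.Println(maybeCheckAborted("t1", alwaysAborted))
}
```

With the toggle set, the protection described at the top of this issue is lost, so this trades the edge-case safety for the read that is skipped.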
Jira issue: CRDB-38032