GC of Transaction table #2062
Comments
I think you should also remove the transaction record even if there are non-local intents. You'd just do it after successful asynchronous resolution of intents, assuming the |
yes, the RPC which updates the intents list will kill the entry when the new list is empty. |
this is step two (and not the last) for safely garbage collecting transaction entries (see cockroachdb#2062). EndTransaction calls now store the non-local portion of the intents in their transaction record (if there are any; otherwise the record is instantly deleted; this is not new in this change). on each run of the GC queue for a given range, its transaction table is scanned and the following actions taken:
* intents referenced from any "old" transaction records are resolved asynchronously
* old pending transactions are pushed (which will succeed), effectively aborting them
* old aborted transactions are added to the GC request.
TODO:
* necessary to send extra GC request?
* unregister intents from their Txn's entries when they've been resolved (currently they are never removed, so each GC run will waste work resolving a long gone intent every time and can never GC Txns with non-local intents)
I'm worried about the timing dependence here. I think we need some extra safeguards: HeartbeatTxn and EndTransaction should refuse to write a transaction record that could have been GC'd previously. I think this means that each such command must include the previous heartbeat timestamp, and if it at least The first sentence is unclear; I would separate the committed and uncommitted case explicitly:
I'm unconvinced by "Those are HLC timestamps, but on the time scales involved, this is of no concern." We should always be explicit about which clocks are being used for any comparison. GC decisions are (presumably) made outside of raft and can see the real time, but HeartbeatTxn (and other raft commands) see only a timestamp which was captured before the command was submitted to raft. I think we need to make sure that the GC queue only makes a preliminary decision about whether a txn is GCable, and the real work is done inside a new raft command which repeats the time check (to make sure it is well-ordered with respect to any concurrent commands in the raft pipeline). In fact, if we do this I'm not sure if we even need the |
I think we can rely on the GC metadata here. That contains a timestamp from the HLC clock which is basically the Consequently, if you're heartbeating/pushing/committing and there isn't an entry, you get the persisted GC metadata from the engine. If that timestamp minus the GC threshold is higher than what you're operating at, then the entry might have been GCed. Then for a |
There are benefits to leaving the current compaction-based GC of transaction records in place. This mechanism relies on getting a cluster-wide low water mark on oldest extant intent and allowing the normal compaction to clean up any straggling transaction records.
re @spencerkimball's comment in #2111:
It's an annoying edge case. At the point at which the reader tries the push and doesn't find the entry in the table, there isn't an intent any more (it's been removed synchronously with the commit). But the current behavior should be fine: The entry is recreated as The alternative to the above is to only remove the |
I've thought about this some more. My major discomfort with this was that it's a centralized system, but I'm getting more comfortable with it and think it might be the better solution. I'm 70% through the changes proposed above, but I'm going to freeze that until we've reached a decision. Let me outline my understanding of how I think it would work, please add in anything I missed.
Assuming the I'm still a little concerned about the I've looked at the |
I'm a little worried about the |
I think we need to properly flesh out the |
Just let me know what improvements/changes need to go into it. With the change to using a classic rb tree instead of the llrb one, it reduces the number of rotations required to keep it balanced. |
Using the range-local GC metadata timestamp satisfies my concerns about clock sensitivity. A cluster-wide GC watermark makes me very nervous. In addition to the |
It would save us from having to store a list of intents on successfully committed transactions. I don't see why Gossip would be an issue. The info is always from the But yeah, the globality of it worries me. If I trusted in the |
Ping. What's the status here? |
@spencerkimball and I need to pick up the discussion again. I have a half-finished branch in my graveyard for the local option. |
How about if we always write the transaction record on the first write? Because the transaction records are located on the same range as the first key, this wouldn't require any extra RPCs or latency. Then, if a push txn request arrives for a txn which is missing, it's a simple failure. There is one special case that I can think of, but it'll still work if we just treat the missing txn record as a failure (it will retry):
@tschottdorf I'm experimenting with adding a begin transaction request. |
the idea sounds good (not the one about adding |
Because I want to create a transaction record preemptively, not lazily.
But you're optimizing against a delay in parallel usage, which I don't think we'll have in beta. That seems like something we'd want to do when we want to make parallel txns work. |
The current plan (after some offline discussion with Tobias) is the following:
BeginTransactionRequest is automatically inserted by the KV client immediately before the first transactional write. On execution, BeginTransaction creates the transaction record. If a heartbeat arrives for a txn that has no transaction record, it's ignored. If a push txn arrives for a txn that has no transaction record, the txn is considered aborted. This solves the problem of errant heartbeats or pushes recreating a GC'd transaction record, addressing #2062.
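To make the plan above concrete, here is a minimal Python sketch of the three rules: only BeginTransaction creates a record, a heartbeat for a missing record is ignored, and a push for a missing record reports the txn as aborted. `TxnTable` and its methods are hypothetical names for illustration, not the actual CockroachDB API, and the real push logic (priorities, expiry) is left out.

```python
from typing import Dict


class TxnTable:
    """In-memory stand-in for one range's transaction records (hypothetical)."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def begin_transaction(self, txn_id: str, now: float) -> None:
        # The only place a record is ever created.
        self.records.setdefault(
            txn_id, {"status": "PENDING", "last_heartbeat": now}
        )

    def heartbeat(self, txn_id: str, now: float) -> bool:
        rec = self.records.get(txn_id)
        if rec is None:
            # Ignored: a heartbeat never recreates a possibly GC'd record.
            return False
        rec["last_heartbeat"] = now
        return True

    def push(self, txn_id: str) -> str:
        rec = self.records.get(txn_id)
        if rec is None:
            # Missing record means aborted; nothing is written.
            return "ABORTED"
        # Real push logic is omitted; this only reports the current status.
        return rec["status"]
```

Because neither `heartbeat` nor `push` ever writes a new entry, a record that has been garbage collected stays gone.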
see cockroachdb#2062. on each run of the GC queue for a given range, the transaction and sequence prefixes are scanned and the following actions taken:
* old pending transactions are pushed (which will succeed), effectively aborting them
* old aborted transactions are added to the GC request.
* aborted and committed transactions will have the intents referenced in their record resolved synchronously and are GCed (on success)
* sequence cache entries which are "old" and belong to "old" (or nonexistent) transactions are deleted.
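The pass described in this commit message can be sketched as a pure planning function that sorts records into the listed actions. This is a rough Python illustration under stated assumptions: `Txn`, `GCPlan`, `plan_gc_pass` and the 20 s `GC_AGE` are hypothetical, timestamps are plain floats, and the sequence cache is simplified to one timestamp per transaction ID.

```python
from dataclasses import dataclass, field
from typing import Dict, List

GC_AGE = 20.0  # seconds; assumed threshold for "old"


@dataclass
class Txn:
    status: str                 # "PENDING", "COMMITTED" or "ABORTED"
    last_active: float          # last heartbeat, or txn timestamp if none
    intents: List[str] = field(default_factory=list)


@dataclass
class GCPlan:
    push: List[str] = field(default_factory=list)             # old pending txns
    resolve_then_gc: List[str] = field(default_factory=list)  # finished, intents left
    gc: List[str] = field(default_factory=list)               # finished, no intents
    seq_delete: List[str] = field(default_factory=list)       # stale sequence entries


def plan_gc_pass(txns: Dict[str, Txn], seq_cache: Dict[str, float],
                 now: float) -> GCPlan:
    plan = GCPlan()

    def is_old(ts: float) -> bool:
        return now - ts > GC_AGE

    for txn_id, txn in sorted(txns.items()):
        if not is_old(txn.last_active):
            continue
        if txn.status == "PENDING":
            plan.push.append(txn_id)             # the push aborts it
        elif txn.intents:
            plan.resolve_then_gc.append(txn_id)  # GC only after resolution succeeds
        else:
            plan.gc.append(txn_id)

    for txn_id, ts in sorted(seq_cache.items()):
        txn = txns.get(txn_id)
        txn_old_or_missing = txn is None or is_old(txn.last_active)
        if is_old(ts) and txn_old_or_missing:
            plan.seq_delete.append(txn_id)
    return plan
```

Keeping the planning step free of side effects makes the classification easy to test; the actual pushes, intent resolution and GC request would be issued from the resulting plan.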
this is the missing part of #1873.
GC of proto.Transaction
A Transaction record can be removed if its last heartbeat is sufficiently old (say, > 4*HeartbeatInterval) and if no intents which should be visible remain (i.e. only if it's committed do we need to ensure that). If no heartbeat timestamp is present, the Transaction timestamp is used instead. Those are HLC timestamps, but on the time scales involved, this is of no concern.
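A minimal Python sketch of the eligibility rule above, to make the two conditions concrete. The names (`TxnRecord`, `is_gcable`, `HEARTBEAT_INTERVAL` and its 5 s value) are hypothetical and not taken from the code base, and timestamps are plain floats in seconds rather than HLC timestamps.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HEARTBEAT_INTERVAL = 5.0  # seconds; assumed value for illustration only
GC_AGE = 4 * HEARTBEAT_INTERVAL


@dataclass
class TxnRecord:
    status: str                            # "PENDING", "COMMITTED" or "ABORTED"
    timestamp: float                       # transaction timestamp
    last_heartbeat: Optional[float] = None
    intents: List[str] = field(default_factory=list)  # possibly-unresolved intents


def is_gcable(rec: TxnRecord, now: float) -> bool:
    """Return True if the record may be removed under the rule sketched above."""
    # Fall back to the transaction timestamp when no heartbeat was recorded.
    if rec.last_heartbeat is not None:
        last_active = rec.last_heartbeat
    else:
        last_active = rec.timestamp
    if now - last_active <= GC_AGE:
        return False
    # Only a committed transaction has intents that must remain visible.
    if rec.status == "COMMITTED" and rec.intents:
        return False
    return True
```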
The GC queue is in charge of periodically grooming the Transaction table. It can't ever GC a successfully committed Transaction which lists possibly-unresolved intents, though. Encountering one of those requires prior intent resolution. It will be easiest if the queue just gives those intents to the range to handle them asynchronously as skipped intents, which should render the Transaction GCable in the next pass.
Any successful ResolveIntent (or collection thereof) should be followed by an invocation of a new RPC ResolveTransaction (or overload EndTransaction for this purpose) which updates the Transaction record, removing the resolved intents from the list stored within the Transaction.
PushTxn
When executing against an absent on-disk record, the Transaction is considered ABORTED if the intent's Transaction timestamp is older than 2*HeartbeatInterval, and no new entry is written in that case.
The choice of {2,4}*HeartbeatInterval above makes sure that PushTxn will always find the Transaction record or decide correctly that it's ABORTED without the need to write a new one.
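The absent-record rule can be sketched in a few lines of Python. `push_absent_record` and the 5 s `HEARTBEAT_INTERVAL` are hypothetical; what happens for a transaction younger than the threshold is not specified in this section, so the sketch returns `None` there to signal "fall through to the regular handling".

```python
from typing import Optional

HEARTBEAT_INTERVAL = 5.0  # seconds; assumed value for illustration only
PUSH_ABORT_AGE = 2 * HEARTBEAT_INTERVAL


def push_absent_record(intent_txn_timestamp: float, now: float) -> Optional[str]:
    """Decide a push when no transaction record exists on disk.

    Returns "ABORTED" (and writes nothing) if the intent's transaction
    timestamp is older than 2*HeartbeatInterval, otherwise None.
    """
    if now - intent_txn_timestamp > PUSH_ABORT_AGE:
        return "ABORTED"
    return None
```

With these numbers a record is only collected after 20 s of inactivity, so any push that finds a collected record missing sees an age well past the 10 s threshold and answers ABORTED without writing.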
successful EndTransaction
Resolve local intents right away. If that includes all intents, we can remove the record entirely. Otherwise, add the remainder to the Transaction record and return them for asynchronous resolution.
failed EndTransaction
This fails either because of regression errors (these amount to bugs), or, more interestingly, because the Transaction was aborted by a concurrent writer.
The prime objective is removing the intents as soon as possible, but we can't write on error. So we leave the Transaction record and resolve all intents asynchronously.
Note that this can leave a Transaction in PENDING state, so that it will likely never commit (it can be aborted due to timeout at some point) in the case of regression errors (which amount to bugs).
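For the successful EndTransaction case described earlier, the split between local and non-local intents might look like the following Python sketch. `split_intents`, `commit`, and the representation of a range as a half-open key interval `[range_start, range_end)` are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def split_intents(intents: List[str], range_start: str,
                  range_end: str) -> Tuple[List[str], List[str]]:
    """Partition intent keys into those on this range and the remainder."""
    local = [k for k in intents if range_start <= k < range_end]
    remote = [k for k in intents if not (range_start <= k < range_end)]
    return local, remote


def commit(intents: List[str], range_start: str,
           range_end: str) -> Tuple[Optional[Dict], List[str], List[str]]:
    """Return (record, resolved_now, resolve_async) for a successful commit."""
    local, remote = split_intents(intents, range_start, range_end)
    if not remote:
        # Every intent was local and resolved right away: drop the record.
        return None, local, []
    # Keep the record, listing only the intents still awaiting resolution.
    record = {"status": "COMMITTED", "intents": remote}
    return record, local, remote
```

Storing only the non-local remainder keeps the record small and lets a later pass delete it once that list has been emptied.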