Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvclient/kvcoord: inhibit parallel commit when retrying EndTxn request #46596

Merged

Commits on Mar 31, 2020

  1. kvserver/batcheval: add a comment

    Release note: None
    Release justification: Comment only.
    andreimatei committed Mar 31, 2020
    Configuration menu
    Copy the full SHA
    def0477 View commit details
    Browse the repository at this point in the history
  2. kv, server: minor tracing improvements

    Release note: None
    Release justification: Minor new log messages.
    andreimatei committed Mar 31, 2020
    Configuration menu
    Copy the full SHA
    b1dac93 View commit details
    Browse the repository at this point in the history
  3. util/tracing: move extractMsgFromRecord to the tracking pkg

    This function was used in sql, but I want to start using it elsewhere
    when dealing with traces.
    
    Release note: None
    Release justification: Minor code move, in support of the next commit.
    andreimatei committed Mar 31, 2020
    Configuration menu
    Copy the full SHA
    92e14b2 View commit details
    Browse the repository at this point in the history
  4. kvclient/kvcoord: display traces differently

    Before this patch, a trace (recording) was String()ed by sorting
    log messages by timestamp, across spans. This proved to be a terrible
    idea. This patch moves to keeping requests from the same span together,
    but still interspersing children spans in the parent. So messages from
    siblings are no longer mixed together.
    
    Release note: None
    Release justification: testing only
    andreimatei committed Mar 31, 2020
    Configuration menu
    Copy the full SHA
    24509f2 View commit details
    Browse the repository at this point in the history
  5. kvcoord: add a log msg

    Release note: None
    Release justification: low risk, just logging
    andreimatei committed Mar 31, 2020
    Configuration menu
    Copy the full SHA
    e5b0537 View commit details
    Browse the repository at this point in the history
  6. kvclient/kvcoord: inhibit parallel commit when retrying EndTxn request

    The scenario that this patch addresses is the following (from cockroachdb#46431):
    1. txn1 sends Put(a) + Put(b) + EndTxn
    2. DistSender splits the Put(a) from the rest.
    3. Put(a) succeeds, but the rest catches some retriable error.
    4. TxnCoordSender gets the retriable error. The fact that a sub-batch
      succeeded is lost. We used to care about that fact, but we've
      successively gotten rid of that tracking across cockroachdb#35140 and cockroachdb#44661.
    5. we refresh everything that came before this batch. The refresh
      succeeds.
    6. we re-send the batch. It gets split again. The part with the EndTxn
      executes first. The transaction is now STAGING. More than that, the txn
      is in fact implicitly committed - the intent on a is already there since
      the previous attempt and, because it's at a lower timestamp than the txn
      record, it counts as golden for the purposes of verifying the implicit
      commit condition.
    7. some other transaction wonders in, sees that txn1 is in its way, and
      transitions it to explicitly committed.
    8. the Put(a) now tries to evaluate. It gets really confused. I guess
      that different things can happen; none of them good. One thing that I
      believe we've observed in cockroachdb#46299 is that, if there's another txn's
      intent there already, the Put will try to push it, enter the
      txnWaitQueue, eventually observe that its own txn is committed and
      return an error. The client thus gets an error (and a non-ambiguous one
      to boot) although the txn is committed. Even worse perhaps, I think it's
      possible for a request to return wrong results instead of an error.
    
    This patch fixes it by inhibiting the parallel commit when the EndTxn
    batch is retried. This way, there's never a STAGING record.
    
    Release note (bug fix): A rare bug causing errors to be returned for
    successfully committed transactions was fixed. The most common error
    message was "TransactionStatusError: already committed".
    
    Release justification: serious bug fix
    
    Fixes cockroachdb#46341
    andreimatei committed Mar 31, 2020
    Configuration menu
    Copy the full SHA
    7c6be57 View commit details
    Browse the repository at this point in the history