Add trace event to recovery_transaction step in recovery (release-7.0) #5029
Backporting #5015 to release 7.0.
Problem:
(1) When a recovery blocks in the recovery_transaction phase, root cause analysis often cannot tell which step of the phase is stuck: the recovery may block at the special transaction, at sending TxnStateStore to proxies, or at initializing resolvers.
(2) During root cause analysis we may want to know which proxy is proxy 0, the proxy that accepts special transactions that must keep serial order. The current way to identify proxy 0 is somewhat tricky, so this information should be clearly visible in the trace event.
Solution:
(1) Add two trace events, SentTxnStateStoreToCommitProxies and InitializedAllResolvers, to sendInitialCommitToResolvers() in masterserver.actor.cpp.
In the recovery_transaction phase, the order of events is:
The master sends TxnStateStore to the proxies while issuing the special transaction to proxy 0; the master then initializes the resolvers; finally, the special transaction at proxy 0 receives its resolution reply from the resolver.
When the recovery blocks in the recovery_transaction phase:
(a) If both events appear, we can conclude that the special transaction must be blocked.
(b) If only SentTxnStateStoreToCommitProxies appears, then at least initializing resolvers is blocked, and the special transaction may also be blocked (no later than the point at which the proxy sends the resolution request to the resolver).
(c) If neither event appears, then at least sending TxnStateStore to proxies is blocked, and the special transaction may also be blocked (no later than the point at which the proxy sends the resolution request to the resolver).
Note that in cases (b) and (c) it is hard to determine whether the special transaction was already blocked when the master issued it to the proxy.
This is because the special transaction and resolver initialization happen concurrently, but we have to check them in a serialized way.
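The case analysis above can be written down as a small decision function. The following is a minimal sketch for use in log-analysis tooling, not code from this PR: `BlockedStep`, `diagnoseRecoveryTransaction`, and the `Inconsistent` case are hypothetical names introduced for illustration. Only the two event names come from the change itself.

```cpp
#include <set>
#include <string>

// Hypothetical helper, not part of FoundationDB: maps the trace event names
// seen during a stuck recovery_transaction phase to cases (a)-(c).
enum class BlockedStep {
    SpecialTransaction,     // (a) both events seen
    InitializingResolvers,  // (b) only SentTxnStateStoreToCommitProxies seen
    SendingTxnStateStore,   // (c) neither event seen
    Inconsistent            // assumption: InitializedAllResolvers alone should not occur
};

BlockedStep diagnoseRecoveryTransaction(const std::set<std::string>& seenEvents) {
    const bool sent = seenEvents.count("SentTxnStateStoreToCommitProxies") > 0;
    const bool initialized = seenEvents.count("InitializedAllResolvers") > 0;

    if (sent && initialized) return BlockedStep::SpecialTransaction;
    if (sent) return BlockedStep::InitializingResolvers;
    if (!initialized) return BlockedStep::SendingTxnStateStore;
    // The resolvers are initialized after TxnStateStore is sent, so seeing
    // only the later event suggests an incomplete or filtered log.
    return BlockedStep::Inconsistent;
}
```

In cases (b) and (c) the result names only the step that is known to be blocked; as noted above, the special transaction may be blocked as well.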
(2) Add "FirstProxy" to CommitProxyReplies events in newCommitProxies().
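With this field in place, proxy 0 could be read directly from the trace log. The sketch below assumes XML-format trace files in which each event is a single `<Event ... />` line with details stored as attributes. The helper names `traceAttribute` and `firstProxyOf` are hypothetical, and the exact format of the FirstProxy value is not specified here, so it is treated as an opaque string.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns the value of attribute `name` from one
// XML-style trace line, or nullopt if the attribute is absent.
// It does not unescape XML entities.
std::optional<std::string> traceAttribute(const std::string& line, const std::string& name) {
    const std::string key = " " + name + "=\"";
    const auto start = line.find(key);
    if (start == std::string::npos) return std::nullopt;
    const auto valueBegin = start + key.size();
    const auto valueEnd = line.find('"', valueBegin);
    if (valueEnd == std::string::npos) return std::nullopt;
    return line.substr(valueBegin, valueEnd - valueBegin);
}

// Returns the FirstProxy value, but only for CommitProxyReplies events.
std::optional<std::string> firstProxyOf(const std::string& line) {
    const auto type = traceAttribute(line, "Type");
    if (!type || *type != "CommitProxyReplies") return std::nullopt;
    return traceAttribute(line, "FirstProxy");
}
```

Calling `firstProxyOf` on each line of a trace file and keeping the non-empty results would list the proxy 0 chosen in each recovery, under the format assumptions stated above.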
Code-Reviewer Section
The general guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch or master (if this is the youngest branch)