Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trace event to recovery_transaction step in recovery (release-6.3) #5038

Merged
merged 1 commit into from Jun 22, 2021
Merged

Add trace event to recovery_transaction step in recovery (release-6.3) #5038

merged 1 commit into from Jun 22, 2021

Conversation

kakaiu
Copy link
Member

@kakaiu kakaiu commented Jun 21, 2021

Backporting #5015 to release 6.3.

Problem:
(1) When a recovery blocks at recovery_transaction phase and we do root cause analysis of it, it is often hard to distinguish which step in the phase blocks the recovery --- The recovery may block at the special transaction, or block at sending TxnStateStore to proxies, or block at initializing resolvers.
(2) In the root cause analysis, we may want to see which proxy is the proxy 0. Proxy 0 is the proxy that accepts special transactions that must keep serial order. However, the current way to identify the proxy 0 is somewhat tricky. We want to clearly see this information in the trace event.

Solution:
(1) Add two trace events --- SentTxnStateStoreToCommitProxies and InitializedAllResolvers to sendInitialCommitToResolvers() in masterserver.actor.cpp.
In recovery_transaction phase, the time order is:
The master sends TxnStateStore to proxies meanwhile issues the special transaction to proxy 0, then the master initializes resolvers, then the special transaction at the proxy 0 gets resolution replied from resolver.
When the recovery blocks at recovery_transaction phase, (a) if both events appear, then we can conclude that the special transaction must be blocked. (b) If only SentTxnStateStoreToCommitProxies appears, then at least initializing resolvers is blocked, and the special transaction may be blocked (at no later than when the proxy sending resolution request to resolver) (c) If none of the two events appears, then at least sending TxnStateStore to proxies is blocked, and the special transaction may be blocked (at no later than when the proxy sending resolution request to resolver).
Noting that in case (b) and (c), it is hard to determine whether the special transaction is blocked when the master issues transaction to proxy.
This is basically due to that the special transaction and initializing resolvers happen concurrently, but we have to check them in a serialized way.

(2) Add "FirstProxy" to CommitProxyReplies events in newCommitProxies().

Code-Reviewer Section

The general guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or master if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

Copy link
Contributor

@halfprice halfprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a cherry pick, maybe add the branch in the PR title?

@kakaiu kakaiu changed the title add trace event to recovery_transaction step in recovery Add trace event to recovery_transaction step in recovery (release 6.3) Jun 21, 2021
@kakaiu kakaiu changed the title Add trace event to recovery_transaction step in recovery (release 6.3) Add trace event to recovery_transaction step in recovery (release-6.3) Jun 21, 2021
@foundationdb-ci
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: foundationdb-pull-request-build
  • Commit ID: 223a0e5
  • Result: SUCCEEDED
  • Build Logs (available for 7 days)

@kakaiu kakaiu marked this pull request as ready for review June 21, 2021 23:09
@kakaiu kakaiu merged commit 904e1f8 into apple:release-6.3 Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants