
docs: add postReady bounded timeout + failure classification guide #122

Open
weicao wants to merge 1 commit into main from feature/addon-postready-bounded-timeout-classification

Contributor

weicao commented May 13, 2026

Summary

New methodology doc addon-postready-bounded-timeout-failure-classification-guide.md. It covers the case where an addon's DataProtection ActionSet.spec.restore.postReady issues a long-running engine command (e.g. CREATE STANDBY TENANT, secondary rebuild, second-stage restore), and describes how to bound that command and classify its failures.

Body (generic methodology, version-agnostic, no engine binding):

  • §1: Why unbounded postReady operations hang and what controller / runner / closeout lose when there is no classification
  • §2: 5 hard rules — outer timeout -k 1 N per client call; bounded retry (no while true); 5-layer failure classification (env / runner / control-plane / engine-error / engine-converge-timeout) with distinct exit codes 70–74 and reason strings; trap EXIT INT TERM cleanup; budget layering (caller > internal step bounded > per-step timeout)
  • §3: patch-gate 4 indicators (postReady_has_outer_timeout, postReady_no_while_true_unbounded, postReady_failure_reason_strings, postReady_trap_cleanup). These only prove that the fix is present in the live ActionSet; they do NOT prove product / functional / acceptance / release readiness
  • §4: 6-point PR review checklist
  • §5: 3 anti-pattern vs correct-pattern pairs (no outer timeout / while-true / unified exit 1)
  • §6: relation to addon-mysql-credential-hygiene-no-argv-guide.md, addon-bounded-eventual-convergence-guide.md, addon-test-acceptance-and-first-blocker-guide.md, addon-evidence-discipline-guide.md
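
The rules in §2 could look roughly like the following POSIX shell sketch. It is illustrative only: the order in which the five layers map onto exit codes 70–74, the reason strings, and the helper names (fail, bounded_call, wait_converged, run_post_ready, worst_case_seconds) are assumptions made for this example and are not taken from the guide. It also assumes GNU coreutils timeout is available.

```sh
#!/bin/sh
# Sketch only: the 70-74 ordering, reason strings and helper names are
# illustrative assumptions, not definitions taken from the guide.

EXIT_ENV=70; EXIT_RUNNER=71; EXIT_CONTROL_PLANE=72  # 72 is unused in this sketch
EXIT_ENGINE_ERROR=73; EXIT_CONVERGE_TIMEOUT=74

PER_CALL_TIMEOUT="${PER_CALL_TIMEOUT:-30}"  # seconds allowed per client call
MAX_ATTEMPTS="${MAX_ATTEMPTS:-20}"          # fixed retry budget, never "while true"
RETRY_SLEEP="${RETRY_SLEEP:-5}"             # seconds between probe attempts

# Print a reason string on stderr and return the layer's exit code.
fail() {
  echo "postReady_failure reason=$2 code=$1" >&2
  return "$1"
}

# Outer timeout per client call: TERM after N seconds, KILL one second later.
bounded_call() {
  timeout -k 1 "$PER_CALL_TIMEOUT" "$@"
}

# Bounded retry: run the probe at most MAX_ATTEMPTS times, then give up.
wait_converged() {
  attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if bounded_call "$@"; then
      return 0
    fi
    sleep "$RETRY_SLEEP"
    attempt=$((attempt + 1))
  done
  fail "$EXIT_CONVERGE_TIMEOUT" engine-converge-timeout
}

# Budget layering: upper bound in seconds that the caller's budget must exceed.
worst_case_seconds() {
  echo $(( (PER_CALL_TIMEOUT + 1) + MAX_ATTEMPTS * (PER_CALL_TIMEOUT + 1 + RETRY_SLEEP) ))
}

# run_post_ready <issue-command-string> <probe-command-string>
run_post_ready() {
  (
    issue_cmd=$1; probe_cmd=$2
    command -v timeout >/dev/null || { fail "$EXIT_ENV" env-timeout-missing; exit; }
    workdir=$(mktemp -d) || { fail "$EXIT_RUNNER" runner-no-workdir; exit; }
    # Cleanup runs on every exit path; INT and TERM are routed through EXIT.
    trap 'rm -rf "$workdir"' EXIT
    trap 'exit 130' INT
    trap 'exit 143' TERM

    bounded_call sh -c "$issue_cmd"; rc=$?
    case $rc in
      0) ;;
      124|137) fail "$EXIT_CONVERGE_TIMEOUT" engine-call-timeout; exit ;;  # timeout fired
      *) fail "$EXIT_ENGINE_ERROR" engine-error; exit ;;
    esac
    wait_converged sh -c "$probe_cmd" || exit
  )
}
```

A caller might invoke it as `run_post_ready ./issue-command.sh ./probe-state.sh` (both script names are hypothetical) and then branch on `$?`. Running the body in a subshell keeps the traps and the temporary directory scoped to one postReady invocation.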

Appendix A is the OceanBase enterprise addon oceanbase-physical-backup postReady CREATE STANDBY TENANT case (live sha b74912857084451adbf707166a82bd8f55efce5ca74f3b2ce964c65451fc9925) with an explicit boundary statement. The 6-sample observation across two runtime paths supports only that no hang re-occurred in those samples. It is NOT extrapolated to PITR full coverage / addon acceptance / release-ready / hang permanently eliminated.

SKILL-INDEX.md updated: added an entry under "### 1. 设计 / 开发新 addon" ("Design / develop a new addon") with a concise single-line description.

Test plan

  • Manual: methodology body has zero OB-specific commands (only mysql/obclient as generic MySQL-protocol client refs)
  • Manual: appendix is the only OB-specific section and contains explicit boundary statement
  • Manual: cross-doc references resolve

🤖 Generated with Claude Code

Methodology body covers:
- Why long postReady operations need bounded timeout vs. let-it-run
- 5 hard rules: outer timeout per client call, bounded retry not while-true,
  5-layer failure classification (env / runner / control-plane / engine /
  converge-timeout) with distinct exit codes 70-74, trap EXIT INT TERM cleanup,
  caller budget > internal step bounded > per-step timeout layering
- patch-gate 4 indicators (postReady_has_outer_timeout /
  postReady_no_while_true_unbounded / postReady_failure_reason_strings /
  postReady_trap_cleanup) only prove fix in place, NOT product / acceptance /
  release-ready
- 6-point PR review checklist
- 3 anti-pattern vs correct pattern pairs

Appendix A is OceanBase enterprise addon oceanbase-physical-backup postReady
CREATE STANDBY TENANT case (live sha b74912857084451adbf707166a82bd8f55efce5ca74f3b2ce964c65451fc9925)
with explicit "do not extrapolate" boundary; 6-sample observation across two
runtime paths supports only that no hang re-occurred in those samples, not
that hang can never re-occur.
Contributor Author

weicao commented May 13, 2026

Blocking for merge:

  1. PR body still contains 🤖 Generated with [Claude Code].... Public PR body must not include AI/tool attribution.
  2. The new guide intro has only 4 standard fields. Please add > **Affected by version skew**: ... after Applies to KB version.
  3. The guide links to addon-mysql-credential-hygiene-no-argv-guide.md, but that doc is not on main yet, and PR #121 ("docs: add mysql credential hygiene (passwords not in argv) guide") is currently blocked. Either remove/defer that sibling reference, or land/rebase after #121 is fixed and merged.

Direction looks useful; keep the appendix boundary narrow. After these are fixed I can re-run repo-level checks.
