Skip to content

Catch broken.inputGate inside an output gate broken in alarms.#6427

Merged
jqmmes merged 2 commits intomainfrom
joaquim/catch-alarms-user-throw-in-blockconcurrencywhile
Mar 26, 2026
Merged

Catch broken.inputGate inside an output gate broken in alarms.#6427
jqmmes merged 2 commits intomainfrom
joaquim/catch-alarms-user-throw-in-blockconcurrencywhile

Conversation

@jqmmes
Copy link
Copy Markdown
Contributor

@jqmmes jqmmes commented Mar 26, 2026

If a user throw() inside a blockConcurrencyWhile after issuing a storage.deleteAll() it will break the input gate and subsequently break the output gate. This means that these errors are considered internal errors instead of user error (which is the correct type).

This PR adds a special case verification in runAlarm to detect these cases, marking the error as isUserError correctly.

@jqmmes jqmmes requested review from a team as code owners March 26, 2026 13:15
Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catches broken.inputGateBroken errors inside alarm output-gate failure paths and reclassifies them as user errors so retries count against limits.

Findings (ranked by severity):

  1. [MEDIUM] Fragile string-prefix matching -- The detection relies on matching a string literal ("broken.inputGateBroken; ") generated in a different translation unit (io-context.h). If the annotation format in annotateBroken() or the tag in awaitIoImpl ever changes, this detection silently stops working and the bug returns. Consider extracting a shared constant or a dedicated detection helper.

  2. [MEDIUM] Duplicated detection logic -- The same stripRemoteExceptionPrefix + startsWith("broken.inputGateBroken; ") pattern appears in two separate catch blocks (lines 573-574 and 621-622). If the detection heuristic is refined, both sites must be updated in lockstep. A small helper function would reduce that risk.

  3. [LOW] No regression test -- This is a subtle behavioral change to alarm retry classification. A test that exercises the throw inside blockConcurrencyWhile after storage op scenario would prevent regressions.


This review was generated by an AI assistant and may contain inaccuracies.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 26, 2026

I'm Bonk, and I've done a quick review of your PR.

Summary: The PR adds special-case detection of broken.inputGateBroken errors in both alarm catch blocks so that when a user throws inside blockConcurrencyWhile after a storage operation, the resulting output gate breakage is correctly classified as a user error (retries count against limits).

Findings posted to the PR (ranked by severity):

  1. [MEDIUM] Fragile string-prefix matching -- The "broken.inputGateBroken; " literal is matched against a string generated by jsg::annotateBroken() in a different translation unit (io-context.h). If the annotation tag or the "; " delimiter ever changes, the detection silently breaks. Suggested extracting a shared constant or predicate.

  2. [MEDIUM] Duplicated detection logic -- The same strip-and-check pattern appears in two separate catch blocks (lines 573-574 and 621-622). Suggested extracting a small helper function so changes only need to happen in one place.

  3. [LOW] No regression test -- This is a subtle behavioral change. A test exercising the specific scenario (throw inside blockConcurrencyWhile after storage op during alarm) would prevent regressions.

No correctness bugs or safety issues were found -- the logic is sound and the kj::StringPtr lifetime from stripRemoteExceptionPrefix is safe since it borrows from either the description kj::String (first block) or e.getDescription() (second block), both of which remain alive through the check.

github run

@jqmmes jqmmes force-pushed the joaquim/catch-alarms-user-throw-in-blockconcurrencywhile branch from ba5b0b7 to c7523b3 Compare March 26, 2026 14:18
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Mar 26, 2026

Merging this PR will not alter performance

✅ 70 untouched benchmarks
⏩ 129 skipped benchmarks1


Comparing joaquim/catch-alarms-user-throw-in-blockconcurrencywhile (ba33127) with main (c2e2dbc)

Open in CodSpeed

Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jqmmes jqmmes force-pushed the joaquim/catch-alarms-user-throw-in-blockconcurrencywhile branch 2 times, most recently from 9342836 to 8a82892 Compare March 26, 2026 16:47
jqmmes added 2 commits March 26, 2026 16:47
If an user throw() inside a blockConcurrencyWhile after issuing a storage.deleteAll() it will break both output and input gates, when it should be only breaking input gates.
@jqmmes jqmmes force-pushed the joaquim/catch-alarms-user-throw-in-blockconcurrencywhile branch from 8a82892 to ba33127 Compare March 26, 2026 16:47
@jqmmes jqmmes requested a review from anna-cf March 26, 2026 16:48
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 26, 2026

I'm now checking one thing -- in the second catch block (line 622-636), when isInputGateBrokenByUser is true, the else branch (line 631-636) could still override shouldRetryCountsAgainstLimits to true if EXCEPTION_IS_USER_ERROR is set. But that's fine since it's already true. However, I notice that the else branch also runs when isTunneledException or isDoNotLogException is true -- those are conditions where the exception description has special prefixes. The isInputGateBrokenByUser check is NOT applied to this else branch. Let me think about whether this matters...

Actually, looking more carefully: if isInputGateBrokenByUser is true, then shouldRetryCountsAgainstLimits starts as true (line 621). The else branch on line 631 can set it to true again (line 635) which is a no-op. So there's no behavioral issue. The only question is whether a broken.inputGateBroken description could also have isTunneledException or isDoNotLogException return true -- but that would require the description to contain special substrings like worker_do_not_log, which is unlikely for this error path.

The code looks correct and clean. The previous feedback from myself and anna-cf has been addressed. Let me verify one more subtle point -- the function name isExceptionFromInputGateBroken and comment say "originated from user code throwing inside blockConcurrencyWhile", but technically the broken.inputGateBroken annotation is applied to ANY exception that fails the input gate in blockConcurrencyWhile, not just user throws. However, in the context of the alarm handler, this is always a user error because the user's alarm callback is what's running inside blockConcurrencyWhile. So the naming and usage are contextually appropriate.

The code is well-structured, addresses all previous review feedback, and the logic is correct. No actionable issues remain.

LGTM

github run

@jqmmes jqmmes requested a review from a team March 26, 2026 16:52
@jqmmes jqmmes enabled auto-merge March 26, 2026 17:47
@jqmmes jqmmes merged commit 9043f81 into main Mar 26, 2026
48 of 50 checks passed
@jqmmes jqmmes deleted the joaquim/catch-alarms-user-throw-in-blockconcurrencywhile branch March 26, 2026 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants