fix: fork closure in epoch proving jobs #23390

Merged: alexghr merged 1 commit into merge-train/spartan from ag/fix-mt on May 19, 2026

@alexghr alexghr commented May 19, 2026

Handle fork lifetime correctly around checkpoints that might be cancelled part way through processing.

@alexghr alexghr enabled auto-merge (squash) May 19, 2026 10:27
@spalladino spalladino left a comment

LGTM. Just left a few comments on possible refactors.

Comment on lines +345 to +346
// temporary stack to control fork lifetime
await using cleanup = new AsyncDisposableStack();
TIL
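
For readers who have not met the pattern either, here is a minimal sketch of what a temporary cleanup stack buys you. It is not the code from this PR: `CleanupStack` is a hand-rolled, hypothetical stand-in for the built-in `AsyncDisposableStack` (so the sketch also runs on runtimes that do not ship it), `try`/`finally` stands in for `await using`, and `Fork`, `openFork` and `prepareFork` are invented for illustration.

```typescript
// Hypothetical stand-in for the built-in AsyncDisposableStack.
class CleanupStack {
  private callbacks: Array<() => Promise<void>> = [];

  // Register a cleanup callback; callbacks run in reverse order on dispose.
  defer(callback: () => Promise<void>): void {
    this.callbacks.push(callback);
  }

  // Hand the registered callbacks to a new stack and leave this one empty
  // (mirrors what AsyncDisposableStack.move() does).
  move(): CleanupStack {
    const next = new CleanupStack();
    next.callbacks = this.callbacks;
    this.callbacks = [];
    return next;
  }

  async dispose(): Promise<void> {
    while (this.callbacks.length > 0) {
      await this.callbacks.pop()!();
    }
  }
}

// Hypothetical fork type: only close() matters for this sketch.
interface Fork {
  close(): Promise<void>;
}

// Records which forks were closed, so the behaviour can be observed.
const closed: string[] = [];
const openFork = (name: string): Fork => ({
  close: async () => {
    closed.push(name);
  },
});

// Opens a fork and does some work with it. If the work is cancelled part way
// through, the fork is closed before the error propagates. On success,
// ownership of the fork moves to the caller.
async function prepareFork(name: string, cancelled: boolean): Promise<CleanupStack> {
  const cleanup = new CleanupStack();
  try {
    const fork = openFork(name);
    cleanup.defer(() => fork.close());
    if (cancelled) {
      throw new Error(`cancelled while processing ${name}`);
    }
    return cleanup.move(); // success: the caller now owns the fork
  } finally {
    await cleanup.dispose(); // no-op after move(), closes the fork otherwise
  }
}
```

The point of the temporary stack is that there is exactly one place where ownership changes hands: every exit path before `move()` closes the fork, and every path after it leaves the fork to the new owner.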

Comment on lines +362 to +393
  private async processCheckpoints(
    parallelism: number,
    processCheckpoint: (checkpoint: Checkpoint) => Promise<void>,
  ): Promise<void> {
    let hasError = false;
    let firstError: unknown;

    await asyncPool(Math.max(parallelism, 1), this.checkpoints, async checkpoint => {
      if (hasError || this.abortController.signal.aborted) {
        return;
      }

      try {
        this.checkState();
        await processCheckpoint(checkpoint);
      } catch (err) {
        if (!hasError) {
          hasError = true;
          firstError = err;
          this.failProcessing();
        }
      }
    });

    if (hasError) {
      throw firstError;
    }

    if (this.abortController.signal.aborted) {
      this.checkState();
    }
  }
Sounds like this is something we could bake into the asyncPool helper directly?

@alexghr (Contributor, Author) replied:

I was wondering the same thing, but I didn't because the pool was forked from some repo.
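
To make the suggestion concrete, here is a rough sketch of what a pool helper with this behaviour built in could look like. It is not the `asyncPool` used in the PR (that one was forked from another repo and its internals are not shown here); `asyncPoolFailFast` and its signature are hypothetical.

```typescript
// Hypothetical helper: runs `fn` over `items` with at most `concurrency`
// tasks in flight. After the first failure, or once `signal` aborts, no new
// items are started. In-flight tasks are left to settle, then the first
// error is rethrown.
async function asyncPoolFailFast<T>(
  concurrency: number,
  items: readonly T[],
  fn: (item: T) => Promise<void>,
  signal?: AbortSignal,
): Promise<void> {
  let nextIndex = 0;
  let hasError = false;
  let firstError: unknown;

  // Each worker keeps pulling the next unprocessed item until told to stop.
  const worker = async (): Promise<void> => {
    while (nextIndex < items.length && !hasError && !signal?.aborted) {
      const item = items[nextIndex++];
      try {
        await fn(item);
      } catch (err) {
        // Only the first error is kept; later ones are dropped.
        if (!hasError) {
          hasError = true;
          firstError = err;
        }
      }
    }
  };

  const workerCount = Math.min(Math.max(concurrency, 1), items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  if (hasError) {
    throw firstError;
  }
  // Surface an external abort as an error as well.
  signal?.throwIfAborted();
}
```

With a helper along these lines, `processCheckpoints` would shrink to a single call plus whatever job-specific handling (such as `failProcessing()`) still needs to happen in the callback.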

Comment on lines +504 to 531
      return await execWithSignal(
        () => processFn(),
        processingSignal,
        signal =>
          signal.reason?.name === 'TimeoutError' ? new PublicProcessorTimeoutError() : new PublicProcessorAbortError(),
      );
    }

    private getProcessingSignal(tx: Tx, deadline: Date | undefined, signal: AbortSignal | undefined) {
      if (!deadline) {
        return signal;
      }

      const timeout = +deadline - this.dateProvider.now();
      if (timeout <= 0) {
        throw new PublicProcessorTimeoutError();
      }

      const txHash = tx.getTxHash();
      this.log.debug(`Processing tx ${txHash.toString()} within ${timeout}ms`, {
        deadline: deadline.toISOString(),
        now: new Date(this.dateProvider.now()).toISOString(),
        txHash,
      });

-     return await executeTimeout(
-       () => processFn(),
-       timeout,
-       () => new PublicProcessorTimeoutError(),
-     );
+     const timeoutSignal = AbortSignal.timeout(timeout);
+     return signal ? AbortSignal.any([signal, timeoutSignal]) : timeoutSignal;
    }
Seems like this is calling for an overload of executeTimeout or execWithSignal that has both timeout and signal?

@alexghr (Contributor, Author) replied:

Honestly, I'd remove executeTimeout in favour of execWithSignal now that we can use AbortSignal.timeout in recent node versions.
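
As an illustration of that direction, the sketch below folds the timeout into the signal, so a single signal-based helper covers both cases. The real `execWithSignal` is not shown in this thread, so `combineSignals`, `runWithSignal` and `toError` are hypothetical names with assumed signatures. Only `AbortSignal.timeout` and `AbortSignal.any` are standard APIs.

```typescript
// Hypothetical: merge an optional caller signal and an optional timeout (ms)
// into one signal, the same way getProcessingSignal does in the diff above.
function combineSignals(signal: AbortSignal | undefined, timeoutMs: number | undefined): AbortSignal | undefined {
  if (timeoutMs === undefined) {
    return signal;
  }
  const timeoutSignal = AbortSignal.timeout(timeoutMs);
  return signal ? AbortSignal.any([signal, timeoutSignal]) : timeoutSignal;
}

// Hypothetical stand-in for execWithSignal: resolves with fn()'s result, or
// rejects with toError(signal) as soon as the signal aborts. Note that fn
// itself is not cancelled; only the wait for it is abandoned.
function runWithSignal<T>(
  fn: () => Promise<T>,
  signal: AbortSignal | undefined,
  toError: (signal: AbortSignal) => Error,
): Promise<T> {
  if (!signal) {
    return fn();
  }
  if (signal.aborted) {
    return Promise.reject(toError(signal));
  }
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(toError(signal));
    signal.addEventListener('abort', onAbort, { once: true });
    fn()
      .then(resolve, reject)
      .finally(() => signal.removeEventListener('abort', onAbort));
  });
}

// AbortSignal.timeout aborts with a reason named 'TimeoutError', which lets
// the caller tell a deadline apart from an external abort.
const toError = (signal: AbortSignal): Error =>
  new Error(signal.reason?.name === 'TimeoutError' ? 'timed out' : 'aborted');
```

A call site would then look like `runWithSignal(() => processFn(), combineSignals(signal, timeout), toError)`, with no separate timeout-only helper to maintain.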

@AztecBot
Flakey Tests

🤖 says: This CI run detected 1 test that failed, but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/e9e076c2e7ca4d2b): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_p2p/sentinel_status_slash.parallel.test.ts "slashes the proposer with INACTIVITY when checkpoint validation records unvalidated" (204s) (code: 0) group:e2e-p2p-epoch-flakes

@alexghr alexghr merged commit b78eabf into merge-train/spartan May 19, 2026
15 checks passed
@alexghr alexghr deleted the ag/fix-mt branch May 19, 2026 11:12