Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea1a2a799d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
retry: sink::retry::RetryConfig {
    max_retries: max_retries as u32,
    retry_min_delay,
    retry_max_buffer_size: retry_max_buffer_size as usize,
    timeout,
    error_strategy,
Honor the configured BES timeout
The Starlark API and Workflows feature expose timeout/bes_timeout as an overall BES upload deadline, but this value is only stored in RetryConfig and is never read by Grpc::work, drive_stream, retry_lifecycle, or Client. Users setting a finite timeout still wait through all lifecycle calls, stream retries, and final upload indefinitely if the backend stalls or keeps retrying, so the new knob is effectively a no-op.
bessie_sinks.append(bazel.build_events.grpc(
    uri = bessie_endpoint,
-   metadata = {} # TODO: how does bessie authenticate?
+   metadata = {}, # TODO: how does bessie authenticate?
Within the deployment, it is just TLS so you can drop the TODO.
Address maintainer feedback on PR #1065:

1. Bound the gRPC sink's forwarder channel by retry_max_buffer_size. Previously the Tokio mpsc was unbounded, so events accumulated freely while drive_stream was sleeping between reconnects or replaying the retry buffer, defeating the very knob meant to cap memory. The forwarder now uses try_send and signals overflow via an atomic flag; drive_stream sees the closed receiver, checks the flag, and exits with BufferFull so error_strategy applies.
2. Wire the configured timeout. RetryConfig.timeout was previously stored but never read, so 'bes_timeout' was a no-op. Wrap the sink's work in tokio::time::timeout so finite deadlines actually bound lifecycle calls, stream retries, and the final upload, matching Bazel's --bes_timeout semantics.
3. Drop the TODO about bessie authentication on the workflows BES sink: within the deployment, transport security is plain TLS.
The gRPC sink uses tonic::Status::unavailable directly, but tonic was only listed in Cargo.toml. Bazel's rust_library deps are the authoritative source, so the bazel build of //:cli failed with 'use of unresolved module or unlinked crate `tonic`'. Add @crates//:tonic alongside the other tokio-family deps.
Aspect CI Status

| Task | Job | Result |
|---|---|---|
| 🟡 running | build (build-task) | in progress |
| 🟡 running | test (test-task) | in progress |
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eecf20f93d
// `tokio::time::timeout` mirrors Bazel's `--bes_timeout` and gives
// the user-set knob a real effect.
match timeout {
    Some(d) => match tokio::time::timeout(d, inner).await {
Start BES timeout after the build finishes
When timeout is set, this wraps the entire sink task from the moment the Bazel process is spawned, but work_inner spends most of a long build just waiting for BEP events from event_rx. Bazel documents --bes_timeout as the time to wait for BES/BEP upload after the build and tests have finished (https://bazel.build/versions/7.5.0/reference/command-line-reference), so a healthy build that runs longer than bes_timeout will terminate the sink early; with warn/ignore it silently drops the rest of the upload, and with abort/fail_at_end it can fail an otherwise successful build.
The bounded mpsc with try_send introduced in c7702e9 caused BES handling hangs in CI. Revert to the prior unbounded forwarder so the sink behaves the same as before that commit. The timeout wrapper from work() is kept — it's a pure no-op when 'bes_timeout' defaults to "0s"/None, and it actually wires up the knob when the user sets a finite deadline. The memory bound concern raised on PR #1065 still stands as a follow-up; bounding the forwarder safely needs more thought than the simple try_send/overflow flag did.
After sending last_message and half-closing the request side, drive_stream waited for the server to close the response stream before returning Done. Some BES backends keep the response stream open after the final ack, which made every successful upload hang indefinitely at end-of-build, visible in CI as build/test/lint/delivery jobs running until the 12-minute cancellation. Exit Done as soon as last_message_sent is true and the retry buffer drains. The same shortcut applies for the upstream-closed path: if we already half-closed without a last_message and the buffer drains, return UpstreamClosed without waiting for the server.
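The exit decision this commit describes can be written as a small pure function (a sketch: `exit_state` and its parameters mirror the state named in the commit message but are not the actual sink code):

```rust
#[derive(Debug, PartialEq)]
enum Exit {
    Done,
    UpstreamClosed,
    KeepWaiting,
}

// last_message_sent / upstream_closed / buffer_len mirror the state the
// commit message describes; the function name is illustrative.
fn exit_state(last_message_sent: bool, upstream_closed: bool, buffer_len: usize) -> Exit {
    if buffer_len > 0 {
        // Acks still outstanding: keep reading the response stream.
        return Exit::KeepWaiting;
    }
    if last_message_sent {
        // Half-closed after last_message and buffer drained: don't wait
        // for the server to close the response stream.
        Exit::Done
    } else if upstream_closed {
        // Half-closed because the upstream broadcaster went away.
        Exit::UpstreamClosed
    } else {
        Exit::KeepWaiting
    }
}

fn main() {
    println!("{:?}", exit_state(true, false, 0));
    println!("{:?}", exit_state(false, true, 0));
    println!("{:?}", exit_state(true, false, 3));
}
```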
Symptom: CI Build Task hits "153 Targets Built" then sits until the 20-minute job timeout. wait()/sink_handles.join() is blocked on the gRPC sink thread because drive_stream is parked in response_stream.next() after the request side was half-closed; the server never sent another ack and never closed the response stream. The previous "exit Done once buffer drains" only fired on an ack, so a server that goes silent post-half-close pinned the entire build.

Add a 30s hard deadline that arms when sender_opt is dropped (either after sending last_message or after the upstream broadcaster closed). If we hit it, return Done/UpstreamClosed; the events were already buffered into tonic before we dropped the sender, so further waiting only wedges the build.
Changes are visible to end-users: yes
Test plan