
Merge upstream main into our main#6

Merged
ConProgramming merged 644 commits into
GovSignals:mainfrom
triggerdotdev:main
Mar 25, 2026

Conversation

@amarbakir-govsignals

Closes #

✅ Checklist

  • I have followed every step in the contributing guide
  • The PR title follows the convention.
  • I ran and tested the code works

Testing

[Describe the steps you took to test this change]


Changelog

[Short description of what has changed]


Screenshots

[Screenshots]

💯

ericallam and others added 30 commits December 23, 2025 08:40
…2810)

This fixes a regression introduced in #2778 - stable sort is required
for deterministic builds, but we can safely preserve order for the user
package.json during package updates
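
For illustration, a minimal sketch of the distinction, assuming hypothetical `updateVersions` and `sortedDeps` helpers (not the CLI's actual code) and made-up version numbers. Updating values in place keeps the user's key order, while an explicit sort gives deterministic output for generated files:

```typescript
type Deps = Record<string, string>;

// Hypothetical helper: apply version bumps without reordering existing keys.
function updateVersions(deps: Deps, updates: Deps): Deps {
  const result: Deps = {};
  // Walk the existing keys first so their original order is preserved.
  for (const [name, version] of Object.entries(deps)) {
    result[name] = updates[name] ?? version;
  }
  // Append packages that were not present before.
  for (const [name, version] of Object.entries(updates)) {
    if (!(name in result)) result[name] = version;
  }
  return result;
}

// Deterministic alternative for generated files: sort keys explicitly.
function sortedDeps(deps: Deps): Deps {
  const result: Deps = {};
  const entries = Object.entries(deps).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [name, version] of entries) result[name] = version;
  return result;
}

// Illustrative input only.
const userDeps: Deps = { zod: "3.22.0", "@trigger.dev/sdk": "4.3.1", ai: "5.0.0" };
const updatedDeps = updateVersions(userDeps, { "@trigger.dev/sdk": "4.3.2" });
```

Here `updatedDeps` keeps `zod` first, exactly as the user wrote it, with only the SDK version changed.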
**Improvements to the run ID copy button and run navigation buttons for
consistency**

- Adds some x-padding and layout adjustment to the copy ID button.
<img width="664" height="114" alt="CleanShot 2025-12-19 at 16 05 21@2x"
src="https://github.com/user-attachments/assets/ebc8e0de-011b-419c-bdcc-eb4157553d1c"
/>

- New custom navigation icons that work better at tiny sizes
<img width="330" height="196" alt="CleanShot 2025-12-19 at 16 06 38@2x"
src="https://github.com/user-attachments/assets/bfd8d6b8-8a65-4eac-9ce1-d70acf0ad265"
/>

Some other small improvements/fixes:
- Fixes a browser html error where there was a <button> inside a
<button>
- Updates the shortcut description to match the tooltip text for
consistency
- Made the hover states more consistent
- The shortcut bar at the bottom snaps to the list sooner because there
are more items now
Fixes #2835

There were still some flags in here that we had removed. Deploying is a
lot simpler now for self-hosters.

Also updates the github actions guide.

---------

Co-authored-by: Claude <noreply@anthropic.com>
…negative (#2837)

Also improves the BatchTriggerError when it is the result of being rate
limited.
Very small UI improvement to increase the spacing between the right hand
table columns. Also a tooltip wording improvement.
- Adds 4 additional alert thresholds to ensure customers are emailed if
they have runaway usage.
- Separated these into a new section called "Spike alerts" with a
tooltip so it's clear what they are.
- Tooltip message is: "Catch runaway usage from bugs or errors. We
recommend keeping these enabled as a safety net."
- A billing service PR now returns all orgs to populate the email list,
rather than only the oldest 5.
- Adds `defaultChecked` logic to honour existing orgs who have
configured alerts in the DB. New orgs get all alerts checked on by
default.
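
A rough sketch of the `defaultChecked` rule described in the list above. The `OrgAlertConfig` shape and `isDefaultChecked` name are hypothetical, and how existing orgs are treated for thresholds they never saw before is an assumption here:

```typescript
interface OrgAlertConfig {
  // Alert levels already saved in the DB; undefined when the org never configured alerts.
  savedLevels?: number[];
}

// Hypothetical rule: new orgs get every alert checked, existing orgs keep what they saved.
function isDefaultChecked(level: number, config: OrgAlertConfig): boolean {
  if (config.savedLevels === undefined) return true;
  return config.savedLevels.includes(level);
}

const newOrg: OrgAlertConfig = {};
const existingOrg: OrgAlertConfig = { savedLevels: [75, 100] };
const newOrgChecked = isDefaultChecked(90, newOrg);
const existingOrgChecked = isDefaultChecked(90, existingOrg);
```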

<img width="1316" height="1560" alt="CleanShot 2026-01-05 at 09 43
56@2x"
src="https://github.com/user-attachments/assets/ce749407-2b7f-4864-9c09-9333c5ac495a"
/>
### UI Improvements to the Concurrency page: 

- Truncates long branch names and includes a tooltip
- The Tables have a new variant if you don't want the rows to highlight
on hover
- Small fix to pluralize some words in the purchase modal
- Fix to prevent tooltip buttons being `type=submit`
- Updates the /limits docs page to include purchasing more concurrency
- Adds a clear banner when you have a positive balance of unallocated
concurrency



https://github.com/user-attachments/assets/54d927c3-84e3-4d55-8f42-726098f4daf0
…er (#2804)

The Override concurrency limit modal has 2 type="submit" buttons.
Pressing the "enter" key fired the first one in the DOM, which canceled
and reset the limit instead of updating it. That is bad UX.

### The fix
This fix adds a hidden button above in the DOM order which mirrors the
Update Override button. Having a double submit button is rare in our
modals so feels safe to add this to the specific modal that needs it.

### Alternative solution
Switching the order of the buttons in the main FormButton component,
then using `flex-row-reverse` to flip them back in CSS works, but it
reverses the tab order. Adding a `tabIndex` to fix that issue didn't
seem to work reliably.
This PR fixes some issues with the new BatchQueue by implementing the
full two-phase dequeue process in the FairQueue, and moving the
responsibility of consuming the worker queue to the BatchQueue and
independently enabling it via the `BATCH_QUEUE_WORKER_QUEUE_ENABLED` env
var. We've also introduced the `BATCH_QUEUE_SHARD_COUNT` env var to
control the count of master queue shards in the FairQueue. We can also
control how many queues are considered in each iteration of the master
queue consumer via the `BATCH_QUEUE_MASTER_QUEUE_LIMIT` env var.

This PR will also now skip trying to dequeue from tenants that are at
concurrency capacity, which should lead to fewer issues with low
concurrency tenants blocking higher concurrency tenants from processing.
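
A simplified sketch of the capacity check, with hypothetical types and function names (this is not the FairQueue code). It assumes the master queue limit is applied to the window first and that tenants at capacity are then skipped:

```typescript
interface QueueEntry {
  queueId: string;
  tenantId: string;
}

interface TenantState {
  running: number; // messages currently in flight for the tenant
  concurrencyLimit: number; // maximum allowed in flight
}

// Hypothetical selection step for one iteration of the master queue consumer.
function selectQueues(
  masterQueue: QueueEntry[],
  tenants: Map<string, TenantState>,
  masterQueueLimit: number
): QueueEntry[] {
  // Only the first `masterQueueLimit` entries are considered per iteration.
  const considered = masterQueue.slice(0, masterQueueLimit);
  // Skip tenants that are unknown or already at concurrency capacity.
  return considered.filter((entry) => {
    const tenant = tenants.get(entry.tenantId);
    return tenant !== undefined && tenant.running < tenant.concurrencyLimit;
  });
}

const tenantStates = new Map<string, TenantState>([
  ["tenant-a", { running: 2, concurrencyLimit: 2 }],
  ["tenant-b", { running: 0, concurrencyLimit: 10 }],
]);
const masterQueue: QueueEntry[] = [
  { queueId: "a:1", tenantId: "tenant-a" },
  { queueId: "b:1", tenantId: "tenant-b" },
  { queueId: "b:2", tenantId: "tenant-b" },
];
const eligibleQueues = selectQueues(masterQueue, tenantStates, 2);
```

With a limit of 2, `tenant-a` is skipped because it is at capacity, so only `b:1` is dequeued from in this iteration.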
Add colored console warnings when the event loop is blocked and wire
a feature flag to enable/disable notifications.

- Introduce notifyEventLoopBlocked() in eventLoopMonitor.server.ts to
  log a colored warning with the blocked duration and async type.
- Call notifyEventLoopBlocked() when an event-loop stall is detected.
- Add EVENT_LOOP_MONITOR_NOTIFY_ENABLED to env schema with a default of
  "0" so notifications are off by default.
- Notifies when a stall exceeds the threshold set by the `EVENT_LOOP_MONITOR_THRESHOLD_MS` env var

This makes it easier to spot long event-loop stalls during development
or when notifications are explicitly enabled.
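
As a sketch of the gating logic only: `shouldNotify` and `formatWarning` are hypothetical names, the 100 ms fallback threshold is an assumption, and treating "1" as the enabled value is inferred from the "0" default:

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical gate: decide whether a detected stall should produce a warning.
function shouldNotify(blockedMs: number, env: Env): boolean {
  // Off by default: the flag defaults to "0".
  const enabled = (env["EVENT_LOOP_MONITOR_NOTIFY_ENABLED"] ?? "0") === "1";
  // The 100 ms fallback is an assumption, not the project's default.
  const thresholdMs = Number(env["EVENT_LOOP_MONITOR_THRESHOLD_MS"] ?? "100");
  return enabled && blockedMs > thresholdMs;
}

// Wraps the message in ANSI yellow so it stands out in a terminal.
function formatWarning(blockedMs: number, asyncType: string): string {
  return `\x1b[33m[event-loop] blocked for ${blockedMs.toFixed(1)}ms (${asyncType})\x1b[0m`;
}

const enabledEnv: Env = {
  EVENT_LOOP_MONITOR_NOTIFY_ENABLED: "1",
  EVENT_LOOP_MONITOR_THRESHOLD_MS: "200",
};
const warning = shouldNotify(250, enabledEnv) ? formatWarning(250, "PROMISE") : undefined;
```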

<img width="840" height="132" alt="CleanShot 2026-01-07 at 15 03 24@2x"
src="https://github.com/user-attachments/assets/be20fa6a-be2b-46a1-aa89-d0913ed8b5b3"
/>
Remove the spline dependency and load it dynamically on the 404 page.

Uses a web component with a workaround for React.
Updated the billing alerts documentation to reflect the new spike alerts
feature added in PR #2829. The documentation now explains both standard
alerts (75%, 90%, 100%, 200%, 500%) and spike alerts (10x, 20x, 50x,
100x) that help catch runaway usage from bugs or errors.

## Files changed
- `docs/how-to-reduce-your-spend.mdx` - Added section explaining the two
types of billing alerts

Generated from [Chore(webapp): Adds additional billing
alerts](#2829) @samejr

Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and publish to npm
yourself or [setup this action to publish
automatically](https://github.com/changesets/action#with-publishing). If
you're not ready to do a release yet, that's fine, whenever you add more
changesets to main, this PR will be updated.


# Releases
## @trigger.dev/build@4.3.2

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.3.2`

## trigger.dev@4.3.2

### Patch Changes

- fix(cli): update command should preserve existing package.json order
([#2810](#2810))
-   Updated dependencies:
    -   `@trigger.dev/build@4.3.2`
    -   `@trigger.dev/core@4.3.2`
    -   `@trigger.dev/schema-to-json@4.3.2`

## @trigger.dev/python@4.3.2

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/sdk@4.3.2`
    -   `@trigger.dev/build@4.3.2`
    -   `@trigger.dev/core@4.3.2`

## @trigger.dev/react-hooks@4.3.2

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.3.2`

## @trigger.dev/redis-worker@4.3.2

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.3.2`

## @trigger.dev/rsc@4.3.2

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.3.2`

## @trigger.dev/schema-to-json@4.3.2

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.3.2`

## @trigger.dev/sdk@4.3.2

### Patch Changes

- Improve batch trigger error messages, especially when rate limited
([#2837](#2837))
-   Updated dependencies:
    -   `@trigger.dev/core@4.3.2`

## @trigger.dev/core@4.3.2

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…le state (#2847)

Replaces `redirectDocument` with `useFetcher` for editing environment
variables. This allows background form submission without full page
reload, which preserves:

- Scroll position in the env vars list
- "Reveal values" toggle state
- Search filter state

Fixes #2845

Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Eric Allam <ericallam@users.noreply.github.com>
ericallam and others added 28 commits March 11, 2026 06:33
…3208)

Deprecates the syncVercelEnvVars build extension and adds warnings in
both the Vercel integration docs and the extension's own page to prevent
conflicts with the native env var sync
## Adds 2 self serve features

### 1. self serve preview branches

- Copies the patterns of the self serve concurrency
- Self serve only available on Pro plan (otherwise you are linked to the
billing plans page)
- Global self serve branches limit: 180 (+20 for the Pro plan). It can
be overridden per Org
- You need to archive branches before reducing the number of extra
branches you're paying for
- Branches are removed immediately but remain billed until the end of
the billing cycle like extra concurrency

### 2. self serve team members

- Copies the patterns of the self serve concurrency
- Self serve only available on Pro plan (otherwise you are linked to the
billing plans page)
- Global self serve members is unlimited but can be limited with the
same env var quota and overridden per org if needed
- You need to remove team members before reducing the number of members
you pay for
- Team members are removed immediately but remain billed until the end
of the billing cycle like extra concurrency
…hards (#3219)

Queues with concurrency keys now appear as a single entry in the master
queue instead of one entry per key. This prevents high-CK-count tenants
from consuming the entire `parentQueueLimit` window and starving other
tenants on the same shard.

A new per-queue **CK index** (sorted set) tracks active concurrency key
sub-queues. The master queue gets one `:ck:*` wildcard entry per base
queue. Dequeuing from that entry round-robins across sub-queues,
maintaining per-CK concurrency tracking and fairness.

All existing operations (enqueue, dequeue, ack, nack, DLQ, TTL expiry)
are CK-index-aware and keep the index consistent. Old-format entries
drain naturally during rollout — no migration step needed, single
deploy.
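
To make the round-robin behaviour concrete, here is an in-memory sketch. The `CkQueue` class is hypothetical, an array stands in for the Redis sorted set, and per-CK concurrency tracking is left out:

```typescript
class CkQueue {
  baseQueue: string;
  // Stand-in for the per-queue CK index (a sorted set in the description above).
  private ckIndex: string[] = [];
  private subQueues = new Map<string, string[]>();

  constructor(baseQueue: string) {
    this.baseQueue = baseQueue;
  }

  // The single wildcard entry this queue contributes to the master queue.
  get masterQueueEntry(): string {
    return `${this.baseQueue}:ck:*`;
  }

  enqueue(concurrencyKey: string, messageId: string): void {
    let sub = this.subQueues.get(concurrencyKey);
    if (sub === undefined) {
      sub = [];
      this.subQueues.set(concurrencyKey, sub);
      this.ckIndex.push(concurrencyKey); // newly active key joins the rotation
    }
    sub.push(messageId);
  }

  // Take one message from the key at the front, then rotate that key to the back.
  dequeue(): string | undefined {
    const key = this.ckIndex.shift();
    if (key === undefined) return undefined;
    const sub = this.subQueues.get(key);
    const message = sub?.shift();
    if (sub !== undefined && sub.length > 0) {
      this.ckIndex.push(key); // still has work, so it stays in the index
    } else {
      this.subQueues.delete(key); // drained, so it leaves the index
    }
    return message;
  }
}

const emails = new CkQueue("emails");
emails.enqueue("user-a", "a1");
emails.enqueue("user-a", "a2");
emails.enqueue("user-a", "a3");
emails.enqueue("user-b", "b1");
const dequeueOrder = [emails.dequeue(), emails.dequeue(), emails.dequeue(), emails.dequeue()];
```

Even though `user-a` has three messages queued ahead of `user-b`, the order comes out as `a1`, `b1`, `a2`, `a3`, and the master queue only ever sees the one `emails:ck:*` entry.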
## Summary

Major expansion of the MCP server (14 → 25 tools), context efficiency
optimizations, new API endpoints, and a fix for the dev CLI leaking
build directories on disk.

### New MCP tools

- **Query & analytics**: `get_query_schema`, `query`, `list_dashboards`,
`run_dashboard_query` — query your data using TRQL directly from AI
assistants
- **Profile management**: `whoami`, `list_profiles`, `switch_profile` —
see and switch CLI profiles per-project (persisted to
`.trigger/mcp.json`)
- **Dev server control**: `start_dev_server`, `stop_dev_server`,
`dev_server_status` — start/stop `trigger dev` and stream build output
- **Task introspection**: `get_task_schema` — get payload schema for a
specific task (split out from `get_current_worker` to reduce context)

### New API endpoints

- `GET /api/v1/query/schema` — discover TRQL tables and columns
(server-driven, multi-table)
- `GET /api/v1/query/dashboards` — list built-in dashboard widgets and
their queries

### New features

- **`--readonly` flag** — hides write tools (`deploy`, `trigger_task`,
`cancel_run`) so agents can't make changes
- **`read:query` JWT scope** — new authorization scope for query
endpoints, with per-table granularity (`read:query:runs`,
`read:query:llm_metrics`, etc.)
- **Paginated trace output** — `get_run_details` now paginates trace
events via cursor, caching the full trace in a temp file so subsequent
pages don't re-fetch
- **MCP tool annotations** — all tools now have
`readOnlyHint`/`destructiveHint` annotations for clients that support
them
- **Project-scoped profile persistence** — `switch_profile` saves to
`.trigger/mcp.json` (gitignored), automatically loaded on next MCP
server start
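
One way `--readonly` could be derived from the tool annotations, sketched with a hypothetical `visibleTools` helper. Whether the server actually filters on `readOnlyHint` is an assumption, and the annotation values below are illustrative:

```typescript
interface ToolDefinition {
  name: string;
  annotations: { readOnlyHint: boolean; destructiveHint: boolean };
}

// Hypothetical filter: in readonly mode only tools marked read-only are exposed.
function visibleTools(tools: ToolDefinition[], readonly: boolean): ToolDefinition[] {
  return readonly ? tools.filter((tool) => tool.annotations.readOnlyHint) : tools;
}

// Illustrative annotations; the real values may differ.
const allTools: ToolDefinition[] = [
  { name: "deploy", annotations: { readOnlyHint: false, destructiveHint: false } },
  { name: "trigger_task", annotations: { readOnlyHint: false, destructiveHint: false } },
  { name: "cancel_run", annotations: { readOnlyHint: false, destructiveHint: true } },
  { name: "query", annotations: { readOnlyHint: true, destructiveHint: false } },
  { name: "whoami", annotations: { readOnlyHint: true, destructiveHint: false } },
];
const readonlyToolNames = visibleTools(allTools, true).map((tool) => tool.name);
```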

### Context optimizations

- `get_query_schema` requires a table name — returns one table's schema
instead of all tables (60-80% fewer tokens)
- `get_current_worker` no longer inlines payload schemas — use
`get_task_schema` for specific tasks
- Query results formatted as text tables instead of JSON (~50% fewer
tokens for flat data)
- `cancel_run`, `list_deploys`, `list_preview_branches` formatted as
text instead of raw `JSON.stringify()`
- Schema and dashboard API responses cached (1hr and 5min respectively)
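
A minimal sketch of the text-table idea for flat query results. `formatAsTextTable` is a hypothetical helper, not the MCP server's formatter, and the sample rows are made up:

```typescript
type Cell = string | number | boolean | null;
type Row = Record<string, Cell>;

// Renders flat rows as aligned, pipe-separated text with one header line.
function formatAsTextTable(rows: Row[]): string {
  const first = rows[0];
  if (first === undefined) return "(no rows)";
  const columns = Object.keys(first);
  const cells = rows.map((row) => columns.map((c) => String(row[c] ?? "")));
  // Each column is as wide as its longest value or its header.
  const widths = columns.map((c, i) =>
    Math.max(c.length, ...cells.map((line) => (line[i] ?? "").length))
  );
  const render = (line: string[]): string =>
    line.map((cell, i) => cell.padEnd(widths[i] ?? 0)).join(" | ").trimEnd();
  return [render(columns), ...cells.map(render)].join("\n");
}

const sampleRows: Row[] = [
  { status: "COMPLETED", runs: 120 },
  { status: "FAILED", runs: 3 },
];
const textTable = formatAsTextTable(sampleRows);
```

Column names appear once in the header instead of being repeated in every row, which is where the savings over JSON come from.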

### Bug fixes

- Fixed `search_docs` failing due to renamed upstream Mintlify tool
(`SearchTriggerDev` → `search_trigger_dev`)
- Fixed `list_deploys` failing when deployments have null
`runtime`/`runtimeVersion` fields (fixes #3139)
- Fixed `list_preview_branches` crashing due to incorrect response shape
access
- Fixed `metrics` table column documented as `value` instead of
`metric_value` in query docs
- Fixed `/api/v1/query` not accepting JWT auth (added `allowJWT: true`)

### Dev CLI build directory fix

The dev CLI was leaking `build-*` directories in `.trigger/tmp/` on
every rebuild, accumulating hundreds of MB over time (842MB observed).
Three layers of protection added:

1. **During session**: deprecated workers are pruned (capped at 2
retained) when no active runs reference them, preventing unbounded
accumulation
2. **On SIGKILL/crash**: the watchdog process now cleans up
`.trigger/tmp/` when it detects the parent CLI was killed
3. **On next startup**: existing `clearTmpDirs()` wipes any remaining
orphans
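
The in-session pruning rule (layer 1) could look roughly like this. `DevWorker` and `pruneDeprecatedWorkers` are hypothetical, and the sketch assumes the cap of 2 applies to deprecated workers with no active runs, oldest removed first:

```typescript
interface DevWorker {
  id: string;
  buildDir: string;
  deprecated: boolean;
  activeRuns: number;
}

const MAX_RETAINED_DEPRECATED = 2; // cap stated in the description above

// Returns the workers to keep and the build directories that can be deleted.
// `workers` is assumed to be ordered oldest first.
function pruneDeprecatedWorkers(workers: DevWorker[]): { kept: DevWorker[]; removableDirs: string[] } {
  const idleDeprecated = workers.filter((w) => w.deprecated && w.activeRuns === 0);
  const excess = Math.max(0, idleDeprecated.length - MAX_RETAINED_DEPRECATED);
  const toRemove = new Set(idleDeprecated.slice(0, excess).map((w) => w.id));
  return {
    kept: workers.filter((w) => !toRemove.has(w.id)),
    removableDirs: workers.filter((w) => toRemove.has(w.id)).map((w) => w.buildDir),
  };
}

// Illustrative paths only.
const devWorkers: DevWorker[] = [
  { id: "w1", buildDir: ".trigger/tmp/build-1", deprecated: true, activeRuns: 0 },
  { id: "w2", buildDir: ".trigger/tmp/build-2", deprecated: true, activeRuns: 1 },
  { id: "w3", buildDir: ".trigger/tmp/build-3", deprecated: true, activeRuns: 0 },
  { id: "w4", buildDir: ".trigger/tmp/build-4", deprecated: true, activeRuns: 0 },
  { id: "w5", buildDir: ".trigger/tmp/build-5", deprecated: false, activeRuns: 0 },
];
const pruneResult = pruneDeprecatedWorkers(devWorkers);
```

Only `w1` is pruned here: `w2` still has an active run, `w3` and `w4` fit within the cap, and `w5` is the current worker.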

## Test plan

- [ ] `pnpm run mcp:smoke` — 17 automated smoke tests for all read-only
MCP tools
- [ ] `pnpm run mcp:test list` — verify 25 tools registered (21 in
`--readonly` mode)
- [ ] `pnpm run mcp:test --readonly list` — verify write tools hidden
- [ ] Manual: start dev server, trigger task, rebuild multiple times,
verify build dirs stay capped at 4
- [ ] Manual: SIGKILL the dev CLI, verify watchdog cleans up
`.trigger/tmp/`
- [ ] Verify new API endpoints return correct data: `GET
/api/v1/query/schema`, `GET /api/v1/query/dashboards`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…ock contention (#3232)

When processing batchTriggerAndWait items, each batch item was acquiring
a Redis lock on the parent run to insert a TaskRunWaitpoint row. With
high concurrency (processingConcurrency=50), this caused
LockAcquisitionTimeoutError (880 errors/24h in prod), orphaned runs, and
stuck parent runs.

Since blockRunWithCreatedBatch already transitions the parent to
EXECUTING_WITH_WAITPOINTS before items are processed, the per-item lock
is unnecessary. The new blockRunWithWaitpointLockless method performs
only the idempotent CTE insert and timeout scheduling without acquiring
the lock.
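
A toy model of why the per-item lock can go. The `ParentRun` class and its method names are hypothetical, and a `Set` stands in for the TaskRunWaitpoint table; the state transition happens once up front and each per-item insert is safe to repeat:

```typescript
type RunStatus = "EXECUTING" | "EXECUTING_WITH_WAITPOINTS";

class ParentRun {
  status: RunStatus = "EXECUTING";
  // Stand-in for TaskRunWaitpoint rows, keyed by waitpoint id.
  private waitpoints = new Set<string>();

  // Done once, when the batch is created (the step that still needs the lock).
  blockWithCreatedBatch(): void {
    this.status = "EXECUTING_WITH_WAITPOINTS";
  }

  // Per-item step: no lock, and repeating it for the same waitpoint changes nothing.
  blockWithWaitpointLockless(waitpointId: string): boolean {
    if (this.waitpoints.has(waitpointId)) return false; // already recorded
    this.waitpoints.add(waitpointId);
    return true;
  }

  get waitpointCount(): number {
    return this.waitpoints.size;
  }
}

const parentRun = new ParentRun();
parentRun.blockWithCreatedBatch();
// "wp_1" arrives twice, for example after a retry.
const insertResults = ["wp_1", "wp_2", "wp_1"].map((id) => parentRun.blockWithWaitpointLockless(id));
```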
- Automatic LLM cost enrichment for AI SDK spans (streamText,
generateText, generateObject) or any other spans that use semantic
gen_ai attributes with support for 145+ models
- New AI span inspector sidebar showing model, tokens, cost, messages,
tool calls, and response text
- LLM metrics dual-write to ClickHouse `llm_metrics_v1` table for
analytics
- LLM metrics built-in dashboard (unlinked at the moment)
- Provider cost fallback — uses gateway/OpenRouter reported costs from
`providerMetadata` when registry pricing is unavailable
- Prefix-stripping for gateway/OpenRouter model names (e.g.
`mistral/mistral-large-3` matches `mistral-large-3` pricing)
- Admin dashboard for managing LLM model pricing (list, create, edit,
delete, search, test pattern matching)
- Missing models detection page — queries ClickHouse for unpriced models
with sample spans and Claude Code-ready prompts for adding pricing
- AI span seed script (`pnpm run db:seed:ai-spans`) with 51 spans across
12 provider systems for local dev testing
- UI fixes: `completionTokens`/`promptTokens` aliases,
`ai.response.object` display for generateObject, cache read/write token
breakdown

## Screenshots:

<img width="1030" height="104" alt="CleanShot 2026-03-17 at 16 48 54@2x"
src="https://github.com/user-attachments/assets/bc8fccda-e48b-4d0c-bfb1-e620064e5979"
/>

<img width="1094" height="1512" alt="CleanShot 2026-03-17 at 16 49
23@2x"
src="https://github.com/user-attachments/assets/c2424569-d07e-4d67-a436-e8250043a1ee"
/>

<img width="1074" height="1412" alt="CleanShot 2026-03-17 at 16 49
18@2x"
src="https://github.com/user-attachments/assets/22342ac4-4769-45d1-a328-a24fb9a82a50"
/>

<img width="1012" height="2292" alt="CleanShot 2026-03-17 at 16 39
01@2x"
src="https://github.com/user-attachments/assets/59e327d1-6652-4293-8be0-bb8326e5fbc5"
/>

<img width="3680" height="2392" alt="CleanShot 2026-03-15 at 08 29
38@2x"
src="https://github.com/user-attachments/assets/1f77beb8-de67-495b-b890-bcdb8d7f1fe8"
/>

---------

Co-authored-by: James Ritchie <james@trigger.dev>
- Automatically impersonate a run when visiting /runs/<run_id> if an
admin is logged in
- Clear existing impersonation when switching
)

- Full prompt management UI: list, detail, override, and version
management for AI prompts defined with `prompts.define()`
- Rich AI span inspectors for all AI SDK operations with token usage,
messages, and prompt context
- Real-time generation tracking with live polling and filtering

## Prompt management

Define prompts in your code with `prompts.define()`, then manage
versions and overrides from the dashboard without redeploying:

```typescript
import { task, prompts } from "@trigger.dev/sdk";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const supportPrompt = prompts.define({
  id: "customer-support",
  model: "gpt-4o",
  variables: z.object({
    customerName: z.string(),
    plan: z.string(),
    issue: z.string(),
  }),
  content: `You are a support agent for Acme SaaS.
Customer: {{customerName}} ({{plan}} plan)
Issue: {{issue}}
Respond with empathy and precision.`,
});

export const supportTask = task({
  id: "handle-support",
  run: async (payload) => {
    const resolved = await supportPrompt.resolve({
      customerName: payload.name,
      plan: payload.plan,
      issue: payload.issue,
    });

    const result = await generateText({
      model: openai(resolved.model ?? "gpt-4o"),
      system: resolved.text,
      prompt: payload.issue,
      ...resolved.toAISDKTelemetry(),
    });

    return { response: result.text };
  },
});
```

The prompts list page shows each prompt with its current version, model,
override status, and a usage sparkline over the last 24 hours.

From the prompt detail page you can:

- **Create overrides** to change the prompt template or model without
redeploying. Overrides take priority over the deployed version when
`prompt.resolve()` is called.
- **Promote** any code-deployed version to be the current version
- **Browse generations** across all versions with infinite scroll and
live polling for new results
- **Filter** by version, model, operation type, and provider
- **View metrics** (total generations, avg tokens, avg cost, latency)
broken down by version
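
To illustrate the override priority, a standalone sketch that does not use the SDK. `resolvePrompt` and `renderTemplate` are hypothetical, and falling back to the deployed model when an override sets none is an assumption:

```typescript
interface PromptVersion {
  template: string;
  model?: string;
}

// Replaces {{name}} placeholders; unknown placeholders are left untouched.
function renderTemplate(template: string, variables: Record<string, string | undefined>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) => variables[name] ?? match);
}

// Hypothetical resolution rule: an override, when present, wins over the deployed version.
function resolvePrompt(
  deployed: PromptVersion,
  override: PromptVersion | undefined,
  variables: Record<string, string | undefined>
): { text: string; model: string | undefined } {
  const active = override ?? deployed;
  return {
    text: renderTemplate(active.template, variables),
    model: active.model ?? deployed.model,
  };
}

const deployedVersion: PromptVersion = { template: "Customer: {{customerName}}", model: "gpt-4o" };
const dashboardOverride: PromptVersion = { template: "Hello {{customerName}}!" };
const withoutOverride = resolvePrompt(deployedVersion, undefined, { customerName: "Ada" });
const withOverride = resolvePrompt(deployedVersion, dashboardOverride, { customerName: "Ada" });
```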

## AI span inspectors

Every AI SDK operation now gets a custom inspector in the run trace
view:

- **`ai.generateText` / `ai.streamText`** — Shows model, token usage,
cost, the full message thread (system prompt, user message, assistant
response), and linked prompt details
- **`ai.generateObject` / `ai.streamObject`** — Same as above plus the
JSON schema and structured output
- **`ai.toolCall`** — Shows tool name, call ID, and input arguments
- **`ai.embed`** — Shows model and the text being embedded

For generation spans linked to a prompt, a "Prompt" tab shows the prompt
metadata, the input variables passed to `resolve()`, and the template
content from the prompt version.

All AI span inspectors include a compact timestamp and duration header.

## Other improvements

- Resizable panel sizes now persist across page refreshes (patched
`@window-splitter/state` to fix snapshot restoration)
- Run page panels also persist their sizes
- Fixed `<div>` inside `<p>` DOM nesting warnings in span titles and
chat messages
- Added Operations and Providers filters to the AI metrics dashboard

## Screenshots

<img width="3680" height="2392" alt="CleanShot 2026-03-21 at 10 14
17@2x"
src="https://github.com/user-attachments/assets/f3e59989-a2fa-4990-a9d0-3cacda431868"
/>

<img width="3680" height="2392" alt="CleanShot 2026-03-21 at 10 15
37@2x"
src="https://github.com/user-attachments/assets/2f2d02df-2d2b-44fb-ac6f-9153f6a6c387"
/>

<img width="3680" height="2392" alt="CleanShot 2026-03-21 at 10 15
54@2x"
src="https://github.com/user-attachments/assets/baa161e0-ef91-4fa4-a55f-986b71cccdf0"
/>
Adds an `annotations` JSONB column to task runs that captures where and
how each run was triggered.
This enables filtering and analyzing trigger origins without querying up
the run tree. Also enables making scheduling decisions based on the
trigger source, e.g., use separate affinities for scheduled runs.

Each run records:
- **triggerSource**: who initiated it (sdk, api, dashboard, cli, mcp,
schedule)
- **triggerAction**: what kind of action (trigger, replay, test)
- **rootTriggerSource**: the trigger source of the root ancestor,
propagated through the entire run tree
- **rootScheduleId**: schedule id, in case the run tree was triggered
from a schedule

Currently the main motivation for annotations is to determine whether a
run is part of a schedule-originated tree without traversing ancestors.

### A couple of design considerations
- **Decoupled source from method**: triggerSource and triggerAction are
separate fields to avoid combinatorial explosion (every new source ×
every new action)
- **Server-side first**: all annotation values are primarily determined
on the server; only a minor SDK change is needed
- **Forward-compatible**: annotation fields use
`z.enum([...]).or(anyString)` so new values can be added without
breaking validation; we currently don't need an explicit version field
for annotations.
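
A type-level sketch of the annotation shape, using plain TypeScript rather than the zod schema. The helper names are hypothetical, and `string & {}` is used here as a rough analogue of `.or(anyString)`:

```typescript
// Known values listed above; `string & {}` keeps unknown future values assignable.
type TriggerSource = "sdk" | "api" | "dashboard" | "cli" | "mcp" | "schedule" | (string & {});
type TriggerAction = "trigger" | "replay" | "test" | (string & {});

interface RunAnnotations {
  triggerSource: TriggerSource;
  triggerAction: TriggerAction;
  rootTriggerSource: TriggerSource;
  rootScheduleId?: string;
}

// Hypothetical propagation: a child records its own source and action,
// and inherits the root fields from its parent.
function childAnnotations(
  parent: RunAnnotations,
  triggerSource: TriggerSource,
  triggerAction: TriggerAction
): RunAnnotations {
  return { ...parent, triggerSource, triggerAction };
}

// No ancestor traversal needed: the root source is on every run.
function isScheduleOriginated(annotations: RunAnnotations): boolean {
  return annotations.rootTriggerSource === "schedule";
}

const rootRun: RunAnnotations = {
  triggerSource: "schedule",
  triggerAction: "trigger",
  rootTriggerSource: "schedule",
  rootScheduleId: "sched_123", // illustrative id
};
const childRun = childAnnotations(rootRun, "sdk", "trigger");
```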

Note: `metadata` would have been a more fitting name for the db column,
as it is consistent with other tables where we store this type of
information. It is already in use to store user metadata though, so we
go with `annotations` instead.
Returns the fully detailed span with attributes and AI enrichment data
### Lots of UI improvements to the Prompts pages: 

#### New side menu icons
<img width="234" height="110" alt="CleanShot 2026-03-25 at 14 26 33"
src="https://github.com/user-attachments/assets/8039ee8f-92a0-477d-ac91-458dfa43020b"
/>

#### Compact horizontal start/finish times so scanning the generations list is consistent
<img width="618" height="174" alt="CleanShot 2026-03-25 at 14 24 03"
src="https://github.com/user-attachments/assets/7eb69475-d539-4e34-b0b5-5cd5119c1aea"
/>

#### Tidied up the metrics view
<img width="1607" height="925" alt="CleanShot 2026-03-25 at 14 23 52"
src="https://github.com/user-attachments/assets/d4bf2761-8e09-4d91-bfb8-e10d1508042f"
/>

#### Copiable metadata
<img width="274" height="160" alt="CleanShot 2026-03-25 at 14 23 44"
src="https://github.com/user-attachments/assets/487c804d-c000-4cd0-9a19-30c2681712df"
/>

#### Cleaner versions list
<img width="463" height="264" alt="CleanShot 2026-03-25 at 14 23 28"
src="https://github.com/user-attachments/assets/a2dfede7-0e7d-4f9c-8b01-00ba13070f70"
/>

#### Overall consistency improvements, shortcut keys and UI behaviour improvements
<img width="2279" height="1349" alt="CleanShot 2026-03-25 at 14 23 02"
src="https://github.com/user-attachments/assets/0c29257b-15a0-443c-b872-a9f4a7f6af13"
/>

---------

Co-authored-by: Eric Allam <eallam@icloud.com>
Queue limit ServiceValidationErrors were being logged at error level.
These are expected validation rejections, not bugs.

- Add logLevel property to ServiceValidationError (webapp + run-engine)
- Set logLevel: warn on all queue limit throws
- Schedule engine: detect queue limit failures and log as warn
- Redis-worker: respect logLevel on thrown errors
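
Roughly, the pattern looks like this. The class below is a simplified stand-in for the real ServiceValidationError, and the constructor signature and `levelFor` helper are assumptions:

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Simplified stand-in; the real class has other fields and a different signature.
class ServiceValidationError extends Error {
  logLevel: LogLevel;

  constructor(message: string, options: { logLevel?: LogLevel } = {}) {
    super(message);
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "ServiceValidationError";
    this.logLevel = options.logLevel ?? "error";
  }
}

// Hypothetical helper for a worker: respect the level carried by the error.
function levelFor(error: unknown): LogLevel {
  return error instanceof ServiceValidationError ? error.logLevel : "error";
}

const queueLimitError = new ServiceValidationError("Queue limit reached", { logLevel: "warn" });
const queueLimitLevel = levelFor(queueLimitError);
```

Expected rejections carry `warn`, while anything without an explicit level still logs at `error`.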
…es, and TSQL schema (#3270)

- Add llm-model-catalog package (renamed from llm-pricing) with Claude
CLI research pipeline
- Add Prisma schema: catalog columns + baseModelName on LlmModel
- Add ClickHouse: llm_model_aggregates MV + base_response_model column
- Add TSQL llm_models schema for query page integration
- Add ModelRegistryPresenter with catalog, metrics, and comparison
queries
- Add 3 dashboard pages: catalog (cards+table+filters), detail
(overview+metrics+cost estimator), compare
- Add sidebar navigation under AI section with hasAiAccess feature flag
- Add admin dashboard sync/seed for catalog metadata
- Add model variant grouping (dated snapshots under base models)
- Add shared formatters and design system component usage

refs TRI-7941
Scheduled runs create predictable hourly spikes that compete with
on-demand runs for node capacity. Runs triggered "on-demand" via the
SDK, API, or dashboard, are more sensitive to cold start latency since
users are typically
waiting on the result. When a burst of scheduled runs lands at the top
of the hour, it can saturate the shared pool resources causing
contention, affecting cold starts across the board.

The idea in this change is to absorb these periodic spikes in a
dedicated pool without affecting the cold starts of on-demand runs.
Scheduled runs are inherently less sensitive to cold starts.

### Changes in this PR

Follows up on run annotations (#3241), which made trigger origin
available on every run in the tree. This PR exposes annotations at
dequeue time to the supervisor, which enables scheduling decisions
based on trigger source.

The affinities are soft preferences at schedule time, so runs fall back
gracefully if the target pool is out of capacity.
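
A toy model of a soft preference with fallback. `Pool` and `pickPool` are hypothetical, the real placement is done by the cluster scheduler, and letting either kind of run spill into the other pool is an assumption:

```typescript
interface Pool {
  name: string;
  freeSlots: number;
}

// Hypothetical placement: prefer the pool matching the trigger source,
// fall back to the other pool when the preferred one has no capacity.
function pickPool(
  rootTriggerSource: string,
  pools: { scheduled: Pool; onDemand: Pool }
): Pool | undefined {
  const isScheduled = rootTriggerSource === "schedule";
  const preferred = isScheduled ? pools.scheduled : pools.onDemand;
  const fallback = isScheduled ? pools.onDemand : pools.scheduled;
  if (preferred.freeSlots > 0) return preferred;
  return fallback.freeSlots > 0 ? fallback : undefined;
}

const nodePools = {
  scheduled: { name: "scheduled", freeSlots: 0 },
  onDemand: { name: "on-demand", freeSlots: 5 },
};
const scheduledPlacement = pickPool("schedule", nodePools);
const sdkPlacement = pickPool("sdk", nodePools);
```

With the scheduled pool full, the scheduled run still lands somewhere (`on-demand`) instead of waiting, which is what makes the affinity soft rather than required.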
@ConProgramming ConProgramming merged commit a549a1b into GovSignals:main Mar 25, 2026
31 of 32 checks passed