feat: GovSignals fork patches on top of upstream main (replaces #2, #3, #11) #12
Draft
ConProgramming wants to merge 52 commits into
Adds support for splitting deployment into separate build and register phases. New CLI options: --build-only, --register-only, --registry, --repository, --base-image-node, --containerfile-module, --skip-digest. Also adds: worker pod service account configuration, security context, DEPLOY_VERSION_SUFFIX env var, custom containerfile module support, and index metadata extraction from Docker images.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Made-with: Cursor
…rences When set, bypasses auto-generation of image tags and uses the provided image reference directly for all deployments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…WORKER_POD_ANNOTATIONS
Replaces the hardcoded Rubix `com.palantir.rubix.service/pod-cert` annotation
with a JSON-shaped env var so the same supervisor image can run in environments
that need different per-pod annotations (Rubix in FedStart, none/empty in
GameWarden).
Default is `{}` so behavior matches upstream when the env is unset.
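A minimal sketch of the parsing behavior this describes (helper name hypothetical, not the fork's exact code):

    // Parse a JSON-object env var; unset falls back to {} (upstream behavior).
    function parseWorkerPodAnnotations(raw: string | undefined): Record<string, string> {
      if (!raw) return {};
      const parsed: unknown = JSON.parse(raw);
      if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
        throw new Error("KUBERNETES_WORKER_POD_ANNOTATIONS must be a JSON object");
      }
      return parsed as Record<string, string>;
    }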
Mirrors the existing webapp.extraVolumes / webapp.extraVolumeMounts pattern. Required for compliance environments that need to mount a custom CA bundle ConfigMap into the supervisor pod (e.g. Palantir Rubix or GameWarden CAs) and point NODE_EXTRA_CA_CERTS at it. Both extras render unconditionally — they no longer depend on the legacy bootstrap-disabled volume block — so any consumer can opt in.
The two-phase deploy work (commit 8f98ebb) replaced the online flow (getEnvironmentVariables + createDeploymentBackgroundWorker + failDeployment via CliApiClient) with an offline flow that reads env vars from a TRIGGER_ENV_VARS build arg and writes index-metadata.json to disk. This breaks Trigger.dev cloud deployments because they don't pass the build arg. Restore the online flow as the default and gate the offline flow on the presence of TRIGGER_ENV_VARS. When set, the controller reads env from the build arg and writes metadata to disk for register-only to consume; when unset, it behaves exactly as upstream does today.
compareDeploymentVersions used `version.split('-')` and destructured the
second element as the suffix, which silently dropped everything after the
second hyphen. Two versions with multi-hyphen suffixes like
`20260101.1-pre-rc.1` and `20260101.1-pre-rc.2` would compare as equal.
The same bug existed in the SQL ORDER BY clauses in DeploymentListPresenter
and TestPresenter (`split_part(version, '-', 2)`) — they would tie-break
on `'pre'` for both versions.
Switch to `indexOf('-') + slice` in TS and `COALESCE(substring(version from
'-(.*)$'), '')` in SQL so the full suffix participates in the comparison.
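In TS terms, the bug and fix look like this (sketch, not the exact fork code):

    // Buggy: split("-") truncates multi-hyphen suffixes.
    const [, buggySuffix] = "20260101.1-pre-rc.1".split("-"); // "pre" — "rc.1" is lost

    // Fixed: everything after the first hyphen is the suffix.
    function versionSuffix(version: string): string {
      const i = version.indexOf("-");
      return i === -1 ? "" : version.slice(i + 1); // "pre-rc.1"
    }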
`trigger deploy --build-only` writes `.triggerdeploy.json` to the project root for the host-side CLI to consume in a follow-up `--register-only` phase. Add it to:
- the upstream repo's root .gitignore (so it never lands in CI artifacts if a contributor runs the build flow locally)
- the project .gitignore template that `trigger init` writes (so user projects don't accidentally commit their deploy manifest)
The original `worker-serviceaccount.yaml` template hardcoded `configmaps: get,list` and `secrets: get` as default Role rules when an operator opted into `worker.rbac.create: true`. That's too broad to ship as a default and violates the least-privilege expectations of most cluster admins. Default to an empty rules array. Operators who actually need the worker pods to call the Kubernetes API must explicitly supply rules via `worker.rbac.rules`.
The two-phase deploy commit (8f98ebb) added hardcoded securityContext blocks at both pod level (`runAsNonRoot: true, runAsUser: 1000, fsGroup: 1000`) and container level (`runAsNonRoot, runAsUser: 1000, allowPrivilegeEscalation: false, capabilities.drop: [ALL]`) on every worker pod the supervisor schedules. Hardcoding `runAsUser: 1000` makes the chart unschedulable on OpenShift and other clusters that enforce arbitrary-UID SCC ranges. It's also opinionated: some compliance regimes require seccompProfile, others forbid it.
Replace both with env-driven JSON objects:
- KUBERNETES_WORKER_POD_SECURITY_CONTEXT (V1PodSecurityContext)
- KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT (V1SecurityContext)
Default is empty `{}` for both, which preserves upstream's pre-fork behavior of not setting a securityContext at all. Operators who want the previous fork defaults set them explicitly via their orchestration (e.g. supervisor.extraEnvVars in the Helm chart).
Also factor a JsonObjectEnv helper into envUtil.ts so this and the existing KUBERNETES_WORKER_POD_ANNOTATIONS env var share validation logic. Adds JsonStringMap and JsonAny re-exports for callers.
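A plausible shape for the JsonObjectEnv helper, assuming the fork's usual zod-based env validation (name and exact signature are assumptions):

    import { z } from "zod";

    // Accepts a JSON-object string; unset defaults to "{}" so behavior matches
    // upstream. Values are validated per-key by the supplied schema.
    const JsonObjectEnv = (name: string, valueSchema: z.ZodTypeAny = z.string()) =>
      z.string().default("{}").transform((raw, ctx) => {
        try {
          return z.record(valueSchema).parse(JSON.parse(raw));
        } catch (e) {
          ctx.addIssue({ code: z.ZodIssueCode.custom, message: `${name}: ${String(e)}` });
          return z.NEVER;
        }
      });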
…urity implications
Operators sometimes set env vars to empty strings (e.g. via Helm
`extraEnvVars` with a templated value that resolves to empty). Empty
strings are falsy in JS, so a plain `if (env.DEPLOY_IMAGE_OVERRIDE)`
check would not take the override branch, but Zod's optional().transform
pattern still passes the empty string through, so the schema needs
explicit handling to be unambiguous.
Add an explicit transform that maps empty string to undefined, and
expand the docstring to call out:
- the bypass of getDeploymentImageRef (no ECR repo creation, no
per-project tagging)
- that this is single-tenant only (every project shares the same
image digest, which is fine for a self-hosted single-org install
but a security issue in multi-tenant setups)
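The transform itself is small; a sketch of the schema change (zod assumed, matching the repo's env validation style):

    import { z } from "zod";

    const Env = z.object({
      DEPLOY_IMAGE_OVERRIDE: z
        .string()
        .optional()
        .transform((v) => (v === "" ? undefined : v)), // empty string behaves as unset
    });

    Env.parse({ DEPLOY_IMAGE_OVERRIDE: "" }).DEPLOY_IMAGE_OVERRIDE; // undefined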
…d annotations Match webapp's existing CA-bundle commentary in the supervisor section of values.yaml, and add a 'CA bundle injection' / 'Worker pod security context' / 'extra worker pod annotations' section to the README that shows the canonical env var + mount pattern operators are most likely to need.
- envUtil.test.ts: JsonObjectEnv with default (string) validator and with JsonAny validator, including non-JSON, non-object, wrong-value-type, and named error message cases.
- deploymentVersions.test.ts: full coverage of compareDeploymentVersions including the multi-hyphen suffix regression case (20250101.1-pre-rc.1 vs 20250101.1-pre-rc.2 must not tie at 0).
- calculateNextBuildVersion.test.ts: same-day increment, new-day reset, caller-provided suffix wins, and the multi-hyphen-suffix-on-input case.
webapp tests verified locally (14/14 passing). Supervisor test is formed correctly; full local run is blocked by an unrelated upstream build error in @trigger.dev/core's sharedRuntimeManager.
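The regression case reads roughly like this in vitest (import path assumed):

    import { expect, it } from "vitest";
    import { compareDeploymentVersions } from "../src/v3/deploymentVersions"; // path assumed

    it("does not tie on multi-hyphen suffixes", () => {
      // Before the fix both versions reduced to suffix "pre" and compared equal.
      const result = compareDeploymentVersions("20250101.1-pre-rc.1", "20250101.1-pre-rc.2");
      expect(result).not.toBe(0);
    });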
The version-suffix work (DEPLOY_VERSION_SUFFIX env var, suffix-aware
ordering in calculateNextBuildVersion / compareDeploymentVersions, SQL
ORDER BY rewrites in DeploymentListPresenter / TestPresenter,
suffix-aware comparisons in devQueueConsumer, test setup duplication
in run-engine) is no longer needed.
For our self-hosted deployment:
- upstream's default version scheme (YYYYMMDD.N) is fine for tracking
deployments
- custom image tagging is handled by DEPLOY_IMAGE_OVERRIDE (separate
feature, kept)
Restore the affected files to upstream main. Drop the related test
files and the suffix-fix commit's coverage.
Three blocks were dropped accidentally by overly-greedy StrReplace edits in earlier commits. Restore them to match upstream main:
1. apps/supervisor/src/env.ts: KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED and KUBERNETES_POD_DNS_NDOTS env vars (and their multi-line comment explaining the ndots-tuning rationale) — dropped in 171b09f (feat(supervisor): parameterize worker-pod and -container securityContext) when the surrounding KUBERNETES_WORKER_* parameterization was added.
2. apps/supervisor/src/workloadManager/kubernetes.ts: corresponding dnsConfig block in the worker pod spec that consumes the env vars — dropped in the same commit.
3. hosting/k8s/helm/values.yaml: webapp.serviceAccount block (create, name, annotations) — dropped in 7c5f3a5 (fix(helm): default worker Role to empty rules) when the worker.rbac block was edited; the StrReplace replacement string accidentally swallowed the unrelated webapp.serviceAccount block above it.
No functional change to our patches; pure restoration of upstream code.
The `--base-image-node` CLI option was a simple alternative to the more
flexible `--containerfile-module` for swapping the per-deploy base image.
We don't use it anywhere; `--containerfile-module` covers the same use
case and more (the `custom-base-image.mjs` example shows the migration
pattern).
Drop:
- `baseImageNode` zod schema field on DeployCommandOptions
- `--base-image-node` Commander flag
- `baseImageNode` field on BuildImageOptions, DepotBuildImageOptions,
SelfHostedBuildImageOptions, GenerateContainerfileOptions, and
BuildWorkerOptions
- `baseImageNode` plumbing through buildImage / writeContainerfile
- The override branch in parseGenerateOptions
- All references in examples/README.md (point users at
--containerfile-module + the custom-base-image.mjs example instead)
…ryCatch Collapse the if/else split between DEPLOY_IMAGE_OVERRIDE and getDeploymentImageRef into one tryCatch wrapper around an inline async function, then destructure the result. Removes the let-over-let pattern and a layer of nested error handling without changing behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
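The shape described, as a hedged sketch (tryCatch assumed to return an [error, result] tuple; the real helper and surrounding names may differ):

    // Within initializeDeployment's context: env and getDeploymentImageRef
    // come from the surrounding module; the argument is illustrative.
    const [error, result] = await tryCatch(
      (async () =>
        env.DEPLOY_IMAGE_OVERRIDE
          ? { imageRef: env.DEPLOY_IMAGE_OVERRIDE }
          : await getDeploymentImageRef(project))()
    );
    if (error) throw error;
    const { imageRef } = result;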
Three independent simplifications identified during PR review where the fork
adds surface area we don't actually exercise:
1. TRIGGER_ENV_VARS → TRIGGER_INDEX_OFFLINE flag
The original two-phase work passed a JSON map of env vars from host to
the in-container indexer to bake into the build. In practice every
--build-only invocation passes {} (the host has no API auth to fetch
real env vars from), and the JSON-parsing capability is never exercised.
Replace it with a boolean TRIGGER_INDEX_OFFLINE=1 flag that just selects
the offline-vs-online indexer path. Drops ~25 lines of parse/validate
from managed-index-controller and the index-time env-var injection from
buildImage. Rename indexEnvVars option to offlineIndex; only --build-only
sets it.
2. Restore shouldLoad to upstream
The hardcoded `return true` was working around the need to extract index
files from the built image, but only --build-only actually does that
extraction. Restore upstream's switch logic and instead set
`load: true` explicitly at the build-only call site.
3. Remove --skip-digest flag
Used in exactly one path: register-only mode when the host doesn't have
the image digest. But passing imageDigest: undefined into finalize achieves
the same result (JSON.stringify drops undefined keys; see the snippet below). Remove the flag and
pass imageDigest unconditionally.
Net: -27 lines, no behavior change for either online or offline paths.
Co-authored-by: Cursor <cursoragent@cursor.com>
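Item 3 leans on a standard JSON.stringify behavior worth spelling out:

    // undefined-valued keys are omitted from serialized JSON:
    JSON.stringify({ imageTag: "v1.2.3", imageDigest: undefined });
    // => '{"imageTag":"v1.2.3"}' — same wire payload as never passing the flag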
…ile helpers Two unused additions removed during PR review:
1. _deployCommand's `if (isLocalBuild) read index-metadata.json + create BackgroundWorker / else throw` block. This block was added by the two-phase deploy work but is never reached: --build-only and --register-only take their own dedicated functions (buildOnlyDeploy / registerOnlyDeploy) which early-return at the top of _deployCommand, and the normal cloud deploy path uses upstream's in-container indexing via the API. Worse, the else-branch unconditionally `throw`s on remote (Depot) builds, regressing upstream behavior. Restore _deployCommand to flow directly from buildResult validation to getDeployment, matching upstream.
2. Three formatter helpers in containerfile-template.ts (formatBuildArgs, formatEnvVars, formatRunCommands). Zero references in fork or in downstream consumers; speculative API surface that was never wired up.
Net: -135 lines.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ields
Two more cleanups identified during PR review:
1. Half-implemented "automatic image retagging" block in registerOnlyDeploy.
Was guarded by a TODO comment ("Implement automatic image retagging") and
only ever logged shell hints (aws ecr put-image / docker tag) for the
user to run by hand, then `await`ed an unconditional 5-second sleep.
Never useful: callers that need fixed image tags use DEPLOY_IMAGE_OVERRIDE
on the webapp side so deployData.imageTag and deployment.imageTag match.
Remove the entire block (~46 lines).
2. simulatedVersion + buildPath fields in the deployData object written to
.triggerdeploy.json. Both are written by buildOnlyDeploy but never read
by registerOnlyDeploy. simulatedVersion is still used elsewhere in
buildOnlyDeploy (deploymentVersion arg + saveLogs filename); we only
drop the dead deployData entry.
Net: -44 lines.
Co-authored-by: Cursor <cursoragent@cursor.com>
Mirror the pattern @trigger.dev/build already uses: a dedicated `./internal` subpath export for downstream tooling that wants to drive the build / deploy pipeline programmatically without going through the `trigger` CLI binary.
Adds packages/cli-v3/src/internal.ts re-exporting:
- loadConfig (trigger.config.ts loader)
- buildWorker + BuildWorkerOptions (esbuild bundling)
- buildImage + BuildImageOptions + BuildImageResults (docker buildx orchestration)
- generateContainerfile + parseGenerateOptions + GenerateContainerfileOptions (Containerfile templating)
- CliApiClient (typed wrapper around the deploy / register / finalize endpoints)
These modules already existed; this commit just makes them addressable from outside the CLI package. Also widens the visibility of three buildImage result types (BuildImageSuccess, BuildImageFailure, BuildImageResults) from file-private to exported so consumers can statically discriminate the result union.
No behavior change. The CLI binary continues to import the same modules internally; nothing on the runtime path is affected.
Co-authored-by: Cursor <cursoragent@cursor.com>
The two-phase deploy machinery added in 8f98ebb has been replaced with two ~150-line driver scripts living next to the GovSignals tasks Dockerfile, which import build/deploy primitives via the new `trigger.dev/internal` subpath export instead of going through the CLI binary. None of the CLI flags or helper functions are reachable any more.
Removed:
- buildOnlyDeploy (~170 lines) and registerOnlyDeploy (~340 lines)
- CLI options --build-only, --register-only, --registry, --repository, --containerfile-module
- Zod schema fields buildOnly, registerOnly, registry, repository, containerfileModule
- Early-return shims at the top of _deployCommand
- The containerfileModule passthrough in _deployCommand's buildWorker call (the option is gone; buildWorker still accepts the parameter for callers using `trigger.dev/internal` directly)
- init.ts changes that taught `trigger init` to add `.triggerdeploy.json` to the project .gitignore
- Root .gitignore entry for .triggerdeploy.json
What stays:
- managed-index-controller.ts offline-mode bootstrap (TRIGGER_INDEX_OFFLINE=1 env var) — still needed by the offline indexer that runs inside the build container
- buildImage.ts offlineIndex option, generateContainerfile, parseGenerateOptions
- buildWorker.ts containerfileModule plumbing (now consumed via `trigger.dev/internal`)
- All supervisor / helm / webapp fork patches
Net: -520 lines.
Co-authored-by: Cursor <cursoragent@cursor.com>
Three follow-up cleanups now that --build-only / --register-only are gone and the build is driven through `trigger.dev/internal`:
1. Delete packages/cli-v3/examples/. The README explicitly documented the `--containerfile-module` CLI flag, which has been removed; nothing else imports these example modules. Working reference for custom containerfile templates lives at infrastructure/containers/dockerfiles/trigger-dev/deployer/ in the GovSignals repo.
2. Restore packages/cli-v3/src/commands/deploy.ts to upstream. After the helper functions were deleted in 12a7128, the only remaining diff was a handful of imports referenced exclusively by those helpers (CreateBackgroundWorkerRequestBody, writeJSONFile, readJSONFile, pathExists, alwaysExternal, readdirSync), a re-ordering of updateTriggerPackages vs verifyDirectory that was needed only for the early-return shims, a stray blank line, and two `\u2502` escapes that crept in from a Python edit. None of those carry their own behavior; `git checkout upstream/main -- deploy.ts` is exact equivalence.
3. buildImage.ts: revert remoteBuildImage's offlineIndex pass-through. It was added for completeness but no caller ever sets offlineIndex on a Depot/remote build (Depot has API access; offline mode is meaningless there). Drop the type field + the build-arg conditional. Also drop a commented-out cpSync debug block that referenced a path on my local machine, the unused `cpSync` import, and revert one `logger.info(\`docker ${args}\`)` back to `logger.debug` (it was an unintentional bump while debugging the offline extraction logic and floods stderr at the default log level).
Net: -396 lines, with the cli-v3 source tree closer to upstream.
Co-authored-by: Cursor <cursoragent@cursor.com>
…alBuildImage The fork's `localBuildImage` carried ~250 lines of host-side image manipulation that turn out to be unnecessary once you notice the offline indexer is already wired into the multi-stage Containerfile. The Containerfile's indexer stage writes index-metadata.json (and on failure, index-error.json + process.exit(1)) inside the build container, and the final stage already COPYs those artifacts to /app/index/. They travel into the runtime image and are read by downstream tooling that mounts the image (in our case, the register-tasks Helm Job's init container) — no host-side extraction needed.
Removed:
- The `docker create` + `docker cp` + `docker rm` extraction block (~150 lines). Index files reach consumers via the runtime image, not via the build host.
- The explicit `docker push` + `docker inspect` digest retrieval block (~100 lines). With `--output type=image,push=true` (selected by the existing `getOutputOptions` helper) BuildKit pushes natively and the pushed manifest digest lands in `metadata.json` — exactly what upstream's pre-fork `localBuildImage` already reads. Single-arch builds (which is what we use for self-hosted) report a stable digest via this path; multi-arch may need `docker inspect` back, which can be re-added if anyone hits it.
- Redundant `--load` flag (already implied by `type=docker` output).
- Redundant `-t imageTag` flag (already in `name=imageTag` inside `outputOptions`).
- Now-unused `mkdirSync` and `statSync` fs imports.
- The `let buildDigest`/`let finalDigest` two-step (we just have `const digest` again, parsed once from metadata.json).
What stays:
- `apiClient?: CliApiClient` optional + null-guard (build.mjs has no CLI session).
- `offlineIndex` build-arg passthrough.
- `generateContainerfile` / `parseGenerateOptions` exports + the `containerfileModule` plumbing.
- `BuildImageSuccess`/`Failure`/`Results` type exports for `internal.ts`.
Net: -242 lines from buildImage.ts. Fork PR's diff for this file drops from +335/-31 to +89/-27 vs upstream main.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ptions The fork's two-phase deploy work removed the `deploymentSpinner?: any` field from `BuildImageOptions` (it was never used by buildImage itself, just declared for external callers). Restoring it brings the type declaration closer to upstream — the only fork-only deltas in this section are now `apiClient?` (made optional) and `offlineIndex?` (added). Co-authored-by: Cursor <cursoragent@cursor.com>
Three small refactors after the BuildKit-native cleanup:
1. buildImage.ts: move loadContainerfileModule + generateContainerfile back
above parseGenerateOptions to match upstream's ordering. The fork's
addition of loadContainerfileModule had pushed the helpers below their
sole consumer; restoring upstream order makes the file's structure
easier to scan against the upstream version.
2. buildImage.ts: inline `BASE_IMAGE[options.runtime]` directly into the
parseGenerateOptions return literal instead of binding it to a local
const first. One fewer named binding for a value used exactly once.
3. managed-index-controller.ts: introduce offlineEnvShim() that returns the
same `{ success: true, data: { variables } }` envelope as
`cliApiClient.getEnvironmentVariables`. Lets the indexer treat both
modes uniformly — single `if (!$env.success)` check, single
`$env.data.variables` use site — instead of branching on `result.mode`
to assemble the envVars value.
No behavior change.
Co-authored-by: Cursor <cursoragent@cursor.com>
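A sketch of the shim and its single-envelope call site (envelope shape from the commit text; surrounding names assumed):

    // Mirrors the { success, data: { variables } } envelope of
    // cliApiClient.getEnvironmentVariables, so both modes share one code path.
    function offlineEnvShim(variables: Record<string, string> = {}) {
      return { success: true as const, data: { variables } };
    }

    const $env = offline
      ? offlineEnvShim()
      : await cliApiClient.getEnvironmentVariables(projectRef, deploymentId); // args assumed
    if (!$env.success) throw new Error("could not resolve environment variables");
    const envVars = $env.data.variables;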
build.mjs only consumes loadConfig, buildWorker, and buildImage from trigger.dev/internal. Everything else the fork was re-exporting was unused.
internal.ts: trimmed from 6 named exports + 4 type re-exports to just the three functions consumers actually use.
buildImage.ts: dropped `export` from
- BuildImageSuccess / BuildImageFailure / BuildImageResults (now file-local)
- parseGenerateOptions (only generateBunContainerfile + generateNodeContainerfile call it, both in this file)
- generateBunContainerfile / generateNodeContainerfile (only generateContainerfile dispatches to them)
Net: -10 lines and the buildImage.ts diff vs upstream drops from +89/-24 to +67/-4.
Co-authored-by: Cursor <cursoragent@cursor.com>
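For reference, roughly what build.mjs's consumption looks like after the trim (the option objects are illustrative placeholders, not the package's real parameter shapes):

    import { loadConfig, buildWorker, buildImage } from "trigger.dev/internal";

    // Load trigger.config.ts, bundle the worker, then drive docker buildx.
    const config = await loadConfig({ cwd: process.cwd() } /* shape assumed */);
    const bundle = await buildWorker({ config } /* shape assumed */);
    const image = await buildImage({ offlineIndex: true } /* shape assumed */);
    // `image` is the BuildImageSuccess | BuildImageFailure union; discriminate
    // on it before reading the digest.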
The previous JSDoc still referenced `trigger deploy --build-only` /
`--register-only` (CLI flags that were removed) and described offline
mode as writing files "for the host-side CLI to read." That isn't what
happens any more: in offline mode the indexer writes the artifacts
into the working directory, the multi-stage Containerfile copies them
into the final image, and downstream tooling reads them out of the
runtime image (in our case via a Helm Job init container).
Update the comment to match the actual flow and reference
`trigger.dev/internal`'s buildImage({ offlineIndex: true }) as the
intended consumer in self-hosted setups. Also drop the now-meaningless
"behaves exactly as it did before two-phase deploy was added" line.
Co-authored-by: Cursor <cursoragent@cursor.com>
…yment
The discriminated `OnlineBootstrap | OfflineBootstrap` union forced
indexDeployment's param to stay opaque (`result: BootstrapResult`) and
required `result.mode === "offline"` checks at every branch. Collapse
the union into a single shape with the API-mode fields optional, then
restore upstream's destructured param signature:
async function indexDeployment({
cliApiClient,
projectRef,
deploymentId,
buildManifest,
}: BootstrapResult)
Offline mode is detected by `!cliApiClient` instead of a `mode` tag.
Bootstrap returns `{ buildManifest }` for offline and the full record
for online. The three offline branches (env vars / metadata write /
error write) all use `!cliApiClient` (or pair it with `!deploymentId`
where the type checker needs both narrowed).
Net: same behavior, but `indexDeployment`'s shape now matches upstream.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ng indexDeployment
Previous offline-mode plumbing carved branching into indexDeployment:
discriminated union types, `result.mode === "offline"` checks at every
API call site, parallel disk-write paths next to the API-call paths.
Replace with a CliApiClient stub that, instead of issuing HTTP
requests, writes the same payloads to disk:
- getEnvironmentVariables → returns `{ variables: {} }`
- createDeploymentBackgroundWorker → writes the body's metadata (plus
buildPlatform / targetPlatform) to `index-metadata.json`
- failDeployment → writes the error to `index-error.json`
bootstrap() picks the shim when TRIGGER_INDEX_OFFLINE=1 and the real
CliApiClient otherwise. indexDeployment is unchanged — bit-identical to
upstream — because both implementations satisfy the same interface for
the three methods it actually uses.
Diff vs upstream collapses to just two hunks: a 14-line offline branch
at the top of bootstrap() and the ~65-line shim factory at the bottom
of the file.
The flattened `index-metadata.json` shape matches what
register-tasks/register.mjs (in govsignals) already reads, so no
downstream consumer change.
Co-authored-by: Cursor <cursoragent@cursor.com>
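A condensed sketch of the shim factory (method names from the commit; payload details assumed):

    import { writeFileSync } from "node:fs";
    import { join } from "node:path";

    // Satisfies the three CliApiClient methods indexDeployment actually calls,
    // writing payloads to disk instead of issuing HTTP requests.
    function createOfflineApiShim(outDir: string) {
      return {
        async getEnvironmentVariables() {
          return { success: true as const, data: { variables: {} as Record<string, string> } };
        },
        async createDeploymentBackgroundWorker(_deploymentId: string, body: { metadata: unknown }) {
          writeFileSync(join(outDir, "index-metadata.json"), JSON.stringify(body.metadata));
          return { success: true as const, data: {} };
        },
        async failDeployment(_deploymentId: string, error: unknown) {
          writeFileSync(join(outDir, "index-error.json"), JSON.stringify(error));
          return { success: true as const, data: {} };
        },
      };
    }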
…nated union
Bootstrap shouldn't fake projectRef/deploymentId values when those concepts
don't exist (the build container has no API auth in offline mode). Restore
the explicit two-type union:
type OnlineBootstrap = { buildManifest, cliApiClient, projectRef, deploymentId };
type OfflineBootstrap = { buildManifest, cliApiClient };
type BootstrapResult = OnlineBootstrap | OfflineBootstrap;
bootstrap() returns OnlineBootstrap or OfflineBootstrap (no `?? "offline"`
placeholder strings). indexDeployment keeps its upstream-shaped signature
typed against `OnlineBootstrap`. The bridge from union → online shape lives
at the entry-point dispatch:
await indexDeployment(
"projectRef" in result
? result
: { ...result, projectRef: "offline", deploymentId: "offline" }
);
The shim's three methods ignore those placeholder strings, so the call is
correct in both modes. The placeholders are now visible at the boundary
where they're used as type-system shims, not buried in bootstrap.
Co-authored-by: Cursor <cursoragent@cursor.com>
…upstream call site
Two adjustments:
- indexDeployment param type: OnlineBootstrap → BootstrapResult.
- Entry-point dispatch reverts to upstream's single line:
const results = await bootstrap();
await indexDeployment(results);
For BootstrapResult (the union) to satisfy indexDeployment's upstream-shaped
destructure, OfflineBootstrap declares its missing fields explicitly as
`?: undefined`:
type OfflineBootstrap = {
buildManifest: BuildManifest;
cliApiClient: CliApiClient;
projectRef?: undefined;
deploymentId?: undefined;
};
Bootstrap's offline branch still returns just `{ buildManifest, cliApiClient }`
(no fake "offline" placeholder strings). The optional-undefined declaration
just makes the union type-compatible with the destructure.
Inside indexDeployment three call sites add `!` non-null assertions where
they pass projectRef / deploymentId to the (real) CliApiClient signatures.
Three characters of divergence from upstream; the offline shim ignores
those values either way, and online always has real strings.
Co-authored-by: Cursor <cursoragent@cursor.com>
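Why the `?: undefined` declarations work: both union members now carry all four keys, so a single destructure type-checks against the union (stubbed repro):

    type BuildManifest = object; // stubs for illustration only
    type CliApiClient = object;

    type OnlineBootstrap = {
      buildManifest: BuildManifest;
      cliApiClient: CliApiClient;
      projectRef: string;
      deploymentId: string;
    };
    type OfflineBootstrap = {
      buildManifest: BuildManifest;
      cliApiClient: CliApiClient;
      projectRef?: undefined;
      deploymentId?: undefined;
    };
    type BootstrapResult = OnlineBootstrap | OfflineBootstrap;

    declare const result: BootstrapResult;
    const { projectRef, deploymentId } = result; // string | undefined — compiles
    // Omitting the fields entirely from OfflineBootstrap would make this
    // destructure an error on that member of the union.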
…deploy commands
Match the surface area downstream tooling needs to drive the build and
deploy flows step-for-step against the helpers `buildOnlyDeploy` and
`registerOnlyDeploy` used to call. Re-export:
- CliApiClient: typed client for the deploy / register / finalize / fail
endpoints (kills raw HTTP in downstream scripts).
- login: PAT-aware auth resolver — reads TRIGGER_ACCESS_TOKEN from the
env directly, so it works in CI without a config file.
- getProjectClient: the two-step engine API URL discovery that returns
an engine-bound CliApiClient.
- syncEnvVarsWithServer: deploy.sync.env / .parentEnv -> envvars/import.
- resolveLocalEnvVars + loadDotEnvVars: project-local .env handling.
- createGitMeta: extract git metadata for the deployment record.
- getTmpDir: scoped temp dir with cleanup tracking.
- saveLogs: dump build logs to a file.
- setGithubActionsOutputAndEnvVars: emit deployment metadata to GHA.
These are all already public-ish CLI helpers; making them addressable from
`trigger.dev/internal` lets downstream build/register scripts stay in
lockstep with the upstream CLI's flow without re-implementing the API
surface.
Co-authored-by: Cursor <cursoragent@cursor.com>
Audit of the downstream scripts (build.mjs / register.mjs in govsignals) showed two re-exports nothing imports:
- CliApiClient: scripts use `getProjectClient`, which returns an engine-bound CliApiClient internally. Nobody constructs one directly.
- saveLogs: added speculatively for `--save-logs` parity but never wired up.
Drop both. Future callers can request them back via PR if there's a real need.
Co-authored-by: Cursor <cursoragent@cursor.com>
…on every worker pod
Adds a single env var on the supervisor that names a Kubernetes Secret to
mount via `envFrom` on every worker pod the supervisor schedules. All
key/value pairs in the secret become env vars on the worker container —
resolved by the kubelet at pod creation, so the supervisor never reads
the secret values and needs no extra RBAC.
Use case: keep task-time secrets (DB URLs, API keys, etc.) in Kubernetes
Secrets owned by ops, instead of syncing them through trigger.dev's
webapp + database. Single source of truth in K8s; the secret never
leaves the K8s plane on its way to a task pod.
When the env var is unset (default) the worker pod spec is unchanged —
upstream behavior preserved.
Configured downstream via:
supervisor:
extraEnvVars:
- name: KUBERNETES_WORKER_ENV_FROM_SECRET
value: "trigger-task-secrets"
Renders as:
spec.containers[0].envFrom = [{ secretRef: { name: "trigger-task-secrets" } }]
Co-authored-by: Cursor <cursoragent@cursor.com>
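A sketch of the workload-manager wiring (using @kubernetes/client-node types; function name hypothetical):

    import { V1Container } from "@kubernetes/client-node";

    // When KUBERNETES_WORKER_ENV_FROM_SECRET is set, append an envFrom secretRef;
    // when unset, the container spec is returned untouched (upstream behavior).
    function withWorkerEnvFrom(container: V1Container, secretName?: string): V1Container {
      if (!secretName) return container;
      return {
        ...container,
        envFrom: [...(container.envFrom ?? []), { secretRef: { name: secretName } }],
      };
    }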
GovSignals/trigger.dev's prerelease push was failing with:
Error: invalid reference: invalid repository "GovSignals/charts/trigger"
OCI references must be all-lowercase. `github.repository_owner` returns the org name as-cased (`GovSignals`), so the push URL was malformed. Add a tiny `Resolve repo owner (lowercase)` step that outputs `lc` and swap every chart-OCI reference (`oci://ghcr.io/<owner>/charts/...`) to use it. Same fix in both helm-prerelease.yml and release-helm.yml since the bug exists symmetrically — the release workflow just hadn't fired yet.
This keeps the workflows portable: any GovSignals fork (or upstream fork) that copies these files renders correct lowercase OCI paths regardless of the org's display casing.
Co-authored-by: Cursor <cursoragent@cursor.com>
The fork's chart was pushing to ghcr.io/<owner>/charts/trigger, which collides with an existing package in the GovSignals org owned by a different repo — push was 403'd on the blob HEAD probe. Rename the published artifact to vendored-upstream-trigger by rewriting Chart.yaml's name field at package time. CHART_NAME env now drives both the local filename and the OCI path, and the rename only affects the published artifact (the on-disk chart still says name: trigger so upstream contracts and lint/test stay unchanged). Result: ghcr.io/<owner>/charts/vendored-upstream-trigger:<version>. Consumers reference it via dependencies.name=vendored-upstream-trigger (or alias: trigger for backwards-compatible subchart values). Co-authored-by: Cursor <cursoragent@cursor.com>
🧭 Helm Chart Prerelease Published
Version: 4.4.6-plt663.1-pr12.dc04413
Install:
helm upgrade --install trigger \
  oci://ghcr.io/govsignals/charts/vendored-upstream-trigger \
  --version "4.4.6-plt663.1-pr12.dc04413"
ConProgramming
commented
May 5, 2026
Author
Not part of the PR, just for making this testable for us
Lets downstream `--containerfile-module` implementations reuse the upstream `parseGenerateOptions` helper that produces (buildArgs, buildEnvVars, postInstallCommands, baseInstructions, packages, baseImage) from a `GenerateContainerfileOptions`. Without the export, every fork that wants to ship its own Containerfile generator has to re-implement the same shape — which drifts as upstream evolves. The function itself is unchanged; just promoted from a file-local `const` to a named export. Co-authored-by: Cursor <cursoragent@cursor.com>
Mirrors the pattern already established for `webapp.extraEnvVars`, `webapp.extraVolumes`, and `webapp.extraVolumeMounts`. Lets consumers inject sidecars (in-pod TLS like nginx/envoy, log shippers, audit agents, etc.) and extra init containers without forking the chart. Both default to empty list — no behavior change for existing users.
Use case driving this: GovSignals' FedStart deployments need an in-pod nginx TLS sidecar so supervisor / register-tasks can talk to the webapp over TLS without going through the cluster-edge oauth2-proxy. The umbrella-chart-style workarounds we'd otherwise need (kustomize post-render, forking the webapp template) are all worse.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds two new values knobs that override the hardcoded securityContext
on the built-in `volume-permissions` init container and `token-syncer`
sidecar:
webapp:
initContainers:
securityContext: {} # default: { runAsUser: 1000 }
sidecarContainers:
securityContext: {} # default: { runAsUser: 1000, runAsNonRoot: true }
When unset (default), behavior is unchanged. When set, the user's
securityContext fully replaces the chart's hardcoded one — useful for
operators that need stricter pod-security admission contexts (FedRAMP /
FIPS / Pod Security Standards "restricted" requires runAsNonRoot,
allowPrivilegeEscalation: false, capabilities.drop: [ALL],
seccompProfile.type: RuntimeDefault on every container).
Same idiom as the existing webapp.podSecurityContext and webapp.securityContext (for the webapp container) — adds the missing knobs for the init container and sidecar.
Co-authored-by: Cursor <cursoragent@cursor.com>
…efaults
Earlier additions baked GovSignals-specific worker SA defaults into the
chart's values.yaml. That makes the chart less mergeable upstream — it
silently changes default behavior for every consumer. Revert to
neutral defaults so this PR is purely additive (new knobs, off by default):
supervisor.config.kubernetes.workerServiceAccount:
"trigger-worker" -> ""
supervisor.config.kubernetes.workerAutomountServiceAccountToken:
true -> false
worker.serviceAccount.create:
true -> false
worker.serviceAccount.name:
"trigger-worker" -> ""
Comment-only fix: clickhouse example image reference rolled back from
25.6.1-debian-12-r0 to upstream's 25.7.5-debian-12-r0 (stale rebase
artifact).
Consumers who want the previous defaults set them explicitly. The
GovSignals umbrella will set these in its own values.yaml.
Result: 106 lines changed vs upstream main, all pure additions of new
knobs (extraContainers, extraInitContainers, initContainers/
sidecarContainers securityContext, supervisor.extraVolumes/
extraVolumeMounts, worker.serviceAccount, worker.rbac). Each new knob
has an empty/zero default, so chart behavior for default users is
unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
Allows running migrations standalone without booting the webapp server. Used by the umbrella chart's pre-install migration Job to avoid the chicken-and-egg of the Job inheriting webapp.extraEnvVars with SKIP_POSTGRES_MIGRATIONS=1. The migrate subcommand forces SKIP_POSTGRES_MIGRATIONS=0 because that's the Job's only purpose. Co-authored-by: Cursor <cursoragent@cursor.com>
Previously the block mounted s3-access-key-id / s3-secret-access-key
from secrets.existingSecret unconditionally, breaking installs where
the operator uses an existing Apollo-managed Secret that doesn't
contain those keys. Now three branches:
- s3.deploy: true -> existing behavior (chart-managed or
s3.auth.existingSecret).
- s3.deploy: false + useIam: false:
- external.existingSecret -> mount from external secret.
- chart-managed accessKeyId -> requires secrets.enabled: true
so secrets.yaml actually emits
the s3-access-key-id key.
- s3.deploy: false + useIam: true -> skip the entire S3 env block.
Co-authored-by: Cursor <cursoragent@cursor.com>
Electric requires /app/persistent to be writable for its state dir. The template previously rendered no volumes at all, so every Electric pod crashed with 'could not make directory /app/persistent/state'. This commit always renders a /app/persistent emptyDir and adds extraVolumes/extraVolumeMounts knobs matching the webapp/supervisor pattern for consumers that want PVC-backed durability. Co-authored-by: Cursor <cursoragent@cursor.com>
When clickhouse.deploy: true AND auth.existingSecret is set, the fork's clickhouse.url helper still interpolated values.auth.password literally, so webapp authenticated with a stale default while the Bitnami chart used the real password from the secret. This commit switches the deploy-mode URL to use a CLICKHOUSE_PASSWORD env-var indirection that resolves from existingSecret when set. Co-authored-by: Cursor <cursoragent@cursor.com>
Carries the four PLT-663 chart fixes above. Co-authored-by: Cursor <cursoragent@cursor.com>
@kubernetes/client-node@1.0.0 tightened V1ObjectMeta.annotations from 'unknown' to 'Record<string, string>'. The parsed JSON from the KUBERNETES_WORKER_POD_ANNOTATIONS env var lands as 'unknown' which now fails TS2322 at the assignment site. Cast (or validate) at the parse boundary. Co-authored-by: Cursor <cursoragent@cursor.com>
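One way the validation at the parse boundary can look (zod-based sketch; the fork's JsonObjectEnv may already provide this):

    import { z } from "zod";
    import { V1ObjectMeta } from "@kubernetes/client-node";

    const metadata: V1ObjectMeta = {};
    // Validating to Record<string, string> at the boundary satisfies the
    // tightened V1ObjectMeta.annotations type (no TS2322 at the assignment).
    metadata.annotations = z
      .record(z.string())
      .parse(JSON.parse(process.env.KUBERNETES_WORKER_POD_ANNOTATIONS ?? "{}"));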
PR #12 added supervisor.config.kubernetes.workerPodSecurityContext, workerContainerSecurityContext, and workerPodAnnotations to values.yaml but the supervisor.yaml template never read them. The supervisor's Kubernetes workload manager reads KUBERNETES_WORKER_POD_SECURITY_CONTEXT, KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT, and KUBERNETES_WORKER_POD_ANNOTATIONS env vars at runtime (JSON-parsed) and applies them to every worker pod it schedules. Without this wiring, worker pods on FedStart / GameWarden deployments are missing their compliance-required securityContext entries and would be rejected by pod-security admission. Co-authored-by: Cursor <cursoragent@cursor.com>
82 commits of upstream churn (post triggerdotdev/main sync). Resolutions:
- .github/workflows/helm-prerelease.yml -> hand-merged (kept our
CHART_NAME rename + steps.owner.outputs.lc; adopted upstream's
STEPS_VERSION_OUTPUTS_VERSION env-var pattern for Zizmor compliance)
- .github/workflows/release-helm.yml -> hand-merged (kept our
steps.owner.outputs.lc; adopted upstream env-var pattern)
- apps/webapp/app/v3/services/initializeDeployment.server.ts
-> hand-merged: accepted upstream's PR triggerdotdev#3610 refactor that moves
getDeploymentImageRef() inside createDeploymentWithNextVersion's
per-attempt callback, and re-applied our DEPLOY_IMAGE_OVERRIDE
short-circuit inside the same callback.
Auto-merged (kept our patches intact):
- hosting/k8s/helm/templates/supervisor.yaml (upstream improved OTLP
endpoint to fully-qualified .svc.cluster.local URL; our extraVolumes,
extraVolumeMounts, and security-context wiring preserved)
- hosting/k8s/helm/templates/webapp.yaml (our extraContainers,
extraInitContainers, useIam gate, CLICKHOUSE_PASSWORD env preserved)
- hosting/k8s/helm/values.yaml (our electric.extraVolumes,
s3.external.useIam, etc preserved)
- pnpm-lock.yaml (auto-merged cleanly)
Smoke tests:
- helm dependency build hosting/k8s/helm OK
- helm template hosting/k8s/helm OK (renders without errors)
Co-authored-by: Cursor <cursoragent@cursor.com>
The same hunk appears in both helm-prerelease.yml and release-helm.yml:

  # Push to GHCR OCI registry
- helm push "$CHART_PACKAGE" "oci://${{ env.REGISTRY }}/${{ github.repository_owner }}/charts"
+ helm push "$CHART_PACKAGE" "oci://${{ env.REGISTRY }}/${{ steps.owner.outputs.lc }}/charts"

  echo '```bash'
  echo "helm upgrade --install trigger \\"
- echo "  oci://${{ env.REGISTRY }}/${{ github.repository_owner }}/charts/${{ env.CHART_NAME }} \\"
+ echo "  oci://${{ env.REGISTRY }}/${{ steps.owner.outputs.lc }}/charts/${{ env.CHART_NAME }} \\"
Reconciles plt-663-chart-fixes with gs-v4.4.4's main-merge (PR #12).
Conflicts resolved:
- hosting/k8s/helm/Chart.yaml -> hand-merged: bumped prerelease base from 4.4.5 to 4.4.6 to track gs-v4.4.4's version bump (PR triggerdotdev#3500 + triggerdotdev#3501 from upstream). New chart version is now 4.4.6-plt663.1. appVersion v4.4.6 from upstream.
- hosting/k8s/helm/Chart.lock -> took gs-v4.4.4's lock (clickhouse 9.4.4 from upstream PR triggerdotdev#3524) then re-ran `helm dependency build`.
Auto-merged cleanly:
- hosting/k8s/helm/templates/supervisor.yaml (our security-context env wiring + worker pod annotations type fix preserved alongside upstream's extraVolumes/extraVolumeMounts and OTLP fully-qualified URL).
Upstream-coverage check (4 PR #13 fixes vs newly-synced upstream main):
- _helpers.tpl ($(CLICKHOUSE_PASSWORD) in deploy: true branch) -> NOVEL. Upstream's deploy branch still interpolates `.Values.clickhouse.auth.password` literally; our fix is still required when auth.existingSecret is set.
- electric.yaml (/app/persistent mount + extraVolumes) -> NOVEL. Upstream electric.yaml still renders no volumes/volumeMounts at all.
- webapp.yaml (S3 useIam gate + CLICKHOUSE_PASSWORD env) -> NOVEL. Upstream webapp.yaml has no useIam reference and no CLICKHOUSE_PASSWORD env var; both are still required.
- values.yaml (electric.extraVolumes + s3.external.useIam) -> NOVEL. Upstream values.yaml has neither key under electric: nor s3.external:.
All 4 fixes remain net-new fork-side patches. No PR #13 commits become redundant from this upstream sync.
Smoke tests:
- helm dependency build hosting/k8s/helm OK
- helm template hosting/k8s/helm OK (renders v4.4.6-plt663.1)
Co-authored-by: Cursor <cursoragent@cursor.com>
fix(chart): PLT-663 self-hosted install bugs
Summary

This is the clean replacement for stale #2, #3, and #11. 14 commits cleanly stacked on top of current upstream main (past v4.4.4). The first 4 commits are the original GovSignals features. The next 8 are upstream-quality polish + tests. The next is a revert of the DEPLOY_VERSION_SUFFIX work (custom image tagging is handled by DEPLOY_IMAGE_OVERRIDE). The last commit restores three blocks from upstream that earlier commits accidentally deleted.

The 4 feature commits

1. feat(cli): add two-phase deploy (build-only + register-only) — --build-only, --register-only, --containerfile-module, --skip-digest. Replaces #2 (Add two-phase deploy flags).
2. feat(webapp): add DEPLOY_IMAGE_OVERRIDE env var — bypass auto-generated image tags. Replaces #3 (feat: Add DEPLOY_IMAGE_OVERRIDE for custom image references).
3. feat(supervisor): parameterize worker-pod annotations via KUBERNETES_WORKER_POD_ANNOTATIONS — JSON env var, default {}. Replaces the hardcoded Rubix pod-cert annotation.
4. feat(helm): add supervisor.extraVolumes/extraVolumeMounts — mirrors the existing webapp pattern.

Polish commits

1. fix(cli): preserve online API flow in managed-index-controller — restores upstream's CliApiClient calls as the default. Two-phase deploy is opt-in via the TRIGGER_ENV_VARS build arg.
2. fix(webapp): preserve full suffix in deployment version comparison (no longer needed — see commit 13)
3. chore(cli): gitignore .triggerdeploy.json — root .gitignore + trigger init boilerplate.
4. fix(helm): default worker Role to empty rules — explicit opt-in via worker.rbac.rules.
5. feat(supervisor): parameterize worker-pod and -container securityContext — replaces hardcoded runAsUser: 1000 blocks (which break OpenShift) with KUBERNETES_WORKER_POD_SECURITY_CONTEXT and KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT env vars. Defaults preserve upstream's pre-fork behavior. Factors a JsonObjectEnv helper.
6. fix(webapp): treat empty DEPLOY_IMAGE_OVERRIDE as unset — empty string handling + security note.
7. docs(helm): document supervisor extraVolumes, securityContext, and pod annotations — values.yaml + README.md.
8. test: cover JsonObjectEnv parser and version-suffix logic — vitest cases for the env helper.

Cleanup

1. revert: remove DEPLOY_VERSION_SUFFIX and version-suffix handling — drops 9 webapp + run-engine files we no longer need. Custom image tagging is handled by DEPLOY_IMAGE_OVERRIDE.
2. fix: restore upstream blocks accidentally removed in earlier commits — three blocks from upstream that earlier commits' StrReplace edits dropped:
   - KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED + KUBERNETES_POD_DNS_NDOTS env vars (apps/supervisor/src/env.ts)
   - dnsConfig block in worker pod spec (apps/supervisor/src/workloadManager/kubernetes.ts)
   - webapp.serviceAccount block (hosting/k8s/helm/values.yaml)

Diff stats

Tag

gs-v4.4.4-rc1 at SHA d4b5ff93bd8272d51c4d203e05906143d66fa505.

Closes