feat: GovSignals fork patches on top of upstream main (replaces #2, #3, #11)#12

Draft
ConProgramming wants to merge 52 commits into main from gs-v4.4.4

@ConProgramming ConProgramming commented Apr 29, 2026

Summary

This is the clean replacement for the stale #2, #3, and #11: 14 commits stacked cleanly on top of current upstream main (past v4.4.4). The first 4 commits are the original GovSignals features. The next 8 are upstream-quality polish + tests. After those comes a revert of the DEPLOY_VERSION_SUFFIX work (custom image tagging is handled by DEPLOY_IMAGE_OVERRIDE instead). The last commit restores three blocks from upstream that earlier commits accidentally deleted.

The 4 feature commits

  1. feat(cli): add two-phase deploy (build-only + register-only) — --build-only, --register-only, --containerfile-module, --skip-digest. Replaces Add two-phase deploy flags #2.
  2. feat(webapp): add DEPLOY_IMAGE_OVERRIDE env var — bypass auto-generated image tags. Replaces feat: Add DEPLOY_IMAGE_OVERRIDE for custom image references #3.
  3. feat(supervisor): parameterize worker-pod annotations via KUBERNETES_WORKER_POD_ANNOTATIONS — JSON env var, default {}. Replaces the hardcoded Rubix pod-cert annotation.
  4. feat(helm): add supervisor.extraVolumes/extraVolumeMounts — mirrors the existing webapp pattern.
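
The JSON-shaped env var in feature commit 3 can be sketched as a small parser. This is an illustrative reconstruction, not the fork's actual code; the function name is hypothetical, but the contract matches the description above: default `{}` when unset, reject non-object JSON, require string values for annotations.

```typescript
// Hypothetical sketch of parsing KUBERNETES_WORKER_POD_ANNOTATIONS:
// a JSON-object env var that defaults to {} (matching upstream when unset)
// and must map annotation keys to string values.
function parseAnnotationsEnv(raw: string | undefined): Record<string, string> {
  if (raw === undefined || raw.trim() === "") return {}; // unset => upstream behavior
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("KUBERNETES_WORKER_POD_ANNOTATIONS must be valid JSON");
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("KUBERNETES_WORKER_POD_ANNOTATIONS must be a JSON object");
  }
  for (const [key, value] of Object.entries(parsed)) {
    if (typeof value !== "string") {
      throw new Error(`annotation ${key} must be a string`);
    }
  }
  return parsed as Record<string, string>;
}
```

The same validate-then-narrow shape generalizes to the securityContext env vars added in polish commit 5, which is presumably why the fork factored a shared JsonObjectEnv helper.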

Polish commits

  1. fix(cli): preserve online API flow in managed-index-controller — restores upstream's CliApiClient calls as the default. Two-phase deploy is opt-in via TRIGGER_ENV_VARS build arg.
  2. fix(webapp): preserve full suffix in deployment version comparison (no longer needed — see commit 13)
  3. chore(cli): gitignore .triggerdeploy.json — root .gitignore + trigger init boilerplate.
  4. fix(helm): default worker Role to empty rules — explicit opt-in via worker.rbac.rules.
  5. feat(supervisor): parameterize worker-pod and -container securityContext — replaces hardcoded runAsUser: 1000 blocks (which break OpenShift) with KUBERNETES_WORKER_POD_SECURITY_CONTEXT and KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT env vars. Defaults preserve upstream's pre-fork behavior. Factors JsonObjectEnv helper.
  6. fix(webapp): treat empty DEPLOY_IMAGE_OVERRIDE as unset — empty string handling + security note.
  7. docs(helm): document supervisor extraVolumes, securityContext, and pod annotations — values.yaml + README.md.
  8. test: cover JsonObjectEnv parser and version-suffix logic — vitest cases for the env helper.

Cleanup

  1. revert: remove DEPLOY_VERSION_SUFFIX and version-suffix handling — drops 9 webapp + run-engine files we no longer need. Custom image tagging is handled by DEPLOY_IMAGE_OVERRIDE.

  2. fix: restore upstream blocks accidentally removed in earlier commits — three blocks from upstream that earlier commits' StrReplace edits dropped:

    • KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED + KUBERNETES_POD_DNS_NDOTS env vars (apps/supervisor/src/env.ts)
    • corresponding dnsConfig block in worker pod spec (apps/supervisor/src/workloadManager/kubernetes.ts)
    • webapp.serviceAccount block (hosting/k8s/helm/values.yaml)

Diff stats

20 files changed, ~2,100 insertions, ~125 deletions

Tag

gs-v4.4.4-rc1 at SHA d4b5ff93bd8272d51c4d203e05906143d66fa505.

Closes

amarbakir-govsignals and others added 4 commits April 29, 2026 16:19
Adds support for splitting deployment into separate build and register
phases. New CLI options: --build-only, --register-only, --registry,
--repository, --base-image-node, --containerfile-module, --skip-digest.

Also adds: worker pod service account configuration, security context,
DEPLOY_VERSION_SUFFIX env var, custom containerfile module support,
and index metadata extraction from Docker images.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Made-with: Cursor
…rences

When set, bypasses auto-generation of image tags and uses the provided
image reference directly for all deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…WORKER_POD_ANNOTATIONS

Replaces the hardcoded Rubix `com.palantir.rubix.service/pod-cert` annotation
with a JSON-shaped env var so the same supervisor image can run in environments
that need different per-pod annotations (Rubix in FedStart, none/empty in
GameWarden).

Default is `{}` so behavior matches upstream when the env is unset.
Mirrors the existing webapp.extraVolumes / webapp.extraVolumeMounts pattern.
Required for compliance environments that need to mount a custom CA bundle
ConfigMap into the supervisor pod (e.g. Palantir Rubix or GameWarden CAs)
and point NODE_EXTRA_CA_CERTS at it.

Both extras render unconditionally — they no longer depend on the legacy
bootstrap-disabled volume block — so any consumer can opt in.
Conner Aldrich added 11 commits April 29, 2026 16:42
The two-phase deploy work (commit 8f98ebb) replaced the online flow
(getEnvironmentVariables + createDeploymentBackgroundWorker + failDeployment
via CliApiClient) with an offline flow that reads env vars from a
TRIGGER_ENV_VARS build arg and writes index-metadata.json to disk.

This breaks Trigger.dev cloud deployments because they don't pass the
build arg.

Restore the online flow as the default and gate the offline flow on the
presence of TRIGGER_ENV_VARS. When set, the controller reads env from
the build arg and writes metadata to disk for register-only to consume;
when unset, it behaves exactly as upstream does today.
compareDeploymentVersions used `version.split('-')` and destructured the
second element as the suffix, which silently dropped everything after the
second hyphen. Two versions with multi-hyphen suffixes like
`20260101.1-pre-rc.1` and `20260101.1-pre-rc.2` would compare as equal.

The same bug existed in the SQL ORDER BY clauses in DeploymentListPresenter
and TestPresenter (`split_part(version, '-', 2)`) — they would tie-break
on `'pre'` for both versions.

Switch to `indexOf('-') + slice` in TS and `COALESCE(substring(version from
'-(.*)$'), '')` in SQL so the full suffix participates in the comparison.
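
The split-versus-indexOf distinction this commit fixes can be shown in a few lines. The helper name is illustrative (the real compareDeploymentVersions does more than split), but the bug it demonstrates is exactly the one described above:

```typescript
// Suffix extraction that keeps everything after the FIRST hyphen, so a
// multi-hyphen suffix like "pre-rc.1" survives intact. Name is illustrative.
function splitVersion(version: string): { core: string; suffix: string } {
  const i = version.indexOf("-");
  if (i === -1) return { core: version, suffix: "" };
  return { core: version.slice(0, i), suffix: version.slice(i + 1) };
}

// The buggy pattern: split("-") and destructure [1], silently dropping "rc.1".
const [, buggySuffix] = "20260101.1-pre-rc.1".split("-"); // "pre", not "pre-rc.1"
```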
`trigger deploy --build-only` writes `.triggerdeploy.json` to the project
root for the host-side CLI to consume in a follow-up `--register-only`
phase. Add it to:

- the upstream repo's root .gitignore (so it never lands in CI artifacts
  if a contributor runs the build flow locally)
- the project .gitignore template that `trigger init` writes (so user
  projects don't accidentally commit their deploy manifest)
The original `worker-serviceaccount.yaml` template hardcoded `configmaps:
get,list` and `secrets:get` as default Role rules when an operator opted
into `worker.rbac.create: true`. That's too broad to ship as a default and
violates least-privilege expectations of most cluster admins.

Default to an empty rules array. Operators who actually need the worker
pods to call the Kubernetes API must explicitly supply rules via
`worker.rbac.rules`.
The two-phase deploy commit (8f98ebb) added hardcoded securityContext
blocks at both pod level (`runAsNonRoot: true, runAsUser: 1000, fsGroup:
1000`) and container level (`runAsNonRoot, runAsUser:1000,
allowPrivilegeEscalation: false, capabilities.drop: [ALL]`) on every
worker pod the supervisor schedules.

Hardcoding `runAsUser: 1000` makes the chart unschedulable on OpenShift
and other clusters that enforce arbitrary-UID SCC ranges. It's also
opinionated: some compliance regimes also require seccompProfile, others
forbid it.

Replace both with env-driven JSON objects:
  - KUBERNETES_WORKER_POD_SECURITY_CONTEXT (V1PodSecurityContext)
  - KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT (V1SecurityContext)

Default is empty `{}` for both, which preserves upstream's pre-fork
behavior of not setting a securityContext at all.

Operators who want the previous fork defaults set them explicitly via
their orchestration (e.g. supervisor.extraEnvVars in the Helm chart).

Also factor a JsonObjectEnv helper into envUtil.ts so this and the
existing KUBERNETES_WORKER_POD_ANNOTATIONS env var share validation
logic. Adds JsonStringMap and JsonAny re-exports for callers.
…urity implications

Operators sometimes set env vars to empty strings (e.g. via Helm
`extraEnvVars` with a templated value that resolves to empty). An empty
string is falsy in JS, so a plain `if (env.DEPLOY_IMAGE_OVERRIDE)` would
not take the override branch in initializeDeployment, but Zod's
optional().transform pattern still needs explicit handling to make the
empty-string case unambiguous.

Add an explicit transform that maps empty string to undefined, and
expand the docstring to call out:
  - the bypass of getDeploymentImageRef (no ECR repo creation, no
    per-project tagging)
  - that this is single-tenant only (every project shares the same
    image digest, which is fine for a self-hosted single-org install
    but a security issue in multi-tenant setups)
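
The normalization this commit adds reduces to a one-liner; this is a minimal sketch of the empty-string handling, not the webapp's actual Zod schema (the function name is hypothetical):

```typescript
// Treat "" (including whitespace-only, e.g. a Helm template that resolves
// to empty) the same as unset, so the override branch is never selected
// by a blank DEPLOY_IMAGE_OVERRIDE.
function normalizeImageOverride(raw: string | undefined): string | undefined {
  return raw === undefined || raw.trim() === "" ? undefined : raw;
}
```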
…d annotations

Match webapp's existing CA-bundle commentary in the supervisor section
of values.yaml, and add a 'CA bundle injection' / 'Worker pod security
context' / 'extra worker pod annotations' section to the README that
shows the canonical envar + mount pattern operators are most likely to
need.
- envUtil.test.ts: JsonObjectEnv with default (string) validator and with
  JsonAny validator, including non-JSON, non-object, wrong-value-type,
  and named error message cases.
- deploymentVersions.test.ts: full coverage of compareDeploymentVersions
  including the multi-hyphen suffix regression case
  (20250101.1-pre-rc.1 vs 20250101.1-pre-rc.2 must not tie at 0).
- calculateNextBuildVersion.test.ts: same-day increment, new-day reset,
  caller-provided suffix wins, and the multi-hyphen-suffix-on-input
  case.

webapp tests verified locally (14/14 passing). Supervisor test is
formed correctly; full local run is blocked by an unrelated upstream
build error in @trigger.dev/core's sharedRuntimeManager.
The version-suffix work (DEPLOY_VERSION_SUFFIX env var, suffix-aware
ordering in calculateNextBuildVersion / compareDeploymentVersions, SQL
ORDER BY rewrites in DeploymentListPresenter / TestPresenter,
suffix-aware comparisons in devQueueConsumer, test setup duplication
in run-engine) is no longer needed.

For our self-hosted deployment:
  - upstream's default version scheme (YYYYMMDD.N) is fine for tracking
    deployments
  - custom image tagging is handled by DEPLOY_IMAGE_OVERRIDE (separate
    feature, kept)

Restore the affected files to upstream main. Drop the related test
files and the suffix-fix commit's coverage.
Three blocks were dropped accidentally by overly-greedy StrReplace edits
in earlier commits. Restore them to match upstream main:

1. apps/supervisor/src/env.ts: KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED
   and KUBERNETES_POD_DNS_NDOTS env vars (and their multi-line comment
   explaining the ndots-tuning rationale) — dropped in 171b09f
   (feat(supervisor): parameterize worker-pod and -container
   securityContext) when the surrounding KUBERNETES_WORKER_*
   parameterization was added.

2. apps/supervisor/src/workloadManager/kubernetes.ts: corresponding
   dnsConfig block in the worker pod spec that consumes the env vars —
   dropped in the same commit.

3. hosting/k8s/helm/values.yaml: webapp.serviceAccount block (create,
   name, annotations) — dropped in 7c5f3a5 (fix(helm): default worker
   Role to empty rules) when the worker.rbac block was edited; the
   StrReplace replacement string accidentally swallowed the unrelated
   webapp.serviceAccount block above it.

No functional change to our patches; pure restoration of upstream code.
The `--base-image-node` CLI option was a simple alternative to the more
flexible `--containerfile-module` for swapping the per-deploy base image.
We don't use it anywhere; `--containerfile-module` covers the same use
case and more (the `custom-base-image.mjs` example shows the migration
pattern).

Drop:
  - `baseImageNode` zod schema field on DeployCommandOptions
  - `--base-image-node` Commander flag
  - `baseImageNode` field on BuildImageOptions, DepotBuildImageOptions,
    SelfHostedBuildImageOptions, GenerateContainerfileOptions, and
    BuildWorkerOptions
  - `baseImageNode` plumbing through buildImage / writeContainerfile
  - The override branch in parseGenerateOptions
  - All references in examples/README.md (point users at
    --containerfile-module + the custom-base-image.mjs example instead)
ConProgramming and others added 12 commits May 4, 2026 15:19
…ryCatch

Collapse the if/else split between DEPLOY_IMAGE_OVERRIDE and getDeploymentImageRef
into one tryCatch wrapper around an inline async function, then destructure the
result. Removes the let-over-let pattern and a layer of nested error handling
without changing behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
Three independent simplifications identified during PR review where the fork
adds surface area we don't actually exercise:

1. TRIGGER_ENV_VARS → TRIGGER_INDEX_OFFLINE flag
   The original two-phase work passed a JSON map of env vars from host to
   the in-container indexer to bake into the build. In practice every
   --build-only invocation passes {} (the host has no API auth to fetch
   real env vars from), and the JSON-parsing capability is never exercised.
   Replace it with a boolean TRIGGER_INDEX_OFFLINE=1 flag that just selects
   the offline-vs-online indexer path. Drops ~25 lines of parse/validate
   from managed-index-controller and the index-time env-var injection from
   buildImage. Rename indexEnvVars option to offlineIndex; only --build-only
   sets it.

2. Restore shouldLoad to upstream
   The hardcoded `return true` was working around the need to extract index
   files from the built image, but only --build-only actually does that
   extraction. Restore upstream's switch logic and instead set
   `load: true` explicitly at the build-only call site.

3. Remove --skip-digest flag
   Used in exactly one path: register-only mode when the host doesn't have
   the image digest. But passing imageDigest: undefined into finalize achieves
   the same result (JSON.stringify drops undefined keys). Remove the flag and
   pass imageDigest unconditionally.

Net: -27 lines, no behavior change for either online or offline paths.
Co-authored-by: Cursor <cursoragent@cursor.com>
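
Cleanup 3 above leans on a JSON.stringify property worth pinning down: object keys whose value is `undefined` are omitted from the serialized output, so passing `imageDigest: undefined` produces the same JSON as omitting the key. The field names here are taken from the commit message; the object shape is illustrative.

```typescript
// JSON.stringify drops keys whose value is undefined, so these two
// objects serialize differently only in the presence of the digest key.
const withDigest = JSON.stringify({ imageTag: "v1", imageDigest: "sha256:abc" });
const withoutDigest = JSON.stringify({ imageTag: "v1", imageDigest: undefined });
// withoutDigest === '{"imageTag":"v1"}'
```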
…ile helpers

Two unused additions removed during PR review:

1. _deployCommand's `if (isLocalBuild) read index-metadata.json + create
   BackgroundWorker / else throw` block. This block was added by the
   two-phase deploy work but is never reached: --build-only and --register-only
   take their own dedicated functions (buildOnlyDeploy / registerOnlyDeploy)
   which early-return at the top of _deployCommand, and the normal cloud
   deploy path uses upstream's in-container indexing via the API. Worse,
   the else-branch unconditionally `throw`s on remote (Depot) builds,
   regressing upstream behavior. Restore _deployCommand to flow directly
   from buildResult validation to getDeployment, matching upstream.

2. Three formatter helpers in containerfile-template.ts (formatBuildArgs,
   formatEnvVars, formatRunCommands). Zero references in fork or in
   downstream consumers; speculative API surface that was never wired up.

Net: -135 lines.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ields

Two more cleanups identified during PR review:

1. Half-implemented "automatic image retagging" block in registerOnlyDeploy.
   Was guarded by a TODO comment ("Implement automatic image retagging") and
   only ever logged shell hints (aws ecr put-image / docker tag) for the
   user to run by hand, then `await`ed an unconditional 5-second sleep.
   Never useful: callers that need fixed image tags use DEPLOY_IMAGE_OVERRIDE
   on the webapp side so deployData.imageTag and deployment.imageTag match.
   Remove the entire block (~46 lines).

2. simulatedVersion + buildPath fields in the deployData object written to
   .triggerdeploy.json. Both are written by buildOnlyDeploy but never read
   by registerOnlyDeploy. simulatedVersion is still used elsewhere in
   buildOnlyDeploy (deploymentVersion arg + saveLogs filename); we only
   drop the dead deployData entry.

Net: -44 lines.
Co-authored-by: Cursor <cursoragent@cursor.com>
Mirror the pattern @trigger.dev/build already uses: a dedicated `./internal`
subpath export for downstream tooling that wants to drive the build / deploy
pipeline programmatically without going through the `trigger` CLI binary.

Adds packages/cli-v3/src/internal.ts re-exporting:

  - loadConfig (trigger.config.ts loader)
  - buildWorker + BuildWorkerOptions (esbuild bundling)
  - buildImage + BuildImageOptions + BuildImageResults (docker buildx orchestration)
  - generateContainerfile + parseGenerateOptions + GenerateContainerfileOptions (Containerfile templating)
  - CliApiClient (typed wrapper around the deploy / register / finalize endpoints)

These modules already existed; this commit just makes them addressable from
outside the CLI package. Also widens the visibility of three buildImage result
types (BuildImageSuccess, BuildImageFailure, BuildImageResults) from
file-private to exported so consumers can statically discriminate the result
union.

No behavior change. The CLI binary continues to import the same modules
internally; nothing on the runtime path is affected.

Co-authored-by: Cursor <cursoragent@cursor.com>
The two-phase deploy machinery added in 8f98ebb has been replaced with two
~150-line driver scripts living next to the GovSignals tasks Dockerfile,
which import build/deploy primitives via the new `trigger.dev/internal`
subpath export instead of going through the CLI binary. None of the CLI
flags or helper functions are reachable any more.

Removed:

- buildOnlyDeploy (~170 lines) and registerOnlyDeploy (~340 lines)
- CLI options --build-only, --register-only, --registry, --repository,
  --containerfile-module
- Zod schema fields buildOnly, registerOnly, registry, repository,
  containerfileModule
- Early-return shims at the top of _deployCommand
- The containerfileModule passthrough in _deployCommand's buildWorker call
  (the option is gone; buildWorker still accepts the parameter for callers
  using `trigger.dev/internal` directly)
- init.ts changes that taught `trigger init` to add `.triggerdeploy.json`
  to the project .gitignore
- Root .gitignore entry for .triggerdeploy.json

What stays:

- managed-index-controller.ts offline-mode bootstrap (TRIGGER_INDEX_OFFLINE=1
  env var) — still needed by the offline indexer that runs inside the build
  container
- buildImage.ts offlineIndex option, generateContainerfile, parseGenerateOptions
- buildWorker.ts containerfileModule plumbing (now consumed via
  `trigger.dev/internal`)
- All supervisor / helm / webapp fork patches

Net: -520 lines.
Co-authored-by: Cursor <cursoragent@cursor.com>
Three follow-up cleanups now that --build-only / --register-only are gone
and the build is driven through `trigger.dev/internal`:

1. Delete packages/cli-v3/examples/. The README explicitly documented the
   `--containerfile-module` CLI flag, which has been removed; nothing else
   imports these example modules. Working reference for custom containerfile
   templates lives at infrastructure/containers/dockerfiles/trigger-dev/deployer/
   in the GovSignals repo.

2. Restore packages/cli-v3/src/commands/deploy.ts to upstream. After the
   helper functions were deleted in 12a7128, the only remaining diff was
   a handful of imports referenced exclusively by those helpers
   (CreateBackgroundWorkerRequestBody, writeJSONFile, readJSONFile,
   pathExists, alwaysExternal, readdirSync), a re-ordering of
   updateTriggerPackages vs verifyDirectory that was needed only for the
   early-return shims, a stray blank line, and two `\u2502` escapes that
   crept in from a Python edit. None of those carry their own behavior;
   `git checkout upstream/main -- deploy.ts` is exact equivalence.

3. buildImage.ts: revert remoteBuildImage's offlineIndex pass-through.
   It was added for completeness but no caller ever sets offlineIndex on
   a Depot/remote build (Depot has API access; offline mode is meaningless
   there). Drop the type field + the build-arg conditional.

   Also drop a commented-out cpSync debug block that referenced a path on
   my local machine, the unused `cpSync` import, and revert one
   `logger.info(\`docker ${args}\`)` back to `logger.debug` (it was an
   unintentional bump while debugging the offline extraction logic and
   floods stderr at the default log level).

Net: -396 lines, with the cli-v3 source tree closer to upstream.
Co-authored-by: Cursor <cursoragent@cursor.com>
…alBuildImage

The fork's `localBuildImage` carried ~250 lines of host-side image
manipulation that turn out to be unnecessary once you notice the offline
indexer is already wired into the multi-stage Containerfile.

The Containerfile's indexer stage writes index-metadata.json (and on
failure, index-error.json + process.exit(1)) inside the build container,
and the final stage already COPYs those artifacts to /app/index/. They
travel into the runtime image and are read by downstream tooling that
mounts the image (in our case, the register-tasks Helm Job's init
container) — no host-side extraction needed.

Removed:

- The `docker create` + `docker cp` + `docker rm` extraction block
  (~150 lines). Index files reach consumers via the runtime image, not
  via the build host.
- The explicit `docker push` + `docker inspect` digest retrieval block
  (~100 lines). With `--output type=image,push=true` (selected by the
  existing `getOutputOptions` helper) BuildKit pushes natively and the
  pushed manifest digest lands in `metadata.json` — exactly what
  upstream's pre-fork `localBuildImage` already reads. Single-arch
  builds (which is what we use for self-hosted) report a stable digest
  via this path; multi-arch may need `docker inspect` back, can be
  re-added if anyone hits it.
- Redundant `--load` flag (already implied by `type=docker` output).
- Redundant `-t imageTag` flag (already in `name=imageTag` inside
  `outputOptions`).
- Now-unused `mkdirSync` and `statSync` fs imports.
- The `let buildDigest`/`let finalDigest` two-step (we just have
  `const digest` again, parsed once from metadata.json).

What stays:

- `apiClient?: CliApiClient` optional + null-guard (build.mjs has no
  CLI session).
- `offlineIndex` build-arg passthrough.
- `generateContainerfile` / `parseGenerateOptions` exports + the
  `containerfileModule` plumbing.
- `BuildImageSuccess`/`Failure`/`Results` type exports for `internal.ts`.

Net: -242 lines from buildImage.ts. Fork PR's diff for this file drops
from +335/-31 to +89/-27 vs upstream main.

Co-authored-by: Cursor <cursoragent@cursor.com>
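
The digest path the commit above describes can be sketched as a small reader. `docker buildx build --metadata-file` emits a JSON object whose `containerimage.digest` key carries the pushed manifest digest (per BuildKit's documented metadata output); the function name and defensive shape here are illustrative, not the fork's exact code.

```typescript
// Pull the pushed manifest digest out of an already-parsed buildx
// metadata.json object; returns undefined when the key is absent or
// not a string (e.g. a build that did not push).
function digestFromBuildMetadata(metadata: Record<string, unknown>): string | undefined {
  const digest = metadata["containerimage.digest"];
  return typeof digest === "string" ? digest : undefined;
}
```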
…ptions

The fork's two-phase deploy work removed the `deploymentSpinner?: any`
field from `BuildImageOptions` (it was never used by buildImage itself,
just declared for external callers). Restoring it brings the type
declaration closer to upstream — the only fork-only deltas in this
section are now `apiClient?` (made optional) and `offlineIndex?` (added).

Co-authored-by: Cursor <cursoragent@cursor.com>
Three small refactors after the BuildKit-native cleanup:

1. buildImage.ts: move loadContainerfileModule + generateContainerfile back
   above parseGenerateOptions to match upstream's ordering. The fork's
   addition of loadContainerfileModule had pushed the helpers below their
   sole consumer; restoring upstream order makes the file's structure
   easier to scan against the upstream version.

2. buildImage.ts: inline `BASE_IMAGE[options.runtime]` directly into the
   parseGenerateOptions return literal instead of binding it to a local
   const first. One fewer named binding for a value used exactly once.

3. managed-index-controller.ts: introduce offlineEnvShim() that returns the
   same `{ success: true, data: { variables } }` envelope as
   `cliApiClient.getEnvironmentVariables`. Lets the indexer treat both
   modes uniformly — single `if (!$env.success)` check, single
   `$env.data.variables` use site — instead of branching on `result.mode`
   to assemble the envVars value.

No behavior change.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
build.mjs only consumes loadConfig, buildWorker, and buildImage from
trigger.dev/internal. Everything else the fork was re-exporting was
unused.

internal.ts: trimmed from 6 named exports + 4 type re-exports to just the
three function consumers actually use.

buildImage.ts: dropped `export` from
- BuildImageSuccess / BuildImageFailure / BuildImageResults (now file-local)
- parseGenerateOptions (only generateBunContainerfile + generateNodeContainerfile call it, both in this file)
- generateBunContainerfile / generateNodeContainerfile (only generateContainerfile dispatches to them)

Net: -10 lines and the buildImage.ts diff vs upstream drops from +89/-24 to +67/-4.
Co-authored-by: Cursor <cursoragent@cursor.com>
ConProgramming and others added 11 commits May 5, 2026 00:48
Co-authored-by: Cursor <cursoragent@cursor.com>
The previous JSDoc still referenced `trigger deploy --build-only` /
`--register-only` (CLI flags that were removed) and described offline
mode as writing files "for the host-side CLI to read." That isn't what
happens any more: in offline mode the indexer writes the artifacts
into the working directory, the multi-stage Containerfile copies them
into the final image, and downstream tooling reads them out of the
runtime image (in our case via a Helm Job init container).

Update the comment to match the actual flow and reference
`trigger.dev/internal`'s buildImage({ offlineIndex: true }) as the
intended consumer in self-hosted setups. Also drop the now-meaningless
"behaves exactly as it did before two-phase deploy was added" line.

Co-authored-by: Cursor <cursoragent@cursor.com>
…yment

The discriminated `OnlineBootstrap | OfflineBootstrap` union forced
indexDeployment's param to stay opaque (`result: BootstrapResult`) and
required `result.mode === "offline"` checks at every branch. Collapse
the union into a single shape with the API-mode fields optional, then
restore upstream's destructured param signature:

    async function indexDeployment({
      cliApiClient,
      projectRef,
      deploymentId,
      buildManifest,
    }: BootstrapResult)

Offline mode is detected by `!cliApiClient` instead of a `mode` tag.
Bootstrap returns `{ buildManifest }` for offline and the full record
for online. The three offline branches (env vars / metadata write /
error write) all use `!cliApiClient` (or pair it with `!deploymentId`
where the type checker needs both narrowed).

Net: same behavior, but `indexDeployment`'s shape now matches upstream.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ng indexDeployment

Previous offline-mode plumbing carved branching into indexDeployment:
discriminated union types, `result.mode === "offline"` checks at every
API call site, parallel disk-write paths next to the API-call paths.

Replace with a CliApiClient stub that, instead of issuing HTTP
requests, writes the same payloads to disk:

  - getEnvironmentVariables → returns `{ variables: {} }`
  - createDeploymentBackgroundWorker → writes the body's metadata (plus
    buildPlatform / targetPlatform) to `index-metadata.json`
  - failDeployment → writes the error to `index-error.json`

bootstrap() picks the shim when TRIGGER_INDEX_OFFLINE=1 and the real
CliApiClient otherwise. indexDeployment is unchanged — bit-identical to
upstream — because both implementations satisfy the same interface for
the three methods it actually uses.

Diff vs upstream collapses to just two hunks: a 14-line offline branch
at the top of bootstrap() and the ~65-line shim factory at the bottom
of the file.

The flattened `index-metadata.json` shape matches what
register-tasks/register.mjs (in govsignals) already reads, so no
downstream consumer change.

Co-authored-by: Cursor <cursoragent@cursor.com>
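
The shim pattern above can be sketched as a factory satisfying the three methods indexDeployment uses. Method names follow this commit message, but the real CliApiClient signatures differ; a disk-writer callback is injected here so the sketch stays self-contained (the real shim writes `index-metadata.json` / `index-error.json` to the working directory).

```typescript
// Hedged sketch of an offline CliApiClient-shaped shim: same method names
// and success envelopes as the real client, but payloads go to disk via
// the injected writer instead of over HTTP.
function createOfflineApiShim(write: (file: string, json: string) => void) {
  return {
    async getEnvironmentVariables() {
      // Offline builds have no API auth, so the env-var map is always empty.
      return { success: true as const, data: { variables: {} as Record<string, string> } };
    },
    async createDeploymentBackgroundWorker(body: unknown) {
      write("index-metadata.json", JSON.stringify(body, null, 2));
      return { success: true as const, data: {} };
    },
    async failDeployment(error: unknown) {
      write("index-error.json", JSON.stringify(error, null, 2));
      return { success: true as const, data: {} };
    },
  };
}
```

Because both the shim and the real client satisfy the same structural interface for these three methods, the caller needs no mode branching, which is the point of the refactor.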
…nated union

Bootstrap shouldn't fake projectRef/deploymentId values when those concepts
don't exist (the build container has no API auth in offline mode). Restore
the explicit two-type union:

  type OnlineBootstrap = { buildManifest, cliApiClient, projectRef, deploymentId };
  type OfflineBootstrap = { buildManifest, cliApiClient };
  type BootstrapResult = OnlineBootstrap | OfflineBootstrap;

bootstrap() returns OnlineBootstrap or OfflineBootstrap (no `?? "offline"`
placeholder strings). indexDeployment keeps its upstream-shaped signature
typed against `OnlineBootstrap`. The bridge from union → online shape lives
at the entry-point dispatch:

  await indexDeployment(
    "projectRef" in result
      ? result
      : { ...result, projectRef: "offline", deploymentId: "offline" }
  );

The shim's three methods ignore those placeholder strings, so the call is
correct in both modes. The placeholders are now visible at the boundary
where they're used as type-system shims, not buried in bootstrap.

Co-authored-by: Cursor <cursoragent@cursor.com>
…upstream call site

Two adjustments:

- indexDeployment param type: OnlineBootstrap → BootstrapResult.
- Entry-point dispatch reverts to upstream's single line:
    const results = await bootstrap();
    await indexDeployment(results);

For BootstrapResult (the union) to satisfy indexDeployment's upstream-shaped
destructure, OfflineBootstrap declares its missing fields explicitly as
`?: undefined`:

  type OfflineBootstrap = {
    buildManifest: BuildManifest;
    cliApiClient: CliApiClient;
    projectRef?: undefined;
    deploymentId?: undefined;
  };

Bootstrap's offline branch still returns just `{ buildManifest, cliApiClient }`
(no fake "offline" placeholder strings). The optional-undefined declaration
just makes the union type-compatible with the destructure.

Inside indexDeployment three call sites add `!` non-null assertions where
they pass projectRef / deploymentId to the (real) CliApiClient signatures.
Three characters of divergence from upstream; the offline shim ignores
those values either way, and online always has real strings.

Co-authored-by: Cursor <cursoragent@cursor.com>
…deploy commands

Match the surface area downstream tooling needs to drive the build and
deploy flows step-for-step against the helpers `buildOnlyDeploy` and
`registerOnlyDeploy` used to call. Re-export:

  - CliApiClient: typed client for the deploy / register / finalize / fail
    endpoints (kills raw HTTP in downstream scripts).
  - login: PAT-aware auth resolver — reads TRIGGER_ACCESS_TOKEN from the
    env directly, so it works in CI without a config file.
  - getProjectClient: the two-step engine API URL discovery that returns
    an engine-bound CliApiClient.
  - syncEnvVarsWithServer: deploy.sync.env / .parentEnv -> envvars/import.
  - resolveLocalEnvVars + loadDotEnvVars: project-local .env handling.
  - createGitMeta: extract git metadata for the deployment record.
  - getTmpDir: scoped temp dir with cleanup tracking.
  - saveLogs: dump build logs to a file.
  - setGithubActionsOutputAndEnvVars: emit deployment metadata to GHA.

These are all already public-ish CLI helpers; making them addressable from
`trigger.dev/internal` lets downstream build/register scripts stay in
lockstep with the upstream CLI's flow without re-implementing the API
surface.

Co-authored-by: Cursor <cursoragent@cursor.com>
Audit of the downstream scripts (build.mjs / register.mjs in govsignals)
showed two re-exports nothing imports:

- CliApiClient: scripts use `getProjectClient`, which returns an
  engine-bound CliApiClient internally. Nobody constructs one directly.
- saveLogs: added speculatively for `--save-logs` parity but never wired up.

Drop both. Future-callers can request them back via PR if there's a real
need.

Co-authored-by: Cursor <cursoragent@cursor.com>
…on every worker pod

Adds a single env var on the supervisor that names a Kubernetes Secret to
inject via `envFrom` on every worker pod the supervisor schedules. All
key/value pairs in the secret become env vars on the worker container —
resolved by the kubelet at pod creation, so the supervisor never reads
the secret values and needs no extra RBAC.

Use case: keep task-time secrets (DB URLs, API keys, etc.) in Kubernetes
Secrets owned by ops, instead of syncing them through trigger.dev's
webapp + database. Single source of truth in K8s; the secret never
leaves the K8s plane on its way to a task pod.

When the env var is unset (default) the worker pod spec is unchanged —
upstream behavior preserved.

Configured downstream via:

  supervisor:
    extraEnvVars:
      - name: KUBERNETES_WORKER_ENV_FROM_SECRET
        value: "trigger-task-secrets"

Renders as:

  spec.containers[0].envFrom = [{ secretRef: { name: "trigger-task-secrets" } }]

Co-authored-by: Cursor <cursoragent@cursor.com>
GovSignals/trigger.dev's prerelease push was failing with:

  Error: invalid reference: invalid repository "GovSignals/charts/trigger"

OCI references must be all-lowercase. `github.repository_owner` returns
the org name as-cased (`GovSignals`), so the push URL was malformed.

Add a tiny `Resolve repo owner (lowercase)` step that outputs `lc` and
swap every chart-OCI reference (`oci://ghcr.io/<owner>/charts/...`) to
use it. Same fix in both helm-prerelease.yml and release-helm.yml since
the bug exists symmetrically — the release workflow just hadn't fired
yet.

This keeps the workflows portable: any GovSignals fork (or upstream fork)
that copies these files renders correct lowercase OCI paths regardless
of the org's display casing.
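A minimal sketch of such a step. The step id `owner` and output `lc` match the description above; the exact `run` body is an assumption:

```yaml
- name: Resolve repo owner (lowercase)
  id: owner
  # ${VAR,,} is bash's lowercase expansion; GITHUB_REPOSITORY_OWNER is a
  # default GitHub Actions env var.
  run: echo "lc=${GITHUB_REPOSITORY_OWNER,,}" >> "$GITHUB_OUTPUT"
```

Chart references then use `oci://ghcr.io/${{ steps.owner.outputs.lc }}/charts/...`, which renders valid lowercase paths for any org display casing.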

Co-authored-by: Cursor <cursoragent@cursor.com>
The fork's chart was pushing to ghcr.io/<owner>/charts/trigger, which
collides with an existing package in the GovSignals org owned by a
different repo — push was 403'd on the blob HEAD probe.

Rename the published artifact to vendored-upstream-trigger by rewriting
Chart.yaml's name field at package time. CHART_NAME env now drives both
the local filename and the OCI path, and the rename only affects the
published artifact (the on-disk chart still says name: trigger so
upstream contracts and lint/test stay unchanged).

Result: ghcr.io/<owner>/charts/vendored-upstream-trigger:<version>.
Consumers reference it via dependencies.name=vendored-upstream-trigger
(or alias: trigger for backwards-compatible subchart values).
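A sketch of the package-time rename, assuming a sed rewrite of the name field (the real workflow's exact commands may differ). Only the packaged copy is rewritten; the on-disk chart keeps `name: trigger`:

```shell
# Hypothetical sketch: rewrite the published chart name at package time.
CHART_NAME="vendored-upstream-trigger"
workdir="$(mktemp -d)"
cat > "${workdir}/Chart.yaml" <<'EOF'
apiVersion: v2
name: trigger
version: 4.4.6
EOF
# Rewrite the name field in a copy that packaging would consume; the
# original Chart.yaml is untouched.
sed "s/^name: trigger$/name: ${CHART_NAME}/" "${workdir}/Chart.yaml" \
  > "${workdir}/Chart.packaged.yaml"
grep '^name:' "${workdir}/Chart.packaged.yaml"
```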

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions Bot commented May 5, 2026

🧭 Helm Chart Prerelease Published

Version: 4.4.6-plt663.1-pr12.dc04413

Install:

helm upgrade --install trigger \
  oci://ghcr.io/govsignals/charts/vendored-upstream-trigger \
  --version "4.4.6-plt663.1-pr12.dc04413"

⚠️ This is a prerelease for testing. Do not use in production.

Author

@ConProgramming ConProgramming May 5, 2026


Not part of the PR, just for making this testable for us


ConProgramming and others added 9 commits May 5, 2026 14:43
Lets downstream `--containerfile-module` implementations reuse the
upstream `parseGenerateOptions` helper that produces (buildArgs,
buildEnvVars, postInstallCommands, baseInstructions, packages, baseImage)
from a `GenerateContainerfileOptions`. Without the export, every fork
that wants to ship its own Containerfile generator has to re-implement
the same shape — which drifts as upstream evolves.

The function itself is unchanged; just promoted from a file-local
`const` to a named export.

Co-authored-by: Cursor <cursoragent@cursor.com>
Mirrors the pattern already established for `webapp.extraEnvVars`,
`webapp.extraVolumes`, and `webapp.extraVolumeMounts`. Lets consumers
inject sidecars (in-pod TLS like nginx/envoy, log shippers, audit
agents, etc.) and extra init containers without forking the chart.

Both default to empty list — no behavior change for existing users.

Use case driving this: GovSignals' FedStart deployments need an
in-pod nginx TLS sidecar so supervisor / register-tasks can talk to
the webapp over TLS without going through the cluster-edge oauth2-proxy.
The workarounds we'd otherwise need (kustomize post-render, forking the
webapp template) are all worse.

Co-authored-by: Cursor <cursoragent@cursor.com>
Adds two new values knobs that override the hardcoded securityContext
on the built-in `volume-permissions` init container and `token-syncer`
sidecar:

  webapp:
    initContainers:
      securityContext: {}    # default: { runAsUser: 1000 }
    sidecarContainers:
      securityContext: {}    # default: { runAsUser: 1000, runAsNonRoot: true }

When unset (default), behavior is unchanged. When set, the user's
securityContext fully replaces the chart's hardcoded one — useful for
operators that need stricter pod-security admission contexts (FedRAMP /
FIPS / Pod Security Standards "restricted" requires runAsNonRoot,
allowPrivilegeEscalation: false, capabilities.drop: [ALL],
seccompProfile.type: RuntimeDefault on every container).

Same idiom as the existing webapp.podSecurityContext and
webapp.securityContext (for the webapp container); this adds the missing
knobs for the init container and sidecar.
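For example, a values override meeting the "restricted" profile might look like this (the field values are the standard Pod Security Standards requirements, not chart defaults):

```yaml
webapp:
  initContainers:
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
      seccompProfile:
        type: RuntimeDefault
  sidecarContainers:
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
      seccompProfile:
        type: RuntimeDefault
```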

Co-authored-by: Cursor <cursoragent@cursor.com>
…efaults

Earlier additions baked GovSignals-specific worker SA defaults into the
chart's values.yaml. That makes the chart less mergeable upstream — it
silently changes default behavior for every consumer. Revert to
neutral defaults so this PR is purely additive (new knobs, off by default):

  supervisor.config.kubernetes.workerServiceAccount:
    "trigger-worker" -> ""
  supervisor.config.kubernetes.workerAutomountServiceAccountToken:
    true -> false
  worker.serviceAccount.create:
    true -> false
  worker.serviceAccount.name:
    "trigger-worker" -> ""

Comment-only fix: clickhouse example image reference rolled back from
25.6.1-debian-12-r0 to upstream's 25.7.5-debian-12-r0 (stale rebase
artifact).

Consumers who want the previous defaults set them explicitly. The
GovSignals umbrella will set these in its own values.yaml.

Result: 106 lines changed vs upstream main, all pure additions of new
knobs (extraContainers, extraInitContainers, initContainers/
sidecarContainers securityContext, supervisor.extraVolumes/
extraVolumeMounts, worker.serviceAccount, worker.rbac). Each new knob
has an empty/zero default, so chart behavior for default users is
unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
Allows running migrations standalone without booting the webapp server.
Used by the umbrella chart's pre-install migration Job to avoid the
chicken-and-egg problem of the Job inheriting webapp.extraEnvVars with
SKIP_POSTGRES_MIGRATIONS=1. The migrate subcommand forces
SKIP_POSTGRES_MIGRATIONS=0 because that's the Job's only purpose.
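A sketch of the kind of pre-install Job that could drive this; the name, image tag, and hook annotations are assumptions, not the umbrella chart's actual manifest:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: trigger-migrate              # hypothetical name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: ghcr.io/triggerdotdev/trigger.dev:v4.4.6   # assumption
          args: ["migrate"]          # the standalone migrate subcommand
```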

Co-authored-by: Cursor <cursoragent@cursor.com>
Previously the block mounted s3-access-key-id / s3-secret-access-key
from secrets.existingSecret unconditionally, breaking installs where
the operator uses an existing Apollo-managed Secret that doesn't
contain those keys. Now three branches:
  - s3.deploy: true        -> existing behavior (chart-managed or
                              s3.auth.existingSecret).
  - s3.deploy: false + useIam: false:
      - external.existingSecret -> mount from external secret.
      - chart-managed accessKeyId -> requires secrets.enabled: true
                                     so secrets.yaml actually emits
                                     the s3-access-key-id key.
  - s3.deploy: false + useIam: true -> skip the entire S3 env block.
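The branching can be sketched in Helm template terms roughly as follows (condensed; the real template's helper names and env wiring are more involved):

```yaml
{{- if .Values.s3.deploy }}
# existing behavior: chart-managed MinIO or s3.auth.existingSecret
{{- else if not .Values.s3.external.useIam }}
  {{- if .Values.s3.external.existingSecret }}
# mount s3-access-key-id / s3-secret-access-key from the external secret
  {{- else }}
# chart-managed keys: requires secrets.enabled so secrets.yaml emits them
  {{- end }}
{{- end }}
{{- /* s3.deploy: false + useIam: true renders no S3 credential env */}}
```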

Co-authored-by: Cursor <cursoragent@cursor.com>
Electric requires /app/persistent to be writable for its state dir.
The template previously rendered no volumes at all, so every Electric
pod crashed with 'could not make directory /app/persistent/state'.
This commit always renders a /app/persistent emptyDir and adds
extraVolumes/extraVolumeMounts knobs matching the webapp/supervisor
pattern for consumers that want PVC-backed durability.

Co-authored-by: Cursor <cursoragent@cursor.com>
When clickhouse.deploy: true AND auth.existingSecret is set, the
fork's clickhouse.url helper still interpolated values.auth.password
literally, so webapp authenticated with a stale default while the
Bitnami chart used the real password from the secret. This commit
switches the deploy-mode URL to use a CLICKHOUSE_PASSWORD env-var
indirection that resolves from existingSecret when set.
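The indirection works because Kubernetes expands `$(VAR)` references against env vars defined earlier in the same container's env list. A condensed sketch (the secret key name, user, and host are assumptions):

```yaml
env:
  - name: CLICKHOUSE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: my-clickhouse-auth   # clickhouse.auth.existingSecret
        key: admin-password        # assumed key name
  - name: CLICKHOUSE_URL
    # $(CLICKHOUSE_PASSWORD) is resolved by the kubelet at container start
    value: "http://default:$(CLICKHOUSE_PASSWORD)@trigger-clickhouse:8123"
```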

Co-authored-by: Cursor <cursoragent@cursor.com>
Carries the four PLT-663 chart fixes above.

Co-authored-by: Cursor <cursoragent@cursor.com>
ConProgramming and others added 3 commits May 15, 2026 15:51
@kubernetes/client-node@1.0.0 tightened V1ObjectMeta.annotations from
'unknown' to 'Record<string, string>'. The parsed JSON from the
KUBERNETES_WORKER_POD_ANNOTATIONS env var lands as 'unknown', which now
fails with TS2322 at the assignment site. Cast (or validate) at the parse
boundary.
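A sketch of the validate-at-the-boundary option; the function name and error messages are illustrative, not the supervisor's actual code:

```typescript
// Hypothetical sketch: validate the JSON env var instead of a bare cast,
// so the result genuinely satisfies V1ObjectMeta.annotations.
function parseWorkerPodAnnotations(
  raw: string | undefined
): Record<string, string> {
  if (!raw) return {};
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("KUBERNETES_WORKER_POD_ANNOTATIONS must be a JSON object");
  }
  for (const [key, value] of Object.entries(parsed)) {
    if (typeof value !== "string") {
      throw new Error(`annotation ${key} must be a string`);
    }
  }
  // Safe after the checks above.
  return parsed as Record<string, string>;
}
```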

Co-authored-by: Cursor <cursoragent@cursor.com>
PR #12 added supervisor.config.kubernetes.workerPodSecurityContext,
workerContainerSecurityContext, and workerPodAnnotations to values.yaml
but the supervisor.yaml template never read them. The supervisor's
Kubernetes workload manager reads KUBERNETES_WORKER_POD_SECURITY_CONTEXT,
KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT, and
KUBERNETES_WORKER_POD_ANNOTATIONS env vars at runtime (JSON-parsed) and
applies them to every worker pod it schedules.

Without this wiring, worker pods on FedStart / GameWarden deployments
are missing their compliance-required securityContext entries and would
be rejected by pod-security admission.
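The wiring can be sketched as template snippets along these lines (the `toJson | quote` piping and indentation are assumptions about the template, not verbatim chart code):

```yaml
{{- with .Values.supervisor.config.kubernetes.workerPodAnnotations }}
- name: KUBERNETES_WORKER_POD_ANNOTATIONS
  value: {{ . | toJson | quote }}
{{- end }}
{{- with .Values.supervisor.config.kubernetes.workerPodSecurityContext }}
- name: KUBERNETES_WORKER_POD_SECURITY_CONTEXT
  value: {{ . | toJson | quote }}
{{- end }}
{{- with .Values.supervisor.config.kubernetes.workerContainerSecurityContext }}
- name: KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT
  value: {{ . | toJson | quote }}
{{- end }}
```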

Co-authored-by: Cursor <cursoragent@cursor.com>
82 commits of upstream churn (post triggerdotdev/main sync). Resolutions:
  - .github/workflows/helm-prerelease.yml         -> hand-merged (kept our
    CHART_NAME rename + steps.owner.outputs.lc; adopted upstream's
    STEPS_VERSION_OUTPUTS_VERSION env-var pattern for Zizmor compliance)
  - .github/workflows/release-helm.yml            -> hand-merged (kept our
    steps.owner.outputs.lc; adopted upstream env-var pattern)
  - apps/webapp/app/v3/services/initializeDeployment.server.ts
    -> hand-merged: accepted upstream's PR triggerdotdev#3610 refactor that moves
    getDeploymentImageRef() inside createDeploymentWithNextVersion's
    per-attempt callback, and re-applied our DEPLOY_IMAGE_OVERRIDE
    short-circuit inside the same callback.

  Auto-merged (kept our patches intact):
  - hosting/k8s/helm/templates/supervisor.yaml    (upstream improved OTLP
    endpoint to fully-qualified .svc.cluster.local URL; our extraVolumes,
    extraVolumeMounts, and security-context wiring preserved)
  - hosting/k8s/helm/templates/webapp.yaml        (our extraContainers,
    extraInitContainers, useIam gate, CLICKHOUSE_PASSWORD env preserved)
  - hosting/k8s/helm/values.yaml                  (our electric.extraVolumes,
    s3.external.useIam, etc preserved)
  - pnpm-lock.yaml                                (auto-merged cleanly)

Smoke tests:
  - helm dependency build hosting/k8s/helm        OK
  - helm template hosting/k8s/helm                OK (renders without errors)

Co-authored-by: Cursor <cursoragent@cursor.com>

# Push to GHCR OCI registry
-helm push "$CHART_PACKAGE" "oci://${{ env.REGISTRY }}/${{ github.repository_owner }}/charts"
+helm push "$CHART_PACKAGE" "oci://${{ env.REGISTRY }}/${{ steps.owner.outputs.lc }}/charts"

echo '```bash'
echo "helm upgrade --install trigger \\"
-echo " oci://${{ env.REGISTRY }}/${{ github.repository_owner }}/charts/${{ env.CHART_NAME }} \\"
+echo " oci://${{ env.REGISTRY }}/${{ steps.owner.outputs.lc }}/charts/${{ env.CHART_NAME }} \\"

# Push to GHCR OCI registry
-helm push "$CHART_PACKAGE" "oci://${{ env.REGISTRY }}/${{ github.repository_owner }}/charts"
+helm push "$CHART_PACKAGE" "oci://${{ env.REGISTRY }}/${{ steps.owner.outputs.lc }}/charts"
ConProgramming and others added 2 commits May 15, 2026 18:44
Reconciles plt-663-chart-fixes with gs-v4.4.4's main-merge (PR #12).

Conflicts resolved:
  - hosting/k8s/helm/Chart.yaml      -> hand-merged: bumped prerelease
    base from 4.4.5 to 4.4.6 to track gs-v4.4.4's version bump (PR triggerdotdev#3500
    + triggerdotdev#3501 from upstream). New chart version is now 4.4.6-plt663.1.
    appVersion v4.4.6 from upstream.
  - hosting/k8s/helm/Chart.lock      -> took gs-v4.4.4's lock (clickhouse
    9.4.4 from upstream PR triggerdotdev#3524) then re-ran `helm dependency build`.

Auto-merged cleanly:
  - hosting/k8s/helm/templates/supervisor.yaml (our security-context env
    wiring + worker pod annotations type fix preserved alongside upstream's
    extraVolumes/extraVolumeMounts and OTLP fully-qualified URL).

Upstream-coverage check (4 PR #13 fixes vs newly-synced upstream main):
  - _helpers.tpl  ($(CLICKHOUSE_PASSWORD) in deploy: true branch) -> NOVEL
    Upstream's deploy branch still interpolates `.Values.clickhouse.auth.password`
    literally; our fix is still required when auth.existingSecret is set.
  - electric.yaml (/app/persistent mount + extraVolumes)          -> NOVEL
    Upstream electric.yaml still renders no volumes/volumeMounts at all.
  - webapp.yaml   (S3 useIam gate + CLICKHOUSE_PASSWORD env)      -> NOVEL
    Upstream webapp.yaml has no useIam reference and no CLICKHOUSE_PASSWORD
    env var; both are still required.
  - values.yaml   (electric.extraVolumes + s3.external.useIam)    -> NOVEL
    Upstream values.yaml has neither key under electric: nor s3.external:.

All 4 fixes remain net-new fork-side patches. No PR #13 commits become
redundant from this upstream sync.

Smoke tests:
  - helm dependency build hosting/k8s/helm        OK
  - helm template hosting/k8s/helm                OK (renders v4.4.6-plt663.1)

Co-authored-by: Cursor <cursoragent@cursor.com>
fix(chart): PLT-663 self-hosted install bugs