
fix(chart): PLT-663 self-hosted install bugs#13

Merged
ConProgramming merged 8 commits into gs-v4.4.4 from plt-663-chart-fixes
May 15, 2026

@ConProgramming

Summary

Four bugs caught during local OrbStack validation of the umbrella chart
(GovSignals/govsignals#2346):

  1. webapp unconditionally mounted s3-access-key-id / s3-secret-access-key from secrets.existingSecret, even with s3.deploy: false, s3.external.useIam: true, and no s3.external.existingSecret set. Fresh installs that route around the chart-managed Secret failed with CreateContainerConfigError.
  2. electric requires /app/persistent to be writable, but the template rendered no volumes at all, so every pod went into CrashLoopBackOff.
  3. The clickhouse URL helper hardcoded the literal clickhouse.auth.password value, ignoring clickhouse.auth.existingSecret when clickhouse.deploy: true.
  4. docker/scripts/entrypoint.sh had no migrate subcommand. The umbrella's pre-install migration Job called entrypoint.sh migrate, but the positional argument was ignored, so the Job booted the webapp server and hung until it hit activeDeadlineSeconds: 600.

Per-commit detail

  • Each commit is a single focused fix. See commit bodies for the full rationale.

Test plan

  • Validated end-to-end on OrbStack against a fresh install with secrets.enabled: false + external Apollo-style secrets + s3.external.useIam: true. Webapp /healthcheck returns 200; supervisor / electric / clickhouse / redis all reach Ready. See umbrella PR for full pod-state output.
  • CI green on fork helm-prerelease workflow (chart artifact publishes to GHCR).
  • Pending: existing FedStart / GameWarden Apollo deploys (operator-side test, to run once a new umbrella OCI pin lands).

Stacked on gs-v4.4.4 because that's where the GovSignals fork lives; will roll up into PR #12 when that branch eventually rebases onto upstream.

Made with Cursor

ConProgramming and others added 5 commits May 14, 2026 17:01
Allows running migrations standalone without booting the webapp server.
Used by the umbrella chart's pre-install migration Job to avoid the
chicken-and-egg of the Job inheriting webapp.extraEnvVars with
SKIP_POSTGRES_MIGRATIONS=1. The migrate subcommand forces
SKIP_POSTGRES_MIGRATIONS=0 because that's the Job's only purpose.
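
For illustration, a pre-install Job that relies on this subcommand might look roughly like the sketch below. The Job name, image, and in-container script path are hypothetical. Only the migrate argument, the inherited SKIP_POSTGRES_MIGRATIONS=1, and the 600-second deadline come from this PR.

```yaml
# Sketch only, not the umbrella chart's actual manifest.
apiVersion: batch/v1
kind: Job
metadata:
  name: webapp-migrate                      # hypothetical name
  annotations:
    "helm.sh/hook": pre-install
spec:
  activeDeadlineSeconds: 600                # the deadline the old Job ran into
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example.invalid/webapp:tag # hypothetical image reference
          # Script path inside the image is an assumption.
          command: ["./scripts/entrypoint.sh", "migrate"]
          env:
            # Inherited from webapp.extraEnvVars; the migrate subcommand
            # forces this back to 0 before running migrations.
            - name: SKIP_POSTGRES_MIGRATIONS
              value: "1"
```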

Co-authored-by: Cursor <cursoragent@cursor.com>
Previously the block mounted s3-access-key-id / s3-secret-access-key
from secrets.existingSecret unconditionally, breaking installs where
the operator uses an existing Apollo-managed Secret that doesn't
contain those keys. Now three branches:
  - s3.deploy: true        -> existing behavior (chart-managed or
                              s3.auth.existingSecret).
  - s3.deploy: false + useIam: false:
      - external.existingSecret -> mount from external secret.
      - chart-managed accessKeyId -> requires secrets.enabled: true
                                     so secrets.yaml actually emits
                                     the s3-access-key-id key.
  - s3.deploy: false + useIam: true -> skip the entire S3 env block.
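
As a rough guide, the branches above correspond to values overrides along these lines. Secret names are hypothetical, and each YAML document is a separate alternative rather than one combined file.

```yaml
# Branch 1: chart-deployed S3, existing behavior.
s3:
  deploy: true
---
# Branch 2a: external S3, static credentials from an operator-supplied Secret.
s3:
  deploy: false
  external:
    useIam: false
    existingSecret: my-s3-credentials   # hypothetical Secret name
---
# Branch 2b: external S3, chart-managed credentials. secrets.enabled must be
# true so that secrets.yaml emits the s3-access-key-id key. The credential
# fields themselves are omitted here.
secrets:
  enabled: true
s3:
  deploy: false
  external:
    useIam: false
---
# Branch 3: external S3 via IAM. The whole S3 env block is skipped, so the
# existing Secret does not need to contain any s3-* keys.
secrets:
  enabled: false
  existingSecret: apollo-managed-secret # hypothetical Secret name
s3:
  deploy: false
  external:
    useIam: true
```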

Co-authored-by: Cursor <cursoragent@cursor.com>
Electric requires /app/persistent to be writable for its state dir.
The template previously rendered no volumes at all, so every Electric
pod crashed with 'could not make directory /app/persistent/state'.
This commit always renders a /app/persistent emptyDir and adds
extraVolumes/extraVolumeMounts knobs matching the webapp/supervisor
pattern for consumers that want PVC-backed durability.
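
A minimal sketch of the pod spec fragment this should produce, assuming a volume named electric-persistent (the actual name in the template may differ):

```yaml
# Rendered Deployment fragment (sketch). The volume name is hypothetical.
spec:
  template:
    spec:
      containers:
        - name: electric
          volumeMounts:
            - name: electric-persistent
              mountPath: /app/persistent   # writable state dir Electric needs
      volumes:
        - name: electric-persistent
          emptyDir: {}                     # contents are lost when the pod is deleted
```

Because an emptyDir does not survive pod deletion, consumers that need durable Electric state would supply a PVC through the extraVolumes/extraVolumeMounts knobs.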

Co-authored-by: Cursor <cursoragent@cursor.com>
When clickhouse.deploy: true AND auth.existingSecret is set, the
fork's clickhouse.url helper still interpolated values.auth.password
literally, so webapp authenticated with a stale default while the
Bitnami chart used the real password from the secret. This commit
switches the deploy-mode URL to use a CLICKHOUSE_PASSWORD env-var
indirection that resolves from existingSecret when set.
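
The indirection relies on Kubernetes expanding $(VAR) references to env vars defined earlier in the same container. A sketch, assuming a URL env var named CLICKHOUSE_URL and hypothetical Secret name, key, user, host, and port:

```yaml
# Container env fragment (sketch). Everything except CLICKHOUSE_PASSWORD
# and the $(CLICKHOUSE_PASSWORD) reference is an assumption.
env:
  - name: CLICKHOUSE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: my-clickhouse-secret   # value of clickhouse.auth.existingSecret
        key: admin-password          # hypothetical key name
  # Must come after CLICKHOUSE_PASSWORD: only previously defined
  # variables are expanded.
  - name: CLICKHOUSE_URL
    value: "http://default:$(CLICKHOUSE_PASSWORD)@clickhouse:8123"
```

Kubernetes substitutes the value verbatim, so a password containing URL-reserved characters would not be percent-encoded by this approach.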

Co-authored-by: Cursor <cursoragent@cursor.com>
Carries the four PLT-663 chart fixes above.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented May 14, 2026

🧭 Helm Chart Prerelease Published

Version: 4.4.6-plt663.1-pr13.f66ed13

Install:

helm upgrade --install trigger \
  oci://ghcr.io/govsignals/charts/vendored-upstream-trigger \
  --version "4.4.6-plt663.1-pr13.f66ed13"

⚠️ This is a prerelease for testing. Do not use in production.

ConProgramming and others added 3 commits May 15, 2026 15:51
@kubernetes/client-node@1.0.0 tightened V1ObjectMeta.annotations from
'unknown' to 'Record<string, string>'. The parsed JSON from the
KUBERNETES_WORKER_POD_ANNOTATIONS env var lands as 'unknown' which now
fails TS2322 at the assignment site. Cast (or validate) at the parse
boundary.

Co-authored-by: Cursor <cursoragent@cursor.com>
PR #12 added supervisor.config.kubernetes.workerPodSecurityContext,
workerContainerSecurityContext, and workerPodAnnotations to values.yaml
but the supervisor.yaml template never read them. The supervisor's
Kubernetes workload manager reads KUBERNETES_WORKER_POD_SECURITY_CONTEXT,
KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT, and
KUBERNETES_WORKER_POD_ANNOTATIONS env vars at runtime (JSON-parsed) and
applies them to every worker pod it schedules.

Without this wiring, worker pods on FedStart / GameWarden deployments
are missing their compliance-required securityContext entries and would
be rejected by pod-security admission.
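
The wiring could be sketched as follows inside the supervisor container's env list. The values paths and env var names are the ones named above. The exact template layout is an assumption, not a copy of supervisor.yaml.

```yaml
# Helm template fragment (sketch). Each block is emitted only when the
# corresponding value is non-empty; toJson serializes the map and quote
# makes it a valid YAML string.
{{- with .Values.supervisor.config.kubernetes.workerPodSecurityContext }}
- name: KUBERNETES_WORKER_POD_SECURITY_CONTEXT
  value: {{ toJson . | quote }}
{{- end }}
{{- with .Values.supervisor.config.kubernetes.workerContainerSecurityContext }}
- name: KUBERNETES_WORKER_CONTAINER_SECURITY_CONTEXT
  value: {{ toJson . | quote }}
{{- end }}
{{- with .Values.supervisor.config.kubernetes.workerPodAnnotations }}
- name: KUBERNETES_WORKER_POD_ANNOTATIONS
  value: {{ toJson . | quote }}
{{- end }}
```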

Co-authored-by: Cursor <cursoragent@cursor.com>
Reconciles plt-663-chart-fixes with gs-v4.4.4's main-merge (PR #12).

Conflicts resolved:
  - hosting/k8s/helm/Chart.yaml      -> hand-merged: bumped prerelease
    base from 4.4.5 to 4.4.6 to track gs-v4.4.4's version bump (PR triggerdotdev#3500
    + triggerdotdev#3501 from upstream). New chart version is now 4.4.6-plt663.1.
    appVersion v4.4.6 from upstream.
  - hosting/k8s/helm/Chart.lock      -> took gs-v4.4.4's lock (clickhouse
    9.4.4 from upstream PR triggerdotdev#3524) then re-ran `helm dependency build`.

Auto-merged cleanly:
  - hosting/k8s/helm/templates/supervisor.yaml (our security-context env
    wiring + worker pod annotations type fix preserved alongside upstream's
    extraVolumes/extraVolumeMounts and OTLP fully-qualified URL).

Upstream-coverage check (4 PR #13 fixes vs newly-synced upstream main):
  - _helpers.tpl  ($(CLICKHOUSE_PASSWORD) in deploy: true branch) -> NOVEL
    Upstream's deploy branch still interpolates `.Values.clickhouse.auth.password`
    literally; our fix is still required when auth.existingSecret is set.
  - electric.yaml (/app/persistent mount + extraVolumes)          -> NOVEL
    Upstream electric.yaml still renders no volumes/volumeMounts at all.
  - webapp.yaml   (S3 useIam gate + CLICKHOUSE_PASSWORD env)      -> NOVEL
    Upstream webapp.yaml has no useIam reference and no CLICKHOUSE_PASSWORD
    env var; both are still required.
  - values.yaml   (electric.extraVolumes + s3.external.useIam)    -> NOVEL
    Upstream values.yaml has neither key under electric: nor s3.external:.

All 4 fixes remain net-new fork-side patches. No PR #13 commits become
redundant from this upstream sync.

Smoke tests:
  - helm dependency build hosting/k8s/helm        OK
  - helm template hosting/k8s/helm                OK (renders v4.4.6-plt663.1)

Co-authored-by: Cursor <cursoragent@cursor.com>
@ConProgramming ConProgramming marked this pull request as ready for review May 15, 2026 23:01
@ConProgramming ConProgramming merged commit dc04413 into gs-v4.4.4 May 15, 2026
13 of 15 checks passed
