fix(kaniko): deliver build context via init-container, not s3:// or http:// #7

Merged
mastermanas805 merged 1 commit into master from fix/kaniko-presigned-url
May 11, 2026

@mastermanas805
Member

Summary

PR #5's S3 approach failed live verification — twice. This is the working fix.

Why the previous attempts failed

| Attempt | Failure |
| --- | --- |
| `--context=s3://bucket/key` | AWS SDK v2 (kaniko v1.23) only does vhost-style endpoints; `S3_FORCE_PATH_STYLE` is an SDK v1 knob, silently ignored. Bucket name resolves as a non-existent subdomain. |
| `--context=tar.gz+http://...` | kaniko v1.23 only accepts `gs://, dir://, tar://, s3://, git://, https://`; `tar.gz+` is invalid and plain `http://` isn't on the list. |
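
To make the second failure concrete, here is a minimal sketch of a prefix check that could run before a Job is submitted. The prefix list is copied from this PR's description of kaniko v1.23, not read from kaniko's source, and `contextPrefixAccepted` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptedContextPrefixes mirrors the list quoted in this PR for kaniko v1.23.
// Assumption: taken from the PR text, not verified against kaniko itself.
var acceptedContextPrefixes = []string{
	"gs://", "dir://", "tar://", "s3://", "git://", "https://",
}

// contextPrefixAccepted (hypothetical helper) reports whether ctx starts
// with one of the prefixes listed above.
func contextPrefixAccepted(ctx string) bool {
	for _, p := range acceptedContextPrefixes {
		if strings.HasPrefix(ctx, p) {
			return true
		}
	}
	return false
}

func main() {
	for _, ctx := range []string{
		"tar.gz+http://minio/ctx.tar.gz",
		"http://minio/ctx.tar.gz",
		"tar:///workspace/context.tar.gz",
	} {
		fmt.Printf("%-34s accepted=%v\n", ctx, contextPrefixAccepted(ctx))
	}
}
```

A prefix check alone would not have caught the first failure: `s3://bucket/key` passes it and only breaks later, at endpoint resolution.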

What works

An init-container `curl`s the presigned URL into a shared emptyDir; the main kaniko container reads it via `--context=tar:///workspace/context.tar.gz`. We control the fetch, so kaniko sees only a local file. This works regardless of AWS-SDK path-style support or MinIO TLS state.
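
The shape of the two containers can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR. The `container` struct is a stripped-down stand-in for the Kubernetes container type. The names `fetch-context` and `CONTEXT_URL`, the volume name, the `curlimages/curl` image, and the kaniko image tag are all hypothetical. Only the `tar://` context path comes from the PR.

```go
package main

import "fmt"

// container is a hypothetical, simplified stand-in for a Kubernetes
// container spec; real code would likely use k8s.io/api/core/v1.
type container struct {
	Name    string
	Image   string
	Command []string
	Args    []string
	Env     map[string]string
	Mounts  map[string]string // volume name -> mount path
}

const (
	contextVolume = "build-context" // emptyDir shared by both containers (name assumed)
	workspacePath = "/workspace"
	contextFile   = workspacePath + "/context.tar.gz"
)

// fetchContextContainer builds the init-container that downloads the
// presigned URL into the shared volume.
func fetchContextContainer(presignedURL string) container {
	return container{
		Name:    "fetch-context",   // assumed name
		Image:   "curlimages/curl", // assumed image
		Command: []string{"sh", "-c"},
		// -f fails on HTTP errors, so a bad URL fails the Job early.
		Args:   []string{`curl -fsSL "$CONTEXT_URL" -o ` + contextFile},
		Env:    map[string]string{"CONTEXT_URL": presignedURL},
		Mounts: map[string]string{contextVolume: workspacePath},
	}
}

// kanikoContainer builds the main container; it only ever sees a local file.
func kanikoContainer(destination string) container {
	return container{
		Name:  "kaniko",
		Image: "gcr.io/kaniko-project/executor:v1.23.0", // tag assumed
		Args: []string{
			"--context=tar://" + contextFile,
			"--destination=" + destination,
		},
		Mounts: map[string]string{contextVolume: workspacePath},
	}
}

func main() {
	initC := fetchContextContainer("https://minio.example/ctx?X-Amz-Signature=abc")
	mainC := kanikoContainer("registry.example/app:latest")
	fmt.Println(initC.Args[0])
	fmt.Println(mainC.Args[0])
}
```

Passing the URL through an environment variable rather than splicing it into the shell string avoids quoting problems with the `&` characters in a presigned query string.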

Live verification (this time, before the PR)

Deployed v2.0.2 to prod. 2 MiB tarball → kaniko Job built + pushed → app pod boots → `GET /` returns the env_vars we POSTed:

```json
{"ok":true,"database_url_set":true,"custom_var":"hello","expected_string":"init-container-works"}
```

Test plan

  • `TestKanikoJobUsesInitContainerWhenHTTPURLSet` — init-container present, kaniko reads tar://, build-context is emptyDir, no legacy AWS env
  • `TestKanikoJobHasExplicitResources` — unchanged
  • `TestAppDeploymentUsesPullAlways` — unchanged
  • Live: 2 MiB tarball deploy succeeds end-to-end (v2.0.2 in prod now)

🤖 Generated with Claude Code

…ttp://

Live verification of PR #5 (MinIO build context) failed twice on the
real cluster before settling on this approach:

  Attempt 1: --context=s3://bucket/key
    → AWS SDK v2 (kaniko v1.23) resolves S3 endpoints in vhost style.
      The path-style env switch (S3_FORCE_PATH_STYLE) is an SDK v1 knob
      and is silently ignored. kaniko tried to hit
      instant-build-contexts.minio.instant-data.svc.cluster.local
      which doesn't resolve.

  Attempt 2: --context=tar.gz+http://...
    → kaniko v1.23 only accepts these prefixes:
      gs://, dir://, tar://, s3://, git://, https://
      tar.gz+ is invalid; no plaintext http:// either.

Final approach (this commit): generate a presigned URL via minio-go,
add an initContainer to the kaniko Job that curls the URL into a
shared emptyDir, kaniko reads via the standard --context=tar://. We
control the fetch; kaniko sees only a local file. Works regardless of
AWS-SDK quirks or MinIO TLS state.

Tests updated to assert:
- init-container with curl image is present
- init-container URL env points at the presigned URL
- kaniko args use --context=tar:///workspace/context.tar.gz
- build-context volume is emptyDir (not Secret)
- no legacy AWS env vars on the main kaniko container

Live verified: 2 MiB tarball → kaniko Job builds → app pod boots →
GET / returns the env_vars we sent on the initial POST.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mastermanas805 mastermanas805 merged commit e9b3000 into master May 11, 2026
@mastermanas805 mastermanas805 deleted the fix/kaniko-presigned-url branch May 11, 2026 08:44
mastermanas805 added a commit that referenced this pull request May 17, 2026
…ning

Round-3 P2 remediation across deploy/stack/webhook/auth surfaces.

1. deploy.go Redeploy now rejects a deployment in a terminal status
   (expired/deleted/stopped) with 409 + error code
   `deployment_not_redeployable` — redeploying one would resurrect an
   over-TTL/over-cap workload. New models.IsDeploymentTerminal +
   DeployStatusStopped const.

2. stack.go Redeploy re-runs the per-tier deployments_apps cap check when
   the stack is NOT in an active (slot-occupying) status — a failed/stopped
   stack flipping back to `building` could take a team to cap+1. New
   models.IsStackActive.

3. stack.go empty-env vault fallback changed from "production" to
   models.EnvDefault (development) at both new + redeploy sites — convention
   #11: a no-env legacy stack must not silently read production secrets.

4. deploy_teardown_reconciler.go increments a new
   metrics.DeployTeardownMarkFailed counter when MarkDeploymentTornDown
   fails — a persistently stuck row is now alertable in NR, not a silent log.

5. auth.go findOrCreateUserGitHub now matches an existing account by email
   (GetUserByEmail) and links github_id via new models.LinkGitHubID instead
   of forking a new team/user — mirrors findOrCreateUserGoogle and rejects
   takeover of an account already linked to a different GitHub ID. The
   /user/emails fallback now filters on Verified && Primary.

6. (already correct) models.CreateUser already routes email through
   NormalizeEmail at the write boundary — every OAuth/magic-link/claim call
   site is covered. No change needed; verified.

7. webhook.go receive_url is now built from a fixed server-controlled base
   (new webhookReceiveBaseURL: API_PUBLIC_URL / compiled-in base; c.BaseURL()
   only as a non-production dev fallback) instead of the client-controllable
   Host header. The URL is encrypted + persisted, so a client-settable host
   was a persistence-poisoning vector.

8. webhook.go Receive + ListRequests reject any non-webhook resource token
   with 404 — GetResourceByToken selects by token only, so a postgres/redis
   token previously passed.

9. auth.go GoogleAuthURL drops the impossible url.Parse-error 500 branch
   (the argument is a compile-time constant) — matches GoogleStart.
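
As an illustration of item 1, the terminal-status guard could look roughly like the sketch below. It is a guess at the shape, not the code in this commit. `IsDeploymentTerminal`, `DeployStatusStopped`, the three status values, the 409, and the error code come from the message above. The other constant names, the `redeployGuard` helper, and the `running` status are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

const (
	DeployStatusExpired = "expired" // constant name assumed
	DeployStatusDeleted = "deleted" // constant name assumed
	DeployStatusStopped = "stopped"
)

// IsDeploymentTerminal reports whether a deployment status is one of the
// terminal states named in the commit message.
func IsDeploymentTerminal(status string) bool {
	switch status {
	case DeployStatusExpired, DeployStatusDeleted, DeployStatusStopped:
		return true
	default:
		return false
	}
}

// redeployGuard (hypothetical helper) returns the HTTP status and error code
// a handler would send when the redeploy must be rejected; ok is true when
// the redeploy may proceed.
func redeployGuard(status string) (httpStatus int, errCode string, ok bool) {
	if IsDeploymentTerminal(status) {
		return http.StatusConflict, "deployment_not_redeployable", false
	}
	return 0, "", true
}

func main() {
	// "running" is an assumed example of a non-terminal status.
	for _, s := range []string{"running", "stopped", "expired"} {
		code, errCode, ok := redeployGuard(s)
		fmt.Printf("status=%s ok=%v http=%d code=%q\n", s, ok, code, errCode)
	}
}
```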

Regression tests: models/redeploy_guard_test.go (IsDeploymentTerminal,
IsStackActive), models/link_github_id_test.go (LinkGitHubID),
handlers/webhook_receive_base_url_test.go (#7), handlers/p2_roundup_test.go
(#1 error code, #4 metric, #9 GoogleAuthURL), and a wrong-resource-type case
appended to handlers/webhook_test.go (#8).

go build ./... and go vet ./... pass. New no-DB regression tests pass; the
DB/Redis-backed suites require a test Postgres/Redis (unavailable in this
environment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>