perf(docker): slim runtime image and bump Node LTS to 22||24 #1767

Merged
confuser merged 9 commits into master from chore/docker-image-slim-node24
Apr 20, 2026

@confuser (Member)

Summary

  • Cuts the published banmanagement/webui image from ~340 MB compressed → ~160 MB by switching to a 3-stage Alpine Dockerfile, pruning runtime node_modules, dropping git from the runner, and reclassifying build-only deps as devDependencies.
  • Bumps the supported Node.js range to the current LTS pair (22 || 24) across package.json, .naverc, the CI matrix, the Dockerfile base image, and the README.
  • Extends the existing smoke_docker CI job so 4 cypress journeys run against the dockerised stack (login, registration, admin server lifecycle, admin webhook lifecycle).

What changed

Dockerfile

3 stages on node:24-alpine:

| Stage | Purpose |
| --- | --- |
| prod-deps | npm ci --omit=dev, then scripts/docker/prune-runtime-deps.sh |
| builder | Full deps + npm run build, then rm -rf .next/cache |
| runner | Copies the pruned node_modules from prod-deps and only the runtime files from builder (.next, public, server, cli, bin, server.js, docker-entrypoint.js, next.config.js, package.json) |

Notes:

  • We considered Next.js output: 'standalone' and reverted it – the standalone tracer cannot follow the dynamic require()s in server.js / cli/, which produced runtime Cannot find module 'web-push' / 'dotenv' failures. Pruned-deps gives us the same size win without the tracing fragility.
  • git is no longer installed in the runner; git-revision-webpack-plugin only runs in the builder.
  • tini is retained for PID 1.

scripts/docker/prune-runtime-deps.sh (new)

Deletes files that npm ci --omit=dev cannot remove; each rule is documented inline:

  • non-Alpine @next/swc-* binaries
  • non-Linux/musl @img/sharp-* binaries
  • react-icons families the app does not import (kept: ai bi bs fa fi go md ri tb ti)
  • date-fns CDN bundles (incl. the duplicate inside @nateradebaugh/react-datetime)
  • the 13 MB image-q/demo directory
  • generic test/docs/examples/*.map/*.d.ts cruft inside node_modules
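The rules above live in a shell script. As a rough JavaScript illustration of the decisions it makes (not its actual code), here is a predicate over paths relative to node_modules. The function name `shouldPrune`, the assumption that `react-icons/lib` holds shared code that must always survive, and the exact set of cruft directory names are all illustrative. The date-fns CDN and platform-binary rules are left out.

```javascript
// Hypothetical restatement of some prune rules; NOT prune-runtime-deps.sh.
// relPath is POSIX-style and relative to node_modules, e.g. "react-icons/cg/index.js".
const KEPT_ICON_FAMILIES = new Set(['ai', 'bi', 'bs', 'fa', 'fi', 'go', 'md', 'ri', 'tb', 'ti'])
// Assumed cruft directory names; the real script may use a different list.
const CRUFT_DIRS = new Set(['test', 'docs', 'examples'])

function shouldPrune (relPath) {
  const parts = relPath.split('/')
  const [pkg, sub] = parts

  // react-icons: drop files inside icon families the app never imports.
  // `lib` is assumed to hold shared runtime code, so it is always kept.
  if (pkg === 'react-icons' && parts.length > 2 &&
      sub !== 'lib' && !KEPT_ICON_FAMILIES.has(sub)) {
    return true
  }

  // The 13 MB image-q demo directory.
  if (pkg === 'image-q' && sub === 'demo') return true

  // Any file below a test/docs/examples directory inside a package
  // (slice(1, -1) skips the package name and the file name itself).
  if (parts.slice(1, -1).some((dir) => CRUFT_DIRS.has(dir))) return true

  // Source maps and type declarations are never loaded at runtime.
  const file = parts[parts.length - 1]
  return file.endsWith('.map') || file.endsWith('.d.ts')
}
```

For example, `shouldPrune('react-icons/cg/index.js')` is true while `shouldPrune('react-icons/fa/index.js')` is false.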

package.json / package-lock.json

  • engines.node: 20 || 22 → 22 || 24
  • Moved to devDependencies: tailwindcss, postcss, autoprefixer, typescript, url-loader, @next/bundle-analyzer, git-revision-webpack-plugin

next.config.js

  • Lazy/guarded require()s for @next/bundle-analyzer and git-revision-webpack-plugin so the production runtime never imports devDeps.
  • Added modularizeImports for react-icons so webpack only references the icon families actually used.
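A minimal sketch of the guarded-require pattern described above, assuming a CommonJS next.config.js. `optionalRequire`, the `ANALYZE` environment variable and the `GIT_COMMIT` DefinePlugin wiring are illustrative assumptions. The plugin's export is assumed to be the named `GitRevisionPlugin` export used by v5 of git-revision-webpack-plugin. The PR's modularizeImports entry is not reproduced because its exact shape is not shown here.

```javascript
// Hypothetical sketch of "lazy/guarded" build-only requires in next.config.js.

// Load an optional module; return null only if that exact package is missing.
function optionalRequire (name) {
  try {
    return require(name)
  } catch (err) {
    // A missing *transitive* dependency also throws MODULE_NOT_FOUND, but names
    // a different module in the message, so that case is rethrown.
    if (err.code === 'MODULE_NOT_FOUND' && err.message.includes(`'${name}'`)) return null
    throw err
  }
}

const nextConfig = {
  webpack (config, { webpack }) {
    // Next.js calls this hook only while compiling (next build / next dev),
    // so a production server never evaluates this require.
    const gitRevision = optionalRequire('git-revision-webpack-plugin')
    if (gitRevision) {
      const git = new gitRevision.GitRevisionPlugin() // assumed v5 named export
      config.plugins.push(new webpack.DefinePlugin({
        'process.env.GIT_COMMIT': JSON.stringify(git.commithash())
      }))
    }
    return config
  }
}

// @next/bundle-analyzer is only loaded when explicitly requested.
function withBundleAnalyzer (config) {
  if (process.env.ANALYZE !== 'true') return config
  const analyzer = optionalRequire('@next/bundle-analyzer')
  return analyzer ? analyzer({ enabled: true })(config) : config
}

module.exports = withBundleAnalyzer(nextConfig)
```

Keeping the plugin require inside the `webpack` hook means a production server, which loads next.config.js but never compiles, does not need the devDependency installed.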

.dockerignore

Tightened to exclude .git, .github, .cursor, .vscode, *.md (except README.md), LICENSE, dev compose files, scratch scripts (scripts/seed.js), .editorconfig, .eslintrc, cypress.config.js, cypress.setup.config.js, jest.config.js, nodemon.json, renovate.json, captain-definition, CHECKS, .cache_ggshield, .naverc.

.naverc / README.md / CI matrix

Bumped to Node 24 / 22.x + 24.x to match engines.node.

.github/workflows/build.yaml (smoke_docker extension)

After the existing setup-mode boot check:

  1. Tear the stack down (down -v).
  2. Set up Node 24 + cache ~/.npm and ~/.cache/Cypress.
  3. npm ci + npx cypress install on the runner.
  4. Bring only MySQL back up via docker-compose.prod.yml + docker-compose.e2e-overlay.yml (the overlay just adds 3306:3306 and pre-seeds env vars on the WebUI service so it boots in normal mode against bm_e2e_tests as root).
  5. Wait for MySQL healthy, run node cypress/setup.js to migrate + seed bm_e2e_tests from the runner.
  6. Bring the WebUI container up against the seeded DB, wait for /health=ok.
  7. Run cypress with the 4 specs:
    • cypress/e2e/pages/login.spec.js (argon2 + sessions)
    • cypress/e2e/journeys/registration.spec.js (PIN flow)
    • cypress/e2e/journeys/admin-server-lifecycle.spec.js (Apollo + knex + mysql2 CRUD)
    • cypress/e2e/journeys/admin-webhook-lifecycle.spec.js (webhook + sharp)
  8. Dump container logs on failure, then down -v.
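Step 6's readiness wait could be expressed as the JavaScript sketch below. The real step lives in build.yaml and is not shown here. `waitForHealth` and its polling defaults are hypothetical, as is the assumption that /health answers with either a bare `ok` body or JSON such as `{"status":"ok"}`.

```javascript
// Hypothetical helper: poll the WebUI health endpoint until it reports "ok".
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function isHealthy (res) {
  if (!res.ok) return false
  const body = (await res.text()).trim()
  // Accept a bare "ok" body or JSON like {"status":"ok"} (both assumed shapes).
  if (body === 'ok') return true
  try {
    return JSON.parse(body).status === 'ok'
  } catch {
    return false
  }
}

async function waitForHealth (url, { attempts = 60, delayMs = 2000, fetchImpl = fetch } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await isHealthy(await fetchImpl(url))) return true
    } catch {
      // Connection refused while the container is still starting: keep polling.
    }
    if (i < attempts) await sleep(delayMs)
  }
  return false
}
```

Returning `false` instead of throwing leaves the caller free to dump container logs (step 8) before failing the job.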

This catches outputFileTracingIncludes-style regressions and any standalone-vs-runtime mismatches that the host-only test and setup_e2e jobs cannot see (those run against npm start / e2e:setup:server rather than the published image).

docker-compose.e2e-overlay.yml (new)

Tiny overlay layered on top of docker-compose.prod.yml. Only adds:

  • mysql.ports: 3306:3306 so the host runner can seed.
  • webui.environment with DB_USER=root, DB_NAME=bm_e2e_tests, ENCRYPTION_KEY, SESSION_KEY, CONTACT_EMAIL, SERVER_FOOTER_NAME so the container boots straight into normal mode.

Test plan

Locally verified:

  • docker build succeeds → 159.4 MB compressed final image
  • docker compose up boots end-to-end and reaches /setup
  • /setup wizard completes; admin can be created
  • /admin/servers create + list works (Apollo + knex + mysql2)
  • /admin/webhooks create + list works (Apollo + sharp + argon2 paths)
  • All native modules (sharp, argon2, mysql2) load inside the container

Expected from CI:

  • test matrix passes on Node 22.x and 24.x
  • setup_e2e passes on Node 24.x
  • build_docker produces the slim image
  • smoke_docker passes both the existing setup-mode check and the new cypress run against the dockerised stack

Cuts the published `banmanagement/webui` image from ~340 MB compressed
to ~160 MB while bumping the supported Node.js range to the current
LTS pair (22 || 24).

Image-size changes
- Rewrite Dockerfile as a 3-stage build (`prod-deps` -> `builder` ->
  `runner`) on `node:24-alpine`. The runner copies a pruned production
  `node_modules` from `prod-deps` and only the build artefacts +
  custom-server source it actually needs from `builder`. We cannot use
  Next.js standalone output because `server.js` / `cli/` perform
  dynamic `require`s that the standalone tracer cannot follow; the
  pruned-deps approach keeps every runtime require resolvable while
  still discarding all dev dependencies and `.next/cache`.
- Drop `git` from the runner image (we already get `GIT_COMMIT` at
  build time via `git-revision-webpack-plugin`).
- Add `scripts/docker/prune-runtime-deps.sh` to remove platform
  binaries we never run (non-Alpine `@next/swc-*`, non-Linux/musl
  `@img/sharp-*`), `react-icons` families the app does not import,
  `date-fns` CDN bundles, the 13 MB `image-q/demo` directory, and
  the usual `test`/`docs`/`*.map`/`*.d.ts` cruft from `node_modules`.
- Tighten `.dockerignore` so the build context excludes `.git`,
  `.github`, editor/CI config, dev compose files, scratch scripts,
  and most markdown.
- Reclassify `tailwindcss`, `postcss`, `autoprefixer`, `typescript`,
  `url-loader`, `@next/bundle-analyzer`, and `git-revision-webpack-plugin`
  as devDependencies and lazy-require the build-only ones in
  `next.config.js` so production installs do not pull them in.
- Add `react-icons` `modularizeImports` to `next.config.js` so the
  webpack output only references icon families the app actually uses.

Node 22 || 24
- Update `engines.node`, `.naverc`, the CI matrix, and the README to
  the new LTS pair (22.x is still a supported maintenance LTS line;
  24.x is the current active LTS). The Dockerfile uses `node:24-alpine`.

CI
- Extend the existing `smoke_docker` job so that, after the existing
  setup-mode boot check, we tear the stack back down, restart MySQL
  with `docker-compose.e2e-overlay.yml` (which only adds a 3306 host
  port and pre-seeds ENCRYPTION_KEY/SESSION_KEY/DB_NAME for the WebUI
  container), seed `bm_e2e_tests` from the runner via `cypress/setup.js`,
  bring the WebUI container up against the seeded DB, wait for
  `/health=ok`, and run four cypress journeys (login, registration,
  admin server lifecycle, admin webhook lifecycle) against the
  container. This catches Apollo/knex/sharp/argon2 wiring regressions
  the host-only `test`/`setup_e2e` jobs cannot see.

Verified locally
- `docker build` -> 159.4 MB compressed image.
- `docker compose up` boots, walks the /setup wizard, and exercises
  /admin/servers + /admin/webhooks (Apollo + knex + sharp + argon2
  paths) successfully.
- Running Cypress against the container was deferred to CI (the host
  Cypress launcher has an unrelated breakage on this machine).

@confuser force-pushed the chore/docker-image-slim-node24 branch from 3324edf to 870df67 on April 19, 2026 17:20

cypress Bot commented Apr 19, 2026

BanManager-WebUI    Run #10295

Run properties:
  • Project: BanManager-WebUI
  • Branch: chore/docker-image-slim-node24
  • Run status: Passed #10295
  • Run duration: 02m 33s
  • Commit: 835191295d: Merge 33639e2d8618ad7e6baa44d3abfe993285966bae into 206eff89a0350d594b61f04d6f84...
  • Committer: James Mortemore

Test results:
  • Failures: 0
  • Flaky: 0
  • Pending: 1
  • Skipped: 0
  • Passing: 49

- Dockerfile: add `--chmod=0555` (r-xr-xr-x) to all 10 runtime COPY
  commands so the image ships read-only, owned by `nextjs:nodejs`.
  Closes SonarCloud docker:S6504. The `nextjs` user only needs to
  read/load these and traverse directories; mutable runtime state
  (cache, uploads, config) lives on the VOLUMEs declared below the
  COPYs, which Docker mounts independently of the image FS.
  Verified locally: `docker build` + `node -e "require('argon2');
  require('sharp'); require('mysql2')"` still loads all three native
  modules cleanly, .bin shims and bin/* scripts retain +x via 0555.

- workflows/build.yaml: pin all three `cypress-io/github-action`
  uses (lines 73, 133, the new line 316) to a full commit SHA -
  v6.10.9 = f790eee7a50d9505912f50c2095510be7de06aa7. Closes
  SonarCloud githubactions:S7637 and protects against tag hijacking
  on the action repo. Renovate will continue to bump these via the
  trailing `# v6.10.9` comment.

`--chmod=0555` alone wasn't enough: SonarCloud's docker:S6504 also
fires whenever a non-root user owns a copied resource, because the
owner can always `chmod` write permission back later. The compliant
pattern (per the rule's own example) is `--chown=root:root
--chmod=...`.

Switch all 10 runtime COPYs to `--chown=root:root --chmod=0555`. The
`nextjs` runtime user can still read/load (world-r) and traverse
(world-x) every file, but can no longer mutate the image FS or use
chmod to grant itself write access.
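The ownership argument above can be checked with a simplified model of POSIX permission bits. This is a hypothetical helper that ignores ACLs, capabilities and root's override. uid 1001 for `nextjs` comes from the verification notes below; gid 1001 for `nodejs` is an assumption.

```javascript
// Simplified POSIX permission model: ignores ACLs, capabilities and the fact
// that root bypasses these checks. mode holds the octal permission bits, e.g. 0o555.
function canAccess (file, user, perm) {
  const bit = { r: 4, w: 2, x: 1 }[perm]
  // Exactly one class applies: owner if uids match, else group, else other.
  let shift = 0
  if (user.uid === file.ownerUid) shift = 6
  else if (user.gids.includes(file.ownerGid)) shift = 3
  return ((file.mode >> shift) & bit) !== 0
}

// Only the owner (or root) may chmod a file, so ownership alone lets a user
// restore write access later, which is what docker:S6504 objects to.
function canChmod (file, user) {
  return user.uid === 0 || user.uid === file.ownerUid
}

const nextjs = { uid: 1001, gids: [1001] } // gid 1001 for nodejs is assumed
const imageFile = { ownerUid: 0, ownerGid: 0, mode: 0o555 }       // --chown=root:root --chmod=0555
const ownedFile = { ownerUid: 1001, ownerGid: 1001, mode: 0o555 } // --chmod=0555 only
const cacheDir = { ownerUid: 1001, ownerGid: 1001, mode: 0o755 }  // chown nextjs:nodejs + chmod u+w
```

With these values, `nextjs` can read and traverse `imageFile` but can neither write nor chmod it. It cannot write `ownedFile` either, yet `canChmod(ownedFile, nextjs)` is true, which is the gap the rule flags. `cacheDir` is writable.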

This required restructuring the writable-runtime-state setup, because
the COPY of `/app/public` brings in the tracked
`public/images/opengraph/cache` directory and would now stamp it as
`root:root 0555` - which would lock the runtime out of writing
opengraph cache images. Move the `mkdir -p` + `chown nextjs:nodejs` +
`chmod u+w` for the four writable paths (`.next/cache`, `uploads`,
`config`, `public/images/opengraph/cache`) into a NEW `RUN` layer that
sits AFTER the COPYs, so it deterministically wins regardless of what
the COPYs bring in. The old pre-COPY mkdir+chown was effectively a
no-op because the COPYs always re-stamped ownership over it.

Verified locally:
- `docker build` succeeds.
- As `nextjs` (uid 1001): writes to all 4 VOLUME paths
  (`/app/uploads/documents`, `/app/config`, `/app/.next/cache/images`,
  `/app/public/images/opengraph/cache`) succeed.
- As `nextjs`: writes to `/app/server.js` and tracked public assets
  (e.g. `public/player-template.png`) are denied.
- `node -e "require('argon2'); require('sharp'); require('mysql2')"`
  succeeds.
- Container boots: entrypoint generates + persists ENCRYPTION_KEY/
  SESSION_KEY/VAPID keys to `/app/config/.env`, app reaches
  "Listening on …:3000" and enters setup mode.

The previous prune script removed `@img/sharp-linuxmusl-x64`,
`@img/sharp-libvips-linuxmusl-x64`, and `@next/swc-linux-x64-musl`
under the wrong assumption that the runner image is always
linux/arm64-musl. That was an Apple Silicon developer blind spot:
GitHub Actions ubuntu-latest runners are linux/amd64, the published
banmanagement/webui:latest image is multi-arch (amd64 + arm64), and
all current CI smoke / publish flows build on amd64.

Symptom on CI: WebUI container failed to start with
  Error: Could not load the "sharp" module using the linuxmusl-x64
  runtime
because we'd deleted the only sharp binary that matched the runtime.

Keep BOTH linuxmusl-x64 and linuxmusl-arm64 variants for sharp
(plus their libvips siblings) and for @next/swc. Drop the linux-*
glibc variants since our base is `node:24-alpine`, and drop all the
darwin/win32 variants. Net effect: the amd64 image size is unchanged
vs. master (master never removed the x64 binaries; only this branch's
prune list was wrong), and the arm64 image now also keeps the x64
binaries as insurance.
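The corrected rule amounts to an allow-list. The JavaScript below is a hypothetical restatement, not the script itself. The package names follow the `@next/swc-*` and `@img/sharp-*` naming used above.

```javascript
// Hypothetical sketch, NOT prune-runtime-deps.sh: decide whether an optional
// native-binary package should be deleted from a node:24-alpine (musl) image.
// Both musl architectures are kept so one rule serves amd64 and arm64 builds.
const KEEP = new Set([
  '@next/swc-linux-x64-musl',
  '@next/swc-linux-arm64-musl',
  '@img/sharp-linuxmusl-x64',
  '@img/sharp-linuxmusl-arm64',
  '@img/sharp-libvips-linuxmusl-x64',
  '@img/sharp-libvips-linuxmusl-arm64'
])

function isPlatformBinaryPackage (name) {
  return name.startsWith('@next/swc-') || name.startsWith('@img/sharp-')
}

function shouldRemovePlatformPackage (name) {
  // glibc (linux-*), darwin-*, win32-* and other variants fall through to "remove";
  // packages that are not platform binaries (e.g. `sharp` itself) are never removed.
  return isPlatformBinaryPackage(name) && !KEEP.has(name)
}
```

The variant a container actually loads depends on `process.platform` and `process.arch` (e.g. `linux`/`x64` on ubuntu-latest runners) plus the Alpine base's musl libc. That is why removing the x64 musl packages broke the amd64 image while arm64 builds on Apple Silicon kept working.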

The smoke_docker job's seed step was failing with PROTOCOL_CONNECTION_LOST
because MySQL's `mysqladmin ping` healthcheck flips the container to
healthy while the entrypoint is still running its temporary mysqld over a
unix socket. The real mysqld bound to TCP 3306 only comes up a few
seconds later, so the seed script connected to a temp server that then
killed the connection mid-query. The script's silent .catch swallowed
the error, the workflow continued, and WebUI sat in
setup_mode_db_unreachable until the wait timed out.

cypress/setup.js: surface failures with a non-zero exit code so callers
can detect them. Existing callers run against a fully-ready DB so this
only affects real-failure paths.

build.yaml: wrap the seed step in an 8-attempt retry with 5s backoff so
we ride out the MySQL temp -> real server transition cleanly without
masking actual seed errors.
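For illustration, here is the same retry policy written as a JavaScript helper; the actual loop is in build.yaml. `retry` and the commented-out `seedDatabase` call are hypothetical. The point is that the final error is rethrown rather than swallowed, so a real seed failure still fails the job.

```javascript
// Hypothetical helper: up to 8 attempts, 5 s apart, rethrowing the LAST error.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function retry (fn, { attempts = 8, delayMs = 5000 } = {}) {
  let lastError
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt)
    } catch (err) {
      lastError = err
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts) await sleep(delayMs)
    }
  }
  throw lastError
}

// Entry-point pattern in the spirit of the cypress/setup.js change: report the
// error and exit non-zero instead of swallowing it.
// retry(() => seedDatabase()).catch((err) => { console.error(err); process.exitCode = 1 })
```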

pages/_document.js and 6 components/admin/* files do
`import resolveConfig from 'tailwindcss/resolveConfig'`. resolveConfig
is invoked at SSR time (not just at build time) to derive theme colors
for the bundled HTML, and tailwind.config.js itself does
`require('tailwindcss/colors')`. Next.js externalizes the
`tailwindcss/*` resolution to a runtime require, so the production
node_modules tree must contain it.

Without this, every cy.visit() in the docker compose smoke job got a
500 with `Cannot find module 'tailwindcss/resolveConfig'` from
.next/server/pages/_document.js. The other build-only packages I
moved (postcss, autoprefixer, typescript, url-loader,
@next/bundle-analyzer, git-revision-webpack-plugin) are only consumed
by next.config.js / tailwind.config.js / postcss.config.js at build
time and stay in devDependencies.

Lockfile shrinks because several transitive packages that were
previously only reached via cypress/etc. devDeps are now also reached
via the production tailwindcss dep, so npm dropped their `dev: true`
flags.
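To see which packages lost their `dev: true` flag, compare the `packages` map of an npm lockfile with lockfileVersion 2 or 3 before and after the change. That map is keyed by install path, such as `node_modules/postcss`. The helper names and file names are hypothetical.

```javascript
// Hypothetical helpers for lockfileVersion 2/3, where `packages` is keyed by
// install path and "" is the root project itself.
function devOnlyPackages (lock) {
  return Object.entries(lock.packages || {})
    .filter(([installPath, meta]) => installPath !== '' && meta.dev === true)
    .map(([installPath]) => installPath)
}

// Paths flagged dev-only in `before` that are still present in `after` but no
// longer flagged, i.e. now reachable from a production dependency.
function promotedToProd (before, after) {
  const afterPackages = after.packages || {}
  const stillDev = new Set(devOnlyPackages(after))
  return devOnlyPackages(before)
    .filter((p) => p in afterPackages && !stillDev.has(p))
}

// Usage (file names assumed):
// const fs = require('fs')
// const read = (f) => JSON.parse(fs.readFileSync(f, 'utf8'))
// console.log(promotedToProd(read('package-lock.before.json'), read('package-lock.json')))
```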

The smoke_docker job's seed step runs on the runner (DB_HOST=127.0.0.1
via the compose port mapping), so server/test/fixtures/server.js was
writing host=127.0.0.1 into bm_web_servers. Inside the WebUI container
that resolves to the container itself, so the GraphQL DataLoader's
servers-pool was hitting ECONNREFUSED 127.0.0.1:3306 on every query.

Add BM_DB_HOST/PORT/USER overrides to the fixture so the seed can write
the address the WebUI container needs (the compose service name `mysql`)
without changing the address its own knex pool uses (127.0.0.1). The
existing DB_* fallback keeps jest + setup_e2e flows untouched.
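Here is a minimal sketch of the override order described above. The function name and the fallback defaults (`127.0.0.1`, `3306`, `root`) are assumptions. Only the BM_DB_* over DB_* precedence comes from the commit message.

```javascript
// Hypothetical sketch of the fixture's connection settings for the row written
// into bm_web_servers: BM_DB_* wins, else the existing DB_* values.
function bmServerConnection (env = process.env) {
  return {
    host: env.BM_DB_HOST || env.DB_HOST || '127.0.0.1',
    port: Number(env.BM_DB_PORT || env.DB_PORT || 3306),
    user: env.BM_DB_USER || env.DB_USER || 'root'
  }
}
```

In the smoke job, the runner would export DB_HOST=127.0.0.1 for its own knex pool and BM_DB_HOST=mysql for the address stored for the WebUI container.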

The createServer admin journey types Cypress.env('DB_HOST') (defaulting
to 127.0.0.1) into the "Add Server" form. Inside the WebUI container,
127.0.0.1:3306 is the container itself, so the createServer mutation's
connection probe fails with ECONNREFUSED. Pass CYPRESS_DB_HOST=mysql
(plus port/user/password) to the cypress action so the spec submits the
compose service name and the WebUI can actually reach the new BM server.

The updateServer resolver always re-validates the BM database
connection, even when only the name has changed. The edit form
deliberately does not pre-fill the password (the API doesn't return it),
so submitting after just changing the name sends an empty password and
the mutation fails with DB_CONNECTION_ERROR against the docker-compose
MySQL (which does require a password).

Local setup_e2e never tripped this because its MySQL accepts the empty
default password; the dockerised smoke test does trip it, so retype the
password before submitting if Cypress.env('DB_PASSWORD') is set,
mirroring the create-server step's existing logic.

@sonarqubecloud

@confuser merged commit 8ce1cde into master on Apr 20, 2026
10 checks passed
@confuser deleted the chore/docker-image-slim-node24 branch on April 20, 2026 09:40

cypress Bot commented Apr 20, 2026

BanManager-WebUI    Run #10297

Run properties:
  • Project: BanManager-WebUI
  • Branch: master
  • Run status: Passed #10297
  • Run duration: 02m 41s
  • Commit: 8ce1cde3ea: perf(docker): slim runtime image and bump Node LTS to 22||24 (#1767)
  • Committer: James Mortemore

Test results:
  • Failures: 0
  • Flaky: 0
  • Pending: 1
  • Skipped: 0
  • Passing: 49

confuser added a commit that referenced this pull request Apr 20, 2026
* ci(docker): build linux/arm64 on a native runner instead of qemu

The publish workflow has been broken since the Node 24 bump (#1767). The
arm64 leg ran under qemu-user on the amd64 runner and crashed with
SIGILL (exit 132) inside `npm ci` - V8 13's JIT emits Arm v8.x
instructions that the runner's binfmt qemu cannot decode.

Switch to the docker/build-push-action multi-platform pattern:
  - Matrix per architecture, each on its native runner. amd64 stays on
    `ubuntu-latest`; arm64 moves to the free `ubuntu-24.04-arm` runner
    GitHub now provides for public repos. Both per-arch jobs push
    blob-only `push-by-digest` images so they never compete for shared
    tags.
  - A `merge` job downloads both digests and stitches them into the
    `:latest` and `:<sha>` manifest list with `docker buildx imagetools
    create`, reproducing the previous tag set.

Also add per-arch GHA cache scopes so the two legs do not invalidate
each other.

Temporarily includes `chore/native-arm-publish` in the push trigger so
the workflow can be validated end-to-end before the entry is removed
and the change is opened for review.

* ci(docker): drop temp branch trigger and keep raw SHA tag format

Validation run on chore/native-arm-publish (run 24660044323) succeeded
end-to-end - both per-arch builds passed on their native runners and the
merge job pushed the manifest list to Docker Hub. Remove the temporary
push trigger so the workflow only runs on master, and override the
metadata-action sha prefix so the SHA tag stays bare (matching the
previous github.sha tag format).

* ci(docker): pin docker/* actions to commit SHAs

SonarCloud's githubactions:S7637 ("Use full commit SHA hash for this
dependency") flagged the three docker/* uses in the new merge job
introduced by this PR. Pin all docker/setup-buildx-action,
docker/login-action, docker/metadata-action, and docker/build-push-action
invocations in both the build matrix and the merge job to their resolved
commit SHAs (with the version comment preserved for human readability),
matching the cypress-io/github-action pinning style #1767 already
established in build.yaml. The actions/* invocations stay on tags - S7637
exempts the first-party github-maintained actions.