perf(docker): slim runtime image and bump Node LTS to 22||24 #1767
Merged
Cuts the published `banmanagement/webui` image from ~340 MB compressed to ~160 MB while bumping the supported Node.js range to the current LTS pair (22 || 24).

Image-size changes
- Rewrite Dockerfile as a 3-stage build (`prod-deps` -> `builder` -> `runner`) on `node:24-alpine`. The runner copies a pruned production `node_modules` from `prod-deps` and only the build artefacts + custom-server source it actually needs from `builder`. We cannot use Next.js standalone output because `server.js` / `cli/` perform dynamic `require`s that the standalone tracer cannot follow; the pruned-deps approach keeps every runtime require resolvable while still discarding all dev dependencies and `.next/cache`.
- Drop `git` from the runner image (we already get `GIT_COMMIT` at build time via `git-revision-webpack-plugin`).
- Add `scripts/docker/prune-runtime-deps.sh` to remove platform binaries we never run (non-Alpine `@next/swc-*`, non-Linux/musl `@img/sharp-*`), `react-icons` families the app does not import, `date-fns` CDN bundles, the 13 MB `image-q/demo` directory, and the usual `test`/`docs`/`*.map`/`*.d.ts` cruft from `node_modules`.
- Tighten `.dockerignore` so the build context excludes `.git`, `.github`, editor/CI config, dev compose files, scratch scripts, and most markdown.
- Reclassify `tailwindcss`, `postcss`, `autoprefixer`, `typescript`, `url-loader`, `@next/bundle-analyzer`, and `git-revision-webpack-plugin` as devDependencies and lazy-require the build-only ones in `next.config.js` so production installs do not pull them in.
- Add `react-icons` `modularizeImports` to `next.config.js` so the webpack output only references icon families the app actually uses.

Node 22 || 24
- Update `engines.node`, `.naverc`, the CI matrix, and the README to the new LTS pair (22.x is still in active support; 24.x is the new LTS). The Dockerfile uses `node:24-alpine`.

CI
- Extend the existing `smoke_docker` job so that, after the existing setup-mode boot check, we tear the stack back down, restart MySQL with `docker-compose.e2e-overlay.yml` (which only adds a 3306 host port and pre-seeds ENCRYPTION_KEY/SESSION_KEY/DB_NAME for the WebUI container), seed `bm_e2e_tests` from the runner via `cypress/setup.js`, bring the WebUI container up against the seeded DB, wait for `/health=ok`, and run four cypress journeys (login, registration, admin server lifecycle, admin webhook lifecycle) against the container. This catches Apollo/knex/sharp/argon2 wiring regressions the host-only `test`/`setup_e2e` jobs cannot see.

Verified locally
- `docker build` -> 159.4 MB compressed image.
- `docker compose up` boots, walks the /setup wizard, and exercises /admin/servers + /admin/webhooks (Apollo + knex + sharp + argon2 paths) successfully.
- Cypress against the container deferred to CI (host cypress launcher unrelated breakage on this machine).
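For readers who want to see what "lazy-require the build-only ones" can look like, here is a minimal sketch. `withOptionalPlugin` is a hypothetical helper name, not something from this PR or from Next.js, and the real `next.config.js` may well gate the requires differently (for example on an environment variable). The idea is only that a missing build-only package must not crash a production install.

```javascript
// Hypothetical helper: apply a build-only plugin to a config object if the
// plugin is installed, otherwise return the config unchanged.
// `requireFn` is injectable so the behaviour can be exercised without the
// real package being present.
function withOptionalPlugin (moduleName, apply, config, requireFn = require) {
  let plugin
  try {
    plugin = requireFn(moduleName)
  } catch (err) {
    // Production installs omit devDependencies, so a missing module is expected.
    if (err && err.code === 'MODULE_NOT_FOUND') return config
    throw err
  }
  return apply(plugin, config)
}

// Possible use inside next.config.js (left commented, as a sketch only):
// module.exports = withOptionalPlugin(
//   '@next/bundle-analyzer',
//   (analyzer, cfg) => analyzer({ enabled: process.env.ANALYZE === 'true' })(cfg),
//   { reactStrictMode: true }
// )
```

One caveat with this shape: a `MODULE_NOT_FOUND` raised by a broken transitive dependency of the plugin would also be swallowed, so a stricter version might compare the missing module name as well.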
3324edf to 870df67
BanManager-WebUI

| Project | BanManager-WebUI |
|---|---|
| Branch Review | chore/docker-image-slim-node24 |
| Run duration | 02m 33s |
| Committer | James Mortemore |
| Test results | 0, 0, 1, 0, 49 |
- Dockerfile: add `--chmod=0555` (r-xr-xr-x) to all 10 runtime COPY
commands so the image ships read-only, owned by `nextjs:nodejs`.
Closes SonarCloud docker:S6504. The `nextjs` user only needs to
read/load these and traverse directories; mutable runtime state
(cache, uploads, config) lives on the VOLUMEs declared below the
COPYs, which Docker mounts independently of the image FS.
Verified locally: `docker build` + `node -e "require('argon2');
require('sharp'); require('mysql2')"` still loads all three native
modules cleanly, .bin shims and bin/* scripts retain +x via 0555.
- workflows/build.yaml: pin all three `cypress-io/github-action`
uses (lines 73, 133, the new line 316) to a full commit SHA -
v6.10.9 = f790eee7a50d9505912f50c2095510be7de06aa7. Closes
SonarCloud githubactions:S7637 and protects against tag hijacking
on the action repo. Renovate will continue to bump these via the
trailing `# v6.10.9` comment.
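As an illustration of what "pinned to a full commit SHA" means for a `uses:` value, the check below is a small sketch. `isPinnedToCommitSha` is a hypothetical helper written for this note; it is not how SonarCloud implements S7637, only a plausible approximation of the shape it looks for.

```javascript
// Hypothetical check: does a workflow `uses:` value reference a full
// 40-character commit SHA rather than a tag or branch?
function isPinnedToCommitSha (uses) {
  // Drop a trailing "# v6.10.9"-style comment, then take the part after "@".
  const withoutComment = uses.split('#')[0].trim()
  const ref = withoutComment.split('@')[1] || ''
  return /^[0-9a-f]{40}$/.test(ref)
}
```

With the value from this commit, `isPinnedToCommitSha('cypress-io/github-action@f790eee7a50d9505912f50c2095510be7de06aa7 # v6.10.9')` is `true`, while a tag reference such as `cypress-io/github-action@v6` is `false`.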
`--chmod=0555` alone wasn't enough - SonarCloud's docker:S6504 also
fires whenever a non-root user owns a copied resource, because that
user can always `chmod` write back later via their ownership. The
compliant pattern (per the rule's own example) is `--chown=root:root
--chmod=...`.
Switch all 10 runtime COPYs to `--chown=root:root --chmod=0555`. The
`nextjs` runtime user can still read/load (world-r) and traverse
(world-x) every file, but can no longer mutate the image FS or chmod
itself write access.
This required restructuring the writable-runtime-state setup, because
the COPY of `/app/public` brings in the tracked
`public/images/opengraph/cache` directory and would now stamp it as
`root:root 0555` - which would lock the runtime out of writing
opengraph cache images. Move the `mkdir -p` + `chown nextjs:nodejs` +
`chmod u+w` for the four writable paths (`.next/cache`, `uploads`,
`config`, `public/images/opengraph/cache`) into a NEW `RUN` layer that
sits AFTER the COPYs, so it deterministically wins regardless of what
the COPYs bring in. The old pre-COPY mkdir+chown was effectively
no-op because the COPYs always re-stamped ownership over it.
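The ownership argument above can be modelled in a few lines. This is a deliberately simplified sketch of the classic POSIX owner/group/other check (it ignores supplementary groups, ACLs, and capabilities), and the gid `1001` is an assumption for illustration; only uid 1001 for `nextjs` is stated in this PR.

```javascript
// Simplified model of POSIX write permission and chmod rights.
function canWrite (file, user) {
  if (user.uid === 0) return true // root bypasses mode bits
  if (user.uid === file.uid) return (file.mode & 0o200) !== 0 // owner bits
  if (user.gid === file.gid) return (file.mode & 0o020) !== 0 // group bits
  return (file.mode & 0o002) !== 0 // other bits
}

// Only the owner (or root) may change a file's mode.
function canChmod (file, user) {
  return user.uid === 0 || user.uid === file.uid
}

const nextjs = { uid: 1001, gid: 1001 } // gid is an assumed value
const ownedByNextjs = { uid: 1001, gid: 1001, mode: 0o555 }
const ownedByRoot = { uid: 0, gid: 0, mode: 0o555 }
```

In this model both files deny writes to `nextjs`, but `canChmod(ownedByNextjs, nextjs)` is `true` while `canChmod(ownedByRoot, nextjs)` is `false`, which is the gap the `--chown=root:root` switch closes.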
Verified locally:
- `docker build` succeeds.
- As `nextjs` (uid 1001): writes to all 4 VOLUME paths
(`/app/uploads/documents`, `/app/config`, `/app/.next/cache/images`,
`/app/public/images/opengraph/cache`) succeed.
- As `nextjs`: writes to `/app/server.js` and tracked public assets
(e.g. `public/player-template.png`) are denied.
- `node -e "require('argon2'); require('sharp'); require('mysql2')"`
succeeds.
- Container boots: entrypoint generates + persists ENCRYPTION_KEY/
SESSION_KEY/VAPID keys to `/app/config/.env`, app reaches
"Listening on …:3000" and enters setup mode.
The previous prune script removed `@img/sharp-linuxmusl-x64`, `@img/sharp-libvips-linuxmusl-x64`, and `@next/swc-linux-x64-musl` under the wrong assumption that the runner image is always linux/arm64-musl. That was an Apple-Silicon-developer blind spot: GitHub Actions ubuntu-latest runners are linux/amd64, the published banmanagement/webui:latest image is multi-arch (amd64 + arm64), and all current CI smoke / publish flows build on amd64.

Symptom on CI: WebUI container failed to start with `Error: Could not load the "sharp" module using the linuxmusl-x64 runtime` because we'd deleted the only sharp binary that matched the runtime.

Keep BOTH linuxmusl-x64 and linuxmusl-arm64 variants for sharp (plus their libvips siblings) and for @next/swc. Drop the linux-* glibc variants since our base is `node:24-alpine`, and drop all the darwin/win32 variants.

Net effect on amd64 image size is unchanged vs. master (we never had x64 binaries removed there, only the prune list was wrong); arm64 image keeps the same x64 insurance now too.
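The keep/drop rule described here can be expressed as a small predicate. The actual prune logic lives in a shell script, so this JavaScript version is only a model of the rule; `KEEP_SUFFIXES` and `shouldKeepNativePackage` are hypothetical names introduced for the sketch.

```javascript
// Platform suffixes to keep: both musl architectures the image is published for.
// sharp uses "linuxmusl-<arch>", @next/swc uses "linux-<arch>-musl".
const KEEP_SUFFIXES = [
  'linuxmusl-x64',
  'linuxmusl-arm64',
  'linux-x64-musl',
  'linux-arm64-musl'
]

// Returns true when a package should survive pruning.
function shouldKeepNativePackage (name) {
  const isPlatformPackage =
    name.startsWith('@img/sharp-') || name.startsWith('@next/swc-')
  if (!isPlatformPackage) return true // not a per-platform binary package
  return KEEP_SUFFIXES.some((suffix) => name.endsWith(suffix))
}
```

Matching on the suffix means the libvips siblings (for example `@img/sharp-libvips-linuxmusl-x64`) are kept by the same rule as the sharp binaries themselves.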
The smoke_docker job's seed step was failing with PROTOCOL_CONNECTION_LOST because MySQL's `mysqladmin ping` healthcheck flips the container to healthy while the entrypoint is still running its temporary mysqld over a unix socket. The real mysqld bound to TCP 3306 only comes up a few seconds later, so the seed script connected to a temp server that then killed the connection mid-query. The script's silent .catch swallowed the error, the workflow continued, and WebUI sat in setup_mode_db_unreachable until the wait timed out.

- cypress/setup.js: surface failures with a non-zero exit code so callers can detect them. Existing callers run against a fully-ready DB so this only affects real-failure paths.
- build.yaml: wrap the seed step in an 8-attempt retry with 5s backoff so we ride out the MySQL temp -> real server transition cleanly without masking actual seed errors.
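The retry itself lives in the workflow's shell step, but the same pattern is easy to show in JavaScript. The sketch below assumes a fixed delay between attempts (8 attempts, 5 s, matching the numbers above); `retry` is a hypothetical helper, not code from this PR.

```javascript
// Run `task` up to `attempts` times, waiting `delayMs` between failures.
// The last error is rethrown so a genuine failure is never masked.
async function retry (task, { attempts = 8, delayMs = 5000 } = {}) {
  let lastError
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt)
    } catch (err) {
      lastError = err
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs))
      }
    }
  }
  throw lastError
}

// A seed script can then surface failure to its caller instead of swallowing it:
// retry(seed).catch((err) => { console.error(err); process.exitCode = 1 })
```

Rethrowing after the final attempt is what keeps the retry from hiding real seed errors: a transient `PROTOCOL_CONNECTION_LOST` is absorbed, a persistent one still fails the step.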
pages/_document.js and 6 components/admin/* files do
`import resolveConfig from 'tailwindcss/resolveConfig'`. resolveConfig
is invoked at SSR time (not just at build time) to derive theme colors
for the bundled HTML, and tailwind.config.js itself does
`require('tailwindcss/colors')`. Next.js externalizes the
`tailwindcss/*` resolution to a runtime require, so the production
node_modules tree must contain it.
Without this every cy.visit() in the docker compose smoke job got a
500 with `Cannot find module 'tailwindcss/resolveConfig'` from
.next/server/pages/_document.js. The other build-only packages I
moved (postcss, autoprefixer, typescript, url-loader,
@next/bundle-analyzer, git-revision-webpack-plugin) are only consumed
by next.config.js / tailwind.config.js / postcss.config.js at build
time and stay in devDependencies.
Lockfile shrinks because several transitive packages that were
previously only reached via cypress/etc. devDeps are now also reached
via the production tailwindcss dep, so npm dropped their `dev: true`
flags.
The smoke_docker job's seed step runs on the runner (DB_HOST=127.0.0.1 via the compose port mapping), so server/test/fixtures/server.js was writing host=127.0.0.1 into bm_web_servers. Inside the WebUI container that resolves to the container itself, so the GraphQL DataLoader's servers-pool was hitting ECONNREFUSED 127.0.0.1:3306 on every query.

Add BM_DB_HOST/PORT/USER overrides to the fixture so the seed can write the address the WebUI container needs (the compose service name `mysql`) without changing the address its own knex pool uses (127.0.0.1). The existing DB_* fallback keeps jest + setup_e2e flows untouched.
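A minimal sketch of the override-with-fallback lookup described above. The function name and the final hard-coded defaults (`127.0.0.1`, `3306`, `root`) are assumptions for illustration; only the `BM_DB_*` and `DB_*` variable families come from this commit.

```javascript
// Resolve the connection details written into the seeded server record.
// BM_DB_* wins when set; otherwise fall back to the DB_* values the seed's
// own knex pool uses; the last-resort defaults are assumed, not from the PR.
function resolveSeededServerConnection (env = process.env) {
  return {
    host: env.BM_DB_HOST || env.DB_HOST || '127.0.0.1',
    port: Number(env.BM_DB_PORT || env.DB_PORT || 3306),
    user: env.BM_DB_USER || env.DB_USER || 'root'
  }
}
```

In the smoke job the seed would run with `DB_HOST=127.0.0.1` and `BM_DB_HOST=mysql`, so the script connects through the host port mapping while the stored record points at the compose service name.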
The createServer admin journey types Cypress.env('DB_HOST') (defaulting
to 127.0.0.1) into the "Add Server" form. Inside the WebUI container,
127.0.0.1:3306 is the container itself, so the createServer mutation's
connection probe fails with ECONNREFUSED. Pass CYPRESS_DB_HOST=mysql
(plus port/user/password) to the cypress action so the spec submits the
compose service name and the WebUI can actually reach the new BM server.
The updateServer resolver always re-validates the BM database
connection, even when only the name has changed. The edit form
deliberately does not pre-fill the password (the API doesn't return it),
so submitting after just changing the name sends an empty password and
the mutation fails with DB_CONNECTION_ERROR against the docker-compose
MySQL (which does require a password).
Local setup_e2e never tripped this because its MySQL accepts the empty
default; the dockerised smoke test now does, so retype the password
before submit if Cypress.env('DB_PASSWORD') is set, mirroring the
create-server step's existing logic.
BanManager-WebUI

| Project | BanManager-WebUI |
|---|---|
| Branch Review | master |
| Run duration | 02m 41s |
| Committer | James Mortemore |
| Test results | 0, 0, 1, 0, 49 |
confuser added a commit that referenced this pull request Apr 20, 2026

The publish workflow has been broken since the Node 24 bump (#1767). The arm64 leg ran under qemu-user on the amd64 runner and crashed with SIGILL (exit 132) inside `npm ci` - V8 13's JIT emits Arm v8.x instructions that the runner's binfmt qemu cannot decode.

Switch to the docker/build-push-action multi-platform pattern:
- Matrix per architecture, each on its native runner. amd64 stays on `ubuntu-latest`; arm64 moves to the free `ubuntu-24.04-arm` runner GitHub now provides for public repos. Both per-arch jobs push blob-only `push-by-digest` images so they never compete for shared tags.
- A `merge` job downloads both digests and stitches them into the `:latest` and `:<sha>` manifest list with `docker buildx imagetools create`, reproducing the previous tag set.

Also add per-arch GHA cache scopes so the two legs do not invalidate each other.

Temporarily includes `chore/native-arm-publish` in the push trigger so the workflow can be validated end-to-end before the entry is removed and the change is opened for review.
confuser added a commit that referenced this pull request Apr 20, 2026
SonarCloud's githubactions:S7637 ("Use full commit SHA hash for this
dependency") flagged the three docker/* uses in the new merge job
introduced by this PR. Pin all docker/setup-buildx-action,
docker/login-action, docker/metadata-action, and docker/build-push-action
invocations in both the build matrix and the merge job to their resolved
commit SHAs (with the version comment preserved for human readability),
matching the cypress-io/github-action pinning style #1767 already
established in build.yaml. The actions/* invocations stay on tags - S7637
exempts the first-party github-maintained actions.
confuser added a commit that referenced this pull request Apr 20, 2026

* ci(docker): build linux/arm64 on a native runner instead of qemu

  The publish workflow has been broken since the Node 24 bump (#1767). The arm64 leg ran under qemu-user on the amd64 runner and crashed with SIGILL (exit 132) inside `npm ci` - V8 13's JIT emits Arm v8.x instructions that the runner's binfmt qemu cannot decode.

  Switch to the docker/build-push-action multi-platform pattern:
  - Matrix per architecture, each on its native runner. amd64 stays on `ubuntu-latest`; arm64 moves to the free `ubuntu-24.04-arm` runner GitHub now provides for public repos. Both per-arch jobs push blob-only `push-by-digest` images so they never compete for shared tags.
  - A `merge` job downloads both digests and stitches them into the `:latest` and `:<sha>` manifest list with `docker buildx imagetools create`, reproducing the previous tag set.

  Also add per-arch GHA cache scopes so the two legs do not invalidate each other.

  Temporarily includes `chore/native-arm-publish` in the push trigger so the workflow can be validated end-to-end before the entry is removed and the change is opened for review.

* ci(docker): drop temp branch trigger and keep raw SHA tag format

  Validation run on chore/native-arm-publish (run 24660044323) succeeded end-to-end - both per-arch builds passed on their native runners and the merge job pushed the manifest list to Docker Hub. Remove the temporary push trigger so the workflow only runs on master, and override the metadata-action sha prefix so the SHA tag stays bare (matching the previous github.sha tag format).

* ci(docker): pin docker/* actions to commit SHAs

  SonarCloud's githubactions:S7637 ("Use full commit SHA hash for this dependency") flagged the three docker/* uses in the new merge job introduced by this PR. Pin all docker/setup-buildx-action, docker/login-action, docker/metadata-action, and docker/build-push-action invocations in both the build matrix and the merge job to their resolved commit SHAs (with the version comment preserved for human readability), matching the cypress-io/github-action pinning style #1767 already established in build.yaml. The actions/* invocations stay on tags - S7637 exempts the first-party github-maintained actions.



## Summary

- Cuts the published `banmanagement/webui` image from ~340 MB compressed → ~160 MB by switching to a 3-stage Alpine Dockerfile, pruning runtime `node_modules`, dropping `git` from the runner, and reclassifying build-only deps as devDependencies.
- Bumps Node to the `22 || 24` LTS pair across `package.json`, `.naverc`, the CI matrix, the Dockerfile base image, and the README.
- Extends the `smoke_docker` CI job so 4 cypress journeys run against the dockerised stack (login, registration, admin server lifecycle, admin webhook lifecycle).

## What changed

### `Dockerfile`

3 stages on `node:24-alpine`:

- `prod-deps`: `npm ci --omit=dev` then `scripts/docker/prune-runtime-deps.sh`
- `builder`: `npm run build`, then `rm -rf .next/cache`
- `runner`: pruned `node_modules` from `prod-deps` and only the runtime files from `builder` (`.next`, `public`, `server`, `cli`, `bin`, `server.js`, `docker-entrypoint.js`, `next.config.js`, `package.json`)

Notes:

- Tried `output: 'standalone'` and reverted it: the standalone tracer cannot follow the dynamic `require()`s in `server.js` / `cli/`, which produced runtime `Cannot find module 'web-push'` / `'dotenv'` failures. Pruned-deps gives us the same size win without the tracing fragility.
- `git` is no longer installed in the runner; `git-revision-webpack-plugin` only runs in the builder.
- `tini` is retained for PID 1.

### `scripts/docker/prune-runtime-deps.sh` (new)

Deletes things `npm ci --omit=dev` cannot, all of which are documented inline:

- Non-Alpine `@next/swc-*` binaries
- Non-Linux/musl `@img/sharp-*` binaries
- `react-icons` families the app does not import (kept: `ai bi bs fa fi go md ri tb ti`)
- `date-fns` CDN bundles (incl. the duplicate inside `@nateradebaugh/react-datetime`)
- The 13 MB `image-q/demo` directory
- `test`/`docs`/`examples`/`*.map`/`*.d.ts` cruft inside `node_modules`

### `package.json` / `package-lock.json`

- `engines.node`: `20 || 22` → `22 || 24`
- Moved to `devDependencies`: `tailwindcss`, `postcss`, `autoprefixer`, `typescript`, `url-loader`, `@next/bundle-analyzer`, `git-revision-webpack-plugin`

### `next.config.js`

- Lazy `require()`s for `@next/bundle-analyzer` and `git-revision-webpack-plugin` so the production runtime never imports devDeps.
- Added `modularizeImports` for `react-icons` so webpack only references the icon families actually used.

### `.dockerignore`

Tightened to exclude `.git`, `.github`, `.cursor`, `.vscode`, `*.md` (except `README.md`), `LICENSE`, dev compose files, scratch scripts (`scripts/seed.js`), `.editorconfig`, `.eslintrc`, `cypress.config.js`, `cypress.setup.config.js`, `jest.config.js`, `nodemon.json`, `renovate.json`, `captain-definition`, `CHECKS`, `.cache_ggshield`, `.naverc`.

### `.naverc` / `README.md` / CI matrix

Bumped to Node 24 / 22.x + 24.x to match `engines.node`.

### `.github/workflows/build.yaml` (smoke_docker extension)

After the existing setup-mode boot check:

1. Tear the stack down (`down -v`).
2. Cache `~/.npm` and `~/.cache/Cypress`.
3. `npm ci` + `npx cypress install` on the runner.
4. Bring MySQL back up with `docker-compose.prod.yml` + `docker-compose.e2e-overlay.yml` (the overlay just adds `3306:3306` and pre-seeds env vars on the WebUI service so it boots in normal mode against `bm_e2e_tests` as root).
5. Once MySQL is `healthy`, run `node cypress/setup.js` to migrate + seed `bm_e2e_tests` from the runner.
6. Bring up the WebUI container and wait for `/health=ok`.
7. Run four cypress journeys against the container:
   - `cypress/e2e/pages/login.spec.js` (argon2 + sessions)
   - `cypress/e2e/journeys/registration.spec.js` (PIN flow)
   - `cypress/e2e/journeys/admin-server-lifecycle.spec.js` (Apollo + knex + mysql2 CRUD)
   - `cypress/e2e/journeys/admin-webhook-lifecycle.spec.js` (webhook + sharp)
8. Tear down again with `down -v`.

This catches `outputFileTracingIncludes`-style regressions and any standalone-vs-runtime mismatches that the host-only `test` and `setup_e2e` jobs cannot see (those run against `npm start` / `e2e:setup:server` rather than the published image).

### `docker-compose.e2e-overlay.yml` (new)

Tiny overlay layered on top of `docker-compose.prod.yml`. Only adds:

- `mysql.ports: 3306:3306` so the host runner can seed.
- `webui.environment` with `DB_USER=root`, `DB_NAME=bm_e2e_tests`, `ENCRYPTION_KEY`, `SESSION_KEY`, `CONTACT_EMAIL`, `SERVER_FOOTER_NAME` so the container boots straight into normal mode.

## Test plan

Locally verified:

- `docker build` succeeds → 159.4 MB compressed final image
- `docker compose up` boots end-to-end and reaches /setup

Expected from CI:

- `test` matrix passes on Node 22.x and 24.x
- `setup_e2e` passes on Node 24.x
- `build_docker` produces the slim image
- `smoke_docker` passes both the existing setup-mode check and the new cypress run against the dockerised stack