ci: set SKIP_CAPACITY on gen2 jobs to fit class RAM#1246
Merged
Conversation
gen2 CircleCI machines enforce virtual memory limits at the cgroup level. The Skip runtime's default SKIP_CAPACITY of 16 GB (palloc.c) fails to mmap on any class with <16 GB RAM, producing "ERROR (MAP FAILED): Cannot allocate memory" early in the job.

Observed on skipruntime/large.gen2 in PR #1245 and on an earlier skdb-wasm/large.gen2 run that died at 61 s. gen1 tolerated the same 16 GB virtual mmap because only RSS was enforced and the mmap is lazy.

Set SKIP_CAPACITY explicitly per gen2 job, leaving ~25% RAM headroom for OS and tooling:

  check-ts     medium.gen2 (4 GB)   -> 3G
  skdb         large.gen2  (8 GB)   -> 6G
  skdb-wasm    large.gen2  (8 GB)   -> 6G
  skipruntime  large.gen2  (8 GB)*  -> 6G
  compiler     xlarge.gen2 (16 GB)  -> 12G

  * postgres + kafka sidecars get their own RAM allocation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
mbouaziz
added a commit
that referenced
this pull request
May 27, 2026
…1247)

## Summary

Follow-up to #1246. The previous PR set `SKIP_CAPACITY` on the CircleCI primary container, but the value doesn't propagate into nested `docker build` steps that run on the remote-docker host. Skip toolchain invocations inside those builds reverted to the runtime's 16 GB default `mmap` and OOM'd on gen2 hosts.

Observed in [pipeline 4507 / job 16690](https://app.circleci.com/pipelines/github/SkipLabs/skip/4507/workflows/bb0eb244-0515-4641-8e4f-974a89f63aac/jobs/16690) (skipruntime job, "Run native addon unreleased test" step): the docker build of `skiplabs/skiplang-bin-builder` failed at [skiplang/Dockerfile:41](skiplang/Dockerfile#L41) with `ERROR (MAP FAILED): Cannot allocate memory` during `make STAGE=0`.

## Fix

Two parts:

1. **Declare `ARG SKIP_CAPACITY=` + `ENV SKIP_CAPACITY=$SKIP_CAPACITY`** in each Dockerfile that invokes the Skip toolchain at build time:
   - [Dockerfile](Dockerfile) (top-level, `bootstrap` stage inherits ENV via `skiplang-base`)
   - [skiplang/Dockerfile](skiplang/Dockerfile) (`skiplang` stage inherits via `base`)
   - [skipruntime-ts/Dockerfile](skipruntime-ts/Dockerfile)

   ARG defaults to empty so local dev builds preserve the runtime's 16 GB default: `palloc.c:449-451` treats an empty `SKIP_CAPACITY` as unset.
2. **[bin/docker_build.sh](bin/docker_build.sh) forwards `$SKIP_CAPACITY`** (when set) as a build arg to all bake targets that invoke the toolchain (`skiplang`, `skip`, `skiplang-bin-builder`, `skipruntime`). Mirrors the existing `STAGE` forwarding pattern. In CI this picks up the value already set on the job by #1246, so no additional `.circleci/base.yml` change is needed.

## Test plan

- [ ] After merge, watch the next PR that triggers `skipruntime`: the "Run native addon unreleased test" step's docker build should no longer fail with `MAP FAILED`.
- [ ] Confirm local `bin/docker_build.sh skipruntime` (without `SKIP_CAPACITY` env var) still uses the runtime's 16 GB default and builds fine on a developer machine.
- [ ] Confirm `STAGE=1 bin/docker_build.sh skiplang` still works; the new `SKIP_CAPACITY` block sits next to `STAGE` and shouldn't interfere.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
pull Bot
pushed a commit
to edisplay/skip
that referenced
this pull request
May 27, 2026
Following SkipLabs#1246, SKIP_CAPACITY is set on the CircleCI primary container but does not propagate into nested `docker build` steps, which run on a separate (remote-docker) host. Skip toolchain invocations inside those builds reverted to the runtime's 16 GB default mmap and OOM'd on gen2 hosts that enforce virtual memory limits.

Observed in pipeline 4507 / job 16690 (skipruntime): the docker build of skiplabs/skiplang-bin-builder failed at skiplang/Dockerfile:41 with "ERROR (MAP FAILED): Cannot allocate memory" during `make STAGE=0`.

Fix in two parts:

1. Declare `ARG SKIP_CAPACITY=` + `ENV SKIP_CAPACITY=$SKIP_CAPACITY` in each Dockerfile whose build invokes the Skip toolchain (Dockerfile, skiplang/Dockerfile, skipruntime-ts/Dockerfile). Default empty so local dev builds keep the runtime's 16 GB default; palloc.c treats empty SKIP_CAPACITY as unset.

2. docker_build.sh forwards $SKIP_CAPACITY (if set) as a build arg to all bake targets that invoke the toolchain, mirroring the existing STAGE forwarding pattern. In CI this picks up the value already set on the job, so no additional config is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
mbouaziz
added a commit
that referenced
this pull request
May 27, 2026
sk_create_mapping printed "CAPACITY SET TO: <bytes>" to stdout for every Skip process started with a non-default capacity. This was debug output that escaped into normal stdout, which was fine when nothing relied on stdout exactness, but the skdb diff tests (test/diff/*.sql) compare runtime stdout to .expected golden files and now fail on every test once SKIP_CAPACITY is set in CI (e.g. via the gen2 fix in #1246).

Just remove the print. The value is observable via /proc/self/maps or by recognising the cgroup limit; no observability is being lost that needed to be on stdout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mbouaziz
added a commit
that referenced
this pull request
May 27, 2026
## Summary

`sk_create_mapping` in [skiplang/prelude/runtime/palloc.c:567-569](skiplang/prelude/runtime/palloc.c#L567-L569) printed `CAPACITY SET TO: <bytes>` to **stdout** for every Skip process started with a non-default capacity.

This was harmless until #1246 set `SKIP_CAPACITY` on gen2 CI jobs to avoid `MAP FAILED` OOMs. Now every Skip process emits the line, and the skdb diff tests (`test/diff/*.sql`), which compare runtime stdout to `.expected` golden files, fail on every test.

Observed in [job 16738](https://app.circleci.com/pipelines/github/SkipLabs/skip/4522/workflows/523a19b3-8783-4f62-a2c2-3a188b0e0716/jobs/16738) (skdb on `large.gen2`, `SKIP_CAPACITY=6G` → `6442450944`):

```
CAPACITY SET TO: 6442450944
04 - test/diff/select1_views.sql (part-2): FAILED
...
```

Repeated for ~40 diff tests. The skdb workflow has been silently broken for any PR that triggers it (most don't, because of generate_config.sh's per-package diff heuristic; this PR triggered it because it touches a `skiplang/prelude/` file).

## Fix

Remove the print. If anyone needs to confirm the runtime's capacity at startup, `/proc/self/maps` or the `--capacity` CLI flag echo are cleaner channels; stdout is reserved for program output that golden-file tests can rely on.

## Test plan

- [ ] After merge, a PR that touches `skiplang/prelude/` and triggers `skdb` should now pass.
- [ ] `compiler` and `skipruntime` (also affected, since they run with `SKIP_CAPACITY=12G` / `6G`) should remain green: they don't compare stdout, so they weren't failing visibly, but the line was still being emitted and cluttering logs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
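The failure mode above can be reproduced in miniature. The sketch below is purely illustrative: `fake_runtime` and `diff_test` are hypothetical stand-ins for a Skip process and the skdb diff harness, the query result `1|2|3` is made up, and only the `<n>G` form of `SKIP_CAPACITY` is handled.

```shell
#!/bin/sh
# Hypothetical stand-in for a Skip process with the pre-fix behaviour:
# when SKIP_CAPACITY is set, a diagnostic line goes to stdout before the
# actual program output.
fake_runtime() {
    if [ -n "${SKIP_CAPACITY:-}" ]; then
        # "6G" -> 6 * 1024^3 bytes; this sketch only understands "<n>G".
        gib=${SKIP_CAPACITY%G}
        echo "CAPACITY SET TO: $((gib * 1024 * 1024 * 1024))"
    fi
    echo "1|2|3"
}

# Golden-file style check: stdout must match the expected text exactly.
diff_test() {
    expected="1|2|3"
    if [ "$(fake_runtime)" = "$expected" ]; then
        echo "PASSED"
    else
        echo "FAILED"
    fi
}
```

With `SKIP_CAPACITY` unset, `diff_test` reports `PASSED`; with `SKIP_CAPACITY=6G` the extra first line makes the exact comparison fail, which is the pattern seen across the diff tests.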
## Summary

Gen2 CircleCI machines enforce virtual memory limits at the cgroup level. The Skip runtime's default `SKIP_CAPACITY` of 16 GB (`skiplang/prelude/runtime/palloc.c:449-451`) fails to `mmap` on any class with <16 GB RAM, producing `ERROR (MAP FAILED): Cannot allocate memory` early in the job.

Observed on `skipruntime`/`large.gen2` in PR #1245 (job 16682) and on an earlier `skdb-wasm`/`large.gen2` run that died at 61 s (workflow `f78dd673`, 2026-05-27 09:37). Gen1 tolerated the same 16 GB virtual mmap because only RSS was enforced and the mmap is lazy; gen2 evidently does not.

Set `SKIP_CAPACITY` explicitly per gen2 job, leaving ~25% RAM headroom for OS + tooling:

| Job | Class (RAM) | `SKIP_CAPACITY` |
| --- | --- | --- |
| check-ts | medium.gen2 (4 GB) | `3G` |
| skdb | large.gen2 (8 GB) | `6G` |
| skdb-wasm | large.gen2 (8 GB) | `6G` |
| skipruntime | large.gen2 (8 GB) | `6G` |
| compiler | xlarge.gen2 (16 GB) | `12G` |

Why also setting the ones that haven't (yet) failed: same `mmap` happens on every Skip-toolchain invocation, so any gen2 job using `skiplabs/skip*` images is one coin flip away from the same failure. `skipruntime` on `large.gen2` succeeded 3× in a row on the Phase 1 PR before failing on #1245; the variance is real.

`check-examples` is on `large.gen2` but uses `cimg/base` and only runs the Skip toolchain inside docker-compose containers (separate cgroups), so it's not affected.

## Test plan

- [ ] […] `skiplang/prelude/src/foo` for compiler + skdb + skdb-wasm + skipruntime, and a ts workspace file for check-ts).
- [ ] `skipruntime` no longer dies with `MAP FAILED` at `skargo build`.
- [ ] `compiler` still passes: 12 G cap is well above measured peak (~10 GB).
- [ ] Watch `skdb` and `skdb-wasm` over the next few PRs for any new failures specifically at allocation time; if they appear, drop the cap further.

🤖 Generated with Claude Code
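The per-class caps follow from the "~25% headroom" rule, i.e. 75% of the class RAM. A small sketch of that arithmetic, assuming whole-gigabyte RAM sizes; the helper name `capacity_for_ram_gb` is hypothetical and not part of the CI config:

```shell
#!/bin/sh
# Hypothetical helper: take the class RAM in GB and return 75% of it,
# formatted with the "G" suffix used for SKIP_CAPACITY in this PR.
capacity_for_ram_gb() {
    ram_gb=$1
    echo "$((ram_gb * 3 / 4))G"
}

# RAM sizes as stated in the PR description for each gen2 class.
for entry in "medium.gen2 4" "large.gen2 8" "xlarge.gen2 16"; do
    class=${entry% *}
    ram=${entry#* }
    echo "${class} (${ram} GB) -> $(capacity_for_ram_gb "$ram")"
done
```

Integer division is exact for these three sizes; a class whose RAM is not a multiple of 4 GB would round down, which errs on the side of more headroom.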