host: add cgroups v2 support and fix Debian 13 compatibility by phaus · Pull Request #3 · consolving/flynn

phaus · 2026-04-13T21:16:46Z

Add full cgroups v2 support to enable Flynn to run on modern Linux distributions (Debian 13+) where cgroups v1 is compiled out entirely.

Changes:

- host: dual v1/v2 cgroup setup with cpu.shares-to-cpu.weight conversion, unified hierarchy controller enablement, and per-container CpuWeight
- host: cgroups v2 OOM notification via inotify on memory.events
- host: guard CheckCpushares to only run on cgroups v1
- host: FIEMAP fallback to sequential copy when tmpfs doesn't support it
- postgres: disable TimescaleDB/ExtWhitelist (unavailable in packages layer)
- postgres: pre-install uuid-ossp and pgcrypto extensions in template1 so non-superuser app database users can use them without pgextwlist
- dns: fix off-by-one panic in clientconfig.go (len>=8 but slice [:9])

These changes, combined with the rebuilt TUF images (3-layer postgres with PostgreSQL 11 packages, controller with JSON schemas), enable successful single-node Flynn cluster bootstrap on Debian 13 (Trixie) with cgroups v2 and ZFS 2.3.
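As a rough illustration of the memory.events item in the list above: once an inotify event reports that the file changed, the watcher still has to read it and pull out the OOM counter. The sketch below shows only that parsing step. The function name oomKillCount and the sample contents are hypothetical and are not taken from this PR; the inotify wiring itself is omitted.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// oomKillCount returns the value of the "oom_kill" key from the text of a
// cgroups v2 memory.events file. The file is a list of "key value" lines.
// Hypothetical helper for illustration; not the code added by this PR.
func oomKillCount(memoryEvents string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(memoryEvents))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "oom_kill" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("no oom_kill entry found")
}

func main() {
	// Sample contents (made up) in the memory.events layout.
	sample := "low 0\nhigh 0\nmax 3\noom 1\noom_kill 1\n"
	n, err := oomKillCount(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("oom_kill:", n)
}
```

A watcher would typically compare the returned count against the previous reading and treat an increase as an OOM event.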

…MACs for multi-node

Fix two bugs blocking 3-node cluster bootstrap:

1. PostgreSQL primary crash-loops when a sync replica exists because installExtensionsInTemplate() runs CREATE EXTENSION against template1 without overriding default_transaction_read_only=on (set in postgresql.conf when downstream != nil). Add SET default_transaction_read_only=off before extension DDL.
2. Flannel VXLAN overlay is broken on cloned VMs because all nodes get identical flannel.1 MAC addresses (kernel derives MAC deterministically from VNI + machine state). Add netlink.LinkSetHardwareAddr() after device creation to set a unique MAC derived from the VTEP IP (02:42:IP[0]:IP[1]:IP[2]:IP[3]).

Also increase bootstrap wait timeouts from 5 to 10 minutes and add a configurable timeout field to WaitAction for multi-node clusters where service startup takes longer.

The router registered services (router-api, router-http) with discoverd using LISTEN_IP (typically 0.0.0.0), which is not a routable address. Other services (status aggregator, scheduler) could not reach the router, causing the cluster to report unhealthy and the scheduler to loop with "route not found" errors. Now uses EXTERNAL_IP for registration while keeping LISTEN_IP for binding.

Replace NeighAdd with NeighSet for FDB entries to avoid 'file exists' errors when entries already exist (idempotent upsert vs exclusive create).

Derive a unique MAC address for each node's flannel.1 device from its VTEP IP (02:42:xx:xx:xx:xx) instead of using the default MAC from the base image. Without this, nodes cloned from the same image share identical MACs, which breaks VXLAN forwarding. Only set the MAC when it differs from the current one to avoid flushing ARP neighbor entries. Add retry logic that brings the link down/up if setting the MAC on a running interface fails.

The primary starts PostgreSQL in read-only mode when a downstream (sync standby) exists. But assumePrimary needs read-write access to create the superuser and install extensions in the freshly-initialized database. The session-level SET default_transaction_read_only=off was insufficient: CREATE EXTENSION still failed with 'cannot execute in a read-only transaction', causing assumePrimary to fail, which called p.stop(), killing postgres and all replication connections and creating an infinite loop.

Fix: Start read-write during initial setup (the database was just created with initdb, so there is no user data to protect). After setup completes, switch to read-only mode and SIGHUP postgres before calling waitForSync. Remove the now-unnecessary SET TRANSACTION READ WRITE hack.

On cgroups v2, each container's OOM notification uses an inotify instance. With many containers (89+ from bootstrap), the default max_user_instances=128 is exhausted, causing NotifyOOM to fail. The watch() goroutine previously returned this error, which triggered Destroy() and killed the container within ~1 second with no user-visible error message. Make the OOM notification failure non-fatal: log a warning and continue watching for state changes. The container still functions correctly without OOM monitoring.

Also fix DNS resolver detection for systemd-resolved environments. On Debian 13, /etc/resolv.conf points to the stub resolver at 127.0.0.53, which is unreachable from containers in separate network namespaces. Fall back to /run/systemd/resolve/resolv.conf, which contains the real upstream resolver IPs.
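The resolver fallback can be pictured as a small decision over the contents of the two resolv.conf files. The sketch below is an assumption about the shape of that logic, not the code in this PR: the function names are hypothetical, and it takes the file contents as strings so it can run without touching /etc/resolv.conf or /run/systemd/resolve/resolv.conf.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// parseNameservers extracts the addresses of "nameserver" lines from
// resolv.conf-style text.
func parseNameservers(conf string) []string {
	var servers []string
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, fields[1])
		}
	}
	return servers
}

// onlyLoopback reports whether every listed server is a loopback address
// such as the systemd-resolved stub at 127.0.0.53.
func onlyLoopback(servers []string) bool {
	for _, s := range servers {
		ip := net.ParseIP(s)
		if ip == nil || !ip.IsLoopback() {
			return false
		}
	}
	return len(servers) > 0
}

// chooseResolvers prefers the primary file but falls back to the alternate
// file when the primary lists only loopback resolvers (assumed policy).
func chooseResolvers(primary, fallback string) []string {
	servers := parseNameservers(primary)
	if onlyLoopback(servers) {
		if alt := parseNameservers(fallback); len(alt) > 0 {
			return alt
		}
	}
	return servers
}

func main() {
	stub := "nameserver 127.0.0.53\noptions edns0 trust-ad\n"
	upstream := "nameserver 192.0.2.1\nnameserver 192.0.2.2\n"
	fmt.Println(chooseResolvers(stub, upstream)) // [192.0.2.1 192.0.2.2]
}
```

The addresses in main are documentation-range placeholders; a caller would pass in the contents of the two files named in the commit message.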

Update resource limit tests to work on both cgroups v1 and v2:

- resourceCmd: auto-detect cgroup version inside containers. On v2, read memory.max and cpu.weight from the container's cgroup path (discovered via /proc/1/cgroup) instead of v1's fixed paths.
- Add cpuSharesToWeight() helper matching the kernel's conversion formula: weight = 1 + ((shares - 2) * 9999) / 262142.
- Add isCgroupV2() detection based on /sys/fs/cgroup/cgroup.controllers.
- Set DisableLog: true on test jobs that capture output via attach streams. This avoids a race condition in the log mux where short-lived jobs complete before StreamLog sets up its subscription, causing the attach client to block forever.
- Make setupGitreceive() conditional on the -run filter matching git-related tests, so non-git tests don't block on broken deployments.
- Update slugbuilder-limit test app to read v2 cgroup files.

All 4 resource limit tests pass:

- CLISuite.TestRunLimits
- HostSuite.TestResourceLimits
- ControllerSuite.TestResourceLimitsOneOffJob
- ControllerSuite.TestResourceLimitsReleaseJob
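For reference, the two helpers named above might look roughly like the following. This is a minimal sketch built from the formula and file path quoted in the commit message, not a copy of the PR's code. The input clamping, the treatment of 0 as "unset", and the isCgroupV2At variant with a configurable root are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cpuSharesToWeight maps a cgroups v1 cpu.shares value (2..262144) onto the
// cgroups v2 cpu.weight range (1..10000) using
// weight = 1 + ((shares - 2) * 9999) / 262142 with integer division.
func cpuSharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0 // assumption: 0 means "no value configured"
	}
	// Assumption: clamp out-of-range inputs to the v1 limits.
	if shares < 2 {
		shares = 2
	}
	if shares > 262144 {
		shares = 262144
	}
	return 1 + ((shares-2)*9999)/262142
}

// isCgroupV2At reports whether root looks like a cgroups v2 unified
// hierarchy, i.e. whether root/cgroup.controllers exists. The root parameter
// is a hypothetical addition to keep the check testable.
func isCgroupV2At(root string) bool {
	_, err := os.Stat(filepath.Join(root, "cgroup.controllers"))
	return err == nil
}

func main() {
	fmt.Println("cgroups v2:", isCgroupV2At("/sys/fs/cgroup"))
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("cpu.shares=%d -> cpu.weight=%d\n", s, cpuSharesToWeight(s))
	}
}
```

With integer division, the v1 default of 1024 shares maps to a weight of 39, and the endpoints 2 and 262144 map to 1 and 10000 respectively.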

This project is actively maintained again — remove the warning that was added when Flynn was abandoned.

phaus and others added 9 commits April 13, 2026 23:15

cli: remove unmaintained warning

7cfa373

This project is actively maintained again — remove the warning that was added when Flynn was abandoned.

config: remove local vLLM provider from opencode.json

18cbedf

phaus merged commit 18cbedf into master Apr 15, 2026
2 checks passed

phaus merged 9 commits into master from debian13-cgroups-v2-bootstrap