A proof-of-concept demonstrating how a fully unprivileged container can achieve node-level code execution on Kubernetes by exploiting the CVE-2026-31431 Linux kernel page-cache corruption bug through shared container image layers.
The core attack primitive is: any privileged DaemonSet sharing image layers with an attacker-controlled container can be weaponized for container escape. This PoC uses kube-proxy as one concrete example, but the technique generalizes to any privileged workload on the cluster.
Validated on Alibaba Cloud ACK, Amazon EKS, and Google GKE — an unprivileged pod writes [*] success to the host filesystem via the privileged kube-proxy DaemonSet:
| Alibaba Cloud ACK (kernel 6.6.88) | Amazon EKS (kernel 6.12.79) | Google GKE (kernel 6.12.68) |
|---|---|---|
| ![]() | ![]() | ![]() |
Disclaimer: This repository is published for educational and defensive purposes only. Use it exclusively on systems you own or have explicit authorization to test.
CVE-2026-31431 ("Copy Fail") is a Linux kernel vulnerability in the page-cache Copy-on-Write (CoW) path. An AF_ALG splice race allows an unprivileged process to corrupt the page-cache pages of a read-only file. The corruption persists in the kernel page cache and is visible to every process that subsequently reads or executes the file — including processes in other containers or on the host.
For full details on the original vulnerability, see copy.fail.
The attack exploits three properties that commonly coexist in Kubernetes clusters:
- Kernel page-cache corruption (CVE-2026-31431): an unprivileged process can overwrite the in-memory cached pages of any file it can open read-only.
- Image layer sharing: container runtimes (containerd, CRI-O) use overlay filesystems where identical image layers map to the same page-cache pages across containers.
- Privileged DaemonSets: many clusters run DaemonSets with elevated privileges (`privileged: true`, `hostNetwork: true`, broad capabilities, etc.) that periodically execute binaries from their image.
When these conditions align, an unprivileged pod can corrupt a binary in a shared image layer, and a privileged DaemonSet on the same node will unknowingly execute the corrupted binary with its elevated privileges — achieving full node-level code execution.
The vulnerability target is NOT limited to kube-proxy. Any privileged DaemonSet (monitoring agents, CNI plugins, log collectors, security agents, etc.) whose container image shares layers with an attacker-controlled image is a viable target.
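From a defender's point of view, the first step is knowing which DaemonSets actually run privileged containers. The following is a minimal audit sketch, not part of this repository: it assumes the manifest of a single DaemonSet is available as JSON (for example from `kubectl get daemonset <name> -n <namespace> -o json`), and `privilegedImages` is a hypothetical helper name. It only checks `securityContext.privileged`; host namespaces and added capabilities would need separate checks.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// daemonSet models only the fields this check reads; everything else in the
// manifest is ignored by encoding/json.
type daemonSet struct {
	Spec struct {
		Template struct {
			Spec struct {
				Containers []struct {
					Image           string `json:"image"`
					SecurityContext *struct {
						Privileged *bool `json:"privileged"`
					} `json:"securityContext"`
				} `json:"containers"`
			} `json:"spec"`
		} `json:"template"`
	} `json:"spec"`
}

// privilegedImages (hypothetical helper) returns the image of every container
// in the DaemonSet manifest that sets securityContext.privileged: true.
func privilegedImages(manifest []byte) ([]string, error) {
	var ds daemonSet
	if err := json.Unmarshal(manifest, &ds); err != nil {
		return nil, err
	}
	var images []string
	for _, c := range ds.Spec.Template.Spec.Containers {
		sc := c.SecurityContext
		if sc != nil && sc.Privileged != nil && *sc.Privileged {
			images = append(images, c.Image)
		}
	}
	return images, nil
}

func main() {
	// Illustrative manifest; the image names are placeholders.
	manifest := []byte(`{"spec":{"template":{"spec":{"containers":[
		{"image":"example.invalid/agent:1","securityContext":{"privileged":true}},
		{"image":"example.invalid/sidecar:1"}]}}}}`)
	images, err := privilegedImages(manifest)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(images)
}
```

The images reported here are the ones whose base layers deserve the closest review, since they are the workloads that would run a corrupted binary with elevated privileges.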
The attack chain has three stages: page-cache corruption, cross-container propagation, and privileged execution.
The kernel's AF_ALG (crypto) subsystem exposes a socket-based interface for userspace cryptographic operations. The exploit abuses a race condition in how the kernel handles splice() from a file into an AF_ALG socket:
- Open the target binary read-only.
- Create an AF_ALG AEAD socket bound to `authencesn(hmac(sha256),cbc(aes))`.
- Send a small payload chunk through the AF_ALG socket with `MSG_MORE`, telling the kernel to expect more data.
- `splice()` the target file's contents from an fd → pipe → AF_ALG socket.
- Due to the CoW bug, the kernel writes the attacker's payload bytes into the target file's page-cache pages instead of properly isolating them.
The exploit repeats this for each 4-byte window until the entire target binary's cached pages are overwritten with a custom payload.
No write permission to the file is needed. The file on disk is unchanged — only the in-memory page cache is corrupted.
Container runtimes use overlay filesystems. When two containers share the same image layer, the kernel serves their file reads from the same page-cache pages.
The attacker builds their PoC image FROM the same base image as the target privileged DaemonSet. Because both containers share the same overlay lower-dir, binaries in the shared layer map to identical page-cache pages.
When the unprivileged PoC container corrupts a binary's page cache, the corruption is immediately visible to the privileged container on the same node — with zero cross-container communication.
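To judge whether an untrusted image could share page-cache pages with a privileged workload, it can help to compare the layer digests of the two images. The following is a small sketch, assuming the digest lists have already been collected (for instance from the `RootFS.Layers` field of `docker image inspect`); `sharedLayers` is a hypothetical helper and the digests shown are placeholders.

```go
package main

import "fmt"

// sharedLayers returns the layer digests that appear in both lists,
// in the order they occur in b, each reported once.
func sharedLayers(a, b []string) []string {
	inA := make(map[string]bool, len(a))
	for _, d := range a {
		inA[d] = true
	}
	var shared []string
	for _, d := range b {
		if inA[d] {
			shared = append(shared, d)
			inA[d] = false // avoid reporting a repeated digest twice
		}
	}
	return shared
}

func main() {
	// Placeholder digests for a privileged image and an untrusted image.
	privileged := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}
	untrusted := []string{"sha256:aaa", "sha256:bbb", "sha256:ddd"}
	fmt.Println(sharedLayers(privileged, untrusted))
}
```

Any non-empty result means files in those layers may be backed by the same page-cache pages when both images run on one node, which is the precondition described above.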
When the privileged DaemonSet next executes any corrupted binary (through its normal operation cycle), the kernel loads the corrupted page-cache pages. The attacker's payload runs with the DaemonSet's full privileges — potentially including:
- Full root on the node
- All capabilities
- Access to host namespaces (network, PID, mount)
The payload in this PoC (payload/payload.c) simply mounts the host root filesystem and writes a marker file to /root/res as proof of node-level code execution.
```
┌──────────────────────────┐     ┌──────────────────────────┐
│  PoC Container           │     │  Privileged DaemonSet    │
│  (unprivileged)          │     │  (e.g. kube-proxy,       │
│                          │     │   monitoring agent, etc.)│
│  1. Open target binary   │     │                          │
│     (read-only)          │     │                          │
│                          │     │                          │
│  2. AF_ALG splice race   │     │                          │
│     corrupts page cache  │     │                          │
│          │               │     │                          │
└──────────┼───────────────┘     └──────────────────────────┘
           │                                │
           ▼                                │
┌─────────────────────┐                     │
│  Kernel Page Cache  │                     │
│                     │◄────────────────────┘
│ Shared-layer binary │  3. DaemonSet executes the
│ (CORRUPTED)         │     corrupted binary
│ contains attacker's │     → loads corrupted pages
│ payload bytes       │     → payload runs with
└─────────────────────┘       DaemonSet's privileges
```
The PoC has been successfully validated on the following managed Kubernetes platforms:
| Property | Value |
|---|---|
| Platform | Alibaba Cloud Container Service for Kubernetes (ACK) |
| Kubernetes | v1.35.2 |
| Node Kernel | 6.6.88-4.2.alnx4.x86_64 |
| kube-proxy | registry-cn-*.ack.aliyuncs.com/acs/kube-proxy:v1.35.2-aliyun.1 |
| Base Image | registry.k8s.io/kube-proxy:v1.35.2 (upstream) |
| Root Device | /dev/vda3 (ext4) |
| Property | Value |
|---|---|
| Platform | Amazon Elastic Kubernetes Service (EKS) |
| Kubernetes | v1.35.4 |
| Node Kernel | 6.12.79-101.147.amzn2023.x86_64 |
| kube-proxy | ***.dkr.ecr.***.amazonaws.com.cn/eks/kube-proxy:v1.35.3-eksbuild.2 |
| Base Image | public.ecr.aws/eks-distro-build-tooling/eks-distro-minimal-base-iptables:2026-03-11-1773190710.2023 |
| Root Device | /dev/nvme0n1p1 (xfs) |
| Property | Value |
|---|---|
| Platform | Google Kubernetes Engine (GKE) |
| Kubernetes | v1.35.3-gke.1234000 |
| Node OS | Container-Optimized OS (COS) 125, BUILD_ID 19216.220.72 |
| Node Kernel | 6.12.68+ x86_64 |
| kube-proxy | us-central1-artifactregistry.gcr.io/gke-release/gke-release/kube-proxy:v1.35.3-gke.1234000 |
| Base Image | Same as kube-proxy (GKE provider-managed Artifact Registry image) |
| Root Device | /dev/dm-0 (ext2, read-only); /dev/sda1 (ext4, writable stateful partition) |
| Marker Path | /mnt/stateful_partition/copyfail-res |
In all three cases, an unprivileged PoC pod successfully wrote the [*] success marker file to the host filesystem — proving node-level code execution through the privileged kube-proxy DaemonSet.
For the complete walkthroughs (image layer analysis, build steps, deployment):
- EKS: docs/eks-poc.md
- GKE: docs/gke-poc.md
This PoC uses kube-proxy as the target because it is one of the most common privileged DaemonSets in Kubernetes clusters. Three variants are provided:
- Default (ACK / upstream): built `FROM registry.k8s.io/kube-proxy:v1.35.2` (see `Dockerfile`)
- EKS: built `FROM public.ecr.aws/eks-distro-build-tooling/eks-distro-minimal-base-iptables:2026-03-11-1773190710.2023` (see `Dockerfile.eks`)
- GKE: built `FROM us-central1-artifactregistry.gcr.io/gke-release/gke-release/kube-proxy:v1.35.3-gke.1234000` (see `Dockerfile.gke`)
All variants corrupt binaries like /usr/sbin/ipset, /usr/sbin/nft, /usr/sbin/xtables-legacy-multi, and /usr/sbin/xtables-nft-multi.
Important caveats:
- kube-proxy only invokes `ipset` when configured in ipvs mode. The default mode (iptables) does not use `ipset`. See kubernetes/enhancements#5495 for the ipvs deprecation plan.
- Some managed Kubernetes distributions (e.g. certain cloud providers) run kube-proxy as a non-privileged container, which limits the impact of the escape.
- The PoC targets multiple binaries (`ipset`, `nft`, `xtables-legacy-multi`, `xtables-nft-multi`) to cover different proxy modes, but whether they get invoked depends on cluster configuration.
If kube-proxy is not privileged in your cluster, the attack principle still holds — you just need to identify a different privileged DaemonSet that shares image layers with a base image you can build from.
To adapt this PoC to a different privileged DaemonSet:
- Identify a privileged DaemonSet running on the cluster (monitoring agents, CNI plugins, log collectors, etc.).
- Build your PoC image `FROM` the same base image used by that DaemonSet.
- Identify binaries in the shared layer that the DaemonSet will execute during its normal operation.
- Corrupt those binaries' page cache using the exploit.
```
.
├── cmd/copyfail/main.go        # Entry point; embeds compiled payload
├── internal/
│   ├── exploit/
│   │   ├── exploit.go          # Core exploit: AF_ALG splice race loop
│   │   └── patch.go            # Splits payload into 4-byte patch windows
│   └── alg/
│       └── alg.go              # AF_ALG AEAD socket abstraction
├── payload/
│   ├── payload.c               # ACK/upstream payload (mount /dev/vda3 ext4)
│   ├── payload-eks.c           # EKS payload (NVMe/Xen device auto-detection)
│   ├── payload-gke.c           # GKE payload (COS/Ubuntu device auto-detection)
│   └── nolibc/                 # Kernel's tiny libc for static, no-dependency payloads
├── deploy/
│   ├── poc.yaml                # Kubernetes Deployment manifest (ACK/upstream)
│   ├── poc-eks.yaml            # EKS Deployment manifest
│   └── poc-gke.yaml            # GKE Deployment manifest
├── Dockerfile                  # ACK/upstream: FROM registry.k8s.io/kube-proxy
├── Dockerfile.eks              # EKS: FROM eks-distro-minimal-base-iptables
├── Dockerfile.gke              # GKE: FROM gke-release/kube-proxy
├── Makefile                    # Build orchestration (includes *-eks and *-gke targets)
└── docs/
    ├── eks-poc.md              # EKS PoC full walkthrough
    ├── gke-poc.md              # GKE PoC full walkthrough
    ├── ack-poc-res.png         # ACK validation screenshot
    ├── eks-poc-res.png         # EKS validation screenshot
    └── gke-poc-res.png         # GKE validation screenshot
```
- Go 1.25+
- A cross-compiler for the nolibc payload (default: `x86_64-linux-gnu-gcc`)
- Docker / Buildx
- A Kubernetes cluster with a privileged DaemonSet that shares image layers with the PoC image (the default example targets kube-proxy)
- `imagePullPolicy: IfNotPresent` on the target DaemonSet (the Kubernetes default)
- Linux kernel before the CVE-2026-31431 fix
```bash
# Build payload + Go binary
make build
# Build Docker image
make docker-build
# Build and push to GHCR
make docker-push IMAGE=ghcr.io/<you>/copy-fail-poc TAG=latest
```

```bash
# Build EKS payload + Go binary + Docker image
make docker-build-eks
# Build and push to GHCR
make docker-push-eks IMAGE=ghcr.io/<you>/copy-fail-poc
```

For arm64 targets (Graviton):

```bash
make build-eks CC=aarch64-linux-gnu-gcc GOARCH=arm64
```

```bash
# Build GKE payload + Go binary + Docker image
make docker-build-gke
# Build and push to GHCR
make docker-push-gke IMAGE=ghcr.io/<you>/copy-fail-poc
```

For arm64 nodes:

```bash
make docker-build-gke CC=aarch64-linux-gnu-gcc GOARCH=arm64 PLATFORM=linux/arm64
```

```bash
# ACK / upstream Kubernetes
kubectl apply -f deploy/poc.yaml
# Amazon EKS
kubectl apply -f deploy/poc-eks.yaml
# Google GKE
kubectl apply -f deploy/poc-gke.yaml
```

The Deployment creates a single unprivileged pod. It:

- Runs `/bin/copyfail` to corrupt the page cache of target binaries in the shared image layer.
- Sleeps indefinitely so the pod stays running for observation.
After the target privileged DaemonSet next executes a corrupted binary (for kube-proxy, this typically happens within seconds due to its reconciliation loop), check the node:

```bash
# SSH into the node, or use a privileged debug pod
# ACK / EKS (writable root filesystem)
cat /root/res
# Expected output: [*] success
# GKE COS nodes (read-only root, writable stateful partition)
cat /mnt/stateful_partition/copyfail-res
# Expected output: [*] success
```

The presence of the marker file on the host filesystem proves that attacker-supplied code executed with node-level privileges from inside the privileged DaemonSet's container context.
```bash
kubectl delete -f deploy/poc.yaml   # or poc-eks.yaml / poc-gke.yaml
# On the affected node(s), remove the marker and restart the target DaemonSet:
rm -f /root/res                                            # ACK / EKS
rm -f /copyfail-res /mnt/stateful_partition/copyfail-res   # GKE COS nodes
# For kube-proxy: delete the pod to force image layer re-read
kubectl delete pod -n kube-system -l k8s-app=kube-proxy --field-selector spec.nodeName=<node>
```

The default payload (`payload/payload.c`) is a validation-only program that writes a marker file. To build a custom payload:

- Edit `payload/payload.c`. The program is built against `nolibc` (the kernel's minimal C library) for a static, dependency-free binary.
- Run `make payload` to cross-compile.
- The compiled payload is embedded into the Go binary via `//go:embed`.
- Linux kernel: All versions before the CVE-2026-31431 patch.
- Kubernetes: Any version using an unpatched node kernel. The vulnerability is in the kernel, not in Kubernetes itself. Kubernetes merely provides the execution context (shared image layers + privileged DaemonSets) that elevates the impact from local page-cache corruption to full container escape.
- Patch the kernel. This is the definitive fix.
- Enable image layer isolation. Some runtimes support per-container filesystem snapshots that prevent page-cache sharing.
- Minimize privileged DaemonSets. Reduce the number of workloads running with elevated privileges; use the principle of least privilege.
- Drop unnecessary capabilities from DaemonSets that don't strictly require `privileged: true`.
- Restrict pod scheduling to prevent untrusted workloads from landing on nodes running privileged DaemonSets with shared base images.
- Use distinct base images for privileged workloads to reduce the chance of layer sharing with untrusted containers.
- vArmor built-in mitigation rule: copy-fail-mitigation blocks the exploit vector by preventing containers from creating `AF_ALG` sockets. The rule is available through the AppArmor and BPF enforcers.
- Kubernetes eBPF mitigation: iwanhae/copyfail-ebpf-k8s provides an eBPF-based Kubernetes mitigation example for CVE-2026-31431.
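After applying a mitigation that blocks `AF_ALG` sockets, it is useful to confirm from inside an ordinary container that socket creation is really refused. The following is a minimal probe sketch, assuming a Linux node; it is not part of this repository, `algSocketStatus` is a hypothetical name, and the constant `afALG` mirrors the value of `AF_ALG` in the Linux headers. It only creates and immediately closes a socket.

```go
package main

import (
	"fmt"
	"syscall"
)

// afALG is the Linux address family number for AF_ALG (38).
const afALG = 38

// algSocketStatus tries to create an AF_ALG socket and reports the outcome.
// An error may mean a policy denied the call or that the kernel was built
// without AF_ALG support; the error text helps tell these apart.
func algSocketStatus() string {
	fd, err := syscall.Socket(afALG, syscall.SOCK_SEQPACKET, 0)
	if err != nil {
		return "blocked or unavailable: " + err.Error()
	}
	syscall.Close(fd)
	return "allowed"
}

func main() {
	fmt.Println("AF_ALG socket creation:", algSocketStatus())
}
```

If the probe prints `allowed` in a container that is supposed to be covered by the mitigation, the policy is probably not being applied to that workload.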
- CVE-2026-31431 discovery and disclosure: Theori / Xint
- Cross-platform C payload: Tony Gies (LGPL-2.1-or-later OR MIT)
- nolibc: Linux kernel selftests (`tools/include/nolibc/`)
The Go exploit code in this repository is provided as-is for research purposes.
The payload (payload/payload.c) is derived from copy-fail-c and is dual-licensed under LGPL-2.1-or-later OR MIT. See LICENSE-LGPL and LICENSE-MIT.


