Percivalll/Copy-Fail-CVE-2026-31431-Kubernetes-PoC

Copy Fail (CVE-2026-31431) — Kubernetes Container Escape PoC

A proof-of-concept demonstrating how a fully unprivileged container can achieve node-level code execution on Kubernetes by exploiting the CVE-2026-31431 Linux kernel page-cache corruption bug through shared container image layers.

The core attack primitive is: any privileged DaemonSet sharing image layers with an attacker-controlled container can be weaponized for container escape. This PoC uses kube-proxy as one concrete example, but the technique generalizes to any privileged workload on the cluster.

Validated on Alibaba Cloud ACK, Amazon EKS, and Google GKE — an unprivileged pod writes [*] success to the host filesystem via the privileged kube-proxy DaemonSet:

[Screenshots: Alibaba Cloud ACK (kernel 6.6.88), Amazon EKS (kernel 6.12.79), Google GKE (kernel 6.12.68)]

Disclaimer: This repository is published for educational and defensive purposes only. Use it exclusively on systems you own or have explicit authorization to test.

Background

CVE-2026-31431 ("Copy Fail") is a Linux kernel vulnerability in the page-cache Copy-on-Write (CoW) path. An AF_ALG splice race allows an unprivileged process to corrupt the page-cache pages of a read-only file. The corruption persists in the kernel page cache and is visible to every process that subsequently reads or executes the file — including processes in other containers or on the host.

For full details on the original vulnerability, see copy.fail.

Attack Principle

The attack exploits three properties that commonly coexist in Kubernetes clusters:

  1. Kernel page-cache corruption (CVE-2026-31431) — an unprivileged process can overwrite the in-memory cached pages of any file it can open read-only.
  2. Image layer sharing — container runtimes (containerd, CRI-O) use overlay filesystems where identical image layers map to the same page-cache pages across containers.
  3. Privileged DaemonSets — many clusters run DaemonSets with elevated privileges (privileged: true, hostNetwork: true, broad capabilities, etc.) that periodically execute binaries from their image.

When these conditions align, an unprivileged pod can corrupt a binary in a shared image layer, and a privileged DaemonSet on the same node will unknowingly execute the corrupted binary with its elevated privileges — achieving full node-level code execution.

The attack target is not limited to kube-proxy. Any privileged DaemonSet (monitoring agents, CNI plugins, log collectors, security agents, etc.) whose container image shares layers with an attacker-controlled image is a viable target.

How It Works

The attack chain has three stages: page-cache corruption, cross-container propagation, and privileged execution.

1. Page-Cache Corruption via AF_ALG Splice Race

The kernel's AF_ALG (crypto) subsystem exposes a socket-based interface for userspace cryptographic operations. The exploit abuses a race condition in how the kernel handles splice() from a file into an AF_ALG socket:

  1. Open the target binary read-only.
  2. Create an AF_ALG AEAD socket bound to authencesn(hmac(sha256),cbc(aes)).
  3. Send a small payload chunk through the AF_ALG socket with MSG_MORE, telling the kernel to expect more data.
  4. splice() the target file's contents from its file descriptor through a pipe into the AF_ALG socket.
  5. Due to the CoW bug, the kernel writes the attacker's payload bytes into the target file's page-cache pages instead of properly isolating them.

The exploit repeats this for each 4-byte window until the entire target binary's cached pages are overwritten with a custom payload.
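The windowing step can be pictured with a small sketch. This is not the repository's patch.go; the type name, the function name, and the zero-padding of a short final window are assumptions made for illustration. It only shows how a byte slice breaks down into (offset, 4-byte) units.

```go
package main

import "fmt"

// window is one 4-byte patch unit: where it lands in the target file and
// which bytes it carries. Type and field names are illustrative only.
type window struct {
	Offset int
	Data   [4]byte
}

// splitWindows cuts payload into consecutive 4-byte windows. A short final
// window is zero-padded; that padding policy is an assumption of this sketch.
func splitWindows(payload []byte) []window {
	out := make([]window, 0, (len(payload)+3)/4)
	for off := 0; off < len(payload); off += 4 {
		w := window{Offset: off}
		copy(w.Data[:], payload[off:]) // copies at most 4 bytes
		out = append(out, w)
	}
	return out
}

func main() {
	// Six sample bytes yield two windows, the second one padded.
	for _, w := range splitWindows([]byte{0x7f, 'E', 'L', 'F', 2, 1}) {
		fmt.Printf("offset=%d data=%v\n", w.Offset, w.Data)
	}
}
```

Each window would correspond to one pass through steps 3 to 5 above.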

No write permission to the file is needed. The file on disk is unchanged — only the in-memory page cache is corrupted.

2. Cross-Container Propagation via Image Layer Sharing

Container runtimes use overlay filesystems. When two containers share the same image layer, the kernel serves their file reads from the same page-cache pages.

The attacker builds their PoC image FROM the same base image as the target privileged DaemonSet. Because both containers then share the same overlayfs lower directories (lowerdir), binaries in the shared layer map to identical page-cache pages.

When the unprivileged PoC container corrupts a binary's page cache, the corruption is immediately visible to the privileged container on the same node — with zero cross-container communication.
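To judge whether two images are candidates for this kind of sharing, their layer lists can be compared. The sketch below assumes the layer digests were collected beforehand, for example from the RootFS.Layers field of `docker image inspect`; the digests in `main` are placeholders, not real values. Matching digests indicate shared content, but whether the runtime actually backs both containers with the same files depends on the snapshotter in use.

```go
package main

import "fmt"

// sharedLayers returns the digests present in both layer lists, in the
// order they appear in a, without duplicates.
func sharedLayers(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, d := range b {
		seen[d] = true
	}
	var out []string
	for _, d := range a {
		if seen[d] {
			out = append(out, d)
			seen[d] = false // report each digest once
		}
	}
	return out
}

func main() {
	// Hypothetical digests, for illustration only.
	poc := []string{"sha256:base1", "sha256:base2", "sha256:poc"}
	target := []string{"sha256:base1", "sha256:base2", "sha256:proxy"}
	fmt.Println(sharedLayers(poc, target)) // prints [sha256:base1 sha256:base2]
}
```

The same comparison works defensively: an empty result for an untrusted image and a privileged DaemonSet image means the two have no layer content in common.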

3. Privileged Execution by the Target DaemonSet

When the privileged DaemonSet next executes any corrupted binary (through its normal operation cycle), the kernel loads the corrupted page-cache pages. The attacker's payload runs with the DaemonSet's full privileges — potentially including:

  • Full root on the node
  • All capabilities
  • Access to host namespaces (network, PID, mount)

The payload in this PoC (payload/payload.c) simply mounts the host root filesystem and writes a marker file to /root/res as proof of node-level code execution.

Attack Flow Diagram

┌──────────────────────────┐     ┌──────────────────────────┐
│   PoC Container          │     │   Privileged DaemonSet   │
│   (unprivileged)         │     │   (e.g. kube-proxy,      │
│                          │     │   monitoring agent, etc.)│
│  1. Open target binary   │     │                          │
│     (read-only)          │     │                          │
│                          │     │                          │
│  2. AF_ALG splice race   │     │                          │
│     corrupts page cache  │     │                          │
│          │               │     │                          │
└──────────┼───────────────┘     └──────────────────────────┘
           │                                  │
           ▼                                  │
  ┌─────────────────────┐                     │
  │  Kernel Page Cache   │                     │
  │                      │◄────────────────────┘
  │  Shared-layer binary │     3. DaemonSet executes the
  │  (CORRUPTED)         │        corrupted binary
  │  contains attacker's │        → loads corrupted pages
  │  payload bytes       │        → payload runs with
  └─────────────────────┘           DaemonSet's privileges

Validated Cloud Environments

The PoC has been successfully validated on the following managed Kubernetes platforms:

Alibaba Cloud ACK

Property     Value
Platform     Alibaba Cloud Container Service for Kubernetes (ACK)
Kubernetes   v1.35.2
Node Kernel  6.6.88-4.2.alnx4.x86_64
kube-proxy   registry-cn-*.ack.aliyuncs.com/acs/kube-proxy:v1.35.2-aliyun.1
Base Image   registry.k8s.io/kube-proxy:v1.35.2 (upstream)
Root Device  /dev/vda3 (ext4)

[Screenshot: ACK PoC result]

Amazon EKS

Property     Value
Platform     Amazon Elastic Kubernetes Service (EKS)
Kubernetes   v1.35.4
Node Kernel  6.12.79-101.147.amzn2023.x86_64
kube-proxy   ***.dkr.ecr.***.amazonaws.com.cn/eks/kube-proxy:v1.35.3-eksbuild.2
Base Image   public.ecr.aws/eks-distro-build-tooling/eks-distro-minimal-base-iptables:2026-03-11-1773190710.2023
Root Device  /dev/nvme0n1p1 (xfs)

[Screenshot: EKS PoC result]

Google GKE

Property     Value
Platform     Google Kubernetes Engine (GKE)
Kubernetes   v1.35.3-gke.1234000
Node OS      Container-Optimized OS (COS) 125, BUILD_ID 19216.220.72
Node Kernel  6.12.68+ x86_64
kube-proxy   us-central1-artifactregistry.gcr.io/gke-release/gke-release/kube-proxy:v1.35.3-gke.1234000
Base Image   Same as kube-proxy (GKE provider-managed Artifact Registry image)
Root Device  /dev/dm-0 (ext2, read-only); /dev/sda1 (ext4, writable stateful partition)
Marker Path  /mnt/stateful_partition/copyfail-res

[Screenshot: GKE PoC result]

In all three cases, an unprivileged PoC pod successfully wrote the [*] success marker file to the host filesystem — proving node-level code execution through the privileged kube-proxy DaemonSet.

For the complete walkthroughs (image layer analysis, build steps, deployment), see docs/eks-poc.md and docs/gke-poc.md.

kube-proxy as a Concrete Example

This PoC uses kube-proxy as the target because it is one of the most common privileged DaemonSets in Kubernetes clusters. Three variants are provided:

  • Default (ACK / upstream): built FROM registry.k8s.io/kube-proxy:v1.35.2 (see Dockerfile)
  • EKS: built FROM public.ecr.aws/eks-distro-build-tooling/eks-distro-minimal-base-iptables:2026-03-11-1773190710.2023 (see Dockerfile.eks)
  • GKE: built FROM us-central1-artifactregistry.gcr.io/gke-release/gke-release/kube-proxy:v1.35.3-gke.1234000 (see Dockerfile.gke)

All variants corrupt binaries like /usr/sbin/ipset, /usr/sbin/nft, /usr/sbin/xtables-legacy-multi, and /usr/sbin/xtables-nft-multi.

Important caveats:

  • kube-proxy only invokes ipset when configured in ipvs mode. The default mode (iptables) does not use ipset. See kubernetes/enhancements#5495 for the ipvs deprecation plan.
  • Some managed Kubernetes offerings run kube-proxy as a non-privileged container, which limits the impact of the escape.
  • The PoC targets multiple binaries (ipset, nft, xtables-legacy-multi, xtables-nft-multi) to cover different proxy modes, but whether they get invoked depends on cluster configuration.

If kube-proxy is not privileged in your cluster, the attack principle still holds — you just need to identify a different privileged DaemonSet that shares image layers with a base image you can build from.

Generalizing to Other Targets

To adapt this PoC to a different privileged DaemonSet:

  1. Identify a privileged DaemonSet running on the cluster (monitoring agents, CNI plugins, log collectors, etc.).
  2. Build your PoC image FROM the same base image used by that DaemonSet.
  3. Identify binaries in the shared layer that the DaemonSet will execute during its normal operation.
  4. Corrupt those binaries' page cache using the exploit.
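Step 1 is also the first thing a defender would audit. A minimal sketch, assuming the input is the JSON produced by `kubectl get daemonsets -A -o json`: it lists containers whose securityContext sets privileged: true. It does not look at hostNetwork, hostPID, or added capabilities, and the struct and function names are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dsList mirrors only the fields of a DaemonSet list that this sketch reads.
type dsList struct {
	Items []struct {
		Metadata struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			Template struct {
				Spec struct {
					Containers []struct {
						Name            string `json:"name"`
						Image           string `json:"image"`
						SecurityContext *struct {
							Privileged *bool `json:"privileged"`
						} `json:"securityContext"`
					} `json:"containers"`
				} `json:"spec"`
			} `json:"template"`
		} `json:"spec"`
	} `json:"items"`
}

// privilegedContainers returns one "namespace/daemonset/container image"
// entry for every container that sets securityContext.privileged to true.
func privilegedContainers(raw []byte) ([]string, error) {
	var l dsList
	if err := json.Unmarshal(raw, &l); err != nil {
		return nil, err
	}
	var out []string
	for _, ds := range l.Items {
		for _, c := range ds.Spec.Template.Spec.Containers {
			sc := c.SecurityContext
			if sc != nil && sc.Privileged != nil && *sc.Privileged {
				out = append(out, fmt.Sprintf("%s/%s/%s %s",
					ds.Metadata.Namespace, ds.Metadata.Name, c.Name, c.Image))
			}
		}
	}
	return out, nil
}

// sample is a trimmed, hand-written stand-in for real kubectl output.
const sample = `{"items":[{"metadata":{"namespace":"kube-system","name":"kube-proxy"},
"spec":{"template":{"spec":{"containers":[{"name":"kube-proxy",
"image":"registry.k8s.io/kube-proxy:v1.35.2","securityContext":{"privileged":true}}]}}}}]}`

func main() {
	found, err := privilegedContainers([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, f := range found {
		fmt.Println(f)
	}
}
```

The images reported here are the ones whose base layers matter for steps 2 and 3.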

Repository Structure

.
├── cmd/copyfail/main.go          # Entry point; embeds compiled payload
├── internal/
│   ├── exploit/
│   │   ├── exploit.go            # Core exploit: AF_ALG splice race loop
│   │   └── patch.go              # Splits payload into 4-byte patch windows
│   └── alg/
│       └── alg.go                # AF_ALG AEAD socket abstraction
├── payload/
│   ├── payload.c                 # ACK/upstream payload (mount /dev/vda3 ext4)
│   ├── payload-eks.c             # EKS payload (NVMe/Xen device auto-detection)
│   ├── payload-gke.c             # GKE payload (COS/Ubuntu device auto-detection)
│   └── nolibc/                   # Kernel's tiny libc for static, no-dependency payloads
├── deploy/
│   ├── poc.yaml                  # Kubernetes Deployment manifest (ACK/upstream)
│   ├── poc-eks.yaml              # EKS Deployment manifest
│   └── poc-gke.yaml              # GKE Deployment manifest
├── Dockerfile                    # ACK/upstream: FROM registry.k8s.io/kube-proxy
├── Dockerfile.eks                # EKS: FROM eks-distro-minimal-base-iptables
├── Dockerfile.gke                # GKE: FROM gke-release/kube-proxy
├── Makefile                      # Build orchestration (includes *-eks and *-gke targets)
└── docs/
    ├── eks-poc.md                # EKS PoC full walkthrough
    ├── gke-poc.md                # GKE PoC full walkthrough
    ├── ack-poc-res.png           # ACK validation screenshot
    ├── eks-poc-res.png           # EKS validation screenshot
    └── gke-poc-res.png           # GKE validation screenshot

Prerequisites

  • Go 1.25+
  • A cross-compiler for the nolibc payload (default: x86_64-linux-gnu-gcc)
  • Docker / Buildx
  • A Kubernetes cluster with a privileged DaemonSet that shares image layers with the PoC image (the default example targets kube-proxy)
  • imagePullPolicy: IfNotPresent on the target DaemonSet (the Kubernetes default for images with a tag other than :latest)
  • Linux kernel before the CVE-2026-31431 fix

Building

ACK / Upstream Kubernetes

# Build payload + Go binary
make build

# Build Docker image
make docker-build

# Build and push to GHCR
make docker-push IMAGE=ghcr.io/<you>/copy-fail-poc TAG=latest

Amazon EKS

# Build EKS payload + Go binary + Docker image
make docker-build-eks

# Build and push to GHCR
make docker-push-eks IMAGE=ghcr.io/<you>/copy-fail-poc

For arm64 targets (Graviton):

make build-eks CC=aarch64-linux-gnu-gcc GOARCH=arm64

Google GKE

# Build GKE payload + Go binary + Docker image
make docker-build-gke

# Build and push to GHCR
make docker-push-gke IMAGE=ghcr.io/<you>/copy-fail-poc

For arm64 nodes:

make docker-build-gke CC=aarch64-linux-gnu-gcc GOARCH=arm64 PLATFORM=linux/arm64

Usage

Deploy the PoC

# ACK / upstream Kubernetes
kubectl apply -f deploy/poc.yaml

# Amazon EKS
kubectl apply -f deploy/poc-eks.yaml

# Google GKE
kubectl apply -f deploy/poc-gke.yaml

The Deployment creates a single unprivileged pod. It:

  1. Runs /bin/copyfail to corrupt the page cache of target binaries in the shared image layer.
  2. Sleeps indefinitely so the pod stays running for observation.

Verify the Escape

After the target privileged DaemonSet next executes a corrupted binary (for kube-proxy, this typically happens within seconds due to its reconciliation loop), check the node:

# SSH into the node, or use a privileged debug pod

# ACK / EKS (writable root filesystem)
cat /root/res
# Expected output: [*] success

# GKE COS nodes (read-only root, writable stateful partition)
cat /mnt/stateful_partition/copyfail-res
# Expected output: [*] success

The presence of the marker file on the host filesystem proves that attacker-supplied code executed with node-level privileges — from inside the privileged DaemonSet's container context.

Clean Up

kubectl delete -f deploy/poc.yaml      # or poc-eks.yaml / poc-gke.yaml

# On the affected node(s), remove the marker and restart the target DaemonSet:
rm -f /root/res                                     # ACK / EKS
rm -f /copyfail-res /mnt/stateful_partition/copyfail-res  # GKE COS nodes
# For kube-proxy: delete the pod to force image layer re-read
kubectl delete pod -n kube-system -l k8s-app=kube-proxy --field-selector spec.nodeName=<node>

Customizing the Payload

The default payload (payload/payload.c) is a validation-only program that writes a marker file. To build a custom payload:

  1. Edit payload/payload.c. The program is built against nolibc (the kernel's minimal C library) for a static, dependency-free binary.
  2. Run make payload to cross-compile.
  3. The compiled payload is embedded into the Go binary via //go:embed.

Affected Versions

  • Linux kernel: All versions before the CVE-2026-31431 patch.
  • Kubernetes: Any version using an unpatched node kernel. The vulnerability is in the kernel, not in Kubernetes itself. Kubernetes merely provides the execution context (shared image layers + privileged DaemonSets) that elevates the impact from local page-cache corruption to full container escape.
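Kernel version aside, exposure also depends on whether a container can reach the AF_ALG interface at all. The probe below is a sketch that only tests whether an AF_ALG socket can be created from the current process; it says nothing about whether the kernel is patched. The constant 38 is the Linux value of AF_ALG, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// afALG is the Linux address-family number for AF_ALG.
const afALG = 38

// canCreateAFALG reports whether this process may create an AF_ALG socket.
// On failure it returns false together with the error from socket(2), for
// example when a security policy denies the call.
func canCreateAFALG() (bool, error) {
	fd, err := syscall.Socket(afALG, syscall.SOCK_SEQPACKET, 0)
	if err != nil {
		return false, err
	}
	syscall.Close(fd) // the socket is only a probe; release it immediately
	return true, nil
}

func main() {
	ok, err := canCreateAFALG()
	if ok {
		fmt.Println("AF_ALG sockets can be created in this context")
		return
	}
	fmt.Println("AF_ALG socket creation failed:", err)
}
```

Run inside an ordinary pod, a failure here suggests that a policy blocking AF_ALG is in effect for that workload.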

Mitigation

  • Patch the kernel. This is the definitive fix.
  • Enable image layer isolation. Some runtimes support per-container filesystem snapshots that prevent page-cache sharing.
  • Minimize privileged DaemonSets. Reduce the number of workloads running with elevated privileges; use the principle of least privilege.
  • Drop unnecessary capabilities from DaemonSets that don't strictly require privileged: true.
  • Restrict pod scheduling to prevent untrusted workloads from landing on nodes running privileged DaemonSets with shared base images.
  • Use distinct base images for privileged workloads to reduce the chance of layer sharing with untrusted containers.

Mitigation Examples

  • vArmor built-in mitigation rule: copy-fail-mitigation blocks the exploit vector by preventing containers from creating AF_ALG sockets. The rule is available through the AppArmor and BPF enforcers.
  • Kubernetes eBPF mitigation: iwanhae/copyfail-ebpf-k8s provides an eBPF-based Kubernetes mitigation example for CVE-2026-31431.

Credits

  • CVE-2026-31431 discovery and disclosure: Theori / Xint
  • Cross-platform C payload: Tony Gies (LGPL-2.1-or-later OR MIT)
  • nolibc: Linux kernel selftests (tools/include/nolibc/)

License

The Go exploit code in this repository is provided as-is for research purposes.

The payload (payload/payload.c) is derived from copy-fail-c and is dual-licensed under LGPL-2.1-or-later OR MIT. See LICENSE-LGPL and LICENSE-MIT.
