Fatal Kernel Error/Crash on virtualization.framework (ovl_permission) #7005

Open
eytanhanig opened this issue Oct 2, 2023 · 0 comments


eytanhanig commented Oct 2, 2023

Description

I am getting kernel crashes when running many containers, with the "Something went wrong" popup stating:

Fatal error reported: Linux kernel v6.3.13 crash on virtualization.framework

[  400.226213] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028
[  400.226274] Mem abort info:
[  400.226291]   ESR = 0x0000000096000004
[  400.226382]   EC = 0x25: DABT (current EL), IL = 32 bits
[  400.280607]   SET = 0, FnV = 0
[  400.280662]   EA = 0, S1PTW = 0
[  400.280702]   FSC = 0x04: level 0 translation fault
[  400.280748] Data abort info:
[  400.280791]   ISV = 0, ISS = 0x00000004
[  400.280833]   CM = 0, WnR = 0
[  400.303879] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011e87d000
[  400.303971] [0000000000000028] pgd=0000000000000000, p4d=0000000000000000
[  400.304057] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  400.304154] Modules linked in: xfrm_user xfrm_algo nfsd auth_rpcgss nfs lockd grace sunrpc fakeowner(O) shiftfs(O) grpcfuse(O) vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common vsock
[  400.304387] CPU: 2 PID: 14717 Comm: gunicorn Tainted: G           O       6.3.13-linuxkit #1
[  400.304506] pstate: 61401005 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
[  400.304592] pc : do_ovl_get_acl+0x50/0x148
[  400.304674] lr : do_ovl_get_acl+0x48/0x148
[  400.304744] sp : ffff80000e3637d0
[  400.304825] x29: ffff80000e3637d0 x28: ffff7ae2d29c9240 x27: ffff7ae2e2795060
[  400.304912] x26: d0d0d0d0d0d0d0d0 x25: 00000000000041ed x24: 00000000000041ed
[  400.305008] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000
[  400.305089] x20: ffff7ae3f314c630 x19: 0000000000008000 x18: 0000000000000000
[  400.305202] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  400.305264] x14: 0000000000000000 x13: 000000000000017f x12: 0000ffffaa4041a8
[  400.305334] x11: 0000000000000000 x10: 0000ffffa60390e0 x9 : ffffa8ed5be7b55c
[  400.305412] x8 : ffff80000e363b68 x7 : 0000000000000000 x6 : 2f73657275747566
[  400.305500] x5 : ffff7ae2e2795058 x4 : 0000000000000001 x3 : 0000000000000001
[  400.305588] x2 : 0000000000008000 x1 : 0000000000000000 x0 : 0000000000000000
[  400.305684] Call trace:
[  400.305702]  do_ovl_get_acl+0x50/0x148
[  400.305733]  ovl_get_inode_acl+0x3c/0x50
[  400.305791]  get_cached_acl_rcu+0x54/0x74
[  400.305823]  generic_permission+0x110/0x23c
[  400.305871]  ovl_permission+0x94/0x12c
[  400.305922]  inode_permission+0x5c/0x168
[  400.305993]  link_path_walk+0x230/0x36c
[  400.306038]  path_lookupat+0x60/0x128
[  400.306053]  filename_lookup+0xa4/0x11c
[  400.306097]  vfs_statx+0x80/0x158
[  400.306137]  vfs_fstatat+0x64/0x88
[  400.306195]  __do_sys_newfstatat+0x5c/0xa4
[  400.306243]  __arm64_sys_newfstatat+0x2c/0x3c
[  400.306299]  invoke_syscall.constprop.0+0x88/0xd8
[  400.306348]  do_el0_svc+0x110/0x128
[  400.306392]  el0_svc+0x9c/0xcc
[  400.306489]  el0t_64_sync_handler+0xac/0x13c
[  400.306546]  el0t_64_sync+0x190/0x194
[  400.306576] Code: aa1403e0 97fff480 aa0003f5 a904ffff (f9401400) 
[  400.306641] ---[ end trace 0000000000000000 ]---
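Two lines of the oops can be decoded by hand to explain the faulting address. The following Python sketch is not part of the original report. It unpacks the ESR value using the Armv8-A ESR_ELx data-abort field layout, and it decodes the bracketed instruction word on the "Code:" line on the assumption that it is a 64-bit LDR with an unsigned immediate offset. It reports EC 0x25 (data abort at the current EL), DFSC 0x04 (level 0 translation fault), a read access (WnR 0), and the instruction `ldr x0, [x0, #0x28]`. Because x0 is zero in the register dump, the load targets address 0x28, which matches the NULL pointer dereference reported at 0000000000000028.

```python
"""Decode the ESR and the faulting instruction from the arm64 oops above.

Illustrative sketch only: bit positions follow the Armv8-A ESR_ELx layout
for data aborts, and the instruction decoder handles just one LDR form.
"""

ESR = 0x96000004          # "ESR = 0x0000000096000004"
FAULT_INSN = 0xF9401400   # the bracketed word on the "Code:" line
REGS = {0: 0x0}           # "x0 : 0000000000000000" from the register dump


def bits(value: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit field value[hi:lo]."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_esr(esr: int) -> dict:
    """Split a data-abort ESR_ELx value into its named fields."""
    return {
        "EC": bits(esr, 31, 26),     # 0x25 = data abort without a change in EL
        "IL": bits(esr, 25, 25),     # 1 = 32-bit instruction
        "ISS": bits(esr, 24, 0),     # instruction-specific syndrome
        "ISV": bits(esr, 24, 24),
        "SET": bits(esr, 12, 11),
        "FnV": bits(esr, 10, 10),
        "EA": bits(esr, 9, 9),
        "CM": bits(esr, 8, 8),
        "S1PTW": bits(esr, 7, 7),
        "WnR": bits(esr, 6, 6),      # 0 = the faulting access was a read
        "DFSC": bits(esr, 5, 0),     # 0x04 = level 0 translation fault
    }


def decode_ldr_imm64(insn: int) -> tuple[int, int, int]:
    """Decode LDR Xt, [Xn, #imm] (64-bit, unsigned offset) into (Rt, Rn, offset)."""
    if insn & 0xFFC00000 != 0xF9400000:
        raise ValueError(f"{insn:#010x} is not a 64-bit LDR (unsigned immediate)")
    rt = bits(insn, 4, 0)
    rn = bits(insn, 9, 5)
    offset = bits(insn, 21, 10) << 3  # imm12 is scaled by the 8-byte access size
    return rt, rn, offset


if __name__ == "__main__":
    for name, value in decode_esr(ESR).items():
        print(f"{name:5} = {value:#x}")
    rt, rn, offset = decode_ldr_imm64(FAULT_INSN)
    print(f"faulting insn: ldr x{rt}, [x{rn}, #{offset:#x}]")
    print(f"effective address: {REGS[rn] + offset:#x}")
```

Which structure field lives at offset 0x28 depends on the exact 6.3.13-linuxkit build and is not determined here.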

Based on the error message, I believe this is caused by a known overlayfs bug that has been patched in more recent versions of the Linux kernel.

Judging from this, this, and the tag history, it appears to have been patched in kernel v6.5-rc1 and/or v6.6-rc3, and to have been backported to 5.15.121.
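To check whether a given Docker Desktop kernel should contain the fix, its release string (for example from `docker info --format '{{.KernelVersion}}'`) can be compared against the versions cited above. This Python sketch encodes only the reporter's claims, treated as assumptions: mainline is considered fixed from v6.5 (the earlier of the two tags mentioned; change `MAINLINE_FIXED` to `(6, 6)` if the fix only landed in v6.6-rc3), and the 5.15.y branch from 5.15.121. Other stable branches are reported as "unknown" rather than guessed.

```python
import re

# Assumptions taken from the report above, not independently verified.
MAINLINE_FIXED = (6, 5)            # "patched for kernel versions v6.5-rc1"
STABLE_BACKPORTS = {(5, 15): 121}  # "backported to version 5.15.121"


def parse_kernel_version(release: str) -> tuple[int, int, int]:
    """Parse a release string such as "6.3.13-linuxkit" into (6, 3, 13)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def fix_status(release: str) -> str:
    """Return "fixed", "unfixed", or "unknown" under the assumptions above."""
    major, minor, patch = parse_kernel_version(release)
    if (major, minor) >= MAINLINE_FIXED:
        return "fixed"
    if (major, minor) in STABLE_BACKPORTS:
        return "fixed" if patch >= STABLE_BACKPORTS[(major, minor)] else "unfixed"
    # Other stable branches (e.g. 6.3.y, 6.4.y) may or may not carry a backport.
    return "unknown"
```

With the kernels in this report, both `6.3.13-linuxkit` (from the crash) and `6.4.16-linuxkit` (from `docker info`) come back as "unknown": they predate v6.5, and the report does not say whether the 6.3.y or 6.4.y stable branches received the backport.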

Reproduce

Start a large number of containers. The crash seems to occur specifically when our proprietary "dev-in-docker" development container is running. Most of my testing was done on Docker Desktop 4.20.0, because I'm blocked from upgrading until docker/compose#10797 is released.
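The report does not include a minimal reproducer. The trace shows the crash happening during a stat() lookup (gunicorn → `__arm64_sys_newfstatat` → `link_path_walk` → `ovl_permission`), so one hypothetical way to exercise the same path is to run a stat() loop over a container's overlay2 root filesystem in several containers at once. The Python sketch below does that with a thread pool; the default root `/usr`, the thread count, and the round count are arbitrary assumptions, and nothing in the report confirms that this triggers the crash.

```python
"""Hypothetical stat() stress loop (not from the original report).

Run inside several containers at once to load the overlayfs
permission-check path seen in the trace. Untested as a reproducer.
"""
import os
import sys
from concurrent.futures import ThreadPoolExecutor


def collect_paths(root: str, limit: int = 10_000) -> list[str]:
    """Gather up to `limit` file and directory paths under `root`."""
    paths: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
            if len(paths) >= limit:
                return paths
    return paths


def stat_worker(paths: list[str], rounds: int) -> int:
    """stat() every path `rounds` times; returns the number of attempts."""
    attempts = 0
    for _ in range(rounds):
        for path in paths:
            try:
                os.stat(path)  # walks the path, checking permission on each component
            except OSError:
                pass  # files may vanish or be unreadable; keep going
            attempts += 1
    return attempts


def stress(root: str, threads: int = 8, rounds: int = 100) -> int:
    """Run `threads` concurrent stat loops over the same path list."""
    paths = collect_paths(root)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(stat_worker, paths, rounds) for _ in range(threads)]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/usr"
    print(f"{stress(target)} stat() attempts under {target}")
```

Threads are sufficient here because CPython releases the GIL during `os.stat`. To run it in a container, something like `docker run --rm -v "$PWD":/w python:3 python /w/stat_stress.py /usr` should work, where `stat_stress.py` is a hypothetical file name for the script.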

At the end of the day, this is a known kernel issue that can be solved by upgrading the kernel used by Docker for Mac.

Expected behavior

Docker should not crash with a fatal error.

docker version

Client:
 Cloud integration: v1.0.35+desktop.5
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:28:49 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.24.0 (122432)
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:31:36 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    24.0.6
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2-desktop.5
    Path:     /Users/eytan.hanig/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.22.0-desktop.2
    Path:     /Users/eytan.hanig/.docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/eytan.hanig/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.20
    Path:     /Users/eytan.hanig/.docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.8
    Path:     /Users/eytan.hanig/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/eytan.hanig/.docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /Users/eytan.hanig/.docker/cli-plugins/docker-scan
  scout: Docker Scout (Docker Inc.)
    Version:  v1.0.7
    Path:     /Users/eytan.hanig/.docker/cli-plugins/docker-scout

Server:
 Containers: 6
  Running: 2
  Paused: 0
  Stopped: 4
 Images: 36
 Server Version: 24.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
 runc version: v1.1.8-0-g82f18fe
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.4.16-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 11
 Total Memory: 7.765GiB
 Name: docker-desktop
 ID: 4840d711-39a9-4fd1-a57b-fc0ee42f9554
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

Diagnostics ID

203274A4-F32D-43DB-A5F1-25D150401DF8/20230929191501

Additional Info

No response

@eytanhanig eytanhanig changed the title Docker Desktop Kernel Crash on virtualization.framework Oct 2, 2023
@eytanhanig eytanhanig changed the title Kernel Crash on virtualization.framework Fatal Kernel Error/Crash on virtualization.framework (ovl_permission) Oct 2, 2023
@bsousaa bsousaa added area/kernel Linux kernel bug status/triage and removed needs-triage labels Oct 3, 2023