
kernel 7.0.3 + nvidia-open 595.71.05 on RTX 3090: __nv_drm_gem_nvkms_map requests range exceeding PCI BAR1 → Xid 31 → Xid 154 (Node Reboot Required) under Chromium GPU workload #1134

@Zeus-Deus

Description

NVIDIA Open GPU Kernel Modules Version

595.71.05 (Arch package nvidia-open-dkms 595.71.05-2)

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [ ] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Arch Linux (rolling release)

Kernel Release

Linux host 7.0.3-arch1-2 #1 SMP PREEMPT_DYNAMIC Fri, 01 May 2026 15:49:22 +0000 x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [x] I am running on a stable kernel release.

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-)

Describe the bug

On a single-GPU RTX 3090 desktop running Linux 7.0.3 with
nvidia-open-dkms 595.71.05, the kernel logged a resource sanity check
warning naming __nv_drm_gem_nvkms_map as the caller of an mmap that
"spans more than" the device's BAR1 region. The same instant, the GPU
took an MMU fault on Copy Engine 2 (Xid 31) and the driver self-declared
the GPU unrecoverable (Xid 154, "Node Reboot Required") with
uvm encountered global fatal error 0x60. GSP RPC then timed out
(Xid 175). The display compositor's vblank stalled, the screen froze, and
neither nvidia-smi nor systemctl reboot could complete; recovery
required a hardware power-cycle. The trigger workload was a
Chromium-based browser (Brave) starting a new renderer process.

To Reproduce

  • Wayland compositor (Hyprland) running, ~2 hours uptime since boot
  • Brave (Chromium-based browser) open with several tabs
  • Brave subprocess started a new renderer/GPU process — call stack shows
    Chromium worker thread deep in kperfBoostSet_IMPL → rpcRmApiControl_GSP →
    _kgspRpcRecvPoll, consistent with a GPU-frequency-boost RPC during
    renderer spin-up
  • No CUDA process active; no userspace had /dev/nvidia-uvm open
  • System RAM healthy: 7.6 GiB / 61 GiB used, no swap pressure
  • Single occurrence so far; not yet a deterministic reproducer
  • See "Smoking-gun evidence" and "Fault sequence" in More Info below

Bug Incidence

Once

nvidia-bug-report.log.gz

More Info

Note: I have not tested with the proprietary nvidia-dkms package, so I have
left the proprietary-driver-confirmation checkbox unchecked. The kernel's
own resource sanity check warning names __nv_drm_gem_nvkms_map+0x99/0xf0 [nvidia_drm] as the caller, which is specific to nvidia-open's DRM layer.
I am happy to test the proprietary driver if maintainers think it would
help isolate the regression.

Smoking-gun evidence

Single line, logged by the kernel core (not by NVRM) at t = 0:

resource: resource sanity check: requesting [mem 0x000000fccfdd0000-0x000000fcd00fffff], which spans more than 0000:01:00.0 [mem 0xfcc0000000-0xfccfffffff 64bit pref]
caller __nv_drm_gem_nvkms_map+0x99/0xf0 [nvidia_drm] mapping multiple BARs

The requested range is 0x330000 bytes (~3.2 MiB) long: it starts ~2.2 MiB
before the end of BAR1 (0xfcc0000000-0xfccfffffff) and runs ~1 MiB past it,
into BAR3, which starts at 0xfcd0000000. The kernel's resource validation
flags the request as spanning multiple BARs, and the subsequent
[drm:_nv_drm_gem_nvkms_map] ERROR Failed to map NvKmsKapiMemory 0x00000000616506ff
confirms the map failed.
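
For reference, the overrun can be checked directly from the addresses in the
sanity-check line. The following standalone C snippet is not driver code; the
constants are copied from the log, and it just redoes that arithmetic:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Values copied verbatim from the resource sanity check line. */
        const uint64_t req_start = 0xfccfdd0000ULL;  /* requested range start */
        const uint64_t req_end   = 0xfcd00fffffULL;  /* requested range end   */
        const uint64_t bar1_end  = 0xfccfffffffULL;  /* 0000:01:00.0 BAR1 end */

        printf("request size      : 0x%llx bytes\n",
               (unsigned long long)(req_end - req_start + 1));
        printf("inside BAR1       : 0x%llx bytes\n",
               (unsigned long long)(bar1_end - req_start + 1));
        printf("overrun past BAR1 : 0x%llx bytes\n",
               (unsigned long long)(req_end - bar1_end));
        return 0;
    }

It prints a request of 0x330000 bytes (~3.2 MiB), of which 0x230000 (~2.2 MiB)
falls inside BAR1 and 0x100000 (1 MiB) falls past its end.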

Immediately preceding this, NVRM logged ~25 repetitions of:

NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping.
NVRM: nvAssertOkFailedNoLog: ... [NV_ERR_NO_MEMORY] (0x00000051) ... @ mapping_reuse.c:273
... @ kern_bus_gm107.c:3141 // ("pBar1VaInfo->reuseDb")

so BAR1 VA space was being repeatedly exhausted in the seconds leading up
to the bad-range request. That suggests the bad mapping is a fallback (or
an arithmetic mistake) on the BAR1-VA-exhausted path rather than a
random misuse of the pci_resource_* helpers.
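
To make that hypothesis concrete: the sketch below is purely illustrative
(none of these names or structures come from the nvidia-open source). It only
shows that a fallback which keeps using its allocation cursor after the BAR1
VA allocator reports exhaustion would produce exactly the request seen in the
log, starting a couple of MiB under the aperture end and spilling into the
neighboring BAR.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical illustration only, not driver code: a trivial bump
     * allocator over a BAR1-sized window, plus a broken fallback that
     * ignores the allocation failure and maps from the cursor anyway. */
    #define BAR1_BASE 0xfcc0000000ULL
    #define BAR1_SIZE (256ULL << 20)        /* 256 MiB, per the sanity-check line */

    static uint64_t cursor;                 /* bytes already handed out */

    static int alloc_va(uint64_t size, uint64_t *out)
    {
        if (cursor + size > BAR1_SIZE)
            return -1;                      /* analogue of NV_ERR_NO_MEMORY */
        *out = BAR1_BASE + cursor;
        cursor += size;
        return 0;
    }

    int main(void)
    {
        uint64_t addr, map_size = 0x330000; /* ~3.2 MiB, as in the log */

        cursor = BAR1_SIZE - 0x230000;      /* window almost exhausted */

        if (alloc_va(map_size, &addr) != 0) {
            /* BUGGY fallback: map from the cursor anyway, overrunning BAR1. */
            addr = BAR1_BASE + cursor;
            printf("would request [mem 0x%llx-0x%llx], past BAR1 end 0x%llx\n",
                   (unsigned long long)addr,
                   (unsigned long long)(addr + map_size - 1),
                   (unsigned long long)(BAR1_BASE + BAR1_SIZE - 1));
        }
        return 0;
    }

This reproduces the exact [mem 0xfccfdd0000-0xfcd00fffff] range from the
sanity-check message, which is why the BAR1-exhaustion path looks like the
place to start looking.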

Fault sequence

All times relative to t = 0 (the resource sanity check line above).
Full redacted log in kernel-log-excerpt.txt.

Offset Event
t+0:00:00 resource sanity check, __nv_drm_gem_nvkms_map ... mapping multiple BARs, Failed to map NvKmsKapiMemory.
t+0:00:00 Xid 31 — MMU Fault: ENGINE CE2 HUBCLIENT_CE0 faulted @ 0x1_21000000, FAULT_PTE ACCESS_TYPE_VIRT_WRITE.
t+0:00:00 nvGpuOpsReportFatalError: uvm encountered global fatal error 0x60, requiring os reboot to recover.
t+0:00:00 Xid 154 — GPU recovery action changed from 0x0 (None) to 0x2 (Node Reboot Required).
t+0:00:00 Brave GPU subprocess receives SIGILL (trap invalid opcode ... in brave[...]).
t+0:00:01 [drm:nv_drm_atomic_apply_modeset_config] Failed to initialize semaphore for plane fence, nv_drm_atomic_commit Error code: -11.
t+0:01:15 _kgspIsHeartbeatTimedOut: diff 75117 timeout 5200. GSP heartbeat lost.
t+0:01:45 Memory Subsystem Error detected. kgmmuInvalidateTlb failed.
t+0:01:45 Xid 175 — Timeout after 75s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL). Originating thread name ThreadPoolSingl (Chromium worker).
t+0:01:48 Call trace dumped: _kgspRpcRecvPoll → _issueRpcAndWait → rpcRmApiControl_GSP → kperfBoostSet_IMPL → resControl_IMPL → ... → nvidia_unlocked_ioctl.
t+0:01:48 onward RC watchdog: GPU is probably locked! Notify Timeout Seconds: 7 repeats every 30-60 s. Hundreds of NV_ERR_RESET_REQUIRED assertions firing as the fullchip-reset path itself fails its preconditions.
t+0:06:18 Xid 16, Head 00000003 Count ..., RM has detected that 7 Seconds without a Vblank Counter Update on head:D0. Display visibly froze.
t+0:12:48 Second Xid 16 / vblank-watchdog.

Recovery

  • nvidia-smi accepted the ioctl but never returned (killed manually after ~5 min).

  • The driver's own RC path tried FULLCHIP_RESET repeatedly; every attempt failed with NV_ERR_RESET_REQUIRED precondition assertions — the chip-reset path itself was wedged.

  • systemctl reboot was invoked from an SSH session and hung at nvidia_drm module teardown for >5 minutes without progress.

  • Recovery required holding the hardware power button.

The system was otherwise functional throughout: SSH stayed up, the Wayland compositor's main thread was alive in do_epoll_wait, no processes were in D-state. The wedge is entirely below nvidia_drm.

What I have ruled out

  • Hardware fault on the GPU. This 3090 had been stable for many months on the previous linux 6.19.11 + nvidia-open 595.58.03 stack with the same workload. After the hardware power-cycle, the system came up cleanly on the same 7.0.3 + 595.71.05 stack and has so far been stable.

  • Host OOM. 7.6 GiB / 61 GiB host RAM in use at fault time. No swap pressure. No oom_reaper activity in the journal. The OOM was GPU-VA, not host RAM.

  • Userspace-only fault. The kernel core's resource sanity check was emitted from inside nvidia_drm's __nv_drm_gem_nvkms_map. The subsequent Xid 31 MMU fault is a consequence of the bad mapping being used. The Brave SIGILL came after the kernel error and looks like a downstream consequence of the GPU buffer the renderer expected being inaccessible.

  • DKMS build mismatch / firmware mismatch. DKMS built nvidia-open 595.71.05 cleanly for both kernels at upgrade time; modules load cleanly; firmware version matches the driver expectations (linux-firmware-nvidia 20260410-1).

I cannot rule out — and want to be careful not to overclaim — which component regressed. The kernel and the driver were both upgraded in the same transaction, so this could be a bug in nvidia-open's PCI BAR-range arithmetic, a kernel-side change to the resource validation that nvidia-open is the first to trip, or a problem in the combination (e.g. a new pci_resource_* semantic on 7.0.x that nvidia-open hasn't adopted yet). I have not yet had the opportunity to bisect.
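
For context on that last possibility, the kind of clamp that keeps a single
mapping inside one BAR, written against the stock pci_resource_* helpers, is
sketched below. The function and parameter names are placeholders of mine, not
code from the nvidia-open tree; it is only meant to show what a bounds check
on such a path would look like.

    #include <linux/pci.h>
    #include <linux/io.h>

    /* Illustrative only; names are placeholders, not from the nvidia-open
     * source. Refuse to ioremap() a window that does not fit entirely
     * inside the chosen BAR. */
    static void __iomem *map_within_bar(struct pci_dev *pdev, int bar,
                                        resource_size_t offset,
                                        resource_size_t size)
    {
        resource_size_t bar_len = pci_resource_len(pdev, bar);

        if (offset > bar_len || size > bar_len - offset)
            return NULL;    /* would span past the BAR, as in the log line */

        return ioremap(pci_resource_start(pdev, bar) + offset, size);
    }

Whether the actual __nv_drm_gem_nvkms_map path performs an equivalent check on
595.71.05 is exactly the kind of thing a bisect should tell us.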

Open questions

If you (or anyone reading) have seen this signature before, I'd value pointers on any of:

  1. Does this reproduce on nvidia-dkms (proprietary kernel module) at 595.71.05, holding kernel 7.0.3 fixed?

  2. Does this reproduce on kernel 6.19.11 with nvidia-open 595.71.05?

  3. Does disabling Chromium-side GPU acceleration (e.g. --disable-gpu-rasterization, --disable-gpu) prevent it on 7.0.3 + 595.71.05?

  4. Does the resource sanity check line precede every freeze of this form, or are there freezes without it? (I have only this one occurrence.)
