Commit 2f2cc53
drm/i915/guc: Close deregister-context race against CT-loss
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).
Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.
In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.
Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.
We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).
In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:
intel_wakeref_wait_for_idle(>->wakeref)
(called by) - intel_gt_pm_wait_for_idle
(called by) - wait_for_suspend
Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Tested-by: Mousumi Jana <mousumi.jana@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231229215143.581619-1-alan.previn.teres.alexis@intel.com1 parent 5e83c06 commit 2f2cc53
File tree
2 files changed
+78
-5
lines changed- drivers/gpu/drm/i915
- gem
- gt/uc
2 files changed
+78
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
31 | 38 | | |
32 | 39 | | |
33 | 40 | | |
| |||
160 | 167 | | |
161 | 168 | | |
162 | 169 | | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
163 | 173 | | |
164 | 174 | | |
165 | 175 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
236 | 236 | | |
237 | 237 | | |
238 | 238 | | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
239 | 246 | | |
240 | 247 | | |
241 | 248 | | |
| |||
613 | 620 | | |
614 | 621 | | |
615 | 622 | | |
| 623 | + | |
| 624 | + | |
616 | 625 | | |
617 | 626 | | |
618 | 627 | | |
| |||
623 | 632 | | |
624 | 633 | | |
625 | 634 | | |
626 | | - | |
| 635 | + | |
| 636 | + | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
627 | 640 | | |
628 | 641 | | |
629 | 642 | | |
| |||
3288 | 3301 | | |
3289 | 3302 | | |
3290 | 3303 | | |
3291 | | - | |
| 3304 | + | |
3292 | 3305 | | |
3293 | 3306 | | |
3294 | 3307 | | |
3295 | 3308 | | |
3296 | 3309 | | |
| 3310 | + | |
3297 | 3311 | | |
3298 | 3312 | | |
3299 | 3313 | | |
| |||
3304 | 3318 | | |
3305 | 3319 | | |
3306 | 3320 | | |
| 3321 | + | |
| 3322 | + | |
| 3323 | + | |
| 3324 | + | |
3307 | 3325 | | |
3308 | 3326 | | |
3309 | 3327 | | |
3310 | 3328 | | |
3311 | 3329 | | |
| 3330 | + | |
3312 | 3331 | | |
3313 | 3332 | | |
3314 | 3333 | | |
3315 | | - | |
| 3334 | + | |
3316 | 3335 | | |
3317 | 3336 | | |
3318 | | - | |
| 3337 | + | |
| 3338 | + | |
| 3339 | + | |
| 3340 | + | |
| 3341 | + | |
| 3342 | + | |
| 3343 | + | |
| 3344 | + | |
| 3345 | + | |
| 3346 | + | |
| 3347 | + | |
| 3348 | + | |
| 3349 | + | |
| 3350 | + | |
| 3351 | + | |
| 3352 | + | |
| 3353 | + | |
| 3354 | + | |
| 3355 | + | |
3319 | 3356 | | |
3320 | 3357 | | |
3321 | 3358 | | |
| |||
3383 | 3420 | | |
3384 | 3421 | | |
3385 | 3422 | | |
3386 | | - | |
| 3423 | + | |
| 3424 | + | |
| 3425 | + | |
| 3426 | + | |
| 3427 | + | |
| 3428 | + | |
| 3429 | + | |
| 3430 | + | |
| 3431 | + | |
| 3432 | + | |
| 3433 | + | |
| 3434 | + | |
| 3435 | + | |
| 3436 | + | |
| 3437 | + | |
| 3438 | + | |
3387 | 3439 | | |
3388 | 3440 | | |
3389 | 3441 | | |
| |||
3394 | 3446 | | |
3395 | 3447 | | |
3396 | 3448 | | |
| 3449 | + | |
| 3450 | + | |
| 3451 | + | |
| 3452 | + | |
| 3453 | + | |
| 3454 | + | |
| 3455 | + | |
| 3456 | + | |
| 3457 | + | |
| 3458 | + | |
| 3459 | + | |
3397 | 3460 | | |
3398 | 3461 | | |
3399 | 3462 | | |
| |||
0 commit comments