Skip to content

Commit 107ba1a

Browse files
ickleAndi Shyti
authored andcommitted
drm/i915/gt: Restrict forced preemption to the active context
When we submit a new pair of contexts to ELSP for execution, we start a timer by which point we expect the HW to have switched execution to the pending contexts. If the promotion to the new pair of contexts has not occurred, we declare the executing context to have hung and force the preemption to take place by resetting the engine and resubmitting the new contexts. This can lead to an unfair situation where almost all of the preemption timeout is consumed by the first context which just switches into the second context immediately prior to the timer firing and triggering the preemption reset (assuming that the timer interrupts before we process the CS events for the context switch). The second context hasn't yet had a chance to yield to the incoming ELSP (and send the ACk for the promotion) and so ends up being blamed for the reset. If we see that a context switch has occurred since setting the preemption timeout, but have not yet received the ACK for the ELSP promotion, rearm the preemption timer and check again. This is especially significant if the first context was not schedulable and so we used the shortest timer possible, greatly increasing the chance of accidentally blaming the second innocent context. Fixes: 3a7a92a ("drm/i915/execlists: Force preemption") Fixes: d12acee ("drm/i915/execlists: Cancel banned contexts on schedule-out") Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Tested-by: Andrzej Hajda <andrzej.hajda@intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220921135258.1714873-1-andrzej.hajda@intel.com
1 parent d09aa85 commit 107ba1a

File tree

2 files changed

+35
-1
lines changed

2 files changed

+35
-1
lines changed

drivers/gpu/drm/i915/gt/intel_engine_types.h

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -165,6 +165,21 @@ struct intel_engine_execlists {
165165
*/
166166
struct timer_list preempt;
167167

168+
/**
169+
* @preempt_target: active request at the time of the preemption request
170+
*
171+
* We force a preemption to occur if the pending contexts have not
172+
* been promoted to active upon receipt of the CS ack event within
173+
* the timeout. This timeout maybe chosen based on the target,
174+
* using a very short timeout if the context is no longer schedulable.
175+
* That short timeout may not be applicable to other contexts, so
176+
* if a context switch should happen within before the preemption
177+
* timeout, we may shoot early at an innocent context. To prevent this,
178+
* we record which context was active at the time of the preemption
179+
* request and only reset that context upon the timeout.
180+
*/
181+
const struct i915_request *preempt_target;
182+
168183
/**
169184
* @ccid: identifier for contexts submitted to this engine
170185
*/

drivers/gpu/drm/i915/gt/intel_execlists_submission.c

Lines changed: 20 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1241,6 +1241,9 @@ static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
12411241
if (!rq)
12421242
return 0;
12431243

1244+
/* Only allow ourselves to force reset the currently active context */
1245+
engine->execlists.preempt_target = rq;
1246+
12441247
/* Force a fast reset for terminated contexts (ignoring sysfs!) */
12451248
if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
12461249
return INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS;
@@ -2427,8 +2430,24 @@ static void execlists_submission_tasklet(struct tasklet_struct *t)
24272430
GEM_BUG_ON(inactive - post > ARRAY_SIZE(post));
24282431

24292432
if (unlikely(preempt_timeout(engine))) {
2433+
const struct i915_request *rq = *engine->execlists.active;
2434+
2435+
/*
2436+
* If after the preempt-timeout expired, we are still on the
2437+
* same active request/context as before we initiated the
2438+
* preemption, reset the engine.
2439+
*
2440+
* However, if we have processed a CS event to switch contexts,
2441+
* but not yet processed the CS event for the pending
2442+
* preemption, reset the timer allowing the new context to
2443+
* gracefully exit.
2444+
*/
24302445
cancel_timer(&engine->execlists.preempt);
2431-
engine->execlists.error_interrupt |= ERROR_PREEMPT;
2446+
if (rq == engine->execlists.preempt_target)
2447+
engine->execlists.error_interrupt |= ERROR_PREEMPT;
2448+
else
2449+
set_timer_ms(&engine->execlists.preempt,
2450+
active_preempt_timeout(engine, rq));
24322451
}
24332452

24342453
if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {

0 commit comments

Comments
 (0)