sched/rt:fix the missing of rt_rq runtime check in rt-period timer
On a system with isolated cpus, the rq->rd->span of a cpu is split into two
different parts: one for the isolated cpus, another for the non-isolated cpus.

When CONFIG_RT_GROUP_SCHED is enabled, the sched_rt_period_timer handler
updates rt_time and rt_runtime for every cpu in rq(this_cpu)->rd->span.

This means that cpus outside this_cpu's rd->span are never visited by the
sched_rt_period_timer handler when CONFIG_RT_GROUP_SCHED is enabled and
isolated cpus are present in the system.
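
The coverage gap can be illustrated with a small userspace model. This is only a sketch, not kernel code: the 8-bit mask type and the helpers rd_span_of() and missed_cpus() are hypothetical names introduced here, and the model assumes that all isolated cpus share one span and all non-isolated cpus share the other.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: one bit per cpu, not the kernel's struct cpumask. */
typedef uint8_t cpumask8_t;

/*
 * Span seen by `cpu` under the assumption that the isolated cpus form one
 * part and the remaining online cpus form the other part.
 */
cpumask8_t rd_span_of(int cpu, cpumask8_t online, cpumask8_t isolated)
{
	assert(cpu >= 0 && cpu < 8);
	cpumask8_t bit = (cpumask8_t)(1u << cpu);

	if (isolated & bit)
		return (cpumask8_t)(online & isolated);
	return (cpumask8_t)(online & (cpumask8_t)~isolated);
}

/*
 * Online cpus that the period timer would skip when it fires on `timer_cpu`.
 * use_online_mask == 0 models walking only the local span;
 * use_online_mask == 1 models walking every online cpu.
 */
cpumask8_t missed_cpus(int timer_cpu, cpumask8_t online,
		       cpumask8_t isolated, int use_online_mask)
{
	cpumask8_t span = use_online_mask
		? online
		: rd_span_of(timer_cpu, online, isolated);

	return (cpumask8_t)(online & (cpumask8_t)~span);
}
```

With online = 0xff and isolated = 0xf0 (the "isolcpus=4-7" case), a timer firing on cpu 0 and walking only its local span skips cpus 4-7, while walking all online cpus skips none.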

E.g. the problem can be triggered as follows on my 8-core machine:
1 enable CONFIG_RT_GROUP_SCHED=y and boot the kernel with the command-line
  option "isolcpus=4-7"
2 create a child group and initialize it:
  mount -t cgroup -o cpu cpu /sys/fs/cgroup
  mkdir /sys/fs/cgroup/child0
  echo 950000 > /sys/fs/cgroup/child0/cpu.rt_runtime_us
3 run two rt-loop tasks; assume their pids are $pid1 and $pid2
4 bind one of the rt tasks to the isolated cpus:
  taskset -p 0xf0 $pid2
5 add the tasks created above to the child cpu group:
  echo $pid1 > /sys/fs/cgroup/child0/tasks
  echo $pid2 > /sys/fs/cgroup/child0/tasks
6 check what happened:
  "top": one of the tasks gets no cpu usage, although its state is "R"
  "kill": the task on the affected rt_rq can't be killed
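
The commit message does not show the rt-loop task used in step 3. One possible shape for it is sketched below, assuming a SCHED_FIFO busy loop; become_rt() and spin_for_ms() are hypothetical helper names, and the priority value is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <time.h>

/*
 * Switch the calling process to SCHED_FIFO at priority `prio`.
 * Returns 0 on success and -1 on failure (typically when the caller
 * lacks the privilege to use a real-time policy).
 */
int become_rt(int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* Busy-loop for roughly `ms` milliseconds; returns the iteration count. */
unsigned long spin_for_ms(long ms)
{
	struct timespec start, now;
	unsigned long iters = 0;
	long elapsed_ms;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		iters++;
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
			     (now.tv_nsec - start.tv_nsec) / 1000000L;
	} while (elapsed_ms < ms);

	return iters;
}
```

A reproducer would call become_rt(1) once and then call spin_for_ms(1000) in an endless loop from main(), run as root so that the policy change succeeds.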

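This patch fixes the problem by making the timer handler walk
cpu_online_mask, instead of only the local rd->span, whenever
CONFIG_RT_GROUP_SCHED is enabled.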

Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
Hailong Liu authored and intel-lab-lkp committed Dec 5, 2020
1 parent 21bf7cb commit 65b7864
Showing 1 changed file with 3 additions and 12 deletions.
15 changes: 3 additions & 12 deletions kernel/sched/rt.c
@@ -856,19 +856,10 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 	int i, idle = 1, throttled = 0;
 	const struct cpumask *span;
 
-	span = sched_rt_period_mask();
 #ifdef CONFIG_RT_GROUP_SCHED
-	/*
-	 * FIXME: isolated CPUs should really leave the root task group,
-	 * whether they are isolcpus or were isolated via cpusets, lest
-	 * the timer run on a CPU which does not service all runqueues,
-	 * potentially leaving other CPUs indefinitely throttled. If
-	 * isolation is really required, the user will turn the throttle
-	 * off to kill the perturbations it causes anyway. Meanwhile,
-	 * this maintains functionality for boot and/or troubleshooting.
-	 */
-	if (rt_b == &root_task_group.rt_bandwidth)
-		span = cpu_online_mask;
+	span = cpu_online_mask;
+#else
+	span = sched_rt_period_mask();
 #endif
 	for_each_cpu(i, span) {
 		int enqueue = 0;