Skip to content

Commit

Permalink
sched: Have do_idle() call __schedule() without enabling preemption
Browse files Browse the repository at this point in the history
I finally got around to creating trampolines for dynamically allocated
ftrace_ops with using synchronize_rcu_tasks(). For users of the ftrace
function hook callbacks, like perf, that allocate the ftrace_ops
descriptor via kmalloc() and friends, ftrace was not able to optimize
the functions being traced to use a trampoline because they would also
need to be allocated dynamically. The problem is that they cannot be
freed when CONFIG_PREEMPT is set, as there's no way to tell if a task
was preempted on the trampoline. That was before Paul McKenney
implemented synchronize_rcu_tasks() that would make sure all tasks
(except idle) have scheduled out or have entered user space.

While testing this, I triggered this bug:

 BUG: unable to handle kernel paging request at ffffffffa0230077
 IP: 0xffffffffa0230077
 PGD 2414067
 PUD 2415063
 PMD c463c067
 PTE 0

 Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
 Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core x86_pkg_temp_thermal kvm_intel i915 snd_seq kvm snd_seq_device snd_pcm i2c_algo_bit snd_timer drm_kms_helper irqbypass syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_i801 snd soundcore wmi i2c_core video e1000e ptp pps_core
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc3-test+ torvalds#356
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff8800cebbd0c0 task.stack: ffff8800cebd8000
 RIP: 0010:0xffffffffa0230077
 RSP: 0018:ffff8800cebdfd80 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff8800cebbd0c0 RCX: ffffffff8121391e
 RDX: dffffc0000000000 RSI: ffffffff81ce7c78 RDI: ffff8800cebbf298
 RBP: ffff8800cebdfe28 R08: 0000000000000003 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800cebbd0c0
 R13: ffffffff828fbe90 R14: ffff8800d392c480 R15: ffffffff826cc380
 FS:  0000000000000000(0000) GS:ffff8800d3900000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa0230077 CR3: 0000000002413000 CR4: 00000000001406e0
 Call Trace:
  ? debug_smp_processor_id+0x17/0x20
  ? sched_ttwu_pending+0x79/0x190
  ? schedule+0x5/0xe0
  ? trace_hardirqs_on_caller+0x182/0x280
  schedule+0x5/0xe0
  schedule_preempt_disabled+0x18/0x30
  ? schedule+0x5/0xe0
  ? schedule_preempt_disabled+0x18/0x30
  do_idle+0x172/0x220

What happened was that the idle task was preempted on the trampoline.
As synchronize_rcu_tasks() ignores the idle thread, there's nothing
that lets ftrace know that the idle task was preempted on a trampoline.

The idle task shouldn't need to ever enable preemption. The idle task
is simply a loop that calls schedule or places the cpu into idle mode.
In fact, having preemption enabled is inefficient, because it can
happen when idle is just about to call schedule anyway, which would
cause schedule to be called twice. Once for when the interrupt came in
and was returning back to normal context, and then again in the normal
path that the idle loop is running in, which would be pointless, as it
had already scheduled.

Adding a new function local to kernel/sched/ that allows idle to call
the scheduler without enabling preemption, fixes the
synchronize_rcu_tasks) issue, as well as removes the pointless spurious
scheduled caused by interrupts happening in the brief window where
preemption is enabled just before it calls schedule.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  • Loading branch information
rostedt authored and 0day robot committed Apr 12, 2017
1 parent 717a94b commit 0dc2ae8
Show file tree
Hide file tree
Showing 3 changed files with 10 additions and 1 deletion.
7 changes: 7 additions & 0 deletions kernel/sched/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -3502,6 +3502,13 @@ asmlinkage __visible void __sched schedule(void)
}
EXPORT_SYMBOL(schedule);

void __sched schedule_idle(void)
{
do {
__schedule(false);
} while (need_resched());
}

#ifdef CONFIG_CONTEXT_TRACKING
asmlinkage __visible void __sched schedule_user(void)
{
Expand Down
2 changes: 1 addition & 1 deletion kernel/sched/idle.c
Original file line number Diff line number Diff line change
Expand Up @@ -264,7 +264,7 @@ static void do_idle(void)
smp_mb__after_atomic();

sched_ttwu_pending();
schedule_preempt_disabled();
schedule_idle();
}

bool cpu_in_idle(unsigned long pc)
Expand Down
2 changes: 2 additions & 0 deletions kernel/sched/sched.h
Original file line number Diff line number Diff line change
Expand Up @@ -1467,6 +1467,8 @@ static inline struct cpuidle_state *idle_get_state(struct rq *rq)
}
#endif

extern void schedule_idle(void);

extern void sysrq_sched_debug_show(void);
extern void sched_init_granularity(void);
extern void update_max_interval(void);
Expand Down

0 comments on commit 0dc2ae8

Please sign in to comment.