Mike-Galbraith…
Commits on Jul 18, 2021
-
v2 mm/slub: restore/expand unfreeze_partials() local exclusion scope
On Thu, 2021-07-15 at 18:34 +0200, Mike Galbraith wrote:
> Greetings crickets,
>
> Methinks the problem is the hole these patches opened only for RT.
>
> static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> {
> #ifdef CONFIG_SLUB_CPU_PARTIAL
> 	struct page *oldpage;
> 	int pages;
> 	int pobjects;
>
> 	slub_get_cpu_ptr(s->cpu_slab);
> 	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Bah, I'm tired of waiting to see what, if anything, mm folks do about this little bugger, so I'm gonna step on it my damn self and be done with it. Fly or die, little patchlet.

mm/slub: restore/expand unfreeze_partials() local exclusion scope

2180da7 ("mm, slub: use migrate_disable() on PREEMPT_RT") replaced preempt_disable() in put_cpu_partial() with migrate_disable(), which, combined with ___slab_alloc() having become preemptible, leads to kmem_cache_free()/kfree() blowing through ___slab_alloc() unimpeded, and vice versa, resulting in PREEMPT_RT-exclusive explosions in both paths while stress testing with both SLUB_CPU_PARTIAL and MEMCG enabled: in ___slab_alloc() during allocation (duh), and in __unfreeze_partials() during free, both while accessing an unmapped page->freelist.

Serialize put_cpu_partial()/unfreeze_partials() on cpu_slab->lock to ensure that the alloc/free paths cannot pluck cpu_slab->partial out from underneath each other unconstrained.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Fixes: 2180da7 ("mm, slub: use migrate_disable() on PREEMPT_RT")
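The exclusion the patch restores can be sketched in plain userspace C. This is illustrative only, not the kernel code: `lock_cpu_slab()` stands in for taking `cpu_slab->lock`, and `struct page` here is a bare list node. The point is that both the put path and the unfreeze path serialize on the same lock, so neither can observe a half-detached partial list.

```c
/* Userspace sketch: put_cpu_partial() and unfreeze_partials() serialize
 * on one lock (cpu_slab->lock in the patch; modeled as list_lock here)
 * so neither path can pluck the partial list out from under the other.
 * All names are illustrative stand-ins, not kernel API. */
#include <assert.h>
#include <stddef.h>

struct page { struct page *next; };

static struct page *partial;   /* stand-in for cpu_slab->partial */
static int list_lock;          /* stand-in for cpu_slab->lock */

static void lock_cpu_slab(void)   { assert(!list_lock); list_lock = 1; }
static void unlock_cpu_slab(void) { assert(list_lock);  list_lock = 0; }

static void put_cpu_partial(struct page *p)
{
    lock_cpu_slab();           /* the exclusion the patch restores */
    p->next = partial;
    partial = p;
    unlock_cpu_slab();
}

static int unfreeze_partials(void)
{
    struct page *p;
    int n = 0;

    lock_cpu_slab();
    p = partial;
    partial = NULL;            /* detach the whole list while excluded */
    unlock_cpu_slab();

    for (; p; p = p->next)     /* list is now private: safe to walk */
        n++;
    return n;
}
```

Detaching the whole list under the lock and walking it afterwards mirrors the shape of the real code, where the expensive work happens outside the locked region.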
Commits on Jul 13, 2021
-
entry: Fix the preempt lazy fallout
Common code needs common defines.... Fixes: f2f9e49 ("x86: Support for lazy preemption") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 13, 2021
Commits on Jul 12, 2021
-
rtmutex: Use cmpxchg_release() on unlock
Fixes: e03cbdc ("locking/rtmutex: Provide lockdep less variants of rtmutex interfaces") Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 12, 2021
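The release ordering the commit title refers to can be sketched with C11 atomics. This is a hedged stand-in, not the rtmutex API: `owner` and `fast_unlock()` are illustrative names. What it shows is the property `cmpxchg_release()` provides — the success ordering is `memory_order_release`, so every store made inside the critical section is published before the lock word is seen as free.

```c
/* Sketch of a release-ordered fast-path unlock with C11 atomics.
 * Illustrative names; not the kernel's rtmutex implementation. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic intptr_t owner;   /* 0 = unlocked, else owning task id */

static bool fast_unlock(intptr_t self)
{
    intptr_t expected = self;

    /* Succeed only if we are still the uncontended owner. */
    return atomic_compare_exchange_strong_explicit(
        &owner, &expected, (intptr_t)0,
        memory_order_release,   /* success: publish the critical section */
        memory_order_relaxed);  /* failure: contended, take slow path */
}
```

Using relaxed ordering on failure is deliberate: a failed unlock attempt falls through to a slow path that does its own synchronization.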
Commits on Jul 10, 2021
-
mm/slub: Replace local_lock_irqsave/restore() calls in PREEMPT_RT scope
IRQ affecting local lock operations are intended for dual RT/!RT scope and only affect IRQs in !RT scope. For RT the irqsave/restore() is a NOOP. Replace the local_lock_irqsave/restore() operations in do_slab_free() with plain local_lock()/local_unlock(). Purely cosmetic, no functional change. Fixes: 340e7c4 ("mm, slub: convert kmem_cpu_slab protection to local_lock") Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/b9b2dbb062bd25d9f6c0b918c3c834bc6964c2d6.camel@gmx.de
Mike Galbraith authored and Thomas Gleixner committed Jul 10, 2021 -
mm/slub: Fix kmem_cache_alloc_bulk() error path
The kmem_cache_alloc_bulk() error exit path double unlocks cpu_slab->lock instead of making the required slub_put_cpu_ptr() call. Fix that.

Boring details:

1. 12c69ba ("mm, slub: move disabling/enabling irqs to ___slab_alloc()") adds local_irq_enable() above the goto error, leaving the one at error: intact.

2. 2180da7 ("mm, slub: use migrate_disable() on PREEMPT_RT") adds slub_get/put_cpu_ptr() calls, missing the already broken error path, creating unpaired slub_get_cpu_ptr()/slub_put_cpu_ptr() calls.

3. 340e7c4 ("mm, slub: convert kmem_cpu_slab protection to local_lock") converts local_irq_enable() to local_unlock_irq(), culminating in a double unlock and unpaired slub_get_cpu_ptr()/slub_put_cpu_ptr().

Fixes: 12c69ba ("mm, slub: move disabling/enabling irqs to ___slab_alloc()")
Fixes: 2180da7 ("mm, slub: use migrate_disable() on PREEMPT_RT")
Fixes: 340e7c4 ("mm, slub: convert kmem_cpu_slab protection to local_lock")
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/966f27f7999acb1db8d60e241a73dfde3344345c.camel@gmx.de
Mike Galbraith authored and Thomas Gleixner committed Jul 10, 2021 -
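The pairing rule the fix restores can be sketched with userspace stand-ins. `lock_depth` models the local_lock nesting and `cpu_refs` models the slub_get/put_cpu_ptr() pairing; this is not the actual SLUB code, only the invariant it must uphold: every exit from the function, success or error, performs exactly one unlock and one put.

```c
/* Sketch of a goto-error path that keeps lock/ref operations paired.
 * lock_depth and cpu_refs are illustrative stand-ins for the kernel's
 * local_lock_irq() nesting and slub_get/put_cpu_ptr() refcounting. */
#include <stdbool.h>

static int lock_depth;   /* stand-in for local_lock_irq() nesting */
static int cpu_refs;     /* stand-in for slub_get/put_cpu_ptr() */

static bool alloc_bulk(int n, bool fail_at_last)
{
    cpu_refs++;                       /* slub_get_cpu_ptr() */
    lock_depth++;                     /* local_lock_irq() */

    for (int i = 0; i < n; i++) {
        if (fail_at_last && i == n - 1)
            goto error;               /* allocation failed mid-loop */
    }

    lock_depth--;                     /* local_unlock_irq() */
    cpu_refs--;                       /* slub_put_cpu_ptr() */
    return true;

error:
    lock_depth--;                     /* exactly one unlock ... */
    cpu_refs--;                       /* ... and the matching put */
    return false;
}
```

The bug described above is what happens when the unwind at `error:` drifts out of sync with the entry sequence across successive refactorings: one path ends up unlocking twice and never dropping its reference.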
This depends on CONFIG_PROVE_RCU _and_ CONFIG_TASKS_RCU_GENERIC and requires a prototype if selected. Move it into the proper header file and fix the config dependencies. Reported-by: kernel test robot <lkp@intel.com> Fixes: 8abf1b2 ("rcu: Delay RCU-selftests") Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 10, 2021 -
locking/rt: Fix kernel doc for rwsem_rt_mutex_try_lock()
Drop the _no_lockdep postfix in the function documentation. Fixes: e03cbdc ("locking/rtmutex: Provide lockdep less variants of rtmutex interfaces") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 10, 2021
Commits on Jul 7, 2021
-
Add localversion for -RT release
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 7, 2021 -
Allow RT to be selected. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
powerpc: Avoid recursive header includes
- The include of bug.h leads to an include of printk.h which gets back to spinlock.h and complains then about missing xchg(). Remove bug.h and add bits.h which is needed for BITS_PER_BYTE. - Avoid the "please don't include this file directly" error from rwlock-rt. Allow an include from/with rtmutex.h. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
powerpc/stackprotector: work around stack-guard init from atomic
This is invoked from the secondary CPU in atomic context. On x86 we use the tsc instead. On Power we XOR it against mftb(), so let's use the stack address as the initial value. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
While converting the openpic emulation code to use a raw_spinlock_t enables guests to run on RT, there's still a performance issue. For interrupts sent in directed delivery mode with a multiple-CPU mask, the emulated openpic will loop through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop through all the pending interrupts for that VCPU. This is done while holding the raw_lock, meaning that in all this time interrupts and preemption are disabled on the host. A malicious user app can max out both of these numbers and cause a DoS. This temporary fix is sent for two reasons. First, so that users who want to use the in-kernel MPIC emulation are aware of the potential latencies, thus making sure that their usage scenario does not involve interrupts sent in directed delivery mode and that the number of possible pending interrupts is kept small. Secondly, this should incentivize the development of a proper openpic emulation that would be better suited for RT. Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Bogdan Purcareata authored and Thomas Gleixner committed Jul 7, 2021 -
powerpc/pseries/iommu: Use a locallock instead of local_irq_save()
The locallock protects the per-CPU variable tce_page. The function attempts to allocate memory while tce_page is protected (by disabling interrupts). Use local_irq_save() instead of local_irq_disable(). Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
powerpc: traps: Use PREEMPT_RT
Add PREEMPT_RT to the backtrace if enabled. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
Allow RT to be selected. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
Allow RT to be selected. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt disabled section which is not working on -RT. Delay freeing of memory until preemption is enabled again. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating the vgic and timer states to prevent the calling task from migrating to another CPU. It does so to prevent the task from writing to the incorrect per-CPU GIC distributor registers. On -rt kernels, it's possible to maintain the same guarantee with the use of migrate_{disable,enable}(), with the added benefit that the migrate-disabled region is preemptible. Update kvm_arch_vcpu_ioctl_run() to do so. Cc: Christoffer Dall <christoffer.dall@linaro.org> Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com> Signed-off-by: Josh Cartwright <joshc@ni.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Josh Cartwright authored and Thomas Gleixner committed Jul 7, 2021 -
ARM: enable irq in translation/section permission fault handlers
Probably happens on all ARM, with CONFIG_PREEMPT_RT and CONFIG_DEBUG_ATOMIC_SLEEP. This simple program:

int main() { *((char*)0xc0001000) = 0; };

[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)

0xc0000000 belongs to the kernel address space; a user task cannot be allowed to access it. For the above condition, the correct result is that the test case receives a segmentation fault and exits, rather than producing the splat above. The root cause is commit 02fe284 ("avoid enabling interrupts in prefetch/data abort handlers"): it deletes the irq enable block in the data abort assembly code and moves it into the page/breakpoint/alignment fault handlers instead, but does not enable irqs in the translation/section permission fault handlers. ARM disables irqs when it enters exception/interrupt mode; if the kernel doesn't enable them, they remain disabled during translation/section permission faults.
We see the above splat because do_force_sig_info() is still called with IRQs off, and that code eventually does:

spin_lock_irqsave(&t->sighand->siglock, flags);

As this is architecture-independent code, and we've not seen any need for other arches to have the siglock converted to a raw lock, we can conclude that we should enable irqs for the ARM translation/section permission exception. Signed-off-by: Yadi.hu <yadi.hu@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Yadi.hu authored and Thomas Gleixner committed Jul 7, 2021 -
arch/arm64: Add lazy preempt support
arm64 is missing support for PREEMPT_RT. The main feature which is lacking is support for lazy preemption. The arch-specific entry code, thread information structure definitions, and associated data tables have to be extended to provide this support. Then the Kconfig file has to be extended to indicate the support is available, and also to indicate that support for full RT preemption is now available. Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
powerpc: Add support for lazy preemption
Implement the powerpc pieces for lazy preempt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 7, 2021 -
arm: Add support for lazy preemption
Implement the arm pieces for lazy preempt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 7, 2021 -
x86: Support for lazy preemption
Implement the x86 pieces for lazy preempt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 7, 2021 -
x86/entry: Use should_resched() in idtentry_exit_cond_resched()
The TIF_NEED_RESCHED bit is inlined on x86 into the preemption counter. By using should_resched(0) instead of need_resched(), the same check can be performed using only the preempt_count() variable that was already loaded. Use should_resched(0) instead of need_resched(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
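The folded-counter trick can be sketched as follows. This is a simplified model of x86's scheme, where PREEMPT_NEED_RESCHED is an inverted bit inside the preemption counter (bit set means no reschedule pending), so a single comparison of the counter against `offset` answers both "is the preempt depth at offset?" and "is a reschedule pending?" at once. The constant and helpers here are illustrative, not the kernel's exact definitions.

```c
/* Simplified model of x86's folded preemption counter: the
 * need-resched state is an inverted high bit of the count, so
 * should_resched(0) is one load + one compare. Illustrative only. */
#include <stdbool.h>
#include <stdint.h>

#define PREEMPT_NEED_RESCHED 0x80000000u  /* inverted: clear = resched */

/* Depth 0, no reschedule pending. */
static uint32_t preempt_cnt = PREEMPT_NEED_RESCHED;

static void set_need_resched(void)   { preempt_cnt &= ~PREEMPT_NEED_RESCHED; }
static void clear_need_resched(void) { preempt_cnt |=  PREEMPT_NEED_RESCHED; }

/* True only when the count equals 'offset' AND the (inverted)
 * need-resched bit is clear - both facts from one comparison. */
static bool should_resched(uint32_t offset)
{
    return preempt_cnt == offset;
}
```

With the bit inverted, "count == 0" can only hold when a reschedule is pending, which is exactly why the single-variable check replaces the separate need_resched() test.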
sched: Add support for lazy preemption
It has become an obsession to mitigate the determinism vs. throughput loss of RT. Looking at the mainline semantics of preemption points gives a hint why RT sucks throughput-wise for ordinary SCHED_OTHER tasks. One major issue is the wakeup of tasks which right away preempt the waking task while the waking task holds a lock on which the woken task will block right after having preempted the wakee. In mainline this is prevented due to the implicit preemption disable of spin/rw_lock held regions. On RT this is not possible due to the fully preemptible nature of sleeping spinlocks. Though for a SCHED_OTHER task preempting another SCHED_OTHER task this is really not a correctness issue. RT folks are concerned about SCHED_FIFO/RR task preemption and not about the purely fairness driven SCHED_OTHER preemption latencies. So I introduced a lazy preemption mechanism which only applies to SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the existing preempt_count, each task now sports a preempt_lazy_count which is manipulated on lock acquiry and release. This is slightly incorrect, as for laziness reasons I coupled this to migrate_disable/enable, so some other mechanisms get the same treatment (e.g. get_cpu_light). Now on the scheduler side, instead of setting NEED_RESCHED this sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and therefore allows the waking task to exit the lock-held region before the woken task preempts. That also works better for cross-CPU wakeups as the other side can stay in the adaptive spinning loop. For RT class preemption there is no change. This simply sets NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but history shows that I've been proven wrong before :) The lazy preemption mode is on by default, but with CONFIG_SCHED_DEBUG enabled it can be disabled via:

# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features

and re-enabled via:

# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features

The test results so far are very machine and workload dependent, but there is a clear trend that it enhances the non-RT workload performance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 7, 2021 -
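The decision logic described above can be sketched like this. All names are illustrative (the real implementation sets TIF flags on the target task rather than returning a value): an RT-class wakeup always requests immediate preemption, while a SCHED_OTHER-on-SCHED_OTHER preemption is deferred to the lazy flag whenever the wakee is inside a lazy-counted (lock-held/migrate-disabled) region.

```c
/* Sketch of the lazy-preemption decision: defer SCHED_OTHER vs.
 * SCHED_OTHER preemption while a lazy-counted region is held; leave
 * RT-class preemption unchanged. Illustrative names only. */
#include <stdbool.h>

enum resched { RESCHED_NONE, NEED_RESCHED_LAZY, NEED_RESCHED };

/* Bumped on lock acquiry / migrate_disable(), dropped on release. */
static int preempt_lazy_count;

static enum resched resched_flag(bool rt_class)
{
    if (rt_class)
        return NEED_RESCHED;        /* SCHED_FIFO/RR: unchanged */
    if (preempt_lazy_count)
        return NEED_RESCHED_LAZY;   /* defer until the region is exited */
    return NEED_RESCHED;            /* no region held: preempt normally */
}
```

The lazy flag is what lets the waking task finish its lock-held region before the woken task preempts it, avoiding the block-right-after-preempting pattern the changelog describes.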
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
Allow RT to be selected. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
x86: kvm Require const tsc for RT
Non constant TSC is a nightmare on bare metal already, but with virtualization it becomes a complete disaster because the workarounds are horrible latency wise. That's also a preliminary for running RT in a guest on top of a RT host. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 7, 2021 -
signal/x86: Delay calling signals in atomic
On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per-CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function takes a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issues with the corrupted stack are possible. Instead of calling the signal right away, for PREEMPT_RT and x86_64, the signal information is stored in the task's task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
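The deferral pattern can be sketched in plain C. The `struct task` and helpers here are illustrative stand-ins, not the kernel's: in atomic context the signal is only stashed and a flag is raised; the exit-to-usermode path, where sleeping is allowed, checks the flag and delivers.

```c
/* Sketch of deferring un-sleepable work from atomic context to the
 * exit path (TIF_NOTIFY_RESUME in the kernel). Illustrative names. */
#include <stdbool.h>

struct task {
    bool notify_resume;   /* stand-in for TIF_NOTIFY_RESUME */
    int  forced_sig;      /* stashed signal number */
    int  delivered;       /* what the exit path actually sent */
};

/* Called from atomic context: no locks taken, nothing that can sleep. */
static void force_sig_delayed(struct task *t, int sig)
{
    t->forced_sig = sig;       /* store the info on the task ... */
    t->notify_resume = true;   /* ... and flag the exit path */
}

/* Called on return to user mode, with preemption enabled again. */
static void exit_to_user_mode(struct task *t)
{
    if (t->notify_resume) {    /* safe to take sleeping locks here */
        t->delivered = t->forced_sig;
        t->notify_resume = false;
    }
}
```

The key property is that the atomic-context half touches only the task's own fields, so the sleeping siglock is only ever taken from the preemptible exit path.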
-
sysfs: Add /sys/kernel/realtime entry
Add a /sys/kernel entry to indicate that the kernel is a realtime kernel. Clark says that he needs this for udev rules; udev needs to evaluate whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname output is too slow or so. Are there better solutions? Should it exist and return 0 on !-rt? Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
tpm_tis: fix stall after iowrite*()s
ioread8() operations to TPM MMIO addresses can stall the CPU when immediately following a sequence of iowrite*()'s to the same region. For example, cyclictest measures ~400us latency spikes when a non-RT usermode application communicates with an SPI-based TPM chip (Intel Atom E3940 system, PREEMPT_RT kernel). The spikes are caused by a stalling ioread8() operation following a sequence of 30+ iowrite8()s to the same address. I believe this happens because the write sequence is buffered (in the CPU or somewhere along the bus), and gets flushed on the first LOAD instruction (ioread*()) that follows. The enclosed change appears to fix this issue: read the TPM chip's access register (status code) after every iowrite*() operation to amortize the cost of flushing data to the chip across multiple instructions. Signed-off-by: Haris Okanovic <haris.okanovic@ni.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
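The read-back-after-write pattern can be sketched like this. Plain memory stands in for the MMIO window and the names are illustrative, not the tpm_tis driver API; on real hardware the write is posted and the immediate read of a harmless register forces it out, so the flush cost is paid in small per-write increments instead of one large stall at the next real read.

```c
/* Sketch of amortizing posted-write flushes: follow every write with a
 * cheap read of a harmless register. Plain memory models the MMIO
 * window; names are illustrative, not the tpm_tis driver. */
#include <stdint.h>

#define TPM_ACCESS_REG 0   /* harmless register used only for read-back */

static uint8_t regs[16];   /* stand-in for the TPM MMIO window */

static uint8_t tpm_read(int off)
{
    return regs[off];      /* ioread8() - the point where posted
                            * writes get flushed on real hardware */
}

static void tpm_write_flushed(int off, uint8_t v)
{
    regs[off] = v;                   /* iowrite8() - posted */
    (void)tpm_read(TPM_ACCESS_REG);  /* read back: force the flush now */
}
```

In plain memory the read-back is of course a no-op; the sketch only shows where the flush point moves, from the next functional ioread*() to immediately after each write.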
-
tty/serial/pl011: Make the locking work on RT
The lock is a sleeping lock and local_irq_save() is not the optimisation we are looking for. Redo it to make it work on -RT and non-RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 7, 2021 -
tty/serial/omap: Make the locking RT aware
The lock is a sleeping lock and local_irq_save() is not the optimisation we are looking for. Redo it to make it work on -RT and non-RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner committed Jul 7, 2021 -
drm/i915/gt: Only disable interrupts for the timeline lock on !force-threaded
According to commit d677392 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe"), the interrupts are disabled because the code may be called from an interrupt handler and from preemptible context. With `force_irqthreads' set the timeline mutex is never observed in IRQ context, so it is not needed to disable interrupts. Disable interrupts only if not in `force_irqthreads' mode. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021 -
drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
The order of the header files is important. If this header file is included after tracepoint.h was included then the NOTRACE here becomes a nop. Currently this happens for two .c files which use the tracepoints behind DRM_I915_LOW_LEVEL_TRACEPOINTS. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sebastian Andrzej Siewior authored and Thomas Gleixner committed Jul 7, 2021