Peter-Oskolkov…

Commits on Feb 11, 2022

  1. RFC: sched: UMCG: episode IV: A New Hope

    A lot of effort has been put into making UMCG based on
    userspace TLS data work, and it gets ugly very fast because
    it is very hard to guarantee that the pages are present
    when needed; and they are needed in non-preemptible (sched)
    contexts. The last attempt here:
    https://lore.kernel.org/lkml/20220120155517.066795336@infradead.org/
    is a good example: a lot of mm-related work, a lot of
    extra stuff added to struct task_struct just to deal
    with kernel-to-userspace writes in sched contexts.
    
    Here I propose a different approach (actually, it was my first approach,
    before we pivoted to userspace TLS). Keep everything the kernel
    needs in a kernel-side struct umcg_task, and copy relevant
    data out to the userspace when the server's sys_umcg_wait() returns.
    
    Before I go too deep down into implementing and testing this,
    I'd like to get some feedback on whether this approach is acceptable.
    
    Please review.
    
    =====================
    
    User Managed Concurrency Groups is an M:N threading toolkit that allows
    constructing user space schedulers designed to efficiently manage
    heterogeneous in-process workloads while maintaining high CPU
    utilization (95%+).
    
    Add UMCG syscall stubs, Kconfig, as well as stubs for hooks into
    sched, execve, etc., as this boilerplate is more or less stable,
    compared to the various approaches attempted at implementing UMCG.
    
    Signed-off-by: Peter Oskolkov <posk@google.com>
    posk-io authored and intel-lab-lkp committed Feb 11, 2022

Commits on Feb 2, 2022

  1. sched: move autogroup sysctls into its own file

    Move autogroup sysctls to autogroup.c and use the new
    register_sysctl_init() to register the sysctl interface.
    
    Signed-off-by: Zhen Ni <nizhen@uniontech.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220128095025.8745-1-nizhen@uniontech.com
    nizhenth authored and Peter Zijlstra committed Feb 2, 2022
  2. selftests/rseq: x86-32: use %gs segment selector for accessing rseq thread area
    
    Rather than use rseq_get_abi() and pass its result through a register to
    the inline assembler, directly access the per-thread rseq area through a
    memory reference combining the %gs segment selector, the constant offset
    of the field in struct rseq, and the rseq_offset value (in a register).
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-16-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  3. selftests/rseq: x86-64: use %fs segment selector for accessing rseq thread area
    
    Rather than use rseq_get_abi() and pass its result through a register to
    the inline assembler, directly access the per-thread rseq area through a
    memory reference combining the %fs segment selector, the constant offset
    of the field in struct rseq, and the rseq_offset value (in a register).
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-15-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  4. selftests/rseq: Fix: work-around asm goto compiler bugs

    gcc and clang each have their own compiler bugs with respect to asm
    goto. Implement a work-around for compiler versions known to have those
    bugs.
    
    gcc prior to 4.8.2 miscompiles asm goto.
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
    
    gcc prior to 8.1.0 miscompiles asm goto at O1.
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103908
    
    clang prior to version 13.0.1 miscompiles asm goto at O2.
    llvm/llvm-project#52735
    
    Work around these issues by adding a volatile inline asm with
    memory clobber in the fallthrough after the asm goto and at each
    label target.  Emit this for all compilers in case other similar
    issues are found in the future.
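
    The barrier placement described above can be sketched as follows. This is a minimal illustration, not the selftest code: the names demo_after_asm_goto() and demo_try_increment() are hypothetical, and the asm goto template is left empty (so it never actually branches) to keep the example architecture-independent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical barrier: an empty volatile asm with a memory clobber. */
#define demo_after_asm_goto()	__asm__ __volatile__ ("" : : : "memory")

/*
 * Hypothetical operation built around asm goto. A real critical section
 * would contain instructions that may jump to abort_path; the template
 * is empty here, so execution always falls through.
 */
static int demo_try_increment(int *counter)
{
	assert(counter != NULL);

	__asm__ __volatile__ goto ("" : : : "memory" : abort_path);
	demo_after_asm_goto();	/* barrier in the fallthrough path */
	*counter += 1;
	return 0;

abort_path:
	demo_after_asm_goto();	/* barrier at the label target */
	return -1;
}
```

    The barrier emits no instructions; it only constrains how the compiler may move code across the asm goto statement and its targets.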
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-14-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  5. selftests/rseq: Remove arm/mips asm goto compiler work-around

    The arm and mips work-arounds for asm goto size guess issues are not
    properly documented, and lack references to specific compiler versions,
    an upstream compiler bug tracker entry, and a reproducer.
    
    I can only find a loosely documented patch in my original LKML rseq post
    referring to gcc < 7 on ARM, but it does not appear to be sufficient to
    track the exact issue. Also, I am not sure MIPS really has the same
    limitation.
    
    Therefore, remove the work-around until we can properly document this.
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/lkml/20171121141900.18471-17-mathieu.desnoyers@efficios.com/
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  6. selftests/rseq: Fix warnings about #if checks of undefined tokens

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-12-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  7. selftests/rseq: Fix ppc32 offsets by using long rather than off_t

    off_t is meant for file offsets, but here the value is used as an
    offset from a pointer. It is expected to fit in a single register,
    and must not be a 64-bit type on 32-bit architectures.
    
    Fix runtime issues on ppc32 where the offset is always 0 due to
    inconsistency between the argument type (off_t -> 64-bit) and type
    expected by the inline assembler (32-bit).
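
    A minimal sketch of the type choice, assuming a Linux ABI where long has the same width as a pointer. The helper names below are hypothetical and are not taken from the selftests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: apply a byte offset to a base pointer.
 * The offset is a long, so it occupies a single register on both
 * 32-bit and 64-bit Linux targets, unlike off_t, which can be
 * 64 bits wide on a 32-bit target.
 */
static inline void *demo_ptr_at_offset(void *base, long offset)
{
	assert(base != NULL);
	return (void *)((uintptr_t)base + (uintptr_t)offset);
}

/* Hypothetical check: returns 1 when long is as wide as a pointer. */
static inline int demo_offset_fits_one_register(void)
{
	return sizeof(long) == sizeof(void *);
}
```

    Passing such an offset to inline assembly through a register constraint then matches the operand size the assembler template expects.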
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-11-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  8. selftests/rseq: Fix ppc32 missing instruction selection "u" and "x" for load/store
    
    Building the rseq basic test with
    gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
    Target: powerpc-linux-gnu
    
    leads to these errors:
    
    /tmp/ccieEWxU.s: Assembler messages:
    /tmp/ccieEWxU.s:118: Error: syntax error; found `,', expected `('
    /tmp/ccieEWxU.s:118: Error: junk at end of line: `,8'
    /tmp/ccieEWxU.s:121: Error: syntax error; found `,', expected `('
    /tmp/ccieEWxU.s:121: Error: junk at end of line: `,8'
    /tmp/ccieEWxU.s:626: Error: syntax error; found `,', expected `('
    /tmp/ccieEWxU.s:626: Error: junk at end of line: `,8'
    /tmp/ccieEWxU.s:629: Error: syntax error; found `,', expected `('
    /tmp/ccieEWxU.s:629: Error: junk at end of line: `,8'
    /tmp/ccieEWxU.s:735: Error: syntax error; found `,', expected `('
    /tmp/ccieEWxU.s:735: Error: junk at end of line: `,8'
    /tmp/ccieEWxU.s:738: Error: syntax error; found `,', expected `('
    /tmp/ccieEWxU.s:738: Error: junk at end of line: `,8'
    /tmp/ccieEWxU.s:741: Error: syntax error; found `,', expected `('
    /tmp/ccieEWxU.s:741: Error: junk at end of line: `,8'
    Makefile:581: recipe for target 'basic_percpu_ops_test.o' failed
    
    Based on discussion with Linux powerpc maintainers and review of
    the use of the "m" operand in powerpc kernel code, add the missing
    %Un%Xn (where n is operand number) to the lwz, stw, ld, and std
    instructions when used with "m" operands.
    
    Using "WORD" to mean either a 32-bit or 64-bit type depending on
    the architecture is misleading. The term "WORD" really means a
    32-bit type in both 32-bit and 64-bit powerpc assembler. The intent
    here is to wrap load/store to intptr_t into common macros for both
    32-bit and 64-bit.
    
    Rename the macros with a RSEQ_ prefix, and use the terms "INT"
    for always 32-bit type, and "LONG" for architecture bitness-sized
    type.
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-10-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  9. selftests/rseq: Fix ppc32: wrong rseq_cs 32-bit field pointer on big endian
    
    ppc32 incorrectly uses padding as rseq_cs pointer field. Fix this by
    using the rseq_cs.arch.ptr field.
    
    Use this field across all architectures.
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-9-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  10. selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35

    glibc-2.35 (upcoming release date 2022-02-01) exposes the rseq per-thread
    data in the TCB, accessible at an offset from the thread pointer, rather
    than through an actual Thread-Local Storage (TLS) variable, as the
    Linux kernel selftests initially expected.
    
    The __rseq_abi TLS and glibc-2.35's ABI for per-thread data cannot
    actively coexist in a process, because the kernel supports only a single
    rseq registration per thread.
    
    Here is the scheme introduced to ensure selftests can work both with an
    older glibc and with glibc-2.35+:
    
    - librseq exposes its own "rseq_offset, rseq_size, rseq_flags" ABI.
    
    - librseq queries for glibc rseq ABI (__rseq_offset, __rseq_size,
      __rseq_flags) using dlsym() in a librseq library constructor. If those
      are found, copy their values into rseq_offset, rseq_size, and
      rseq_flags.
    
    - Else, if those glibc symbols are not found, handle rseq registration
      from librseq and use its own IE-model TLS to implement the rseq ABI
      per-thread storage.
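
    The lookup step of this scheme could look roughly like the sketch below. The struct and function names are hypothetical. The commit text only says that dlsym() is used, so the lookup handle (obtained from dlopen(NULL, ...)) and the rule that a zero __rseq_size means "not registered by libc" are assumptions of this sketch.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical container for the values librseq would expose. */
struct demo_rseq_abi {
	ptrdiff_t offset;
	unsigned int size;
	unsigned int flags;
	int owned_by_libc;	/* 1 if the libc symbols were found and usable */
};

static struct demo_rseq_abi demo_probe_rseq_abi(void)
{
	struct demo_rseq_abi abi = { 0, 0, 0, 0 };
	const ptrdiff_t *off;
	const unsigned int *size, *flags;
	void *self = dlopen(NULL, RTLD_LAZY);	/* handle for the main program */

	if (!self)
		return abi;

	/* Each lookup returns NULL when the libc does not export the symbol. */
	off = dlsym(self, "__rseq_offset");
	size = dlsym(self, "__rseq_size");
	flags = dlsym(self, "__rseq_flags");

	/* Assumption: a zero size means libc did not register rseq. */
	if (off && size && flags && *size != 0) {
		abi.offset = *off;
		abi.size = *size;
		abi.flags = *flags;
		abi.owned_by_libc = 1;
	}
	dlclose(self);

	assert(!abi.owned_by_libc || abi.size != 0);
	return abi;
}
```

    When owned_by_libc is 0, the caller would fall back to registering rseq itself with its own TLS area, as the last bullet describes. Older glibc versions may need -ldl at link time.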
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-8-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  11. selftests/rseq: Introduce thread pointer getters

    This is done in preparation for the selftest uplift to become compatible
    with glibc-2.35.
    
    glibc-2.35 exposes the rseq per-thread data in the TCB, accessible
    at an offset from the thread pointer.
    
    The toolchains do not implement accessing the thread pointer on all
    architectures. Provide thread pointer getters for ppc and x86 which
    lack (or lacked until recently) toolchain support.
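
    A thread pointer getter along these lines might be written as below. This is a sketch with hypothetical names, assuming Linux on x86 (where the word at offset 0 of the %fs or %gs segment holds the thread pointer itself) and a compiler that provides __builtin_thread_pointer() on other architectures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical getter: return the calling thread's thread pointer. */
static inline void *demo_thread_pointer(void)
{
#if defined(__x86_64__)
	void *tp;

	/* The TCB stores a pointer to itself at %fs:0. */
	__asm__ ("mov %%fs:0, %0" : "=r" (tp));
	return tp;
#elif defined(__i386__)
	void *tp;

	/* Same idea on 32-bit x86, through %gs. */
	__asm__ ("mov %%gs:0, %0" : "=r" (tp));
	return tp;
#else
	/* Assumption: the toolchain implements the builtin here. */
	return __builtin_thread_pointer();
#endif
}

/* Hypothetical helper: address of per-thread data at a given offset. */
static inline void *demo_thread_area(ptrdiff_t offset)
{
	void *tp = demo_thread_pointer();

	assert(tp != NULL);
	return (char *)tp + offset;
}
```

    With glibc-2.35 style per-thread data, the offset passed to such a helper would be the value exported by the C library.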
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-7-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  12. selftests/rseq: Introduce rseq_get_abi() helper

    This is done in preparation for the selftest uplift to become compatible
    with glibc-2.35.
    
    glibc-2.35 exposes the rseq per-thread data in the TCB, accessible
    at an offset from the thread pointer, rather than through an actual
    Thread-Local Storage (TLS) variable, as the kernel selftests initially
    expected.
    
    Introduce a rseq_get_abi() helper, initially using the __rseq_abi
    TLS variable, in preparation for changing this userspace ABI for one
    which is compatible with glibc-2.35.
    
    Note that the __rseq_abi TLS and glibc-2.35's ABI for per-thread data
    cannot actively coexist in a process, because the kernel supports only
    a single rseq registration per thread.
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-6-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  13. selftests/rseq: Remove volatile from __rseq_abi

    This is done in preparation for the selftest uplift to become compatible
    with glibc-2.35.
    
    All accesses to the __rseq_abi fields are volatile, but remove the
    volatile from the TLS variable declaration, otherwise we are stuck with
    volatile for the upcoming rseq_get_abi() helper.
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-5-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  14. selftests/rseq: Remove useless assignment to cpu variable

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-4-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  15. rseq: Remove broken uapi field layout on 32-bit little endian

    The rseq rseq_cs.ptr.{ptr32,padding} uapi endianness handling is
    entirely wrong on 32-bit little endian: a preprocessor logic mistake
    wrongly uses the big endian field layout on 32-bit little endian
    architectures.
    
    Fortunately, those ptr32 accessors were never used within the kernel,
    and only meant as a convenience for user-space.
    
    Remove those and replace the whole rseq_cs union by a __u64 type, as
    this is the only thing really needed to express the ABI. Document how
    32-bit architectures are meant to interact with this field.
    
    Fixes: ec9c82e ("rseq: uapi: Declare rseq_cs field as union, update includes")
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220127152720.25898-1-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022
  16. selftests/rseq: introduce own copy of rseq uapi header

    The Linux kernel rseq uapi header has a broken layout for the
    rseq_cs.ptr field on 32-bit little endian architectures. The entire
    rseq_cs.ptr field is planned for removal, leaving only the 64-bit
    rseq_cs.ptr64 field available.
    
    Both glibc and librseq use their own copy of the Linux kernel uapi
    header, where they introduce proper union fields to access the 32-bit
    low order bits of the rseq_cs pointer on 32-bit architectures.
    
    Introduce a copy of the Linux kernel uapi headers in the Linux kernel
    selftests.
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220124171253.22072-2-mathieu.desnoyers@efficios.com
    compudj authored and Peter Zijlstra committed Feb 2, 2022

Commits on Jan 27, 2022

  1. psi: Fix "no previous prototype" warnings when CONFIG_CGROUPS=n

    When CONFIG_CGROUPS is disabled psi code generates the following warnings:
    
    kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes]
        1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group,
             |                     ^~~~~~~~~~~~~~~~~~
    kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes]
        1182 | void psi_trigger_destroy(struct psi_trigger *t)
             |      ^~~~~~~~~~~~~~~~~~~
    kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes]
        1249 | __poll_t psi_trigger_poll(void **trigger_ptr,
             |          ^~~~~~~~~~~~~~~~
    
    Change declarations of these functions in the header to provide the
    prototypes even when they are unused.
    
    Fixes: 0e94682 ("psi: introduce psi monitor")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com
    surenbaghdasaryan authored and Peter Zijlstra committed Jan 27, 2022
  2. psi: Fix "defined but not used" warnings when CONFIG_PROC_FS=n

    When CONFIG_PROC_FS is disabled psi code generates the following warnings:
    
    kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=]
        1364 | static const struct proc_ops psi_cpu_proc_ops = {
             |                              ^~~~~~~~~~~~~~~~
    kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=]
        1355 | static const struct proc_ops psi_memory_proc_ops = {
             |                              ^~~~~~~~~~~~~~~~~~~
    kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=]
        1346 | static const struct proc_ops psi_io_proc_ops = {
             |                              ^~~~~~~~~~~~~~~
    
    Make definitions of these structures and related functions conditional on
    CONFIG_PROC_FS config.
    
    Fixes: 0e94682 ("psi: introduce psi monitor")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com
    surenbaghdasaryan authored and Peter Zijlstra committed Jan 27, 2022
  3. sched/uclamp: Fix iowait boost escaping uclamp restriction

    iowait_boost signal is applied independently of util and doesn't take
    into account uclamp settings of the rq. An io heavy task that is capped
    by uclamp_max could still request higher frequency because
    sugov_iowait_apply() doesn't clamp the boost via uclamp_rq_util_with()
    like effective_cpu_util() does.
    
    Make sure that iowait_boost honours uclamp requests by calling
    uclamp_rq_util_with() when applying the boost.
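
    As a simplified user-space model of the fix (not the kernel code), the boost is clamped to the rq's uclamp range before it is compared with the current utilization. All names below are hypothetical, and the values are on the scheduler's 0..1024 capacity scale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the rq-level uclamp range. */
struct demo_rq_clamp {
	unsigned long uclamp_min;
	unsigned long uclamp_max;
};

/* Clamp a utilization value to [uclamp_min, uclamp_max]. */
static unsigned long demo_clamp_util(const struct demo_rq_clamp *rq,
				     unsigned long util)
{
	assert(rq != NULL && rq->uclamp_min <= rq->uclamp_max);
	if (util < rq->uclamp_min)
		return rq->uclamp_min;
	if (util > rq->uclamp_max)
		return rq->uclamp_max;
	return util;
}

/* Apply the iowait boost, honouring the clamp; returns the new util. */
static unsigned long demo_iowait_apply(const struct demo_rq_clamp *rq,
				       unsigned long util,
				       unsigned long boost)
{
	boost = demo_clamp_util(rq, boost);
	return boost > util ? boost : util;
}
```

    With uclamp_max set to 400, a full boost of 1024 raises a utilization of 200 only to 400, whereas an unclamped boost would have requested the maximum.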
    
    Fixes: 982d9cd ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
    Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Link: https://lore.kernel.org/r/20211216225320.2957053-3-qais.yousef@arm.com
    qais-yousef authored and Peter Zijlstra committed Jan 27, 2022
  4. sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max

    sugov_update_single_{freq, perf}() contains a 'busy' filter that ensures
    we don't bring the frequency down if there's no idle time (CPU is busy).
    
    The problem is that with uclamp_max we will have scenarios where a busy
    task is capped to run at a lower frequency and this filter prevents
    applying the capping when this task starts running.
    
    We handle this by skipping the filter when uclamp is enabled and the rq
    is being capped by uclamp_max.
    
    We introduce a new function uclamp_rq_is_capped() to help detect when
    this capping is taking effect. Some code shuffling was required to allow
    using cpu_util_{cfs, rt}() in this new function.
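
    The interaction between the 'busy' filter and the capping check can be modelled in user-space as below. This is a sketch under stated assumptions: the names are hypothetical, and the definition of "capped" (a non-default uclamp_max that the combined CFS and RT utilization has reached) is this sketch's reading of the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CAPACITY_SCALE	1024UL	/* full capacity on the scheduler's scale */

/* Hypothetical model of the per-rq inputs. */
struct demo_rq {
	unsigned long util_cfs;
	unsigned long util_rt;
	unsigned long uclamp_max;
};

/* Assumed meaning of "capped": uclamp_max is set and has been reached. */
static bool demo_rq_is_capped(const struct demo_rq *rq)
{
	unsigned long rq_util;

	assert(rq != NULL);
	rq_util = rq->util_cfs + rq->util_rt;
	return rq->uclamp_max != DEMO_CAPACITY_SCALE &&
	       rq_util >= rq->uclamp_max;
}

/* The busy filter keeps the current frequency unless the rq is capped. */
static unsigned int demo_pick_freq(const struct demo_rq *rq, bool busy,
				   unsigned int cur_freq,
				   unsigned int next_freq)
{
	if (!demo_rq_is_capped(rq) && busy && next_freq < cur_freq)
		return cur_freq;
	return next_freq;
}
```

    In this model a busy, capped rq is allowed to drop from 3100 to 2000, while a busy, uncapped rq stays at 3100.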
    
    On 2 Core SMT2 Intel laptop I see:
    
    Without this patch:
    
    	uclampset -M 0 sysbench --test=cpu --threads=4 run
    
    produces a score of ~3200 consistently, which is the highest possible.
    
    Compiling the kernel also results in frequency running at max 3.1GHz all
    the time - running uclampset -M 400 to cap it has no effect without this
    patch.
    
    With this patch:
    
    	uclampset -M 0 sysbench --test=cpu --threads=4 run
    
    produces a score of ~1100 with some outliers in ~1700. Uclamp max
    aggregates the performance requirements, so having high values sometimes
    is expected if some other task that requires that frequency starts
    running at the same time.
    
    When compiling the kernel with uclampset -M 400 I can see the
    frequencies mostly in the ~2GHz region. Helpful to conserve power and
    prevent heating when not plugged in.
    
    Fixes: 982d9cd ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
    Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20211216225320.2957053-2-qais.yousef@arm.com
    qais-yousef authored and Peter Zijlstra committed Jan 27, 2022
  5. sched/core: Export pelt_thermal_tp

    We can't use this tracepoint in modules without having the symbol
    exported first, fix that.
    
    Fixes: 7650479 ("sched/pelt: Add support to track thermal pressure")
    Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20211028115005.873539-1-qais.yousef@arm.com
    qais-yousef authored and Peter Zijlstra committed Jan 27, 2022
  6. MAINTAINERS: add Suren as psi co-maintainer

    Suren wrote the poll() interface, which is a significant part of the
    psi code and represents a large user of psi itself (Android). It's a
    good idea to have him look at psi patches as well, and it's good to
    have two people following things in case one of us is traveling.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20220117120317.1581315-1-hannes@cmpxchg.org
    hnaz authored and Peter Zijlstra committed Jan 27, 2022
  7. sched/numa: initialize numa statistics when forking new task

    Child processes inherit numa_pages_migrated and total_numa_faults
    from the parent. This means that even if no NUMA fault happens in
    the child, the statistics in /proc/$pid of the child process might
    show huge amounts. This is a bit weird. Initialize them at fork
    time.
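
    A toy model of the change, with a hypothetical task struct that carries only the two counters named above. It is not the kernel's fork path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task model: just an id and the two NUMA counters. */
struct demo_task {
	int pid;
	unsigned long numa_pages_migrated;
	unsigned long total_numa_faults;
};

/* Model of fork: copy the parent, then reset the per-task statistics. */
static struct demo_task demo_fork(const struct demo_task *parent,
				  int child_pid)
{
	struct demo_task child;

	assert(parent != NULL);
	child = *parent;		/* the child starts as a copy */
	child.pid = child_pid;
	child.numa_pages_migrated = 0;	/* do not inherit parent's counts */
	child.total_numa_faults = 0;
	return child;
}
```

    Without the two resets, the copied values would show up in the child's statistics even though the child itself never took a NUMA fault.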
    
    Signed-off-by: Honglei Wang <wanghonglei@didichuxing.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Link: https://lore.kernel.org/r/20220113133920.49900-1-wanghonglei@didichuxing.com
    Honglei Wang authored and Peter Zijlstra committed Jan 27, 2022
  8. sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
    
    The older format of /proc/pid/sched printed home node info which
    required the mempolicy and task lock around mpol_get(). However
    the format has changed since then and there is no need for
    sched_show_numa() any more to have mempolicy argument,
    associated mpol_get/put and task_lock/unlock. Remove them.
    
    Fixes: 397f237 ("sched/numa: Fix numa balancing stats in /proc/pid/sched")
    Signed-off-by: Bharata B Rao <bharata@amd.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Link: https://lore.kernel.org/r/20220118050515.2973-1-bharata@amd.com
    Bharata B Rao authored and Peter Zijlstra committed Jan 27, 2022

Commits on Jan 23, 2022

  1. Linux 5.17-rc1

    torvalds committed Jan 23, 2022
  2. Merge tag 'perf-tools-for-v5.17-2022-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
    
    Pull more perf tools updates from Arnaldo Carvalho de Melo:
    
     - Fix printing 'phys_addr' in 'perf script'.
    
     - Fix failure to add events with 'perf probe' in ppc64 due to not
       removing leading dot (ppc64 ABIv1).
    
     - Fix cpu_map__item() python binding building.
    
     - Support event alias in form foo-bar-baz, add pmu-events and
       parse-event tests for it.
    
     - No need to setup affinities when starting a workload or attaching to
       a pid.
    
     - Use path__join() to compose a path instead of ad-hoc snprintf()
       equivalent.
    
     - Override attr->sample_period for non-libpfm4 events.
    
     - Use libperf cpumap APIs instead of accessing the internal state
       directly.
    
     - Sync x86 arch prctl headers and files changed by the new
       set_mempolicy_home_node syscall with the kernel sources.
    
     - Remove duplicate include in cpumap.h.
    
     - Remove redundant err variable.
    
    * tag 'perf-tools-for-v5.17-2022-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
      perf tools: Remove redundant err variable
      perf test: Add parse-events test for aliases with hyphens
      perf test: Add pmu-events test for aliases with hyphens
      perf parse-events: Support event alias in form foo-bar-baz
      perf evsel: Override attr->sample_period for non-libpfm4 events
      perf cpumap: Remove duplicate include in cpumap.h
      perf cpumap: Migrate to libperf cpumap api
      perf python: Fix cpu_map__item() building
      perf script: Fix printing 'phys_addr' failure issue
      tools headers UAPI: Sync files changed by new set_mempolicy_home_node syscall
      tools headers UAPI: Sync x86 arch prctl headers with the kernel sources
      perf machine: Use path__join() to compose a path instead of snprintf(dir, '/', filename)
      perf evlist: No need to setup affinities when disabling events for pid targets
      perf evlist: No need to setup affinities when enabling events for pid targets
      perf stat: No need to setup affinities when starting a workload
      perf affinity: Allow passing a NULL arg to affinity__cleanup()
      perf probe: Fix ppc64 'perf probe add events failed' case
    torvalds committed Jan 23, 2022
  3. Merge tag 'trace-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
    
    Pull ftrace fix from Steven Rostedt:
     "Fix s390 breakage from sorting mcount tables.
    
      The latest merge of the tracing tree sorts the mcount table at build
      time. But s390 appears to do things differently (like always) and
      replaces the sorted table back to the original unsorted one. As the
      ftrace algorithm depends on it being sorted, bad things happen when it
      is not, and s390 experienced those bad things.
    
      Add a new config to tell the boot if the mcount table is sorted or
      not, and allow s390 to opt out of it"
    
    * tag 'trace-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
      ftrace: Fix assuming build time sort works for s390
    torvalds committed Jan 23, 2022
  4. ftrace: Fix assuming build time sort works for s390

    To speed up the boot process, as mcount_loc needs to be sorted for ftrace
    to work properly, sorting it at build time is more efficient than boot up
    and can save milliseconds of time. Unfortunately, this change broke s390
    as it will modify the mcount_loc location after the sorting takes place
    and will put back the unsorted locations. Since the sorting is skipped at
    boot up if it is believed that it was sorted at build time, ftrace can crash
    as its algorithms are dependent on the list being sorted.
    
    Add a new config BUILDTIME_MCOUNT_SORT that is set when
    BUILDTIME_TABLE_SORT is set, but not if S390 is set. Use this config to
    determine if sorting should take place at boot up.
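
    The boot-time decision can be modelled in user-space as below. DEMO_BUILDTIME_MCOUNT_SORT is a hypothetical stand-in for the new config option (0 models the s390 case), and qsort() stands in for the kernel's sort routine.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_BUILDTIME_MCOUNT_SORT; 0 models s390. */
#ifndef DEMO_BUILDTIME_MCOUNT_SORT
#define DEMO_BUILDTIME_MCOUNT_SORT 0
#endif

/* Order two addresses without risking overflow from subtraction. */
static int demo_cmp_ips(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/* Sort at "boot" only when the table was not sorted at build time. */
static void demo_process_locs(unsigned long *locs, size_t count)
{
	size_t i;

	if (!DEMO_BUILDTIME_MCOUNT_SORT)
		qsort(locs, count, sizeof(*locs), demo_cmp_ips);

	/* Later lookups rely on sorted input, so check the invariant. */
	for (i = 1; i < count; i++)
		assert(locs[i - 1] <= locs[i]);
}
```

    If the stand-in were set to 1 while the table was in fact unsorted, the invariant check would fail, which mirrors the breakage described for s390.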
    
    Link: https://lore.kernel.org/all/yt9dee51ctfn.fsf@linux.ibm.com/
    
    Fixes: 72b3942 ("scripts: ftrace - move the sort-processing in ftrace_init")
    Reported-by: Sven Schnelle <svens@linux.ibm.com>
    Tested-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    rostedt committed Jan 23, 2022
  5. Merge tag 'kbuild-fixes-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
    
    Pull Kbuild fixes from Masahiro Yamada:
    
     - Bring include/uapi/linux/nfc.h into the UAPI compile-test coverage
    
     - Revert the workaround of CONFIG_CC_IMPLICIT_FALLTHROUGH
    
     - Fix build errors in certs/Makefile
    
    * tag 'kbuild-fixes-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
      certs: Fix build error when CONFIG_MODULE_SIG_KEY is empty
      certs: Fix build error when CONFIG_MODULE_SIG_KEY is PKCS#11 URI
      Revert "Makefile: Do not quote value for CONFIG_CC_IMPLICIT_FALLTHROUGH"
      usr/include/Makefile: add linux/nfc.h to the compile-test coverage
    torvalds committed Jan 23, 2022
  6. Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux

    Pull bitmap updates from Yury Norov:
    
     - introduce for_each_set_bitrange()
    
     - use find_first_*_bit() instead of find_next_*_bit() where possible
    
     - unify for_each_bit() macros
    
    * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
      vsprintf: rework bitmap_list_string
      lib: bitmap: add performance test for bitmap_print_to_pagebuf
      bitmap: unify find_bit operations
      mm/percpu: micro-optimize pcpu_is_populated()
      Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
      find: micro-optimize for_each_{set,clear}_bit()
      include/linux: move for_each_bit() macros from bitops.h to find.h
      cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
      tools: sync tools/bitmap with mother linux
      all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
      cpumask: use find_first_and_bit()
      lib: add find_first_and_bit()
      arch: remove GENERIC_FIND_FIRST_BIT entirely
      include: move find.h from asm_generic to linux
      bitops: move find_bit_*_le functions from le.h to find.h
      bitops: protect find_first_{,zero}_bit properly
    torvalds committed Jan 23, 2022

Commits on Jan 22, 2022

  1. perf tools: Remove redundant err variable

    Return value from perf_event__process_tracing_data() directly instead
    of taking this in another redundant variable.
    
    Reported-by: Zeal Robot <zealci@zte.com.cn>
    Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Link: http://lore.kernel.org/lkml/20220112080109.666800-1-chi.minghao@zte.com.cn
    Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Minghao Chi authored and Arnaldo Carvalho de Melo committed Jan 22, 2022
  2. perf test: Add parse-events test for aliases with hyphens

    Add a test which allows us to test parsing an event alias with hyphens.
    
    Since these events typically do not exist on most host systems, add the
    alias to the fake pmu.
    
    Function perf_pmu__test_parse_init() has terms added to match known test
    aliases.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Acked-by: Ian Rogers <irogers@google.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kajol Jain <kjain@linux.ibm.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Qi Liu <liuqi115@huawei.com>
    Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
    Cc: linuxarm@huawei.com
    Link: https://lore.kernel.org/r/1642432215-234089-4-git-send-email-john.garry@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    johnpgarry authored and Arnaldo Carvalho de Melo committed Jan 22, 2022
  3. perf test: Add pmu-events test for aliases with hyphens

    Add a test for aliases with hyphens in the name to ensure that the
    pmu-events tables are as expected. There should be no reason why this sort
    of alias would be treated differently, but no harm in checking.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Acked-by: Ian Rogers <irogers@google.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kajol Jain <kjain@linux.ibm.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Qi Liu <liuqi115@huawei.com>
    Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
    Cc: linuxarm@huawei.com
    Link: https://lore.kernel.org/r/1642432215-234089-3-git-send-email-john.garry@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    johnpgarry authored and Arnaldo Carvalho de Melo committed Jan 22, 2022
  4. perf parse-events: Support event alias in form foo-bar-baz

    Event aliasing for events whose name is in the form foo-bar-baz is not
    supported, while foo-bar, foo_bar_baz, and other combinations are, i.e.
    two hyphens are not supported.
    
    The HiSilicon D06 platform has events in such form:
    
      $ ./perf list sdir-home-migrate
    
      List of pre-defined events (to be used in -e):
    
      uncore hha:
        sdir-home-migrate
       [Unit: hisi_sccl,hha]
    
      $ sudo ./perf stat -e sdir-home-migrate
      event syntax error: 'sdir-home-migrate'
                              \___ parser error
      Run 'perf list' for a list of valid events
    
       Usage: perf stat [<options>] [<command>]
    
       -e, --event <event>   event selector. use 'perf list' to list available events
    
    To support, add an extra PMU event symbol type for "baz", and add a new
    rule in the bison file.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Acked-by: Ian Rogers <irogers@google.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kajol Jain <kjain@linux.ibm.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Qi Liu <liuqi115@huawei.com>
    Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
    Cc: linuxarm@huawei.com
    Link: https://lore.kernel.org/r/1642432215-234089-2-git-send-email-john.garry@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    johnpgarry authored and Arnaldo Carvalho de Melo committed Jan 22, 2022