
Commits on Nov 21, 2021

  1. sched/umcg, lib/umcg: add tools/lib/umcg/libumcg.txt

    Document libumcg.
    
    Signed-off-by: Peter Oskolkov <posk@google.com>
    posk-io authored and intel-lab-lkp committed Nov 21, 2021
  2. sched/umcg: add Documentation/userspace-api/umcg.txt

    Document User Managed Concurrency Groups syscalls, data structures,
    state transitions, etc. of the UMCG kernel API.
    
    Signed-off-by: Peter Oskolkov <posk@google.com>
    posk-io authored and intel-lab-lkp committed Nov 21, 2021
  3. sched/umcg, lib/umcg: implement libumcg

    Implement libumcg in tools/lib/umcg. Define higher-level UMCG
    API that hides kernel-level UMCG API intricacies.
    
    As a higher-level API, libumcg makes subtle changes to server/worker
    interactions, compared to the kernel UMCG API, and introduces
    the following new concepts:
    
    - UMCG Group: a collection of servers and workers in a process
      that can interact with each other; UMCG groups are useful to
      partition servers and workers within a process in order to, for
      example, affine work to specific NUMA nodes;
    - UMCG basic tasks: these are UMCG servers, from the kernel point
      of view; they do not interact with UMCG workers and thus
      do not need to be in UMCG groups; they are used for cooperative
      wait/wake/swap operations.
    
    The main difference of server/worker interaction in libumcg
    vs the kernel-side UMCG API is that a wakeup can be queued:
    if umcg_wake() is called on a RUNNING UMCG task, the fact is
    recorded (in the userspace), and when the task calls umcg_wait()
    or umcg_swap(), the wakeup is consumed and the task is not
    marked IDLE.
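    The queued-wakeup behavior described above can be sketched with C11
    atomics as below. This is a hypothetical illustration, not libumcg code:
    the demo_* names, the state values and the DEMO_WAKE_QUEUED bit are
    assumptions, and actually putting threads to sleep or waking them is
    omitted. Keeping the state and the queued-wakeup bit in one atomic word
    avoids losing a wakeup that races with the task going IDLE.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state encoding; not libumcg's actual layout. */
#define DEMO_RUNNING      0x1u
#define DEMO_IDLE         0x2u
#define DEMO_WAKE_QUEUED  0x100u /* a wakeup arrived while RUNNING */

struct demo_task {
	_Atomic unsigned int state;
};

/* Wake @t: IDLE -> RUNNING, or record a queued wakeup if already RUNNING. */
static void demo_wake(struct demo_task *t)
{
	unsigned int old = atomic_load(&t->state);

	for (;;) {
		unsigned int new = (old == DEMO_IDLE) ? DEMO_RUNNING
						      : (old | DEMO_WAKE_QUEUED);

		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&t->state, &old, new))
			return; /* real code would also wake the sleeper if old was IDLE */
	}
}

/*
 * Called by a RUNNING task before sleeping. Consumes a queued wakeup and
 * returns true (the task stays RUNNING), or marks the task IDLE and
 * returns false.
 */
static bool demo_wait(struct demo_task *t)
{
	unsigned int old = atomic_load(&t->state);

	for (;;) {
		bool queued = old & DEMO_WAKE_QUEUED;
		unsigned int new = queued ? DEMO_RUNNING : DEMO_IDLE;

		if (atomic_compare_exchange_weak(&t->state, &old, new))
			return queued; /* real code would block here when !queued */
	}
}
```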
    
    Libumcg exports the following API:
            umcg_enabled()
            umcg_get_utid()
            umcg_set_task_tag()
            umcg_get_task_tag()
            umcg_create_group()
            umcg_destroy_group()
            umcg_register_basic_task()
            umcg_register_worker()
            umcg_register_server()
            umcg_unregister_task()
            umcg_wait()
            umcg_wake()
            umcg_swap()
            umcg_get_idle_worker()
            umcg_run_worker()
            umcg_preempt_worker()
            umcg_get_time_ns()
    
    See tools/lib/umcg/libumcg.txt for details.
    
    Notes:
    - this is still somewhat work-in-progress: while the kernel side
      code has been more or less stable over the last couple of months,
      the userspace side of things is less so;
    - while libumcg is intended to be the main/primary/only direct user
      of the kernel UMCG API, at the moment the implementation is geared
      more towards testing and correctness than towards live production
      usage, with a lot of asserts and similar development helpers;
    - I have a number of umcg selftests that I plan to clean up and
      post shortly.
    
    Signed-off-by: Peter Oskolkov <posk@google.com>
    posk-io authored and intel-lab-lkp committed Nov 21, 2021
  4. sched/umcg: implement UMCG syscalls

    Define struct umcg_task and two syscalls: sys_umcg_ctl and sys_umcg_wait.
    
    User Managed Concurrency Groups is an M:N threading toolkit that allows
    constructing user space schedulers designed to efficiently manage
    heterogeneous in-process workloads while maintaining high CPU
    utilization (95%+).
    
    In addition, M:N threading and cooperative user space scheduling
    enable a synchronous coding style and better cache locality when
    compared to asynchronous callback/continuation style of programming.
    
    The UMCG kernel API is built around the following ideas:
    
    * UMCG server: a task/thread representing "kernel threads", or (v)CPUs;
    * UMCG worker: a task/thread representing "application threads", to be
      scheduled over servers;
    * UMCG task state: (NONE), RUNNING, BLOCKED, IDLE: states a UMCG task (a
      server or a worker) can be in;
    * UMCG task state flags: LOCKED, PREEMPTED: additional state flags that
      can be ORed with the task state to communicate additional information to
      the kernel;
    * struct umcg_task: a per-task userspace set of data fields, usually
      residing in the TLS, that fully reflects the current task's UMCG state
      and controls the way the kernel manages the task;
    * sys_umcg_ctl(): a syscall used to register the current task/thread as a
      server or a worker, or to unregister a UMCG task;
    * sys_umcg_wait(): a syscall used to put the current task to sleep and/or
      wake another task, potentially context-switching between the two tasks
      on-CPU synchronously.
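    A sketch of how a task state and ORed-in state flags might share one
    word. The numeric values and DEMO_* names below are hypothetical
    placeholders chosen for illustration; the real constants are defined by
    the UMCG headers in this patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state values, kept in the low bits of the word. */
#define DEMO_UMCG_TASK_NONE       0u
#define DEMO_UMCG_TASK_RUNNING    1u
#define DEMO_UMCG_TASK_IDLE       2u
#define DEMO_UMCG_TASK_BLOCKED    3u
#define DEMO_UMCG_TASK_STATE_MASK 0xffu

/* Hypothetical flags, placed above the state bits so they can be ORed in. */
#define DEMO_UMCG_TF_LOCKED    (1u << 8)
#define DEMO_UMCG_TF_PREEMPTED (1u << 9)

/* Extract the bare state, ignoring any flags. */
static inline uint32_t demo_task_state(uint32_t word)
{
	return word & DEMO_UMCG_TASK_STATE_MASK;
}

/* Test whether @flag is set in @word. */
static inline bool demo_task_has_flag(uint32_t word, uint32_t flag)
{
	return (word & flag) != 0;
}
```

    For example, a RUNNING worker that userspace wants to preempt would carry
    DEMO_UMCG_TASK_RUNNING | DEMO_UMCG_TF_PREEMPTED, and demo_task_state()
    still reports RUNNING.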
    
    In short, servers can be thought of as CPUs over which application
    threads (workers) are scheduled; at any one time a worker is either:
    - RUNNING: has a server and is schedulable by the kernel;
    - BLOCKED: blocked in the kernel (e.g. on I/O, or a futex);
    - IDLE: is not blocked, but cannot be scheduled by the kernel to
      run because it has no server assigned to it (e.g. because all
      available servers are busy "running" other workers).
    
    Usually the number of servers in a process is equal to the number of
    CPUs available to the kernel if the process is supposed to consume
    the whole machine, or less than the number of CPUs available if the
    process is sharing the machine with other workloads. The number of
    workers in a process can grow very large: tens of thousands is normal;
    hundreds of thousands and more (millions) is something that would
    be desirable to achieve in the future, as lightweight userspace
    threads in Java and Go easily scale to millions, and UMCG workers
    are (intended to be) conceptually similar to those.
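    A minimal sizing sketch, assuming the process either owns the machine
    (one server per online CPU) or is given a cap when sharing it.
    demo_nr_servers() is a hypothetical helper; _SC_NPROCESSORS_ONLN is
    widely supported (glibc, musl, macOS) but is not required by POSIX.

```c
#define _DEFAULT_SOURCE /* expose sysconf() constants under strict modes */
#include <unistd.h>

/*
 * Hypothetical helper: one server per online CPU when the process owns the
 * machine, or at most @cap servers when sharing it (@cap <= 0 means no cap).
 */
static long demo_nr_servers(long cap)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	if (ncpus < 1)
		ncpus = 1; /* sysconf() failed; fall back to a single server */
	if (cap > 0 && cap < ncpus)
		return cap;
	return ncpus;
}
```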
    
    Detailed use cases and API behavior are provided in
    Documentation/userspace-api/umcg.txt (see sibling patches).
    
    Some high-level implementation notes:
    
    UMCG tasks (workers and servers) are "tagged" with struct umcg_task
    residing in userspace (usually in TLS) to facilitate kernel/userspace
    communication. This makes the kernel-side code much simpler (see e.g.
    the implementation of sys_umcg_wait), but also requires some careful
    uaccess handling and page pinning (see below).
    
    The main UMCG server/worker interaction looks like:
    
    a. worker W1 is RUNNING, with a server S attached to it sleeping
       in IDLE state;
    b. worker W1 blocks in the kernel, e.g. on I/O;
    c. the kernel marks W1 as BLOCKED, the attached server S
       as RUNNING, and wakes S (the "block detection" event);
    d. the server now picks another IDLE worker W2 to run: marks
       W2 as RUNNING, itself as IDLE, and calls sys_umcg_wait();
    e. when the blocking operation of W1 completes, the worker
       is marked by the kernel as IDLE and added to idle workers list
       (see struct umcg_task) for the userspace to pick up and
       later run (the "wake detection" event).
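    The a-e workflow above can be modeled as a single-threaded state
    machine, which may help when reading the kernel code. Everything here is
    hypothetical (demo_* types and helpers, a simple LIFO idle list): there
    are no syscalls, and real servers and workers run on separate threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of steps a-e; no real syscalls. */
enum demo_state { DEMO_RUNNING, DEMO_IDLE, DEMO_BLOCKED };

struct demo_server { enum demo_state state; };

struct demo_worker {
	enum demo_state state;
	struct demo_server *server; /* attached server, NULL if none */
	struct demo_worker *next;   /* link in the idle workers list */
};

/* Idle workers list for userspace to pick up (step e). */
static struct demo_worker *demo_idle_list;

/* Step d: server @s marks @w RUNNING, itself IDLE, and "sleeps". */
static void demo_run_worker(struct demo_server *s, struct demo_worker *w)
{
	w->state = DEMO_RUNNING;
	w->server = s;
	s->state = DEMO_IDLE;
}

/* Steps b-c, block detection: worker BLOCKED, its server woken as RUNNING. */
static struct demo_server *demo_block(struct demo_worker *w)
{
	struct demo_server *s = w->server;

	assert(s != NULL); /* only a worker with an attached server can block */
	w->state = DEMO_BLOCKED;
	w->server = NULL;
	s->state = DEMO_RUNNING;
	return s;
}

/* Step e, wake detection: worker marked IDLE and queued for userspace. */
static void demo_wake_detect(struct demo_worker *w)
{
	w->state = DEMO_IDLE;
	w->next = demo_idle_list;
	demo_idle_list = w;
}

/* Pop an IDLE worker for a server to run next; NULL if none. */
static struct demo_worker *demo_pop_idle(void)
{
	struct demo_worker *w = demo_idle_list;

	if (w)
		demo_idle_list = w->next;
	return w;
}
```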
    
    While there are additional operations such as worker-to-worker
    context switch, preemption, workers "yielding", etc., the "workflow"
    above is the main worker/server interaction that drives the
    implementation.
    
    Specifically:
    
    - most operations are conceptually context switches:
        - scheduling a worker: a running server goes to sleep and "runs"
          a worker in its place;
        - block detection: worker is descheduled, and its server is woken;
        - wake detection: woken worker, running in the kernel, is descheduled,
          and if there is an idle server, it is woken to process the wake
          detection event;
    - to facilitate low scheduling latencies and cache locality, most
      server/worker interactions described above are performed synchronously
      "on CPU" via WF_CURRENT_CPU flag passed to ttwu; while at the moment
      the context switches are simulated by putting the switch-out task to
      sleep and waking the switch-into task on the same cpu, it is very much
      the long-term goal of this project to make the context switch much
      lighter, by tweaking runtime accounting and, maybe, even bypassing
      __schedule();
    - worker blocking is detected in a hook to sched_submit_work; as mentioned
      above, the server is to be woken on the same CPU, synchronously;
      this code may not pagefault, so to access worker's and server's
      userspace memory (struct umcg_task), memory pages containing the worker's
      and the server's structs umcg_task are pinned when the worker is
      exiting to the userspace, and unpinned when the worker is descheduled;
    - worker wakeup is detected in a hook to sched_update_worker, and processed
      in the exit to usermode loop (via TIF_NOTIFY_RESUME); workers CAN
      pagefault on the wakeup path;
    - worker preemption is implemented by the userspace tagging the worker
      with UMCG_TF_PREEMPTED state flag and sending a NOOP signal to it;
      on the exit to usermode the worker is intercepted and its server is woken
      (see Documentation/userspace-api/umcg.txt for more details);
    - each state change is tagged with a unique timestamp (of MONOTONIC
      variety), so that
        - scheduling instrumentation is naturally available;
        - racing state changes are easily detected and ABA issues are
          avoided;
      see umcg_update_state() in umcg.c for implementation details, and
      Documentation/userspace-api/umcg.txt for a higher-level
      description.
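    The timestamp-tagging idea can be sketched as a 64-bit word holding
    both the state and the time of the last change, updated with
    compare-and-swap. The layout (8 state bits, 56 timestamp bits) and all
    demo_* names are assumptions for illustration; see umcg_update_state()
    for the real logic. Because every change carries a new timestamp, a
    stale snapshot fails the CAS even if the state went A -> B -> A.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 8 bits state, upper 56 bits timestamp. */
#define DEMO_STATE_BITS 8
#define DEMO_STATE_MASK ((UINT64_C(1) << DEMO_STATE_BITS) - 1)

/* Pack a timestamp (truncated to 56 bits) and a state into one word. */
static inline uint64_t demo_pack(uint64_t ts, uint64_t state)
{
	return (ts << DEMO_STATE_BITS) | (state & DEMO_STATE_MASK);
}

/*
 * Move from the exact snapshot @expected to @new_state stamped with @now.
 * Fails if anything changed since @expected was read, including an A->B->A
 * state sequence, because the timestamp will differ.
 */
static bool demo_update_state(_Atomic uint64_t *word, uint64_t expected,
			      uint64_t new_state, uint64_t now)
{
	return atomic_compare_exchange_strong(word, &expected,
					      demo_pack(now, new_state));
}
```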
    
    The previous version of the patchset can be found at
    https://lore.kernel.org/all/20211012232522.714898-1-posk@google.com/
    containing some additional context and links to earlier discussions.
    
    More details are available in Documentation/userspace-api/umcg.txt
    in sibling patches, and in doc-comments in the code.
    
    Signed-off-by: Peter Oskolkov <posk@google.com>
    posk-io authored and intel-lab-lkp committed Nov 21, 2021
  5. mm, x86/uaccess: add userspace atomic helpers

    In addition to futexes needing to do atomic operations in the userspace,
    a second use case is now in the works (UMCG, see
    https://lore.kernel.org/all/20210917180323.278250-1-posk@google.com/),
    so a generic facility to perform these operations has been called for
    (see https://lore.kernel.org/all/87ilyk9xc0.ffs@tglx/).
    
    Add a set of generic helpers to perform 32/64-bit xchg and cmpxchg
    operations in the userspace. Also implement the required
    architecture-specific support on x86_64.
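    The kernel helpers take __user pointers and must cope with page faults;
    the sketch below ignores that and only illustrates the 32/64-bit xchg
    and cmpxchg semantics on ordinary memory, using the GCC/Clang __atomic
    builtins. The demo_* names and the "update *curr on failure" convention
    are assumptions, not the signatures added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Atomically store @val at @addr and return the previous value. */
static inline uint32_t demo_xchg_u32(uint32_t *addr, uint32_t val)
{
	return __atomic_exchange_n(addr, val, __ATOMIC_SEQ_CST);
}

static inline uint64_t demo_xchg_u64(uint64_t *addr, uint64_t val)
{
	return __atomic_exchange_n(addr, val, __ATOMIC_SEQ_CST);
}

/*
 * If *addr == *curr, store @new and return true; otherwise copy the value
 * found at @addr into *curr and return false so the caller can retry.
 */
static inline bool demo_cmpxchg_u32(uint32_t *addr, uint32_t *curr, uint32_t new)
{
	return __atomic_compare_exchange_n(addr, curr, new, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

static inline bool demo_cmpxchg_u64(uint64_t *addr, uint64_t *curr, uint64_t new)
{
	return __atomic_compare_exchange_n(addr, curr, new, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```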
    
    Signed-off-by: Peter Oskolkov <posk@google.com>
    posk-io authored and intel-lab-lkp committed Nov 21, 2021
  6. sched/umcg: add WF_CURRENT_CPU and externise ttwu

    Add the WF_CURRENT_CPU wake flag that advises the scheduler to
    move the wakee to the current CPU. This is useful for fast on-CPU
    context switching use cases such as UMCG.
    
    In addition, make ttwu external rather than static so that
    the flag could be passed to it from outside of sched/core.c.
    
    Signed-off-by: Peter Oskolkov <posk@google.com>
    posk-io authored and intel-lab-lkp committed Nov 21, 2021

Commits on Nov 17, 2021

  1. psi: Fix PSI_MEM_FULL state when tasks are in memstall and doing reclaim

    We've noticed cases where tasks in a cgroup are stalled on memory but
    there is little memory FULL pressure since tasks stay on the runqueue
    in reclaim.
    
    A simple example involves a single threaded program that keeps leaking
    and touching large amounts of memory. It runs in a cgroup with swap
    enabled, memory.high set at 10M and cpu.max ratio set at 5%. Though
    there is significant CPU pressure and memory SOME, there is barely any
    memory FULL since the task enters reclaim and stays on the runqueue.
    However, this memory-bound task is effectively stalled on memory and
    we expect memory FULL to match memory SOME in this scenario.
    
    The code is confused about memstall && running, thinking there is a
    stalled task and a productive task when there's only one task: a
    reclaimer that's counted as both. To fix this, we redefine the
    condition for PSI_MEM_FULL to check that all running tasks are in an
    active memstall instead of checking that there are no running tasks.
    
            case PSI_MEM_FULL:
    -               return unlikely(tasks[NR_MEMSTALL] && !tasks[NR_RUNNING]);
    +               return unlikely(tasks[NR_MEMSTALL] &&
    +                       tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
    
    This will capture reclaimers. It will also capture tasks that called
    psi_memstall_enter() and are about to sleep, but this should be
    negligible noise.
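    The old and new PSI_MEM_FULL conditions from the diff above can be
    compared on the single-reclaimer scenario. The enum below is a stand-in
    (its ordering and the plain arrays are assumptions, and unlikely() is
    dropped); only the two boolean expressions follow the patch.

```c
#include <stdbool.h>

/* Stand-in task counter indices; ordering is illustrative only. */
enum { NR_RUNNING, NR_MEMSTALL, NR_MEMSTALL_RUNNING, NR_PSI_TASK_COUNTS };

/* Before the fix: FULL only when nothing is running at all. */
static bool mem_full_old(const unsigned int tasks[])
{
	return tasks[NR_MEMSTALL] && !tasks[NR_RUNNING];
}

/* After the fix: FULL when every running task is itself in a memstall. */
static bool mem_full_new(const unsigned int tasks[])
{
	return tasks[NR_MEMSTALL] &&
	       tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING];
}
```

    A lone reclaimer (one running task that is also in memstall) is not
    FULL under the old check but is FULL under the new one, while adding a
    second, productive running task makes the new check report not FULL.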
    
    Signed-off-by: Brian Chen <brianchen118@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Link: https://lore.kernel.org/r/20211110213312.310243-1-brianchen118@gmail.com
    brianc118 authored and Peter Zijlstra committed Nov 17, 2021
  2. sched/core: Forced idle accounting

    Adds accounting for "forced idle" time, which is time where a cookie'd
    task forces its SMT sibling to idle, despite the presence of runnable
    tasks.
    
    Forced idle time is one means to measure the cost of enabling core
    scheduling (ie. the capacity lost due to the need to force idle).
    
    Forced idle time is attributed to the thread responsible for causing
    the forced idle.
    
    A few details:
     - Forced idle time is displayed via /proc/PID/sched. It also requires
       that schedstats is enabled.
     - Forced idle is only accounted when a sibling hyperthread is held
       idle despite the presence of runnable tasks. No time is charged if
       a sibling is idle but has no runnable tasks.
     - Tasks with 0 cookie are never charged forced idle.
     - For SMT > 2, we scale the amount of forced idle charged based on the
       number of forced idle siblings. Additionally, we split the time up and
       evenly charge it to all running tasks, as each is equally responsible
       for the forced idle.
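    A sketch of the SMT > 2 scaling rule as described above, assuming the
    charge per running task is the interval multiplied by the number of
    forced-idle siblings and then split evenly among the running tasks. The
    helper name and exact formula are assumptions; the kernel code may
    round or account differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper: forced idle time (ns) to charge to each running
 * cookie'd task on a core for an interval of @delta_ns, given how many
 * siblings were forced idle and how many tasks were running.
 */
static uint64_t demo_forceidle_charge(uint64_t delta_ns,
				      unsigned int nr_forced_idle,
				      unsigned int nr_running)
{
	if (nr_forced_idle == 0 || nr_running == 0)
		return 0; /* no sibling was held idle, so nothing to charge */

	/* Scale by forced-idle siblings, then split evenly across runners. */
	return delta_ns * nr_forced_idle / nr_running;
}
```

    On SMT-2 with one running task and one forced-idle sibling this charges
    the full interval; on SMT-4 with one runner and three forced-idle
    siblings it charges three times the interval.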
    
    Signed-off-by: Josh Don <joshdon@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20211018203428.2025792-1-joshdon@google.com
    Josh Don authored and Peter Zijlstra committed Nov 17, 2021
  3. psi: Add a missing SPDX license header

    Add the missing SPDX license header to
    include/linux/psi.h
    include/linux/psi_types.h
    kernel/sched/psi.c
    
    Signed-off-by: Liu Xinpeng <liuxp11@chinatelecom.cn>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Link: https://lore.kernel.org/r/1635133586-84611-2-git-send-email-liuxp11@chinatelecom.cn
    liuxp11 authored and Peter Zijlstra committed Nov 17, 2021
  4. psi: Remove repeated verbose comment

    In the comment in function psi_task_switch, the same line appears twice.
    ...
    * runtime state, the cgroup that contains both tasks
    * runtime state, the cgroup that contains both tasks
    ...
    
    Signed-off-by: Liu Xinpeng <liuxp11@chinatelecom.cn>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Link: https://lore.kernel.org/r/1635133586-84611-1-git-send-email-liuxp11@chinatelecom.cn
    liuxp11 authored and Peter Zijlstra committed Nov 17, 2021

Commits on Nov 14, 2021

  1. Linux 5.16-rc1

    torvalds committed Nov 14, 2021
  2. kconfig: Add support for -Wimplicit-fallthrough

    Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang.
    
    The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH,
    which is enabled by default.
    
    Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This
    bugfix only appears in Clang 14.0.0, so older versions still contain
    the bug and -Wimplicit-fallthrough won't be enabled for them, for now.
    
    This concludes a long journey and now we are finally getting rid
    of the unintentional fallthrough bug-class in the kernel, entirely. :)
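    With -Wimplicit-fallthrough enabled, a case that runs into the next one
    must be annotated. The sketch below defines a fallthrough macro modeled
    on the kernel's pseudo-keyword (guarded by __has_attribute) and uses it
    in a hypothetical switch; compiling it with -Wimplicit-fallthrough
    should not warn.

```c
/* Modeled on the kernel's definition; falls back to a no-op statement. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: level N also performs the work of levels below it. */
static int demo_steps_from(int level)
{
	int steps = 0;

	switch (level) {
	case 3:
		steps++;
		fallthrough; /* intentional: level 3 also does level 2 work */
	case 2:
		steps++;
		fallthrough;
	case 1:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```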
    
    Link: llvm/llvm-project@9ed4a94 [1]
    Link: https://bugs.llvm.org/show_bug.cgi?id=51094 [2]
    Link: KSPP#115
    Link: ClangBuiltLinux#236
    Co-developed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Reviewed-by: Nathan Chancellor <nathan@kernel.org>
    Tested-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    GustavoARSilva authored and torvalds committed Nov 14, 2021
  3. Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
    
    Pull xfs cleanups from Darrick Wong:
     "The most 'exciting' aspect of this branch is that the xfsprogs
      maintainer and I have worked through the last of the code
      discrepancies between kernel and userspace libxfs such that there are
      no code differences between the two except for #includes.
    
      IOWs, diff suffices to demonstrate that the userspace tools behave the
      same as the kernel, and kernel-only bits are clearly marked in the
      /kernel/ source code instead of just the userspace source.
    
      Summary:
    
       - Clean up open-coded swap() calls.
    
       - A little bit of #ifdef golf to complete the reunification of the
         kernel and userspace libxfs source code"
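    For readers unfamiliar with the swap() cleanup mentioned above, here is
    a sketch of a swap() macro in the style of the kernel helper (written
    with __typeof__ so it builds outside the kernel) replacing an
    open-coded temporary-variable exchange. demo_reverse() is a
    hypothetical example, not xfs code.

```c
/* Exchange two lvalues of the same type, in the style of the kernel's swap(). */
#define swap(a, b) \
	do { __typeof__(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

/* Reverse @n ints in place; the tmp/assign/assign dance becomes swap(). */
static void demo_reverse(int *v, int n)
{
	for (int i = 0, j = n - 1; i < j; i++, j--)
		swap(v[i], v[j]);
}
```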
    
    * tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
      xfs: sync xfs_btree_split macros with userspace libxfs
      xfs: #ifdef out perag code for userspace
      xfs: use swap() to make dabtree code cleaner
    torvalds committed Nov 14, 2021
  4. Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
    
    Pull more parisc fixes from Helge Deller:
     "Fix a build error in stacktrace.c, fix resolving of addresses to
      function names in backtraces, fix single-stepping in assembly code and
      flush userspace pte's when using set_pte_at()"
    
    * tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
      parisc/entry: fix trace test in syscall exit path
      parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
      parisc: Fix implicit declaration of function '__kernel_text_address'
      parisc: Fix backtrace to always include init funtion names
    torvalds committed Nov 14, 2021
  5. Merge tag 'sh-for-5.16' of git://git.libc.org/linux-sh

    Pull arch/sh updates from Rich Felker.
    
    * tag 'sh-for-5.16' of git://git.libc.org/linux-sh:
      sh: pgtable-3level: Fix cast to pointer from integer of different size
      sh: fix READ/WRITE redefinition warnings
      sh: define __BIG_ENDIAN for math-emu
      sh: math-emu: drop unused functions
      sh: fix kconfig unmet dependency warning for FRAME_POINTER
      sh: Cleanup about SPARSE_IRQ
      sh: kdump: add some attribute to function
      maple: fix wrong return value of maple_bus_init().
      sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/
      sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y
      sh: boards: Fix the cacography in irq.c
      sh: check return code of request_irq
      sh: fix trivial misannotations
    torvalds committed Nov 14, 2021
  6. Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

    Pull ARM fixes from Russell King:
    
     - Fix early_iounmap
    
     - Drop cc-option fallbacks for architecture selection
    
    * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
      ARM: 9156/1: drop cc-option fallbacks for architecture selection
      ARM: 9155/1: fix early early_iounmap()
    torvalds committed Nov 14, 2021
  7. Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
    
    Pull devicetree fixes from Rob Herring:
    
     - Two fixes due to DT node name changes on Arm, Ltd. boards
    
     - Treewide rename of Ingenic CGU headers
    
     - Update ST email addresses
    
     - Remove Netlogic DT bindings
    
     - Dropping few more cases of redundant 'maxItems' in schemas
    
     - Convert toshiba,tc358767 bridge binding to schema
    
    * tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
      dt-bindings: watchdog: sunxi: fix error in schema
      bindings: media: venus: Drop redundant maxItems for power-domain-names
      dt-bindings: Remove Netlogic bindings
      clk: versatile: clk-icst: Ensure clock names are unique
      of: Support using 'mask' in making device bus id
      dt-bindings: treewide: Update @st.com email address to @foss.st.com
      dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml
      dt-bindings: media: Update maintainers for st,stm32-cec.yaml
      dt-bindings: mfd: timers: Update maintainers for st,stm32-timers
      dt-bindings: timer: Update maintainers for st,stm32-timer
      dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz
      dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml
      dt-bindings: Rename Ingenic CGU headers to ingenic,*.h
    torvalds committed Nov 14, 2021
  8. Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull timer fix from Thomas Gleixner:
     "A single fix for POSIX CPU timers to address a problem where POSIX CPU
      timer delivery stops working for a new child task because
      copy_process() copies state information which is only valid for the
      parent task"
    
    * tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
    torvalds committed Nov 14, 2021
  9. Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull irq fixes from Thomas Gleixner:
     "A set of fixes for the interrupt subsystem
    
      Core code:
    
       - A regression fix for the Open Firmware interrupt mapping code where
         an interrupt-controller property in a node caused a map property in
         the same node to be ignored.
    
      Interrupt chip drivers:
    
       - Work around a limitation in the SiFive PLIC interrupt chip which
         silently ignores an EOI when the interrupt line is masked.
    
       - Provide the missing mask/unmask implementation for the CSKY MP
         interrupt controller.
    
      PCI/MSI:
    
       - Prevent a use after free when PCI/MSI interrupts are released by
         destroying the sysfs entries before freeing the memory which is
         accessed in the sysfs show() function.
    
       - Implement a mask quirk for the Nvidia ION AHCI chip which does not
         advertise masking capability despite implementing it. Even worse,
         the chip comes out of reset with all MSI entries masked, which, due
         to the missing masking capability, never get unmasked.
    
       - Move the check which prevents accessing the MSI[X] masking for XEN
         back into the low level accessors. The recent consolidation missed
         that these accessors can be invoked from places which do not have
         that check which broke XEN. Move them back to the original place
         instead of sprinkling tons of these checks all over the code"
    
    * tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      of/irq: Don't ignore interrupt-controller when interrupt-map failed
      irqchip/sifive-plic: Fixup EOI failed when masked
      irqchip/csky-mpintc: Fixup mask/unmask implementation
      PCI/MSI: Destroy sysfs before freeing entries
      PCI: Add MSI masking quirk for Nvidia ION AHCI
      PCI/MSI: Deal with devices lying about their MSI mask capability
      PCI/MSI: Move non-mask check back into low level accessors
    torvalds committed Nov 14, 2021
  10. Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull x86 static call update from Thomas Gleixner:
     "A single fix for static calls to make the trampoline patching more
      robust by placing explicit signature bytes after the call trampoline
      to prevent patching random other jumps like the CFI jump table
      entries"
    
    * tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      static_call,x86: Robustify trampoline patching
    torvalds committed Nov 14, 2021
  11. Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull scheduler fixes from Borislav Petkov:
    
     - Avoid touching ~100 config files in order to be able to select the
       preemption model
    
     - clear cluster CPU masks too, on the CPU unplug path
    
     - prevent use-after-free in cfs
    
     - Prevent a race condition when updating CPU cache domains
    
     - Factor out the shared part of smp_prepare_cpus() into a common
       helper which can be called by both baremetal and Xen, in order to fix
       booting of Xen PV guests
    
    * tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      preempt: Restore preemption model selection configs
      arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
      sched/fair: Prevent dead task groups from regaining cfs_rq's
      sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
      x86/smp: Factor out parts of native_smp_prepare_cpus()
    torvalds committed Nov 14, 2021
  12. Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull perf fixes from Borislav Petkov:
    
     - Prevent unintentional page sharing by checking whether a page
       reference to a PMU samples page has been acquired properly before
       releasing it
    
     - Make sure the LBR_SELECT MSR is saved/restored too
    
     - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any
       residual data left
    
    * tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf/core: Avoid put_page() when GUP fails
      perf/x86/vlbr: Add c->flags to vlbr event constraints
      perf/x86/lbr: Reset LBR_SELECT during vlbr reset
    torvalds committed Nov 14, 2021
  13. Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull x86 fixes from Borislav Petkov:
    
     - Add the model number of a new Raptor Lake CPU to intel-family.h
    
     - Do not log spurious corrected MCEs on SKL too, due to an erratum
    
     - Clarify the path of paravirt ops patches upstream
    
     - Add an optimization to avoid writing out AMX components to sigframes
       when they are in init state
    
    * tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/cpu: Add Raptor Lake to Intel family
      x86/mce: Add errata workaround for Skylake SKX37
      MAINTAINERS: Add some information to PARAVIRT_OPS entry
      x86/fpu: Optimize out sigframe xfeatures when in init state
    torvalds committed Nov 14, 2021
  14. Merge tag 'perf-tools-for-v5.16-2021-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
    
    Pull more perf tools updates from Arnaldo Carvalho de Melo:
     "Hardware tracing:
    
       - ARM:
          * Print the buffer size consistently in hexadecimal in
            ARM Coresight.
          * Add Coresight snapshot mode support.
          * Update --switch-events docs in 'perf record'.
          * Support hardware-based PID tracing.
          * Track task context switch for cpu-mode events.
    
       - Vendor events:
          * Add metric events JSON file for power10 platform
    
      perf test:
    
       - Get 'perf test' unit tests closer to kunit.
    
       - Topology tests improvements.
    
       - Remove bashisms from some tests.
    
      perf bench:
    
       - Fix memory leak of perf_cpu_map__new() in the futex benchmarks.
    
      libbpf:
    
       - Add some more weak libbpf functions to allow building with the
         older libbpf versions present in distros.
    
      libbeauty:
    
       - Translate [gs]setsockopt 'level' argument integer values to
         strings.
    
      tools headers UAPI:
    
       - Sync futex_waitv, arch prctl, sound, i915_drm and msr-index files
         with the kernel sources.
    
      Documentation:
    
       - Add documentation to 'struct symbol'.
    
       - Synchronize the definition of enum perf_hw_id with code in
         tools/perf/design.txt"
    
    * tag 'perf-tools-for-v5.16-2021-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (67 commits)
      perf tests: Remove bash constructs from stat_all_pmu.sh
      perf tests: Remove bash construct from record+zstd_comp_decomp.sh
      perf test: Remove bash construct from stat_bpf_counters.sh test
      perf bench futex: Fix memory leak of perf_cpu_map__new()
      tools arch x86: Sync the msr-index.h copy with the kernel sources
      tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
      tools headers UAPI: Sync sound/asound.h with the kernel sources
      tools headers UAPI: Sync linux/prctl.h with the kernel sources
      tools headers UAPI: Sync arch prctl headers with the kernel sources
      perf tools: Add more weak libbpf functions
      perf bpf: Avoid memory leak from perf_env__insert_btf()
      perf symbols: Factor out annotation init/exit
      perf symbols: Bit pack to save a byte
      perf symbols: Add documentation to 'struct symbol'
      tools headers UAPI: Sync files changed by new futex_waitv syscall
      perf test bpf: Use ARRAY_CHECK() instead of ad-hoc equivalent, addressing array_size.cocci warning
      perf arm-spe: Support hardware-based PID tracing
      perf arm-spe: Save context ID in record
      perf arm-spe: Update --switch-events docs in 'perf record'
      perf arm-spe: Track task context switch for cpu-mode events
      ...
    torvalds committed Nov 14, 2021
  15. Merge tag 'irqchip-fixes-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
    
    Pull irqchip fixes from Marc Zyngier:
    
      - Address an issue with the SiFive PLIC being unable to EOI
        a masked interrupt
    
      - Move the disable/enable methods in the CSky mpintc to
        mask/unmask
    
      - Fix a regression in the OF irq code where an interrupt-controller
        property in the same node as an interrupt-map property would get
        ignored
    
    Link: https://lore.kernel.org/all/20211112173459.4015233-1-maz@kernel.org
    Thomas Gleixner committed Nov 14, 2021

Commits on Nov 13, 2021

  1. Merge tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux

    Pull zstd update from Nick Terrell:
     "Update to zstd-1.4.10.
    
      Add myself as the maintainer of zstd and update the zstd version in
      the kernel, which is now 4 years out of date, to a much more recent
      zstd release. This includes bug fixes, much more extensive fuzzing,
      and performance improvements. And generates the kernel zstd
      automatically from upstream zstd, so it is easier to keep the zstd
      version up to date, and we don't fall so far out of date again.
    
      This includes 5 commits that update the zstd library version:
    
       - Adds a new kernel-style wrapper around zstd.
    
         This wrapper API is functionally equivalent to the subset of the
         current zstd API that is currently used. The wrapper API uses
         kernel-style names so that its symbols don't collide with zstd's
         symbols. The update to zstd-1.4.10 maintains the same API and
         preserves the semantics, so that none of the callers need to be
         updated. All callers are switched to the wrapper API in this
         commit, which has zero functional changes.
    
       - Adds an indirection for `lib/decompress_unzstd.c` so it doesn't
         depend on the layout of `lib/zstd/` to include every source file.
         This allows the next patch to be automatically generated.
    
       - Imports the zstd-1.4.10 source code. This commit is automatically
         generated from upstream zstd (https://github.com/facebook/zstd).
    
       - Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`.
    
       - Fixes a newly added build warning for clang.
    
      The discussion around this patchset has been pretty long, so I've
      included a FAQ-style summary of the history of the patchset, and why
      we are taking this approach.
    
      Why do we need to update?
      -------------------------
    
      The zstd version in the kernel is based off of zstd-1.3.1, which was
      released on August 20, 2017. Since then zstd has seen many bug fixes
      and performance improvements. And, importantly, upstream zstd is
      continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to
      older versions. So the only way to sanely get these fixes is to keep
      up to date with upstream zstd.
    
      There are no known security issues that affect the kernel, but we need
      to be able to update in case there are. And while there are no known
      security issues, there are relevant bug fixes. For example the problem
      with large kernel decompression has been fixed upstream for over 2
      years [1].
    
      Additionally the performance improvements for kernel use cases are
      significant. Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz:
    
       - BtrFS zstd compression at levels 1 and 3 is 5% faster
    
       - BtrFS zstd decompression+read is 15% faster
    
       - SquashFS zstd decompression+read is 15% faster
    
       - F2FS zstd compression+write at level 3 is 8% faster
    
       - F2FS zstd decompression+read is 20% faster
    
       - ZRAM decompression+read is 30% faster
    
       - Kernel zstd decompression is 35% faster
    
       - Initramfs zstd decompression+build is 5% faster
    
      On top of this, there are significant performance improvements coming
      down the line in the next zstd release, and the new automated update
      patch generation will allow us to pull them easily.
    
      How is the update patch generated?
      ----------------------------------
    
      The first two patches are preparation for updating the zstd version.
      Then the 3rd patch in the series imports upstream zstd into the
      kernel. This patch is automatically generated from upstream. A script
      makes the necessary changes and imports it into the kernel. The
      changes are:
    
       - Replace all libc dependencies with kernel replacements and rewrite
         includes.
    
       - Remove unnecessary portability macros like #if defined(_MSC_VER).
    
       - Use the kernel xxhash instead of bundling it.
    
      This automation gets tested every commit by upstream's continuous
      integration. When we cut a new zstd release, we will submit a patch to
      the kernel to update the zstd version in the kernel.
    
      The automated process makes it easy to keep the kernel version of zstd
      up to date. The current zstd in the kernel shares the guts of the
      code, but has a lot of API and minor changes to work in the kernel.
      This is because at the time upstream zstd was not ready to be used in
      the kernel environment as-is. But since then, upstream zstd has
      evolved to support being used in the kernel as-is.
    
      Why are we updating in one big patch?
      -------------------------------------
    
      The 3rd patch in the series is very large. This is because it is
      restructuring the code, so it both deletes the existing zstd, and
      re-adds the new structure. Future updates will be directly
      proportional to the changes in upstream zstd since the last import.
      They will admittedly be large, as zstd is an actively developed
      project, and has hundreds of commits between every release. However,
      there is no other great alternative.
    
      One option ruled out is to replay every upstream zstd commit. This is
      not feasible for several reasons:
    
       - There are over 3500 upstream commits since the zstd version in the
         kernel.
    
       - The automation that generates the kernel update was only added
         recently, so older commits cannot easily be imported.
    
       - Not every upstream zstd commit builds.
    
       - Only zstd releases are "supported", and individual commits may have
         bugs that were fixed before a release.
    
      Another option to reduce the patch size would be to first reorganize
      to the new file structure, and then apply the patch. However, the
      current kernel zstd is formatted with clang-format to be more
      "kernel-like". But, the new method imports zstd as-is, without
      additional formatting, to allow for closer correlation with upstream,
      and easier debugging. So the patch wouldn't be any smaller.
    
      It also doesn't make sense to import upstream zstd commit by commit
      going forward. Upstream zstd doesn't support production use cases
      running off the development branch. We have a lot of post-commit
      fuzzing that catches many bugs, so individual commits may be buggy,
      but fixed before a release. So going forward, I intend to import every
      (important) zstd release into the Kernel.
    
      So, while it isn't ideal, updating in one big patch is the only path
      I see forward.
    
      Who is responsible for this code?
      ---------------------------------
    
      I am. This patchset adds me as the maintainer for zstd. Previously,
      there was no tree for zstd patches. Because of that, there were
      several patches that either got ignored, or took a long time to merge,
      since it wasn't clear which tree should pick them up. I'm officially
      stepping up as maintainer, and setting up my tree as the path through
      which zstd patches get merged. I'll make sure that patches to the
      kernel zstd get ported upstream, so they aren't erased when the next
      version update happens.
    
      How is this code tested?
      ------------------------
    
      I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS,
      Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and
      aarch64. I checked both performance and correctness.
    
      Also, thanks to many people in the community who have tested these
      patches locally.
    
      Lastly, this code will bake in linux-next before being merged into
      v5.16.
    
      Why update to zstd-1.4.10 when zstd-1.5.0 has been released?
      ------------------------------------------------------------
    
      This patchset has been outstanding since 2020, and zstd-1.4.10 was the
      latest release when it was created. Since the update patch is
      automatically generated from upstream, I could generate it from
      zstd-1.5.0.
    
      However, there were some large stack usage regressions in zstd-1.5.0,
      which are only fixed in the latest development branch. And the latest
      development branch contains some new code that needs to bake in the
      fuzzer before I would feel comfortable releasing to the kernel.
    
      Once this patchset has been merged, and we've released zstd-1.5.1, we
      can update the kernel to zstd-1.5.1, and exercise the update process.
    
      You may notice that zstd-1.4.10 doesn't exist upstream. This release
      is an artificial release based off of zstd-1.4.9, with some fixes for
      the kernel backported from the development branch. I will tag the
      zstd-1.4.10 release after this patchset is merged, so the Linux Kernel
      is running a known version of zstd that can be debugged upstream.
    
      Why was a wrapper API added?
      ----------------------------
    
      The first versions of this patchset migrated the kernel to the
      upstream zstd API. It first added a shim API that supported the new
      upstream API with the old code, then updated callers to use the new
      shim API, then transitioned to the new code and deleted the shim API.
      However, Christoph Hellwig suggested that we transition to a kernel
      style API, and hide zstd's upstream API behind that. This is because
      zstd's upstream API supports many other use cases, and does not
      follow the kernel style guide, while the kernel API is focused on the
      kernel's use cases, and follows the kernel style guide.
    
      Where is the previous discussion?
      ---------------------------------
    
      Links for the discussions of the previous versions of the patch set
      below. The largest changes in the design of the patchset are driven by
      the discussions in v11, v5, and v1. Sorry for the mix of links, I
      couldn't find most of the threads on lkml.org"
    
    Link: https://lkml.org/lkml/2020/9/29/27 [1]
    Link: https://www.spinics.net/lists/linux-crypto/msg58189.html [v12]
    Link: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/ [v11]
    Link: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/ [v10]
    Link: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/ [v9]
    Link: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/ [v8]
    Link: https://lkml.org/lkml/2020/12/3/1195 [v7]
    Link: https://lkml.org/lkml/2020/12/2/1245 [v6]
    Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v5]
    Link: https://www.spinics.net/lists/linux-btrfs/msg105783.html [v4]
    Link: https://lkml.org/lkml/2020/9/23/1074 [v3]
    Link: https://www.spinics.net/lists/linux-btrfs/msg105505.html [v2]
    Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v1]
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    Tested By: Paul Jones <paul@pauljones.id.au>
    Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
    Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
    
    * tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux:
      lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical
      MAINTAINERS: Add maintainer entry for zstd
      lib: zstd: Upgrade to latest upstream zstd version 1.4.10
      lib: zstd: Add decompress_sources.h for decompress_unzstd
      lib: zstd: Add kernel-specific API
    torvalds committed Nov 13, 2021
  2. Merge tag 'virtio-mem-for-5.16' of git://github.com/davidhildenbrand/linux
    
    Pull virtio-mem update from David Hildenbrand:
     "Support the VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE feature in virtio-mem,
      now that "accidental" access to logically unplugged memory inside
      added Linux memory blocks is no longer possible, because we:
    
       - Removed /dev/kmem in commit bbcd53c ("drivers/char: remove
         /dev/kmem for good")
    
       - Disallowed access to virtio-mem device memory via /dev/mem in
         commit 2128f4e ("virtio-mem: disallow mapping virtio-mem memory
         via /dev/mem")
    
       - Sanitized access to virtio-mem device memory via /proc/kcore in
         commit 0daa322 ("fs/proc/kcore: don't read offline sections,
         logically offline pages and hwpoisoned pages")
    
       - Sanitized access to virtio-mem device memory via /proc/vmcore in
         commit ce28146 ("virtio-mem: kdump mode to sanitize
         /proc/vmcore access")
    
      The new VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE feature will be
      required by some hypervisors implementing virtio-mem in the near
      future, so let's support it now that we safely can"
    
    * tag 'virtio-mem-for-5.16' of git://github.com/davidhildenbrand/linux:
      virtio-mem: support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
    torvalds committed Nov 13, 2021
  3. perf tests: Remove bash constructs from stat_all_pmu.sh

    The tests were passing without actually testing anything, and were
    printing the following:
    
      $ ./perf test -v 90
      90: perf all PMU test                                               :
      --- start ---
      test child forked, pid 51650
      Testing cpu/branch-instructions/
      ./tests/shell/stat_all_pmu.sh: 10: [:
       Performance counter stats for 'true':
    
                 137,307      cpu/branch-instructions/
    
             0.001686672 seconds time elapsed
    
             0.001376000 seconds user
             0.000000000 seconds sys: unexpected operator
    
    Replacing the regex matches with grep works in sh and prints this:
    
      $ ./perf test -v 90
      90: perf all PMU test                                               :
      --- start ---
      test child forked, pid 60186
      [...]
      Testing tlb_flush.stlb_any
      test child finished with 0
      ---- end ----
      perf all PMU test: Ok
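    The fix replaces bash regex matching, which POSIX sh does not support,
    with grep. A minimal sketch of that pattern, assuming the test has
    captured the perf stat output in a variable; the helper name
    event_counted is hypothetical and not taken from stat_all_pmu.sh:

```shell
#!/bin/sh
# Hypothetical helper (not the actual stat_all_pmu.sh code): succeed if the
# event name appears in the captured perf stat output. grep -F treats the
# event name as a fixed string, so '/' or '.' in PMU event names such as
# tlb_flush.stlb_any are not interpreted as regex syntax.
event_counted() {
  # $1: captured perf stat output, $2: event name to look for
  printf '%s\n' "$1" | grep -q -F -- "$2"
}

result='           137,307      cpu/branch-instructions/'
if event_counted "$result" "cpu/branch-instructions/"; then
  echo "cpu/branch-instructions/ was counted"
else
  echo "cpu/branch-instructions/ missing from output"
fi
```

    Because it only uses printf, a pipe and grep, the helper behaves the
    same under dash, bash or busybox sh.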
    
    Signed-off-by: James Clark <james.clark@arm.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Florian Fainelli <f.fainelli@gmail.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: John Fastabend <john.fastabend@gmail.com>
    Cc: KP Singh <kpsingh@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Martin KaFai Lau <kafai@fb.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Song Liu <songliubraving@fb.com>
    Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
    Cc: Thomas Richter <tmricht@linux.ibm.com>
    Cc: Yonghong Song <yhs@fb.com>
    Cc: bpf@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Link: https://lore.kernel.org/r/20211028134828.65774-4-james.clark@arm.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    James-A-Clark authored and Arnaldo Carvalho de Melo committed Nov 13, 2021
  4. perf tests: Remove bash construct from record+zstd_comp_decomp.sh

    Commit 463538a ("perf tests: Fix test 68 zstd compression for
    s390") inadvertently removed the -g flag from all platforms rather than
    just s390, because the [[ ]] construct fails in sh. Changing to single
    brackets restores testing of call graphs and removes the following error
    from the output:
    
      $ ./perf test -v 85
      85: Zstd perf.data compression/decompression                        :
      --- start ---
      test child forked, pid 50643
      Collecting compressed record file:
      ./tests/shell/record+zstd_comp_decomp.sh: 15: [[: not found
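    Under a POSIX sh such as dash, "[[" is looked up as an ordinary
    command, fails with "not found" (exit status 127), and any branch that
    only runs when the test succeeds is silently skipped, which is how -g
    got dropped everywhere. A minimal sketch of the single-bracket form,
    assuming `uname -m` reports s390x on s390; callgraph_flag is a
    hypothetical helper, not code from record+zstd_comp_decomp.sh:

```shell
#!/bin/sh
# Hypothetical helper (not the actual test code): print the call-graph flag
# to pass to perf record, omitting it on s390x. Single brackets are POSIX,
# so dash and other plain sh shells accept them.
callgraph_flag() {
  # $1: machine name as printed by `uname -m`
  if [ "$1" != "s390x" ]; then
    printf '%s\n' "-g"
  fi
}

echo "x86_64: '$(callgraph_flag x86_64)'"
echo "s390x:  '$(callgraph_flag s390x)'"
echo "this machine: '$(callgraph_flag "$(uname -m)")'"
```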
    
    Fixes: 463538a ("perf tests: Fix test 68 zstd compression for s390")
    Signed-off-by: James Clark <james.clark@arm.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Florian Fainelli <f.fainelli@gmail.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: John Fastabend <john.fastabend@gmail.com>
    Cc: KP Singh <kpsingh@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Martin KaFai Lau <kafai@fb.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Song Liu <songliubraving@fb.com>
    Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
    Cc: Thomas Richter <tmricht@linux.ibm.com>
    Cc: Yonghong Song <yhs@fb.com>
    Cc: bpf@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Link: https://lore.kernel.org/r/20211028134828.65774-3-james.clark@arm.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    James-A-Clark authored and Arnaldo Carvalho de Melo committed Nov 13, 2021
  5. perf test: Remove bash construct from stat_bpf_counters.sh test

    Currently the test skips with an error because == only works in bash:
    
      $ ./perf test 91 -v
      Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
      91: perf stat --bpf-counters test                                   :
      --- start ---
      test child forked, pid 44586
      ./tests/shell/stat_bpf_counters.sh: 26: [: -v: unexpected operator
      test child finished with -2
      ---- end ----
      perf stat --bpf-counters test: Skip
    
    Changing == to = does the same thing, but doesn't result in an error:
    
      $ ./perf test 91 -v
      Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
      91: perf stat --bpf-counters test                                   :
      --- start ---
      test child forked, pid 45833
      Skipping: --bpf-counters not supported
        Error: unknown option `bpf-counters'
      [...]
      test child finished with -2
      ---- end ----
      perf stat --bpf-counters test: Skip
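    POSIX test(1) defines "=" for string equality; "==" is a bash extension
    that dash's [ rejects with "unexpected operator". A minimal sketch of a
    portable argument check; has_verbose_flag is a hypothetical helper, not
    code from stat_bpf_counters.sh:

```shell
#!/bin/sh
# Hypothetical helper (not the actual stat_bpf_counters.sh code): succeed if
# "-v" is among the arguments, comparing with the POSIX "=" operator.
has_verbose_flag() {
  for arg in "$@"; do
    if [ "$arg" = "-v" ]; then
      return 0
    fi
  done
  return 1
}

if has_verbose_flag "$@"; then
  echo "verbose"
else
  echo "quiet"
fi
```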
    
    Signed-off-by: James Clark <james.clark@arm.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Florian Fainelli <f.fainelli@gmail.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: John Fastabend <john.fastabend@gmail.com>
    Cc: KP Singh <kpsingh@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Martin KaFai Lau <kafai@fb.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Song Liu <songliubraving@fb.com>
    Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
    Cc: Thomas Richter <tmricht@linux.ibm.com>
    Cc: Yonghong Song <yhs@fb.com>
    Cc: bpf@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Link: https://lore.kernel.org/r/20211028134828.65774-2-james.clark@arm.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    James-A-Clark authored and Arnaldo Carvalho de Melo committed Nov 13, 2021
  6. perf bench futex: Fix memory leak of perf_cpu_map__new()

    ASan reports memory leaks while running:
    
      $ sudo ./perf bench futex all
    
    The leaks are caused by the map returned by perf_cpu_map__new() not
    being freed. This patch adds the missing perf_cpu_map__put(), which
    frees the map by calling cpu_map_delete implicitly.
    
    Fixes: 9c3516d ("libperf: Add perf_cpu_map__new()/perf_cpu_map__read() functions")
    Signed-off-by: Sohaib Mohamed <sohaib.amhmd@gmail.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: André Almeida <andrealmeid@collabora.com>
    Cc: Darren Hart <dvhart@infradead.org>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lore.kernel.org/lkml/20211112201134.77892-1-sohaib.amhmd@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    smalinux authored and Arnaldo Carvalho de Melo committed Nov 13, 2021
  7. tools arch x86: Sync the msr-index.h copy with the kernel sources

    To pick up the changes in:
    
      dae1bd5 ("x86/msr-index: Add MSRs for XFD")
    
    Addressing these tools/perf build warnings:
    
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
    
    That makes the beautification scripts pick up some new entries:
    
      $ diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      --- tools/arch/x86/include/asm/msr-index.h	2021-07-15 16:17:01.819817827 -0300
      +++ arch/x86/include/asm/msr-index.h	2021-11-06 15:49:33.738517311 -0300
      @@ -625,6 +625,8 @@
    
       #define MSR_IA32_BNDCFGS_RSVD		0x00000ffc
    
      +#define MSR_IA32_XFD			0x000001c4
      +#define MSR_IA32_XFD_ERR		0x000001c5
       #define MSR_IA32_XSS			0x00000da0
    
       #define MSR_IA32_APICBASE		0x0000001b
      $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > /tmp/before
      $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
      $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > /tmp/after
      $ diff -u /tmp/before /tmp/after
      --- /tmp/before	2021-11-13 11:10:39.964201505 -0300
      +++ /tmp/after	2021-11-13 11:10:47.902410873 -0300
      @@ -93,6 +93,8 @@
       	[0x000001b0] = "IA32_ENERGY_PERF_BIAS",
       	[0x000001b1] = "IA32_PACKAGE_THERM_STATUS",
       	[0x000001b2] = "IA32_PACKAGE_THERM_INTERRUPT",
      +	[0x000001c4] = "IA32_XFD",
      +	[0x000001c5] = "IA32_XFD_ERR",
       	[0x000001c8] = "LBR_SELECT",
       	[0x000001c9] = "LBR_TOS",
       	[0x000001d9] = "IA32_DEBUGCTLMSR",
      $
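    The diff above shows the header's #define MSR_* lines reappearing as
    entries in the generated table that perf trace uses to map MSR numbers
    to names (0x1c4 to IA32_XFD) and back for --filter. A simplified
    sketch of that extraction step, assuming plain "#define MSR_<NAME>
    <hex>" lines; msr_table is hypothetical and the real x86_msr.sh differs:

```shell
#!/bin/sh
# Hypothetical, simplified sketch of the idea behind x86_msr.sh (the real
# script differs): turn "#define MSR_<NAME> <hex>" lines read on stdin into
# '[<hex>] = "<NAME>",' table entries, dropping the MSR_ prefix. Lines whose
# value is not a plain hex literal are ignored.
msr_table() {
  sed -n 's/^#define[[:space:]]\{1,\}MSR_\([A-Za-z0-9_]\{1,\}\)[[:space:]]\{1,\}\(0x[0-9a-fA-F]\{1,\}\).*$/[\2] = "\1",/p'
}

printf '#define MSR_IA32_XFD\t\t\t0x000001c4\n#define MSR_IA32_XFD_ERR\t\t0x000001c5\n' | msr_table
```

    With the two new defines as input this prints the two entries seen in
    the diff, [0x000001c4] = "IA32_XFD", and [0x000001c5] = "IA32_XFD_ERR",
    each on its own line.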
    
    And this gets rebuilt:
    
      CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
      INSTALL  trace_plugins
      LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
      LD       /tmp/build/perf/trace/beauty/perf-in.o
      LD       /tmp/build/perf/perf-in.o
      LINK     /tmp/build/perf/perf
    
    Now one can trace systemwide asking to see backtraces to where those
    MSRs are being read/written with:
    
      # perf trace -e msr:*_msr/max-stack=32/ --filter="msr==IA32_XFD || msr==IA32_XFD_ERR"
      ^C#
      #
    
    If we use -v (verbose mode) we can see what it does behind the scenes:
    
      # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_XFD || msr==IA32_XFD_ERR"
      <SNIP>
      New filter for msr:read_msr: (msr==0x1c4 || msr==0x1c5) && (common_pid != 4448951 && common_pid != 8781)
      New filter for msr:write_msr: (msr==0x1c4 || msr==0x1c5) && (common_pid != 4448951 && common_pid != 8781)
      <SNIP>
      ^C#
    
    Example with a frequent msr:
    
      # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
      Using CPUID AuthenticAMD-25-21-0
      0x48
      New filter for msr:read_msr: (msr==0x48) && (common_pid != 3738351 && common_pid != 3564)
      0x48
      New filter for msr:write_msr: (msr==0x48) && (common_pid != 3738351 && common_pid != 3564)
      mmap size 528384B
      Looking at the vmlinux_path (8 entries long)
      symsrc__init: build id mismatch for vmlinux.
      Using /proc/kcore for kernel data
      Using /proc/kallsyms for symbols
           0.000 pipewire/2479 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             __switch_to_xtra ([kernel.kallsyms])
                                             __switch_to ([kernel.kallsyms])
                                             __schedule ([kernel.kallsyms])
                                             schedule ([kernel.kallsyms])
                                             schedule_hrtimeout_range_clock ([kernel.kallsyms])
                                             do_epoll_wait ([kernel.kallsyms])
                                             __x64_sys_epoll_wait ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                             epoll_wait (/usr/lib64/libc-2.33.so)
                                             [0x76c4] (/usr/lib64/spa-0.2/support/libspa-support.so)
                                             [0x4cf0] (/usr/lib64/spa-0.2/support/libspa-support.so)
           0.027 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             __switch_to_xtra ([kernel.kallsyms])
                                             __switch_to ([kernel.kallsyms])
                                             __schedule ([kernel.kallsyms])
                                             schedule_idle ([kernel.kallsyms])
                                             do_idle ([kernel.kallsyms])
                                             cpu_startup_entry ([kernel.kallsyms])
                                             start_kernel ([kernel.kallsyms])
                                             secondary_startup_64_no_verify ([kernel.kallsyms])
      #
    
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Chang S. Bae <chang.seok.bae@intel.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Link: https://lore.kernel.org/lkml/YY%2FJdb6on7swsn+C@kernel.org/
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Arnaldo Carvalho de Melo committed Nov 13, 2021
  8. tools headers UAPI: Sync drm/i915_drm.h with the kernel sources

    To pick up the changes in:
    
      e5e3217 ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
      9409eb3 ("drm/i915: Expose logical engine instance to user")
      ea673f1 ("drm/i915/uapi: Add comment clarifying purpose of I915_TILING_* values")
      d3ac8d4 ("drm/i915/pxp: interfaces for using protected objects")
      cbbd376 ("drm/i915/pxp: Create the arbitrary session after boot")
    
    These don't add any new ioctls, so there are no changes in tooling.
    
    This silences this perf build warning:
    
      Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
      diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
    
    Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: Huang, Sean Z <sean.z.huang@intel.com>
    Cc: John Harrison <John.C.Harrison@Intel.com>
    Cc: Matthew Brost <matthew.brost@intel.com>
    Cc: Matt Roper <matthew.d.roper@intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Arnaldo Carvalho de Melo committed Nov 13, 2021
  9. tools headers UAPI: Sync sound/asound.h with the kernel sources

    To pick up the changes in:
    
      5aec579 ("ALSA: uapi: Fix a C++ style comment in asound.h")
    
    That is just changing a // style comment to /* */.
    
    This silences this perf build warning:
    
      Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
      diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h
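    The warning fires because perf keeps its own copies of kernel ABI
    headers under tools/ and compares them with the originals at build
    time. A simplified sketch of that kind of check, assuming a plain
    byte-for-byte comparison; check_header is hypothetical, and the real
    logic in tools/perf/check-headers.sh differs:

```shell
#!/bin/sh
# Hypothetical, simplified sketch (the real check-headers.sh differs):
# compare the tools/ copy of a header with the kernel original and print
# the same style of warning when they do not match.
check_header() {
  # $1: copy under tools/, $2: original kernel header
  if ! cmp -s "$1" "$2"; then
    echo "Warning: Kernel ABI header at '$1' differs from latest version at '$2'"
    echo "diff -u $1 $2"
  fi
}

# Only meaningful when run from the top of a kernel source tree.
if [ -f include/uapi/sound/asound.h ]; then
  check_header tools/include/uapi/sound/asound.h include/uapi/sound/asound.h
fi
```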
    
    Cc: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Arnaldo Carvalho de Melo committed Nov 13, 2021
  10. tools headers UAPI: Sync linux/prctl.h with the kernel sources

    To pick up the changes in:
    
      61bc346 ("uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument")
    
    These don't result in any changes in tooling:
    
      $ tools/perf/trace/beauty/prctl_option.sh > before
      $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
      $ tools/perf/trace/beauty/prctl_option.sh > after
      $ diff -u before after
      $
    
    This just silences this perf tools build warning:
    
      Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
      diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h
    
    Cc: Christian Brauner <christian.brauner@ubuntu.com>
    Cc: Eugene Syromiatnikov <esyr@redhat.com>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Arnaldo Carvalho de Melo committed Nov 13, 2021