Commits on Feb 8, 2022

  1. perf report: add addr_from/addr_to sort dimensions

    With the existing symbol_from/symbol_to, branches captured in the same
    function would be collapsed into a single entry if the latencies associated
    with each branch (cycles) were all the same.  That is the case on Intel
    Broadwell, for instance. Since Intel Skylake, the latency is captured by
    hardware and therefore is used to disambiguate branches.
    
    Add addr_from/addr_to sort dimensions to sort branches based on their
    addresses and not the function they are in. The output is still the function
    name but the offset within the function is provided to uniquely identify each
    branch.  These new sort dimensions also help with annotate because they create
    different entries in the histogram which, in turn, generates proper branch
    annotations.
    
    Here is an example using AMD's branch sampling:
    
    $ perf record -a -b -c 1000037 -e cpu/branch-brs/ test_prg
    
    $ perf report
    Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
    Overhead  Command          Source Shared Object  Source Symbol                                   Target Symbol                                   Basic Block Cycle
      99.65%  test_prg         test_prg              [.] test_thread                                 [.] test_thread                                 -
       0.02%  test_prg         [kernel.vmlinux]      [k] asm_sysvec_apic_timer_interrupt             [k] error_entry                                 -
    
    $ perf report -F overhead,comm,dso,addr_from,addr_to
    Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
    Overhead  Command          Shared Object     Source Address          Target Address
       4.22%  test_prg         test_prg          [.] test_thread+0x3c    [.] test_thread+0x4
       4.13%  test_prg         test_prg          [.] test_thread+0x4     [.] test_thread+0x3a
       4.09%  test_prg         test_prg          [.] test_thread+0x3a    [.] test_thread+0x6
       4.08%  test_prg         test_prg          [.] test_thread+0x2     [.] test_thread+0x3c
       4.06%  test_prg         test_prg          [.] test_thread+0x3e    [.] test_thread+0x2
       3.87%  test_prg         test_prg          [.] test_thread+0x6     [.] test_thread+0x38
       3.84%  test_prg         test_prg          [.] test_thread         [.] test_thread+0x3e
       3.76%  test_prg         test_prg          [.] test_thread+0x1e    [.] test_thread
       3.76%  test_prg         test_prg          [.] test_thread+0x38    [.] test_thread+0x8
       3.56%  test_prg         test_prg          [.] test_thread+0x22    [.] test_thread+0x1e
       3.54%  test_prg         test_prg          [.] test_thread+0x8     [.] test_thread+0x36
       3.47%  test_prg         test_prg          [.] test_thread+0x1c    [.] test_thread+0x22
       3.45%  test_prg         test_prg          [.] test_thread+0x36    [.] test_thread+0xa
       3.28%  test_prg         test_prg          [.] test_thread+0x24    [.] test_thread+0x1c
       3.25%  test_prg         test_prg          [.] test_thread+0xa     [.] test_thread+0x34
       3.24%  test_prg         test_prg          [.] test_thread+0x1a    [.] test_thread+0x24
       3.20%  test_prg         test_prg          [.] test_thread+0x34    [.] test_thread+0xc
       3.04%  test_prg         test_prg          [.] test_thread+0x26    [.] test_thread+0x1a
       3.01%  test_prg         test_prg          [.] test_thread+0xc     [.] test_thread+0x32
       2.98%  test_prg         test_prg          [.] test_thread+0x18    [.] test_thread+0x26
       2.94%  test_prg         test_prg          [.] test_thread+0x32    [.] test_thread+0xe
       2.76%  test_prg         test_prg          [.] test_thread+0x28    [.] test_thread+0x18
       2.73%  test_prg         test_prg          [.] test_thread+0xe     [.] test_thread+0x30
       2.67%  test_prg         test_prg          [.] test_thread+0x30    [.] test_thread+0x10
       2.67%  test_prg         test_prg          [.] test_thread+0x16    [.] test_thread+0x28
       2.46%  test_prg         test_prg          [.] test_thread+0x10    [.] test_thread+0x2e
       2.44%  test_prg         test_prg          [.] test_thread+0x2a    [.] test_thread+0x16
       2.38%  test_prg         test_prg          [.] test_thread+0x14    [.] test_thread+0x2a
       2.32%  test_prg         test_prg          [.] test_thread+0x2e    [.] test_thread+0x12
       2.28%  test_prg         test_prg          [.] test_thread+0x12    [.] test_thread+0x2c
       2.16%  test_prg         test_prg          [.] test_thread+0x2c    [.] test_thread+0x14
       0.02%  test_prg         [kernel.vmlinux]  [k] asm_sysvec_apic_ti+0x5  [k] error_entry
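    
    The effect of the two kinds of sort keys can be sketched outside of perf.
    The snippet below is only an illustration: the sample list and the helper
    names are made up, and it does not reflect how perf builds its histogram
    internally. It shows that keying on the function name alone merges all
    branches of test_thread, while keying on symbol+offset keeps them apart.

```python
from collections import Counter

# Hypothetical branch samples as ((symbol, offset), (symbol, offset)) pairs.
# The values are made up for illustration, not taken from a real trace.
SAMPLES = [
    (("test_thread", 0x3C), ("test_thread", 0x04)),
    (("test_thread", 0x04), ("test_thread", 0x3A)),
    (("test_thread", 0x3C), ("test_thread", 0x04)),
    (("test_thread", 0x3A), ("test_thread", 0x06)),
]


def histogram_by_symbol(samples):
    """Key each branch on function names only (like symbol_from/symbol_to)."""
    return Counter((src[0], dst[0]) for src, dst in samples)


def histogram_by_address(samples):
    """Key each branch on symbol+offset (like addr_from/addr_to)."""

    def fmt(location):
        symbol, offset = location
        return f"{symbol}+{offset:#x}" if offset else symbol

    return Counter((fmt(src), fmt(dst)) for src, dst in samples)
```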
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  2. perf tools: Improve error handling of AMD Branch Sampling

    Improve the error message printed by perf when perf_event_open() fails on
    AMD Zen3 when using the branch sampling feature. In the case of EINVAL, there
    are two main causes: frequency mode is in use, or the period is smaller than
    the depth of the branch sampling buffer (16). The patch checks the parameters
    of the call and tries to print a relevant message to explain the error:
    
    $ perf record -b -e cpu/branch-brs/ -c 10 ls
    Error:
    AMD Branch Sampling does not support sampling period smaller than what is reported in /sys/devices/cpu/caps/branches.
    
    $ perf record -b -e cpu/branch-brs/ ls
    Error:
    AMD Branch Sampling does not support frequency mode sampling, must pass a fixed sampling period via -c option or cpu/branch-brs,period=xxxx/.
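    
    The decision logic can be sketched as follows. This is a simplified
    model, not the code of the patch: the function name, its parameters and
    the hint strings are hypothetical. Rejecting a period equal to the depth
    is an assumption, based on the requirement stated elsewhere in this series
    that the period be greater than 16.

```python
BRS_DEPTH = 16  # depth of the branch sampling buffer, per the commit message


def brs_einval_hint(freq_mode, sample_period, brs_depth=BRS_DEPTH):
    """Return a hint for an EINVAL failure, or None if neither cause applies.

    Hypothetical helper: the real perf tool inspects the attributes passed to
    perf_event_open(); here they are reduced to two plain arguments.
    """
    if freq_mode:
        return "frequency mode is not supported, pass a fixed period with -c"
    # Assumption: a period equal to the depth is rejected as well.
    if sample_period <= brs_depth:
        return f"sampling period {sample_period} is too small for BRS depth {brs_depth}"
    return None
```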
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    [Rebased on commit 9fe8895 ("perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings")]
    Signed-off-by: Kim Phillips <kim.phillips@amd.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  3. perf tools: Improve IBS error handling

    Improve the error message returned on failed perf_event_open() on AMD when
    using IBS.
    
    Output of executing 'perf record -e ibs_op// true' BEFORE this patch:
    
    The sys_perf_event_open() syscall returned with 22 (Invalid argument)for event (ibs_op//u).
    /bin/dmesg | grep -i perf may provide additional information.
    
    Output after:
    
    AMD IBS cannot exclude kernel events.  Try running at a higher privilege level.
    
    Output of executing 'sudo perf record -e ibs_op// true' BEFORE this patch:
    
    Error:
    The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (ibs_op//).
    /bin/dmesg | grep -i perf may provide additional information.
    
    Output after:
    
    Error:
    AMD IBS may only be available in system-wide/per-cpu mode.  Try using -a, or -C and workload affinity
    
    Signed-off-by: Kim Phillips <kim.phillips@amd.com>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Joao Martins <joao.m.martins@oracle.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Michael Petlan <mpetlan@redhat.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Robert Richter <robert.richter@amd.com>
    Cc: Stephane Eranian <eranian@google.com>
    kimphillamd authored and intel-lab-lkp committed Feb 8, 2022
  4. perf/x86/amd: add idle hooks for branch sampling

    On AMD Fam19h Zen3, the branch sampling (BRS) feature must be disabled before
    entering low power and re-enabled (if it was active) when returning from low
    power. Otherwise, the NMI interrupt may be held up for too long and cause
    problems. Stopping BRS will cause the NMI to be delivered if it was held up.
    
    Define a perf_amd_brs_lopwr_cb() callback to stop/restart BRS.  The callback
    is protected by a jump label which is enabled only when AMD BRS is detected.
    In all other cases, the callback is never called.
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  5. ACPI: add perf low power callback

    Add an optional callback needed by some PMU features, e.g., AMD
    BRS, to give a chance to the perf_events code to change its state before
    a CPU goes to low power and after it comes back.
    
    The callback is void when the PERF_NEEDS_LOPWR_CB flag is not set.
    This flag must be set in arch specific perf_event.h header whenever needed.
    When not set, there is no impact on the ACPI code.
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  6. perf/x86/amd: make Zen3 branch sampling opt-in

    Add a kernel config option CONFIG_PERF_EVENTS_AMD_BRS
    to make the support for AMD Zen3 Branch Sampling (BRS) an opt-in
    compile time option.
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  7. perf/x86/amd: add AMD branch sampling period adjustment

    Add code to adjust the sampling event period when used with the Branch
    Sampling feature (BRS). Given the depth of the BRS (16), the period is
    reduced by that depth such that in the best case scenario, BRS saturates at
    the desired sampling period. In practice, though, the processor may execute
    more branches. Given a desired period P and a depth D, the kernel programs
    the actual period at P - D. After P - D occurrences of the sampling event, the
    counter overflows. It then may take X branches (skid) before the NMI is
    caught and held by the hardware and BRS activates. Then, after D branches,
    BRS saturates and the NMI is delivered.  With no skid, the effective period
    would be (P - D) + D = P. In practice, however, it will likely be (P - D) +
    X + D. There is no way to eliminate X or predict X.
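    
    The arithmetic above can be written down as a small sketch. The function
    names are illustrative only and are not taken from the kernel; the skid X
    is passed in as a parameter because it cannot be predicted.

```python
def programmed_period(desired_period, depth=16):
    """Period actually programmed into the counter: P - D."""
    return desired_period - depth


def effective_period(desired_period, skid, depth=16):
    """Branches between samples: (P - D) + X + D, where X is the skid."""
    return programmed_period(desired_period, depth) + skid + depth
```

    With P = 1000037 and no skid the effective period equals P; any skid X
    adds directly to it.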
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  8. perf/x86/amd: enable branch sampling priv level filtering

    The AMD Branch Sampling feature does not provide hardware filtering by
    privilege level. The associated PMU counter does, but the branch sampling
    itself does not. Given how BRS operates, there is a possibility that BRS captures
    kernel level branches even though the event is programmed to count only at
    the user level.
    
    Implement a workaround in software by removing the branches which belong to
    the wrong privilege level. The privilege level is evaluated on the target of
    the branch and not the source so as to be compatible with other architectures.
    As a consequence of this patch, the number of entries in the
    PERF_RECORD_BRANCH_STACK buffer may be less than the maximum (16).  It could
    even be zero. Another consequence is that consecutive entries in the branch
    stack may not reflect actual code path and may have discontinuities, in case
    kernel branches were suppressed. But this is no different than what happens
    on other architectures.
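    
    A minimal sketch of such a software filter is shown below. It is not the
    kernel implementation: the helper names are hypothetical, and the kernel
    address test assumes the x86-64 convention that kernel addresses have
    bit 63 set.

```python
# Assumption: on x86-64, kernel virtual addresses live in the upper half of
# the address space, so bit 63 is set.
KERNEL_SPACE_START = 1 << 63


def is_kernel_address(addr):
    return addr >= KERNEL_SPACE_START


def filter_branches(entries, exclude_kernel=False, exclude_user=False):
    """Drop (source, target) entries whose target is at an excluded level."""
    kept = []
    for source, target in entries:
        kernel_target = is_kernel_address(target)
        if kernel_target and exclude_kernel:
            continue
        if not kernel_target and exclude_user:
            continue
        kept.append((source, target))
    return kept
```

    Because only the target is examined, the result may hold fewer than 16
    entries, or none at all, and consecutive entries may no longer chain.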
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  9. perf/x86/amd: add branch-brs helper event for Fam19h BRS

    Add a pseudo event called branch-brs to help use the AMD Fam19h
    Branch Sampling feature (BRS). BRS samples taken branches, so it is best used
    when sampling on a retired taken branch event (0xc4) which is what BRS
    captures.  Instead of trying to remember the event code or actual event name,
    users can simply do:
    
    $ perf record -b -e cpu/branch-brs/ -c 1000037 .....
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  10. perf/x86/amd: add AMD Fam19h Branch Sampling support

    Add support for the AMD Fam19h 16-deep branch sampling feature as
    described in the AMD PPR Fam19h Model 01h Revision B1.  This is a model
    specific extension. It is not an architected AMD feature.
    
    The Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR
    registers. There is no branch type filtering. All control flow changes are
    captured. BRS relies on specific programming of the core PMU of Fam19h.  In
    particular, the following requirements must be met:
     - the sampling period must be greater than 16 (BRS depth)
     - the sampling must use a fixed period, not frequency mode
    
    BRS interacts with the NMI interrupt as well. Because enabling BRS is
    expensive, it is only activated after P event occurrences, where P is the
    desired sampling period.  At P occurrences of the event, the counter
    overflows, the CPU catches the interrupt, activates BRS for 16 branches until
    it saturates, and then delivers the NMI to the kernel.  Between the overflow
    and the time BRS activates more branches may be executed skewing the period.
    All along, the sampling event keeps counting. The skid may be attenuated by
    reducing the sampling period by 16 (subsequent patch).
    
    BRS is integrated into perf_events seamlessly via the same
    PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry
    records in the sampling buffer. No prediction information is supported. The
    branches are stored in reverse order of execution.  The most recent branch is
    the first entry in each record.
    
    No modification to the perf tool is necessary.
    
    BRS can be used with any sampling event. However, it is recommended to use
    the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS
    captures.
    
    $ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test
    
    $ perf report -D
    56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
    ... branch stack: nr:16
    .....  0: 0000000000401d24 -> 0000000000401d5a 0 cycles      0
    .....  1: 0000000000401d5c -> 0000000000401d24 0 cycles      0
    .....  2: 0000000000401d22 -> 0000000000401d5c 0 cycles      0
    .....  3: 0000000000401d5e -> 0000000000401d22 0 cycles      0
    .....  4: 0000000000401d20 -> 0000000000401d5e 0 cycles      0
    .....  5: 0000000000401d3e -> 0000000000401d20 0 cycles      0
    .....  6: 0000000000401d42 -> 0000000000401d3e 0 cycles      0
    .....  7: 0000000000401d3c -> 0000000000401d42 0 cycles      0
    .....  8: 0000000000401d44 -> 0000000000401d3c 0 cycles      0
    .....  9: 0000000000401d3a -> 0000000000401d44 0 cycles      0
    ..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles      0
    ..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles      0
    ..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles      0
    ..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles      0
    ..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles      0
    ..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles      0
     ... thread: test:18230
     ...... dso: test
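    
    Since the most recent branch comes first in each record, post-processing
    code that wants execution order has to reverse the entries. The sketch
    below parses branch stack lines in the layout of the dump above and
    reverses them. The regular expression and function names are assumptions
    made for this illustration and are not part of perf.

```python
import re

# Matches lines such as ".....  0: 0000000000401d24 -> 0000000000401d5a ..."
BRANCH_LINE = re.compile(r"^\.+\s*(\d+):\s*([0-9a-f]+)\s*->\s*([0-9a-f]+)")

# First three entries of the dump shown in the commit message.
SAMPLE_LINES = [
    "... branch stack: nr:16",
    ".....  0: 0000000000401d24 -> 0000000000401d5a 0 cycles      0",
    ".....  1: 0000000000401d5c -> 0000000000401d24 0 cycles      0",
    ".....  2: 0000000000401d22 -> 0000000000401d5c 0 cycles      0",
]


def parse_branch_stack(lines):
    """Return (source, target) pairs in the order they appear in the dump."""
    entries = []
    for line in lines:
        match = BRANCH_LINE.match(line.strip())
        if match:
            entries.append((int(match.group(2), 16), int(match.group(3), 16)))
    return entries


def execution_order(entries):
    """Entries are stored most recent first, so reverse them."""
    return list(reversed(entries))
```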
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  11. x86/cpufeatures: add AMD Fam19h Branch Sampling feature

    Add a CPU feature flag for the AMD Fam19h Branch Sampling feature, reported
    as bit 31 of EBX in CPUID leaf 0x80000008.
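    
    As an illustration, the bit test looks like the sketch below. The names
    are hypothetical, and the EBX value is assumed to have been obtained by
    some other means, since Python cannot execute CPUID directly.

```python
BRS_CPUID_LEAF = 0x80000008  # leaf named in the commit message
BRS_EBX_BIT = 31             # bit position named in the commit message


def has_brs(ebx):
    """Return True if the EBX value of leaf 0x80000008 advertises BRS."""
    return bool((ebx >> BRS_EBX_BIT) & 1)
```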
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022
  12. perf/core: add perf_clear_branch_entry_bitfields() helper

    Make it simpler to reset all the info fields on the
    perf_branch_entry by adding a helper inline function.
    
    The goal is to centralize the initialization to avoid missing
    a field in case more are added.
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Stephane Eranian authored and intel-lab-lkp committed Feb 8, 2022

Commits on Feb 2, 2022

  1. perf/x86/intel: Increase max number of the fixed counters

    The new PEBS format 5 implies that the number of the fixed counters can
    be up to 16. The current INTEL_PMC_MAX_FIXED is still 4. If the current
    kernel runs on a future platform which has more than 4 fixed counters,
    a warning will be triggered. The number of the fixed counters will be
    clipped to 4. Users have to upgrade the kernel to access the new fixed
    counters.
    
    Add a new default constraint for PerfMon v5 and up, which can support
    up to 16 fixed counters. The pseudo-encoding is applied for the fixed
    counters 4 and later. The user can have generic support for the new
    fixed counters on the future platforms without updating the kernel.
    
    Increase the INTEL_PMC_MAX_FIXED to 16.
    
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Andi Kleen <ak@linux.intel.com>
    Link: https://lkml.kernel.org/r/1643750603-100733-3-git-send-email-kan.liang@linux.intel.com
    Kan Liang authored and Peter Zijlstra committed Feb 2, 2022
  2. KVM: x86: use the KVM side max supported fixed counter

    The KVM vPMU does not emulate all the fixed counters that the host PMU
    driver supports, e.g. fixed counter 3, used by Topdown metrics, is not
    supported by KVM so far.
    
    Rename MAX_FIXED_COUNTERS to KVM_PMC_MAX_FIXED to have a more
    straightforward naming convention as INTEL_PMC_MAX_FIXED used by the
    host PMU driver, and fix vPMU to use the KVM side KVM_PMC_MAX_FIXED
    for the virtual fixed counter emulation, instead of the host side
    INTEL_PMC_MAX_FIXED.
    
    Signed-off-by: Wei Wang <wei.w.wang@intel.com>
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/1643750603-100733-2-git-send-email-kan.liang@linux.intel.com
    wei-w-wang authored and Peter Zijlstra committed Feb 2, 2022
  3. perf/x86/intel: Enable PEBS format 5

    The new PEBS Record Format 5 is similar to the PEBS Record Format 4. The
    only difference is the layout of the Counter Reset fields of the PEBS
    Config Buffer in the DS area. For the PEBS format 4, the Counter Reset
    fields allocation is for 8 general-purpose counters followed by 4
    fixed-function counters. For the PEBS format 5, the Counter Reset fields
    allocation is for 32 general-purpose counters followed by 16
    fixed-function counters.
    
    Extend the MAX_PEBS_EVENTS to 32. Add MAX_PEBS_EVENTS_FMT4 for the
    previous platform. Except for the DS auto-reload code, other places
    already assume 32 counters. Only check the PEBS_FMT in the DS
    auto-reload code.
    
    Extend the MAX_FIXED_PEBS_EVENTS to 16, which only impacts the size of
    struct debug_store and some local temporary variables. The size of
    struct debug_store increases by 288B, which is small and should be
    acceptable.
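    
    The 288B figure can be checked with a short calculation, assuming that
    each Counter Reset field is a 64-bit value and that the array was
    previously sized for the format 4 layout. The helper name is made up for
    this sketch.

```python
U64_BYTES = 8  # assumption: one Counter Reset field is a 64-bit value


def reset_area_bytes(general_purpose, fixed):
    """Size of the Counter Reset fields for a given counter layout."""
    return (general_purpose + fixed) * U64_BYTES


FMT4_BYTES = reset_area_bytes(8, 4)     # 8 general-purpose + 4 fixed
FMT5_BYTES = reset_area_bytes(32, 16)   # 32 general-purpose + 16 fixed
GROWTH_BYTES = FMT5_BYTES - FMT4_BYTES  # 36 extra fields of 8 bytes each
```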
    
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/1643750603-100733-1-git-send-email-kan.liang@linux.intel.com
    Kan Liang authored and Peter Zijlstra committed Feb 2, 2022
  4. perf/core: Allow kernel address filter when not filtering the kernel

    The so-called 'kernel' address filter can also be useful for filtering
    fixed addresses in user space.  Allow that.
    
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220131072453.2839535-6-adrian.hunter@intel.com
    ahunter6 authored and Peter Zijlstra committed Feb 2, 2022
  5. perf/x86/intel/pt: Fix address filter config for 32-bit kernel

    Change from shifting 'unsigned long' to 'u64' to prevent the config bits
    being lost on a 32-bit kernel.
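    
    The loss can be modelled as below. This is a simulation in Python, not
    the driver code: the result of the shift is truncated to 32 bits to mimic
    a 32-bit 'unsigned long', and the shift amount of 32 used in the example
    is a hypothetical value. In C, shifting a 32-bit type by 32 or more is
    undefined behaviour, so the truncation shown here is only one possible
    outcome.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shift_as_unsigned_long_32(value, shift):
    """Model the shift with a 32-bit result: high bits are dropped."""
    return (value << shift) & MASK32


def shift_as_u64(value, shift):
    """Model the shift after widening the operand to 64 bits."""
    return (value << shift) & MASK64
```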
    
    Fixes: eadf48c ("perf/x86/intel/pt: Add support for address range filtering in PT")
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220131072453.2839535-5-adrian.hunter@intel.com
    ahunter6 authored and Peter Zijlstra committed Feb 2, 2022
  6. perf/core: Fix address filter parser for multiple filters

    Reset appropriate variables in the parser loop between parsing separate
    filters, so that they do not interfere with parsing the next filter.
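    
    The kind of bug being fixed can be illustrated with a toy parser. The
    filter syntax below ('action start[/size]' separated by commas) is a
    simplification chosen for this sketch and the function is hypothetical;
    it is not the kernel parser. The point is the reset at the top of the
    loop: without it, a size parsed for one filter would carry over to a
    following filter that has none.

```python
def parse_filters(spec):
    """Parse comma-separated 'action start[/size]' filters (toy syntax)."""
    filters = []
    for chunk in spec.split(","):
        # Reset per-filter state on every iteration so that nothing parsed
        # for the previous filter leaks into this one.
        start, size = None, 0

        action, _, range_spec = chunk.strip().partition(" ")
        start_text, _, size_text = range_spec.strip().partition("/")
        start = int(start_text, 16)
        if size_text:
            size = int(size_text, 16)
        filters.append({"action": action, "start": start, "size": size})
    return filters
```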
    
    Fixes: 375637b ("perf/core: Introduce address range filtering")
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220131072453.2839535-4-adrian.hunter@intel.com
    ahunter6 authored and Peter Zijlstra committed Feb 2, 2022
  7. x86: Share definition of __is_canonical_address()

    Reduce code duplication by moving canonical address code to a common header
    file.
    
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220131072453.2839535-3-adrian.hunter@intel.com
    ahunter6 authored and Peter Zijlstra committed Feb 2, 2022
  8. perf/x86/intel/pt: Relax address filter validation

    The requirement for 64-bit address filters is that they are canonical
    addresses. In other respects any address range is allowed which would
    include user space addresses.
    
    That can be useful for tracing virtual machine guests because address
    filtering can be used to advantage in place of current privilege level
    (CPL) filtering.
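    
    For reference, the canonical address check can be sketched as follows.
    This is a Python model with made-up function names, and the default of 48
    virtual address bits is an assumption (4-level paging). An address is
    canonical when its upper bits are a sign extension of the top
    implemented bit.

```python
MASK64 = (1 << 64) - 1


def canonical_address(vaddr, vaddr_bits=48):
    """Sign-extend bit (vaddr_bits - 1) into the upper bits of a 64-bit value."""
    low_mask = (1 << vaddr_bits) - 1
    low = vaddr & low_mask
    if low & (1 << (vaddr_bits - 1)):
        low |= MASK64 & ~low_mask
    return low


def is_canonical_address(vaddr, vaddr_bits=48):
    return canonical_address(vaddr, vaddr_bits) == vaddr
```

    Both user space and kernel space addresses pass this check, which is why
    a filter validated only for canonical form can cover either range.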
    
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220131072453.2839535-2-adrian.hunter@intel.com
    ahunter6 authored and Peter Zijlstra committed Feb 2, 2022

Commits on Jan 30, 2022

  1. Linux 5.17-rc2

    torvalds committed Jan 30, 2022
  2. Merge tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/s…

    …cm/linux/kernel/git/tip/tip
    
    Pull irq fixes from Borislav Petkov:
    
     - Drop an unused private data field in the AIC driver
    
     - Various fixes to the realtek-rtl driver
    
     - Make the GICv3 ITS driver compile again in !SMP configurations
    
     - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec
    
     - Yet another kfree/bitmap_free conversion
    
     - Various DT updates (Renesas, SiFive)
    
    * tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples
      dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts
      dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support
      irqchip/gic-v3-its: Reset each ITS's BASERn register before probe
      irqchip/gic-v3-its: Fix build for !SMP
      irqchip/loongson-pch-ms: Use bitmap_free() to free bitmap
      irqchip/realtek-rtl: Service all pending interrupts
      irqchip/realtek-rtl: Fix off-by-one in routing
      irqchip/realtek-rtl: Map control data to virq
      irqchip/apple-aic: Drop unused ipi_hwirq field
    torvalds committed Jan 30, 2022
  3. Merge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/…

    …scm/linux/kernel/git/tip/tip
    
    Pull perf fixes from Borislav Petkov:
    
     - Prevent accesses to the per-CPU cgroup context list from another CPU
       except the one it belongs to, to avoid list corruption
    
     - Make sure parent events are always woken up to avoid indefinite hangs
       in the traced workload
    
    * tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf/core: Fix cgroup event list management
      perf: Always wake the parent event
    torvalds committed Jan 30, 2022
  4. Merge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub…

    …/scm/linux/kernel/git/tip/tip
    
    Pull scheduler fix from Borislav Petkov:
     "Make sure the membarrier-rseq fence commands are part of the reported
      set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY"
    
    * tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
    torvalds committed Jan 30, 2022
  5. Merge tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull x86 fixes from Borislav Petkov:
    
     - Add another Intel CPU model to the list of CPUs supporting the
       processor inventory unique number
    
     - Allow writing to MCE thresholding sysfs files again - a previous
       change had accidentally disabled it and no one noticed. Goes to show
       how much is this stuff used
    
    * tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
      x86/MCE/AMD: Allow thresholding interface updates after init
    torvalds committed Jan 30, 2022
  6. Merge branch 'akpm' (patches from Andrew)

    Merge misc fixes from Andrew Morton:
     "12 patches.
    
      Subsystems affected by this patch series: sysctl, binfmt, ia64, mm
      (memory-failure, folios, kasan, and psi), selftests, and ocfs2"
    
    * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
      ocfs2: fix a deadlock when commit trans
      jbd2: export jbd2_journal_[grab|put]_journal_head
      psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
      psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
      mm, kasan: use compare-exchange operation to set KASAN page tag
      kasan: test: fix compatibility with FORTIFY_SOURCE
      tools/testing/scatterlist: add missing defines
      mm: page->mapping folio->mapping should have the same offset
      memory-failure: fetch compound_head after pgmap_pfn_valid()
      ia64: make IA64_MCA_RECOVERY bool instead of tristate
      binfmt_misc: fix crash when load/unload module
      include/linux/sysctl.h: fix register_sysctl_mount_point() return type
    torvalds committed Jan 30, 2022
  7. ocfs2: fix a deadlock when commit trans

    commit 6f1b228 introduces a regression which can deadlock as
    follows:
    
      Task1:                              Task2:
      jbd2_journal_commit_transaction     ocfs2_test_bg_bit_allocatable
      spin_lock(&jh->b_state_lock)        jbd_lock_bh_journal_head
      __jbd2_journal_remove_checkpoint    spin_lock(&jh->b_state_lock)
      jbd2_journal_put_journal_head
      jbd_lock_bh_journal_head
    
    Task1 and Task2 lock bh->b_state and jh->b_state_lock in different
    order, which finally results in a deadlock.
    
    So use jbd2_journal_[grab|put]_journal_head instead in
    ocfs2_test_bg_bit_allocatable() to fix it.
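    
    The inversion can be made explicit with a small lock-order check. The
    sketch below is an illustration only, not a kernel facility: it builds a
    graph of "held before" relations from each task's acquisition order and
    reports whether that graph contains a cycle.

```python
def has_lock_order_cycle(acquisition_orders):
    """Return True if the given per-task lock orders can deadlock."""
    # Edge A -> B means some task holds A while acquiring B.
    graph = {}
    for order in acquisition_orders:
        for index, held in enumerate(order):
            for wanted in order[index + 1:]:
                graph.setdefault(held, set()).add(wanted)

    visiting, done = set(), set()

    def visit(lock):
        if lock in done:
            return False
        if lock in visiting:
            return True  # reached a lock that is already on the current path
        visiting.add(lock)
        found = any(visit(nxt) for nxt in graph.get(lock, ()))
        visiting.discard(lock)
        done.add(lock)
        return found

    return any(visit(lock) for lock in list(graph))


# Acquisition orders as described in the commit message.
TASK1 = ["jh->b_state_lock", "bh->b_state"]
TASK2 = ["bh->b_state", "jh->b_state_lock"]
```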
    
    Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.com
    Fixes: 6f1b228 ("ocfs2: fix race between searching chunks and release journal_head from buffer_head")
    Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
    Tested-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
    Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Gang He <ghe@suse.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    josephhz authored and torvalds committed Jan 30, 2022
  8. jbd2: export jbd2_journal_[grab|put]_journal_head

    Patch series "ocfs2: fix a deadlock case".
    
    This fixes a deadlock case in ocfs2.  We first export jbd2 symbols
    jbd2_journal_[grab|put]_journal_head as preparation and later use them
    in ocfs2 instead of jbd_[lock|unlock]_bh_journal_head to fix the
    deadlock.
    
    This patch (of 2):
    
    This exports symbols jbd2_journal_[grab|put]_journal_head, which will be
    used outside modules, e.g.  ocfs2.
    
    Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com
    Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Gang He <ghe@suse.com>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
    Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    josephhz authored and torvalds committed Jan 30, 2022
  9. psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n

    When CONFIG_PROC_FS is disabled psi code generates the following
    warnings:
    
      kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=]
          1364 | static const struct proc_ops psi_cpu_proc_ops = {
               |                              ^~~~~~~~~~~~~~~~
      kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=]
          1355 | static const struct proc_ops psi_memory_proc_ops = {
               |                              ^~~~~~~~~~~~~~~~~~~
      kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=]
          1346 | static const struct proc_ops psi_io_proc_ops = {
               |                              ^~~~~~~~~~~~~~~
    
    Make definitions of these structures and related functions conditional
    on CONFIG_PROC_FS config.
    
    Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com
    Fixes: 0e94682 ("psi: introduce psi monitor")
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    surenbaghdasaryan authored and torvalds committed Jan 30, 2022
  10. psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n

    When CONFIG_CGROUPS is disabled psi code generates the following
    warnings:
    
      kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes]
          1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group,
               |                     ^~~~~~~~~~~~~~~~~~
      kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes]
          1182 | void psi_trigger_destroy(struct psi_trigger *t)
               |      ^~~~~~~~~~~~~~~~~~~
      kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes]
          1249 | __poll_t psi_trigger_poll(void **trigger_ptr,
               |          ^~~~~~~~~~~~~~~~
    
    Change the declarations of these functions in the header to provide the
    prototypes even when they are unused.
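
As a rough illustration of the idea (a sketch, not the real header change), the declaration below sits outside the configuration guard, so the definition always has a previous prototype regardless of the option. `DEMO_CONFIG_CGROUPS` and the `demo_*` names are hypothetical.

```c
#include <assert.h>

/* --- what would live in the header --- */
struct demo_trigger {
    int threshold;
};

/* Declared unconditionally: every build sees a prototype first. */
int demo_trigger_init(struct demo_trigger *t, int threshold);

#ifdef DEMO_CONFIG_CGROUPS
/* Only declarations specific to the optional feature stay guarded. */
int demo_cgroup_attach(struct demo_trigger *t);
#endif

/* --- what would live in the .c file --- */
int demo_trigger_init(struct demo_trigger *t, int threshold)
{
    if (threshold < 0)
        return -1;      /* reject invalid thresholds */
    t->threshold = threshold;
    return 0;
}
```

Compiling this with `-Wmissing-prototypes` should stay quiet whether or not `DEMO_CONFIG_CGROUPS` is defined, because the prototype no longer depends on it.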
    
    Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com
    Fixes: 0e94682 ("psi: introduce psi monitor")
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    surenbaghdasaryan authored and torvalds committed Jan 30, 2022
  11. mm, kasan: use compare-exchange operation to set KASAN page tag

    It has been reported that the tag setting operation on newly-allocated
    pages can cause the page flags to be corrupted when performed
    concurrently with other flag updates as a result of the use of
    non-atomic operations.
    
    Fix the problem by using a compare-exchange loop to update the tag.
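
The following is a userspace sketch of a compare-exchange loop using C11 atomics, not the kernel code itself. It assumes a hypothetical 64-bit flags word with an 8-bit tag in the top byte; the `DEMO_*` layout and function names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: tag in bits 56..63, other flags below it. */
#define DEMO_TAG_SHIFT 56
#define DEMO_TAG_MASK  ((uint64_t)0xff << DEMO_TAG_SHIFT)

static void demo_set_tag(_Atomic uint64_t *flags, uint8_t tag)
{
    uint64_t old = atomic_load(flags);
    uint64_t updated;

    do {
        /* Recompute from the freshest value so concurrent updates
         * to the other bits are never overwritten. */
        updated = (old & ~DEMO_TAG_MASK) |
                  ((uint64_t)tag << DEMO_TAG_SHIFT);
        /* On failure, `old` is reloaded with the current value. */
    } while (!atomic_compare_exchange_weak(flags, &old, updated));
}

static uint8_t demo_get_tag(_Atomic uint64_t *flags)
{
    return (uint8_t)((atomic_load(flags) & DEMO_TAG_MASK) >> DEMO_TAG_SHIFT);
}
```

A plain read-modify-write (`*flags = (*flags & ~mask) | tag`) can lose a flag set by another thread between the read and the write. The loop retries instead, which is the property the fix relies on.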
    
    Link: https://lkml.kernel.org/r/20220120020148.1632253-1-pcc@google.com
    Link: https://linux-review.googlesource.com/id/I456b24a2b9067d93968d43b4bb3351c0cec63101
    Fixes: 2813b9c ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
    Signed-off-by: Peter Collingbourne <pcc@google.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    pcc authored and torvalds committed Jan 30, 2022
  12. kasan: test: fix compatibility with FORTIFY_SOURCE

    With CONFIG_FORTIFY_SOURCE enabled, string functions also perform
    dynamic checks using __builtin_object_size(ptr); a failed check
    panics the kernel.
    
    Because the KASAN test deliberately performs out-of-bounds operations,
    the kernel panics with FORTIFY_SOURCE, for example:
    
     | kernel BUG at lib/string_helpers.c:910!
     | invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
     | CPU: 1 PID: 137 Comm: kunit_try_catch Tainted: G    B             5.16.0-rc3+ #3
     | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
     | RIP: 0010:fortify_panic+0x19/0x1b
     | ...
     | Call Trace:
     |  kmalloc_oob_in_memset.cold+0x16/0x16
     |  ...
    
    Fix it by also hiding `ptr` from the optimizer, which will ensure that
    __builtin_object_size() does not return a valid size, preventing
    fortified string functions from panicking.
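
A small userspace sketch of what "hiding a pointer from the optimizer" can look like, assuming GCC or Clang. `DEMO_HIDE_VAR` is a hypothetical stand-in written for this example and is not necessarily the helper the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Empty asm that claims to rewrite `var`: the compiler can no longer
 * connect the value to the allocation it came from. GCC/Clang only. */
#define DEMO_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* Returns what the compiler believes the object size is after hiding. */
static size_t demo_hidden_object_size(void)
{
    char *p = malloc(16);
    size_t n;

    DEMO_HIDE_VAR(p);
    /* The size is now unknown, which is reported as (size_t)-1. */
    n = __builtin_object_size(p, 0);
    free(p);
    return n;
}
```

Without the hiding step, an optimizing build may know that `p` points to 16 bytes, and a fortified string function could then reject a deliberate out-of-bounds access before KASAN ever sees it.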
    
    Link: https://lkml.kernel.org/r/20220124160744.1244685-1-elver@google.com
    Signed-off-by: Marco Elver <elver@google.com>
    Reported-by: Nico Pache <npache@redhat.com>
    Reviewed-by: Nico Pache <npache@redhat.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Brendan Higgins <brendanhiggins@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    melver authored and torvalds committed Jan 30, 2022
  13. tools/testing/scatterlist: add missing defines

    The cited commits replaced preemptible with pagefault_disabled and
    flush_kernel_dcache_page with flush_dcache_page respectively, so the
    corresponding defines in the test need to be updated.
    
      scatterlist.c: In function ‘sg_miter_stop’:
      scatterlist.c:919:4: warning: implicit declaration of function ‘flush_dcache_page’ [-Wimplicit-function-declaration]
          flush_dcache_page(miter->page);
          ^~~~~~~~~~~~~~~~~
      In file included from linux/scatterlist.h:8:0,
                       from scatterlist.c:9:
      scatterlist.c:922:18: warning: implicit declaration of function ‘pagefault_disabled’ [-Wimplicit-function-declaration]
          WARN_ON_ONCE(!pagefault_disabled());
                        ^
      linux/mm.h:23:25: note: in definition of macro ‘WARN_ON_ONCE’
        int __ret_warn_on = !!(condition);                      \
                               ^~~~~~~~~
    
    Link: https://lkml.kernel.org/r/20220118082105.1737320-1-maorg@nvidia.com
    Fixes: 723aca2 ("mm/scatterlist: replace the !preemptible warning in sg_miter_stop()")
    Fixes: 0e84f5d ("scatterlist: replace flush_kernel_dcache_page with flush_dcache_page")
    Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    maorgottlieb authored and torvalds committed Jan 30, 2022
  14. mm: page->mapping folio->mapping should have the same offset

    As with the other members of folio, the offset of page->mapping and
    folio->mapping must be the same.  The compile-time check was
    inadvertently removed during development.  Add it back.
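
A compile-time offset check of this kind can be sketched with C11 `static_assert` and `offsetof`. The two structs below are hypothetical, simplified stand-ins, not the kernel's `struct page` and `struct folio`, and `DEMO_MATCH` is an illustrative macro name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the two overlapping types. */
struct demo_page {
    unsigned long flags;
    void *lru_next;
    void *lru_prev;
    void *mapping;
    unsigned long index;
};

struct demo_folio {
    unsigned long flags;
    void *lru_next;
    void *lru_prev;
    void *mapping;
    unsigned long index;
};

/* Fails the build if a member drifts to a different offset. */
#define DEMO_MATCH(pg, fl)                                        \
    static_assert(offsetof(struct demo_page, pg) ==               \
                  offsetof(struct demo_folio, fl),                \
                  "page and folio member offsets differ")

DEMO_MATCH(flags, flags);
DEMO_MATCH(mapping, mapping);
DEMO_MATCH(index, index);
```

Because the check runs at compile time, reordering a member in only one of the structs breaks the build immediately instead of silently corrupting data at run time.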
    
    [willy@infradead.org: changelog redo]
    
    Link: https://lkml.kernel.org/r/20220104011734.21714-1-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jan 30, 2022
  15. memory-failure: fetch compound_head after pgmap_pfn_valid()

    memory_failure_dev_pagemap() at the moment assumes base pages (e.g.
    dax_lock_page()).  For devmap with compound pages, fetch the
    compound_head in case a tail page memory failure is being handled.
    
    Currently this is a no-op, but with the advent of compound pages in
    dev_pagemap it allows memory_failure_dev_pagemap() to keep working.
    
    Without this fix memory-failure handling (i.e.  MCEs on pmem) with
    device-dax configured namespaces will regress (and crash).
    
    Link: https://lkml.kernel.org/r/20211202204422.26777-2-joao.m.martins@oracle.com
    Reported-by: Jane Chu <jane.chu@oracle.com>
    Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
    Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    jpemartins authored and torvalds committed Jan 30, 2022