Commits on Nov 10, 2021

  1. powerpc/64s: Get LPID bit width from device tree

    Allow the LPID bit width and partition table size to be set at runtime
    from the device tree.
    
    Move the PID bit width detection into the same place.
    
    KVM does not support using different sizes yet; this is mainly required
    to get the PTCR register values correct.
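The following sketch shows why the LPID width affects PTCR. Under the Power ISA v3.0 definition, the PTCR PATS field encodes a partition table size of 2^(12+PATS) bytes, and each partition table entry is 16 bytes. The sketch derives PATS from an LPID bit width and ORs it into a table base that is assumed to be suitably aligned. The helper names are hypothetical, and reading the width from the device tree is not shown.

```c
#include <stdint.h>

/* Each partition table entry is two 64-bit doublewords (16 bytes). */
#define PATE_SIZE_SHIFT 4

/* log2 of the partition table size in bytes for a given LPID width. */
static unsigned int patb_size_shift(unsigned int lpid_bits)
{
	return lpid_bits + PATE_SIZE_SHIFT;
}

/*
 * Hypothetical helper: build a PTCR value from the physical table base
 * and the LPID bit width. PATS = log2(size) - 12, because the table size
 * is 2^(12 + PATS) bytes. Assumes lpid_bits >= 8 (so PATS >= 0) and a
 * base aligned to the table size, leaving the low bits free for PATS.
 */
static uint64_t make_ptcr(uint64_t table_pa, unsigned int lpid_bits)
{
	return table_pa | (uint64_t)(patb_size_shift(lpid_bits) - 12);
}
```

With 12 LPID bits the table has 4096 entries, occupies 64K (shift 16), and PATS is 4.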
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    npiggin authored and intel-lab-lkp committed Nov 10, 2021

Commits on Nov 1, 2021

  1. powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST

    Until now, all tests involving CONFIG_STRICT_KERNEL_RWX were done with
    DEBUG_RODATA_TEST to check the result. But now that
    CONFIG_STRICT_KERNEL_RWX is selected by default, it gets built without
    CONFIG_DEBUG_RODATA_TEST, which led to the following Oops:
    
    [    6.830908] Freeing unused kernel image (initmem) memory: 352K
    [    6.840077] BUG: Unable to handle kernel data access on write at 0xc1285200
    [    6.846836] Faulting instruction address: 0xc0004b6c
    [    6.851745] Oops: Kernel access of bad area, sig: 11 [#1]
    [    6.857075] BE PAGE_SIZE=16K PREEMPT CMPC885
    [    6.861348] SAF3000 DIE NOTIFICATION
    [    6.864830] CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0-rc5-s3k-dev-02255-g2747d7b7916f #451
    [    6.873429] NIP:  c0004b6c LR: c0004b60 CTR: 00000000
    [    6.878419] REGS: c902be60 TRAP: 0300   Not tainted  (5.15.0-rc5-s3k-dev-02255-g2747d7b7916f)
    [    6.886852] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 53000335  XER: 8000ff40
    [    6.893564] DAR: c1285200 DSISR: 82000000
    [    6.893564] GPR00: 0c000000 c902bf20 c20f4000 08000000 00000001 04001f00 c1800000 00000035
    [    6.893564] GPR08: ff0001ff c1280000 00000002 c0004b60 00001000 00000000 c0004b1c 00000000
    [    6.893564] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [    6.893564] GPR24: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c1060000
    [    6.932034] NIP [c0004b6c] kernel_init+0x50/0x138
    [    6.936682] LR [c0004b60] kernel_init+0x44/0x138
    [    6.941245] Call Trace:
    [    6.943653] [c902bf20] [c0004b60] kernel_init+0x44/0x138 (unreliable)
    [    6.950022] [c902bf30] [c001122c] ret_from_kernel_thread+0x5c/0x64
    [    6.956135] Instruction dump:
    [    6.959060] 48ffc521 48045469 4800d8cd 3d20c086 89295fa0 2c090000 41820058 480796c9
    [    6.966890] 4800e48d 3d20c128 39400002 3fe0c106 <91495200> 3bff8000 4806fa1d 481f7d75
    [    6.974902] ---[ end trace 1e397bacba4aa610 ]---
    
    0xc1285200 corresponds to the 'system_state' global variable, which the
    kernel is trying to set to SYSTEM_RUNNING. This variable is above the
    RO/RW limit, so writing it shouldn't Oops.
    
    It oopses because the dirty bit is missing.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/3d5800b0bbcd7b19761b98f50421358667b45331.1635520232.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Nov 1, 2021

Commits on Oct 29, 2021

  1. powerpc/32e: Ignore ESR in instruction storage interrupt handler

    An e5500 machine running a 32-bit kernel sometimes hangs at boot,
    seemingly going into an infinite loop of instruction storage interrupts.
    
    The ESR (Exception Syndrome Register) has a value of 0x800000 (store)
    when this happens, which is likely set by a previous store. An
    instruction TLB miss interrupt would then leave ESR unchanged, and if no
    PTE exists it calls directly to the instruction storage interrupt
    handler without changing ESR.
    
    access_error() does not cause a segfault due to a store to a read-only
    vma because is_exec is true. Most subsequent fault handling does not
    check for a write fault on a read-only vma, and might do strange things
    like create a writeable PTE or call page_mkwrite on a read-only vma or
    file. It's not clear what causes the infinite faulting in this case:
    a fault handler failure, or low-level PTE or TLB handling.
    
    In any case this can be fixed by having the instruction storage
    interrupt zero regs->dsisr rather than storing the ESR value to it.
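This is a minimal model of the fix. It uses a cut-down stand-in for struct pt_regs (`struct fake_regs`) and the BookE ESR store bit 0x00800000, which is the value quoted above. Copying a stale ESR makes an instruction fetch look like a store. Zeroing dsisr does not. The helper names are hypothetical and only illustrate how a handler might derive a write flag.

```c
#include <stdbool.h>

/* BookE ESR "store operation" bit (the 0x800000 value seen above). */
#define ESR_ST 0x00800000UL

/* Hypothetical stand-in for the relevant part of struct pt_regs. */
struct fake_regs {
	unsigned long dsisr;
};

/* Old behaviour: copy the possibly stale ESR into dsisr. */
static void isi_entry_old(struct fake_regs *regs, unsigned long esr)
{
	regs->dsisr = esr;
}

/* Fixed behaviour: an instruction fetch is never a store, so clear it. */
static void isi_entry_fixed(struct fake_regs *regs, unsigned long esr)
{
	(void)esr;
	regs->dsisr = 0;
}

/* How a fault handler might derive "is this a write?" from dsisr. */
static bool fault_is_write(const struct fake_regs *regs)
{
	return (regs->dsisr & ESR_ST) != 0;
}
```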
    
    Fixes: a01a3f2 ("powerpc: remove arguments from fault handler functions")
    Cc: stable@vger.kernel.org # v5.12+
    Reported-by: Jacques de Laval <jacques.delaval@protonmail.com>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Tested-by: Jacques de Laval <jacques.delaval@protonmail.com>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211028133043.4159501-1-npiggin@gmail.com
    npiggin authored and mpe committed Oct 29, 2021
  2. powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
    
    Commit 587164c introduced a new OPAL message type (OPAL_MSG_PRD2) and
    added an OPAL notifier, but I missed unregistering the notifier in the
    module unload path. This results in the call trace below if you try to
    unload and reload the opal_prd module.
    
    Also add a new notifier_block for the OPAL_MSG_PRD2 message.
    
    Sample calltrace (modprobe -r opal_prd; modprobe opal_prd)
      BUG: Unable to handle kernel data access on read at 0xc0080000192200e0
      Faulting instruction address: 0xc00000000018d1cc
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
      CPU: 66 PID: 7446 Comm: modprobe Kdump: loaded Tainted: G            E     5.14.0prd #759
      NIP:  c00000000018d1cc LR: c00000000018d2a8 CTR: c0000000000cde10
      REGS: c0000003c4c0f0a0 TRAP: 0300   Tainted: G            E      (5.14.0prd)
      MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 24224824  XER: 20040000
      CFAR: c00000000018d2a4 DAR: c0080000192200e0 DSISR: 40000000 IRQMASK: 1
      ...
      NIP notifier_chain_register+0x2c/0xc0
      LR  atomic_notifier_chain_register+0x48/0x80
      Call Trace:
        0xc000000002090610 (unreliable)
        atomic_notifier_chain_register+0x58/0x80
        opal_message_notifier_register+0x7c/0x1e0
        opal_prd_probe+0x84/0x150 [opal_prd]
        platform_probe+0x78/0x130
        really_probe+0x110/0x5d0
        __driver_probe_device+0x17c/0x230
        driver_probe_device+0x60/0x130
        __driver_attach+0xfc/0x220
        bus_for_each_dev+0xa8/0x130
        driver_attach+0x34/0x50
        bus_add_driver+0x1b0/0x300
        driver_register+0x98/0x1a0
        __platform_driver_register+0x38/0x50
        opal_prd_driver_init+0x34/0x50 [opal_prd]
        do_one_initcall+0x60/0x2d0
        do_init_module+0x7c/0x320
        load_module+0x3394/0x3650
        __do_sys_finit_module+0xd4/0x160
        system_call_exception+0x140/0x290
        system_call_common+0xf4/0x258
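The bug pattern can be sketched with a tiny linked-list notifier chain. This is a hypothetical userspace model, not the kernel's notifier API. Each message type has its own chain, and each block has a single `next` pointer. That means one block cannot sit on two chains at once, which presumably motivates the extra notifier_block. The remove path must also unregister every block that probe registered.

```c
#include <stddef.h>

/* Minimal, hypothetical notifier block (callback omitted). */
struct nb {
	struct nb *next;
};

struct chain {
	struct nb *head;
};

static void chain_register(struct chain *c, struct nb *n)
{
	n->next = c->head;
	c->head = n;
}

static void chain_unregister(struct chain *c, struct nb *n)
{
	struct nb **pp = &c->head;

	while (*pp) {
		if (*pp == n) {
			*pp = n->next;
			return;
		}
		pp = &(*pp)->next;
	}
}

static int chain_len(const struct chain *c)
{
	int len = 0;

	for (const struct nb *n = c->head; n; n = n->next)
		len++;
	return len;
}

/* One chain per message type, and one block per chain. */
static struct chain prd_chain, prd2_chain;
static struct nb prd_nb, prd2_nb;

static void driver_probe(void)
{
	chain_register(&prd_chain, &prd_nb);
	chain_register(&prd2_chain, &prd2_nb);
}

/* Leaving out the second unregister would keep a stale entry on prd2_chain. */
static void driver_remove(void)
{
	chain_unregister(&prd_chain, &prd_nb);
	chain_unregister(&prd2_chain, &prd2_nb);
}
```

In the kernel, the stale entry points into the unloaded module's memory. The next registration walks the chain and faults, as in the trace above.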
    
    Fixes: 587164c ("powerpc/powernv: Add new opal message type")
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211028165716.41300-1-hegdevasant@linux.vnet.ibm.com
    Vasant Hegde authored and mpe committed Oct 29, 2021
  3. powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC
    
    When ARCH_SUPPORTS_DEBUG_PAGEALLOC is not selected, the user can
    still select CONFIG_DEBUG_PAGEALLOC, in which case __kernel_map_pages()
    is provided by mm/page_poison.c.
    
    So only define __kernel_map_pages() when both
    CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC
    are defined.
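The condition can be sketched with the preprocessor. The CONFIG_ symbols are defined by hand here purely for illustration; in a real build they come from Kconfig. `kernel_map_pages_provider()` is a hypothetical helper that reports which implementation would be compiled in.

```c
/*
 * Illustrative configuration: DEBUG_PAGEALLOC is enabled, but the
 * architecture does not advertise support for it.
 */
#define CONFIG_DEBUG_PAGEALLOC 1
/* #define CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC 1 */

/* Which __kernel_map_pages() would end up in the build. */
static const char *kernel_map_pages_provider(void)
{
#if defined(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && defined(CONFIG_DEBUG_PAGEALLOC)
	return "arch/powerpc";
#elif defined(CONFIG_DEBUG_PAGEALLOC)
	return "mm/page_poison.c";
#else
	return "none";
#endif
}
```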
    
    Fixes: 68b44f9 ("powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/971b69739ff4746252e711a9845210465c023a9e.1635425947.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 29, 2021

Commits on Oct 28, 2021

  1. Merge branch 'topic/ppc-kvm' into next

    Merge a couple of KVM ppc patches we are keeping in a topic branch.
    mpe committed Oct 28, 2021
  2. MAINTAINERS: Update powerpc KVM entry

    Paul is no longer handling patches for kvmppc.
    
    Instead we'll treat them as regular powerpc patches, taking them via the
    powerpc tree, using the topic/ppc-kvm branch when necessary.
    
    Also drop the web reference, as it doesn't have any information
    specifically relevant to powerpc KVM.
    
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211027061646.540708-1-mpe@ellerman.id.au
    mpe committed Oct 28, 2021
  3. powerpc/xmon: fix task state output

    p_state has been unsigned since commit 2f064a5.
    
    The patch also uses TASK_RUNNING instead of null.
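The sketch below shows roughly how a state letter can be derived from an unsigned state word. The bit values mirror include/linux/sched.h around v5.15 and are copied here as an assumption. `task_state_char()` is a hypothetical helper in the style of xmon's output, and it compares against TASK_RUNNING rather than a bare zero.

```c
/* Task state bits as in include/linux/sched.h around v5.15 (assumed). */
#define TASK_RUNNING		0x0000
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define __TASK_STOPPED		0x0004
#define __TASK_TRACED		0x0008
#define EXIT_DEAD		0x0010
#define EXIT_ZOMBIE		0x0020

/* Map an unsigned task state (plus exit_state) to a single letter. */
static char task_state_char(unsigned int p_state, int exit_state)
{
	return (p_state == TASK_RUNNING) ? 'R' :
	       (p_state & TASK_UNINTERRUPTIBLE) ? 'D' :
	       (p_state & __TASK_STOPPED) ? 'T' :
	       (p_state & __TASK_TRACED) ? 'C' :
	       (exit_state & EXIT_ZOMBIE) ? 'Z' :
	       (exit_state & EXIT_DEAD) ? 'E' :
	       (p_state & TASK_INTERRUPTIBLE) ? 'S' : '?';
}
```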
    
    Fixes: 2f064a5 ("sched: Change task_struct::state")
    Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211026133108.7113-1-kda@linux-powerpc.org
    Denis Kirjanov authored and mpe committed Oct 28, 2021
  4. powerpc/44x/fsp2: add missing of_node_put

    Early exits from for_each_compatible_node() should decrement the
    node reference counter.  Reported by Coccinelle:
    
    ./arch/powerpc/platforms/44x/fsp2.c:206:1-25: WARNING: Function
    "for_each_compatible_node" should have of_node_put() before return
    around line 218.
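The reference-counting rule can be modelled in userspace. `struct fake_node`, `node_get()`, `node_put()` and the iterator macro below are hypothetical stand-ins for struct device_node, of_node_get(), of_node_put() and for_each_compatible_node(). The iterator holds a reference on the current node and drops it when advancing. An early `return` from the loop body therefore has to drop that reference explicitly.

```c
#include <stddef.h>

/* Hypothetical refcounted node standing in for struct device_node. */
struct fake_node {
	int refcount;
	int compatible;
};

static struct fake_node nodes[3] = { {0, 1}, {0, 0}, {0, 1} };

static struct fake_node *node_get(struct fake_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct fake_node *n)
{
	if (n)
		n->refcount--;
}

/* Find the next compatible node after prev, moving the reference along. */
static struct fake_node *next_compatible(struct fake_node *prev)
{
	size_t i = prev ? (size_t)(prev - nodes) + 1 : 0;
	struct fake_node *found = NULL;

	for (; i < 3; i++) {
		if (nodes[i].compatible) {
			found = node_get(&nodes[i]);
			break;
		}
	}
	node_put(prev);
	return found;
}

#define for_each_fake_compatible(n) \
	for ((n) = next_compatible(NULL); (n); (n) = next_compatible(n))

/* Return the index of the first compatible node, balancing the refcount. */
static int find_first(void)
{
	struct fake_node *n;

	for_each_fake_compatible(n) {
		int idx = (int)(n - nodes);

		node_put(n);	/* the put an early exit needs */
		return idx;
	}
	return -1;
}
```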
    
    Fixes: 7813043 ("powerpc/44x/fsp2: Add irq error handlers")
    Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/1635406102-88719-1-git-send-email-cuibixuan@linux.alibaba.com
    Bixuan Cui authored and mpe committed Oct 28, 2021
  5. powerpc/dcr: Use cmplwi instead of 3-argument cmpli

    In dcr-low.S we use cmpli with three arguments, instead of four
    arguments as defined in the ISA:
    
    	cmpli	cr0,r3,1024
    
    This appears to be a PPC440-ism, looking at the "PPC440x5 CPU Core
    User’s Manual" it shows cmpli having no L field, but implied to be 0 due
    to the core being 32-bit. It mentions that the ISA defines four
    arguments and recommends using cmplwi.
    
    It also corresponds to the old POWER instruction set, which had no L
    field there, a reserved bit instead.
    
    dcr-low.S is only built 32-bit, because it is only built when
    DCR_NATIVE=y, which is only selected by 40x and 44x. Looking at the
    generated code (with gcc/gas) we see cmplwi as expected.
    
    Although gas is happy with the 3-argument version when building for
    32-bit, the LLVM assembler is not and errors out with:
    
      arch/powerpc/sysdev/dcr-low.S:27:10: error: invalid operand for instruction
       cmpli 0,%r3,1024; ...
               ^
    
    Switch to the cmplwi extended opcode, which avoids any confusion when
    reading the ISA, fixes the issue with the LLVM assembler, and also means
    the code could be built 64-bit in future (though that's very unlikely).
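The difference between the forms is clearest in the encoding. The sketch below assembles the D-form cmpli instruction (primary opcode 10, BF, L, RA, UI) by hand. Because cmplwi is cmpli with L = 0, writing it out removes any reliance on an implied L field. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Encode cmpli BF,L,RA,UI: opcode 10 in bits 0-5, BF in bits 6-8,
 * L in bit 10, RA in bits 11-15, UI in bits 16-31 (IBM numbering,
 * bit 0 is the most significant bit; bit 9 is reserved).
 */
static uint32_t encode_cmpli(unsigned int bf, unsigned int l,
			     unsigned int ra, uint16_t ui)
{
	return (10u << 26) | ((bf & 7u) << 23) | ((l & 1u) << 21) |
	       ((ra & 31u) << 16) | ui;
}

/* cmplwi is the extended mnemonic for cmpli with L = 0. */
static uint32_t encode_cmplwi(unsigned int bf, unsigned int ra, uint16_t ui)
{
	return encode_cmpli(bf, 0, ra, ui);
}
```

With these definitions, `cmplwi cr0,r3,1024` encodes as 0x28030400. Setting L = 1 (cmpldi) yields 0x28230400.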
    
    Reported-by: Nick Desaulniers <ndesaulniers@google.com>
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    BugLink: ClangBuiltLinux#1419
    Link: https://lore.kernel.org/r/20211014024424.528848-1-mpe@ellerman.id.au
    mpe committed Oct 28, 2021
  6. KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling
    
    Commit 1126652 ("KVM: PPC: Book3S HV: Context tracking exit guest
    context before enabling irqs") moved guest_exit() into the interrupt
    protected area to avoid a wrong-context warning (or worse). The problem
    is that tick-based time accounting has not yet been updated at this
    point (because it depends on the timer interrupt firing), so the guest
    time gets incorrectly accounted to system time.
    
    To fix the problem, follow the x86 fix in commit 1604571 ("Defer
    vtime accounting 'til after IRQ handling"), and allow host IRQs to run
    before accounting the guest exit time.
    
    When vtime accounting is enabled, this is not required because the TB
    (timebase) is used directly for accounting.
    
    Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a
    guest running a kernel compile, the 'guest' fields of /proc/stat are
    stuck at zero. With the patch they can be observed increasing roughly as
    expected.
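The ordering problem can be modelled with a flag that stands in for "this task is running a guest" and a timer tick that is pending while IRQs are disabled. Treating the flag as PF_VCPU is an assumption for this sketch, and all names are hypothetical. The model ignores context tracking, which in the real fix still exits before IRQs are enabled. It only shows which counter the tick is charged to.

```c
#include <stdbool.h>

/* Hypothetical per-task accounting state. */
struct acct {
	bool in_guest;		/* stands in for the "running a vCPU" flag */
	bool tick_pending;	/* a timer tick arrived while IRQs were off */
	unsigned int guest_ticks;
	unsigned int system_ticks;
};

/* Enabling IRQs lets the pending tick run; it charges by the current flag. */
static void local_irq_enable_and_run(struct acct *a)
{
	if (a->tick_pending) {
		if (a->in_guest)
			a->guest_ticks++;
		else
			a->system_ticks++;
		a->tick_pending = false;
	}
}

/* Old order: account the guest exit, then let the pending tick run. */
static void exit_old(struct acct *a)
{
	a->in_guest = false;
	local_irq_enable_and_run(a);
}

/* Fixed order: let host IRQs (and the tick) run first, then account the exit. */
static void exit_fixed(struct acct *a)
{
	local_irq_enable_and_run(a);
	a->in_guest = false;
}
```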
    
    Fixes: e233d54 ("KVM: booke: use __kvm_guest_exit")
    Fixes: 1126652 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs")
    Cc: stable@vger.kernel.org # 5.12+
    Signed-off-by: Laurent Vivier <lvivier@redhat.com>
    [np: only required for tick accounting, add Book3E fix, tweak changelog]
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211027142150.3711582-1-npiggin@gmail.com
    vivier authored and mpe committed Oct 28, 2021
  7. powerpc/security: Use a mutex for interrupt exit code patching

    The mitigation-patching.sh script in the powerpc selftests toggles
    all mitigations on and off simultaneously, revealing that rfi_flush
    and stf_barrier cannot safely operate at the same time due to races
    in updating the static key.
    
    On some systems, the static key code throws a warning and the kernel
    remains functional.  On others, the kernel will hang or crash.
    
    Fix this by slapping on a mutex.
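This is a rough userspace sketch of the idea, with a pthread mutex standing in for a kernel mutex. Both mitigation toggles go through one function that holds a single lock around the shared update. The variables are hypothetical, including `static_key_users`, which stands in for the shared static key.

```c
#include <pthread.h>
#include <stdbool.h>

/* One lock serialises every update to the interrupt exit patching. */
static pthread_mutex_t exit_patch_lock = PTHREAD_MUTEX_INITIALIZER;
static bool rfi_flush_on, stf_barrier_on;
static int static_key_users;	/* stand-in for the shared static key */

static void update_exit_patching(bool *mitigation, bool enable)
{
	pthread_mutex_lock(&exit_patch_lock);
	if (*mitigation != enable) {
		*mitigation = enable;
		static_key_users += enable ? 1 : -1;
	}
	pthread_mutex_unlock(&exit_patch_lock);
}

static void set_rfi_flush(bool enable)
{
	update_exit_patching(&rfi_flush_on, enable);
}

static void set_stf_barrier(bool enable)
{
	update_exit_patching(&stf_barrier_on, enable);
}
```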
    
    Fixes: 1379974 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
    Cc: stable@vger.kernel.org # v5.14+
    Signed-off-by: Russell Currey <ruscur@russell.cc>
    Acked-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211027072410.40950-1-ruscur@russell.cc
    ruscur authored and mpe committed Oct 28, 2021

Commits on Oct 27, 2021

  1. powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void

    Up to now mcu_gpiochip_remove() has returned zero unconditionally. Make
    it return void instead, which makes it easier to see in the callers that
    there is no error to handle.
    
    Also the return value of i2c remove callbacks is ignored anyway.
    
    Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211021105657.72572-1-u.kleine-koenig@pengutronix.de
    ukleinek authored and mpe committed Oct 27, 2021
  2. powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs

    Building tqm8541_defconfig results in:
    
    	arch/powerpc/mm/nohash/fsl_book3e.c: In function 'settlbcam':
    	arch/powerpc/mm/nohash/fsl_book3e.c:126:40: error: '_PAGE_BAP_SX' undeclared (first use in this function)
    	  126 |         TLBCAM[index].MAS3 |= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0;
    	      |                                        ^~~~~~~~~~~~
    	arch/powerpc/mm/nohash/fsl_book3e.c:126:40: note: each undeclared identifier is reported only once for each function it appears in
    	make[3]: *** [scripts/Makefile.build:277: arch/powerpc/mm/nohash/fsl_book3e.o] Error 1
    	make[2]: *** [scripts/Makefile.build:540: arch/powerpc/mm/nohash] Error 2
    	make[1]: *** [scripts/Makefile.build:540: arch/powerpc/mm] Error 2
    	make: *** [Makefile:1868: arch/powerpc] Error 2
    
    This is because _PAGE_BAP_SX is not defined when using 32-bit PTEs.
    
    Now that _PAGE_EXEC contains both _PAGE_BAP_SX and _PAGE_BAP_UX, it can be used instead.
    
    Fixes: 01116e6 ("powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/91a0235e7f2a85308b84aa5b9efd8d022e2b899a.1635226743.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 27, 2021
  3. powerpc/book3e: Fix set_memory_x() and set_memory_nx()

    set_memory_x() calls pte_mkexec(), which sets _PAGE_EXEC.
    set_memory_nx() calls pte_exprotect(), which clears _PAGE_EXEC.
    
    Book3e has two bits, UX and SX, which define the exec rights for user
    (PR=1) and for kernel (PR=0) respectively.
    
    _PAGE_EXEC is defined as UX only.
    
    An executable kernel page is set with either _PAGE_KERNEL_RWX
    or _PAGE_KERNEL_ROX, which both have SX set and UX cleared.
    
    So a set_memory_nx() call on an executable kernel page does
    nothing, because UX is already cleared.
    
    And set_memory_x() on a non-executable kernel page makes it
    executable for the user and keeps it non-executable for kernel.
    
    Also, pte_exec() always returns 'false' on kernel pages, because
    it checks _PAGE_EXEC which doesn't include SX, so for instance
    the W+X check doesn't work.
    
    To fix this:
      - change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER
      - set both UX and SX in _PAGE_EXEC so that pte_exec() returns
        true whenever one of the two bits is set and pte_exprotect()
        clears both bits.
      - define a book3e specific version of pte_mkexec() which sets
        either SX or UX based on UR.
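The intended behaviour of the three helpers can be sketched as follows. The bit values are illustrative assumptions, not the real book3e definitions, and pte_t is simplified to a plain integer. The sketch only shows that _PAGE_EXEC covers both SX and UX, and that pte_mkexec() chooses one of them based on UR.

```c
#include <stdbool.h>

/* Illustrative bit values; the real ones live in the book3e PTE headers. */
#define _PAGE_BAP_UR	0x008UL
#define _PAGE_BAP_SX	0x040UL
#define _PAGE_BAP_UX	0x080UL
#define _PAGE_EXEC	(_PAGE_BAP_SX | _PAGE_BAP_UX)

typedef unsigned long pte_t;	/* simplified: no struct wrapper */

/* Executable if either the supervisor or the user exec bit is set. */
static bool pte_exec(pte_t pte)
{
	return (pte & _PAGE_EXEC) != 0;
}

/* Remove exec rights for both user and kernel. */
static pte_t pte_exprotect(pte_t pte)
{
	return pte & ~_PAGE_EXEC;
}

/* User-readable pages get UX; kernel pages get SX. */
static pte_t pte_mkexec(pte_t pte)
{
	if (pte & _PAGE_BAP_UR)
		return pte | _PAGE_BAP_UX;
	return pte | _PAGE_BAP_SX;
}
```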
    
    Fixes: 1f9ad21 ("powerpc/mm: Implement set_memory() routines")
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/c41100f9c144dc5b62e5a751b810190c6b5d42fd.1635226743.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 27, 2021
  4. powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()

    Commit 26973fa ("powerpc/mm: use pte helpers in generic code")
    changed those two functions to use pte helpers to determine which
    bits to clear and which bits to set.
    
    This change was based on the assumption that bits to be set/cleared
    are always the same and can be determined by applying the pte
    manipulation helpers on __pte(0).
    
    But on platforms like book3e, the bits depend on whether the page
    is a user page or not.
    
    For the time being it more or less works because _PAGE_EXEC is
    used for user pages only and the exec right is always set on
    kernel pages. But the following patch will clean that up, and the
    output of pte_mkexec() will depend on whether the page is a user or
    kernel page.
    
    Instead of trying to make an even more complicated helper where bits
    would become dependent on the final pte value, come back to a more
    static situation like before commit 26973fa ("powerpc/mm: use
    pte helpers in generic code"), by introducing an 8xx specific
    version of __ptep_set_access_flags() and ptep_set_wrprotect().
    
    Fixes: 26973fa ("powerpc/mm: use pte helpers in generic code")
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/922bdab3a220781bae2360ff3dd5adb7fe4d34f1.1635226743.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 27, 2021
  5. powerpc/bpf: Fix write protecting JIT code

    Running a program with bpf-to-bpf function calls results in a data
    access exception (0x300) with the below call trace:
    
      bpf_int_jit_compile+0x238/0x750 (unreliable)
      bpf_check+0x2008/0x2710
      bpf_prog_load+0xb00/0x13a0
      __sys_bpf+0x6f4/0x27c0
      sys_bpf+0x2c/0x40
      system_call_exception+0x164/0x330
      system_call_vectored_common+0xe8/0x278
    
    because bpf_int_jit_compile() tries to write to a write-protected JIT
    code location during the extra pass.
    
    Fix it by holding off write protection of JIT code until the extra
    pass, where branch target addresses fixup happens.
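This is a simplified model of the ordering. It assumes a hypothetical `struct jit_image` whose writes fail, rather than fault, once it is read-only. Locking at the end of every compile breaks the extra pass. Deferring the lock until no further pass will write lets the fixups land.

```c
#include <stdbool.h>

/* Hypothetical JIT image that can be made read-only. */
struct jit_image {
	unsigned int insns[8];
	bool read_only;
};

/* Returns false (where the kernel would take a 0x300 fault) if read-only. */
static bool jit_write(struct jit_image *img, int idx, unsigned int insn)
{
	if (img->read_only)
		return false;
	img->insns[idx] = insn;
	return true;
}

/* Buggy: write-protect at the end of every compile, including the first. */
static bool compile_pass_old(struct jit_image *img, unsigned int insn)
{
	if (!jit_write(img, 0, insn))
		return false;
	img->read_only = true;
	return true;
}

/* Fixed: with bpf-to-bpf calls, wait until the extra pass has patched targets. */
static bool compile_pass_fixed(struct jit_image *img, unsigned int insn,
			       bool has_calls, bool extra_pass)
{
	if (!jit_write(img, 0, insn))
		return false;
	if (!has_calls || extra_pass)
		img->read_only = true;
	return true;
}
```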
    
    Fixes: 62e3d42 ("powerpc/bpf: Write protect JIT code")
    Cc: stable@vger.kernel.org # v5.14+
    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211025055649.114728-1-hbathini@linux.ibm.com
    hbathini authored and mpe committed Oct 27, 2021
  6. selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh
    
    The EPOCHSECONDS environment variable was added in bash 5.0 (released
    2019).  Some distributions of the "stable" and "long-term" variety ship
    older versions of bash than this, so swap to using the date command
    instead.
    
    "%s" was added to coreutils `date` in 1993, so we should be good, but
    who knows: it is a GNU extension and not part of the POSIX spec for `date`.
    
    Signed-off-by: Russell Currey <ruscur@russell.cc>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211025102436.19177-1-ruscur@russell.cc
    ruscur authored and mpe committed Oct 27, 2021
  7. powerpc/64s/interrupt: Fix check_return_regs_valid() false positive

    check_return_regs_valid() can cause a false positive if the return
    regs are marked as norestart and they are an HSRR type interrupt,
    because the low bit in the bottom of regs->trap causes interrupt type
    matching to fail.
    
    This can occur, for example, on bare metal with an HV privileged doorbell
    interrupt that causes a signal, but do_signal returns early because
    get_signal() fails and takes the "No signal to deliver" path. In this
    case no signal was delivered, so the return location is not changed and
    the return SRRs are not invalidated, yet set_trap_norestart is called,
    which messes up the match. Building go-1.16.6 is known to reproduce this.
    
    Fix it by using the TRAP() accessor which masks out the low bit.
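The masking issue can be shown with a cut-down regs structure. Here `set_trap_norestart()` tags the low bit of the trap word, and `trap_vector()` plays the role of TRAP(). The HSRR vector list in `is_hsrr_trap()` is an illustrative assumption, not the kernel's full list.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the trap word in struct pt_regs. */
struct fake_regs {
	unsigned long trap;
};

/* The low bit of the trap word is used as a "norestart" flag. */
static void set_trap_norestart(struct fake_regs *regs)
{
	regs->trap |= 0x1UL;
}

/* Accessor that masks the flag bit, in the spirit of TRAP(). */
static unsigned long trap_vector(const struct fake_regs *regs)
{
	return regs->trap & ~0x1UL;
}

/* Illustrative check for an HSRR-type interrupt vector. */
static bool is_hsrr_trap(unsigned long vec)
{
	return vec == 0xe80 || vec == 0xea0;
}
```

Matching on the raw `trap` field fails once the flag is set. Matching on `trap_vector()` still succeeds.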
    
    Fixes: 6eaaf9d ("powerpc/64s/interrupt: Check and fix srr_valid without crashing")
    Cc: stable@vger.kernel.org # v5.14+
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211026122531.3599918-1-npiggin@gmail.com
    npiggin authored and mpe committed Oct 27, 2021
  8. powerpc/boot: Set LC_ALL=C in wrapper script

    While trying to build a simpleImage for the ACADIA platform on a host
    with a French locale, I got the following error:
    
    	  WRAP    arch/powerpc/boot/simpleImage.acadia
    	INFO: Uncompressed kernel (size 0x6ae7d0) overlaps the address of the wrapper(0x400000)
    	INFO: Fixing the link_address of wrapper to (0x700000)
    	powerpc64-linux-gnu-ld : mode d'émulation non reconnu : -T
    	Émulations prises en charge : elf64ppc elf32ppc elf32ppclinux elf32ppcsim elf64lppc elf32lppc elf32lppclinux elf32lppcsim
    	make[1]: *** [arch/powerpc/boot/Makefile:424 : arch/powerpc/boot/simpleImage.acadia] Erreur 1
    	make: *** [arch/powerpc/Makefile:285 : simpleImage.acadia] Erreur 2
    
    Trying again with V=1 shows the following command:
    
    	powerpc64-linux-gnu-ld -m -T arch/powerpc/boot/zImage.lds -Ttext 0x700000 --no-dynamic-linker -o arch/powerpc/boot/simpleImage.acadia -Map wrapper.map arch/powerpc/boot/fixed-head.o arch/powerpc/boot/simpleboot.o ./zImage.3278022.o arch/powerpc/boot/wrapper.a
    
    The argument of '-m' is missing.
    
    This is due to the wrapper script calling 'objdump -p vmlinux' and
    looking for 'file format', whereas the output of objdump is:
    
    	vmlinux:     format de fichier elf32-powerpc
    
    	En-tête de programme:
    	    LOAD off    0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
    	         filesz 0x0069e1d4 memsz 0x006c128c flags rwx
    	    NOTE off    0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
    	         filesz 0x00000054 memsz 0x00000054 flags ---
    
    Add LC_ALL=C at the beginning of the wrapper script in order to get the
    output expected by the script:
    
    	vmlinux:     file format elf32-powerpc
    
    	Program Header:
    	    LOAD off    0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
    	         filesz 0x0069e1d4 memsz 0x006c128c flags rwx
    	    NOTE off    0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
    	         filesz 0x00000054 memsz 0x00000054 flags ---
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/a9ff3bc98035f63b122c051f02dc47c7aed10430.1635256089.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 27, 2021
  9. powerpc/64s: Default to 64K pages for 64 bit book3s

    For 64-bit book3s the default should be 64K as that's what modern CPUs
    are designed for.
    
    The following defconfigs already set CONFIG_PPC_64K_PAGES:
    
     cell_defconfig
     pasemi_defconfig
     powernv_defconfig
     ppc64_defconfig
     pseries_defconfig
     skiroot_defconfig
    
    These have the option removed from their defconfigs, as it is now the
    default.
    
    The defconfigs that now need to set CONFIG_PPC_4K_PAGES to maintain
    their existing behaviour are:
    
     g5_defconfig
     maple_defconfig
     microwatt_defconfig
     ps3_defconfig
    
    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    BugLink: linuxppc/issues#109
    Link: https://lore.kernel.org/r/20211015001649.45591-1-joel@jms.id.au
    shenki authored and mpe committed Oct 27, 2021
  10. Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC"

    This reverts commit 566af8c.
    
    This caused some conflicts vs the audit tree, and the audit maintainers
    would prefer we postpone this to the next merge window so we have more
    time for testing.
    
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    mpe committed Oct 27, 2021

Commits on Oct 22, 2021

  1. powerpc/pseries/mobility: ignore ibm,platform-facilities updates

    On VMs with NX encryption, compression, and/or RNG offload, these
    capabilities are described by nodes in the ibm,platform-facilities device
    tree hierarchy:
    
      $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
      /sys/firmware/devicetree/base/ibm,platform-facilities/
      ├── ibm,compression-v1
      ├── ibm,random-v1
      └── ibm,sym-encryption-v1
    
      3 directories
    
    The acceleration functions that these nodes describe are not disrupted by
    live migration, not even temporarily.
    
    But in the post-migration ibm,update-nodes sequence, firmware always
    sends "delete" messages for this hierarchy, followed by an "add"
    directive to reconstruct it via ibm,configure-connector (log with
    debugging statements enabled in mobility.c):
    
      mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
      mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
      mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
      mobility: removing node /ibm,platform-facilities:4294967286
      ...
      mobility: added node /ibm,platform-facilities:4294967286
    
    Note that we receive a single "add" message for the entire hierarchy, and what
    we receive from the ibm,configure-connector sequence is the top-level
    platform-facilities node along with its three children. The debug message
    simply reports the parent node and not the whole subtree.
    
    Also, significantly, the nodes added are almost completely equivalent to
    the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
    the leaf nodes is the only property I've observed to differ, and Linux does
    not use that. So in practice, the sum of update messages Linux receives for
    this hierarchy is equivalent to minor property updates.
    
    We succeed in removing the original hierarchy from the device tree. But the
    vio bus code is ignorant of this, and does not unbind or relinquish its
    references. The leaf nodes, still reachable through sysfs, of course still
    refer to the now-freed ibm,platform-facilities parent node, which makes
    use-after-free possible:
    
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 refcount_warn_saturate+0x164/0x1f0
      refcount_warn_saturate+0x160/0x1f0 (unreliable)
      kobject_get+0xf0/0x100
      of_node_get+0x30/0x50
      of_get_parent+0x50/0xb0
      of_fwnode_get_parent+0x54/0x90
      fwnode_count_parents+0x50/0x150
      fwnode_full_name_string+0x30/0x110
      device_node_string+0x49c/0x790
      vsnprintf+0x1c0/0x4c0
      sprintf+0x44/0x60
      devspec_show+0x34/0x50
      dev_attr_show+0x40/0xa0
      sysfs_kf_seq_show+0xbc/0x200
      kernfs_seq_show+0x44/0x60
      seq_read_iter+0x2a4/0x740
      kernfs_fop_read_iter+0x254/0x2e0
      new_sync_read+0x120/0x190
      vfs_read+0x1d0/0x240
    
    Moreover, the "new" replacement subtree is not correctly added to the
    device tree, resulting in an ibm,platform-facilities parent node without
    the appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:
    
      $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
      /sys/firmware/devicetree/base/ibm,platform-facilities/
    
      0 directories
    
      $ cd /sys/devices/vio ; find . -xtype l -exec file {} +
      ./ibm,sym-encryption-v1/of_node: broken symbolic link to
        ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
      ./ibm,random-v1/of_node:         broken symbolic link to
        ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
      ./ibm,compression-v1/of_node:    broken symbolic link to
        ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1
    
    This is because add_dt_node() -> dlpar_attach_node() attaches only the
    parent node returned from configure-connector, ignoring any children. This
    should be corrected for the general case, but fixing that won't help with
    the stale OF node references, which is the more urgent problem.
    
    One way to address that would be to make the drivers respond to node
    removal notifications, so that node references can be dropped
    appropriately. But this would likely force the drivers to disrupt active
    clients for no useful purpose: equivalent nodes are immediately re-added.
    And recall that the acceleration capabilities described by the nodes remain
    available throughout the whole process.
    
    The solution I believe to be robust for this situation is to convert
    remove+add of a node with an unchanged phandle to an update of the node's
    properties in the Linux device tree structure. That would involve changing
    and adding a fair amount of code, and may take several iterations to land.
    
    Until that can be realized we have a confirmed use-after-free and the
    possibility of memory corruption. So add a limited workaround that
    discriminates on the node type, ignoring adds and removes. This should be
    amenable to backporting in the meantime.
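The workaround's filter might look roughly like the sketch below. `struct fake_dn` is a hypothetical, cut-down device node. The sketch assumes the platform-facilities parent has a device_type of "ibm,platform-facilities", so the check matches both the parent and its direct children.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down device tree node. */
struct fake_dn {
	const char *name;
	const char *type;	/* device_type property, may be NULL */
	struct fake_dn *parent;
};

static bool node_is_type(const struct fake_dn *dn, const char *type)
{
	return dn && dn->type && strcmp(dn->type, type) == 0;
}

/* Skip add/remove for the platform-facilities node and its children. */
static bool ignore_update(const struct fake_dn *dn)
{
	return node_is_type(dn, "ibm,platform-facilities") ||
	       node_is_type(dn->parent, "ibm,platform-facilities");
}
```

Keying on the node type keeps the check small and easy to backport. The trade-off is that it does not handle the general remove+add case described above.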
    
    Fixes: 410bccf ("powerpc/pseries: Partition migration in the kernel")
    Cc: stable@vger.kernel.org
    Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211020194703.2613093-1-nathanl@linux.ibm.com
    nathanlynch authored and mpe committed Oct 22, 2021
  2. powerpc/32: Don't use a struct based type for pte_t

    A long time ago we had a config item called STRICT_MM_TYPECHECKS
    to build the kernel either with pte_t defined as a structure, in order
    to perform additional build checks, or with pte_t defined as a simple
    type, in order to get simpler generated code.
    
    Commit 670eea9 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS")
    made the struct based definition the only one, considering that the
    generated code was similar in both cases.
    
    That's true on ppc64 because the ABI is such that a struct whose
    only element is of a simple type is passed in a register, but on
    ppc32 such a structure is passed via the stack like any other
    structure.
    
    Simple test function:
    
    	pte_t test(pte_t pte)
    	{
    		return pte;
    	}
    
    Before this patch we got
    
    	c00108ec <test>:
    	c00108ec:	81 24 00 00 	lwz     r9,0(r4)
    	c00108f0:	91 23 00 00 	stw     r9,0(r3)
    	c00108f4:	4e 80 00 20 	blr
    
    So, for PPC32, restore the simple type behaviour we had before
    commit 670eea9, but instead of adding a config option to
    activate type checking, do it when __CHECKER__ is set so that type
    checking is performed by 'sparse' and provides feedback like:
    
    	arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types)
    	arch/powerpc/mm/pgtable.c:466:16:    expected unsigned long
    	arch/powerpc/mm/pgtable.c:466:16:    got struct pte_t [usertype] x
    
    With this patch we now get
    
    	c0010890 <test>:
    	c0010890:	4e 80 00 20 	blr
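A minimal sketch of the PPC32 side of this arrangement follows. It is a standalone model: the names mirror the kernel's (pte_t, __pte(), pte_val()), but these definitions are written for illustration and are not copied from the powerpc headers, which also keep the struct form on ppc64. `test_pte()` stands in for the test() function above.

```c
/* Only a sparse run (which defines __CHECKER__) gets the strict type. */
#ifdef __CHECKER__
#define STRICT_MM_TYPECHECKS
#endif

#ifdef STRICT_MM_TYPECHECKS
/* Distinct struct type: mixing it with unsigned long is flagged by sparse. */
typedef struct { unsigned long pte; } pte_t;
#define __pte(x)	((pte_t) { (x) })
static inline unsigned long pte_val(pte_t x) { return x.pte; }
#else
/* Plain integer: passed and returned in a register on PPC32. */
typedef unsigned long pte_t;
#define __pte(x)	((pte_t)(x))
static inline unsigned long pte_val(pte_t x) { return x; }
#endif

/* Same shape as the test function in the message above. */
pte_t test_pte(pte_t pte)
{
	return pte;
}
```

Code that only uses __pte() and pte_val() compiles unchanged in both modes. That is what lets sparse catch misuse without affecting the generated code.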
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    [mpe: Define STRICT_MM_TYPECHECKS rather than repeating the condition]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/c904599f33aaf6bb7ee2836a9ff8368509e0d78d.1631887042.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  3. powerpc/breakpoint: Cleanup

    cache_op_size() does exactly the same as l1_dcache_bytes().
    
    Remove it.
    
    MSR_64BIT already exists, so there is no need to enclose the check
    in #ifdef __powerpc64__.
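The point of using MSR_64BIT is that it lets a single test compile on both word sizes. The sketch below assumes MSR_64BIT expands to the MSR[SF] bit on 64-bit builds and to 0 on 32-bit builds. `SKETCH_PPC64`, `MSR_SF_BIT` and `truncate_ea()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Assumed definitions: MSR[SF] on a 64-bit build, 0 on a 32-bit build,
 * so the test below constant-folds to "always 32-bit" with no #ifdef.
 */
#define MSR_SF_BIT	(1ULL << 63)
#ifdef SKETCH_PPC64
#define MSR_64BIT	MSR_SF_BIT
#else
#define MSR_64BIT	0ULL
#endif

/* Truncate an effective address when the context is not in 64-bit mode. */
static inline uint64_t truncate_ea(uint64_t msr, uint64_t ea)
{
	if (!(msr & MSR_64BIT))
		ea &= 0xffffffffULL;
	return ea;
}
```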
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/6184b08088312a7d787d450eb902584e4ae77f7a.1632317816.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  4. powerpc: Activate CONFIG_STRICT_KERNEL_RWX by default

    CONFIG_STRICT_KERNEL_RWX should be set by default on every
    architecture (see KSPP#4).
    
    On PPC32 we have to find a compromise between performance,
    memory waste and selection of strict_kernel_rwx, because it implies
    either smaller memory chunks or larger alignment between RO memory
    and RW memory.
    
    For instance, the 8xx maps memory with 8M pages. So either the
    boundary between RO and RW must be 8M aligned or the mapping falls
    back to 512k pages, which puts more pressure on the TLB.
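The cost of a misaligned RO/RW boundary on 8xx can be estimated with a simple model. It uses an 8M page wherever an aligned 8M chunk fits and 512k pages elsewhere. This is a hypothetical sketch (`count_8xx_pages()` is not a kernel function), assuming 512k-aligned bounds.

```c
#define SZ_512K	(512UL * 1024)
#define SZ_8M	(8UL * 1024 * 1024)

struct page_count {
	unsigned long huge;  /* 8M pages */
	unsigned long small; /* 512k pages */
};

/*
 * Count the pages needed to map [start, end): an 8M page wherever an
 * 8M-aligned 8M chunk fits, 512k pages elsewhere. Both bounds are
 * assumed to be 512k aligned.
 */
static struct page_count count_8xx_pages(unsigned long start, unsigned long end)
{
	struct page_count c = { 0, 0 };
	unsigned long addr = start;

	while (addr < end) {
		if ((addr % SZ_8M) == 0 && end - addr >= SZ_8M) {
			c.huge++;
			addr += SZ_8M;
		} else {
			c.small++;
			addr += SZ_512K;
		}
	}
	return c;
}
```

In this model, 16M of memory maps with two 8M pages. Splitting it at 5M instead costs ten 512k pages for RO, plus six 512k pages and one 8M page for RW: 17 entries instead of 2.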
    
    book3s/32 maps memory with BATs as much as possible. BATs can have
    any power-of-two size between 128k and 256M, but we have only 4 to 8
    BATs, so the alignment must be good enough to allow efficient use of
    the BATs and avoid falling back to standard page mapping, which would
    kill performance.
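The effect of alignment on BAT usage can be sketched with a greedy cover. Each block is a naturally aligned power of two between 128k and 256M, as described above. `count_bats()` is a hypothetical helper, not the kernel's BAT setup code, and the greedy strategy is an illustrative assumption.

```c
#define BAT_MIN	(128UL * 1024)
#define BAT_MAX	(256UL * 1024 * 1024)

/*
 * Greedily count the BATs needed to cover [start, end). Each block is a
 * power of two between 128k and 256M and aligned to its own size.
 * Returns -1 if a piece smaller than 128k would be needed.
 */
static int count_bats(unsigned long start, unsigned long end)
{
	int n = 0;

	while (start < end) {
		unsigned long size = BAT_MAX;

		/* shrink until the block is aligned and fits in what is left */
		while (size >= BAT_MIN && ((start & (size - 1)) || size > end - start))
			size >>= 1;
		if (size < BAT_MIN)
			return -1;
		start += size;
		n++;
	}
	return n;
}
```

With 32M of memory split at 8M, RO and RW take 1 + 2 BATs. Split at 7M instead, they take 3 + 3, which already exhausts a 4-BAT part.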
    
    So let's go one step further and make it the default, but still allow
    users to unset it when wanted.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/057c40164084bfc7d77c0b2ff78d95dbf6a2a21b.1632503622.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  5. powerpc/8xx: Simplify TLB handling

    In the old days, TLB handling for 8xx was using tlbie and tlbia
    instructions directly as much as possible.
    
    But commit f048aac ("powerpc/mm: Add SMP support to no-hash
    TLB handling") broke that by introducing unnecessarily complex
    out-of-line functions for booke/smp, which don't have tlbie/tlbia
    instructions and require more complex handling.
    
    Restore direct use of tlbie and tlbia for 8xx which is never SMP.
    
    With this patch we now get
    
    	c00ecc68 <ptep_clear_flush>:
    	c00ecc68:	39 00 00 00 	li      r8,0
    	c00ecc6c:	81 46 00 00 	lwz     r10,0(r6)
    	c00ecc70:	91 06 00 00 	stw     r8,0(r6)
    	c00ecc74:	7c 00 2a 64 	tlbie   r5,r0
    	c00ecc78:	7c 00 04 ac 	hwsync
    	c00ecc7c:	91 43 00 00 	stw     r10,0(r3)
    	c00ecc80:	4e 80 00 20 	blr
    
    Before it was
    
    	c0012880 <local_flush_tlb_page>:
    	c0012880:	2c 03 00 00 	cmpwi   r3,0
    	c0012884:	41 82 00 54 	beq     c00128d8 <local_flush_tlb_page+0x58>
    	c0012888:	81 22 00 00 	lwz     r9,0(r2)
    	c001288c:	81 43 00 20 	lwz     r10,32(r3)
    	c0012890:	39 29 00 01 	addi    r9,r9,1
    	c0012894:	91 22 00 00 	stw     r9,0(r2)
    	c0012898:	2c 0a 00 00 	cmpwi   r10,0
    	c001289c:	41 82 00 10 	beq     c00128ac <local_flush_tlb_page+0x2c>
    	c00128a0:	81 2a 01 dc 	lwz     r9,476(r10)
    	c00128a4:	2c 09 ff ff 	cmpwi   r9,-1
    	c00128a8:	41 82 00 0c 	beq     c00128b4 <local_flush_tlb_page+0x34>
    	c00128ac:	7c 00 22 64 	tlbie   r4,r0
    	c00128b0:	7c 00 04 ac 	hwsync
    	c00128b4:	81 22 00 00 	lwz     r9,0(r2)
    	c00128b8:	39 29 ff ff 	addi    r9,r9,-1
    	c00128bc:	2c 09 00 00 	cmpwi   r9,0
    	c00128c0:	91 22 00 00 	stw     r9,0(r2)
    	c00128c4:	4c a2 00 20 	bclr+   4,eq
    	c00128c8:	81 22 00 70 	lwz     r9,112(r2)
    	c00128cc:	71 29 00 04 	andi.   r9,r9,4
    	c00128d0:	4d 82 00 20 	beqlr
    	c00128d4:	48 65 76 74 	b       c0669f48 <preempt_schedule>
    	c00128d8:	81 22 00 00 	lwz     r9,0(r2)
    	c00128dc:	39 29 00 01 	addi    r9,r9,1
    	c00128e0:	91 22 00 00 	stw     r9,0(r2)
    	c00128e4:	4b ff ff c8 	b       c00128ac <local_flush_tlb_page+0x2c>
    ...
    	c00ecdc8 <ptep_clear_flush>:
    	c00ecdc8:	94 21 ff f0 	stwu    r1,-16(r1)
    	c00ecdcc:	39 20 00 00 	li      r9,0
    	c00ecdd0:	93 c1 00 08 	stw     r30,8(r1)
    	c00ecdd4:	83 c6 00 00 	lwz     r30,0(r6)
    	c00ecdd8:	91 26 00 00 	stw     r9,0(r6)
    	c00ecddc:	93 e1 00 0c 	stw     r31,12(r1)
    	c00ecde0:	7c 08 02 a6 	mflr    r0
    	c00ecde4:	7c 7f 1b 78 	mr      r31,r3
    	c00ecde8:	7c 83 23 78 	mr      r3,r4
    	c00ecdec:	7c a4 2b 78 	mr      r4,r5
    	c00ecdf0:	90 01 00 14 	stw     r0,20(r1)
    	c00ecdf4:	4b f2 5a 8d 	bl      c0012880 <local_flush_tlb_page>
    	c00ecdf8:	93 df 00 00 	stw     r30,0(r31)
    	c00ecdfc:	7f e3 fb 78 	mr      r3,r31
    	c00ece00:	80 01 00 14 	lwz     r0,20(r1)
    	c00ece04:	83 c1 00 08 	lwz     r30,8(r1)
    	c00ece08:	83 e1 00 0c 	lwz     r31,12(r1)
    	c00ece0c:	7c 08 03 a6 	mtlr    r0
    	c00ece10:	38 21 00 10 	addi    r1,r1,16
    	c00ece14:	4e 80 00 20 	blr
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/fb324f1c8f2ddb57cf6aad1cea26329558f1c1c0.1631887021.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  6. powerpc/lib/sstep: Don't use __{get/put}_user() on kernel addresses

    In the old days, when we didn't have kernel userspace access
    protection and had set_fs(), it was wise to use __get_user()
    and friends to read kernel memory.
    
    Nowadays, get_user() and put_user() grant userspace access and
    are meant exclusively for userspace access.
    
    Convert single step emulation functions to user_access_begin() and
    friends and use unsafe_get_user() and unsafe_put_user().
    
    When accessing kernel addresses, there is no need to open userspace
    access. And for book3s/32 it is particularly important not to try to
    open userspace access on a kernel address, because that would break the
    content of the kernel space segment registers. No guard has been put
    against that risk in order to avoid degrading performance.
    
    copy_from_kernel_nofault() and copy_to_kernel_nofault() should
    be used, but they are out-of-line functions which would degrade
    performance. Those two functions make use of the
    __get_kernel_nofault() and __put_kernel_nofault() macros, which
    are just wrappers around __get_user_size_goto() and
    __put_user_size_goto().
    
    unsafe_get_user() and unsafe_put_user() are also wrappers of
    __get_user_size_goto() and __put_user_size_goto(). Use them to
    access kernel space. That allows refactoring userspace and
    kernelspace access.
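The "goto" style of these accessors does not return an error code. Instead, a fault jumps to a label supplied by the caller. The following user-space model shows that calling convention. `get_size_goto()`, `access_ok_model()` and `read_u32()` are hypothetical. A NULL check stands in for the kernel's exception-table fault handling.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fault check; the kernel relies on the exception table. */
static int access_ok_model(const void *p)
{
	return p != NULL;
}

/* Copy @size bytes from @ptr into @x, or jump to @label on a "fault". */
#define get_size_goto(x, ptr, size, label)		\
	do {						\
		if (!access_ok_model(ptr))		\
			goto label;			\
		memcpy(&(x), (ptr), (size));		\
	} while (0)

/* Read a 32-bit value; returns 0 on success, -1 (like -EFAULT) on fault. */
static int read_u32(const uint32_t *src, uint32_t *val)
{
	uint32_t tmp;

	get_size_goto(tmp, src, sizeof(tmp), Efault);
	*val = tmp;
	return 0;

Efault:
	return -1;
}
```

Because the same macro shape serves both unsafe_get_user() and __get_kernel_nofault(), the emulation code can use a single access path for user and kernel addresses.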
    
    Reported-by: Stan Johnson <userm57@yahoo.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Depends-on: 4fe5cda ("powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/22831c9d17f948680a12c5292e7627288b15f713.1631817805.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  7. powerpc: warn on emulation of dcbz instruction in kernel mode

    The dcbz instruction shouldn't be used on non-cached memory. Using
    it on non-cached memory can result in an alignment exception, which
    is expensive to handle.
    
    Instead of silently emulating the instruction and causing severe
    performance degradation, warn whenever an alignment exception is
    taken in kernel mode due to dcbz, so that the user is made aware that
    the dcbz instruction has been used unexpectedly by the kernel.
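The check can be pictured as "decode, then warn once if not in user mode". In this sketch, the dcbz encoding (primary opcode 31, extended opcode 1014) follows the Power ISA. `report_kernel_dcbz()` is a hypothetical stand-in for a WARN_ON_ONCE-style report in the alignment handler, not the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* dcbz RA,RB is X-form: primary opcode 31, extended opcode 1014. */
static bool is_dcbz(uint32_t inst)
{
	return (inst >> 26) == 31 && ((inst >> 1) & 0x3ff) == 1014;
}

/*
 * Hypothetical policy model: emulation carries on as before, but the
 * first kernel-mode dcbz alignment fault is reported.
 * Returns true if a warning was printed.
 */
static bool report_kernel_dcbz(uint32_t inst, bool user_mode)
{
	static bool warned;

	if (user_mode || !is_dcbz(inst) || warned)
		return false;
	warned = true;
	fprintf(stderr, "kernel used dcbz on non-cached memory\n");
	return true;
}
```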
    
    Reported-by: Stan Johnson <userm57@yahoo.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/2e3acfe63d289c6fba366e16973c9ab8369e8b75.1631803922.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  8. powerpc/32: Add support for out-of-line static calls

    Add support for out-of-line static calls on PPC32. This change
    improves the performance of calls to global function pointers by
    using direct calls instead of indirect calls.
    
    The trampoline is initially populated with a 'blr' or a branch to the
    target, followed by an unreachable long jump sequence.
    
    In order to cater for parallel execution, the trampoline needs to
    be updated in a way that ensures it remains consistent at all times.
    This means we can't use the traditional lis/addi to load r12 with
    the target address, otherwise there would be a window during which
    the first instruction contains the upper part of the new target
    address while the second instruction still contains the lower part of
    the old target address. To avoid that, the target address is stored
    just after the 'bctr' and loaded from there with a single instruction.
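Here is the window described above, with worked numbers. lis/addi needs an adjusted high half (@ha) because addi sign-extends its immediate, and mixing halves from two different targets yields an address that is neither. The old target is __traceiter_irq_entry from the listing below. The new target 0xc1238000 is a made-up value.

```c
#include <stdint.h>

/* @ha and @l halves of a 32-bit address, as used by lis/addi. */
static uint16_t hi_adj(uint32_t addr) { return (uint16_t)((addr + 0x8000) >> 16); }
static uint16_t lo(uint32_t addr)     { return (uint16_t)(addr & 0xffff); }

/* Value left in r12 by "lis r12,hi; addi r12,r12,lo16". */
static uint32_t lis_addi(uint16_t hi, uint16_t lo16)
{
	/* addi sign-extends its 16-bit immediate */
	uint32_t simm = (uint32_t)lo16 - ((lo16 & 0x8000u) ? 0x10000u : 0u);

	return ((uint32_t)hi << 16) + simm;
}
```

A consistent pair rebuilds either target exactly. The new high half with the old low half gives 0xc1244ae0, and the old high half with the new low half gives 0xbfff8000. A single aligned word loaded by one lwz cannot be observed half-updated like this.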
    
    Then, depending on the target distance, arch_static_call_transform()
    will replace the first instruction either with a direct 'b <target>'
    or with a 'nop' in order to have the trampoline fall through to the
    long jump sequence.
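The distance check and the branch encoding can be checked against the listings in this log. A 'b'/'bl' has a signed 26-bit byte offset (about ±32 MB), and the opcode is 0x48000000 with the LK bit for 'bl'. `trampoline_first_insn()` is a hypothetical model of the choice, not the kernel's arch_static_call_transform().

```c
#include <stdbool.h>
#include <stdint.h>

#define PPC_INST_NOP	0x60000000u	/* ori r0,r0,0 */

/* I-form branch reach: signed 26-bit, word-aligned byte offset. */
static bool branch_in_range(uint32_t from, uint32_t to)
{
	int64_t off = (int64_t)to - (int64_t)from;

	return off >= -0x2000000 && off <= 0x1fffffc && !(off & 3);
}

/* Encode 'b target' (link = false) or 'bl target' (link = true) at @from. */
static uint32_t encode_branch(uint32_t from, uint32_t to, bool link)
{
	uint32_t off = to - from; /* modulo 2^32, so backward offsets wrap */

	return 0x48000000u | (off & 0x03fffffcu) | (link ? 1u : 0u);
}

/* Direct branch if reachable, else nop to fall through to the long jump. */
static uint32_t trampoline_first_insn(uint32_t tramp, uint32_t target)
{
	return branch_in_range(tramp, target) ? encode_branch(tramp, target, false)
					      : PPC_INST_NOP;
}
```

For example, the trampoline at c0004a00 branching to c0004ae0 encodes as 48 00 00 e0, and the 'bl' at c00056c4 to c0004a00 encodes as 4b ff f3 3d, matching the disassembly below.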
    
    For the special case of __static_call_return0(), to avoid the risk of
    a far branch, a version of it is inlined at the end of the trampoline.
    
    Performance-wise, the long jump sequence is probably no better than
    the indirect calls generated by GCC when we don't use static calls,
    but such calls are unlikely to be required on powerpc32: with most
    configurations the kernel size is far below 32 Mbytes, so only
    modules may happen to be too far away. And even modules are likely
    to be close enough, as they are allocated below the kernel core and
    as close as possible to the kernel text.
    
    static_call selftest is running successfully with this change.
    
    With this patch, __do_irq() has the following sequence to trace
    irq entries:
    
    	c0004a00 <__SCT__tp_func_irq_entry>:
    	c0004a00:	48 00 00 e0 	b       c0004ae0 <__traceiter_irq_entry>
    	c0004a04:	3d 80 c0 00 	lis     r12,-16384
    	c0004a08:	81 8c 4a 1c 	lwz     r12,18972(r12)
    	c0004a0c:	7d 89 03 a6 	mtctr   r12
    	c0004a10:	4e 80 04 20 	bctr
    	c0004a14:	38 60 00 00 	li      r3,0
    	c0004a18:	4e 80 00 20 	blr
    	c0004a1c:	00 00 00 00 	.long 0x0
    ...
    	c0005654 <__do_irq>:
    ...
    	c0005664:	7c 7f 1b 78 	mr      r31,r3
    ...
    	c00056a0:	81 22 00 00 	lwz     r9,0(r2)
    	c00056a4:	39 29 00 01 	addi    r9,r9,1
    	c00056a8:	91 22 00 00 	stw     r9,0(r2)
    	c00056ac:	3d 20 c0 af 	lis     r9,-16209
    	c00056b0:	81 29 74 cc 	lwz     r9,29900(r9)
    	c00056b4:	2c 09 00 00 	cmpwi   r9,0
    	c00056b8:	41 82 00 10 	beq     c00056c8 <__do_irq+0x74>
    	c00056bc:	80 69 00 04 	lwz     r3,4(r9)
    	c00056c0:	7f e4 fb 78 	mr      r4,r31
    	c00056c4:	4b ff f3 3d 	bl      c0004a00 <__SCT__tp_func_irq_entry>
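Decoding the trampoline's `lis r12,-16384` / `lwz r12,18972(r12)` pair shows which word the long jump path reads. It is the `.long` slot at c0004a1c at the end of the trampoline itself, so the address pair never changes and only that data word needs rewriting. `lis_lwz_ea()` is an illustrative helper, not kernel code.

```c
#include <stdint.h>

/*
 * Effective address of "lis rX,hi; lwz rY,d(rX)": lis loads hi << 16,
 * then lwz adds its sign-extended 16-bit displacement.
 */
static uint32_t lis_lwz_ea(int16_t hi, int16_t d)
{
	return ((uint32_t)(uint16_t)hi << 16) + (uint32_t)(int32_t)d;
}
```

The same arithmetic applies to the tracepoint loads in __do_irq: lis r9,-16209 plus displacement 29900 gives 0xc0af74cc.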
    
    Before this patch, __do_irq() was doing the following to trace irq
    entries:
    
    	c0005700 <__do_irq>:
    ...
    	c0005710:	7c 7e 1b 78 	mr      r30,r3
    ...
    	c000574c:	93 e1 00 0c 	stw     r31,12(r1)
    	c0005750:	81 22 00 00 	lwz     r9,0(r2)
    	c0005754:	39 29 00 01 	addi    r9,r9,1
    	c0005758:	91 22 00 00 	stw     r9,0(r2)
    	c000575c:	3d 20 c0 af 	lis     r9,-16209
    	c0005760:	83 e9 f4 cc 	lwz     r31,-2868(r9)
    	c0005764:	2c 1f 00 00 	cmpwi   r31,0
    	c0005768:	41 82 00 24 	beq     c000578c <__do_irq+0x8c>
    	c000576c:	81 3f 00 00 	lwz     r9,0(r31)
    	c0005770:	80 7f 00 04 	lwz     r3,4(r31)
    	c0005774:	7d 29 03 a6 	mtctr   r9
    	c0005778:	7f c4 f3 78 	mr      r4,r30
    	c000577c:	4e 80 04 21 	bctrl
    	c0005780:	85 3f 00 0c 	lwzu    r9,12(r31)
    	c0005784:	2c 09 00 00 	cmpwi   r9,0
    	c0005788:	40 82 ff e4 	bne     c000576c <__do_irq+0x6c>
    
    Besides now using a direct 'bl' instead of a
    'load/mtctr/bctrl' sequence, we can also see that one less
    register is saved on the stack.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  9. powerpc/machdep: Remove stale functions from ppc_md structure

    ppc_md.iommu_save() is not set anymore by any platform after
    commit c40785a ("powerpc/dart: Use a cachable DART").
    So iommu_save() has become a nop and can be removed.
    
    ppc_md.show_percpuinfo() is not set anymore by any platform after
    commit 4350147 ("[PATCH] ppc64: SMU based macs cpufreq support").
    
    The last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were
    removed by commit 0f03a43 ("[POWERPC] Remove todc code from
    ARCH=powerpc").
    
    Last user of kgdb_map_scc() was removed by commit 17ce452 ("kgdb,
    powerpc: arch specific powerpc kgdb support").
    
    ppc_md.machine_kexec_prepare() has not been used since
    commit 8ee3e0d ("powerpc: Remove the main legacy iSerie platform
    code"). This allows the removal of machine_kexec_prepare() and the
    renaming of default_machine_kexec_prepare() to machine_kexec_prepare().
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Reviewed-by: Daniel Axtens <dja@axtens.net>
    [mpe: Drop prototype for default_machine_kexec_prepare() as noted by dja]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/24d4ca0ada683c9436a5f812a7aeb0a1362afa2b.1630398606.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  10. powerpc/time: Remove generic_suspend_{dis/en}able_irqs()

    Commit d75d68c ("powerpc: Clean up obsolete code relating to
    decrementer and timebase") made generic_suspend_enable_irqs() and
    generic_suspend_disable_irqs() static.
    
    Fold them into their only caller.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Reviewed-by: Daniel Axtens <dja@axtens.net>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/c3f9ec9950394ef939014f7934268e6ee30ca04f.1630398566.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  11. powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC

    Commit e65e1fc ("[PATCH] syscall class hookup for all normal
    targets") added generic support for AUDIT but that didn't include
    support for bi-arch like powerpc.
    
    Commit 4b58841 ("audit: Add generic compat syscall support")
    added generic support for bi-arch.
    
    Convert powerpc to that bi-arch generic audit support.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/a4b3951d1191d4183d92a07a6097566bde60d00a.1629812058.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  12. powerpc/32: Don't use lmw/stmw for saving/restoring non volatile regs

    The lmw/stmw instructions are attractive for functions that are rarely
    used and not in the cache, because only one instruction has to be
    fetched into the instruction cache instead of 19. However, those
    instructions are slower than 19 plain lwz/stw instructions, as they
    require synchronisation plus one additional cycle.
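The "19" comes from the stmw semantics. stmw rS,D(rA) stores rS through r31 to consecutive words, so starting at r13 it covers the 19 non-volatile GPRs r13 to r31. The model below (`stmw_model()` is hypothetical) makes the count explicit. The replacement sequence is the equivalent 19 individual stw instructions.

```c
#include <stdint.h>

/*
 * Model of "stmw rS,D(rA)": store rS, rS+1, ..., r31 to consecutive
 * words starting at @mem. Returns the number of words stored.
 */
static int stmw_model(const uint32_t gpr[32], unsigned int rs, uint32_t *mem)
{
	int n = 0;

	for (unsigned int r = rs; r <= 31; r++)
		mem[n++] = gpr[r];
	return n;
}
```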
    
    SAVE_NVGPRS / REST_NVGPRS are used in only a few places, which are
    mostly interrupt entries/exits and task switching, so they are
    likely to be already in the cache.
    
    Using standard lwz/stw improves the null_syscall selftest by:
    - 10 cycles on mpc832x.
    - 2 cycles on mpc8xx.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/316c543b8906712c108985c8463eec09c8db577b.1629732542.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Oct 22, 2021
  13. powerpc/5200: dts: fix memory node unit name

    Fixes build warnings:
    Warning (unit_address_vs_reg): /memory: node has a reg or ranges property, but no unit name
    
    Signed-off-by: Anatolij Gustschin <agust@denx.de>
    Reviewed-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211013220532.24759-4-agust@denx.de
    vdsao authored and mpe committed Oct 22, 2021