
Commits on Aug 16, 2021

  1. powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interrupts

    Reading the CFAR register is quite costly (~20 cycles on POWER9). It is
    a good idea to have for most synchronous interrupts, but for async ones
    it is much less important.
    
    Doorbell, external, and decrementer interrupts are the important
    asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible,
    because it might be a guest exit that requires CFAR preserved. But for
    now the important pseries interrupts can avoid loading CFAR.
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    npiggin authored and intel-lab-lkp committed Aug 16, 2021
  2. powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless pe…

    …rf is in use
    
    Enabling MSR[EE] in interrupt handlers while interrupts are still soft
    masked allows PMIs to profile interrupt handlers to some degree, beyond
    what SIAR latching allows.
    
    When perf is not being used, this is almost useless work. It requires an
    extra mtmsrd in the irq handler, and it also opens the door to masked
    interrupts hitting and requiring replay, which is more expensive than
    just taking them directly. This effect can be noticeable in high IRQ
    workloads.
    
    Avoid enabling MSR[EE] unless perf is currently in use. This saves about
    60 cycles (or 8%) on a simple decrementer interrupt microbenchmark.
    Replayed interrupts drop from 1.4% of interrupts to 0.003%.
    
    This does prevent the soft-nmi interrupt being taken in these handlers,
    but that's not too reliable anyway. The SMP watchdog will continue to be
    the reliable way to catch lockups.
    
    Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
    Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    npiggin authored and intel-lab-lkp committed Aug 16, 2021
  3. powerpc/64s/perf: add power_pmu_running to query whether perf is bein…

    …g used
    
    Interrupt handling code would like to know whether perf is enabled, to
    know whether it should enable MSR[EE] to improve PMI coverage.
    
    Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
    Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    npiggin authored and intel-lab-lkp committed Aug 16, 2021
  4. powerpc/64: handle MSR EE and RI in interrupt entry wrapper

    Similarly to the system call change in the previous patch, the mtmsrd to
    enable RI can be combined with the mtmsrd to enable EE for interrupts
    which enable the latter, which tends to be the important synchronous
    interrupts (i.e., page faults).
    
    Do this by enabling EE and RI together at the beginning of the entry
    wrapper if PACA_IRQ_HARD_DIS is clear, and just enabling RI if it is set
    (which means something wanted EE=0).
    
    Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones
    leave it unchanged, so by default they always get EE=1 unless they
    interrupt a caller that has hard disabled. When the sync interrupt
    later calls interrupt_cond_local_irq_enable(), that will not require
    another mtmsrd because we already enabled here.
    
    64e is conceptually unchanged, but it also sets MSR[EE]=1 now in the
    interrupt wrapper for synchronous interrupts with the same code.
    
    On 64s, this saves one mtmsrd L=1 for synchronous interrupts, which
    saves about 20 cycles. For kernel-mode interrupts, both synchronous and
    asynchronous, this saves an additional ~40 cycles due to the mtmsrd
    being moved ahead of mfspr SPRN_AMR, which prevents a SPR scoreboard
    stall (on POWER9).
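
The entry-wrapper decision described above can be sketched in plain C. This is a user-space model under stated assumptions, not the kernel code: mark_async_entry(), entry_msr_bits() and interrupt_entry() are hypothetical names, and the constants are illustrative stand-ins for MSR_RI, MSR_EE and PACA_IRQ_HARD_DIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the kernel headers hold the real definitions. */
#define MSR_RI            0x0002u
#define MSR_EE            0x8000u
#define PACA_IRQ_HARD_DIS 0x01u

/* Hypothetical helper: async interrupts record that EE must stay off. */
uint8_t mark_async_entry(uint8_t irq_happened, bool is_async)
{
	return is_async ? (uint8_t)(irq_happened | PACA_IRQ_HARD_DIS)
			: irq_happened;
}

/* Hypothetical helper: MSR bits that one combined MSR update would set. */
uint64_t entry_msr_bits(uint8_t irq_happened)
{
	if (irq_happened & PACA_IRQ_HARD_DIS)
		return MSR_RI;		/* something wanted EE=0 */
	return MSR_EE | MSR_RI;		/* enable both together */
}

/* Models the wrapper: update the soft-mask state, then pick the MSR bits. */
uint64_t interrupt_entry(uint64_t msr, uint8_t *irq_happened, bool is_async)
{
	assert(irq_happened != NULL);
	*irq_happened = mark_async_entry(*irq_happened, is_async);
	return msr | entry_msr_bits(*irq_happened);
}
```

In this model a synchronous interrupt that arrives with the flag clear gets EE and RI in one step, so a later conditional enable has nothing left to do.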
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    npiggin authored and intel-lab-lkp committed Aug 16, 2021

Commits on Aug 15, 2021

  1. powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() wit…

    …h asm goto
    
    Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
    flexibility to GCC.
    
    For that add an entry to the exception table so that
    program_check_exception() knows where to resume execution
    after a WARNING.
    
    Here are two examples. The first one is done on PPC32 (which
    benefits from the previous patch), the second is on PPC64.
    
    	unsigned long test(struct pt_regs *regs)
    	{
    		int ret;
    
    		WARN_ON(regs->msr & MSR_PR);
    
    		return regs->gpr[3];
    	}
    
    	unsigned long test9w(unsigned long a, unsigned long b)
    	{
    		if (WARN_ON(!b))
    			return 0;
    		return a / b;
    	}
    
    Before the patch:
    
    	000003a8 <test>:
    	 3a8:	81 23 00 84 	lwz     r9,132(r3)
    	 3ac:	71 29 40 00 	andi.   r9,r9,16384
    	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
    	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
    	 3b8:	4e 80 00 20 	blr
    
    	 3bc:	0f e0 00 00 	twui    r0,0
    	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
    	 3c4:	4e 80 00 20 	blr
    
    	0000000000000bf0 <.test9w>:
    	 bf0:	7c 89 00 74 	cntlzd  r9,r4
    	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
    	 bf8:	0b 09 00 00 	tdnei   r9,0
    	 bfc:	2c 24 00 00 	cmpdi   r4,0
    	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
    	 c04:	7c 63 23 92 	divdu   r3,r3,r4
    	 c08:	4e 80 00 20 	blr
    
    	 c0c:	38 60 00 00 	li      r3,0
    	 c10:	4e 80 00 20 	blr
    
    After the patch:
    
    	000003a8 <test>:
    	 3a8:	81 23 00 84 	lwz     r9,132(r3)
    	 3ac:	71 29 40 00 	andi.   r9,r9,16384
    	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
    	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
    	 3b8:	4e 80 00 20 	blr
    
    	 3bc:	0f e0 00 00 	twui    r0,0
    
    	0000000000000c50 <.test9w>:
    	 c50:	7c 89 00 74 	cntlzd  r9,r4
    	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
    	 c58:	0b 09 00 00 	tdnei   r9,0
    	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
    	 c60:	4e 80 00 20 	blr
    
    	 c70:	38 60 00 00 	li      r3,0
    	 c74:	4e 80 00 20 	blr
    
    In the first example, we see GCC doesn't need to duplicate what
    happens after the trap.
    
    In the second example, we see that GCC doesn't need to emit a test
    and a branch in the likely path in addition to the trap.
    
    We've got some WARN_ON() in .softirqentry.text section so it needs
    to be added in the OTHER_TEXT_SECTIONS in modpost.c
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/389962b1b702e3c78d169e59bcfac56282889173.1618331882.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Aug 15, 2021

Commits on Aug 14, 2021

  1. powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

    powerpc BUG_ON() and WARN_ON() are based on using the twnei instruction.
    
    For catching simple conditions like a variable having value 0, this
    is efficient because it does the test and the trap at the same time.
    But most conditions used with BUG_ON or WARN_ON are more complex and
    force GCC to format the condition into a 0 or 1 value in a register.
    This will usually require 2 to 3 instructions.
    
    The most efficient solution would be to use __builtin_trap() because
    GCC is able to optimise the use of the different trap instructions
    based on the requested condition, but this is complex if not
    impossible for the following reasons:
    - __builtin_trap() is a non-recoverable instruction, so it can't be
    used for WARN_ON
    - Knowing which line of code generated the trap would require the
    analysis of DWARF information. This is not a feature we have today.
    
    As mentioned in commit 8d4fbcf ("Fix WARN_ON() on bitfield ops")
    the way WARN_ON() is implemented is suboptimal. That commit also
    mentions an issue with 'long long' condition. It fixed it for
    WARN_ON() but the same problem still exists today with BUG_ON() on
    PPC32. It will be fixed by using the generic implementation.
    
    By using the generic implementation, gcc will naturally generate a
    branch to the unconditional trap generated by BUG().
    
    As modern powerpc implementations have zero-cycle branches,
    that's even more efficient.
    
    And for the functions using WARN_ON() and its return, the test
    on return from WARN_ON() is now also used for the WARN_ON() itself.
    
    On PPC64 we don't want it because we want to be able to use CFAR
    register to track how we entered the code that trapped. The CFAR
    register would be clobbered by the branch.
    
    A simple test function:
    
    	unsigned long test9w(unsigned long a, unsigned long b)
    	{
    		if (WARN_ON(!b))
    			return 0;
    		return a / b;
    	}
    
    Before the patch:
    
    	0000046c <test9w>:
    	 46c:	7c 89 00 34 	cntlzw  r9,r4
    	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
    	 474:	0f 09 00 00 	twnei   r9,0
    	 478:	2c 04 00 00 	cmpwi   r4,0
    	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
    	 480:	7c 63 23 96 	divwu   r3,r3,r4
    	 484:	4e 80 00 20 	blr
    
    	 488:	38 60 00 00 	li      r3,0
    	 48c:	4e 80 00 20 	blr
    
    After the patch:
    
    	00000468 <test9w>:
    	 468:	2c 04 00 00 	cmpwi   r4,0
    	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
    	 470:	7c 63 23 96 	divwu   r3,r3,r4
    	 474:	4e 80 00 20 	blr
    
    	 478:	0f e0 00 00 	twui    r0,0
    	 47c:	38 60 00 00 	li      r3,0
    	 480:	4e 80 00 20 	blr
    
    So we see before the patch we need 3 instructions on the likely path
    to handle the WARN_ON(). With the patch the trap goes on the unlikely
    path.
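
The shape of a generic-style WARN_ON(), where the value of the condition is reused by the caller's own test, can be sketched in user space. WARN_ON_SKETCH and warn_count are hypothetical stand-ins: the sketch counts warnings instead of trapping, and it relies on GCC/Clang extensions.

```c
#include <assert.h>

/* Counts warnings instead of trapping; a stand-in for the real report. */
int warn_count;

/*
 * Hypothetical user-space stand-in for a generic-style WARN_ON(): it
 * evaluates the condition once, records a warning on the unlikely path,
 * and yields the condition so the caller can reuse the same test.
 * Statement expressions and __builtin_expect() are GCC/Clang extensions.
 */
#define WARN_ON_SKETCH(cond) __extension__ ({		\
	int cond_ = !!(cond);				\
	if (__builtin_expect(cond_, 0))			\
		warn_count++;				\
	cond_;						\
})

/* Same test function as above, using the sketch macro. */
unsigned long test9w(unsigned long a, unsigned long b)
{
	if (WARN_ON_SKETCH(!b))
		return 0;
	return a / b;
}
```

Because the macro yields the condition, the compiler has a single comparison to work with and can place the warning on the unlikely branch.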
    
    See below the difference at the entry of system_call_exception, where
    we have several BUG_ON(), although it is less impressive.
    
    With the patch:
    
    	00000000 <system_call_exception>:
    	   0:	81 6a 00 84 	lwz     r11,132(r10)
    	   4:	90 6a 00 88 	stw     r3,136(r10)
    	   8:	71 60 00 02 	andi.   r0,r11,2
    	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
    	  10:	71 60 40 00 	andi.   r0,r11,16384
    	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
    	  18:	71 6b 80 00 	andi.   r11,r11,32768
    	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
    	  20:	94 21 ff e0 	stwu    r1,-32(r1)
    	  24:	93 e1 00 1c 	stw     r31,28(r1)
    	  28:	7d 8c 42 e6 	mftb    r12
    	...
    	  7c:	0f e0 00 00 	twui    r0,0
    	  80:	0f e0 00 00 	twui    r0,0
    	  84:	0f e0 00 00 	twui    r0,0
    
    Without the patch:
    
    	00000000 <system_call_exception>:
    	   0:	94 21 ff e0 	stwu    r1,-32(r1)
    	   4:	93 e1 00 1c 	stw     r31,28(r1)
    	   8:	90 6a 00 88 	stw     r3,136(r10)
    	   c:	81 6a 00 84 	lwz     r11,132(r10)
    	  10:	69 60 00 02 	xori    r0,r11,2
    	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
    	  18:	0f 00 00 00 	twnei   r0,0
    	  1c:	69 60 40 00 	xori    r0,r11,16384
    	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
    	  24:	0f 00 00 00 	twnei   r0,0
    	  28:	69 6b 80 00 	xori    r11,r11,32768
    	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
    	  30:	0f 0b 00 00 	twnei   r11,0
    	  34:	7d 8c 42 e6 	mftb    r12
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/b286e07fb771a664b631cd07a40b09c06f26e64b.1618331881.git.christophe.leroy@csgroup.eu
    chleroy authored and mpe committed Aug 14, 2021

Commits on Aug 13, 2021

  1. powerpc/pseries: Add support for FORM2 associativity

    PAPR interface currently supports two different ways of communicating resource
    grouping details to the OS. These are referred to as Form 0 and Form 1
    associativity grouping. Form 0 is the older format and is now considered
    deprecated. This patch adds another resource grouping named FORM2.
    
    Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210812132223.225214-6-aneesh.kumar@linux.ibm.com
    kvaneesh authored and mpe committed Aug 13, 2021
  2. powerpc/pseries: Add a helper for form1 cpu distance

    This helper is only used with the dispatch trace log collection.
    A later patch will add Form2 affinity support and this change helps
    in keeping that simpler. Also add a comment explaining we don't expect
    the code to be called with FORM0.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210812132223.225214-5-aneesh.kumar@linux.ibm.com
    kvaneesh authored and mpe committed Aug 13, 2021
  3. powerpc/pseries: Consolidate different NUMA distance update code paths

    The associativity details of the newly added resources are collected from
    the hypervisor via "ibm,configure-connector" rtas call. Update the numa
    distance details of the newly added numa node after the above call.
    
    Instead of updating NUMA distance every time we look up a node id
    from the associativity property, add helpers that can be used
    during boot and that do this only once. Also remove the distance
    update from node id lookup helpers.
    
    Currently, we duplicate parsing code for ibm,associativity and
    ibm,associativity-lookup-arrays in the kernel. The associativity arrays
    provided by these device tree properties are very similar and hence can
    use a helper to parse the node id and numa distance details.
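
A shared parsing helper of the kind described above could look roughly like this. It is a sketch under assumptions, not the kernel helper: it assumes the array's first cell holds the number of entries that follow, that the cells are already in host byte order, and that the node id sits at primary_domain_index. assoc_to_nid_sketch() and NO_NODE_SKETCH are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "no node" marker for this sketch. */
#define NO_NODE_SKETCH (-1)

/*
 * Assumed layout: assoc[0] is the number of entries that follow, and
 * assoc[1..assoc[0]] are domain ids already converted to host order.
 * Returns the id found at primary_domain_index, or NO_NODE_SKETCH when
 * the array is too short to contain that index.
 */
int assoc_to_nid_sketch(const uint32_t *assoc, uint32_t primary_domain_index)
{
	assert(assoc != NULL);

	if (primary_domain_index == 0 || assoc[0] < primary_domain_index)
		return NO_NODE_SKETCH;

	return (int)assoc[primary_domain_index];
}
```

Both property formats could then feed their arrays through the one helper instead of carrying separate parsing loops.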
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210812132223.225214-4-aneesh.kumar@linux.ibm.com
    kvaneesh authored and mpe committed Aug 13, 2021
  4. powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY

    Also make related code cleanup that will allow adding FORM2_AFFINITY in
    later patches. No functional change in this patch.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210812132223.225214-3-aneesh.kumar@linux.ibm.com
    kvaneesh authored and mpe committed Aug 13, 2021
  5. powerpc/pseries: rename min_common_depth to primary_domain_index

    No functional change in this patch.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210812132223.225214-2-aneesh.kumar@linux.ibm.com
    kvaneesh authored and mpe committed Aug 13, 2021
  6. powerpc: rename powerpc_debugfs_root to arch_debugfs_dir

    No functional change in this patch. arch_debugfs_dir is the generic kernel
    name declared in linux/debugfs.h for arch-specific debugfs directory.
    Architectures like x86/s390 already use the name. Rename powerpc
    specific powerpc_debugfs_root to arch_debugfs_dir.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
    kvaneesh authored and mpe committed Aug 13, 2021
  7. powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs …

    …entry
    
    Similar to x86/s390 add a debugfs file to tune tlb_single_page_flush_ceiling.
    Also add a debugfs entry for tlb_local_single_page_flush_ceiling.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210812132831.233794-1-aneesh.kumar@linux.ibm.com
    kvaneesh authored and mpe committed Aug 13, 2021
  8. cpufreq: powernv: Fix init_chip_info initialization in numa=off

    In the numa=off kernel command-line configuration init_chip_info() loops
    around the number of chips and attempts to copy the cpumask of that node
    which is NULL for all iterations after the first chip.
    
    Hence, store the cpu mask for each chip instead of deriving the cpumask
    from the node while populating the "chips" struct array, and copy that
    to chips[i].mask.
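
The idea of recording a CPU mask per chip while walking the CPUs, rather than looking it up from a NUMA node afterwards, can be modelled in user space. Everything here is an assumption made for illustration: struct chip_sketch, build_chip_masks(), the 64-CPU limit and the uint64_t bitmask standing in for a cpumask.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHIPS_SKETCH 8

/* Hypothetical per-chip record: chip id plus a bitmask of its CPUs. */
struct chip_sketch {
	unsigned int id;
	uint64_t mask;
};

/*
 * Walk all CPUs once and accumulate each CPU into the mask of the chip
 * it belongs to. cpu_chip_id[cpu] gives the chip id of that CPU.
 * Returns the number of chips found, or -1 if there are too many.
 */
int build_chip_masks(const unsigned int *cpu_chip_id, int ncpus,
		     struct chip_sketch *chips)
{
	int nr_chips = 0;

	assert(ncpus >= 0 && ncpus <= 64);

	for (int cpu = 0; cpu < ncpus; cpu++) {
		int i;

		/* Look for a chip we have already seen. */
		for (i = 0; i < nr_chips; i++)
			if (chips[i].id == cpu_chip_id[cpu])
				break;

		if (i == nr_chips) {
			if (nr_chips == MAX_CHIPS_SKETCH)
				return -1;
			chips[i].id = cpu_chip_id[cpu];
			chips[i].mask = 0;
			nr_chips++;
		}
		chips[i].mask |= UINT64_C(1) << cpu;
	}
	return nr_chips;
}
```

Since the masks come from the CPU walk itself, they do not depend on any node information being available.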
    
    Fixes: 053819e ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level")
    Cc: stable@vger.kernel.org # v4.3+
    Reported-by: Shirisha Ganta <shirisha.ganta1@ibm.com>
    Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
    Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
    [mpe: Rename goto label to out_free_chip_cpu_mask]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210728120500.87549-2-psampat@linux.ibm.com
    pratiksampat authored and mpe committed Aug 13, 2021
  9. powerpc: wii_defconfig: Enable OTP by default

    This selects the nintendo-otp module when building for this platform.
    
    Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210801073822.12452-6-linkmauve@linkmauve.fr
    linkmauve authored and mpe committed Aug 13, 2021
  10. powerpc: wii.dts: Expose the OTP on this platform

    This can be used by the newly-added nintendo-otp nvmem module.
    
    Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210801073822.12452-5-linkmauve@linkmauve.fr
    linkmauve authored and mpe committed Aug 13, 2021
  11. powerpc: wii.dts: Reduce the size of the control area

    This is wrong, but needed in order to avoid overlapping ranges with the
    OTP area added in the next commit.  A refactor of this part of the
    device tree is needed: according to Wiibrew[1], this area starts at
    0x0d800000 and spans 0x400 bytes (that is, 0x100 32-bit registers),
    encompassing PIC and GPIO registers, amongst the ones already exposed in
    this device tree, which should become children of the control@d800000
    node.
    
    [1] https://wiibrew.org/wiki/Hardware/Hollywood_Registers
    
    Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210801073822.12452-4-linkmauve@linkmauve.fr
    linkmauve authored and mpe committed Aug 13, 2021

Commits on Aug 10, 2021

  1. powerpc: Bulk conversion to generic_handle_domain_irq()

    Wherever possible, replace constructs that match either
    generic_handle_irq(irq_find_mapping()) or
    generic_handle_irq(irq_linear_revmap()) with a single call to
    generic_handle_domain_irq().
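
The conversion pattern can be illustrated with a small user-space mock. None of the functions below are the kernel APIs: the _sketch names are hypothetical stand-ins that only mimic the roles of the lookup, the dispatch and the combined call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_HWIRQS_SKETCH 8

/* Mock domain: maps a hardware IRQ number to a Linux IRQ number (0 = none). */
struct irq_domain_sketch {
	unsigned int revmap[NR_HWIRQS_SKETCH];
};

/* Number of interrupts the mock has dispatched so far. */
int handled_count;

/* Mock of the lookup step (the role irq_find_mapping() plays). */
unsigned int find_mapping_sketch(const struct irq_domain_sketch *d,
				 unsigned int hwirq)
{
	assert(d != NULL);
	return hwirq < NR_HWIRQS_SKETCH ? d->revmap[hwirq] : 0;
}

/* Mock of the dispatch step (the role generic_handle_irq() plays). */
int handle_irq_sketch(unsigned int virq)
{
	if (virq == 0)
		return -EINVAL;
	handled_count++;
	return 0;
}

/* Mock of the combined call (the role generic_handle_domain_irq() plays). */
int handle_domain_irq_sketch(const struct irq_domain_sketch *d,
			     unsigned int hwirq)
{
	return handle_irq_sketch(find_mapping_sketch(d, hwirq));
}
```

A call site written as handle_irq_sketch(find_mapping_sketch(d, hwirq)) and one written as handle_domain_irq_sketch(d, hwirq) behave the same in this mock, which is the equivalence the bulk conversion relies on.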
    
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210802162630.2219813-13-maz@kernel.org
    Marc Zyngier authored and mpe committed Aug 10, 2021
  2. KVM: PPC: Book3S HV: XIVE: Add support for automatic save-restore

    On P10, the feature doing an automatic "save & restore" of a VCPU
    interrupt context is set by default in OPAL. When a VP context is
    pulled out, the state of the interrupt registers is saved by the XIVE
    interrupt controller under the internal NVP structure representing the
    VP. This saves a costly store/load in guest entries and exits.
    
    If OPAL advertises the "save & restore" feature in the device tree,
    it should also have set the 'H' bit in the CAM line. Check that when
    vCPUs are connected to their ICP in KVM before going any further.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210720134209.256133-3-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  3. KVM: PPC: Book3S HV: XIVE: Add a 'flags' field

    Use it to hold platform specific features. P9 DD2 introduced
    single-escalation support. P10 will add others.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210720134209.256133-2-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  4. powerpc: use IRQF_NO_DEBUG for IPIs

    There is no need to use the lockup detector ("noirqdebug") for IPIs.
    The ipistorm benchmark measures a ~10% improvement on high systems
    when this flag is set.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210719130614.195886-1-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  5. powerpc/xive: Use XIVE domain under xmon and debugfs

    The default domain of the PCI/MSIs is not the XIVE domain anymore. To
    list the IRQ mappings under XMON and debugfs, query the IRQ data from
    the low level XIVE domain.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-32-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  6. KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts

    PCI MSIs now live in an MSI domain but the underlying calls, which
    will EOI the interrupt in real mode, need an HW IRQ number mapped in
    the XICS IRQ domain. Grab it there.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-31-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  7. powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi()

    pnv_opal_pci_msi_eoi() is called from KVM to EOI passthrough interrupts
    when in real mode. Adding MSI domain broke the hack using the
    'ioda.irq_chip' field to deduce the owning PHB. Fix that by using the
    IRQ chip data in the MSI domain.
    
    The 'ioda.irq_chip' field is now unused and could be removed from the
    pnv_phb struct.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-30-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  8. powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices

    Before MSI domains, the default IRQ chip of PHB3 MSIs was patched by
    pnv_set_msi_irq_chip() with the custom EOI handler pnv_ioda2_msi_eoi()
    and the owning PHB was deduced from the 'ioda.irq_chip' field. This
    path has been deprecated by the MSI domains but it is still in use by
    the P8 CAPI 'cxl' driver.
    
    Rewriting this driver to support MSI would be a waste of time.
    Nevertheless, we can still remove the IRQ chip patch and set the IRQ
    chip data instead. This is cleaner.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-29-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  9. powerpc/xics: Fix IRQ migration

    desc->irq_data points to the top level IRQ data descriptor which is
    not necessarily in the XICS IRQ domain. MSIs are in another domain for
    instance. Fix that by looking for a mapping on the low level XICS IRQ
    domain.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-28-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  10. powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough in…

    …terrupt
    
    The pnv_ioda2_msi_eoi() chip handler is not used anymore for MSIs.
    Simply use the check on the PSI-MSI chip.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-27-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  11. powerpc/powernv/pci: Drop unused MSI code

    MSIs should be fully managed by the PCI and IRQ subsystems now.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-26-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  12. powerpc/pseries/pci: Drop unused MSI code

    MSIs should be fully managed by the PCI and IRQ subsystems now.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-25-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  13. powerpc/xics: Drop unmask of MSIs at startup

    That was a workaround in the XICS domain because of the lack of MSI
    domain. This is now handled.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-24-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  14. powerpc/pci: Drop XIVE restriction on MSI domains

    The PowerNV and pSeries platforms now have support for both the XICS
    and XIVE IRQ domains.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-23-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  15. powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3

    PHB3s need an extra OPAL call to EOI the interrupt. The call takes an
    OPAL HW IRQ number but it is translated into a vector number in OPAL.
    Here, we directly use the vector number of the in-the-middle "PNV-MSI"
    domain instead of grabbing the OPAL HW IRQ number in the XICS parent
    domain.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-22-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  16. powerpc/xics: Add support for IRQ domain hierarchy

    XICS doesn't have any state associated with the IRQ. The support is
    straightforward and simpler than for XIVE.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-21-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  17. powerpc/xics: Add debug logging to the set_irq_affinity handlers

    It really helps to know how the HW is configured when tweaking the IRQ
    subsystem.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-20-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021
  18. powerpc/xics: Give a name to the default XICS IRQ domain

    and clean up the error path.
    
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210701132750.1475580-19-clg@kaod.org
    legoater authored and mpe committed Aug 10, 2021