Commit 05ec424

Gavin Shan authored and ozbenh committed
powerpc/eeh: Avoid event on passed PE
We must not handle EEH errors on devices that are passed through to somebody else. Instead, we expect the owner of the frozen device to detect the EEH error and recover from it. This avoids EEH error handling on passed-through devices so that the device owner gets a chance to handle them.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
1 parent 9287b95 commit 05ec424

3 files changed, 17 additions and 1 deletion

arch/powerpc/include/asm/eeh.h

Lines changed: 7 additions & 0 deletions
@@ -25,6 +25,7 @@
 #include <linux/list.h>
 #include <linux/string.h>
 #include <linux/time.h>
+#include <linux/atomic.h>
 
 struct pci_dev;
 struct pci_bus;
@@ -84,6 +85,7 @@ struct eeh_pe {
 	int freeze_count;		/* Times of froze up */
 	struct timeval tstamp;		/* Time on first-time freeze */
 	int false_positives;		/* Times of reported #ff's */
+	atomic_t pass_dev_cnt;		/* Count of passed through devs */
 	struct eeh_pe *parent;		/* Parent PE */
 	struct list_head child_list;	/* Link PE to the child list */
 	struct list_head edevs;		/* Link list of EEH devices */
@@ -93,6 +95,11 @@ struct eeh_pe {
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
 		list_for_each_entry_safe(edev, tmp, &pe->edevs, list)
 
+static inline bool eeh_pe_passed(struct eeh_pe *pe)
+{
+	return pe ? !!atomic_read(&pe->pass_dev_cnt) : false;
+}
+
 /*
  * The struct is used to trace EEH state for the associated
  * PCI device node or PCI device. In future, it might
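
The counter-based ownership test added in this header can be modelled outside the kernel. The following is a minimal user-space sketch, assuming C11 `<stdatomic.h>` as a stand-in for the kernel's `atomic_t`. The names `pe_model`, `pe_model_pass_dev` and `pe_model_return_dev` are hypothetical: this commit only adds the counter and the reader, and does not show the code that increments or decrements `pass_dev_cnt`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct eeh_pe; only the new counter is modelled. */
struct pe_model {
	atomic_int pass_dev_cnt;	/* devices currently passed through */
};

/* Hypothetical: a device in this PE is handed to another owner. */
static inline void pe_model_pass_dev(struct pe_model *pe)
{
	atomic_fetch_add(&pe->pass_dev_cnt, 1);
}

/* Hypothetical: the device is handed back to the host. */
static inline void pe_model_return_dev(struct pe_model *pe)
{
	int old = atomic_fetch_sub(&pe->pass_dev_cnt, 1);

	assert(old > 0);	/* a return must pair with an earlier pass */
	(void)old;		/* keeps the build quiet when NDEBUG is set */
}

/* Mirrors eeh_pe_passed(): a NULL PE is never considered passed. */
static inline bool pe_model_passed(struct pe_model *pe)
{
	return pe ? atomic_load(&pe->pass_dev_cnt) != 0 : false;
}
```

Because the value is a count rather than a flag, the PE in this model stays "passed" until every passed-through device has been returned.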

arch/powerpc/kernel/eeh.c

Lines changed: 8 additions & 0 deletions
@@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	if (ret > 0)
 		return ret;
 
+	/*
+	 * If the PE isn't owned by us, we shouldn't check the
+	 * state. Instead, let the owner handle it if the PE has
+	 * been frozen.
+	 */
+	if (eeh_pe_passed(pe))
+		return 0;
+
 	/* If we already have a pending isolation event for this
 	 * slot, we know it's bad already, we don't need to check.
 	 * Do this checking under a lock; as multiple PCI devices
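
The effect of this early return can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `pe_sketch`, `probe_frozen` and `check_failure_sketch` are hypothetical names, and the real `eeh_dev_check_failure()` does considerably more than what is shown here. The point is that a passed-through PE returns 0 before the freeze state is queried at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct eeh_pe. */
struct pe_sketch {
	int pass_dev_cnt;	/* non-zero while any device is passed through */
	bool frozen;		/* stand-in for the hardware freeze state */
};

/* Counts how often the (hypothetical) state query is made. */
static int probe_calls;

/* Hypothetical stand-in for the platform-specific state query. */
static bool probe_frozen(const struct pe_sketch *pe)
{
	probe_calls++;
	return pe->frozen;
}

/* Returns 1 if the host should start recovery, 0 otherwise. */
int check_failure_sketch(const struct pe_sketch *pe)
{
	if (!pe)
		return 0;

	/* Not owned by us: leave detection and recovery to the owner. */
	if (pe->pass_dev_cnt != 0)
		return 0;

	return probe_frozen(pe) ? 1 : 0;
}
```

In this model, a frozen PE with a non-zero `pass_dev_cnt` is reported as healthy to the host and `probe_calls` is not incremented, which matches the intent stated in the added comment.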

arch/powerpc/platforms/powernv/eeh-ioda.c

Lines changed: 2 additions & 1 deletion
@@ -812,7 +812,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 				opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no,
 					OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
 				ret = EEH_NEXT_ERR_NONE;
-			} else if ((*pe)->state & EEH_PE_ISOLATED) {
+			} else if ((*pe)->state & EEH_PE_ISOLATED ||
+				   eeh_pe_passed(*pe)) {
 				ret = EEH_NEXT_ERR_NONE;
 			} else {
 				pr_err("EEH: Frozen PE#%x on PHB#%x detected\n",
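
The changed condition relies on C operator precedence: `&` binds tighter than `||`, so `a & b || c` is evaluated as `(a & b) || c`. A minimal sketch of the resulting classification follows. All names ending in `_sketch` are hypothetical, and the "frozen PE" result of the final branch is an assumption, since the hunk above is cut off before that branch assigns `ret`.

```c
#include <stdbool.h>

#define PE_ISOLATED_SKETCH (1 << 0)	/* stand-in for EEH_PE_ISOLATED */

enum next_err_sketch {
	NEXT_ERR_NONE_SKETCH,		/* nothing for the host to handle */
	NEXT_ERR_FROZEN_PE_SKETCH	/* assumed result of the final else branch */
};

/* Simplified stand-in for struct eeh_pe. */
struct frozen_pe_sketch {
	int state;		/* bit flags, as in eeh_pe.state */
	int pass_dev_cnt;	/* non-zero while any device is passed through */
};

/* Classifies a PE that the firmware reported as frozen. */
enum next_err_sketch classify_frozen_pe(const struct frozen_pe_sketch *pe)
{
	/* Parenthesised explicitly; same meaning as the unparenthesised diff. */
	if ((pe->state & PE_ISOLATED_SKETCH) || pe->pass_dev_cnt != 0)
		return NEXT_ERR_NONE_SKETCH;

	return NEXT_ERR_FROZEN_PE_SKETCH;
}
```

Under this model, a PE that is already isolated or currently passed through produces no new event for the host, while any other frozen PE is reported.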
