Hardware testing with heavy message flooding results in IRQs stopping when in Error Passive State
To Reproduce
Steps to reproduce the behaviour. First, start the driver:
dev-can-linux -Ex -vvvvv
Then, from another console, run the following script to flood the driver with a large number of CAN-bus messages:
#!/bin/bash
# Basic while loop
counter=1
while [ "$counter" -le 100000 ]
do
    echo -n test > /dev/can1/tx0
    echo -n test > /dev/can0/tx0
    ((counter++))
done
echo Test sequence complete
The driver is soon overwhelmed by the messages and gets stuck in Error Passive State:
error warning interrupt
Controller changed from Error Active State (0) into Error Warning State (1).
netif_rx: adv_pci-can0: controller problems: 8
netif_rx: adv_pci-can0: TX error counter; tx:60, rx:0
error warning interrupt
Controller changed from Error Active State (0) into Error Warning State (1).
netif_rx: adv_pci-can1: controller problems: 8
netif_rx: adv_pci-can1: TX error counter; tx:70, rx:0
error passive interrupt
Controller changed from Error Warning State (1) into Error Passive State (2).
netif_rx: adv_pci-can0: controller problems: 20
netif_rx: adv_pci-can0: TX error counter; tx:80, rx:0
error passive interrupt
Controller changed from Error Warning State (1) into Error Passive State (2).
netif_rx: adv_pci-can1: controller problems: 20
netif_rx: adv_pci-can1: TX error counter; tx:80, rx:0
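The "controller problems" values in the log above can be read as a bit mask. The sketch below decodes them using the CAN_ERR_CRTL_* flag values from the Linux header linux/can/error.h. It assumes the driver prints the mask in hexadecimal, which the report does not state; the function name decode_crtl is made up for illustration.

```bash
#!/bin/bash
# Decode a "controller problems" value as a mask of CAN_ERR_CRTL_* flags
# (flag values as defined in linux/can/error.h).
# Assumption: the log prints the mask in hexadecimal without a 0x prefix.
decode_crtl() {
    local mask=$((16#$1))   # interpret the argument as hex
    local out=()
    (( mask & 0x01 )) && out+=("rx-overflow")
    (( mask & 0x02 )) && out+=("tx-overflow")
    (( mask & 0x04 )) && out+=("rx-warning")
    (( mask & 0x08 )) && out+=("tx-warning")
    (( mask & 0x10 )) && out+=("rx-passive")
    (( mask & 0x20 )) && out+=("tx-passive")
    (( mask & 0x40 )) && out+=("active")
    # An empty mask corresponds to CAN_ERR_CRTL_UNSPEC
    echo "${out[*]:-unspecified}"
}

decode_crtl 8    # value logged with the error warning interrupt
decode_crtl 20   # value logged with the error passive interrupt
```

Under that assumption, 8 maps to the TX warning flag and 20 to the TX passive flag, which agrees with the state changes reported next to them.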
Expected behaviour
According to the SJA1000 documentation, a controller in Error Passive State should still receive messages and raise IRQs until the transmit error counter exceeds 255, at which point the chip should enter Bus-Off state. In our test the chip never enters Bus-Off state: the last netif_rx message reports Error Passive State, and no further IRQs arrive after that.
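As a rough illustration of the state progression we expected, the sketch below classifies a node from its error counters using the CAN fault-confinement thresholds: an error warning limit of 96 (the SJA1000 reset default, which is programmable), Error Passive at 128, and Bus-Off once the transmit counter exceeds 255. The function name can_error_state is hypothetical, and reading the logged tx: values as hexadecimal is an assumption.

```bash
#!/bin/bash
# Classify a CAN node's error state from its TX and RX error counters.
# Thresholds: warning limit 96 (SJA1000 reset default, programmable),
# Error Passive at 128, Bus-Off once the TX counter exceeds 255.
can_error_state() {
    local tec=$1 rec=$2
    if (( tec > 255 )); then
        echo "bus-off"
    elif (( tec >= 128 || rec >= 128 )); then
        echo "error-passive"
    elif (( tec >= 96 || rec >= 96 )); then
        echo "error-warning"
    else
        echo "error-active"
    fi
}

# TX counters from the log, read as hex (assumption): 0x60, 0x70, 0x80
for tx in 60 70 80; do
    echo "tx:$tx -> $(can_error_state $((16#$tx)) 0)"
done
```

Read this way, tx:60 and tx:70 fall in the warning range and tx:80 sits exactly on the Error Passive threshold, so the counter would need to keep climbing past 255 before Bus-Off and the automatic restart could occur.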
Screenshots
dev-can-linux v1.3.4
Harmonized with Linux Kernel version 69
dev-can-linux comes with ABSOLUTELY NO WARRANTY; for details use option `-w'.
This is free software, and you are welcome to redistribute it
under certain conditions; option `-c' for details.
warning: release versions allow at max -vv option.
driver start (version: 1.3.4)
Auto detected device (13fe:00d7) successfully: (driver "adv_pci")
initializing device 13fe:00d7
read ssvid: 13fe
read ssid: 00d7
read cs: 0, slot: 0, func: 0, devfn: 0
read capability[2]: 0x10
capability 0x10 (PCIe) already enabled
PCIe version: 1
read capability[1]: 0x05
nirq: 8
capability 0x05 (MSI) Per Vector Masking (PVM) not supported
capability 0x05 (MSI) enabled
read ba[0] MEM { addr: df302000, size: 800 }
read ba[1] MEM { addr: df301000, size: 80 }
read ba[2] MEM { addr: df300000, size: 80 }
read irq[0]: 266
read irq[1]: 267
read irq[2]: 268
read irq[3]: 269
read irq[4]: 270
read irq[5]: 271
read irq[6]: 272
read irq[7]: 273
ioremap [df302000] mapping to [53da9db000] successful
reg_base=53da9db000 irq=266
setting BTR0=0x01 BTR1=0x1c
ioremap [df302400] mapping to [53da9dc400] successful
reg_base=53da9dc400 irq=266
setting BTR0=0x01 BTR1=0x1c
error warning interrupt
Controller changed from Error Active State (0) into Error Warning State (1).
netif_rx: adv_pci-can0: controller problems: 8
netif_rx: adv_pci-can0: TX error counter; tx:60, rx:0
error warning interrupt
Controller changed from Error Active State (0) into Error Warning State (1).
netif_rx: adv_pci-can1: controller problems: 8
netif_rx: adv_pci-can1: TX error counter; tx:70, rx:0
error passive interrupt
Controller changed from Error Warning State (1) into Error Passive State (2).
netif_rx: adv_pci-can0: controller problems: 20
netif_rx: adv_pci-can0: TX error counter; tx:80, rx:0
error passive interrupt
Controller changed from Error Warning State (1) into Error Passive State (2).
netif_rx: adv_pci-can1: controller problems: 20
netif_rx: adv_pci-can1: TX error counter; tx:80, rx:0
Additional context
If the chip had entered Bus-Off state, the current implementation would have restarted it, which is the expected behaviour. Of course the chip cannot handle the amount of data provided, but we would have expected it to progress to Bus-Off and then be restarted in line with the restart_ms value of 50 ms that the driver was started with (the default).
We tested with a special implementation that uses canctl to make the driver read the chip registers and check whether the chip is in Bus-Off state; it was not. The chip also reported that interrupts were still enabled. We were worried that the IRQ handler was missing an IRQ, but this test ruled that out.
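For reference, the kind of check described above comes down to inspecting the SJA1000 status register (SR), where bit 7 is Bus Status (set when the chip is Bus-Off) and bit 6 is Error Status (set when an error counter has reached the warning limit). The sketch below decodes such a value. How the value is obtained from the driver is not shown, the function name sja1000_status is hypothetical, and the sample inputs are illustrative values, not readings from the test.

```bash
#!/bin/bash
# Decode an SJA1000 status register (SR) value into flag names.
# Bit layout per the SJA1000 data sheet: bit 7 BS ... bit 0 RBS.
# The input value is assumed to come from a register dump (not shown here).
sja1000_status() {
    local sr=$(( $1 ))      # accepts decimal or 0x-prefixed hex
    local out=()
    (( sr & 0x80 )) && out+=("bus-off")
    (( sr & 0x40 )) && out+=("error-status")
    (( sr & 0x20 )) && out+=("transmitting")
    (( sr & 0x10 )) && out+=("receiving")
    (( sr & 0x08 )) && out+=("tx-complete")
    (( sr & 0x04 )) && out+=("tx-buffer-released")
    (( sr & 0x02 )) && out+=("data-overrun")
    (( sr & 0x01 )) && out+=("rx-buffer-full")
    echo "${out[*]:-none}"
}

sja1000_status 0x60   # illustrative: error status set, transmission in progress
sja1000_status 0x80   # illustrative: what a Bus-Off chip would report
```

A reading with the error-status flag set but the bus-off flag clear would be consistent with what we observed: the chip stays in an error state without ever reaching Bus-Off.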
Other reports online suggest that people using different hardware and software have hit the same problem in Error Passive State, with no recorded resolution.
- Implemented command-line option '-R' to allow for Bus error state recovery; particularly for Error Passive State issues.
- Removed legacy debug asserts in netif_wake_queue() and netif_stop_queue(); these were only active for Debug builds.
- Fixed a bug in command-line help option '-r'; it was still '-b' from prior changes.