Sparky SBC | Kernel errors #4080

Open
seniorgod opened this issue Feb 2, 2021 · 23 comments

@seniorgod

Dear all,

I have used DietPi for years with my Allo Sparky and USBridge (for better sound).
Until last week I ran DietPi based on Stretch. I don't know whether I ever ran DietPi on a Debian version older than Stretch. I set the system up from scratch with the DietPi base image for Stretch in September last year, and from then until last week it played without a problem.
Last weekend I set the system up again, this time based on the DietPi image for Buster. Now, two days later, I am facing problems during music playback: the system freezes with kernel traces.

Is there a new kernel in the Buster-based image that changed something?

@Joulinar
Collaborator

Joulinar commented Feb 2, 2021

Hi,

Basically, DietPi does not provide any kernel. This is done by the base image used. In your case it is Raspberry Pi OS, which means you are running the Raspberry Pi OS Buster kernel, provided by the Raspberry Pi Foundation. And yes, a totally different kernel is used on Buster compared to Stretch.

Stretch: 4.19.42-v7+
Buster: 5.4.83-v7+

@seniorgod
Author

seniorgod commented Feb 2, 2021 via email

@Joulinar
Collaborator

Joulinar commented Feb 2, 2021

I guess @MichaIng could explain it way better 😃

@MichaIng
Owner

MichaIng commented Feb 3, 2021

The kernel is the same indeed, the official Allo Sparky SBC kernel including patches.
Could you give some details on the errors you face?

@seniorgod
Author

seniorgod commented Feb 3, 2021 via email

@seniorgod
Author

seniorgod commented Feb 7, 2021 via email

@MichaIng
Owner

MichaIng commented Feb 7, 2021

Good that it runs stable so far. The kernel errors are not pretty, but I'm not even sure whether they are related to each other. Scheduling and IRQ handling seem to be involved, but I'm not great at reading these traces.

Could you print:

cat /proc/interrupts
cpu

@MichaIng
Owner

MichaIng commented Feb 7, 2021

root@DietPi:/mnt/dietpi_userdata# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       
 29:    6276247    6267591    6608169    8794807       GIC  twd
 30:          0          0          0          0       GIC  owl_wdt
 32:    1581015          0          1          0       GIC  ethernet_mac
 33:          0          0          0          0       GIC  asoc_de
 43:          9          0          0          0       GIC  timer1_tick
 55:          0          0          0          0       GIC  xhci-hcd:usb1
 56:         56          0          0          0       GIC  aotg_hub_hcd:usb3
 57:    4813928          0          0          0       GIC  b0170000.i2c
 58:          0          0          0          0       GIC  b0174000.i2c
 59:          2          0          0          0       GIC  b0178000.i2c
 64:         65          0          0          0       GIC
 74:          1          0          0          0       GIC  sdcard
 76:      55961          0          0          0       GIC  emmc
 78:          0          0          0          0       GIC  hdmidev
 81:          0          0          0          0       GIC  vce_isr
 82:          0          0          0          0       GIC  vde_isr
 89:      20599          0          0          0       GIC  owl_dma0
 90:          0          0          0          0       GIC  owl_dma1
 91:          0          0          0          0       GIC  owl_dma2
 92:          0          0          0          0       GIC  owl_dma3
 93:   40053562          0          0          0       GIC  aotg_hub_hcd:usb4
215:          0          0          0          0  owl_sirq_irq  atc2603c
216:          0          0          0          0  atc2603c  atc260x_onoff
217:          0          0          0          0  atc2603c  RTC alarm
219:          0          0          0          0  atc2603c  atc260x-irkeypad
IPI0:          0          0          0          0  CPU wakeup interrupts
IPI1:          0          0          0          0  Timer broadcast interrupts
IPI2:     196210     865331     498975     562729  Rescheduling interrupts
IPI3:        528        965        943       1058  Function call interrupts
IPI4:        480          8         11          5  Single function call interrupts
IPI5:          0          0          0          0  CPU stop interrupts
IPI6:          0          0          0          0  completion interrupts
IPI7:          0          0          0          0  CPU backtrace
Err:          0
root@DietPi:/mnt/dietpi_userdata# cpu
 ─────────────────────────────────────────────────────
 DietPi CPU Info
 Use dietpi-config to change CPU / performance options
 ─────────────────────────────────────────────────────
 Architecture |     armv7l
 Temperature  |     45'C : 113'F (Optimal temperature)
 Governor     |     conservative
 Throttle up  |     80% CPU usage

                 Current Freq    Min Freq   Max Freq
 CPU0         |      504 MHz      240 MHz    1104 MHz
 CPU1         |      504 MHz      240 MHz    1104 MHz
 CPU2         |      504 MHz      240 MHz    1104 MHz
 CPU3         |      504 MHz      240 MHz    1104 MHz

[ INFO ] DietPi-CPU_info | CPU current frequency, may be affected by this script, due to the processing required to run it.

@MichaIng
Owner

MichaIng commented Feb 7, 2021

Okay, CPU temperature and governor look fine. The conservative governor should also be, as the name implies, a conservative choice with regard to stability, since clock rates are adjusted in small steps.

As expected, most IRQs are handled by CPU0 only. Is it possible to change that on Sparky SBC?

echo 2 > /proc/irq/93/smp_affinity

If this succeeds (not every kernel allows it), USB port 4 interrupts should then be handled by CPU1, so cat /proc/interrupts should show the counter in the CPU1 column starting to increase:

 93:   40053562          0          0          0       GIC  aotg_hub_hcd:usb4

This could speed up interrupt handling in general when CPU0 is already very busy and has to handle many interrupts on top, but there is no guarantee that it solves the kernel errors or makes any other notable difference.

It can be applied at boot automatically:

echo 'w /proc/irq/93/smp_affinity - - - - 2' > /etc/tmpfiles.d/usb4_smp_affinity.conf
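
The value written to smp_affinity is a hexadecimal bitmask with one bit per CPU core, which is why 2 selects CPU1 (1 would be CPU0, 4 would be CPU2, 3 would be CPU0 and CPU1 together). A minimal sketch for computing such a mask, assuming IRQ 93 as in the output above; the function name cpu_mask is made up for illustration and is not part of DietPi:

```shell
#!/bin/sh
# Hypothetical helper: print the hexadecimal smp_affinity mask
# for the CPU indices given as arguments (0 = CPU0, 1 = CPU1, ...).
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        # Set the bit that corresponds to this CPU index
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Example, to be run as root; the write fails if the kernel does not allow
# changing the affinity of this IRQ:
#   cpu_mask 1 > /proc/irq/93/smp_affinity
#   cat /proc/irq/93/smp_affinity
```

Reading the file back afterwards shows whether the kernel accepted the new mask.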

@seniorgod
Author

seniorgod commented Feb 7, 2021 via email

@MichaIng
Owner

MichaIng commented Feb 7, 2021

This thin_repair stuff is new to me as well. It seems to require -i and/or -o to repair a metadata file. It is indeed possible that the quite old kernel 3.10 starts to have issues with newer libraries and software, especially software like Docker that makes use of kernel features. The good news is that everything you can do with Docker can be done without Docker as well; it only requires some more install steps 😉.

last question: the irq affinity ….. when i change this, could there be any risk for the stability of the system?

Only if you do it the other way round: forcefully loading all interrupts onto a single CPU. But we move one device from a more loaded CPU to a less loaded one (at least with regard to interrupts), so either there is no difference, or interrupt handling for this USB port gets a bit faster. If the kernel cannot deal with it, writing to that file simply fails. On RPi, for example, it's not possible to change it.
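
To compare the interrupt load per core before and after such a change, the counters from /proc/interrupts can be summed per CPU column. A minimal sketch, assuming the column layout shown earlier in this thread; the function name irq_totals is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU counters of all numbered hardware IRQ
# lines read from stdin (same layout as /proc/interrupts).
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }     # header row: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ {              # numbered IRQs only, skips IPI*/Err
            for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, total[i]
        }
    '
}

# Usage on a live system:
#   irq_totals < /proc/interrupts
```

Only the numbered lines are counted, so the inter-processor interrupts (IPI*) do not distort the picture of which core serves the devices.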

@seniorgod
Author

I give up here and will roll back to DietPi 6.34.3 on Stretch for Sparky.
Thanks for your help.

@MichaIng
Owner

MichaIng commented Feb 8, 2021

Okay. Would be interesting btw if this solves the kernel errors. If they stay, we could consult Allo developers and with some luck a kernel patch could be supplied.

@seniorgod
Author

Dear all,

Today my DietPi on Sparky "crashed" during playback. Below you find the kernel trace:

Apr 02 16:40:43 DietPi kernel: ------------[ cut here ]------------
Apr 02 16:40:43 DietPi kernel: WARNING: at /imxhdd/opt/1301TAG/sparky_volumio/kernel/net/sched/sch_generic.c:255 dev_watchdog+0x24c/0x26c()
Apr 02 16:40:43 DietPi kernel: NETDEV WATCHDOG: eth0 (owl-ethernet): transmit queue 0 timed out
Apr 02 16:40:43 DietPi kernel: Modules linked in: snd_usb_audio snd_hwdep snd_usbmidi_lib nls_cp437 ethernet spidev atc260x_irkeypad atc260x_cap_gauge autofs4
Apr 02 16:40:43 DietPi kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W    3.10.38 #22
Apr 02 16:40:43 DietPi kernel: [<c0015b34>] (unwind_backtrace+0x0/0x138) from [<c001304c>] (show_stack+0x24/0x2c)
Apr 02 16:40:43 DietPi kernel: [<c001304c>] (show_stack+0x24/0x2c) from [<c002beb8>] (warn_slowpath_common+0x4c/0x6c)
Apr 02 16:40:43 DietPi kernel: [<c002beb8>] (warn_slowpath_common+0x4c/0x6c) from [<c002bf6c>]  (warn_slowpath_fmt+0x30/0x40)
Apr 02 16:40:43 DietPi kernel: [<c002bf6c>] (warn_slowpath_fmt+0x30/0x40) from [<c05ea148>] (dev_watchdog+0x24c/0x26c)
Apr 02 16:40:43 DietPi kernel: [<c05ea148>] (dev_watchdog+0x24c/0x26c) from [<c003a2d8>] (call_timer_fn+0x3c/0x154)
Apr 02 16:40:43 DietPi kernel: [<c003a2d8>] (call_timer_fn+0x3c/0x154) from [<c003ac90>] (run_timer_softirq+0x1c0/0x2c4)
Apr 02 16:40:43 DietPi kernel: [<c003ac90>] (run_timer_softirq+0x1c0/0x2c4) from [<c00339e0>] (__do_softirq+0xf4/0x2a0)
Apr 02 16:40:43 DietPi kernel: [<c00339e0>] (__do_softirq+0xf4/0x2a0) from [<c0033c1c>] (do_softirq+0x4c/0x58)
Apr 02 16:40:43 DietPi kernel: [<c0033c1c>] (do_softirq+0x4c/0x58) from [<c0033e90>] (irq_exit+0x90/0xc8)
Apr 02 16:40:43 DietPi kernel: [<c0033e90>] (irq_exit+0x90/0xc8) from [<c000fbf8>] (handle_IRQ+0x3c/0x94)
Apr 02 16:40:43 DietPi kernel: [<c000fbf8>] (handle_IRQ+0x3c/0x94) from [<c00085e0>] (gic_handle_irq+0x28/0x5c)
Apr 02 16:40:43 DietPi kernel: [<c00085e0>] (gic_handle_irq+0x28/0x5c) from [<c000ef40>] (__irq_svc+0x40/0x70)
Apr 02 16:40:43 DietPi kernel: Exception stack(0xc0c77f68 to 0xc0c77fb0)
Apr 02 16:40:43 DietPi kernel: 7f60:                   ffffffed 00a37000 c0c8b6e4 00000000 c0c76000 c0c76000
Apr 02 16:40:43 DietPi kernel: 7f80: c0c76000 c0d29dec c0c89ed4 414fc091 c07c6920 c0d295bd 00000000 c0c77fb0
Apr 02 16:40:43 DietPi kernel: 7fa0: c0010050 c0010048 60000013 ffffffff
Apr 02 16:40:43 DietPi kernel: [<c000ef40>] (__irq_svc+0x40/0x70) from [<c0010048>] (arch_cpu_idle+0x28/0x38)
Apr 02 16:40:43 DietPi kernel: [<c0010048>] (arch_cpu_idle+0x28/0x38) from [<c00736b8>] (cpu_startup_entry+0x68/0x24c)
Apr 02 16:40:43 DietPi kernel: [<c00736b8>] (cpu_startup_entry+0x68/0x24c) from [<c0c00a34>] (start_kernel+0x2c4/0x320)
Apr 02 16:40:43 DietPi kernel: ---[ end trace ddc8d07941dc5ddd ]---

@seniorgod
Author

See the kernel trace above - perhaps it helps. I will also open a ticket at the allo forum at audiophilestyle.com

@seniorgod seniorgod reopened this Apr 2, 2021
@MichaIng
Owner

MichaIng commented Apr 2, 2021

It's the same trace you posted above already (the second one), only on CPU 0 instead of on CPU 2 🤔.

The Ethernet connection was lost/timed out. Similar report (CentOS, but doesn't matter as it's kernel-level): https://bugs.centos.org/view.php?id=6249

We updated to the latest Ethernet driver with DietPi v6.33: https://github.com/MichaIng/DietPi/blob/master/dietpi/patch_file#L2631-L2636
Here are the related instructions/script that Allo suggests: https://github.com/sparky-sbc/sparky-test/tree/master/sparky-eth
The Ethernet driver is the only thing that was updated multiple times recently: https://github.com/sparky-sbc/sparky-test/commits/master
So that is what could be tested (using an older kernel module) or at least looked into (by the Allo devs) when debugging the issue.
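
When testing an older or newer Ethernet kernel module, it helps to count how often the transmit queue timeout shows up in the kernel log. A minimal sketch, assuming the message format from the trace above; the function name count_tx_timeouts is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: count "NETDEV WATCHDOG" transmit queue timeouts per
# network interface in kernel log text read from stdin.
count_tx_timeouts() {
    # Extract the interface name from each matching line, then count per name
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) .*transmit queue [0-9]* timed out.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on a live system (kernel messages of the current boot):
#   journalctl -k --no-pager | count_tx_timeouts
```

An empty result means no such timeout was logged in the text that was passed in.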


I'm linking your forum thread here as well, so we can follow easier: https://audiophilestyle.com/forums/topic/62542-music-playing-crashes-and-kernel-trace-on-sparkyusbridge/

Btw, a good hint about the Volumio kernel patch. To be honest, I never recompiled the kernel of our Sparky SBC image, but only updated the system around it. In addition, DietPi updates ship individual kernel module updates, as suggested by Allo and shipped with the above sparky-test repository. Freshly recompiling the kernel + bootloader (+ initramfs) could be a good step.

@seniorgod
Author

OK, in that case I will close it and hope that Allo answers in their forum.

@MichaIng MichaIng changed the title differences in kernel between stretch and buster? Sparky SBC | Kernel errors Apr 2, 2021
@MichaIng
Owner

MichaIng commented Apr 2, 2021

Okay. Since I might lose track of closed issues, please report back when you have news about it, especially when there is something we can do: a kernel patch, rebuild or workaround.

@bamyasi

bamyasi commented May 1, 2021

Since no updates have come from Allo, and since I have also lost the ability to use my Sparky USBridge after a DietPi update due to kernel crashes, could you please re-open the issue? It would be much easier to track it on GitHub, since Allo does not have a decent tracking system for customers, only a forum.

@bamyasi

bamyasi commented May 13, 2021

I got a PM from Allo.com tech support on May 4; they promised to look into it. But I would not hold my breath, it's a two-year-old model after all. Since I have already switched to an RPi4, I would not mind closing this ticket again. If I hear from Allo.com soon, I will let you know anyway.

@MichaIng
Owner

@bamyasi
Did you get any feedback from Allo support?

@bamyasi

bamyasi commented Mar 26, 2022

No feedback from Allo.com, I guess Sparky hardware is no longer supported. Personally, I have switched to a generic RPi4 based streamer running DietPi OS. Having zero problems so far.

@MichaIng
Owner

At least the product page shows "discontinued", the Volumio and Max2Play images aren't offered anymore either. I was thinking about asking Allo for a sample Sparky SBC, if they still have one around, to try getting mainline kernel to run on it. It was reported here that it is supported by mainline kernel device tree: https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.16-New-ARM-Hardware
But actually I'm not sure whether this is correct, since the Actions Semiconductor S700 is an ARMv8 Cortex A53 SoC while Sparky SBC has an ARMv7 Cortex A9 SoC, not sure how that SoC is named. Probably not worth it to invest time, since the Sparky SBC has/had its value with the many available HATs/DACs, and with mainline Linux those (or most) won't work due to missing drivers/device tree overlays.
