Receiving spinlocks KP randomly #299
Comments
I've been hitting similar on a RockPro64 running Open Media Vault 0.8.3-1141. Every time I try to initialize a raid1 volume I hit this after just a couple minutes. RockPro_Raid_Init_kernlog_Crash.log
|
Are you sure you don't have an underpower issue? This clearly looks like one. Maybe the power adapter is too weak? |
I don't think 3A 5V with my R64 4GB is underpowered |
I'm running the 12v 5A adapter on the RockPro64. Also doing some testing I can dd straight from one disk to the other without hitting this. I'm seeing some of the below errors in kern.log sporadically, but if I md5sum large portions of the disks mirrored via dd I don't see any discrepancies. errors
the dd command I used to copy from sda to sdb
A couple of md5sum checks of what should be identical blocks.
EDIT: Maybe dd doesn't work. I let it run overnight and came back to the box powered down and not responding. Unfortunately I wasn't able to gather any logs from this. I'm going to run it again to see if I can repro the issue, and tail -f kern.log out to a separate file to see if I can glean any more info about what happened. EDIT 2: dd isn't stable. I hit this when dd'ing /dev/zero to one of the disks. EDIT 3: I hit the spinlock error when I simply formatted just one of the disks with ext4. |
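For anyone who wants to repeat the "checksum identical blocks on both disks" test described above, here is a minimal Python sketch. It is not the command used in the comment; the function names `region_md5` and `regions_match` are made up for illustration, and it simply hashes the same byte range of two paths and compares the digests.

```python
import hashlib


def region_md5(path, offset, length, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of `length` bytes of `path` starting at `offset`."""
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb") as f:
        f.seek(offset)
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                # Reached end of file/device before `length` bytes were read.
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def regions_match(path_a, path_b, offset, length):
    """True if the same byte range hashes identically on both paths."""
    return region_md5(path_a, offset, length) == region_md5(path_b, offset, length)
```

On real hardware this would be called with block devices, for example `regions_match("/dev/sda", "/dev/sdb", 0, 1 << 30)`, which needs root and only reads from the disks. A match only says the data that was read back is identical; it does not rule out the lockups discussed in this thread.
|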
I actually figured out that I had a bad PCI card. I picked up the one at the link below and added the "pci=nomsi" boot parameter as outlined in the following forum post. With that, I was able to initialize raid1 on 2x white label HGST drives as well as 2x older Intel SSDs I had lying around at the same time, all while running "stress -c4 -m16". I could then also run dd of /dev/urandom to both raid devices, and all is stable. https://www.amazon.com/gp/product/B07KNXZFRH/ref=ppx_yo_dt_b_asin_title_o04_s00?ie=UTF8&psc=1 |
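If you add "pci=nomsi" and want to confirm the running kernel actually picked it up, a small Python sketch like the one below can check the kernel command line. The helper names are hypothetical, and the sample command line in the usage note is invented; only the `pci=nomsi` parameter itself comes from this thread.

```python
def pci_options(cmdline):
    """Collect the comma-separated values of every pci=... token on a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(opt for opt in token[len("pci="):].split(",") if opt)
    return options


def msi_disabled(cmdline):
    """True if the command line contains pci=nomsi (alone or among other pci= options)."""
    return "nomsi" in pci_options(cmdline)


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

For example, `msi_disabled("root=/dev/mmcblk0p7 rw pci=nomsi")` is true, while a command line without any `pci=` token gives false. On the board itself, `msi_disabled(read_cmdline())` would do the check after a reboot.
|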
I have the same issue. My configuration: ROCKPro64 4GB Single Board Computer, ROCKPro64 PCI-e to Dual SATA-II Interface Card, ROCKPro64 Metal Desktop/NAS Casing with fan, ROCKPro64 12V 5A EU Power Supply, 2x Seagate IronWolf NAS HDD (4TB each). Installed 0.9.14 (tried both armhf and arm64) first, updated all packages, activated SMART monitoring, quick-wiped both disks, created RAID1 via the OMV web interface. Server crashes at random times, web interface not reachable, SSH doesn't work. No dmesg output (via SSH) before the crash. Sometimes still responds to ICMP pings, sometimes doesn't. After a power cycle, RAID is in "sync pending" state, nothing suspicious in kern.log. Then tried a vanilla armbian (stretch), updated all packages, installed OMV via armbian-config, created RAID1 via the OMV web interface as before. This time received a kernel panic while connected via SSH:
Finally tried booting the armbian-setup with pci=nomsi, resulting in the same behavior as with the 0.9.14 image (no error message, just crashed). Any updates on this issue? |
Can you try forcing the RAM to downclock to 400MHz? I would assume the
crashes are due to the RAM clock (this was usually the case in the past).
It's not that the RAM is bad, but maybe some setting is off somewhere.
…On Mon, Nov 18, 2019 at 12:22 PM Jonas Windhager ***@***.***> wrote:
I have the same issue.
My configuration:
ROCKPro64 4GB Single Board Computer
ROCKPro64 PCI-e to Dual SATA-II Interface Card
ROCKPro64 Metal Desktop/NAS Casing with fan
ROCKPro64 12V 5A EU Power Supply
2x Seagate IronWolf NAS HDD, 4TB each
Installed 0.9.14 (tried both armhf and arm64) first, updated all packages,
activated SMART monitoring, quick-wiped both disks, created RAID1 via the
OMV web interface. Server crashes at random times, web interface not
reachable, SSH doesn't work. No dmesg output (via SSH) before the crash.
Sometimes still responds to ICMP pings, sometimes doesn't. After a power
cycle, RAID is in "sync pending" state, nothing suspicious in kern.log.
Then tried a vanilla armbian (stretch), updated all packages, installed
OMV via armbian-config, created RAID1 via the OMV web interface as before.
This time received a kernel panic while connected via SSH:
Message from ***@***.*** at Nov 18 10:01:47 ...
kernel:[ 75.976355] Internal error: : 96000210 [#1] SMP
Message from ***@***.*** at Nov 18 10:01:47 ...
kernel:[ 83.247622] BUG: spinlock lockup suspected on CPU#2, scsi_eh_1/284
Message from ***@***.*** at Nov 18 10:01:47 ...
kernel:[ 83.251260] lock: 0xffffff8009121870, .magic: dead4ead, .owner: scsi_eh_1/284, .owner_cpu: 2
Message from ***@***.*** at Nov 18 10:03:59 ...
kernel:[ 240.428491] Kernel panic - not syncing: hung_task: blocked tasks
Finally tried booting the armbian-setup with pci=nomsi, resulting in the
same behavior as with the 0.9.14 image (no error message, just crashed).
Any updates on this issue?
|
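Since several people here end up grepping kern.log for the same message, here is a rough Python sketch that pulls the "spinlock lockup suspected" events out of a log. The regular expression is written against the exact line quoted above and is an assumption about the format; other kernel versions may word it differently. The function name is made up.

```python
import re

# Matches lines such as:
# kernel:[ 83.247622] BUG: spinlock lockup suspected on CPU#2, scsi_eh_1/284
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: spinlock lockup suspected on "
    r"CPU#(?P<cpu>\d+), (?P<task>.+)/(?P<pid>\d+)\s*$"
)


def find_spinlock_lockups(log_text):
    """Return one dict per 'spinlock lockup suspected' line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            events.append(
                {
                    "timestamp": float(match["ts"]),
                    "cpu": int(match["cpu"]),
                    "task": match["task"],
                    "pid": int(match["pid"]),
                }
            )
    return events
```

Feeding it the contents of kern.log (or a saved serial console capture) gives a quick list of which CPU and task were involved each time, which makes it easier to compare reports between boards.
|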
Hi ayufan! Thanks for your reply. How can I force the RAM clock down to 400MHz? |
Look in |
Unfortunately, setting
|
Hi, is there anyone who has this setup with HDD disks working correctly? ROCKPro64 4GB Single Board Computer. Thanks in advance. |
@romanczukiewiczs Please ask in https://forum.pine64.org/index.php. I have had massive problems with the pine64 pcie-to-sata adapter. Now I use a Marvell 88SE9230 without problems. |
@bullet64 do you use this with the pine64 rockpro? Did you do anything with mdadm on this setup? |
Yes. Please read about it in my forum. My forum profile https://forum.pine64.org/member.php?action=profile&uid=8601 |
Hi, like you, I can trigger the issue within 5 minutes with PCIe and a SATA disk if I use syncthing and let it scan my files, or if I use the dd command. When I use the sdcard, I can see that the issue always comes after 10 seconds of DMA failure (DMA is started but we don't get the DMA-done interrupt). Note: when I added debug messages to the kernel driver, it took longer to get the oops, but you will still get it. It is unlikely that we have a similar bug in eMMC, sdcard and PCIe, so I think it is a hardware issue. I am talking about this issue here: |
I couldn't make it work with the Pine64 card. Now using a Marvell-based one (88SE9128) without a single dmesg error for 3 weeks already. |
Bump, I'm having the same issue and I have opened a ticket with the Pine64 support staff; waiting for a reply. I feel like there should be more users out there having the same issue, unless other users either don't use/have this card (Asmedia ASM1061), or we're just unlucky to have received defective ones. I'll keep you posted on their reply. |
Do you have an (a)synchronous external abort? Which OS are you using? I don't have the issue anymore. |
This is the image I use: stretch-openmediavault-rockpro64-0.9.14-1159-arm64. The only odd thing is that when I inspect the chip itself it says ASM1061 but lspci returns |
I also have that: But in any case, you should know that the issue is not related to PCIe at all. |
I did as you suggested; unplugged the PCIe-adapter, booted the board and ran I'm still a bit confused by the model mismatch between the PCIe-adapter and the controller. The chip clearly says ASM1061 and the controller ASM1062. The spinlock only happens about 30 minutes after I initiate RAID0 on the HDDs. |
My card also says ASM1061. Maybe you couldn't trigger the issue because the bad memory was unused but reserved by something else, such as the kernel? Do you have a serial console? If you get an oops similar to the one jwindhager had, then we are likely talking about the same issue. If I were you, I would try a newer U-Boot as mentioned before. I'm 90% sure it will fix your problem. |
Update: I followed the steps mentioned in https://forum.pine64.org/showthread.php?tid=8372&page=2, but no luck. The board just crashes within 30 minutes again after initiating my RAID0 resync. Here are the logs:
I haven't really experienced any kernel panics this consistent before, but considering this only happens when I use the adapter and because this line is mentioned in the logs: |
One more thing: |
Alright, I now have a minimal manjaro image up and running on the board. As it only contains 1 partition, (correct me if this is wrong) I replaced the blobs u-boot.img and idbloader.img (not u-boot.itb by mistake). I initiated raid1 and let mdadm do its thing for more than two hours and no oops as of yet, which is fantastic! Now I didn't actually try to lure out a kernel oops before replacing u-boot and idbloader, but I'll check and see if I can somehow use this on OMV on my eMMC. I'll keep you posted. Thanks @abdel-unxp |
Interesting. So, using a different boot loader does help?
…On Sun, 26 Jan 2020 at 18:52, Simon ***@***.***> wrote:
Alright, I now have a minimal manjaro image up and running on the board.
As it only contains 1 partition, (correct me if this is wrong) I replaced
the blobs *u-boot.img* and *idbloader.img* (not u-boot.itb by mistake).
I initiated raid1 and let *mdadm* do its thing for more than two hours
and no oops as of yet, which is fantastic! Now I didn't actually try to
lure out a kernel oops before replacing u-boot and idbloader, but I'll
check and see if I can somehow use this on OMV on my eMMC. I'll keep you
posted. Thanks @abdel-unxp <https://github.com/abdel-unxp>
|
Could be. I'm going to flash the stock u-boot.itb blob from the manjaro image onto OMV and see if that helps, following the instruction previously mentioned by @abdel-unxp. I'll report back after testing. |
correct |
Unfortunately it's a bust: the kernel keeps panicking even when using the same U-Boot as Manjaro, with the same oops as before:
I'm open for suggestions, can't really get my head around this one. |
FYI - I have the same configuration and same problem as @jwindhager reported. I have the u-boot-flash-spi-rockpro64.img in place, and am booting from USB instead of an SD card, but the hardware setup is similar (just different hard drives), and I'm running stretch-openmediavault-rockpro64-0.9.14-1159-armhf. I haven't yet tried swapping out the idbloader, but I'll try to do that early next week. |
+1, same hardware. Crashes when creating a file system:
|
OK, Stable or Testing image? |
Testing, as linked. |
OK, will try DietPi_RockPro64-ARMv8-Buster.7z (2020-01-22 01:31, 98M). Can it run OMV? |
Here's how to get started: https://forum.openmediavault.org/index.php/Thread/25062-Install-OMV5-on-Debian-10-Buster/ Try formatting the disk(s) first just to test and please share the results. |
At the OMV forum https://forum.openmediavault.org/index.php/Thread/30666-Create-filesystem-error-communication-failure/ ryecoaaron told me: ayufan is the expert. He suggests too small of a power supply at beginning. Probably an issue with the kernel. I'm running Bionic on mine and have no issues. I would try one of the armbian images (those come from ayufan's work too) - dl.armbian.com/rockpro64/archive/. I have had better luck with the 4.x kernels over the 5.x kernels. |
Well, I've tried Armbian, Ayufan's OMV and Manjaro, and all of those just kernel panic for me. The only one that has worked so far is DietPi. Others might get other results, but that seems doubtful as long as identical hardware is used.
|
https://dietpi.com/phpbb/viewtopic.php?p=5#p5 Installed Dietpi as minimal with OMV as stated here https://forum.openmediavault.org/index.php/Thread/25062-Install-OMV5-on-Debian-10-Buster/ Errors were encountered while processing: root@DietPi:~# cat /etc/issue |
on the WebIf Welcome to nginx! If you see this page, the nginx web server is successfully installed and working. Further configuration is required. For online documentation and support please refer to nginx.org. Thank you for using nginx. |
Feb 03 21:12:00 DietPi systemd[1]: Starting OpenBSD Secure Shell server... Creating config file /etc/php/7.3/fpm/php.ini with new version |
dpkg: error processing package openmediavault (--configure): Done Traceback (most recent call last): |
Please don't clutter this thread with issues that aren't related directly to the issue. Try formatting the disk(s) instead and see if that works without the kernel panicking. Regarding your current issue, try the stable version of DietPi and then install OMV. Any issues should be reported on their forums. |
Above instruction won't work anymore on Stable DietPi... Some packages could not be installed. This may mean that you have The following packages have unmet dependencies: Sorry can't test for you... |
Which image are you using? DietPi or something else? The issue in this thread was specifically reported for the pcie-adapter from pine64, so the discussion should stay focused on that, but thanks for reporting back on the success. |
Armbian_19.11.7_Rockpro64_buster_legacy_4.4.208.7z from https://dl.armbian.com/rockpro64/archive/ My Pine64 PCI card root@rockpro64: |
Among my computer stuff I found a WD 1TB Green Caviar HDD and tried it without the external power supply, the same way as before with the 3TB WD Red HDD. I got the same errors as before, but it took a little longer before the system crashed.
|
How can I narrow it down to the specific cause? Power supply, PCI-SATA card, or something else? Faulty hardware, or caused by software? |
I installed bionic-minimal-rockpro64-0.9.16-1163-armhf.img.xz from https://github.com/ayufan-rock64/linux-build/releases/tag/0.9.16 with the updates:
I opened a second ssh session with the command:
and did some debugging with the command:
here is the log:
* Documentation: https://help.ubuntu.com
System information as of Sun Feb 16 15:20:07 UTC 2020
System load: 0.14 Processes: 168
* Multipass 1.0 is out! Get Ubuntu VMs on demand on your Linux, Windows or
https://multipass.run/
ATA controller:
and this hard drive:
here's the HDD's SMART report:
=== START OF INFORMATION SECTION ===
=== START OF READ SMART DATA SECTION ===
General SMART Values:
SMART Attributes Data Structure revision number: 16
SMART Error Log Version: 1
SMART Self-test log structure revision number 1
1 Extended offline Completed without error 00% 1335 -
2 Extended offline Completed without error 00% 1187 -
3 Extended offline Completed without error 00% 1062 -
4 Extended offline Completed without error 00% 862 -
5 Extended offline Completed without error 00% 696 -
6 Extended offline Completed without error 00% 549 -
7 Extended offline Completed without error 00% 411 -
8 Extended offline Completed without error 00% 267 -
9 Extended offline Completed without error 00% 121 -
SMART Selective self-test log data structure revision number 1
As a layman myself, I see some "errors" with the dmesg command:
[ 1.479161] vcc5v0_host: no parameters
[ 1.845571] phy phy-ff770000.syscon:usb2-phy@e460.1: Failed to get VBUS supply regulator
[ 1.851160] rockchip-pcie f8000000.pcie: Looking up vpcie3v3-supply from device tree
[ 3.135818] rockchip-spi ff1d0000.spi: no high_speed pinctrl state
[ 3.946085] i2c i2c-10: of_i2c: modalias failure on /dp@fec00000/ports
[ 4.132784] xhci-hcd xhci-hcd.9.auto: Host not halted after 16000 microseconds.
[ 4.951028] cdn-dp fec00000.dp: Direct firmware load for rockchip/dptx.bin failed with error -2
[ 14.382190] ff100000.saradc supply vref not found, using dummy regulator
[ 14.694857] FAT-fs (mmcblk0p6): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
Questions:
|
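For picking the suspicious-looking lines out of a long dmesg dump like the one above, a simple keyword filter can help. The sketch below is only a convenience; the keyword list is my own guess at what looks like an "error", and a hit does not mean the line is related to the spinlock lockups (several of these messages appear to be routine boot noise on this board).

```python
import re

# "[ 1.845571] message text" -> timestamp and message
DMESG_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Assumed keyword list; adjust to taste.
KEYWORDS = ("fail", "error", "not found", "not halted")


def suspicious_lines(dmesg_text, keywords=KEYWORDS):
    """Return (timestamp, message) pairs for dmesg lines containing any keyword."""
    hits = []
    for line in dmesg_text.splitlines():
        match = DMESG_RE.match(line.strip())
        if not match:
            continue
        message = match["msg"]
        lowered = message.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((float(match["ts"]), message))
    return hits
```

Running it over the saved output of dmesg gives a short list to paste into a report instead of the full boot log.
|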
Just a few minutes ago I saw this from my byobu remote session:
And then my R64 just locked up completely. I have no idea what happened or what caused this.