USB 3 hangup during file transfer #112
Comments
Can reproduce, having the same problem. Seems to only happen during periods of heavy writing; when I only run my read-heavy task things seem to be fine. I have two USB 3.0 hard drives on an unpowered hub attached to the USB 3.0 port on the Rock64. These errors make both of them malfunction, even though I'm only writing to one. I've attached a more complete log, but I'm seeing many of the same error messages as @klerai above:
Followed by some:
And finally a large pile of Ext4 errors. |
Have you eliminated power supply problems (also on the hub), the USB3 spec only allows up to 900mA per port... |
I'm running three wall powered (have their own individual power supply) desktop hard drives off of an unpowered hub. I was under the impression that there wouldn't be any power draw (except for maybe the hub's status light), and that power wouldn't be the issue, but I'm no electrical engineer, so that may not be true. It may also be relevant that, upon rebooting, one or two of the drives will often not be recognized as connected (i.e. don't show up in the output of |
The USB 3.0 power specs were taken into consideration. Besides the Rock board with its standard power adapter, my USB hub is powered by its own 4 A power supply. All test runs were duplicated on a Gigabyte BRX i5 desktop machine (flawlessly). The following tests were executed:
My initial work was done on the Rock64 with release 0.5.14: jenkins-linux-build-rock-64-133. It ended with the errors reported in my first message. My experience with other systems is that USB dropouts of external disks are usually handled by the kernel as soon as the subsystem comes back online, while the hub stays online. But with the Rock64, the whole USB channel disappears, and no unplugging or re-plugging helps. The board has to go through a complete reboot. Below are my relevant kernel messages. |
It's also quite clear that this issue is specifically tied to disk writes: I mounted the same disks as read-only and have been running a read-intensive workload for at least a week now without issue. When I run the same read-intensive workload but mount the disks RW, the issue occurs after a few days, I'm assuming as a result of the constant updating of atimes. If I run a write-intensive load on the same disks, I get this issue within hours if not minutes. |
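To check which disks are exposed to the access-time writes suspected above, a small script can list filesystems that are mounted read-write without `noatime`. This is only a sketch: the `/dev/sd` prefix and the sample mount table are illustrative assumptions, and with the common `relatime` option access-time writes are infrequent, so this cannot confirm that atime updates are the actual trigger.

```python
from typing import List, Tuple


def writable_atime_mounts(mounts_text: str,
                          device_prefix: str = "/dev/sd") -> List[Tuple[str, str]]:
    """Return (device, mountpoint) pairs mounted rw and without noatime.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        opts = set(options.split(","))
        if (device.startswith(device_prefix)
                and "rw" in opts
                and "noatime" not in opts):
            results.append((device, mountpoint))
    return results


# Hypothetical mount table, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 /mnt/a ext4 rw,relatime 0 0
/dev/sdb1 /mnt/b ext4 ro,relatime 0 0
/dev/sdc1 /mnt/c ext4 rw,noatime 0 0
"""

flagged = writable_atime_mounts(SAMPLE_MOUNTS)
```

On a real system the input would be the contents of `/proc/mounts`, for example `writable_atime_mounts(open("/proc/mounts").read())`.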
Same problem here. I'm trying to switch to the USB 2.0 port to see if it helps. System hangs when there are heavy writes to the external drive. |
Just came to report the same issue - this time by hitting two USB3 ethernet adapters at the same time (with connection bonding software). |
Can confirm (again) that this has happened many more times with moderately
heavy USB 3.0 I/O -- I was actually forced to stop using the Rock64 since
it couldn't run my workload without failing.
…On Sat, Jun 16, 2018, 4:56 PM Marcos Scriven ***@***.***> wrote:
Just came to report the same issue - this time by hitting two USB3
ethernet adapters at the same time (with connection bonding software).
I'll try another hub, but there's too many reports both on this issue, and
this forum post https://forum.pine64.org/showthread.php?tid=5557 to
suggest it's anything other than the Rock64 at fault here.
|
Can confirm. Just yesterday my Rock64 halted, but I have no logs available because my system was booted from an external USB3 HDD, so presumably USB3 is the cause of my random freezes. I used to reproduce this behavior by setting my APM and AAM to the most performant (and hence most power-hungry) option possible; maybe I triggered the polyfuse or some other protection mechanism that I'm unaware of? Or did the temp sensor detect an anomaly, so it just went off without leaving any logs for me... Thankfully there are no bad blocks. The system had run without trouble for a week before. If this keeps going on, how am I going to keep this R64 for my NAS solution? EDIT: It happened again 30 minutes ago. The disk was under both frequent reads and writes, since I have BitTorrent and Plex running alongside each other. Too bad I didn't have a spare USB cable long enough to see what happened. That's why you shouldn't put a headless server only next to your router. EDIT 2: Just realized the absence of logs is most likely caused by folder2ram: the logs are kept in memory and are discarded when I force a reboot. I should see something interesting in dmesg after removing folder2ram if the USB hell ever happens again. EDIT 3: Nope, nothing special besides the spamming of this line during the USB hang-up:
So yeah I think I tripped the USB3 chipset, it has some flaws in it. |
The same for me. |
Can you check if the same happens on 0.5.15?
Kamil
…On Mon, Jul 16, 2018 at 11:33 AM, Redwid ***@***.***> wrote:
The same for me.
Two 2.5" disks connected to usb 3.0 powered hub. On rsync data from one to
another - disks disconnects. No way to return them back, only reboot helps.
I'm on 4.4.126-rockchip-ayufan-239.
|
Thanks ayufan, I'll try in the evening. |
Yes
…On Mon, Jul 16, 2018 at 11:49 AM, Redwid ***@***.***> wrote:
Thanks ayufan, I'll try in the evening.
Is that 0.5.15, right?
https://github.com/ayufan-rock64/linux-build/releases/tag/0.5.15
|
The same result for jessie-openmediavault-rock64-0.5.15-136-armhf.img.xz
|
The same result on the latest Armbian firmware. But in general it lets me copy more (>100Gb) than ayufan's builds :( |
What do you see when USB3 drops? |
Just nothing, no disks at all. I'm doing rsync copy of urbackup data from one disk to another. |
I'm seeing this issue here. I reported it to m01/rock64-arch-linux-build#2 and to https://archlinuxarm.org/forum/viewtopic.php?p=58934#p58934 I'm using a powered USB 3.0 hub from Amazon. All three connected hard drives stop working; it appears the controller runs into issues and stops working. I have two laptop external drives and one desktop external drive that has its own separate power apart from the USB hub power. EDIT: Running Arch Linux ARM with linux-aarch64-rc kernel package
|
I have since switched back to USB 2.0; it's slower, but I have yet to run into any crashing. EDIT: Since switching back to the USB 2.0 ports, I noticed that some errors about my LUKS-encrypted USB hard drives being misaligned have gone away. |
I do have this set up: If I do that connection: I've used this hub and power supply before in my other single board NAS (banana pi pro). The board able to handle 4 disks simultaneously at that time. |
I switched my USB hub out for an "Anker 10 Port 60W Data Hub with 7 USB 3.0 Ports and 3 PowerIQ Charging Ports". Now I cannot even mount any drives or do anything USB with that hub attached. Below is the log from dmesg.
|
Here is one more snip of dmesg log that might be useful, it dumped a trace
|
Still happens on 4.4.132-1075-rockchip-ayufan-ga83beded8524. I've posted an issue to Rockchip's repository. |
I no longer have an issue running 4.4.138-1094-rockchip-ayufan-gf13a8a9a4eee. After changing my USB 3.0 hub to one with enough power, USB 3.0 appears to work so far. The mainline kernel build from ayufan still has issues, as does the mainline kernel from Arch Linux ARM. I suspect there is a positive change made by ayufan or the maintainers over at https://github.com/rockchip-linux/kernel to the USB 3.0 drivers. Someone should try to get those changes merged into the mainline kernel source. The hub used is an "Anker 10 Port 60W Data Hub with 7 USB 3.0 Ports and 3 PowerIQ Charging Ports", which does work fine despite earlier comments here. |
This is great to hear :)
|
Just tried 4.4.138 with USB3, an Anker hub (https://www.anker.com/products/variant/USB-3.0-SuperSpeed-10-Port-Hub-/68ANHUB-B10A) and powered external hard drives. No luck: after blacklisting uas and rebooting, I tried to transfer a 40gb file and it eventually crashed. Transfer speed was above 50MB/s, more than 3 times faster than usb2 :(
[ 135.890718] xhci-hcd xhci-hcd.9.auto: xHCI host not responding to stop endpoint command.
[ 136.203624] EXT4-fs error (device sda3) in ext4_writepages:2655: IO failure |
Mine is a Samsung Evo 860 1TB SSD. Only with the Rock64 it gives me trouble. |
I have a RPi 4 lying on my desk for 24 hours now. The 'bad' news at least for Rock64 owners: RPi 4 USB3 storage performance is slightly better than Rock64. The 'good' news: utilizing more than one USB3 disk at the same time is the same sh*t show as everywhere else in SBC land: https://www.cnx-software.com/2019/06/24/raspberry-pi-4-benchmarks-mini-review/#comment-564128 |
OK, so I have an interesting update: I hope this helps others somehow. |
@meetyg You scared me with your "Mine is an EVO 860 1Tb" since this is exactly the model I've ordered... My only hope was that the actual hardware would change during the commercial lifetime of such a product. I've received mine a few days ago, and after many tries (ie hammering on the disk to make it consume as much power as possible), everything works without any trouble. Copying large amounts of data (so its internal cache can not contain everything and it has to really write the data it has received) while reading some other data on the disk works without any problems. The funny thing is, on the "specifications" page that I've looked at before buying, it says that it consumes max. 4 W (under 5 Volts, that means 800mA). When I received the ssd, on its back it says "max current: 1.2A" (which would be 4 W under 3.3V). |
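The power figures quoted above can be checked with the relation P = V × I. The sketch below only reproduces that arithmetic; the 900 mA per-port figure is the one quoted earlier in this thread, and which supply rail the "1.2A" label refers to is not known, so both the 3.3 V and 5 V cases are computed as assumptions.

```python
USB3_PORT_LIMIT_A = 0.9  # 900 mA per port, as quoted earlier in the thread


def current_amps(power_watts: float, volts: float) -> float:
    """Current drawn for a given power at a given voltage (I = P / V)."""
    return power_watts / volts


def power_watts(current_a: float, volts: float) -> float:
    """Power for a given current at a given voltage (P = V * I)."""
    return current_a * volts


# Specification page: max. 4 W, assumed to be drawn from the 5 V USB rail.
spec_sheet_current = current_amps(4.0, 5.0)

# Label on the drive: max current 1.2 A. The rail is an assumption.
label_power_at_3v3 = power_watts(1.2, 3.3)
label_power_at_5v = power_watts(1.2, 5.0)

# If the 1.2 A really were drawn at 5 V over USB, it would exceed
# the quoted per-port limit.
label_exceeds_port_limit = 1.2 > USB3_PORT_LIMIT_A
spec_within_port_limit = spec_sheet_current <= USB3_PORT_LIMIT_A
```

Under these assumptions the specification-page figure stays within the per-port limit, while the label figure would not if it applied to the 5 V rail.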
People in this thread seem to be discussing a wide range of USB issues in this, but I'd like to talk about the starting point: USB hard disks connected via powered USB3 hub. I'm also running into this USB3 (xhci, dwc3?) bug. I first encountered it about a year ago when I purchased the RockPro64 and attached multiple disks to a powered USB hub. It only occurred when transferring between two of them. After a bit of fiddling I gave up on this and favored the stability of USB2 for the hub. [single disk on USB3 works perfectly fine, zero usb related crashes for a year] This is the kernel error encountered when running multiple disks over a USB hub, which seems to be the same as OP: It's kinda frustrating that this doesn't seem to have been fixed after a year, so I started fiddling with it again to try to get it to work. Troubleshooting steps taken:
Researching this issue also points towards threads in other forums related to DMA, but I'm not sure whether that could be the culprit here (https://e2e.ti.com/support/processors/f/791/t/550846?USB-3-port-file-copy-leads-to-kernel-Panic-with-Processor-SDK#). Is there any hope for this to be fixed in the foreseeable future (maybe upstream?) or do we just have broken hardware here?
Ditto. I feel mildly duped. Any advice for a different board? How's the Raspi 4 with regard to USB3?
Thanks for posting this. This patch seems to be from 2015 though. It doesn't seem to have been adapted to mainline (xhc_late_csc doesn't exist in my sources), but maybe I can apply it to the recent kernel somehow (gotta change the quirk bit at the very least). Are we even sure this addresses the same issue though? PS: Sorry if I missed something important in this thread, there's over 100 responses. |
Seems this issue is also affecting the Roshambo SNES case with its SSD cartridge:
Everytime this happens, the SSD resets:
Can't even format the SSD using Gparted. Moreover, on this distro I can't even disable UAS (tried modprobe.d/blacklist.conf, and using quirks), but none of them worked:
Edit: I don't think it could be an undervoltage issue, as the power adaptor is 12v 5A, should be enough to power a fan and the SSD (over USB 3.0) and one USB 2.0 keyboard. |
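For reference, the kernel's `usb-storage` module accepts a `quirks` parameter of the form `VID:PID:flags`, where the `u` flag tells it to ignore UAS for that device. The helper below is a minimal sketch that only builds the two usual spellings of that setting; the vendor and product IDs are hypothetical placeholders (the real ones come from `lsusb`), and whether the setting takes effect depends on the distribution, for example on whether the initramfs is regenerated.

```python
import re

_HEX4 = re.compile(r"^[0-9a-fA-F]{4}$")


def uas_quirk(vendor_id: str, product_id: str) -> str:
    """Build a usb-storage quirk entry that disables UAS for one device."""
    for part in (vendor_id, product_id):
        if not _HEX4.match(part):
            raise ValueError(f"expected 4 hex digits, got {part!r}")
    return f"{vendor_id.lower()}:{product_id.lower()}:u"


def modprobe_line(quirk: str) -> str:
    """Line for a file under /etc/modprobe.d/."""
    return f"options usb-storage quirks={quirk}"


def cmdline_arg(quirk: str) -> str:
    """Equivalent argument for the kernel command line."""
    return f"usb-storage.quirks={quirk}"


# Hypothetical IDs, for illustration only.
quirk = uas_quirk("1234", "ABCD")
conf_line = modprobe_line(quirk)
boot_arg = cmdline_arg(quirk)
```

After a reboot, `lsusb -t` shows which driver is bound to the device, which is one way to see whether the quirk was picked up.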
@Maxpako The undervoltage does not only happen with a power supply that would be too weak, it can also happen between the voltage regulator of the SBC (ie your RockPro64) and the ssd in its enclosure (ie through the usb cable + usb to sata adapter). This could come from a usb cable that is too long and/or too thin or it could simply be that your ssd produces power spikes that the SBC's voltage regulator can not digest (thus letting the voltage drop). In my case, everything worked fine on my laptop but every now and then when connected to my RockPro64, my ssd would get corrupt from such errors (it was a Samsung 840pro). After replacing it by another ssd (samsung 860evo) and hoping for the best, everything works (but it seems that even the voltage and power draw specifications change during the lifetime of a specific product number: earlier 860evo did produce such problems while mine works fine). I wanted to measure the voltage in my enclosure while it runs, but you would need very thin connectors and most probably a scope that can record the voltage trace... |
Still people believe in driver issues when dealing with hardware problems... If you want to UAS blacklist your USB-to-SATA adapter you could follow this route (the important part being |
With UAS blacklisted, and updated initramfs:
syslog:
lsusb and lsusb -t:
Blank driver XD I'll need to think about the voltage drop since many distributions show the same issues :\ |
I'm measuring 5.149-5.150 on the USB 3.0 power input (red and black wires) from the Roshambo case, and for me, it looks stable... but still doing the same thing. What else would you suggest to check? |
By the way, I don't think it's a hardware issue, since I've tested Android 7.1 over emmc, and the SSD over usb 3.0 works like a charm: 17.8MB/s transferring 2,34 GB file in 2 minutes and 19 seconds (from emmc to the SSD). Edit: Copying the files directly on the SSD (doing a copy inside another folder from the same SSD) works much faster: 2.34GB in 38 seconds => 63MB/s |
May I add a datapoint to this thread (I think it's one of the most referenced regarding this issue). How it started: A powered USB3.0 hub was connected to my Rock64. Two USB3.0 disks were connected to the hub:
Both Seagate enclosures were UAS blacklisted from the beginning because I had problems reading SMART values. Initially, only Disk1 was plugged in and everything was fine: torrent downloads/uploads, network shares (large file transfers) used in both directions. Once done with configuration, I started the initial backup for the main computer (~2.2TB). I never managed to get this one finished. It would randomly fail after 10 to 200GB. The failure would likely happen when I started large torrent downloads (on the other disk), or concurrent file transfers. When it failed, it was just rebooting and I could not find any relevant error messages in logs/dmesg... Not even usb3.0 stream/ring-related errors. I then started reading messages here and I tried all sorts of combinations: hubs, no hubs, powered and non-powered ones, only Disk2 connected directly to the USB3.0 port, various 5V and 12V adapters for hubs and Disk2... I started suspecting the Seagate enclosure. It is old and a lot of people complain about it regarding UAS/SMART compatibility. Nice piece of hardware, but no UAS support at all (does not really matter) and it would still fail under heavy loads (same symptoms). Could not find a datasheet so I don't know what USB3.0 to SATA chip is used there... At this point I was about to give up, sell everything, and try a Raspberry Pi4 which supposedly has a better USB3.0 implementation... Or an x86 miniPC, I just got tired of these problems with arm SBCs :) Before making my decision, I ordered these docks from Amazon:
The SALCAR would work on my linux computer but disks would not even appear on Rock64... Did not investigate much, at this point it just became too frustrating. And finally... The FIDECO (JMS561U) just works PERFECTLY !!! In summary, it seems that there is indeed a problem with USB3.0 on the Rock64 with current linux version (hardware or software I don't want to know). Some adapter chips are better supported than others and I recommend going with JMS561U as it works perfectly for me. ORICO and UGREEN seem to build similar docks with JMS561U chip, let us know if that works for you. Side note:
Hope that helps somebody. |
There appears to be a (potential) fix for this issue. I first stumbled upon it in this thread: https://community.nxp.com/thread/511218 (0001-usb-dwc3-disable-park-mode.patch). There's also discussion concerning this mysterious park mode on a kernel mailing list: https://lkml.org/lkml/2019/10/14/496 In any case, applying the 0001-usb-dwc3-disable-park-mode.patch to linux-5.5-rc5 seems to have entirely fixed the issues for me. In the first control test the xhci controller died after a few seconds, and currently it has been transferring data via usb3 and a hub for >15 minutes. I will report back concerning long term stability. At this point I recommend everyone experiencing usb3 instability to give it a shot. Maybe this can also be incorporated into the linux-mainline-kernel repo for rockpro64? |
I also ran into the dreaded "xHCI host not responding to stop endpoint command" problem. I am fairly certain that this is not a power issue, because I first tried my setup with an x86 motherboard: 5TB 2.5" USB3 HDD via Amazon basic USB3 hubs, and it was unstable. I measured voltages and saw voltage level problems under load. So now the power setup is: a PC 400W power supply for everything. As soon as I just copy (rsync) from one disk to another, I get the dreaded problem within a minute and lose all disks. The dreaded problem does not happen when I use the x86 motherboard or the RPI4. It's just that the RPI4 is terribly slow, and x86 is of course too large for my target physical box. I did not have success getting a 5.x kernel to work on the rockpi4a yet, so I'm limited to the 4.4 kernel provided by radxa. Still hoping the problem will be fixed sometime in some kernel, sigh |
v0.10.0 just shipped with the |
Throwing my two cents in here: I was having this issue as well. At first I was using a Rock64, but then I switched to an Orange Pi 4 (which uses an RK3399) and had the same issue there. I was able to apply the patch linked by @Deathcow to kernel 5.4.49 -- once I did, I ran near-constant file transfers all weekend long and didn't have this issue come up once. |
So... has this patch made it to any of the ayufan kernels yet? |
It appears to have been added in mainline in v5.7: torvalds/linux@7ba6b09 |
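A quick way to tell whether a given kernel release is at or beyond the mainline version mentioned above is to compare the leading version numbers. This is a rough sketch only: the function names are made up for illustration, and vendor kernels (such as the 4.4 rockchip builds discussed in this thread) may carry backports or lack the change regardless of their version number, so the result is a hint and not proof.

```python
import re
from typing import Tuple

# Mainline version in which the patch reportedly landed (per the comment above).
FIX_MAINLINE_VERSION = (5, 7)


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.10.0'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def at_or_after_fix(release: str) -> bool:
    """True if the release number is at least the reported mainline version."""
    return kernel_version(release) >= FIX_MAINLINE_VERSION


# Release strings that appear in this thread.
checks = {
    release: at_or_after_fix(release)
    for release in ("4.4.126-rockchip-ayufan-239", "5.4.49", "5.10")
}
```

For the running system, `platform.release()` from the standard library returns the string to pass in.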
You can try the latest 5.10:
https://github.com/ayufan-rock64/linux-mainline-kernel/releases
|
@dusxmt @ayufan I have been following but I have not been using my Rock64 due to this issue. So I might be a bit behind on the progress made here. If the kernel I'm using contains the needed fixes, do I need to add anything to kernel command boot line or any other configurations to make sure the fix is enabled? |
I don't think so, the fix checks the hardware revision of the dwc3 chip and conditionally applies itself. What you might need from @ayufan are the device tree patches of his kernel, to allow the kernel to see the USB3 chip. (Simply installing his kernel build should do) I wonder what the status with upstreaming the device tree changes is. |
(I can't confirm this though, because I was merely doing before-i-buy research for this board, I don't have the hardware just yet) |
Does anyone have any new updates on this? Will the required changes make it to an upstream kernel? And in the meantime, does @ayufan's latest kernel have all the required fixes, and do they work? |
@ShapeShifter499 It does appear to be in the USB Subsystem's tree: https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=testing/next&id=1080ed2a6cc53a07afdb1cd3f3968e808b714bbb - so it's only a matter of time before it's in mainline. |
I'll dig up my old rock64 to see if it works fine on an upstream Arch Linux 5.11 kernel |
Keep in mind, upstream 5.11 might not have the device tree patch that I linked. |
Wait so the 5.7 kernel patch linked to earlier isn't all that's required then? |
That patch fixes the hangup problem, the device tree patch makes the USB3 controller visible to the system in the first place. |
I see, so either wait for mainline, patch a personal build, or use ayufan's kernel. But basically the issue is known and there is a fix now. Just making sure because I don't want to spend the time only to be disappointed again. I originally bought my device in hopes of making a low powered, small NAS but that was nailed by the issues discussed here. I might have other plans I could use my board for but I don't want to worry about the USB failing on me if I decide to use it like it has before. |
During file transfer (rsync) over a powered USB 3 hub from sda to sdb, the kernel disconnects the USB channel after a few minutes. The board needs to be rebooted.
The failure occurs with the first message in the kernel log (more details further down):
Feb 26 14:26:30 r64omv kernel: core: dev_pm_opp_get_voltage: Invalid parameters
messages displayed on the terminal during rsync session:
Message from syslogd@r64omv at Feb 26 14:27:37 ...
kernel:[ 1440.377657] Kernel panic - not syncing: hung_task: blocked tasks
Message from syslogd@r64omv at Feb 26 14:27:38 ...
kernel:[ 1440.849084] Kernel Offset: disabled
Message from syslogd@r64omv at Feb 26 14:27:38 ...
kernel:[ 1440.851529] Memory Limit: none
journalctl -k
Feb 26 14:21:13 r64omv kernel: sd 2:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Feb 26 14:21:13 r64omv kernel: sdb: sdb1
Feb 26 14:21:13 r64omv kernel: sd 2:0:0:0: [sdb] Attached SCSI removable disk
Feb 26 14:21:13 r64omv kernel: BTRFS: device label DATA devid 1 transid 23 /dev/sdb1
Feb 26 14:21:13 r64omv kernel: BTRFS info (device sdb1): disk space caching is enabled
Feb 26 14:21:13 r64omv kernel: BTRFS: has skinny extents
Feb 26 14:26:30 r64omv kernel: core: dev_pm_opp_get_voltage: Invalid parameters
Feb 26 14:26:30 r64omv kernel: mali-utgard ff300000.gpu: Failed to get voltage for frequency 163840000: -34
Feb 26 14:26:30 r64omv kernel: devfreq ff300000.gpu: Couldn't update frequency transition information.
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: xHCI host not responding to stop endpoint command.
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: Assuming host is dying, halting host.
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: xHCI host not responding to stop endpoint command.
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: Assuming host is dying, halting host.
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: Host not halted after 16000 microseconds.
Feb 26 14:28:05 r64omv kernel: hub 5-1.1:1.0: hub_port_status failed (err = -22)
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: HC died; cleaning up
Feb 26 14:28:05 r64omv kernel: usb 5-1.1-port1: cannot reset (err = -22)
Feb 26 14:28:05 r64omv kernel: usb 5-1.1-port1: cannot reset (err = -22)
Feb 26 14:28:05 r64omv kernel: usb 5-1.1-port1: cannot reset (err = -22)
Feb 26 14:28:05 r64omv kernel: usb 5-1.1-port1: cannot reset (err = -22)
Feb 26 14:28:05 r64omv kernel: usb 5-1.1-port1: cannot reset (err = -22)
Feb 26 14:28:05 r64omv kernel: usb 5-1.1-port1: Cannot enable. Maybe the USB cable is bad?
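To spot this failure in a saved kernel log, the characteristic xHCI messages from the log above can be searched for directly. The following is a minimal sketch: the pattern list contains only strings that appear in this report, and the function name and sample text are illustrative.

```python
import re
from typing import List

# Signatures copied from the kernel log above.
XHCI_FAILURE_PATTERNS = [
    re.compile(r"xHCI host not responding to stop endpoint command"),
    re.compile(r"Assuming host is dying, halting host"),
    re.compile(r"HC died; cleaning up"),
]


def find_xhci_failures(log_text: str) -> List[str]:
    """Return the log lines that match any known xHCI failure signature."""
    hits = []
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in XHCI_FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


# A few lines taken from the log above.
SAMPLE_LOG = """\
Feb 26 14:21:13 r64omv kernel: sdb: sdb1
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: xHCI host not responding to stop endpoint command.
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: Assuming host is dying, halting host.
Feb 26 14:28:05 r64omv kernel: hub 5-1.1:1.0: hub_port_status failed (err = -22)
Feb 26 14:28:05 r64omv kernel: xhci-hcd xhci-hcd.8.auto: HC died; cleaning up
"""

failures = find_xhci_failures(SAMPLE_LOG)
```

The input can be the saved output of `journalctl -k` or `dmesg`; an empty result only means none of these three messages were logged.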