Fix write error on board reset #88
|
Hmm, that is weird. My reasoning is that tud_msc_write10_complete_cb() is invoked after WRITE10 is complete. Returning from that function allows the tinyusb stack to queue and receive more SCSI commands, so a board reset could occur before the device responds and cause the error. Maybe the OS expects the device to also respond to TEST UNIT READY after WRITE10 or something. I will do more in-depth testing.
|
Ah. I had the issue before you added the while(1). I just merged that into my workaround. Let's try. |
|
With while(1) in tud_msc_write10_complete_cb (as currently on master): Error when file is uploaded.
With while(1) removed there (as it was before): Error when file is uploaded.
With this workaround (including the while(1)): No error.
With this workaround (without the while(1)): No error.
Tested with the tinyusb stack that comes with the Pico SDK, on Windows 10.
Hmmm. Perhaps even deferring the reset until the main loop has cycled at least once without any commands received, could improve behaviour more? Not familiar with how the USB stack queues up these commands, but at least whatever is queued should get responded to then... |
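To make the "wait for one idle main-loop cycle" idea concrete, here is a minimal sketch. All names (reset_gate_t and its helpers) are hypothetical and do not exist in tinyusb or this repo; it only assumes that each MSC callback can bump a counter and that the main loop polls once per iteration after the USB task has run.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical bookkeeping for a deferred reset; not part of tinyusb.
typedef struct {
  bool     reset_requested;        // set once the firmware image is complete
  uint32_t commands_seen;          // bumped by every MSC callback
  uint32_t commands_at_last_poll;  // snapshot taken at the previous poll
} reset_gate_t;

// Call from each MSC callback (read, write, test unit ready, ...).
static void reset_gate_note_command(reset_gate_t *g) { g->commands_seen++; }

// Call when the last block of the image has been written.
static void reset_gate_request(reset_gate_t *g) { g->reset_requested = true; }

// Call once per main-loop iteration, after the USB task has run.
// Returns true only if a reset is pending AND no command arrived
// since the previous poll, i.e. the loop cycled once while idle.
static bool reset_gate_poll(reset_gate_t *g) {
  bool idle = (g->commands_seen == g->commands_at_last_poll);
  g->commands_at_last_poll = g->commands_seen;
  return g->reset_requested && idle;
}
```

The main loop would call reset_gate_poll() right after the USB task and reset the board only when it returns true. Whether one idle cycle is actually long enough for the host is an open question; this only shows the bookkeeping.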
|
Do you have a USB bus trace (or even storage command block level trace), as viewed from the host PC? There may be interactions caused by deeply rooted architectural decisions in both ghostfat and Windows. (Note: my Windows knowledge is from >10 years ago, so it might be dated.) Random thoughts on potential examples:
GhostFAT sets the BPB to indicate it has a unique volume serial number, volume label, and file system label. According to Microsoft's FatGen103 (see section for BPB "Starting at Offset 36", around page 10), the volume ID enables "volume tracking on removable media".
Generally, the file system and memory manager in Windows used to work hard to optimize the user experience. In the past, this included caching writes to the media for a period of time. This is why it used to be required to "safely eject" USB flash drives, as writes would be lost when users disconnected the USB device before Windows was done flushing the cached data. I think Windows 10 is a bit "safer", and no longer caches the data for USB flash drives? Not sure?
Another item I've noticed on ghostfat volumes... Windows believes that it is able to create files on the media. When I previously used command.exe, and listed the file system of a ghostfat volume, I think I sometimes saw additional system-generated directories... maybe something was generating per-volume metadata? This is an ongoing risk for ghostfat: any OS might send writes to ghostfat, get a success message, but then fail in unexpected ways if the OS needs to read that data back from the device. Most times, the device won't have to provide that data, as the OS cached it. But it's still a risk.
Finally, there's no guarantee that the file data will actually be sent to the device in LBA order. The storage stack was notorious for this in the past, and I believe the USB storage stack would often switch the order of two IRPs / URBs in some cases. Does ghostfat (or another part of this stack) ensure that the entirety of the UF2 file has been sent to the device before it considers the write as being complete?
@kaetemi -- that's an excellent question. You would think that some USB storage device somewhere exposed an "eject button", but my search skills couldn't find any suggestion that this was the case. It would be fairly useful for storage devices: expose an "eject button" (e.g., via HID), and then get notified by the OS in some way that the OS is ready for the device's departure. Of course, the device could disappear at any time anyway, in line with USB, but ... Unfortunately, I'm not aware of this capability being built in to the common OSs. Maybe @hathach knows of something ... he's much deeper into the USB side than I would be.
UF2 contains an ID in each block and the total number of blocks in the "file". When the device got all the blocks it's interested in, it'll happily reset. From the device POV, everything's OK. But, that block numbering does not actually apply to the whole file either, it only applies to each portion of the UF2 file respectively. It's allowed by the UF2 spec to throw multiple firmwares (or other content) into a single UF2 file. Given that they'll have different family IDs set, the device will ignore everything except the part it's interested in, and reset as soon as it got what it needed. The block number in the UF2 file only applies to a single firmware contained in it, not the whole file. This case doesn't seem trivially fixable in a clean way. From the device POV, everything's still OK, but the OS will report a write failure, since the files are never fully written.
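A rough sketch of that per-family bookkeeping, to show why the device considers itself done before the host does. The struct and function names are hypothetical (this is not the ghostfat code in this repo), the header is a trimmed view of a UF2 block, and MAX_BLOCKS is an assumed limit. The flag value 0x00002000 is the "familyID present" flag from the UF2 spec. A bitmap is used so that duplicate or out-of-order blocks are tolerated.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UF2_FLAG_FAMILY_ID_PRESENT 0x00002000u
#define MAX_BLOCKS 4096u  // assumed upper bound for this sketch

// Hypothetical trimmed view of a UF2 block header.
typedef struct {
  uint32_t flags;
  uint32_t block_no;    // index within ONE image, not within the whole file
  uint32_t num_blocks;  // total blocks of that one image
  uint32_t family_id;
} uf2_header_t;

typedef struct {
  uint32_t family_id;   // the only family this device cares about
  uint32_t num_blocks;  // learned from the first matching block
  uint32_t received;    // count of distinct matching blocks seen
  uint8_t  seen[MAX_BLOCKS / 8];
} uf2_tracker_t;

static void uf2_tracker_init(uf2_tracker_t *t, uint32_t family_id) {
  memset(t, 0, sizeof(*t));
  t->family_id = family_id;
}

static bool uf2_tracker_complete(const uf2_tracker_t *t) {
  return t->num_blocks != 0 && t->received == t->num_blocks;
}

// Feed one block header; returns true once the matching image is complete.
static bool uf2_tracker_accept(uf2_tracker_t *t, const uf2_header_t *h) {
  bool ours = (h->flags & UF2_FLAG_FAMILY_ID_PRESENT) && h->family_id == t->family_id;
  bool sane = h->num_blocks != 0 && h->num_blocks <= MAX_BLOCKS &&
              h->block_no < h->num_blocks;
  if (ours && sane) {
    if (t->num_blocks == 0) t->num_blocks = h->num_blocks;
    if (h->num_blocks == t->num_blocks) {
      uint8_t mask = (uint8_t)(1u << (h->block_no % 8));
      if (!(t->seen[h->block_no / 8] & mask)) {  // ignore duplicates
        t->seen[h->block_no / 8] |= mask;
        t->received++;
      }
    }
  }
  return uf2_tracker_complete(t);
}
```

Note that the tracker reports completion as soon as its own family's blocks have arrived, no matter how many blocks for other families the host still intends to write. That is exactly the multi-family case where the host ends up reporting a write failure.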
Yep. At least the
Looking through some documentation, it seems there's a command called PREVENT ALLOW MEDIUM REMOVAL, but I'm going to assume here that some OS will just always flag this, and not just when it's busy. I do vaguely recall CD-ROM drives back in the day refusing to eject while they were busy, but that might've just been internal behaviour...
I'll try that out. So, there's two problem cases..
A guess... Maybe it's just writing to the FAT afterwards. |
Then it would not give write errors when the user is copying the file, though. Just the badly ejected media notification (which in this situation is more acceptable than "write failure" with the giant red X error icon). (Although I guess the caching might probably delay things for too long.) |
|
One option to consider: We're well into "hack" territory here. As to prevent/allow medium removal ... that command only applies to devices with removable media. CDROMs, Iomega ZIP / JAZZ drives, etc. Here, the UF2 device is removable, not the media in the device, so it doesn't apply. |
|
@kaetemi please try to use the tinyusb that is included with this repo; that way we are testing the same software. There are lots of improvements for rp2040 in the latest release https://github.com/hathach/tinyusb/releases/tag/0.9.0 . I will also bump up the tinyusb version for this repo soon. To be honest, this workaround doesn't look good to me. It delays the reset a bit (probably enough to respond to TEST UNIT READY) and could have a timing/race condition. We should reset at the right time (after TEST UNIT READY) if that is what Windows is looking for. I am mostly on Linux, which handles this differently; I will try to test with Windows. However, since your setup is unique, I am not sure if I could reproduce it on another platform. Meanwhile, can you re-compile the project with
|
Here's the logs I'm getting, two variations. (With this workaround enabled, and some extra prints added.) The status is sent after the callback. That might be the cause? |
|
Swapping the callback to happen after the status is sent appears to fix the issue. Log at the end becomes as follows. |
|
Oh. You're right. There's some different behaviour there. This is under MSC_STAGE_STATUS in Pico SDK tinyusb, but under MSC_STAGE_STATUS_SENT in 0.9.0. https://github.com/raspberrypi/tinyusb/blob/pico/src/class/msc/msc_device.c#L628 |
|
Log with 0.9.0. |
|
Thanks for testing it out.
|
Np. Thanks for the help. :) |

Currently (at least, for me) the reset of the board is always causing Windows to report a write error (which is not very user friendly) when the file is finished uploading.
Deferring the reset until after tud_task has completed appears to solve this.
(It does not fix the write error case when multiple firmware families are concatenated in a single file, though. Is there any mechanism to request a safe eject from the OS?)
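For illustration, the deferral can be sketched as below. This is not the actual diff of this PR: the flag, the reset stub, and fake_usb_task are hypothetical stand-ins so the snippet compiles off-target. On the device, on_write10_complete would correspond to tud_msc_write10_complete_cb(uint8_t lun) and the usb_task pointer to tud_task().

```c
#include <stdbool.h>
#include <stdint.h>

static volatile bool reset_pending = false;  // hypothetical flag
static int resets_performed = 0;             // counter so the sketch can run off-target
static int resets_seen_inside_task = -1;     // what the fake USB task observed

// Stand-in for the real board reset (hypothetical).
static void board_reset_stub(void) { resets_performed++; }

// Same role as tud_msc_write10_complete_cb(): only record the request,
// never reset from inside the callback.
static void on_write10_complete(uint8_t lun, bool image_complete) {
  (void)lun;
  if (image_complete) reset_pending = true;
}

// Fake USB task: the stack invokes the callback from within the task.
static void fake_usb_task(void) {
  on_write10_complete(0, true);
  resets_seen_inside_task = resets_performed;  // still 0: no reset yet
}

// One main-loop iteration; usb_task stands in for tud_task().
static void main_loop_iteration(void (*usb_task)(void)) {
  usb_task();           // let the stack finish handling the current command
  if (reset_pending) {  // act on the request only after the task returned
    reset_pending = false;
    board_reset_stub();
  }
}
```

The point of the design is that the callback returns normally, so the stack gets the chance to send the status for the finished WRITE10 before the board goes away.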