Delayed writes and auto-reset problems when writing from Windows #111

Closed
dhalbert opened this issue Mar 14, 2017 · 45 comments

@dhalbert
Collaborator

dhalbert commented Mar 14, 2017

I have had problems writing to the onboard filesystem from Windows. I sometimes see complete filesystem corruption, and sometimes just problems with one file. See https://forums.adafruit.com/viewtopic.php?f=60&t=109687 for background.

Specific scenario, showing a file that CPy has trouble reading:

The file in question is here: main.py.txt. (Renamed from main.py to main.py.txt so that GitHub will take it as an attachment.) I wrote this Python code while doing some CPy I/O testing, and the specific code probably doesn't have anything to do with this problem. However, this file is 556 bytes long, so it's more than one 512-byte block, which does seem to be important.

If I write this file to D:\CIRCUITPY\main.py using NOTEPAD.EXE, it runs just fine. It prints "5" and the button-reading loop works.

If I write this exact same file using Notepad++ (a very common lightweight editor used on Windows) it does not work. The serial port shows:

Auto-soft reset is on. Simply save files over USB to run them.
main.py output:
Traceback (most recent call last):
  File "main.py", line 26
SyntaxError: invalid syntax

Line 26 is right around the 512-byte boundary in the file.
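
To check which line straddles a block boundary, a small helper like the one below could be run on the host. It is only a sketch: `line_at_offset` is a name made up for this example, and it assumes the file was read in binary mode so that byte offsets match what is stored on the drive.

```python
def line_at_offset(data, offset):
    """Return the 1-based number of the line that contains the byte at `offset`."""
    if not 0 <= offset < len(data):
        raise ValueError("offset is outside the file")
    # Every newline before the offset pushes the byte onto a later line.
    return data.count(b"\n", 0, offset) + 1


# Tiny made-up sample, not the real main.py.
sample = b"first\r\nsecond\r\nthird\r\n"
print(line_at_offset(sample, 0))
print(line_at_offset(sample, 7))
```

Calling `line_at_offset(contents, 512)` on the bytes of a saved file gives the line that holds the first byte of the second 512-byte block.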

In the REPL, I can read the file I wrote with NOTEPAD.EXE:

>>> import uos
>>> uos.listdir()
['System Volume Information', 'main.py']
>>> f = open('main.py', 'rb')
>>> chars = f.read()
>>> chars
b'import nativeio as io\r\nfrom board import *\r\nimport time\r\n\r\nled = io.DigitalInOut(D13)\r\nled.switch_to_output()\r\n\r\nswitches = [io.DigitalInOut(pin) for pin in (D6, D10, D11, D12)]\r\nfor switch in switches:\r\n    switch.switch_to_input(pull=io.DigitalInOut.Pull.UP)\r\n\r\n\r\ndef blink(n, interval=0.2):\r\n    for i in range(n):\r\n        led.value = 1\r\n        time.sleep(interval)\r\n        led.value = 0\r\n        time.sleep(interval)\r\n\r\nprint(2+3)\r\n\r\n\r\nwhile(True):\r\n    for i, switch in enumerate(switches):\r\n        if not switch.value:\r\n            blink(i+1)\r\n\r\npass\r\n'
>>> f.close()

But if I write the same file again with Notepad++, I get an OSError when trying to read it in the REPL:

>>> f = open('main.py', 'rb')
>>> chars = f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: 5

I can TYPE either file from CMD.EXE, and if I use od (from GnuWin32) to look at the characters in the file, they are identical.

I have repeated the cycle of writing with NOTEPAD.EXE and then Notepad++ and I consistently have the error only with Notepad++. Once or twice TYPE complained it did not have access to the bad version of the file, but I cannot reproduce that problem consistently. If I look at the file properties in Windows Explorer, the two versions have identical properties and sizes.

I've looked at the Notepad++ source code where it writes files. It looks innocuous: it uses ::fwrite(). It does support UTF-8, but the file in question is all ASCII, and I set up Notepad++ to write it as ANSI.

This appears to be some oddity or corruption about how Windows is writing to the CPy filesystem. I've inquired on the Notepad++ forum about whether it does anything unusual when writing files, and will report back if I hear anything.

The workaround is not to use Notepad++, but it seems important to figure out what's going wrong so other users will not have the same problem. I did some websearching and haven't turned up any similar reports about Notepad++, FatFs, or MicroPython.

@tannewt
Member

tannewt commented Mar 14, 2017

Thanks for the detailed write-up @dhalbert! I'm not sure why it varies between Notepad and Notepad++ but do have some general ideas.

First a little background. With mass storage devices the host OS is responsible for maintaining the on-disk file system metadata. This is super common and well supported across OSes. The alternative is the Media Transfer Protocol which is more complex and relies on the device maintaining the underlying file system. Android phones do this so that they can read and write files at the same time as things are changed over USB. However, MacOS in particular doesn't have built-in support for MTP. So, CircuitPython is a mass storage device instead. Relying on the OS has its downsides though.

I suspect what you are running into is caching that's done on the OS side. There's no requirement that the OS write immediately to the disk upon save (though it's common to). All it has to do is give the appearance that it's written from within Windows. "Safely removing" is Windows' way of saying "Force me to flush the cache." So, try doing that next time it's inconsistent.

You are right that 512 bytes is important. That's the size of each filesystem block. So, I think Windows is writing the first block on save but waiting on the second. If that gap is large enough then the auto-reset will trigger and show an error. The OS can also do this for the metadata parts of the file system which may have led to your OSError.

Neither of these things should be fatal to the filesystem though. You should be able to resave or "safely remove" to cause Windows to flush its cache and make the actual stored file system consistent.

MicroPython has had similar reports because they allow for writing the file system even when the OS is as well which can definitely lead to corruption. They chose to allow this for user flexibility. In CircuitPython the file system is never writeable from a user's Python code which should reduce the chance of FS corruption. (It's still possible by disconnecting the device before the OS's cache has been written.) We hope to change this to be toggleable (writeable over USB or from CircuitPython but not both) but haven't yet. It'll probably be done when we finish the SD card support.

So, next time it happens try doing a "safe remove" on the device and see if that fixes the file for CircuitPython. If that works, then it's the best we can do on the CircuitPython side.

@dhalbert
Collaborator Author

dhalbert commented Mar 14, 2017

Thanks for the background, Scott. The USB drive is set to write changes immediately. It does not deliberately cache writes. This is typical for USB flash drives on Windows. Here's one of its property windows:

[Image: atmel-quick-removal]

However, despite the setting above, there's something odd about Notepad++ or the Windows API it uses. I looked at its code and it straightforwardly does a ::fwrite() and then an ::fclose() when writing the file. When I first write the file, it reports a syntax error at the 512-byte block boundary. But if I then wait 60-90 seconds, CPy soft-resets again, apparently because Windows writes the remaining changes, and the program works. So it appears there is some delay somewhere about writing back all the changes. I'll do some further research on this.

I tried a few other things. I lengthened AUTORESET_DELAY_MS from 500 to 5000ms, but that did not help. I also tried entering the REPL (thereby disabling auto-reset), and then writing the file. When I exit the REPL with ctrl-D to initiate a soft reset, it still reports a syntax error at the 512-byte block boundary. But if I write the file, then Eject the drive (Safe remove), then there is not an error, since any remaining writes are flushed.

I might want to try instrumenting access_vfs.c and maybe some other files. How do you add printf-style logging? (Or, how do you do debugging in general?) I found mp_printf et al, but am not sure what the first arg should be.

@dhalbert dhalbert reopened this Mar 14, 2017
@tannewt
Member

tannewt commented Mar 14, 2017

Huh, I never knew Windows could do that. I'm a Mac user now and used to do Linux.

For printf I think this should work: mp_printf(&mp_plat_print, ...) with a normal printf format string, plus %q for qstrs.

For crashes I use GDB on the new prototype metro.

Thanks for your help!

@dhalbert
Collaborator Author

I spent some more time on this today. I tested several other editors, including Wordpad, Mu, Atom, and EMACS. All have delayed-write problems as above except EMACS and NOTEPAD.EXE. I also found the source code for NOTEPAD.EXE.

NOTEPAD.EXE uses WIN32 I/O calls; Notepad++ uses the stdio-style wrapper provided by Microsoft (fopen, fwrite, etc.), which makes WIN32 calls at a lower level. I wrote a small C++ test program that does file writing like NOTEPAD does and also like Notepad++ does. To make a long story short, whether or not I see problems depends on the flags used to open the file: whether the file is assumed to exist when opening and/or whether it should be truncated on opening. This is at the WIN32 API CreateFile() call.

NOTEPAD and EMACS go to some effort to open the file as existing if possible. Notepad++ and the other editors don't bother to do this, but there's nothing wrong with how they write files. And even given the differences above, it's not clear to me that NOTEPAD or EMACS will not have delayed-write with larger files that are writing more than one or two blocks.
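
The two ways of opening a file can be sketched in Python instead of Win32 C++. This is an illustration only, not the test program mentioned above: it assumes mode `"wb"` stands in for truncate-on-open and `"r+b"` for opening an existing file in place, and both function names are invented for the example.

```python
import os
import tempfile


def save_truncating(path, data):
    """Truncate-on-open save: the file drops to zero length, then grows again."""
    with open(path, "wb") as f:
        f.write(data)


def save_in_place(path, data):
    """Open-existing save: overwrite from the start, then trim any leftover tail."""
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as f:
        f.write(data)
        f.truncate()  # cut the file at the current position (end of the new data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "main.py")
        save_truncating(target, b"print(2+3)\r\n" * 50)
        save_in_place(target, b"print(2+3)\r\n" * 40)
        print(os.path.getsize(target))
```

Both functions leave the same bytes in the file; what differs is whether the file passes through a zero-length state on the way.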

The real problem is that despite the USB drive being marked for Quick Removal (also known as ExpectSurpriseRemoval - great name), writes can be delayed. I don't know why this is so, and I'll do some more research on this.

But unless this can be fixed somehow, it seems to me that enabling auto-reset for CircuitPython on Windows is problematic. Auto-reset can trigger the reading of an incomplete file. Then, after tens of seconds, the write will complete, it will auto-reset again and mysteriously work. Ejecting the drive manually will force the writes, but there will still be an error reported at the first incomplete write. I see this as potentially a big support issue for anyone using CPy on Windows.

I don't know if CPy can detect whether it's connected to Windows or not, and disable auto-reset if so. Alternatively, there could be a no-auto-reset version of the firmware, but that defeats the purpose of a CPy board being pre-loaded and being usable right out of the box.

Am I the only person you know of testing CPy with auto-reset on Windows?

@tannewt
Member

tannewt commented Mar 19, 2017

Thanks for all of the investigation @dhalbert !

No, we can't know if the computer we're talking to is Windows or not AFAIK.

There is an API for turning off autoreset here that can be used to turn it off in boot.py: https://circuitpython.readthedocs.io/en/latest/atmel-samd/bindings/samd/__init__.html

Another solution could be better error messaging on the CircuitPython side. We could likely detect SyntaxErrors on the block boundary and suggest ejecting the device. Or we could simply give it as a suggestion on all SyntaxErrors.

How does that sound?

@dhalbert
Collaborator Author

dhalbert commented Mar 19, 2017

Two replies coming up: 1/2

A note on exactly which write is being delayed:

I was able to look at the actual USB messages using Wireshark and USBPcap. The traces show that the entire file is being written out immediately; it's the metadata write, in this case the updating of the FAT (File Allocation Table) that's being delayed. I think it shows how filesystem corruption could be possible, because a filesystem transaction has not been finished.

If the file is truncated and then written, the FAT will be changed when the file is truncated (to indicate the file is 1 sector long), and then changed again when the file is written (because it is more than 1 sector long). Opening the file as existing only works because I don't happen to be changing the number of sectors needed for the file. If I made it smaller or larger by at least a sector, then the FAT would have to be updated.
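
The reasoning in the paragraph above can be written down as a rough model. This is a simplification with made-up function names: it assumes 512-byte sectors with one sector per cluster, and it treats a truncated file as occupying a single sector, as described above, rather than modelling the real driver.

```python
SECTOR_SIZE = 512  # bytes per filesystem block, as noted earlier in the thread


def sectors_needed(size_bytes, sector_size=SECTOR_SIZE):
    """Whole sectors required to hold size_bytes (ceiling division)."""
    return -(-size_bytes // sector_size)


def fat_updates(old_size, new_size, truncate_on_open):
    """Count the FAT updates a save implies under the simplified model."""
    old = sectors_needed(old_size)
    new = sectors_needed(new_size)
    if truncate_on_open:
        # The chain shrinks to one sector on truncate, then grows on write.
        return int(old > 1) + int(new > 1)
    # Opened as existing: the FAT changes only if the sector count changes.
    return int(old != new)


print(sectors_needed(556))
print(fat_updates(556, 556, truncate_on_open=True))
print(fat_updates(556, 556, truncate_on_open=False))
```

For the 556-byte file, the truncating save implies two FAT updates and the in-place save implies none, which matches the difference seen between the editors.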

(I still don't know why writes are being delayed. I am amazed no one else has reported something similar. By the way, I also did some USB traces of a conventional FAT USB drive, and saw the same delay. I also tried a FAT32 drive: that was much better and finished all writes within a couple of seconds.)

My comments preceded by ###.

No.  Time           Source    Destination  Protocol Length Info
  66 5.490806       host      1.2.2        USBMS    58     SCSI: Write(10) LUN: 0x00 (LBA: 0x00000003, Len: 8)
  67 5.491069       host      1.2.2        USB      4123   URB_BULK out    ### writing directory info starting at LBA 3
  70 5.534978       host      1.2.2        USBMS    58     SCSI: Write(10) LUN: 0x00 (LBA: 0x00000002, Len: 1)
  71 5.535208       host      1.2.2        USB      539    URB_BULK out    ### writing FAT (LBA 2) to truncate file
  73 5.545051       host      1.2.2        USBMS    58     SCSI: Write(10) LUN: 0x00 (LBA: 0x00000003, Len: 8)
  74 5.545194       host      1.2.2        USB      4123   URB_BULK out    ### writing directory info again
  76 5.588842       host      1.2.2        USBMS    58     SCSI: Write(10) LUN: 0x00 (LBA: 0x0000000e, Len: 2)
  77 5.589063       host      1.2.2        USB      1051   URB_BULK out    ### write of main.py: has both sectors
   ...                                                                     ### 25-second delay
 403 31.146476      host      1.2.2        USBMS    58     SCSI: Write(10) LUN: 0x00 (LBA: 0x00000002, Len: 1)
 404 31.146650      host      1.2.2        USB      539    URB_BULK out    ### write FAT again, but late

@dhalbert
Collaborator Author

dhalbert commented Mar 19, 2017

2/2:

Thanks - I didn't know about disabling auto-reset via the samd module. I put this in boot.py, and it worked nicely.

import samd
samd.disable_autoreset()
print("*** Disabling auto-reset.")

I added a message since the firmware says auto-reset is on, and I need to contradict that. The user still needs to be reminded to Eject the CIRCUITPY drive to force the write or else a soft-reset could generate an error. So the steps are:

  1. Write the file.
  2. Eject the drive.
  3. Type ctrl-D or press the reset button to initiate a reset.

(Note that the "Eject" does not actually eject the drive and make it disappear, at least on Windows 10. It's still there, but after the Eject a notification appears that it's "safe to remove".)

Heuristic error messages might be confusing, since syntax errors are very common anyway. If you can detect the Eject, you could detect a file write without a following Eject, and remind the user to Eject. And if you do detect the Eject, you could then safely do an auto-reset after the Eject is done. So the auto-reset would actually happen on Eject, not file write. What do you think of that strategy?

So, to see if you could detect an Eject, I logged the USB events that show up during an Eject. Here's a link to the complete log with more detail, and I've included a summary below. I haven't seen these events elsewhere during FAT I/O.

No.     Time           Source                Destination           Protocol Length Info
    558 44.093344      host                  1.2.2                 USBMS    58     SCSI: Prevent/Allow Medium Removal LUN: 0x00  ALLOW
    560 44.093689      host                  1.2.2                 USBMS    58     SCSI: Start Stop Unit LUN: 0x00
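
As a rough idea of how a device might recognise that sequence, the sketch below classifies raw SCSI command blocks by opcode. It is host-side Python for illustration, not CircuitPython firmware code. The opcodes and bit positions are the standard SCSI ones, but the function names are invented, and it assumes the host clears the START bit on Eject, which the summary above does not show.

```python
# Standard SCSI operation codes for the commands named in the traces.
WRITE_10 = 0x2A
START_STOP_UNIT = 0x1B
PREVENT_ALLOW_MEDIUM_REMOVAL = 0x1E


def classify(cdb):
    """Return a short label for one SCSI command descriptor block (bytes)."""
    opcode = cdb[0]
    if opcode == WRITE_10:
        lba = int.from_bytes(cdb[2:6], "big")    # logical block address
        count = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
        return f"write lba={lba} len={count}"
    if opcode == PREVENT_ALLOW_MEDIUM_REMOVAL:
        return "prevent" if cdb[4] & 0x03 else "allow"
    if opcode == START_STOP_UNIT:
        return "start" if cdb[4] & 0x01 else "stop"
    return "other"


def saw_eject(cdbs):
    """True if an 'allow removal' is later followed by a stop command."""
    allowed = False
    for cdb in cdbs:
        label = classify(cdb)
        if label == "allow":
            allowed = True
        elif label == "stop" and allowed:
            return True
    return False


fat_write = bytes([0x2A, 0, 0, 0, 0, 2, 0, 0, 1, 0])
print(classify(fat_write))
```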

@tannewt
Member

tannewt commented Mar 19, 2017

I don't want to limit the autoreset to eject-only because it reduces the usefulness. I'm ok with autoresets that error. I think it's just a matter of teaching people how to work around it if they see the issue.

I wouldn't hide the SyntaxError in favor of something else. I would just add an additional tip or hint along with it that says delayed flushing could be the cause.

It's a good point about the auto-reset messaging. I'll make that conditional. Filed #112 for it.

@dhalbert dhalbert changed the title from "FatFs problems when writing from Windows" to "Delayed writes and auto-reset problems when writing from Windows" Mar 19, 2017
@dhalbert
Collaborator Author

dhalbert commented Apr 25, 2017

Just some more info: I formatted a 512MB flash drive multiple times with various sized partitions. At 15MB and below, writes of the FAT table are delayed. At 16MB and up, FAT writes happen promptly.

I compared USB traces from the 15MB and 16MB case. They are very similar until the delayed writes at the end. But in the 16MB case, Windows sends SCSI Prevent/Allow Medium Removal commands before writes, asking that the device not be removed. The USB stick actually returns failure on these requests (because it can't guarantee no removal), but Windows tries anyway. The 15MB trace shows no Prevent/Allow Medium Removal commands.

So apparently at some threshold Windows decides to do writes carefully, both in terms of requesting no removal, and doing them promptly.

I've submitted this as a problem report via the "Windows Insider Feedback Hub".

@tannewt
Member

tannewt commented Apr 25, 2017 via email

@dhalbert
Collaborator Author

dhalbert commented Apr 27, 2017

No solution yet, but I may have found the actual Windows driver code that's causing the issue. 16MB is the breakpoint between FAT12 and FAT16; below 16MB, you get FAT12.

[EDIT: breakpoint is actually at the number of clusters: 4085. Sectors are almost always 512B, but there can be multiple sectors per cluster: this is specified in the FAT filesystem header block]

MS happens to include the FAT filesystem driver in a package of sample driver code! There are several places in that driver where, if the filesystem is FAT12, the driver will not bother to set the dirty bit.

https://github.com/Microsoft/Windows-driver-samples/blob/master/filesys/fastfat/verfysup.c#L774
https://github.com/Microsoft/Windows-driver-samples/blob/master/filesys/fastfat/cachesup.c#L1212
and maybe most critically:
https://github.com/Microsoft/Windows-driver-samples/blob/master/filesys/fastfat/cleanup.c#L1101

In the last link, in cleanup.c, the FAT is not flushed if the filesystem is FAT12. I think this may be causing exactly the behavior I see:

    //
    //  If that worked ok,  then see if we should flush the FAT as well.
    //

    if (NT_SUCCESS(Status) && Fcb && !FatIsFat12( Vcb) && 
        FlagOn( Fcb->FcbState, FCB_STATE_FLUSH_FAT)) {

        Status = FatFlushFat( IrpContext, Vcb);

I've added this info to the report I sent to MS.

@willingc
Collaborator

In the last link, in cleanup.c, the FAT is not flushed if the filesystem is FAT12. I think this may be causing exactly the behavior I see.

@dhalbert I suspect that you have found the root cause. 👍

@ladyada
Member

ladyada commented Apr 27, 2017

wow nice investigative analysis!
there's a few things we can do

  1. see if we can format FAT16, it won't be as space efficient but maybe that's OK
  2. wait for FAT to get flushed after a file write - reboot only after that occurs, annoying but possible

waiting for MS to fix it could be very time consuming. we could get lucky and it takes only a few weeks but it took a few years to get USB CDC serial devices to get automatically installed.

@tannewt - it's your call!

@dhalbert
Collaborator Author

Thanks. The idea of forcing FAT16 sounds good, but it's not clear to me it's going to work. Here is MS' chatty and informative spec for the various FAT filesystems. On page 16 or so it says:

Now we can determine the FAT type. Please note carefully or you will commit an off-by-one error!

In the following example, when it says <, it does not mean <=. Note also that the numbers are correct. The first number for FAT12 is 4085; the second number for FAT16 is 65525. These numbers and the ‘<’ signs are not wrong.

If(CountofClusters < 4085) {
/* Volume is FAT12 */
} else if(CountofClusters < 65525) {
    /* Volume is FAT16 */
} else {
    /* Volume is FAT32 */
}

This is the one and only way that FAT type is determined. There is no such thing as a FAT12 volume that has more than 4084 clusters. There is no such thing as a FAT16 volume that has less than 4085 clusters or more than 65,524 clusters. There is no such thing as a FAT32 volume that has less than 65,525 clusters. If you try to make a FAT volume that violates this rule, Microsoft operating systems will not handle them correctly because they will think the volume has a different type of FAT than what you think it does.
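
The rule quoted from the spec translates directly into Python. The thresholds come from the quote. The `approx_cluster_count` helper and the 4096-byte cluster size used in the example are assumptions added for illustration, and the estimate ignores the space taken by reserved sectors, the FATs, and the root directory.

```python
def fat_type(count_of_clusters):
    """Classify a FAT volume by its cluster count, per the quoted rule."""
    if count_of_clusters < 4085:
        return "FAT12"
    if count_of_clusters < 65525:
        return "FAT16"
    return "FAT32"


def approx_cluster_count(volume_bytes, bytes_per_cluster):
    """Rough upper bound on the cluster count; ignores filesystem overhead."""
    return volume_bytes // bytes_per_cluster


MIB = 1024 * 1024
for size_mib in (15, 16):
    clusters = approx_cluster_count(size_mib * MIB, 4096)  # assumed cluster size
    print(size_mib, clusters, fat_type(clusters))
```

With that assumed cluster size, a 15MB partition lands just under the FAT12 limit and a 16MB partition just over it, which is consistent with the partition-size experiment reported earlier in the thread.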

I saw this point mentioned other places. But looking through the fastfat driver, I'm not sure how it enforces this. I may have time to take a harder look tonight. Not sure if we can force FatFS to do FAT16 instead of FAT12 -- it's not in the API, so we'd need to change the code.

Besides the Feedback Hub report, I also contacted one of the UF2 guys, and he forwarded the problem on. I will follow up with him for closure.

@tannewt
Member

tannewt commented Apr 27, 2017

I'd go with @ladyada 's second option. It shouldn't be too hard to only autoreset after writes to a specific block or two. I don't think I'd have it on by default though. I'd just have it as a setting you can set in boot.py like turning off autoreset.

@dhalbert
Collaborator Author

When boot.py or whatever gets written, you may see a first FAT write (e.g., when the file is truncated before it's rewritten), and then the file itself, and then (much later) the final FAT write. So detect the file write as you are already doing and only then wait for the FAT write. If the user doesn't do an Eject, it will be tens of seconds before anything happens. So a typical user will probably still want to do an Eject.

It's a little tedious to do an Eject from the taskbar or from an Explorer window. I just found a little piece of freeware to make it a double-click (or a hotkey). I'll try it out -- it might be handy.

USB Disk Ejector

@ladyada
Member

ladyada commented Apr 27, 2017

Uwe Sieber has some nice command-line friendly tools that we can integrate if necessary
http://www.uwe-sieber.de/drivetools_e.html
in particular "Eject Media"

@dhalbert
Collaborator Author

For posterity: I asked about the bug in https://superuser.com/questions/1197897/windows-delays-writing-fat-table-on-small-usb-drive-despite-quick-removal/, and ended up answering my own query.

@tannewt
Member

tannewt commented Jun 19, 2017

Sorry I haven't done the second option even though I said I would. I don't think it's a good option actually because it doesn't actually prevent corruption. It only reduces spurious bad errors.

@dhalbert have you confirmed that FAT16 causes Windows to write faster? That could work.

@ladyada
Member

ladyada commented Jun 19, 2017

@tannewt i guess the Q would be: does Win ever edit a file without updating the FAT?

we can look at USB traces if ya like?

@dhalbert
Collaborator Author

dhalbert commented Jun 19, 2017

@tannewt FAT16 does cause Windows to write out the metadata faster (within a few seconds). But as I found out, FAT12 is used by definition when the filesystem is a certain size, so unfortunately we can't force Windows to use FAT16 instead of FAT12 on tiny drives.

People haven't used such tiny drives en masse since floppies and very small camera flash cards.

This is a darn nuisance. I can see a few ameliorations:

  1. Get MS to fix this for Windows 10 going forward. I have tried all the obvious feedback channels, including posting in MSDN forums, mentioning it on Windows 10 Feedback Hub, and trying to get the attention of the right people via the UF2 folks. They did forward it on, but I have no idea where it stands internally. If you have any other contacts at MS (maybe due to your geography), that would be great. And even if MS fixed it, it still doesn't solve the problem of Windows 7 and unpatched Windows 10 boxes.
  2. Teach people to use Eject religiously and not press reset. Maybe provide GUI/command line tools to make it really easy.
  3. Patch or write plugins for one or more code editors to do Eject programmatically, and recommend their use on Windows.
  4. In the long run, implement MTP or similar to provide a higher-level filesystem. The lack of MTP on MacOS is an issue, and maybe that's trading one problem for another. But a more abstract filesystem would allow you to do simultaneous writes via USB and internally, solve the dot-files problem that Tony has mentioned on MacOS, etc.

Do you know of any other microcontroller packages that provide a filesystem? Seems like Micro:Bit etc all just have one-file upload.

@dhalbert
Collaborator Author

dhalbert commented Jun 19, 2017

@ladyada Not sure this answers your question, but the very delayed FAT update happens when the number of blocks needed for a file changes. Some editors make a point of always opening a file for write with truncation, which causes the file to go to zero length and then grow, so they always hit this problem (e.g. Notepad++). Some editors don't truncate on opening (e.g. NOTEPAD.EXE), so if the file doesn't change size enough to change the number of blocks, the delayed FAT write doesn't happen. But that's an accident of file size (and it's what confused me so much when I first encountered this problem).

@ladyada
Member

ladyada commented Jun 19, 2017

@dhalbert ohh that makes sense - i always use xemacs so that could be part of why i've never seen it.

would there ever be a time where a file changes but the FAT doesnt get updated? i think you always at least have an update for the modification time?

@dhalbert
Collaborator Author

dhalbert commented Jun 19, 2017

@ladyada There are two sets of metadata that get updated on file write (and sometimes on open): the directory info (like file modification times), and the actual FAT (File Allocation Table). The FAT does not contain directory info: it just contains a chained list of blocks (aka "clusters") that store the data in the file. So for instance, a file might be stored in blocks (clusters) 10, 11, and 15. The directory entry points to slot 10 in the FAT. Slot 10 contains the number 11, slot 11 contains the number 15, and slot 15 contains a special marker indicating there are no more blocks.
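
The chained list in that example can be shown with a toy table. This is a sketch only: a real FAT is a packed array of 12-, 16-, or 32-bit entries, whereas here a dict is used and `None` stands in for the special end-of-chain marker.

```python
END_OF_CHAIN = None  # stand-in for the special "no more blocks" marker


def cluster_chain(fat, first_cluster):
    """Follow FAT slots from the directory entry's first cluster to the end."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster is not END_OF_CHAIN:
        if cluster in seen:
            raise ValueError("loop in FAT chain")  # guard against corruption
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # each slot holds the number of the next cluster
    return chain


# Slot 10 -> 11, slot 11 -> 15, slot 15 -> end marker, as in the example above.
toy_fat = {10: 11, 11: 15, 15: END_OF_CHAIN}
print(cluster_chain(toy_fat, 10))
```

Rewriting the file's data without changing this chain leaves the table untouched, which is why an in-place save of a same-sized file needs no FAT write.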

If the editor doesn't truncate the file when writing (NOTEPAD and Emacs don't), and the number of clusters needed doesn't change, then the actual FAT doesn't need to be updated.

So when you're editing, and your file didn't get smaller or bigger, everything is fine. And maybe when you did grow the file, it didn't work at first, and then it did, and you didn't think much of it. Maybe xemacs even does some kind of programmatic eject, though I'd think not.

Windows writes the directory info promptly. But the driver, fastfat.sys, doesn't flush the FAT entry changes promptly. I think I even found the statement where it checks whether it should do a flush, and it skips the check on FAT12. FAT12 doesn't have a "dirty" bit, which indicates whether a filesystem transaction is in progress. FAT16 and up do. (FAT16 and up also have duplicate FAT tables, for safety, and other enhancements.) FAT12 was originally for floppies, so maybe this all has to do with what was good for floppies. Or maybe it's just a bug.

@ladyada
Member

ladyada commented Jun 19, 2017

right! i forgot that the directory management is not in the FAT just the clusters :) yeah sounds like some of these editors can be confusing... we may want to add a FAQ to the circuitpython pages to indicate editors that we don't suggest because of this.

(ironically, i wrote a fat fs handler for PIC in 2004 and have clearly erased all of that knowledge from my brainstem :)

@willingc
Collaborator

we may want to add a FAQ to the circuitpython pages to indicate editors that we don't suggest because of this.

For Windows, it probably makes sense to suggest a few free (and easily installable) editors such as Visual Studio Code and Atom.

@dhalbert
Collaborator Author

I've done some more research on forcing Eject after write. I found a number of code examples. Uwe Sieber's EjectMedia.exe that @ladyada mentioned looks good (#111 (comment)), and he also has source code with detailed explanations available. I've looked at Notepad++, Visual Studio Code, Atom, and Mu and it looks like it would be relatively easy to write plugins for any of them (or for Mu, add integral code) that would invoke something like EjectMedia immediately after a write, by either running an external executable or invoking the Win32 code directly.

@dhalbert
Collaborator Author

dhalbert commented Jul 9, 2017

Random wacky idea: This is probably not worth the trouble, but I thought it was worth writing down.

I mentioned above that the determination of FAT12 vs FAT16 is based on the number of clusters. So below a certain number, the filesystem must be FAT12. This would seem to preclude formatting, say, a 2MB filesystem as FAT16.

However, I realized it may be possible to fake this. The FAT table has a special value to mark bad clusters that should not be used. For FAT16 it is 0xFFF7. So one could format a tiny filesystem as FAT16 by pretending it was larger, but marking all the clusters that are out of range as bad. So actually most of the FAT would be filled with bad cluster markers. The remainder of the FAT would work fine and be treated as FAT16 by Windows, etc. The FatFS code would probably have to be modified to do this.
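
A sketch of what such a padded table might look like, as plain Python rather than a FatFS patch. The function name is hypothetical, the two reserved entries are filled with placeholder values rather than what a real formatter would write, and the 128-cluster example size is arbitrary.

```python
FAT16_BAD_CLUSTER = 0xFFF7   # marker value mentioned above
FAT16_MIN_CLUSTERS = 4085    # smallest cluster count treated as FAT16
FREE = 0x0000
RESERVED_PLACEHOLDER = 0xFFFF  # placeholder for the two reserved entries


def build_padded_fat16(real_clusters, total_clusters=FAT16_MIN_CLUSTERS):
    """Return FAT16 entries with every cluster beyond real_clusters marked bad."""
    if total_clusters < FAT16_MIN_CLUSTERS:
        raise ValueError("too few clusters to be treated as FAT16")
    if real_clusters > total_clusters:
        raise ValueError("more real clusters than the table describes")
    # Entries 0 and 1 are reserved; data clusters are numbered from 2.
    fat = [FREE] * (total_clusters + 2)
    fat[0] = fat[1] = RESERVED_PLACEHOLDER
    for cluster in range(2 + real_clusters, total_clusters + 2):
        fat[cluster] = FAT16_BAD_CLUSTER
    return fat


table = build_padded_fat16(real_clusters=128)
print(len(table), table.count(FAT16_BAD_CLUSTER))
```

Most of the table ends up holding bad-cluster markers, as described above; only the first `real_clusters` data entries stay usable.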

@ladyada
Member

ladyada commented Jul 9, 2017

we have many options - i think right now i want to wait and see if we get this happening to other people (after we put in sufficient warnings not to reset w/o eject) - as its a bit hacky :)

@Janisku7

@dhalbert and I have reported it in the Windows Insider Feedback Hub, and if it gets more votes it will get fixed faster.

@tannewt tannewt modified the milestone: Long term Aug 1, 2017
@kwagyeman

kwagyeman commented Aug 4, 2017

Hi guys, this is Kwabena from OpenMV. We've been suffering from this issue with our micropython system and I stumbled across your thread. I think you've hit the exact problem. Thank you for your good notes on this. I'm going to modify OpenMV IDE to not truncate the file on saving to the disk.

Normally, folks use the system with an SD card. But, sometimes with the internal flash.

@kwagyeman

kwagyeman commented Aug 4, 2017

I found this thread on how to unmount a disk using windows:

https://support.microsoft.com/en-us/help/165721/how-to-ejecting-removable-media-in-windows-nt-windows-2000-windows-xp

The issue seems to be solved with OpenMV IDE using this to reset the OpenMV Cam.

And on Linux the syncfs() function can be used.

@dhalbert
Collaborator Author

dhalbert commented Aug 4, 2017

@kwagyeman Glad this was helpful. The exact issue in Windows is summarized in the superuser.com link above.

On Linux and Mac, writing the FAT on a FAT12 filesystem does not seem to be delayed, so you may not need the unmount there. It seems to happen within a couple of seconds.

@dhalbert
Collaborator Author

I'm going to close this issue for now, since we've documented the problem thoroughly in the Learn Guides and described and implemented mitigations such as editor plugins that force immediate writes. If/when there's some movement on the Windows side or we figure out an alternate filesystem that works on all the platforms, I'll reopen or create a new issue.

@kevinjwalters

In a classroom environment with CPX boards I've found this to be problematic. I'm seeing it take roughly 10-20 seconds to see a main.py has changed after an update via the cp command on Linux (Raspberry Pi). We've also seen corruption, and that was where the CPX was being bounced back and forth between a Windows 10 laptop and a Raspberry Pi, probably without ejects.

@dhalbert
Collaborator Author

On Linux, run sync after doing a cp (maybe make an alias to combine the two), or else use an editor that writes directly to the board and does a sync itself. This Learn Guide section describes which editors are good: https://learn.adafruit.com/welcome-to-circuitpython/creating-and-editing-code#1-use-an-editor-that-writes-out-the-file-completely-when-you-save-it
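
The "cp followed by sync" combination can also be sketched in Python, for example as a small deploy script. This is only an illustration: the helper name `copy_and_sync` and the example paths are hypothetical, and `os.sync()` is available on Unix-like systems only.

```python
import os
import shutil

def copy_and_sync(src, dst):
    """Copy `src` to `dst`, then flush pending writes (like `cp` then `sync`)."""
    shutil.copyfile(src, dst)
    # os.sync() asks the kernel to write out all buffered filesystem data,
    # which is what the `sync` command does.
    os.sync()
    return dst

# Example (hypothetical paths):
# copy_and_sync("main.py", "/media/pi/CIRCUITPY/main.py")
```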

@kevinjwalters

Is there a bug/RFE ticket from Adafruit with Microsoft? Given that this was noted in 2017, shouldn't they have fixed it by now?

@dhalbert
Collaborator Author

I tried very hard to get Microsoft's attention on this, including using Feedback Hub, and contacting several people who work there, asking them to pass it on. I didn't succeed, as far as I know. The behavior may have to do with not wanting to wear out floppy disks, or something like that.

@kevinjwalters

kevinjwalters commented Mar 14, 2019

I'd say floppy disks were even more at risk, because almost all drives had a physical eject, albeit with the churning noise and LED to help users trained in the art of waiting. Only those fancy Apple users had their software-controlled eject, plus the hole for the paper clip for when it all went wrong!

@kevinjwalters

kevinjwalters commented Apr 29, 2019

I noticed some MSFT folk talking about PyCon/Adafruit stuff and started this discussion (@qubitron and @zooba) to try and get a bit of momentum to get this bug fixed: https://twitter.com/kevinjwalters/status/1122527653960007680

The Microsoft feedback id is: 4257403a-0bc4-4d5d-8f36-9ba682d53a45

@oldav

oldav commented Jun 3, 2019

Hi, thank you for your perfect analysis. I have the same problem in a different context, and I spent days looking for a solution. I'm not sure I'll find one now, but I know what to try.
If my case can help motivate MS to fix this problem: we use a custom IoT device, which has 1 MB of flash and a USB port. The port is used to configure the device and to copy some data. People tend to forget to eject USB mass storage properly (because it is synced now), and it is difficult to explain that the filesystem will be corrupted because it is smaller than 16 MB... So, I agree that floppy disks don't exist anymore, but IoT is a new case MS should consider.

@dhalbert
Collaborator Author

dhalbert commented Jun 3, 2019

@oldav There is some news about this: we are now in contact with MS, and some people there interested in CircuitPython are trying to pursue it.

@kevinjwalters

There's a survey about Python use on microcontrollers mentioned at https://twitter.com/nnja/status/1140807884474732544. I've mentioned the FAT12 bug there. I'm not sure if it'll have any effect, but I'll try any angle on offer to get MSFT to fix this tedious problem.

"Hello there, we're a group of interns at Microsoft Vancouver, working in The Garage! We're looking for insights into your experience around physical computing projects. As well as your interest in trying out a new method of development for your project - say goodbye to waiting for your code to build and compile inside Arduino IDE with C/C++, Python is here to help! This survey will only take 3 - 5 minutes and your insights are very valuable to us!"

@tannewt
Member

tannewt commented Jun 20, 2019

@kevinjwalters No need to keep bugging Microsoft about the FAT12 bug. They've gotten the message and the wheels are turning. Just need to have patience now.

@kevinjwalters

@tannewt I didn't expect anything to actually happen here but it turns out it might be fixed: https://twitter.com/zooba/status/1188954487924260864
