Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nrf52840 feather: crash on file write #2338

Closed
jepler opened this issue Nov 29, 2019 · 2 comments · Fixed by #2369
Closed

nrf52840 feather: crash on file write #2338

jepler opened this issue Nov 29, 2019 · 2 comments · Fixed by #2369

Comments

@jepler
Copy link
Member

jepler commented Nov 29, 2019

When writing to CIRCUITPYTHON, especially with dd, a reset-to-safe-mode event can occur. When it does, the stack looks like this:

Breakpoint 1, reset_into_safe_mode (reason=reason@entry=GC_ALLOC_OUTSIDE_VM)
    at ../../supervisor/shared/safe_mode.c:84
84	void __attribute__((noinline,)) reset_into_safe_mode(safe_mode_t reason) {
(gdb) where
#0  reset_into_safe_mode (reason=reason@entry=GC_ALLOC_OUTSIDE_VM)
    at ../../supervisor/shared/safe_mode.c:84
#1  0x00027e26 in gc_alloc (n_bytes=n_bytes@entry=64, 
    has_finaliser=has_finaliser@entry=false, long_lived=<optimized out>)
    at ../../py/gc.c:479
#2  0x000274ec in m_malloc_maybe (num_bytes=num_bytes@entry=64, 
    long_lived=long_lived@entry=false) at ../../py/malloc.c:95
#3  0x000482ae in allocate_ram_cache ()
    at ../../supervisor/shared/external_flash/external_flash.c:327
#4  0x0004861c in external_flash_write_block (
    data=data@entry=0x200047e4 <_mscd_buf> "", block=block@entry=1545)
    at ../../supervisor/shared/external_flash/external_flash.c:520
#5  0x000486c6 in supervisor_flash_write_blocks (
    src=0x200047e4 <_mscd_buf> "", block_num=1545, num_blocks=8)
    at ../../supervisor/shared/external_flash/external_flash.c:554
#6  0x00043f84 in flash_write_blocks (src=<optimized out>, 
    block_num=<optimized out>, num_blocks=<optimized out>)
    at ../../supervisor/shared/flash.c:121
#7  0x00042ef0 in disk_write (pdrv=pdrv@entry=0x200044e4 <_internal_vfs>, 
    buff=buff@entry=0x200047e4 <_mscd_buf> "", sector=sector@entry=1546, 
    count=count@entry=8) at ../../extmod/vfs_fat_diskio.c:120
#8  0x0004b116 in tud_msc_write10_cb (lun=<optimized out>, lba=1546, 
    offset=<optimized out>, buffer=buffer@entry=0x200047e4 <_mscd_buf> "", 
    bufsize=bufsize@entry=4096)
    at ../../supervisor/shared/usb/usb_msc_flash.c:140
#9  0x00049e8c in mscd_xfer_cb (rhport=<optimized out>, 
    ep_addr=<optimized out>, event=<optimized out>, xferred_bytes=4096)
    at ../../lib/tinyusb/src/class/msc/msc_device.c:474
#10 0x00049564 in tud_task () at ../../lib/tinyusb/src/device/usbd.c:330
#11 0x0004aff6 in usb_background () at ../../supervisor/shared/usb/usb.c:79
#12 0x0004b448 in run_background_tasks () at background.c:61
#13 0x0005b14e in common_hal_neopixel_write (
    digitalinout=digitalinout@entry=0x20004774 <status_neopixel>, 
    pixels=pixels@entry=0x2000477c <status_neopixel_color> "\v", 
    numBytes=numBytes@entry=3) at common-hal/neopixel_write/__init__.c:203
#14 0x000440d4 in clear_temp_status ()
    at ../../supervisor/shared/rgb_led_status.c:300
#15 0x00048246 in spi_flash_flush_keep_cache (
    keep_cache=keep_cache@entry=false)
    at ../../supervisor/shared/external_flash/external_flash.c:444
#16 0x000484b8 in supervisor_flash_release_cache ()
    at ../../supervisor/shared/external_flash/external_flash.c:455
#17 0x00043dda in filesystem_flush ()
    at ../../supervisor/shared/filesystem.c:127
#18 0x000433e2 in start_mp (heap=heap@entry=0x200063c0 <allocations+16>)
    at ../../main.c:116
#19 0x00043596 in run_code_py (safe_mode=safe_mode@entry=NO_SAFE_MODE)
    at ../../main.c:240
#20 0x00043892 in main () at ../../main.c:464

How to reproduce: **with the board sitting at Press any key to enter the REPL. Use CTRL-D to reload, bulk copy some data to flash: dd if=/dev/zero bs=512 count=200 of=zero

The status LED will change to yellow and USB connectivity will stop until you reset the board.

@jepler
Copy link
Member Author

jepler commented Nov 29, 2019

@tannewt I'm happy to work on this but I'll need a little guidance on what angle you'd like me to look at it from. I viewed it as a problem with being able to reenter the filesystem code at a time when it's apparently not fully initialized(?) and tried to address it in #2319 but I totally get your urge to do more analysis before adding a band-aid fix.

@tannewt
Copy link
Member

tannewt commented Dec 3, 2019

This is a really interesting backtrace because it is a mass storage write (mscd_xfer_cb) occurring while we are flushing the filesystem (filesystem_flush) so that we can free the supervisor allocated cache in favor of a mp heap allocated one.

On SAMD we don't have this issue because neopixel_write doesn't call RUN_BACKGROUND_TASKS and therefore the USB processing. Removing background tasks from neopixel_write is probably the simplest fix but may cause audio playback issues during long neopixel writes which SAMD will already have.

Another solution I can think of is to add a lock around the flash cache that allows you to "flush, release cache, create heap" atomically so that there is always a place to put the cache.

A simpler form of this would be to never release a supervisor allocated cache once it is created and just place the heap after it. This means we'd take 4k up forever more even off usb but that is likely ok. The only case we'd need to worry about then is freeing a VM allocated cache in favor of a supervisor allocated one.

Those are some options. Let me know if they make sense. Thanks!

jepler added a commit to jepler/circuitpython that referenced this issue Dec 10, 2019
It's extremely dubious that we have these handles that we think
are to GC'd memory at a time when the gc pool may not be initialized.
Hopefully, they WERE valid GC memory and are undisturbed by the teardown
of the interpreter that can lead to this state.

In this case, don't try to m_free them, the memory will become free when
the GC heap is reinitialized.

Closes: adafruit#2338 (together with previous commit)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants