No bootloader mode, no usable CircuitPython, safe-mode due to 'memory access or instruction error' on Metro M4 Airlift (possibly broken memory hardware) #8487

@CrsiX

CircuitPython version

Adafruit CircuitPython 8.2.6 on 2023-09-12; Adafruit Metro M4 Airlift Lite with samd51j19

Code/REPL

# not available, see below

Behavior

TL;DR: my best guess at the moment is broken memory hardware, but if that's not the case (the board is not very old), it's really weird behavior. Read on below for many more details and the whole story.

It boils down to a few things:

  • Getting into bootloader mode by double-pressing RESET or using microcontroller.on_next_reset doesn't work, so easy re-flashing is not possible. The board either shows a constant red NeoPixel (indicating failure, I assume) with all four board LEDs on (case 1), or does nothing and falls back to "normal" safe mode (case 2), as explained below.
  • Normal CircuitPython usability is extremely limited (safe-mode)
    • either because the CIRCUITPY drive can't be found (lsblk doesn't show any block storage device for it either); in that case storage.erase_filesystem sometimes fixed it
    • or, if the drive is found, because it presents me with: CircuitPython core code crashed hard. Whoops! Hard fault: memory access or instruction error.
      • however, the drive is not usable: it can sometimes be mounted, but any operation will fail and make the board reset after some seconds (I tried e.g. mount /dev/sdb1 /mnt -o ro && umount /mnt -> mounting works, but unmounting directly afterwards already fails)
    • I was sometimes able to fix the behavior by running mkfs.fat manually on the partition, but after a reset of the board, I am back at CircuitPython core code crashed hard. Whoops! Hard fault: memory access or instruction error.
  • The board sometimes doesn't show up at all. Unplugging and plugging it in again sometimes makes it work; using a different USB port on my host sometimes makes it work. I keep dmesg -w open all the time, and in these cases not a single line of logs appears. If the device is recognized, then I have the other problems described in the points above.
  • Often, it will also get stuck at some point. I can see the TX LED being constantly on (so I assume something is transmitting a lot over the serial console?), but nothing shows up in the serial console and the console gets stuck as well (I can't input anything). Then, after some seconds, I see a lot of failure logs in my dmesg.

Note, however, that when the problem first occurred, the first 512 bytes of the file code.py were the byte 0xff (confirmed with hexdump), due to an I/O error when writing to the flash.
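The 0xff observation can also be checked from the host without hexdump. Erased NOR flash typically reads back as 0xff, so a run of that byte at the start of a file hints at a sector that was erased but never programmed. This is a minimal sketch, not part of the original report; the function name and the example path are hypothetical:

```python
def leading_erased_bytes(path, erased=0xFF, limit=4096):
    """Count how many bytes at the start of `path` equal the erased-flash value.

    `erased` and `limit` are illustrative defaults, not values taken from
    CircuitPython itself.
    """
    with open(path, "rb") as f:
        data = f.read(limit)
    count = 0
    for byte in data:
        if byte != erased:
            break
        count += 1
    return count
```

For a file like the one described above, a call such as leading_erased_bytes("/mnt/code.py") (assuming the drive is mounted read-only at /mnt) would be expected to return 512.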

The message about the memory access or instruction error actually suggests opening an issue; otherwise I would have just ignored it (and used a new board). But maybe some folks here have good clues and could help me fix it:

You are in safe mode because:
CircuitPython core code crashed hard. Whoops!
Hard fault: memory access or instruction error.
Please file an issue with your program at github.com/adafruit/circuitpython/issues.
Press reset to exit safe mode.

Description

No response

Additional information

dmesg logs when the board boots directly into safe mode with a CIRCUITPY drive present. I then use the REPL for a bit until, at timestamp 64438, the device resets and the REPL gets stuck:

[64370.299004] usb 2-1: new full-speed USB device number 35 using xhci_hcd
[64370.673718] usb 2-1: New USB device found, idVendor=239a, idProduct=8038, bcdDevice= 1.00
[64370.673776] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[64370.673819] usb 2-1: Product: Metro M4 Airlift Lite
[64370.673863] usb 2-1: Manufacturer: Adafruit Industries LLC
[64370.673930] usb 2-1: SerialNumber: (RETRACTED)
[64370.692531] cdc_acm 2-1:1.0: ttyACM0: USB ACM device
[64370.698486] usb-storage 2-1:1.2: USB Mass Storage device detected
[64370.701216] scsi host3: usb-storage 2-1:1.2
[64370.715058] input: Adafruit Industries LLC Metro M4 Airlift Lite Keyboard as /devices/pci0000:00/0000:00:06.0/usb2/2-1/2-1:1.3/0003:239A:8038.000A/input/input16
[64370.767341] input: Adafruit Industries LLC Metro M4 Airlift Lite Mouse as /devices/pci0000:00/0000:00:06.0/usb2/2-1/2-1:1.3/0003:239A:8038.000A/input/input17
[64370.768365] hid-generic 0003:239A:8038.000A: input,hidraw1: USB HID v1.11 Keyboard [Adafruit Industries LLC Metro M4 Airlift Lite] on usb-0000:00:06.0-1/input3
[64371.725381] scsi 3:0:0:0: Direct-Access     Adafruit Metro M4 Airlift 1.0  PQ: 0 ANSI: 2
[64371.727857] sd 3:0:0:0: Attached scsi generic sg0 type 0
[64371.728384] sd 3:0:0:0: [sdb] 4089 512-byte logical blocks: (2.09 MB/2.00 MiB)
[64371.730074] sd 3:0:0:0: [sdb] Write Protect is off
[64371.730124] sd 3:0:0:0: [sdb] Mode Sense: 03 00 00 00
[64371.732648] sd 3:0:0:0: [sdb] No Caching mode page found
[64371.732689] sd 3:0:0:0: [sdb] Assuming drive cache: write through
[64371.743685]  sdb: sdb1
[64371.744595] sd 3:0:0:0: [sdb] Attached SCSI removable disk
[64438.752633] usb 2-1: reset full-speed USB device number 35 using xhci_hcd
[64454.587060] usb 2-1: device descriptor read/64, error -110
[64470.460120] usb 2-1: device descriptor read/64, error -110
[64470.739071] usb 2-1: reset full-speed USB device number 35 using xhci_hcd
[64486.332090] usb 2-1: device descriptor read/64, error -110
[64502.203089] usb 2-1: device descriptor read/64, error -110
[64502.483074] usb 2-1: reset full-speed USB device number 35 using xhci_hcd
[64512.112261] usb 2-1: Device not responding to setup address.
[64521.952427] usb 2-1: Device not responding to setup address.
[64522.154971] usb 2-1: device not accepting address 35, error -71
[64522.330956] usb 2-1: reset full-speed USB device number 35 using xhci_hcd
[64531.961543] usb 2-1: Device not responding to setup address.
[64541.809557] usb 2-1: Device not responding to setup address.
[64542.034952] usb 2-1: device not accepting address 35, error -71
[64542.035098] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=133s
[64542.035166] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 21 00 00 08 00
[64542.035208] I/O error, dev sdb, sector 33 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[64542.035341] sd 3:0:0:0: [sdb] tag#0 timing out command, waited 60s
[64542.037941] usb 2-1: USB disconnect, device number 35
[64542.043947] Buffer I/O error on dev sdb1, logical block 4, async page read
[64542.242939] usb 2-1: new full-speed USB device number 36 using xhci_hcd
[64558.011100] usb 2-1: device descriptor read/64, error -110
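The negative numbers in these USB messages are Linux errno values. As a reading aid (not something from the original report), the sketch below tallies them from dmesg text. The mapping is hardcoded from the Linux errno headers rather than taken from Python's errno module, since that module reflects the platform the script runs on; the function and variable names are illustrative:

```python
import re

# Linux errno values that appear in the logs of this report.
USB_ERRNO = {
    110: "ETIMEDOUT",  # connection timed out
    71: "EPROTO",      # protocol error
    62: "ETIME",       # timer expired
}

# Matches "error -110" style suffixes; "I/O error, dev sdb" does not match.
ERROR_RE = re.compile(r"error -(\d+)")

def summarize_usb_errors(dmesg_text):
    """Return {errno_name: count} for every 'error -N' found in dmesg output."""
    counts = {}
    for match in ERROR_RE.finditer(dmesg_text):
        code = int(match.group(1))
        name = USB_ERRNO.get(code, "UNKNOWN(%d)" % code)
        counts[name] = counts.get(name, 0) + 1
    return counts

sample = """\
[64454.587060] usb 2-1: device descriptor read/64, error -110
[64522.154971] usb 2-1: device not accepting address 35, error -71
"""
print(summarize_usb_errors(sample))  # {'ETIMEDOUT': 1, 'EPROTO': 1}
```

Read this way, the log shows the host repeatedly timing out while waiting for the device to answer basic enumeration requests, which fits a firmware that has stopped servicing USB.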

Unplugging the board is the only way to fix this (sometimes); the RESET button doesn't work at all in those cases.

Also, when the drive is present and I mount it and do anything with it (except a simple ls, which works), I get the following dmesg logs (I cut the lines that duplicate the log above):

[ ... ]
[142930.392228] sd 5:0:0:0: [sdd] Attached SCSI removable disk
[143338.141609] FAT-fs (sdd1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[143419.796374] usb 4-11: reset full-speed USB device number 14 using xhci_hcd
[143435.645074] usb 4-11: device descriptor read/64, error -110
[143451.517051] usb 4-11: device descriptor read/64, error -110
[143451.733056] usb 4-11: reset full-speed USB device number 14 using xhci_hcd
[143467.389108] usb 4-11: device descriptor read/64, error -110
[143483.261000] usb 4-11: device descriptor read/64, error -110
[143483.477009] usb 4-11: reset full-speed USB device number 14 using xhci_hcd
[143488.779102] xhci_hcd 0000:00:07.0: Timeout while waiting for setup device command
[143494.411185] xhci_hcd 0000:00:07.0: Timeout while waiting for setup device command
[143494.618985] usb 4-11: device not accepting address 14, error -62
[143494.733095] usb 4-11: reset full-speed USB device number 14 using xhci_hcd
[143500.043185] xhci_hcd 0000:00:07.0: Timeout while waiting for setup device command
[143505.675029] xhci_hcd 0000:00:07.0: Timeout while waiting for setup device command
[143505.882962] usb 4-11: device not accepting address 14, error -62
[143505.885750] usb 4-11: USB disconnect, device number 14
[143505.886211] sd 5:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[143505.886271] sd 5:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 03 00 00 05 00
[143505.886326] I/O error, dev sdd, sector 3 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 2
[143505.891942] device offline error, dev sdd, sector 3 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[143505.892019] FAT-fs (sdd1): FAT read failed (blocknr 2)
[143505.900556] sdd: detected capacity change from 4089 to 0
[143506.043941] usb 4-11: new full-speed USB device number 15 using xhci_hcd
[143521.661056] usb 4-11: device descriptor read/64, error -110
[143537.533073] usb 4-11: device descriptor read/64, error -110
[143537.748971] usb 4-11: new full-speed USB device number 16 using xhci_hcd
[143553.405046] usb 4-11: device descriptor read/64, error -110
[143569.276993] usb 4-11: device descriptor read/64, error -110
[143569.379125] usb usb4-port11: attempt power cycle
[143569.758957] usb 4-11: new full-speed USB device number 17 using xhci_hcd
[143574.795077] xhci_hcd 0000:00:07.0: Timeout while waiting for setup device command
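The CDB lines in both logs are SCSI READ(10) commands, and decoding them shows which blocks the host was reading when the device stopped responding. This is a small sketch based on the standard READ(10) layout (opcode 0x28, a 4-byte big-endian logical block address at bytes 2 to 5, a 2-byte transfer length at bytes 7 to 8); the function name is illustrative:

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (logical_block_address, number_of_blocks).
    """
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 10 or raw[0] != 0x28:
        raise ValueError("not a READ(10) command descriptor block")
    lba = int.from_bytes(raw[2:6], "big")     # bytes 2..5: starting block
    blocks = int.from_bytes(raw[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks

print(decode_read10("28 00 00 00 00 21 00 00 08 00"))  # (33, 8)
print(decode_read10("28 00 00 00 00 03 00 00 05 00"))  # (3, 5)
```

The decoded block addresses (33 and 3) match the sector numbers in the adjacent "I/O error" lines, so the failing commands are plain reads near the start of the 4089-block volume.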

Note that fsck didn't work either, since it writes to the device. Reading from the drive actually works; only writing fails. That's why I strongly suspect that the flash is broken: any write operation gets completely stuck.

Also, I have a J-Link mini now, so I would be happy to try fixing the board with it; I have just never used one before.
