Feather M4 express: bytes above BOOTPROT being zeroed #95
Hi @heymanrl, thank you for all the investigation you've done on this. We've seen the very first bit of flash memory being blanked before on the SAMD51, and set BOOTPROT to work around it. (This was when CircuitPython ran after the bootloader.) This is the first time I've heard of memory being blanked just after the protected area. My assumption was that a mix of software and hardware causes the issue. Perhaps the DMA controller is the issue, and it's only used in some sketches. Or maybe it's a bug in the USB peripheral. I doubt it's the software alone, because both the UF2 and CircuitPython code run on the SAMD21 without issue. The next step, in my mind, would be to replicate it with lower-level primitives and home in on the lowest-level register write that causes the issue. Reaching out to Microchip would also be good; maybe they've seen this issue too. Unfortunately, I don't think we at Adafruit have the cycles to debug this, though. It is rare enough to impact very few people, very sporadically. It sounds like you've got a very thorough testing setup for this. Let us know if we can do anything to support your investigation.
@heymanrl In https://forums.adafruit.com/viewtopic.php?f=57&t=158718&start=15#p784997, you mentioned … If you leave the OLED code in, but remove the OLED device, do you ever get this problem? (I assume the code fails early because it can't talk to the OLED.) I'll study the code later to see what might get it into a state where it wants to write.
I can try with the OLED device removed. I have repeatedly tested different BOOTPROT settings: with BOOTPROT set to zero, memory starting at location 0x0000 gets cleared; set to 16K, location 0x4000 gets cleared; set to 24K, 0x6000; set to 120K, 0x1E000. I can't trace into the bootloader code, but I assume that's because I need to create two projects in my AS7 (Atmel Studio 7) solution, one for the bootloader and the other for my code. After that, I should be able to track it down with the ICE, look at the call stack, and maybe even break on a data change at the suspect memory locations. If anyone can help me get the bootloader added to my AS7 solution, I think I can find the problem.
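In every case above, the cleared location is the first byte past the protected region, so its address is simply the protected size in bytes. A minimal C sketch of that arithmetic follows. The fuse encoding in `bootprot_fuse_to_kib()` (protected size = (15 − BOOTPROT) × 8 KiB) is an assumption based on my reading of the SAMD51 datasheet and should be checked against it; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: on the SAMD51 the 4-bit BOOTPROT fuse protects
 * (15 - BOOTPROT) * 8 KiB at the start of flash. Verify against the
 * datasheet before relying on this. */
uint32_t bootprot_fuse_to_kib(uint32_t fuse)
{
    assert(fuse <= 15u);
    return (15u - fuse) * 8u;
}

/* First flash address outside the protected region: 0x0000 when nothing
 * is protected, otherwise the protected size converted to bytes. */
uint32_t first_unprotected_addr(uint32_t protected_kib)
{
    return protected_kib * 1024u;
}
```

Under this assumed encoding, 16 KiB, 24 KiB and 120 KiB map to 0x4000, 0x6000 and 0x1E000, and the largest possible setting (fuse value 0) is exactly the 120K case reported above.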
@heymanrl I spent a little more time on this. Thank you for all your debugging. I have some questions and some speculations. As @tannewt mentioned, we have seen something similar before, here: adafruit/circuitpython#869. In that case BOOTPROT was not set at all. The first 8 bytes of flash were zeroed, and also at least the 512th byte (hard to tell, because it's surrounded by zeros). If these are indeed the same problem (though the number of bytes zeroed may be different), then this problem may not be caused by CircuitPython, as we originally thought. Your experimental results with different BOOTPROT values are very intriguing. The only code that deals with the BOOTPROT value is in …
Questions for you, if you are able to respond:
Thanks again for your perseverance in examining this. I cannot suggest how to get your code and the bootloader into a single AS7 solution: I'm not familiar with Atmel Studio. But if we could set a watchpoint on the smashed locations, we could then perhaps catch the offending write and narrow it down to the program or the bootloader code.
How many bytes do you see zeroed just above the BOOTPROT region?
If I understand correctly, this happens only with the program above, and you narrowed it down to a particular line in the program, as described in the forum thread. Using Blink or slight variations on your SSD1306 program causes the problem to disappear. Is that correct?
Yes. The submitted "my version 7" has some OLED display commands disabled and does not fail. The submitted "my version 3" has all OLED display commands active and fails.
How are you power-cycling automatically? Are you removing 5V power completely, and how are you doing that, in terms of hardware?
Yes, I cycle the power input to the 'Bat' pin. I use an IR4427 FET driver IC, which can source/sink around 1.5 A and accepts 3.3V TTL logic-level inputs (but can handle inputs up to Vsupply). I use +9 VDC as the supply voltage and connect one of the channel outputs through a 3.9V Zener (to drop the voltage to around 5 VDC) to the Feather M4 'Bat' input. I then drive the IR4427 with a 1 Hz, 50% duty cycle signal from a function generator, which cycles power 500 ms ON and 500 ms OFF.
a. toggling the EN pin on the Feather, which enables/disables the 3.3V regulator (which powers the OLED board as well), and
On setting a watchpoint on the smashed locations to catch the offending write: Exactly. That's the best plan going forward.
For reference, the "version 7" code that does not crash, copied from https://forums.adafruit.com/viewtopic.php?f=57&t=158718&start=15#p783840
Has there been any progress with this issue? I am seeing something very similar to heymanrl's problem with my Feather M4 Express. After a number of power cycles, the Feather 'forgets' its sketch and needs to be reflashed.
This has moved up on my priority list; I want to set up a test rig like heymanrl's.
Any updates on this? This issue is still massively affecting us.
No updates, but still high priority. Getting CircuitPython 5.0.0 done is the first priority. @PrinceAli321 Could you provide any other clues? Does it affect only certain programs? Are you using Arduino or CircuitPython? Do you have other devices connected, or can it happen on a bare board? Does it happen only on power cycle, or can it happen when the board is reset? Any simple programs you can supply that exhibit the behavior, together with what is connected to the board (if anything), will help in debugging this. Thanks.
@heymanrl I have started working on this. I am running
I am seeing 16 bytes zeroed at offset 0x4000 (16384) (just past the bootloader), and 8 more bytes zeroed at offset 0x4200 (16384+512). But that's just one sample of the failure. UPDATE:
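Counting zeroed bytes by eye in a hex dump is error-prone, especially next to data that is legitimately zero. The helper below is a hypothetical sketch (not part of any Adafruit tool) that compares a flash dump against the image originally written and reports each run of bytes that were nonzero in the image but read back as 0x00. It assumes both buffers start at the same flash address, e.g. 0x0000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of bytes that should be nonzero but read back as 0x00. */
typedef struct {
    size_t offset; /* offset from the start of the dump */
    size_t length; /* number of consecutive zeroed bytes */
} zero_run;

/* Record runs where expected[i] != 0 but dump[i] == 0. Returns the total
 * number of runs found; at most max_runs of them are stored in runs. */
size_t find_zeroed_runs(const uint8_t *expected, const uint8_t *dump,
                        size_t len, zero_run *runs, size_t max_runs)
{
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        if (expected[i] != 0 && dump[i] == 0) {
            size_t start = i;
            /* Extend the run while the corruption continues. */
            while (i < len && expected[i] != 0 && dump[i] == 0) {
                i++;
            }
            if (count < max_runs) {
                runs[count].offset = start;
                runs[count].length = i - start;
            }
            count++;
        } else {
            i++;
        }
    }
    return count;
}
```

One limitation is inherent: a byte that was already 0x00 in the image cannot be told apart from a zeroed one, which is why the extent of zeroing around the 512th byte was hard to judge in the earlier CircuitPython case.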
Pinging @PrinceAli321 again. Could you provide any other clues? Does it affect only certain programs? Are you using Arduino or CircuitPython? Do you have other devices connected, or can it happen on a bare board? Does it happen only on power cycle, or can it happen when the board is reset? Any simple programs you can supply that exhibit the behavior, together with what is connected to the board (if anything), will help in debugging this. Thanks.
@heymanrl I can get v3 to fail, usually after a few dozen to a few hundred power cycles. More interestingly, I tried your v7 and it also failed (after 157 cycles). That is not your experience, but it is telling: I was skeptical that the sketch had much to do with it, and this confirms it. I also instrumented the v3 sketch to catch any zeroing, and it failed without hitting any of my checks, which seems to confirm that it is a bootloader problem. I'm trying various ways of instrumenting the bootloader, but it's difficult due to the power cycling, which upsets the J-Link. Will continue. This running diary is to help me keep track of things as well.
Hi Dhalbert, things my sketch has in common with @heymanrl's are:
I've not seen this issue on any other Feather projects. I did notice that when power-cycling the Feather, the pic can be held up by the inputs from A3/A4, and when rebooted with a load on A3/A4, the Feather ended up in a similar corrupted state (this time the NEO light was stuck on white; otherwise, same symptoms). After that, I tried sequencing the power to ensure the inputs were low while the board started up, which seemed to cure the 'white light' problem, but after power-cycling over and over, I still see the red light of death. I've only ever noticed the problem on power cycles. Hope that helps; let me know if you want to know more.
@PrinceAli321 Thank you! Are you using any external boards such as a display? I'm trying to see if the display is the culprit, or it's pin reading. I have been making minor variations to @heymanrl's program with seemingly large effects (like skipping a pin read). |
Nope, not using any external boards/display. Just the 2 DACs and ADCs, really.
Hello,
@haukehaseler Thanks for your suggestions, which are very helpful. I am actively working on this, and have also been in touch with Microchip. I am looking at the brownout settings and the RCAUSE settings. @PrinceAli321 is not using any external boards but still has the problem, so it's not always extra capacitance, though that may increase the chances of a problem. I have been pruning down @heymanrl's test programs in various ways. Removing various uses of GPIO, DAC, or I2C seems to make the problem go away, which is very odd: if the problem were strictly in the bootloader, the user program should not matter. Power-down, not just power-up, may be part of the issue. Please feel free to continue to speculate.
Dear Dan,
first of all, thank you for spending so much effort on this problem. Here is what I can report up to now:
- I managed to trigger the fault by automatically power-cycling my board. The fault appears after a varying number of power cycles (between 100 and 1000). My program (C++, based on the Arduino framework) makes heavy use of I2C, SPI, GPIOs and timer interrupts.
- The one thing that works in my case ("works" meaning that the program memory fault did not appear after over 10000 power cycles) has to do with float initialisation. I corrected all instances of
float a = 1;
float b = 2.0;
to
float a = 1.0f;
float b = 2.0f;
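Some background on why these spellings differ: in C and C++, `1` is an `int` literal and `2.0` is a `double` literal; only the `f` suffix makes a `float` literal. The C11 sketch below names each literal's type with `_Generic` (the `TYPE_NAME` macro and `initializers_match` are illustrative names; Arduino code is C++, which lacks `_Generic` but has the same literal rules). The stored values are identical either way, since 1 and 2 are exactly representable, so for constant initializers like these the compiler normally folds the conversion at compile time. Where a `double` literal appears in a runtime expression (e.g. `a * 2.0`), the arithmetic is promoted to `double`, which the Cortex-M4F's single-precision FPU cannot do in hardware, so software routines are called instead. Whether the edit changes the generated code at all depends on the compiler and the surrounding expressions.

```c
#include <assert.h>
#include <string.h> /* strcmp, used when comparing the returned names */

/* Name the type of an expression (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    float: "float",                \
    double: "double",              \
    default: "other")

/* Both spellings of each initializer store the same float value,
 * because 1 and 2.0 are exactly representable as float. */
int initializers_match(void)
{
    float a_int = 1;    /* int literal, converted to float */
    float b_dbl = 2.0;  /* double literal, converted to float */
    float a_flt = 1.0f; /* float literal, no conversion needed */
    float b_flt = 2.0f;
    return a_int == a_flt && b_dbl == b_flt;
}
```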
You said I should feel free to speculate, so here come the speculations:
- The problem occurs at power-down, not at startup. It only occurs when a particular instruction is executed at the time of power loss. This would explain the seemingly random occurrence of the problem.
- This instruction has to do with the floating point unit. This would explain the effect I described above.
- Since flash memory is erased, and only the bootloader contains code to do that, the execution “jumps” into the bootloader by mistake. If this can happen at all, it sounds like a brown-out related problem.
- Maybe the FPU is affected by brown-outs first.
The things I will try to test next are:
- create a program that fails faster (more often, ideally always)
- enable the two brown-out-detectors
- disable the FPU
I hope this helps,
best regards,
Hauke
@haukehaseler Thank you. If you are willing to share your programs before and after the changes, that would be great. I will look at the assembly output to check the differences. I worked on this more over the weekend, and am going to start working with the brown-out detectors and their hysteresis settings, and with inserting a short delay on power-up in the bootloader to ensure the power is more stable. Seemingly random changes in the user program sometimes seem to affect the probability of failure. I have wondered if it has to do with certain instructions being on certain memory boundaries. I have one board cycling another. I toggle the EN line (3.3V regulator disable) on a Feather M4 with a short power-cycling program, which also monitors the state of a data pin on the Feather M4. The user program sets that pin high in setup(). The cycling program will stop when it detects that the pin is low, so it's easy to leave it running and then see exactly when it failed.
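A one-board-cycles-another rig boils down to a loop: cut power, wait, restore power, give the target time to start, then check whether its user program raised the monitored pin. The C sketch below is hypothetical throughout: `rig_hw`, its callbacks, and the simulated target stand in for the real EN-pin driver, delay, and pin read on the controlling board, and the 500 ms on/off timing mirrors heymanrl's rig. Abstracting the hardware this way lets the stop-at-first-failure logic be checked on a desktop before it goes onto a board.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hardware hooks: on a real rig these would drive the
 * target's EN pin, delay, and read the target's "alive" pin. */
typedef struct {
    void (*set_enable)(void *ctx, bool on);
    void (*delay_ms)(void *ctx, unsigned ms);
    bool (*read_alive)(void *ctx);
    void *ctx;
} rig_hw;

/* Power-cycle the target until its user program fails to raise the alive
 * pin, or until max_cycles is reached. Returns the 1-based number of the
 * failing cycle, or 0 if every cycle passed. */
unsigned run_power_cycles(const rig_hw *hw, unsigned max_cycles,
                          unsigned on_ms, unsigned off_ms)
{
    for (unsigned cycle = 1; cycle <= max_cycles; cycle++) {
        hw->set_enable(hw->ctx, false); /* regulator off: target unpowered */
        hw->delay_ms(hw->ctx, off_ms);
        hw->set_enable(hw->ctx, true);  /* regulator on: target boots */
        hw->delay_ms(hw->ctx, on_ms);   /* give setup() time to run */
        if (!hw->read_alive(hw->ctx)) {
            return cycle;               /* user code never signalled */
        }
    }
    return 0;
}

/* Simulated target for desk-checking: it boots on each rising EN edge and
 * stops raising the alive pin from boot number fail_at onward. */
typedef struct {
    bool enabled;
    unsigned boots;
    unsigned fail_at; /* 0 = never fails */
} sim_target;

static void sim_set_enable(void *ctx, bool on)
{
    sim_target *t = ctx;
    if (on && !t->enabled) {
        t->boots++;
    }
    t->enabled = on;
}

static void sim_delay_ms(void *ctx, unsigned ms)
{
    (void)ctx; /* no real time passes in the simulation */
    (void)ms;
}

static bool sim_read_alive(void *ctx)
{
    const sim_target *t = ctx;
    return t->enabled && (t->fail_at == 0 || t->boots < t->fail_at);
}
```

Returning the failing cycle number, rather than just stopping, records exactly when the failure occurred, which is the property that makes leaving such a rig unattended useful.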
Hi everyone,
That problem is so weird that I would expect to see something written about it in some µC errata note ;-) I suppose someone checked them just in case?
Just my 0.03$CAN (~0.02$US),
Martin
There's nothing in the errata like this, unfortunately. I opened a case with Microchip, and they said they'd only seen this on an M0+ on a board that didn't have a decoupling cap on VDDCORE. Our boards have plenty of decoupling caps, but not necessarily the exact ones in their sample schematics. I have ordered a SAME54 Xplained Pro Microchip dev board to see if the problem can be reproduced on that official board. The chip is essentially the same.
Well, thanks for looking into this.
As an owner of a Metro M0 Express that had random reboots, I now wonder whether it was only my bad newbie programming that caused them. Probably I thought I was a better programmer than I really was, and did things I didn't fully grasp ;-) At least I got a J-Link mini EDU to try to find out! I think I fixed it by better managing timing and buffers.
Anyway, good luck and success!
Martin
Hello everyone,
@haukehaseler Wow, thank you for finding that forum post! I had done a lot of searching but had not stumbled upon it. We do set BOD33 in CircuitPython, as you showed, but it is not set in the bootloader. We had also seen the spurious flash zeroing when CircuitPython was running, but those failures might have happened during the brief period when the bootloader runs, before CircuitPython starts. So we should set BOD33 as early as possible in the bootloader. This was one of the things I was going to try, and I'll now make it the highest priority.
Yes, I have now enabled the BOD in the bootloader, and yes, I took the relevant three lines of code from a CircuitPython GitHub page (laziness, thy name is programmer).
Some initial testing shows that raising the BOD33 level to 2.7V appears to fix the flash write problem! I don't have a test bootloader for you all yet, though, because this change, or perhaps some other changes I am also making, breaks double-tap. Stay tuned.
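The BOD33 threshold is programmed as a register field, not a voltage. Assuming the SAMD5x encoding threshold = 1.5 V + LEVEL × 6 mV (to be checked against the SUPC chapter of the datasheet), 2.7 V corresponds to LEVEL = (2.7 − 1.5) / 0.006 = 200. The helper names below are hypothetical, and the code only does the arithmetic; it does not touch any registers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SAMD5x BOD33 encoding: threshold = 1.5 V + LEVEL * 6 mV, with
 * LEVEL an 8-bit field. Millivolts keep everything in integers. */
#define BOD33_BASE_MV 1500u
#define BOD33_STEP_MV 6u

/* Smallest LEVEL whose threshold is at or above target_mv. */
uint32_t bod33_level_for_mv(uint32_t target_mv)
{
    assert(target_mv >= BOD33_BASE_MV);
    return (target_mv - BOD33_BASE_MV + BOD33_STEP_MV - 1u) / BOD33_STEP_MV;
}

/* Threshold in millivolts produced by a given LEVEL value. */
uint32_t bod33_mv_for_level(uint32_t level)
{
    assert(level <= 255u);
    return BOD33_BASE_MV + level * BOD33_STEP_MV;
}
```

Rounding the level up rather than down guarantees the programmed threshold is never below the requested voltage.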
Here is a test bootloader for all of you to try. It uses the BOD33 brownout detector circuitry to busy-wait until the voltage has stayed above 2.7V for at least 100 ms. Once that point is reached, reset-on-brownout below 2.7V is enabled. I've been running this with @heymanrl's v3.ino for over 18000 cycles without failure. Simply enabling reset-on-brownout immediately on startup doesn't work as well; it can cause multiple resets while powering up, which confuses the double-click detection software. Unfortunately, we also cannot set the fuses to enable the brownout detector automatically, because of a SAMD51 erratum: if BOD33 is enabled in the fuses, it can make it impossible to connect a debugger to the chip. Here's a bootloader updater to try for Feather M4. Unzip the file below to get a
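The power-up wait described above (stay put until the supply has been above 2.7V for at least 100 ms, then arm reset-on-brownout) behaves like a debounce over periodic BOD33 status samples. The sketch below is a host-side model, not the bootloader's code: on the chip each sample would come from the BOD33 detection status in the SUPC peripheral, while here the sample array, the 10 ms sample period in the usage, and the function name are assumptions for illustration. Restarting the window on any dip is what avoids the repeated resets during power-up that confused double-click detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the power-up wait. below_threshold[i] is the BOD33 "supply is
 * below threshold" status at sample i, taken every sample_ms. Returns the
 * sample index at which the supply has been continuously above threshold
 * for stable_ms (where reset-on-brownout would be armed), or -1 if that
 * never happens within n_samples. */
long wait_for_stable_supply(const bool *below_threshold, size_t n_samples,
                            unsigned sample_ms, unsigned stable_ms)
{
    unsigned good_ms = 0;
    for (size_t i = 0; i < n_samples; i++) {
        if (below_threshold[i]) {
            good_ms = 0; /* any dip restarts the stability window */
        } else {
            good_ms += sample_ms;
            if (good_ms >= stable_ms) {
                return (long)i;
            }
        }
    }
    return -1;
}
```

For example, with 10 ms samples and a brief dip 80 ms into power-up, the 100 ms window starts over after the dip instead of arming the reset while the supply is still settling.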
Complete details provided in this post:
https://forums.adafruit.com/viewtopic.php?f=57&t=158718
Feather M4 Express does not launch user code because memory locations at the beginning of unprotected FLASH memory are being cleared (set to 0x0) during power-on.
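Why a few zeroed bytes are enough to stop the sketch from launching: on Cortex-M parts, the application's vector table sits at its start address, and its second word is the reset handler address. Bootloaders typically sanity-check that word before jumping to the application. The C sketch below illustrates a check in that style, with hypothetical constants for a Feather M4-like layout (16 KiB bootloader, 512 KiB flash); it is not copied from the Adafruit bootloader, whose exact check may differ. With 16 bytes zeroed at 0x4000, both the initial stack pointer and the reset vector read as 0, so a check like this would refuse to start the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: application right after a 16 KiB bootloader,
 * 512 KiB of flash in total. */
#define APP_START_ADDRESS 0x4000u
#define FLASH_SIZE        0x80000u

/* Word 0 of a Cortex-M vector table is the initial stack pointer and
 * word 1 is the reset handler address. Only treat the application as
 * launchable if its reset handler points into application flash. */
bool app_looks_valid(const uint32_t *app_vector_table)
{
    uint32_t reset_handler = app_vector_table[1];
    return reset_handler >= APP_START_ADDRESS && reset_handler < FLASH_SIZE;
}
```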
Arduino IDE v1.8.10
Adafruit Feather M4 Express; part #3857
Adafruit FeatherWing OLED 128x32; part #2900
Adafruit bootloader v3.7.0
Adafruit_SSD1306 library v2.0.2
No additional hardware is required. Just stack a Feather and OLED, install 'my version 3', then cycle power on/off. The failure usually occurs within 100 on/off power cycles, but it is random and may require thousands of power cycles to observe. I automated cycling power at 500 ms ON, 500 ms OFF.
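Because the failure is this sporadic, a long failure-free run is the only practical evidence that a fix works. A rough way to judge how long is long enough relies on an idealized assumption: each power cycle fails independently with a fixed probability p. Under that assumption, n clean cycles give an approximate 95% upper bound of 3/n on p (the "rule of three"). For example, the 18000 clean cycles reported with the BOD33 test bootloader bound p at about 1 in 6000, far below a rate that typically produces a failure within 100 cycles. The function names below are illustrative.

```c
#include <assert.h>

/* Probability that n consecutive power cycles all pass, assuming each
 * cycle fails independently with probability p (an idealization). */
double prob_all_pass(double p, unsigned n)
{
    double q = 1.0;
    for (unsigned i = 0; i < n; i++) {
        q *= 1.0 - p;
    }
    return q;
}

/* "Rule of three": after n passes and zero failures, 3/n is an
 * approximate 95% upper confidence bound on the per-cycle failure rate. */
double rule_of_three_bound(unsigned n)
{
    assert(n > 0);
    return 3.0 / (double)n;
}
```

At p = 3/n, the chance of n straight passes is about e⁻³ ≈ 0.05, which is where the 95% figure comes from.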
'my version 3' application code provided below
[EDIT: turned into an attachment for easier downloading -- @dhalbert]
v3.ino.txt