MatrixPortal hangs and disconnects from USB after reading URL multiple times #6205
Comments
Slightly more minimal example:
|
Can confirm this. |
Same issue. This code will not run properly without access to my Spotify api. |
Same issue. I tried resetting the network connection every couple of minutes as well, but the issue still persists. Running the latest 7.2.4 UF2. |
It's not a fix, but an option you have is to try the hardware watchdog, which would reset the whole microcontroller (the one where CircuitPython resides) if your program stops running normally. Before doing this, be comfortable with entering safe mode on your board, because an errant watchdog reset can make it difficult to modify your code.py. You'd initialize the watchdog with a timeout:
```python
from microcontroller import watchdog as w
from watchdog import WatchDogMode

w.timeout = 2.5  # timeout in seconds
w.mode = WatchDogMode.RESET
w.feed()
```
and within your top-level loop you'd keep the watchdog happy with w.feed(). |
I battled this as well and worked with staff to figure out the issue. We tried replacing hardware too which helped failure rates but it still fails. Trying the watchdog thing now and it seems to be stable so far, I just started a rather long print so we'll see if it's still running in the morning. |
For those of you who had this problem, what version of the NINA-FW firmware are you using on the ESP32 that's on the MatrixPortal? That is printed out by the simple Internet test given here: https://learn.adafruit.com/adafruit-matrixportal-m4/internet-connect. The key value is |
Firmware vers. bytearray(b'1.2.2\x00') |
Firmware vers. bytearray(b'1.2.2\x00') |
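The version is reported as a NUL-terminated byte buffer, which is awkward to compare against a target release such as 1.7.4. A minimal sketch of turning it into a comparable tuple follows; `parse_firmware_version` is a hypothetical helper name, not part of any Adafruit library, and it assumes the buffer holds only dotted integers.

```python
def parse_firmware_version(raw):
    # raw is a buffer like the one printed above: dotted digits plus a trailing NUL.
    text = bytes(raw).decode("utf-8").rstrip("\x00")
    return tuple(int(part) for part in text.split("."))


reported = parse_firmware_version(bytearray(b"1.2.2\x00"))
print(reported)              # (1, 2, 2)
print(reported < (1, 7, 4))  # True: older than the 1.7.4 release mentioned in this thread
```

On the board, the argument would be `esp.firmware_version` from the Internet test program.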
Same here:
|
I tried using the watchdog time out as suggested above. Just had it crash and reset automatically. Thanks for the suggestion @jepler |
I had to increase the timeout to 15. Also make sure you have the feed call both in the setup and in your loop; otherwise the watchdog is not fed soon enough for the program to enter the loop. And if one pass through your loop takes longer than the timeout, you'll have to feed the watchdog somehow during the pass or shorten your loop sleep. |
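One way to handle a loop sleep that is longer than the watchdog timeout is to break the sleep into short slices and feed between them. The sketch below assumes only that the watchdog object has a `feed()` method, as `microcontroller.watchdog` does; `sleep_and_feed` and `FakeWatchdog` are hypothetical names, and the fake exists only so the example runs off the board.

```python
import time


def sleep_and_feed(watchdog, total_seconds, slice_seconds=1.0, sleep=time.sleep):
    # Sleep for total_seconds in slices, feeding the watchdog before each slice.
    remaining = total_seconds
    while remaining > 0:
        watchdog.feed()
        step = min(slice_seconds, remaining)
        sleep(step)
        remaining -= step
    watchdog.feed()  # feed once more before returning to the caller's loop


class FakeWatchdog:
    # Desktop stand-in that just counts feeds (an assumption for illustration).
    def __init__(self):
        self.feeds = 0

    def feed(self):
        self.feeds += 1


fake = FakeWatchdog()
sleep_and_feed(fake, 3.5, slice_seconds=1.0, sleep=lambda seconds: None)
print(fake.feeds)  # 5
```

On the board this would be called as `sleep_and_feed(w, 30)` after `from microcontroller import watchdog as w`. It does not help with a single blocking call such as a fetch, so the timeout still has to exceed the longest fetch you are willing to tolerate.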
It's worth testing whether NINA v1.7.4 improves things, though I couldn't definitively tie any NINA fixes to this problem. It seems odd to me that this is only reported on MatrixPortal and not other M4+Airlift configs like PyPortal. |
On this board the 'effective' watchdog timeout values are slightly above 1, 2, 4, 8, and 16 seconds; values are rounded up. |
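The rounding described above can be sketched as a lookup. The step list is taken from the comment; the real values are slightly higher than these whole numbers, and raising an error above 16 seconds is an assumption of this sketch, not documented board behaviour. `effective_timeout` is a hypothetical helper.

```python
# Approximate effective steps in seconds, per the comment above.
EFFECTIVE_STEPS = (1, 2, 4, 8, 16)


def effective_timeout(requested):
    # Return the smallest step that is at least the requested timeout.
    for step in EFFECTIVE_STEPS:
        if requested <= step:
            return step
    raise ValueError("requested timeout is larger than the largest step")


print(effective_timeout(2.5))  # 4
print(effective_timeout(15))   # 16
```

So a requested timeout of 2.5 behaves roughly like 4 seconds, and the 15 mentioned earlier behaves roughly like 16.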
OK, I have managed to get an exception on MatrixPortal, using CircuitPython 7.2.5 and running NINA-FW 7.1.1, after 1766 seconds doing simple fetches from a local webserver, using the example above: #6205 (comment).
There were upstream fixes in NINA-FW 1.7.2, so I will upgrade to 1.7.4 and see if I can reproduce. |
I don't think this is the same issue. I've seen various exceptions from the lower-level libraries, including this one. The issue I'm reporting is that it hangs - there's no exception raised. |
I upgraded to 1.7.4, and I am also now seeing hangs, sometimes after a couple of minutes, sometimes much longer. I don't think this is necessarily due to the upgrade; it's just that I happened to see a different error before. No panel is attached to the MatrixPortal, so that's not a factor. So it's disappointing that the upgrade is not helping, but I have some leads. |
Same for me, it hangs and, importantly, the USB device is unmounted. |
I haven't cleanly reproduced this yet, but using CP 7.2.5 and NINA 1.6.1, I notice that occasionally it takes over a minute for "Retrieving data..." to complete (usually more like 1.5 seconds), and if I try to re-save update: I hit a case where it's stuck at "Retrieving data..." for about 15 minutes so far, but it hasn't disconnected from serial nor ejected CIRCUITPY, but control-C has no effect. |
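To catch the slow fetches described above (over a minute instead of about 1.5 seconds), each fetch can be timed and the outliers logged. This is a minimal sketch: `timed_call` and its `slow_after` threshold are hypothetical, and it only reports calls that eventually return, so it cannot detect a true hang.

```python
import time


def timed_call(fetch, slow_after=10.0, clock=time.monotonic):
    # Run fetch(), measure how long it took, and report unusually slow calls.
    start = clock()
    result = fetch()
    elapsed = clock() - start
    if elapsed > slow_after:
        print("slow fetch:", elapsed, "seconds")
    return result, elapsed


# Desktop demonstration with a fake clock standing in for a 65-second fetch.
ticks = iter([0.0, 65.0])
result, elapsed = timed_call(lambda: "ok", clock=lambda: next(ticks))
```

On the board, the call would look like `timed_call(lambda: requests.get(URL))`.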
I have been testing a simpler web fetcher that does not use the MatrixPortal/PortalBase library. It's been running for hours without crashing. It is like #6205 (comment) above, but without the library. For both I have been fetching from a simple local webserver:
```python
import board
import busio
from digitalio import DigitalInOut
import adafruit_requests as requests
import adafruit_esp32spi.adafruit_esp32spi_socket as socket
from adafruit_esp32spi import adafruit_esp32spi
import time

try:
    from secrets import secrets
except ImportError:
    print("WiFi secrets are kept in secrets.py, please add them there!")
    raise

# Set up the ESP32 AirLift co-processor over SPI.
esp32_cs = DigitalInOut(board.ESP_CS)
esp32_ready = DigitalInOut(board.ESP_BUSY)
esp32_reset = DigitalInOut(board.ESP_RESET)
spi = busio.SPI(board.SCK, board.MOSI, board.MISO)
esp = adafruit_esp32spi.ESP_SPIcontrol(spi, esp32_cs, esp32_ready, esp32_reset)
requests.set_socket(socket, esp)

if esp.status == adafruit_esp32spi.WL_IDLE_STATUS:
    print("ESP32 found and in idle mode")
print("Firmware vers.", esp.firmware_version)
print("MAC addr:", [hex(i) for i in esp.MAC_address])

# Keep retrying until the access point accepts the connection.
print("Connecting to AP...")
while not esp.is_connected:
    try:
        esp.connect_AP(secrets["ssid"], secrets["password"])
    except RuntimeError as e:
        print("could not connect to AP, retrying: ", e)
        continue
print("Connected to", str(esp.ssid, "utf-8"), "\tRSSI:", esp.rssi)
print("My IP address is", esp.pretty_ip(esp.ip_address))

URL = "http://192.168.1.222:8000"
secs = 0
# Fetch once per second, forever, printing a counter around each fetch.
while True:
    print(secs, "Fetch start ", end="")
    requests.get(URL)
    print("Fetch end")
    time.sleep(1)
    secs += 1
```
|
fwiw, ESP32SPI, on all-in-ones and separate co-processor configs, has been very stable for me though not using any of the higher-level networking or portal-type libraries, only esp32spi and requests, no crashes in a long, long time. But... I restarted the Matrix Portal and let the code above (minor tweaks to print more info) run again until it hung (but again it didn't eject CIRCUITPY or disconnect from serial). After a few hours I came back and it hadn't changed, but macOS Finder, once selected, was spinning beach ball then eventually timed out, then CIRCUITPY ejected and serial disconnected. update: Last night I started up equivalent code on PyPortal, using the PyPortal library, which also uses PortalBase. It's still running. In the process I noticed that even the networking portions of the libraries differ significantly between PyPortal and MatrixPortal, and I suspect the issue (or trigger to the issue) is in the differences. |
I assume MatrixPortal initializes the rgbmatrix display. One suspicion I have is that there's still a low-probability "crash during soft reset" bug in that code. Maybe a program which
would give interesting results. It could help confirm/disconfirm my suspicion. |
Has anyone found any clues for resolving this, or workarounds for now? Or is there news of a firmware update in development that may fix it? |
As noted above, when not using the Matrix Portal library, I don't see these crashes. We're not sure why the library is causing a problem, but you could try rewriting your program to use |
I'm not using the Matrix Portal library; I use direct adafruit_requests POST methods with a network setup similar to the Internet connect guide, and I get the same symptoms mentioned here (hang, USB disconnect). My flash just wiped itself clean as I pressed the reset button. So unless there's a manual wipe button sequence I happened to press, it seems like other things are wrong with my board. Not sure what to do at this point. I'll reset the firmware and try to repeat the symptoms, I guess. I might as well ask: should a Matrix Portal M4 be able to run 3 small POST requests every 30 seconds with 5 kB responses? Does this raise any red flags for hardware limits of the MP M4? Should that be something it can manage? I suppose I should have studied that before pursuing this project. I've tried several ways to do basic requests, from the MatrixPortal library to the wifi manager to direct requests. I can't get this to consistently do anything with the internet for longer than 3-4 hours max without hanging. And I'm left wondering if this is simply beyond the capability of the device? |
@dsohrabian We would be quite interested in the simplest example program you have that seems to cause trouble, including your descriptions of what is returned from the requests. 5 kB responses might eventually lead to memory fragmentation, depending on how you hang on to the results. But there also might be bugs in the ESP32 firmware. |
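On the point about how results are held on to: one common pattern is to copy out what you need and close the response straight away, so that neither the socket nor the response object stays alive between fetches. The sketch below assumes a requests-style object with a `get()` method whose responses expose `.text` and `.close()`, as adafruit_requests does; `fetch_text` is a hypothetical helper, and whether this avoids fragmentation in this particular case is not established in the thread.

```python
def fetch_text(session, url):
    # Fetch url, return the body text, and always close the response.
    response = session.get(url)
    try:
        return response.text
    finally:
        response.close()
```

With the module-level API used elsewhere in this thread, the call would be `fetch_text(requests, URL)`.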
Thank you, that is interesting. I have a lot to learn about memory use. As for the transit site: it definitely tolerates requests every 30 seconds. The site is even hard-coded to auto-refresh on that interval if you leave it open in a browser. It is meant for public wear and tear and probably doesn't get much traffic. I push it and fetch 4 times per 30-second interval (which is the ultimate need of this project, but 1 per 30 seconds freezes the MP4 too). So far, the transit server doesn't seem to mind. I have only gotten 200 status codes on responses so far. This version of my |
@dhalbert One thing I've reproduced a couple times: when I comment out all the |
Thanks, that is very helpful. |
Just to clarify, is just import adafruit_matrixportal enough to cause trouble, or are you displaying things? |
It freezes when I am instantiating a Matrix object and using it to display data derived from requests. I've noticed the import alone doesn't cause the glitch.
…On Tue, May 31, 2022 at 1:03 PM Dan Halbert wrote:
The second I bring the LED display (via adafruit_matrixportal Matrix) into the picture, alongside fetching data online, the system will inevitably hang within couple hours.
Just to clarify, is just import adafruit_matrixportal enough to cause trouble, or are you displaying things?
|
@dsohrabian I tried your 30-second-interval code.txt above. I stopped it after round 740, so it ran for hours without difficulty. The fix I mentioned above for |
@dhalbert Thank you so much for testing out the code. I will give your esp32spi library version a shot. By the way, the code.py I shared only runs 1 request per 30 seconds. The project needs 4 requests per 30 seconds, but that crashes even sooner when feeding the data to the LED display. And I was trying to keep it as simple as possible for reproducing so cut the other requests. I was hoping this wouldn't need to be manually reset every 1-6 hours, and was shooting for more like every couple weeks or months at worst. Without engaging with the Were you able to debug the crash on your end at all or did it disconnect without any clues? |
It did not crash or disconnect at all: I just stopped it because it had run for about 6 hours, far longer than the 90 minutes you mentioned. So definitely try the new library version. I was also using CircuitPython 7.3.0. Update all the libraries: the Portal libraries have also been revised, though not as recently. I am running the latest (1.7.4) NINA-FW firmware as well. To determine the version, you can use the test program here, which will print it out: https://learn.adafruit.com/adafruit-matrixportal-m4/internet-connect#connect-to-wifi-3034771-4 |
I changed your code to have it do four fetches at a time (removed |
That sounds like a great lead! Please let me know if I can help in any way. I will be running tests to confirm again that the freeze doesn't happen when display setup and text update lines are commented out. I am running 1.7.4 for the ESP too. |
@dhalbert A correction for you: performing the import via |
Is there a memory leak in 7.3.0? I just tried updating from a 7.2.x version. Before, gc.mem_free() printed roughly 30,000; now it prints only 7000, and I keep getting this in the logs: "memory allocation failed, allocating 232 bytes" |
@mpicker90 Please open a new issue or a thread in the forums with the details, including your code. Make sure your libraries are all the latest version. |
|
Oops, misinterpreted your latest post. So some interaction is causing problems, not sure what. I will see what is happening when the example with the MatrixPortal library hangs. |
Dan, I really appreciate the efforts to figure this out. Will be eagerly following the issue. While this is helping me learn some lower-level concepts, this is above my understanding. But I would be glad to help how I can. |
@dsohrabian -- so you are not blocked, here is a build to try that might run for longer, and not hang up. It does not use DMA at all for SPI: It might still sporadically have the other problem I mentioned, which is a communications issue with the ESP32 Airlift. You might see |
Closing, but can continue ancillary discussion here if not directly relevant to #6489. |
Thank you, I will try the no DMA option and report back if it made a difference.
I've noticed this bug too and early on it misled me as the possible root cause. Glad you are aware of it too. It didn't happen often. And luckily it prompts an error exception which seems it can be handled with |
@dhalbert The project now works with this non-DMA version 🙏 Thank you so much for making the bootloader for me. And the I will hang on to this no-DMA version. It looks like you fixed it for the next release, so I'll plan on upgrading. |
I was just going to write to you. I fixed it in a different way, which I think will work too. It is not the same fix as what I gave you. See #6498 and adafruit/samd-peripherals#42. Try the build artifacts: https://github.com/adafruit/circuitpython/actions/runs/2516315728. Scroll down to see the artifacts. Unzip the file for your board and get the .uf2 you need. |
I used the latest version of the uf2 file and the newest library and still got the following error message: Getting time for timezone Europe/Zurich thanks |
Hi Marc,
I still get that too. It's a separate issue that I think needs to be explored. My workaround is to write a try/except block for that error and call esp.reset() and esp.connect_AP() in the except block to reconnect to the internet. That has been working well for me.
```python
try:
    ...  # the network call that sometimes fails goes here
except RuntimeError:
    esp.reset()
    esp.connect_AP(config["wifi_ssid"], config["wifi_password"])
```
…On Tue, Jun 28, 2022, 12:42 PM gadjodilo83 wrote:
I used the latest version of the uf2 file and the newest library and still got the following error message:
Getting time for timezone Europe/Zurich
Traceback (most recent call last):
File "code.py", line 139, in <module>
File "adafruit_portalbase/network.py", line 247, in get_local_time
File "adafruit_portalbase/network.py", line 216, in get_strftime
File "adafruit_requests.py", line 818, in get
File "adafruit_requests.py", line 674, in request
File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 138, in recv
File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 217, in available
File "adafruit_esp32spi/adafruit_esp32spi.py", line 776, in socket_available
File "adafruit_esp32spi/adafruit_esp32spi.py", line 332, in _send_command_get_response
File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _wait_response_cmd
File "adafruit_esp32spi/adafruit_esp32spi.py", line 274, in _wait_spi_char
RuntimeError: Error response to command
thanks
marc
|
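The reset-and-reconnect workaround described in this comment can be wrapped in a small retry helper so that every network call gets the same treatment. This is a sketch under assumptions: `call_with_reconnect` and its `attempts` parameter are hypothetical, `esp` is expected to provide `reset()` and `connect_AP(ssid, password)` as the ESP_SPIcontrol object in this thread does, and an error raised by `connect_AP` itself is not handled here.

```python
def call_with_reconnect(action, esp, ssid, password, attempts=3):
    # Call action(); on RuntimeError, reset the ESP32, reconnect, and try again.
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except RuntimeError as error:
            last_error = error
            esp.reset()
            esp.connect_AP(ssid, password)
    raise last_error  # every attempt failed, so surface the last error
```

A call might look like `call_with_reconnect(lambda: requests.get(URL), esp, config["wifi_ssid"], config["wifi_password"])`. Capping the number of attempts keeps a persistent failure visible instead of looping forever.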
Yes, the |
thanks for the reference, will test it! :) |
CircuitPython version
Code/REPL
Behavior
After some time passes (random, but typically less than an hour), the device stops responding.
The last thing printed is "Retrieving data..." (which is a print that can't be turned off in the CircuitPython libraries). The LED matrix is stuck on red (so the call to fetch hasn't returned).
After some time passes, USB disconnects. The board has to be hard-reset to recover.
There is no Python backtrace or any other error message printed via the serial port.
Description
No response
Additional information
I tested this initially with a text file hosted on an Apache server on the local LAN. I then tested with example.com just to rule out anything weird on my Apache instance. (So you ought to be able to repro this with various URLs, basically)
You can probably omit most of the exception handling from my example code. I've found that sometimes the MatrixPortal doesn't want to connect to wifi or the response gets screwy on occasion, so I handle those. However, I haven't hit those exceptions when performing this test, so it's likely not needed.