Wi-Fi Radio [Dis-]Connection handling: exceptions, reconnection, logging #3837
Comments
I'm all for expanding what we reconnect on. Perhaps we need to invert the test and skip reconnecting only on the reason we get when we've triggered the disconnect ourselves.
I'm not sure if there is a reason that is 1:1 with manual disconnect, so maybe that's better as a variable. There's also the possibility of a manual disconnect coinciding with some network issue that also triggers a disconnect (probably rare). This mentions Reasons for various disconnect causes, but doesn't mention a Reason associated with a manual disconnect.
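A minimal sketch of the two ideas above (inverting the test, and tracking a manual disconnect in a variable rather than relying on a reason code), written in plain Python rather than the core's C. `StationState`, `user_requested_disconnect`, and `should_reconnect` are hypothetical names for illustration only, not anything in the CircuitPython core:

```python
class StationState:
    """Hypothetical stand-in for the core's Wi-Fi station state."""

    def __init__(self):
        # Set just before we ask the radio to disconnect ourselves.
        self.user_requested_disconnect = False


def should_reconnect(state, reason):
    """Inverted test: reconnect on any reason unless we triggered the disconnect.

    `reason` is the numeric disconnect reason; it is accepted here only so a
    caller could log it, and does not influence the decision in this sketch.
    """
    if state.user_requested_disconnect:
        # Consume the flag so the next, unrelated disconnect is retried.
        state.user_requested_disconnect = False
        return False
    return True
```

A flag like this cannot tell a manual disconnect apart from a network-triggered one that happens at the same moment: whichever disconnect event arrives first consumes the flag.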
@anecdata I think the simplest thing would be to trigger a disconnect and see what reason you get back. :-)
I haven't found a way to do that from CircuitPython, so I suppose it's about time to begin the trek up the learning curve of how to make changes to the core ;-) hmm, we never actually call
The only way I can think of is to set the
There may be external causes for Most other disconnection reasons should be fair game for automatic reconnection attempts.
A couple of observations in testing so far:
usually followed (currently) by
I'm working on a PR to flow more disconnect reason info back to the user, to help users and support folks differentiate the failure scenarios currently lumped together under "Unknown failure". The logging is reasonably thorough, but requires a DEBUG build. Maybe until ESP32-S2 is considered Stable, we could enable logging by default, at least for Warnings and more severe?
Addendum: it's useful to note in the docs which Reasons arise solely from the AP sending a frame with that reason, vs. Reasons that may arise from internally detecting a condition within the station.
This sounds good to me!
Would you mind adding them? I don't know the difference myself.
re: logging ...the build process is mostly a black box to me, but maybe there's a way to do it there in the automated builds. I can start to look into that; any guidance would be helpful. Espressif does provide APIs to set a lower per-module severity threshold at runtime than the one set at compile time (I think that would involve a new CircuitPython API).
re: Reason causes ...I'm not sure where you were thinking to add to the docs (always happy to). I only meant that in Espressif's docs, they itemize those Reasons that arise internally from low-level IDF logic vs. those that arise from standard IEEE 802.11 reasons (often corresponding to a specific management frame) vs. those that could arise from either cause.
I think it's a setting you can set with menuconfig. However, I just realized we actually disable the debug UART in non-debug builds so that it can be used by user code. I think that's still best. Maybe we just need better error messages in CircuitPython?
I'm inclined to close this issue. Current status:
Logging
Auto-Reconnect vs. Exception
We now automatically try to re-connect for all but three disconnection reasons (PR 3992):
The first case is usually intended, but we may in the future want to distinguish between intended and unintended
The latter two reasons will trigger exceptions. We may want to tweak these reasons as we get more experience. Some codes that are currently retried may turn out to be fatal and should raise an exception.
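To make the policy above concrete, here is a rough Python sketch of a three-way decision: ignore the (usually intended) case, raise on the fatal cases, retry on everything else. The reason groups are passed in as sets because which codes belong where may still change. `WifiDisconnectError`, `decide`, `silent`, and `fatal` are hypothetical names, and this is not how the core (which is C) is actually structured:

```python
class WifiDisconnectError(ConnectionError):
    """Hypothetical exception that keeps the numeric reason code."""

    def __init__(self, reason):
        super().__init__("Wi-Fi disconnected, reason {}".format(reason))
        self.reason = reason


def decide(reason, silent, fatal):
    """Return "ignore" or "retry" for a disconnect reason, or raise.

    silent: reasons treated as an intended disconnect (no retry, no error)
    fatal:  reasons that should surface to user code as an exception
    Any other reason is retried.
    """
    if reason in silent:
        return "ignore"
    if reason in fatal:
        raise WifiDisconnectError(reason)
    return "retry"
```

Keeping the numeric code on the exception lets user code report or branch on it without the core carrying extra strings, and moving a code from "retried" to "fatal" is a one-line change to the set passed in.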
Let's close for now. We can always open a new issue when we have additional specific improvements in mind.
I'm not sure if this really belongs as an issue (there are a couple related already), but I wonder if we need to handle more reasons for disconnection with auto-reconnect, meaningful exception message, more logging coverage, or other handling.
I'm not sure of the full distinction between the <100 codes and the >=200 codes. The >=200 codes seem to be more fatal, but we do reconnect on a 205. It may help to explicitly itemize which reasons:
Some users are experiencing failure to connect initially or to re-connect, reasons sometimes unknown. Clearly there are config and physical reasons why a connection may be frail, and I suppose we don't want to bog down the core with unnecessary strings (can we at least return the numeric codes?), but here's a survey of disconnection reasons, and where the core reacts to them (at least that I've found so far):
Disconnection reasons (code, docs):
‡ = disconnection reasons seen in the field with a small fleet of original esp32 over about a six-month period