Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32S2 time.sleep(60) fails but time.sleep(15) works #4061

Closed
jerryneedell opened this issue Jan 25, 2021 · 19 comments
Closed

ESP32S2 time.sleep(60) fails but time.sleep(15) works #4061

jerryneedell opened this issue Jan 25, 2021 · 19 comments
Assignees
Labels
espressif applies to multiple Espressif chips network
Milestone

Comments

@jerryneedell
Copy link
Collaborator

jerryneedell commented Jan 25, 2021

May be related to adafruit/Adafruit_CircuitPython_Requests#63

On a SAOLA WROVER
If I run the code below with time.sleep(15) it executes normally
but with time.sleep(60)
it executes normally for the first pass
throws and error ENOTCONN on the second pass but recovers
then fails on the thirds and subsequent passes

Am I doing something wrong?
has time.sleep() changed?


Adafruit CircuitPython 6.2.0-beta.0-3-g97f5d218a on 2021-01-24; Saola 1 w/Wrover with ESP32S2
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.

code.py output:
<Network> Needell Airport -57 6
<Network> Needell Airport -41 1
<Network> central2.4 -76 7
<Network> 31A -77 7
<Network> PC -89 10
None
ip 10.0.0.111
Counter: 0
Sending to Adafruit IO...
Data sent!
Counter: 1
Sending to Adafruit IO...
Error sendind data - try again  [Errno 128] ENOTCONN
Counter: 1
Sending to Adafruit IO...
Data sent!
Counter: 2
Sending to Adafruit IO...
Error sendind data - try again  [Errno 128] ENOTCONN
Counter: 2
Sending to Adafruit IO...
Error sendind data - try again  Repeated socket failures
Counter: 2
Sending to Adafruit IO...
Error sendind data - try again  Repeated socket failures
Counter: 2
Sending to Adafruit IO...

here is the code

import ipaddress
import wifi
import socketpool
import time
import adafruit_requests
import ssl
from adafruit_io.adafruit_io import IO_HTTP, AdafruitIO_RequestError
import board
import busio
import digitalio

try:
    from secrets import secrets
except ImportError:
    print("WiFi secrets are kept in secrets.py, please add them there!")
    raise

for network in wifi.radio.start_scanning_networks():
    print(network, network.ssid, network.rssi, network.channel)

wifi.radio.stop_scanning_networks()

print(wifi.radio.connect(secrets["ssid"], secrets["password"]))

print("ip", wifi.radio.ipv4_address)

pool = socketpool.SocketPool(wifi.radio)
requests = adafruit_requests.Session(pool, ssl.create_default_context())


aio_username = secrets["aio_username"]
aio_key = secrets["aio_key"]

# Initialize an Adafruit IO HTTP API object
io = IO_HTTP(aio_username, aio_key, requests)

try:
    feed = io.get_feed("test")
except AdafruitIO_RequestError:
    # If no feed exists, create one
    feed = io.create_new_feed("test")

counter = 0
while True:
    data = "%d" % (counter)
    try:
        print("Counter: {0}".format(counter))
        print("Sending to Adafruit IO...")
        io.send_data(feed["key"], counter)
        print("Data sent!")
        counter += 1
    except Exception as e:
        print("Error sendind data - try again ", e)
        continue
    time.sleep(60)
@jerryneedell jerryneedell changed the title ESP32S2 time.sleep(60 fails but time.sleep(15) works ESP32S2 time.sleep(60) fails but time.sleep(15) works Jan 25, 2021
@dhalbert dhalbert added this to the 6.2.0 milestone Jan 25, 2021
@dhalbert dhalbert added espressif applies to multiple Espressif chips network labels Jan 25, 2021
@tannewt
Copy link
Member

tannewt commented Jan 25, 2021

I don't think this is a sleep issue. I think it has to do with @hierophect's socket changes changing the exception when a socket times out and closes. By increasing the sleep, you are giving the socket time to close. I suspect the fix is to make requests catch the error, not to change CircuitPython, because I know @hierophect did a bunch of work comparing the failure modes with CPython.

@hierophect
Copy link
Collaborator

@anecdata @tannewt I'm going to track the ENOTCONN problems in this thread since it's for Circuitpython as opposed to Requests which I'm not as familiar with. Currently ENOTCONN is thrown during failed send() calls, which is not the right application of that error. I'm not sure what to replace it with yet, Cpython isn't super clear on that front.

Is the failure of some send() calls a normal part of this application? If too many are failing, it might be the result of some of the non-blocking changes causing lwip_send to return -1 because of a timing issue or something similar.

@hierophect
Copy link
Collaborator

hierophect commented Jan 26, 2021

@jerryneedell I get a TypeError: This library requires a WiFiManager object. when executing your code. was out of date library

@hierophect
Copy link
Collaborator

I've been working on this more in my #4049 PR build. Here are the results I get, with extra debug information added from the C level:

None
ip 192.168.10.117
Init IO
GetIO
New Socket:60 errno:9 Bad file number
New handle index:0
Wrapping socket:60
Counter: 0
Sending to Adafruit IO...
Data sent!
Counter: 1
Sending to Adafruit IO...
Error sending data - try again  Failed SSL handshake
Counter: 1
Sending to Adafruit IO...
New Socket:62 errno:9 Bad file number
New handle index:1
Wrapping socket:62
Data sent!
Counter: 2
Sending to Adafruit IO...
Error sending data - try again  Failed SSL handshake
Counter: 2
Sending to Adafruit IO...
New Socket:-1 errno:23 Too many open files in system
Error sending data - try again  Out of sockets
Counter: 2
Sending to Adafruit IO...
New Socket:-1 errno:23 Too many open files in system
Error sending data - try again  Out of sockets
Counter: 2
Sending to Adafruit IO...
New Socket:-1 errno:23 Too many open files in system
Error sending data - try again  Out of sockets

Successful sends are spaced out, but when failures occur, they happen in a constant stream with no delays. There are a couple things I find confusing about this.

  • When creating sockets one at a time in a separate test, I observe that we have a max of 4 (backed up by the datasheet) and the numbers ascend from 60 to 63. I don't know why in this test it jumps from 60 to 62 - this implies to me that extra sockets are being created in LWIP without having corresponding Python objects attached to them, but I don't know where this is happening (the "New Socket" line appears whenever I call LWIP_Socket, and we only see it twice here.)
  • The SSL handshake failure (this is the true error being reported on Send() failure, not ENOTCONN) is weirdly deterministic, in that it always happens on the second and fourth attempt. If it was just a fluke, I'd expect it to happen at different times.
  • I'm not sure if the request library is supposed to be responsible for closing these sockets after an error?
  • I don't think the bad file number errno means anything when the sockets are successful but I could be wrong?

@hierophect
Copy link
Collaborator

I can also confirm none of this happens with time.sleep(15) either - it just uses socket 60 happily over and over with no failures or new sockets created.

@hierophect
Copy link
Collaborator

hierophect commented Jan 29, 2021

I think this might actually be a Requests library error after all? Does this line seem like it could be a potential socket leak? The debug output suggests that sockets are not being closed properly - I think what's happening here is that the first call creates a new socket, then the second one tries to re-use that socket and fails, but doesn't close it. I still don't know why the sockets are incrementing by 2 internally, though...

@hierophect
Copy link
Collaborator

hierophect commented Jan 30, 2021

This turned out to be a 4(!) part issue:

  • The very first problem was that Failed SSL handshake was being mis-reported as [Errno 128] ENOTCONN by send(). That was my bad, it just needed some extra code to check the MbedTLS errors properly.
  • The first "real" problem, which @jerryneedell was encountering, was that only _SendFailed exceptions were being caught when Requests made a send_request. So when an SSL connection timed out, which was the case for the 60s delay, the resulting Failed SSL handshake error was being completely ignored, and the program would go on to try and do a recv_into on an invalid pipe and throw another Failed SSL handshake (accidentally shown as ENOTCONN). Issue here: SSL handshake failures are not caught Adafruit_CircuitPython_Requests#67
  • This exception, when caught in code.py, would cause a socket leak, causing the sockets to run out. I've also discovered that all TLS sockets actually use two sockets, one you know about and a secret one under the hood, so an ESP32-S2 has a max of just two TLS sockets before it runs out. This isn't an issue, just annoying.
  • Solving these problems (fixing the error message, and making the except: catch all errors) then results in using sockets as dict keys Adafruit_CircuitPython_Requests#66, seen as a mysterious Error sending data - try again <SSLSocket> message. Long story short this is because the hash was changing when the socket closed. More details in issue.

I still don't know if any of this is related to adafruit/Adafruit_CircuitPython_Requests#63. @anecdata you could try out some of these changes and see if they help with your problem.

For Circuitpython, the changes that need to happen are the fix to send() error messages, and changing the Socket get_hash methods to return their own id() values rather than the socket number (pending further conversation in #4101). I'll try and fold them into #4049 since trying to put them in separately would just cause messy merge issues.

@anecdata
Copy link
Member

I may be recalling wrong, but I thought default CONFIG_LWIP_MAX_SOCKETS was 10, and we bumped it down to 4 (maybe to save memory)? Having only 2 usable TLS sockets doesn't leave much headroom.

@hierophect
Copy link
Collaborator

@anecdata I think you're right, but I wasn't the one who lowered the max so I don't know how much of a memory drag it is.

@hierophect
Copy link
Collaborator

@jerryneedell can you retest this and see if it works for you now? My personal test works.

@anecdata
Copy link
Member

anecdata commented Feb 4, 2021

Two issues I'm encountering post-4049:

  1. HTTP doesn't work for me Repeated socket failures after ENOTCONN Adafruit_CircuitPython_Requests#63 (comment)
  2. Under certain circumstances, OSError: Failed SSL handshake occurs repeatedly (retry immediately will work), particularly if there has been some delay since the previous Request Repeated socket failures after ENOTCONN Adafruit_CircuitPython_Requests#63 (comment) (this is with a patched Requests to catch OSError so that sockets don't run out)

Addendum:
I realized the right call for HTTP I think should be:
http = requests.Session(socket)
but the same exception trace occurs.

@hierophect
Copy link
Collaborator

@anecdata I'm looking into the HTTP thing now, sorry for the delay, I've been having power cutouts from the weather. As for the second point, are we sure that isn't defined behavior? I'm not an SSL expert but can an SSL session just persist indefinitely?

@anecdata
Copy link
Member

anecdata commented Feb 4, 2021

The OSError: Failed SSL handshake seems to be correlated to no keep-alive, so either no Connection header, or Connection: close.

@jerryneedell
Copy link
Collaborator Author

@hierophect I just tried with tip of main and it still fails with time.sleep(60)-- do I need pull in a PR or update any libraries?

@hierophect
Copy link
Collaborator

@jerryneedell you'd need to go into Requests and search for except _SendFailed and replace it with except (_SendFailed, OSError):

@jerryneedell
Copy link
Collaborator Author

@hierophect That appears to have fixed it for my initial example -- made it through 3 cycles so far

@jerryneedell
Copy link
Collaborator Author

@jerryneedell you'd need to go into Requests and search for except _SendFailed and replace it with except (_SendFailed, OSError):

Is this change being incorporated into Requests?

@anecdata
Copy link
Member

anecdata commented Feb 4, 2021

I'll submit a PR

@hierophect
Copy link
Collaborator

I'm closing this issue now as @jerryneedell's remaining issue is in the Requests library and is continued in this issue: adafruit/Adafruit_CircuitPython_Requests#67

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
espressif applies to multiple Espressif chips network
Projects
None yet
Development

No branches or pull requests

5 participants