ESP8266 send returns WOULD_BLOCK error when busy #9051
@SeppoTakalo @VeijoPesonen @kjbracey-arm @KariHaapalehto, please review.
Should this be tested with the client, or has it already been?
@teetak01 is working on this: https://github.com/ARMmbed/mbed-client-testapp/pull/1156
@@ -479,6 +481,11 @@ int ESP8266Interface::socket_send(void *handle, const void *data, unsigned size)
status = _esp.send(socket->id, data, size);

if (status == NSAPI_ERROR_WOULD_BLOCK)
If event is the thing that leads to sigio, this should really be timed, or you're probably just immediately re-entering. I would expect this to be using "call_in" for the event, whereas the other timeout stuff in the PR could just be checking elapsed time in the send call, I think (although I'm not 100% sure what that's doing).
The status NSAPI_ERROR_WOULD_BLOCK is only returned from ESP8266::send if both _busy and _busy_timeout_reached are set.
To set the _busy_timeout_reached flag we need to call oob_busy_timeout(), which is only called from ESP8266Interface::_oob_busy_timeout(), and that in turn is scheduled via a call_in from _oob_busy_detected().
So, indirectly, this is timed. If send returns with a valid return value within 1/10 s, the fact that the busy flag was set is ignored.
Does this make sense?
Um, not sure. Seems too complicated, and not sure there's a need.
If the device does promptly return BUSY or OK after each command, then we can just finish on either of those. Timeout shouldn't matter if it's working.
After you get BUSY, check overall elapsed time to determine whether you want to try again, or whether it's time to return EWOULDBLOCK. Just look at Kernel::get_ms_count() to note the time. There is no need for a callback or any special handling in the busy OOB (just set the busy flag and abort the wait for OK).
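The elapsed-time check suggested here could look roughly like the following. This is a minimal host-side sketch rather than driver code: `send_with_budget`, `AtResult`, `SendStatus` and the injected `now_ms` clock are hypothetical names, with `now_ms` standing in for Kernel::get_ms_count() and `try_send` standing in for one AT command exchange.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical outcome of a single AT exchange (not a driver type).
enum class AtResult { Ok, Busy, Error };

// Hypothetical result reported to the caller (not the NSAPI error codes).
enum class SendStatus { Sent, WouldBlock, DeviceError };

// Retry while the modem answers "busy", but only inside a time budget.
// No callback is involved: the loop just compares timestamps.
SendStatus send_with_budget(const std::function<AtResult()> &try_send,
                            const std::function<uint64_t()> &now_ms,
                            uint64_t budget_ms)
{
    const uint64_t start = now_ms();
    for (;;) {
        switch (try_send()) {
            case AtResult::Ok:
                return SendStatus::Sent;
            case AtResult::Error:
                return SendStatus::DeviceError;
            case AtResult::Busy:
                break; // leave the switch and check the elapsed time
        }
        if (now_ms() - start >= budget_ms) {
            // Budget exhausted: let the caller retry later.
            return SendStatus::WouldBlock;
        }
    }
}
```

With a budget of zero the function degenerates to "one attempt, then WouldBlock", which leaves all retrying to the socket layer.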
@@ -951,6 +974,11 @@ void ESP8266::_oob_busy()
MBED_ERROR(MBED_MAKE_ERROR(MBED_MODULE_DRIVER, MBED_ERROR_CODE_ENOMSG), \
"ESP8266::_oob_busy() AT timeout\n");
}
if (!_busy)
We may as well just abort the current command here, right? If it says busy we know it's not going to say "OK". If the outer loop wants to retry immediately it can check elapsed time and reconsider. Else fall back to EWOULDBLOCK and retry after timed sigio.
There's probably never any need to be waiting inside the driver itself, unless we really are expecting another command to work if we just wait a few milliseconds.
If we abort here then do we need to set the 1/10s timeout?
Timeouts would ideally be a "something's gone wrong" situation, and should never be hit. Timeout on recv("SEND OK") mattered before when we weren't detecting the BUSY, but if we now positively reckon we'll see BUSY or SEND OK or ERROR(?), timeout shouldn't really matter. Can be high if we never hit it in practice.
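As an illustration of "positively" recognising each outcome, here is a small sketch that maps a response line to a result. The `Reply` enum and `classify` function are hypothetical, and the prefixes are only the ones quoted in this thread ("SEND OK", "busy s...", "busy p...", "ERROR"); the real firmware may emit others.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of one modem response line.
enum class Reply { SendOk, Busy, Error, Unknown };

// Match the line against the prefixes mentioned in this review.
Reply classify(const std::string &line)
{
    // rfind(p, 0) == 0 is true only when `line` starts with `p`.
    auto starts_with = [&line](const char *p) { return line.rfind(p, 0) == 0; };

    if (starts_with("SEND OK")) {
        return Reply::SendOk;
    }
    if (starts_with("busy s...") || starts_with("busy p...")) {
        return Reply::Busy;
    }
    if (starts_with("ERROR")) {
        return Reply::Error;
    }
    return Reply::Unknown; // only this case should ever need the timeout
}
```

If every exchange ends in one of the first three results, the timeout is only reached on `Unknown`, which is why its exact value should not matter much.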
efb7ebe to 4ac8c27
@kjbracey-arm can you please review the latest commit?
4ac8c27 to 9571f46
@@ -573,12 +575,22 @@ nsapi_error_t ESP8266::send(int id, const void *data, uint32_t amount)
if (_serial_rts == NC) {
while (_parser.process_oob()); // Drain USART receive register
}
_busy = false;
This should be done before calling send(), after acquiring the mutex. Another request might have triggered the busy OOB, so we can't rely on this already being false.
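The ordering asked for here (take the lock, clear the stale flag, then issue the request) can be sketched as below. This is a toy model using std::mutex, not the driver: `ToyModem` and the `exchange` callable are hypothetical, with `exchange` standing in for the AT round trip during which the busy OOB handler may fire.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-in for the driver object; names are illustrative only.
struct ToyModem {
    std::mutex mtx;
    bool busy = false; // would be set by the "busy" OOB handler

    // Returns true when the exchange completed without a busy report.
    template <typename Exchange>
    bool send(Exchange exchange)
    {
        std::lock_guard<std::mutex> lock(mtx);
        busy = false;    // clear whatever an earlier request left behind
        exchange(*this); // the OOB handler may set busy during this call
        return !busy;
    }
};
```

Because the flag is reset under the lock, a busy report caused by a previous request cannot be mistaken for one belonging to the current send.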
if (_error) {
_error = false;
}
if (_busy && _busy_timeout_reached) {
tr_debug("returning NSAPI_ERROR_WOULD_BLOCK");
_busy = false;
Unnecessary, as every caller that checks for busy must set this to false before making a request.
Otherwise could follow the same pattern as with nsapi_error_t ESP8266::connect(const char *ap, const char *passPhrase)
void ESP8266::oob_busy_timeout()
{
tr_debug("oob_busy_timeout called");
if (_busy)
Unnecessary check because busy has occurred if this function gets called
Yes, but we might have gotten "busy..", registered the callback and then managed to actually send the data, which would clear the _busy flag. Does this make sense?
But the lock is released in between?
void ESP8266Interface::_oob_busy_detected()
{
_global_event_queue->call_in(ESP8266_OOB_BUSY_TIMEOUT_MS, callback(this, &ESP8266Interface::_oob_busy_timeout));
An event should be added to the queue only if we are certain that "busy s..." or "busy p..." has occurred.
_oob_busy_detected is called from ESP8266::_oob_busy via a callback.
In the _oob_busy callback I can see only three outcomes: "busy s...", "busy p..." or MBED_ERROR, which would halt the system.
So I think the requirement is fulfilled?
@michalpasztamobica could you address the review comments please?
@michalpasztamobica, thank you for your changes.
fb45c4e to 0a66812
if (_busy && _busy_timeout_reached) {
tr_debug("returning NSAPI_ERROR_WOULD_BLOCK");
_busy_timeout_reached = false;
_smutex.unlock();
set_timeout() is missing from here. Doesn't make a difference as the default and SEND-timeout are basically the same...
I'm still not really understanding the approach - can you look at my comments from a couple of hours ago?
@@ -576,6 +578,7 @@ nsapi_error_t ESP8266::send(int id, const void *data, uint32_t amount)
for (unsigned i = 0; i < 2; i++) {
This should become unnecessary.
@@ -520,6 +522,11 @@ int ESP8266Interface::socket_send(void *handle, const void *data, unsigned size)
status = _esp.send(socket->id, data, size);

if (status == NSAPI_ERROR_WOULD_BLOCK)
{
event();
Shouldn't this be the function which gets thrown to the event queue?
0a66812 to f8ecba1
@VeijoPesonen could you double check the latest commits please?
f8ecba1 to 0e0cd95
And @kjbracey-arm if you like.
CI started
It seems that after the recent refactoring the fix is not doing what we wanted it to any more: I could previously see (with the complex callback implementation) that -3001 (WOULD_BLOCK) was returned, while now it is back to DEVICE_ERROR... I am not sure, but perhaps I introduced a bug with the recent changes?
CI started
@cmonr looks like CI needs to be stopped again :(
Stopping both jobs...
Yes, I am sorry, but the logs are showing that something is wrong now.
@michalpasztamobica It's all good. Better to know sooner than after the jobs have completed 😄
d03f870 to d6e385b
Where did you spot that?
Test run: FAILED
Summary: 1 of 1 test jobs failed
Failed test jobs:
Looking at the code, if I abort the parser it might trigger the return from recv etc. inside ESP8266::send, which would mean that the function is returning but the _busy flag hasn't been set yet. I'm not sure how the context would be switched.
The abort isn't synchronous, it just sets a flag. It doesn't cause early return from your OOB handler. (It was a backwards-compatible alternative to giving the OOB handlers a return value - the parser checks to see if they called abort after they return).
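To make the flag-based abort concrete, here is a toy model of the behaviour described above. It is not the mbed ATCmdParser: `ToyParser`, `on_oob` and `process` are hypothetical names, and only the idea (abort records a request that is checked after the handler returns) is taken from the comment.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy parser illustrating a non-synchronous abort; illustrative only.
class ToyParser {
public:
    void on_oob(std::string prefix, std::function<void()> handler)
    {
        _handlers.push_back({std::move(prefix), std::move(handler)});
    }

    // Only records the request; it does not unwind the running handler.
    void abort() { _aborted = true; }

    // Returns false if a handler asked for an abort while handling `line`.
    bool process(const std::string &line)
    {
        _aborted = false;
        for (auto &h : _handlers) {
            if (line.compare(0, h.prefix.size(), h.prefix) == 0) {
                h.fn(); // the handler always runs to completion
                if (_aborted) {
                    return false; // checked only after the handler returns
                }
            }
        }
        return true;
    }

private:
    struct Entry {
        std::string prefix;
        std::function<void()> fn;
    };
    std::vector<Entry> _handlers;
    bool _aborted = false;
};
```

In this model a handler that calls `abort()` and then sets a busy flag ends up in the same state as one that sets the flag first, so the order of the two statements inside the handler makes no difference.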
19:12:07 17:12:06.972 | D1 <-- DutThread: [01277][DBG ][ESPA]: returning WOULD_BLOCK
OK, our software runs fine. It seems the flag setting order made a difference in the end...
CI started
@michalpasztamobica So is this working consistently for us for both older and latest versions of the driver?
Test run: SUCCESS
Summary: 11 of 11 test jobs passed
@adbridge this fix is a part of the new mbed-os ESP8266 driver only. I did not apply this patch to the old mbed-os ESP8266 driver. Perhaps you meant the newer and older AT firmware version (1.3.0, 1.6.0, 1.7.0?) I think we should also agree on what "working" means.
A couple of questions for future investigation:
@kjbracey-arm , you are right, after looking into the code I also do not see how reordering the two events made a difference. I assumed that there is some blocking operation awaiting a change of _abort variable, but it is not the case... I extracted the hashes used to run the test. Addressing the second point.
Then in the examples documentation I found this:
Looking at our driver, we should never run into a situation where we input more bytes than defined with +CIPSEND. We could truncate it to match the 2048B buffer, but never exceed it. @VeijoPesonen, please correct me if I am wrong.
What documentation says and what the device actually does are not the same thing. Therefore I assume that “busy s” might be sometimes just a generic error code from device.
Description
In case ESP8266 reports "busy" status we need to return WOULD_BLOCK error from the send function.
Pull request type