New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cpu/esp: esp_now_netdev fixes and optimizations #10700
cpu/esp: esp_now_netdev fixes and optimizations #10700
Conversation
Simplifies the _recv receive function structure of esp_now_netdev according to the possible parameter values.
Newest version of API requires to drop the frame when if parameter len is smaller than the received packet.
Although it should not be possible to reenter the function esp_now_recv_cb, this is avoided by an additional boolean variable. It can not be realized by mutex because it would reenter from same thread context. If macro NDEBUG is not defined, an assertion is used.
Since the esp_now_recv_cb function is not called directly as an ISR through interrupts, it is not executed in the interrupt context. _recv can therefore be called directly. The treatment of the recv_event is no longer necessary.
Since it is not possible to reenter the function `esp_now_recv_cb`, and the functions netif::_ recv and esp_now_netdev::_ recv are called directly in the same thread context, neither a mutual exclusion has to be realized nor have the interrupts to be deactivated.
Since it is not possible to reenter the function `esp_now_recv_cb`, and the functions netif::_ recv and esp_now_netdev::_ recv are called directly in the same thread context we only need a buffer which contains one frame and its source mac address.
Since it is not possible to reenter the function `esp_now_recv_cb`, and the functions netif::_ recv and esp_now_netdev::_ recv are called directly in the same thread context we can read directly from the `buf` and don't need a receive buffer anymmore.
I don't have the hardware to test this :-/ |
@TimoRoth Sorry, I used the wrong issue by mistake when I was asking you to take a look to the changes for esp_now. |
I know, but it would be really great if you could take a look to the changes in @TimoRoth will test it. |
Timer restart was moved from esp_now_scan_peers_done to esp_now_scan_peers_start to avoid that the scan for peers isn't triggered anymore if the esp_now_scan_peers_done callback isn't called because of errors.
@TimoRoth Could you test the changes? I would like to make some progress with these changes since they increase the stability a lot. |
This comment has been minimized.
This comment has been minimized.
With master I get something a but different... node1 seems to transmit very fast. NODE0
NODE1
|
@MrKevinWeiss Strange, I have stress tested over hours, but I have never seen that the UART interface is affected. UART is realized in Hardware with a buffer oft 128 characters. |
I am not too worried about it, I am running things through a usb hub. Keep in mind it is a different board (node1=esp32-wrover-32). I will investigate! |
Yup, just my usb ports, switched to the ones directly connected to the cpu and have 0% packet loss with 1000 pings. The testing shows a noticeable improvement! 👍 |
@MrKevinWeiss Thanks for testing 😄 |
How do we continue with this PR? PR #9917 depends these changes. |
* If the NDEBUG macro is undefined, an assertion is used instead for | ||
* debugging purposes. | ||
*/ | ||
assert(!_in_recv_cb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems redundant in light of the following if
, or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was added intentionally rather to see when this happens in stress tests during the development process than to suppress it silently.
Should I remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhm, I'see ... the assert will result in a crash, you could a DEBUG
message inside the if
below if you want to avoid the crash an just see when (and how often) this happens.
TL;DR: depends on you intention and use case: if you want to the app to crash first time this happens use assert
(so leave as is) otherwise better use DEBUG
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with a DEBUG message is that you would miss it in the set of other debug messages, especially during stress tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smlng I decided to remove it.
/* | ||
* Since we are not in the interrupt context, we do not have to pass | ||
* `NETDEV_EVENT_ISR` first. We can call the receive function directly. | ||
* But we have to unlock the mutex and enable interrupts before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which mutex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments is from the time when a mutex was used. I forgot to change it it when I removed 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove then, afterwards this seems good to go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. Thank you for the review.
I did a really quick test of the changes, and they indeed do increase the stability to the point where I was unable to make anything bad or unexpected happen. Looking forward to this and in turn the esp8266 PR being merged. |
assert was added intentionally for debugging purposes. For released version it is enough that function returns.
@TimoRoth Thanks for testing again 😄 |
@TimoRoth thx for reporting, with the tests by @MrKevinWeiss I'd say this is good to go then! |
Thank you all for your support. |
Contribution description
This PR provides a fix of problem 2 in issue #10682.
As described in #10682, a single ping without data was not possible anymore after heavy load with ping timeouts. The reason was that ISR events got lost and messed up the ringbuffer used in
esp_now_netdev
.This PR is rather an evolution than only a fix. All different steps are realized as separate commits which contain the following changes:
_recv
function ofesp_now_netdev
simplified according to the possible parameter values._recv
function ofesp_now_netdev
updated to drop the frame as documented in new API if the parameterlen
is smaller than the received frame.NETDEV_EVENT_ISR
events inesp_now_netdev
fixed.In further extensive stress tests the following could be figured out:
esp_now_recv_cb
is executed in the context of thewifi
thread (this was already known).wifi
thread.wifi
thread then processes the events sequentially and executes registered callback functions likeesp_now_recv_cb
asynchronously.NETDEV_EVENT_ISR
approach in GNRC.This results in the following facts:
esp_now_recv_cb
will not be reentered.esp_now_recv_cb
is executed in the context of the highest priority thread and is therefore not interrupted or superseded by other threads.These facts allow the following optimizations which are provided step by step as separate commits.
NETDEV_EVENT_ISR
events are not needed. The execution ofnetif::_recv
and thusesp_now_netdev::_recv
can be triggered directly withNETDEV_EVENT_RX_COMPLETE
on the receiption of a frame inesp_now_recv_cb
.esp_now_recv_cb
andesp_now_netdev::_recv
are executed sequentially in same thread context.esp_now_netdev::_recv
is executed immediately inesp_now_recv_cb
and in same thread context, there is no need for a ring buffer anymore.wifi
thread.Although function
esp_now_recv_cb
will not be reentered, it is secured by an additional boolean variable to avoid potential data inconsistencies. If the NDEBUG macro is undefined, an assertion is used instead for debugging purposes.Testing procedure
Use at least two or more ESP32 nodes and flash them with
examples/gnrc_networking
, e.g.:Use the
ping
command to ping from each node another node to produce heavy load, either with two nodesor with three nodes
A further test is bombarding one node from two other nodes:
Issues/PRs references
This PR fixes #10682 problem 2, the stucked gnrc buffer problem remains.
This PR is a follow-on PR of #10581, #10679, #10680.