Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/esp32: lwIP netdev #11946

Merged
merged 4 commits into from Aug 7, 2019
Merged

Conversation

gschorcht
Copy link
Contributor

@gschorcht gschorcht commented Aug 1, 2019

Contribution description

The changes in this PR allow to use an esp_wifi network device of ESP32 with lwIP.

Testing procedure

Compile tests/lwip and flash an ESP32 node with:

USEMODULE=esp_wifi CFLAGS='-DESP_WIFI_SSID=\"<your_ssid>\" -DESP_WIFI_PASS=\"<your_passphrase>\"' make BOARD=esp32-wroom-32 -C tests/lwip flash

Start netcat on a Linux machine in same LAN:

nc -6l 12345

Use a terminal to connect to the ESP32 node and use the following commands

> tcp connect fe80::c73e:f07d:d95d:ebf9 12345
> tcp send 00
Success: send 1 byte over TCP to server
> tcp disconnect

where the address should be replaced with the address of your Linux machine.

Issues/PRs references

Fixes #11910
Depends on PR #11964
Depends on PR #11967

@gschorcht gschorcht requested a review from miri64 August 1, 2019 05:53
@gschorcht gschorcht added Area: network Area: Networking Platform: ESP Platform: This PR/issue effects ESP-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) labels Aug 1, 2019
Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise it looks good, however, I don't have the hardware to test. @MichelRottleuthner can you have a look?

@miri64 miri64 added Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines and removed Reviewed: 3-testing The PR was tested according to the maintainer guidelines labels Aug 1, 2019
@MichelRottleuthner
Copy link
Contributor

Tried it here and compared to master it works better - but something seems to be broken.
I.e. instead of (master):

tcp connect <pc_lladdr> <port>
Error: unable to connect

I can do the following:

tcp connect <pc_lladdr> <port>
tcp send 313233
Success: send 3 byte over TCP to server

On the PC I do the following:

socat -dd TCP6-LISTEN:12345 stdout
2019/08/01 15:27:24 socat[30856] N listening on AF=10 [0000:0000:0000:0000:0000:0000:0000:0000]:12345
2019/08/01 15:31:37 socat[30856] N accepting connection from AF=10 [fe80:0000:0000:0000:32ae:a4ff:fed3:35ac]:49153 on AF=10 [fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx]:12345
2019/08/01 15:31:37 socat[30856] N using stdout for reading and writing
2019/08/01 15:31:37 socat[30856] N starting data transfer loop with FDs [6,6] and [1,1]
1232019/08/01 15:32:33 socat[30856] W read(6, 0x55c562462680, 8192): Connection reset by peer
2019/08/01 15:32:33 socat[30856] N socket 1 to socket 2 is in error
2019/08/01 15:32:33 socat[30856] N socket 1 (fd 6) is at EOF
2019/08/01 15:32:33 socat[30856] N exiting with status 0

With wireshark I can see that there are a lot of (5000) TCP Dup ACK and Spurious Retransmissions before socat reports the error. UDP seems to work ok.

(I used socat instead of netcat didn't work at all for me, but that could be because of the many different netcat versions out there)

@miri64
Copy link
Member

miri64 commented Aug 1, 2019

This seems to be rather an lwIP problem than a problem with the device integration. However if you see the packets that means the device integration works, no?

@miri64
Copy link
Member

miri64 commented Aug 1, 2019

Tried it here and compared to master it works better - but something seems to be broken.
I.e. instead of (master):

No surprise there ^^ device support is always better than no device support ;-)

@MichelRottleuthner
Copy link
Contributor

@miri64 I agree. Just wanted to point out what my results were. And I prefer to say if something was different to it just works(TM). Also I lack the expertise on our lwip integration to judge if this actually comes from lwip or the device integration^^

@MichelRottleuthner MichelRottleuthner added the Reviewed: 3-testing The PR was tested according to the maintainer guidelines label Aug 1, 2019
Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

@miri64
Copy link
Member

miri64 commented Aug 1, 2019

Please squash @gschorcht

@miri64 miri64 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Aug 1, 2019
@gschorcht
Copy link
Contributor Author

I can observe these duplicate ACKs too. Strange, the ACK repeated a thousand times is not the ACK for the data packet, but the last ACK from the three-way handshake that originated from the initiating RIOT node.

@gschorcht
Copy link
Contributor Author

Hm, I tried it with native and a tap device. It does seem to have the problem with the duplicate ACKs. It works exactly as exepcted. Maybe, the problem is even related to the esp_wifi netdev.

@gschorcht
Copy link
Contributor Author

@miri64 @MichelRottleuthner After investigating the problem of duplicate ACKs for two days, I finally found the cause. In fact, the problem was the esp_wifi network device driver. Function _esp_wifi_send did not return the right value. The problem was so obvious 😎 and easy to fix.

I fixed it with commit 7941149. Now, lwIP is working with esp_wifi as expected. Only one ACK, no packet retransmission and so on.

If you prefer, I could provide the fix within a separate PR and could rebase this PR to it.

BTW, when I was investigating the problem, I realized that there are still other problems in esp_wifi, e.g., in NETDEV_EVENT_ISR handling that might lead to complete blocking under very special conditions. Therefore, I've already refactored the implementation of esp_wifi in the meantime. The new version will be very similar to the ESP8266 implementation and would allow further code deduplication. I will open a PR in the next few days.

@gschorcht
Copy link
Contributor Author

@MichelRottleuthner Thank you for figuring out the problem with the duplicate ACKs. It was a very useful. May I ask you to test it again with the fix?

@miri64 miri64 self-assigned this Aug 5, 2019
@MichelRottleuthner
Copy link
Contributor

I tested again. No more duplicate acks and now a graceful disconnect after sending data and executing tcp disconnect.

If you prefer, I could provide the fix within a separate PR and could rebase this PR to it.

I don't mind but I'd prefer to have the fix in a separate commit so it is not completely mixed with the lwip related netdev stuff.

@gschorcht gschorcht added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed State: waiting for other PR State: The PR requires another PR to be merged first CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Aug 6, 2019
@gschorcht
Copy link
Contributor Author

gschorcht commented Aug 6, 2019

@miri64 After fixing and cleaning up cpu/esp32/Makefile.include (Thanks to @MichelRottleuthner for testing), the light for the rebased and squashed version of this PR green now.

@@ -122,6 +122,10 @@ ifeq ($(QEMU), 1)
CFLAGS += -DQEMU
endif

ifneq (,$(filter lwip,$(USEMODULE)))
CFLAGS += -DTCPIP_THREAD_PRIO=2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this about?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is required, then this shouldn't be hidden in the platform code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is platform specific. By default the lwIP's tcpip thread has a priority of 1. However, in ESP32, the thread that handles hardware interrupts from the WiFi interface has also a priority of 1. From my point of view, the hardware driver should have a higher priority than the TCP/IP thread to be sure that hardware is always handled in time.

I figured it out and changed it when I was investigating the problem with the duplicate ACKs. This change was already covered by the test by @MichelRottleuthner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no doubts that it works, I just fear that it might cause problems later on, with people not expecting that something related to an external package is configured within a cpu Makefile. How about flipping it around and putting it in pkg/lwip/Makefile.include? Including some explanation why it is required.

@@ -122,6 +122,10 @@ ifeq ($(QEMU), 1)
CFLAGS += -DQEMU
endif

ifneq (,$(filter lwip,$(USEMODULE)))
CFLAGS += -DTCPIP_THREAD_PRIO=2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no doubts that it works, I just fear that it might cause problems later on, with people not expecting that something related to an external package is configured within a cpu Makefile. How about flipping it around and putting it in pkg/lwip/Makefile.include? Including some explanation why it is required.

@gschorcht
Copy link
Contributor Author

Do you think it is better to place ESP32 specific make code in pkg/lwip/Makefile.include? Hm, I do not feel very comfortable with that.

@miri64
Copy link
Member

miri64 commented Aug 6, 2019

Do you think it is better to place ESP32 specific make code in pkg/lwip/Makefile.include? Hm, I do not feel very comfortable with that.

This is not unheard of: nimble e.g. pulls in architecture dependent modules in its Makefile:

ifeq (nrf52,$(CPU_FAM))
USEMODULE += nimble_drivers_nrf52
endif

Both approaches have an ugly side for sure, but I feel far more comfortable if we add a platform-specific config to an external package's configuration than configuring a platform for an external package.

@gschorcht
Copy link
Contributor Author

gschorcht commented Aug 6, 2019

Maybe, a better place could be here


where the overridden default definitions reside in lwIP

+#ifdef CPU_ESP32
+#define TCPIP_THREAD_PRIO   2
+#endif

@miri64
Copy link
Member

miri64 commented Aug 6, 2019

Maybe, a better place could be here

where the overridden default definitions reside in lwIP

+#ifdef CPU_ESP32
+#define TCPIP_THREAD_PRIO   2
+#endif

Right. With lwip we have this ability. It still follows my argument about config semantics, so you may decide which of the two approaches (pkg/lwip/Makefile or pkg/lwip/include/lwipopts.h) you prefer.

@miri64
Copy link
Member

miri64 commented Aug 6, 2019

In either case, don't forget documenting the "why" ;-)

The option value length of Ethernet addresses can be more than 6 byte in lwIP. Therefore, the max_len parameter is check to be greater than or equal to ETHERNET_ADDR_LEN.
To be able to access the single esp_wifi network device from lwIP adaptation layer, static keyword was removed from esp_wifi_dev variable.
The changes allow to use an esp_wifi network device of ESP32 with lwIP.
@gschorcht gschorcht force-pushed the cpu/esp32/lwip_netdev branch 2 times, most recently from 5767797 to d27dfb3 Compare August 6, 2019 17:22
@miri64
Copy link
Member

miri64 commented Aug 6, 2019

Thanks! Please squash 5610802, b40a119, and d27dfb3 into one commit.

 To avoid priority conflicts with the WiFi hardware driver thread which has priority of 1, the default thread priority of lwIP's TCP/IP thread is decreased to 2.
@gschorcht
Copy link
Contributor Author

gschorcht commented Aug 7, 2019

Squashed. Unfortunately, Travis failed due to connection errors to github.com. How can we trigger Travis restart the static tests?

@miri64
Copy link
Member

miri64 commented Aug 7, 2019

Squashed. Unfortunately, Travis failed due to connection errors to github.com. How can we trigger Travis restart the static tests?

Travis sems fine so did you restart yourself? You can do that by logging in with your GitHub account via OAuth. Maintainership should grant you permission to hit the restart button then.

Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Code looks fine and was tested by @MichelRottleuthner.

@miri64 miri64 merged commit 1dc6bde into RIOT-OS:master Aug 7, 2019
@gschorcht
Copy link
Contributor Author

@miri64 @MichelRottleuthner Thanks for reviewing and testing it.

@gschorcht gschorcht deleted the cpu/esp32/lwip_netdev branch August 7, 2019 06:16
@gschorcht gschorcht restored the cpu/esp32/lwip_netdev branch September 12, 2019 16:37
@kb2ma kb2ma added this to the Release 2019.10 milestone Sep 16, 2019
@gschorcht gschorcht deleted the cpu/esp32/lwip_netdev branch October 19, 2019 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ESP Platform: This PR/issue effects ESP-based platforms Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TCP lwIP Error Connecting Sock Problem on ESP32
4 participants