Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POSIX sockets + lwIP: bad file descriptor #11212

Closed
pipex opened this issue Mar 19, 2019 · 12 comments · Fixed by #11576
Closed

POSIX sockets + lwIP: bad file descriptor #11212

pipex opened this issue Mar 19, 2019 · 12 comments · Fixed by #11576
Assignees
Labels
Area: network Area: Networking Area: pkg Area: External package ports Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@pipex
Copy link

pipex commented Mar 19, 2019

Description

I've been testing the POSIX TCP/IP implementation in RIOT and I've come up with a few problems for the same code depending on the platform.

For testing I've created a simple echo server/client combining the posix sockets example and the LWIP TCP example. I tested the code on linux and it runs without issue (see the branch echo-v2-linux).

Steps to reproduce the issue

Testing the code on different platforms I get the following

  • On native
$  dist/tools/tapsetup/tapsetup -c 2
$ make all 

# For tap0 
$ PORT=tap0 make term
> ifconfig 
ET_00:  inet6 fe80::d864:daff:fed3:909
> echo server start 1234
Success: started TCP server on port 1234

# For tap1
$ PORT=tap0 make term
> echo send fe80::d864:daff:fed3:909 1234 hello
Success: connected to TCP server on port 1234
client send: Bad file descriptor

# If I enter the wrong port
> echo send fe80::d864:daff:fed3:909 12345 hello
Success: connected to TCP server on port 12345
client send: Bad file descriptor
  • On samr21-xpro
# For both boards
$ BOARD=samr21-xpro make all flash

# On board 1
$ BOARD=samr21-xpro PORT=<tty1> make term
> echo server start
INFO # Success: started TCP server on port 1234

# On board2
> echo send <addr> 1234 hello
INFO # Error in connect: Invalid argument
INFO # errno is unknown error: 22

# Same error with wrong port
  • iotlab-m3 I get the same issue that with samr21-xpro

I figured that the error 22 was the result of a an uninitialized netif on the call to _sockaddr_to_ep inside _bind_connect leading to a NULL return in _netif_to_bind_addr

However if y change the following code

diff --git a/sys/posix/sockets/posix_sockets.c b/sys/posix/sockets/posix_sockets.c
index ae7195b32..8c7f16708 100644
--- a/sys/posix/sockets/posix_sockets.c
+++ b/sys/posix/sockets/posix_sockets.c
@@ -222,6 +222,7 @@ static int _sockaddr_to_ep(const struct sockaddr *address, socklen_t address_len
             }
             struct sockaddr_in *in_addr = (struct sockaddr_in *)address;
             out->family = AF_INET;
+            out->netif = SOCK_ADDR_ANY_NETIF;
             out->addr.ipv4_u32 = in_addr->sin_addr.s_addr;
             out->port = ntohs(in_addr->sin_port);
             break;
@@ -233,6 +234,7 @@ static int _sockaddr_to_ep(const struct sockaddr *address, socklen_t address_len
             }
             struct sockaddr_in6 *in6_addr = (struct sockaddr_in6 *)address;
             out->family = AF_INET6;
+            out->netif = SOCK_ADDR_ANY_NETIF;
             memcpy(&out->addr.ipv6, &in6_addr->sin6_addr, sizeof(out->addr.ipv6));
             out->port = ntohs(in6_addr->sin6_port);
             break;

I get the "Bad file number" message on iotlab-m3 and a hard fault on samr21-xpro.

Am I missing something obvious? I still don't have such a good grasp of the lwip API to see what is wrong with the code.

Expected results

Similar to the result on echo-v2-linux

# Server
$ gcc echo.c main.c 
$ ./a.out server start 12345
Success: started TCP server on port 12345
TCP Client [::1]: 42963 connected
Received TCP data from client [::1]:42963:
Received: "hello"
Received TCP data from client [::1]:42963:
Disconnected
TCP connection to [::1]:42963 reset, starting to accept again

# Client 
$ ./a.out send ::1 12345 hello
Success: connected to TCP server on port 12345
hello

Actual results

Bad file descriptor on iotlab-m3 and native, and hardfault on samr21-xpro.

Versions

Operating System Environment
-----------------------------
       Operating System: Mac OS X 10.14.3
                 Kernel: Darwin 18.2.0 x86_64 i386

Installed compiler toolchains
-----------------------------
             native gcc: Apple LLVM version 10.0.0 (clang-1000.10.44.4)
      arm-none-eabi-gcc: arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 8-2018-q4-major) 8.2.1 20181213 (release) [gcc-8-branch revision 267074]
                avr-gcc: missing
       mips-mti-elf-gcc: missing
             msp430-gcc: missing
   riscv-none-embed-gcc: missing
   xtensa-esp32-elf-gcc: missing
   xtensa-lx106-elf-gcc: missing
                  clang: Apple LLVM version 10.0.0 (clang-1000.10.44.4)

Installed compiler libs
-----------------------
   arm-none-eabi-newlib: "3.0.0"
    mips-mti-elf-newlib: missing
riscv-none-embed-newlib: missing
xtensa-esp32-elf-newlib: missing
xtensa-lx106-elf-newlib: missing
               avr-libc: missing (missing)

Installed development tools
---------------------------
                  cmake: missing
               cppcheck: missing
                doxygen: missing
                 flake8: missing
                    git: git version 2.17.2 (Apple Git-113)
                   make: GNU Make 3.81
                openocd: Open On-Chip Debugger 0.10.0+dev-00730-ge2430759 (2019-03-06-17:21)
                 python: Python 3.7.2
                python2: missing
                python3: Python 3.7.2
             coccinelle: missing
@miri64 miri64 self-assigned this Mar 20, 2019
@miri64 miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: network Area: Networking Area: pkg Area: External package ports labels Mar 20, 2019
@miri64
Copy link
Member

miri64 commented Mar 20, 2019

I'll have a look at this in the next few days.

@miri64
Copy link
Member

miri64 commented May 23, 2019

Sorry for the delay... :-( I'm not really understanding what the problem is here. You say on samr21-xpro and that you have the same results on iotlab-m3 and native in "Actual results" but your initial description seems to hint at a different error?

Is the problem that the error numbers have different values on different platforms? Since those are provided by the libc we don't have any influence on that. I recommend to use the defines to identify them.

@pipex
Copy link
Author

pipex commented May 24, 2019

Hi Martine. Yes, you're right that the description is unclear.

What I meant is that I get the same results on samr21-xpro and iotlab-m3 BEFORE performing the following change (i think it's a bug)

diff --git a/sys/posix/sockets/posix_sockets.c b/sys/posix/sockets/posix_sockets.c
index ae7195b32..8c7f16708 100644
--- a/sys/posix/sockets/posix_sockets.c
+++ b/sys/posix/sockets/posix_sockets.c
@@ -222,6 +222,7 @@ static int _sockaddr_to_ep(const struct sockaddr *address, socklen_t address_len
             }
             struct sockaddr_in *in_addr = (struct sockaddr_in *)address;
             out->family = AF_INET;
+            out->netif = SOCK_ADDR_ANY_NETIF;
             out->addr.ipv4_u32 = in_addr->sin_addr.s_addr;
             out->port = ntohs(in_addr->sin_port);
             break;
@@ -233,6 +234,7 @@ static int _sockaddr_to_ep(const struct sockaddr *address, socklen_t address_len
             }
             struct sockaddr_in6 *in6_addr = (struct sockaddr_in6 *)address;
             out->family = AF_INET6;
+            out->netif = SOCK_ADDR_ANY_NETIF;
             memcpy(&out->addr.ipv6, &in6_addr->sin6_addr, sizeof(out->addr.ipv6));
             out->port = ntohs(in6_addr->sin6_port);
             break;

AFTER the change, I get a different error: "Bad file descriptor" on iotlab-m3 and native, and "hardfault" on samr21-xpro.

@miri64
Copy link
Member

miri64 commented May 24, 2019

Ah. Indeed this is a bug and should be fixed. I'll try to find out, why the samr21-xpro is crashing.

@miri64 miri64 changed the title POSIX TCP API: bad file descriptor POSIX sockets + lwIP: bad file descriptor May 24, 2019
@miri64
Copy link
Member

miri64 commented May 24, 2019

22 is EINVAL in newlib btw

@miri64
Copy link
Member

miri64 commented May 24, 2019

I'll try to find out, why the samr21-xpro is crashing.

The stack is to small for all the IP stacked on IP. If I patch your Makefile

diff --git a/src/Makefile b/src/Makefile
index e36691b..2784a3a 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -59,6 +59,7 @@ endif
 
 CFLAGS += -DSO_REUSE
 CFLAGS += -DLWIP_SO_RCVTIMEO
+CFLAGS += -DTHREAD_STACKSIZE_MAIN="(2*THREAD_STACKSIZE_DEFAULT)"
 #CFLAGS += -DLWIP_SOCK_TCP_ACCEPT_TIMEOUT=500
 #CFLAGS += -DPOSIX_SETSOCKOPT

I get a bad file descriptor for samr21-xpro as well.

@miri64
Copy link
Member

miri64 commented May 24, 2019

For the EINVAL bug there is now #11575 (I decided to use memset() instead to make it more future proof; I don't expect struct _sock_tl_ep to change, but who knows ;-)).

@miri64
Copy link
Member

miri64 commented May 24, 2019

So, now for the question, why write() returns EBADF in the first place.

As you may have noticed, that socket() uses the vfs module to create the file descriptor for the socket

int fd = vfs_bind(VFS_ANY_FD, 0, &socket_ops, s);

For write it basically just wraps around send().

static inline ssize_t socket_write(vfs_file_t *filp, const void *buf, size_t n)
{
return socket_sendto(filp->private_data.ptr, buf, n, 0, NULL, 0);
}
static const vfs_file_ops_t socket_ops = {
.close = socket_close,
.fcntl = NULL, /* TODO: provide when needed */
.fstat = socket_fstat,
.lseek = socket_lseek,
.read = socket_read,
.write = socket_write,
};

edit: somehow github excepted this comment too early, will continue in separate comment

@miri64
Copy link
Member

miri64 commented May 24, 2019

Appearently the flags for this check

RIOT/sys/vfs/vfs.c

Lines 325 to 328 in f6ee0ac

if (((filp->flags & O_ACCMODE) != O_WRONLY) & ((filp->flags & O_ACCMODE) != O_RDWR)) {
/* File not open for writing */
return -EBADF;
}

are not set accordingly.

@pipex
Copy link
Author

pipex commented May 24, 2019

That's great, thanks for your help. I don't think I could've solved it myself.

I'll patch the Makefile and wait for the PRs to be merged.

Thanks again!

@miri64
Copy link
Member

miri64 commented Aug 2, 2019

#11575 still needs to be merged for this issue to be considered fixed

@miri64 miri64 reopened this Aug 2, 2019
@miri64
Copy link
Member

miri64 commented Jul 4, 2020

#11575 was merged.

@miri64 miri64 closed this as completed Jul 4, 2020
@miri64 miri64 added this to the Release 2020.07 milestone Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking Area: pkg Area: External package ports Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants