Opening socket connections fails with invalid argument #2523

Closed
jamal opened this issue Sep 25, 2017 · 11 comments

Comments

jamal commented Sep 25, 2017

  • Your Windows build number: Microsoft Windows [Version 10.0.15063]

  • What you're doing and what's happening: I am unable to open any socket connection; every attempt fails with an "invalid argument" error. I first noticed this while running a Go service I had been working on the night before, which started failing to open a connection on a local port. I then found that anything that tried to open a socket connection would fail (telnet, for example).

An example:

$ telnet google.com 80
Trying 172.217.3.206...
Trying 2607:f8b0:400a:809::200e...
telnet: Unable to connect to remote host: Invalid argument

and another from a Go example that simply calls net.Dial (code here https://gist.github.com/jamal/4c93c2ccf1b28c1bbe3e2eb935280c91):

$ go run dial.go
panic: dial tcp 172.217.3.206:80: connect: invalid argument

As I mentioned, this was working just fine the night before. The only thing I can think of is that this bash session had been running inside a tmux session for quite some time (probably close to a week), which I hadn't done before.
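
For anyone without Go or telnet at hand, the same check can be sketched in Python 3. This is only an illustration, not the code from the gist: `probe` is a made-up helper name, and the host and port in the `__main__` block are placeholders. On an affected system one would expect it to report `EINVAL`, matching the "invalid argument" messages above.

```python
import errno
import socket

def probe(host, port, timeout=3.0):
    """Try one TCP connect; return None on success or the errno name on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as e:
        # e.errno may be None (timeout) or a resolver code that is not a
        # real errno; fall back to the exception text in those cases.
        return errno.errorcode.get(e.errno, str(e))

if __name__ == "__main__":
    # Placeholder target; any reachable host/port will do.
    print(probe("127.0.0.1", 80))
```

Probing 127.0.0.1 as well as a remote host helps tell a routing problem apart from the system-wide failure described in this issue.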

jamal commented Sep 25, 2017

I noticed something else odd as well: the bash window was open but completely frozen. I couldn't quit it, and bash.exe was not listed in tasklist or Task Manager. Any new bash window I opened just stayed black. I was using tmux inside WSL Terminal (https://github.com/goreliu/wsl-terminal), which may also be doing something quirky. After restarting, things worked as expected.

@therealkenc
Collaborator

Your Windows build number: Microsoft Windows [Version 10.0.15063]

That's pretty old by WSL standards, so upgrade to the Fall Creators Update if you can, and see if that kicks your system in the pants enough to resolve the problem.

kghost commented Dec 23, 2017

I experienced exactly the same problem today: all connect calls return Invalid argument, even to 127.0.0.1.

Even worse, not only does WSL fail; the Win32 side fails as well.

As soon as I closed all wsl processes, the problem disappeared.

sunilmut commented Jan 2, 2018

@kghost - We will need more information, with repro-steps and Windows version. See instructions.

Tao-T commented Feb 22, 2018

@sunilmut - It seems that WSL fails to release bound ports when TCP connections in the SYN state are refused.

>netstat -qn |findstr BOUND
TCP    127.0.0.1:12400        0.0.0.0:0              BOUND
TCP    [::1]:12400            [::]:0                 BOUND

When applications keep failing to establish such connections, the bound local port number keeps increasing. The global TCP connection error appears just as the port number reaches 65535.

Win10 version: 10.0.16299.248
(I saw a similar phenomenon on Insider builds 17083 and 17074 before rolling back, but without netstat -qn output)
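
To watch the leak progress, the BOUND rows can be counted from the netstat output. Below is a minimal Python 3 sketch, assuming the four-column layout shown above; `count_bound` is a hypothetical helper name, and the LISTENING row in `SAMPLE` is made up for illustration (only the two BOUND rows come from this comment).

```python
# Two BOUND rows copied from the comment above, plus one made-up
# LISTENING row to show that other states are ignored.
SAMPLE = """\
TCP    127.0.0.1:12400        0.0.0.0:0              BOUND
TCP    [::1]:12400            [::]:0                 BOUND
TCP    127.0.0.1:445          0.0.0.0:0              LISTENING
"""

def count_bound(netstat_output):
    """Return (number of BOUND rows, highest local port among them)."""
    ports = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "TCP" and fields[3] == "BOUND":
            # rsplit copes with both "127.0.0.1:12400" and "[::1]:12400".
            ports.append(int(fields[1].rsplit(":", 1)[1]))
    return len(ports), max(ports, default=None)

if __name__ == "__main__":
    print(count_bound(SAMPLE))
```

Feeding it the output of `netstat -qn` at intervals should show the count and the highest port climbing if ports are indeed not being released.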

@sunilmut
Member

@Tao-T - Do you have a repro that you can share?

kghost commented Feb 24, 2018

@sunilmut AFAIK, it can't be reproduced easily. Leaving WSL instances open for a long time may trigger the bug even without any activity in WSL. I have encountered it a few times per month.

Tao-T commented Feb 24, 2018

@sunilmut - the python2 script below should repro it in several minutes (with default settings)

#!/usr/bin/env python2
# Repro for the global TCP error in WSL: TCP ports are not released if the SYN fails

import sys, os
import time
import errno
import select
import socket

if os.environ.get('TEST_TCP6'):
    af = socket.AF_INET6
    phost = '::1'
else:
    af = socket.AF_INET
    phost = '127.0.0.1'

pport_start = 12400
if len(sys.argv)>1:
    pport_end = min(pport_start + int(sys.argv[1]), pport_start + 1000)
else:
    pport_end = pport_start + 120

target_addrs = [ (phost, x) for x in xrange(pport_start, pport_end) ]
poller = select.epoll()
mapping = {}

print 'peer ports [ %d, %d )' % (pport_start, pport_end)

reconns = target_addrs[:]

t0 = time.time()
t_pre = t0
n = 0
while 1:
    # Stop once nothing has needed a reconnect for more than 3 seconds.
    if len(reconns) == 0 and time.time() - t_pre > 3.0:
        print "test done: reconnect list stayed empty for over 3 seconds"
        sys.exit()
    # Start a non-blocking connect to every address that needs (re)connecting.
    for paddr in reconns:
        sock = socket.socket(af)
        sock.setblocking(False)
        mapping[sock.fileno()] = (paddr, sock)
        try:
            print "try conn", paddr
            n += 1
            sock.connect(paddr)
        except socket.error as e:
            if e.errno == errno.EINPROGRESS:
                # Expected for a non-blocking connect: wait until writable.
                poller.register(sock.fileno(), select.EPOLLOUT | select.EPOLLWRNORM)
            else:
                # Any other errno ends the test and is reported with the attempt count.
                sys.stdout.write('\naddr {}: e {} {} - counter {} tspan {}\n'.format(paddr, errno.errorcode[e.errno], e.strerror, n, time.time() - t0))
                sys.exit()
        else:
            print paddr, "direct done"
        print 'sock', sock.getsockname()
    reconns[:] = []
    rpolls = poller.poll()
    for fd, evs in rpolls:
        paddr, sock = mapping[fd]
        if evs & (select.EPOLLHUP | select.EPOLLERR):
            # Connect was refused or failed: close the socket and retry this address.
            print paddr, "hangup evs", evs
            poller.unregister(fd)
            sock.close()
            reconns.append(paddr)
            del mapping[fd]
        elif evs & (select.EPOLLOUT | select.EPOLLWRNORM):
            print paddr, "connected"
            poller.unregister(fd)
        else:
            sys.stdout.write('\naddr {}:unhandled ev {} - counter {}\n'.format(paddr, evs, n))
            sys.exit()
    if len(rpolls):
        t_pre = time.time()

@rockdaboot

Same problem with a C program using non-blocking sockets.

The first call to send() (after connect()) returns an error with errno set to 32 (EPIPE, Broken pipe). On every *nix I know of, errno is set to ENOTCONN, EINPROGRESS, or EAGAIN instead. A retry after poll / select then succeeds.

This behavior is unexpected and breaks existing software.

BTW, when using TCP Fast Open (old Linux style: sendto() instead of connect()), the sendto() returns an error with errno set to 22 (EINVAL, Invalid argument). In this case it indicates a real problem: it looks like TFO is either not implemented or implemented differently.
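
The send-then-retry pattern described above can be sketched in Python 3 (the original program is C and is not shown in this thread). This is only an illustration under assumptions: `connect_and_send` and `RETRYABLE` are made-up names, and treating EPIPE as retryable is a workaround for the WSL behavior reported here, not something to rely on elsewhere.

```python
import errno
import select
import socket

# errno values mentioned in this comment for an early send() on a socket
# whose non-blocking connect has not finished; EPIPE is the WSL oddity.
RETRYABLE = {errno.ENOTCONN, errno.EINPROGRESS, errno.EAGAIN, errno.EPIPE}

def connect_and_send(addr, payload, timeout=5.0):
    """Non-blocking connect, then send; retry once after the socket is writable."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    rc = sock.connect_ex(addr)
    if rc not in (0, errno.EINPROGRESS):
        sock.close()
        raise OSError(rc, "connect failed")
    try:
        # May fail because the handshake is still in progress.
        # (A short payload is assumed, so partial sends are not handled.)
        sock.send(payload)
        return sock
    except OSError as e:
        if e.errno not in RETRYABLE:
            sock.close()
            raise
    # Wait for the connect to finish (socket writable), then check its result.
    _, writable, _ = select.select([], [sock], [], timeout)
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) if writable else errno.ETIMEDOUT
    if err:
        sock.close()
        raise OSError(err, "connect did not complete")
    sock.send(payload)
    return sock
```

Checking SO_ERROR after select keeps a genuinely failed connect from being mistaken for the spurious early error.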

sunilmut commented Apr 9, 2018

@rockdaboot - Looks like what you are seeing is this

Contributor

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!
