Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

event loop loop forever: make lots of sock_connec at the same time #378

Closed
Catstyle opened this issue Jan 9, 2021 · 2 comments · Fixed by #391
Closed

event loop loop forever: make lots of sock_connec at the same time #378

Catstyle opened this issue Jan 9, 2021 · 2 comments · Fixed by #391

Comments

@Catstyle
Copy link

Catstyle commented Jan 9, 2021

  • uvloop version: 0.14.0
  • Python version: 3.8.3
  • Platform: ubuntu 20.04 on wsl2
  • Can you reproduce the bug with PYTHONASYNCIODEBUG in env?: yes
  • Does uvloop behave differently from vanilla asyncio? How?: yes, asyncio works well

server side: accept connections and do nothing, just close it after a few seconds, simulating heartbeat timeout
client side: connect to server side a lots in the same time, e.g. -c 5000

it succeeded the first time, failed the second time
i cannot even terminate it, it loop forever, all i can do is to kill -9
traceback logs:

01-09 17:00:36.379 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.379 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.379 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.379 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.380 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.380 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.380 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.387 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.387 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
01-09 17:00:36.387 ERROR [pid|7281] Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state

after some checks, seems like the failed sock did not remove the writer correctly(or not in time?)
fut's done_callback is scheduled after the first time loop._sock_connect_cb invoked
at that time related writer is not removed and the socket fd is reused under heavy pressure?
then the same callback is invoked but for different fd?

checked asyncio, it check fut.done(), and uvloop check fut.cancelled()

@Catstyle Catstyle changed the title event loop event loop loop forever: make lots of sock_connec at the same time Jan 9, 2021
@fantix fantix added this to Triage Needed in Fantix's Sorting Board Jan 20, 2021
@fantix
Copy link
Member

fantix commented Jan 21, 2021

Hey, thanks for the report! But please provide a reproducible sample code. I've tried several batches of 10k concurrent connections on the same server but cannot reproduce the issue.

@fantix fantix moved this from Triage Needed to Question, Discussion & Others in Fantix's Sorting Board Jan 21, 2021
@Catstyle
Copy link
Author

finally got some time
i wrote some naive codes, just to illustrate the issue

server.py

import asyncio
import socket

import uvloop
uvloop.install()

count = 0
backlog = 1024


async def connection_made(sock, loop):
    global count
    count += 1
    print(f'{loop.time()}: {count}, {sock} connected')
    await asyncio.sleep(1000)


def accept(sock):
    loop = asyncio.get_running_loop()
    for _ in range(backlog):
        try:
            conn, addr = sock.accept()
        except (BlockingIOError, InterruptedError, ConnectionAbortedError):
            return
        conn.setblocking(False)
        asyncio.create_task(connection_made(conn, loop))


async def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
    sock.bind(('localhost', 56789))
    sock.listen(backlog)

    loop = asyncio.get_running_loop()
    loop.add_reader(sock.fileno(), accept, sock)

    await asyncio.sleep(1000)


if __name__ == '__main__':
    asyncio.run(main())

client.py

import asyncio
import socket
import sys

import uvloop
uvloop.install()

count = 0


async def connect(addr, loop):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    await loop.sock_connect(sock, addr)
    return sock


async def wrapper(loop):
    while 1:
        try:
            sock = await connect(('localhost', 56789), loop)
        except BlockingIOError:
            await asyncio.sleep(0)
        else:
            break

    global count
    count += 1
    print(f'{loop.time()}: {count}, {sock} connected')
    await asyncio.sleep(1000)


async def main(concurrency):
    tasks = []
    loop = asyncio.get_running_loop()
    for x in range(concurrency):
        tasks.append(asyncio.create_task(wrapper(loop)))

    await asyncio.gather(*tasks)


if __name__ == '__main__':
    # python client.py 1000
    asyncio.run(main(int(sys.argv[1])))

concurrency = 1000, works well

267544.168: 996, <socket.socket fd=315, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 48500), raddr=('127.0.0.1', 56789)> connected
267544.168: 997, <socket.socket fd=314, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 48498), raddr=('127.0.0.1', 56789)> connected
267544.168: 998, <socket.socket fd=313, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 48496), raddr=('127.0.0.1', 56789)> connected
267544.168: 999, <socket.socket fd=311, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 48494), raddr=('127.0.0.1', 56789)> connected
267544.168: 1000, <socket.socket fd=312, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 48492), raddr=('127.0.0.1', 56789)> connected

concurrency = 5000, oops

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 82, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 1072, in uvloop.loop.Loop._sock_connect_cb
asyncio.exceptions.InvalidStateError: invalid state
Exception in callback Loop._sock_connect
handle: <Handle Loop._sock_connect>
Traceba^Z
[1]+  Stopped                 python client.py 5000

using vanilla asyncio, works well

267761.563129303: 4996, <socket.socket fd=3588, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 38836), raddr=('127.0.0.1', 56789)> connected
267761.563196003: 4997, <socket.socket fd=3592, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 38838), raddr=('127.0.0.1', 56789)> connected
267761.563289903: 4998, <socket.socket fd=3598, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 38840), raddr=('127.0.0.1', 56789)> connected
267761.563356603: 4999, <socket.socket fd=3604, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 38842), raddr=('127.0.0.1', 56789)> connected
267761.563436503: 5000, <socket.socket fd=3602, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 38844), raddr=('127.0.0.1', 56789)> connected

some kernel config
net.core.netdev_max_backlog = 1000
net.ipv4.tcp_max_syn_backlog = 512

@fantix fantix moved this from Question, Discussion & Others to Triage Needed in Fantix's Sorting Board Jan 27, 2021
fantix added a commit to fantix/uvloop that referenced this issue Feb 8, 2021
CPython fixed the same issue in python/cpython#10419. Seems like under
pressure, more write callbacks may happen before _remove_writer() is
called, so we should check for done().

Fixes MagicStack#378
fantix added a commit to fantix/uvloop that referenced this issue Feb 9, 2021
CPython fixed the same issue in python/cpython#10419. Seems like under
pressure, more write callbacks may happen before _remove_writer() is
called, so we should check for done().

Fixes MagicStack#378
fantix added a commit to fantix/uvloop that referenced this issue Feb 9, 2021
Use _SyncSocketWriterFuture for cancellation, so that we could remove
the writer directly in the callback.

Fixes MagicStack#378
Closes MagicStack#389
fantix added a commit to fantix/uvloop that referenced this issue Feb 9, 2021
Use _SyncSocketWriterFuture for cancellation, so that we could remove
the writer directly in the callback.

Fixes MagicStack#378
Closes MagicStack#389
fantix added a commit that referenced this issue Feb 9, 2021
Use _SyncSocketWriterFuture for cancellation, so that we could remove
the writer directly in the callback.

Fixes #378
Closes #389
@fantix fantix removed this from Triage Needed in Fantix's Sorting Board Feb 9, 2021
This was referenced Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants