ensure we don't loop trying to write to a channel thats not connected (fix 100% CPU) #419

djay · 2023-09-11T17:05:51Z

fixes #418

This fixes a race condition where a quick forced socket close makes getpeername fail, leaving connected == False. If the request then has an error, close_when_flushed is set, and this results in a 100% CPU loop.
This will ensure the channel is still closed in this case.
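
To make the loop easier to picture, here is a toy model of it. This is only a sketch under assumptions: `ToyChannel`, `spin`, and the `guard` parameter are hypothetical names, and `writable()` is simplified to "a close is pending". It is not the waitress code, it only mirrors the two guards shown in the diff on this PR.

```python
class ToyChannel:
    """Toy stand-in for a channel; NOT waitress's HTTPChannel."""

    def __init__(self, connected, close_when_flushed, guard):
        self.connected = connected
        self.close_when_flushed = close_when_flushed
        self.guard = guard  # "original" or "patched" (hypothetical switch)
        self.closed = False

    def writable(self):
        # Assumption: the loop keeps polling the channel while a close is pending.
        return not self.closed and self.close_when_flushed

    def handle_write(self):
        if self.guard == "original":
            if not self.connected:
                return  # early return: the pending close is never carried out
        else:
            if not self.connected and not self.close_when_flushed:
                return
        if self.close_when_flushed:
            self.closed = True


def spin(channel, max_iterations=1000):
    """Count loop iterations until the channel stops asking to be written."""
    iterations = 0
    while channel.writable() and iterations < max_iterations:
        channel.handle_write()
        iterations += 1
    return iterations
```

With the original guard, a channel that is not connected but has a pending close runs until the iteration cap, which is the busy loop in miniature. With the patched guard it closes on the first pass.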

TODO:

add some more specific tests for double close
test for closing an error request that's not connected
test for maintenance not cleaning up an unconnected task
change the fix to only prevent write specifically when already closed, not when not connected
undo the fix that closes if getpeername fails

djay · 2023-09-19T03:55:42Z

src/waitress/channel.py


    def handle_write(self):
        # Precondition: there's data in the out buffer to be sent, or
        # there's a pending will_close request

-        if not self.connected:
+        if not self.connected and not self.close_when_flushed:


Not sure if we should also avoid returning early when total_outbufs_len > 0, as this could also be a case where you want to write in order to get an error and close the channel.

@digitalresistor thoughts?

djay · 2023-09-19T03:58:10Z

src/waitress/channel.py

@@ -67,6 +67,9 @@ def __init__(self, server, sock, addr, adj, map=None):
        self.outbuf_lock = threading.Condition()

        wasyncore.dispatcher.__init__(self, sock, map=map)
+        if not self.connected:
+            # Sometimes can be closed quickly and getpeername fails.
+            self.handle_close()


@digitalresistor @d-maurer I'm still not sure about this fix. I think I read somewhere that getpeername can sometimes fail on Windows?
The other fix will still prevent the looping bug by letting it write and error out. This one closes the channel before it wastes the app's time, if the connection really is closed.

If getpeername() fails on Windows, then self.connected would be set to False anyway, and this would cause the bug. So trying to keep going after getpeername() has failed is not sustainable.
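
For reference, a small standalone sketch of how a failing getpeername() turns into a "not connected" flag. It uses only the standard `socket` module. `probe_connected` and `demo` are hypothetical helpers, and treating every `OSError` as "not connected" is a simplification. It does not reproduce the race itself, only the flag derivation described in this thread.

```python
import socket


def probe_connected(sock):
    """Return True if the socket reports a peer, False if getpeername() fails."""
    try:
        sock.getpeername()
    except OSError:
        # Simplification: treat every OSError as "not connected".
        return False
    return True


def demo():
    # A freshly created TCP socket has no peer, so getpeername() raises.
    fresh = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A socketpair gives two already-connected endpoints.
    left, right = socket.socketpair()
    try:
        return probe_connected(fresh), probe_connected(left)
    finally:
        for s in (fresh, left, right):
            s.close()
```

Calling `demo()` returns the flag for the unconnected socket first and for the connected one second.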

connected = False doesn't cause the bug alone. The request also needs to be malformed, as that prevents reading, writing, and maintenance from cleaning it up. So in some ways the real bug is in handle_write:

if not self.connected: return

The test I put in shows that, in the most likely scenario that triggers this, the code is trying to close the channel but is prevented from doing so by the line above.

The more I think about it, @d-maurer is correct that this should be changed to self.closed or something that explicitly prevents a close from happening twice. That's the safest minimal change.
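
A rough sketch of what guarding on a closed flag rather than on connected could look like. `GuardedChannel` and `close_calls` are hypothetical names used only for illustration; this is not the code that landed in the PR.

```python
class GuardedChannel:
    """Toy channel whose close is idempotent; NOT waitress's HTTPChannel."""

    def __init__(self):
        self.connected = False  # e.g. getpeername() failed at construction
        self.closed = False
        self.close_calls = 0  # counts how often the real close work runs

    def handle_close(self):
        if self.closed:
            return  # already closed: never run the close work twice
        self.closed = True
        self.close_calls += 1

    def handle_write(self):
        # Guard on "closed", not on "connected", so an unconnected channel
        # can still reach handle_close() instead of returning early forever.
        if self.closed:
            return
        self.handle_close()
```

The point of the design is that handle_write can always make progress towards closing, while repeated calls after the close are harmless no-ops.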

ensure we don't keep trying to write to a channel thats not connected

2c82632

djay mentioned this pull request Sep 11, 2023

100% cpu in mainthread due to not closing properly? (channel.connected == False) #418

Open

djay changed the title ~~ensure we don't keep trying to write to a channel thats not connected~~ ensure we don't loop trying to write to a channel thats not connected Sep 12, 2023

djay added 2 commits September 12, 2023 14:02

add test

26c3855

use writable to avoid loop instead

1b9e35c

djay changed the title ~~ensure we don't loop trying to write to a channel thats not connected~~ ensure we don't loop trying to write to a channel thats not connected (fix 100% CPU) Sep 13, 2023

djay added 12 commits September 14, 2023 10:58

add test for loop due to getpearname failing

04056ab

make test shutdown right after accept

5130bec

try keepalive to make connection close quickly

ea3cb3a

simplify test

a71f5c7

reproduce but with 0 sleep.

deea18a

reproduces loop but doesn't explain how data got written

f95d396

reproduce loop with send continue but fake EWOULDBLOCK

fb5f7e9

reproduce loop with request error quick close.

23e7b05

comment out parts of test not used

9c56472

fix looping

97620b3

remove test that didn't work

a3fda8c

clean up test and make pass

73a25c9

djay commented Sep 19, 2023


djay added 2 commits September 19, 2023 13:58

fix comments

f078e45

also fix maintenance not cleaning up broken channel

ae1bf12

djay marked this pull request as draft September 25, 2023 04:37

djay added 4 commits February 5, 2024 10:51

change fix to mark when closed and always end loop

fcb35a4

Merge branch 'main' into write_when_not_connected_loop

6ecfe4d

add git ignore

cb1b196

add extra tests to show the loop behaviour

b33848b

djay marked this pull request as ready for review February 5, 2024 06:45

digitalresistor mentioned this pull request Mar 3, 2024

Remove race condition when creating new HTTPChannel #435

Open
