Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure we don't loop trying to write to a channel thats not connected (fix 100% CPU) #419

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

djay
Copy link

@djay djay commented Sep 11, 2023

fixes #418

This fixes a race condition where a quick forced socket close will make getpeername to fail and connected == False. If the request then has an error close_after_flush is set and this will result in a 100% CPU loop.
This wiil ensure the channel is still closed in this case.

TODO:

  • add some more specific tests for double close
  • test for closing a error request thats not connected
  • test for maintainance not cleaning up an unconnected task
  • change the fix to only prevent write specifically when already closed, not when not connected
  • undo fix that closes if getpeername fails.

@djay djay changed the title ensure we don't keep trying to write to a channel thats not connected ensure we don't loop trying to write to a channel thats not connected Sep 12, 2023
@djay djay changed the title ensure we don't loop trying to write to a channel thats not connected ensure we don't loop trying to write to a channel thats not connected (fix 100% CPU) Sep 13, 2023

def handle_write(self):
# Precondition: there's data in the out buffer to be sent, or
# there's a pending will_close request

if not self.connected:
if not self.connected and not self.close_when_flushed:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should also prevent a return if total_outbufs_len > 0 as this could also be a case where you want to write to get an error and close the channel

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@digitalresistor thoughts?

@@ -67,6 +67,9 @@ def __init__(self, server, sock, addr, adj, map=None):
self.outbuf_lock = threading.Condition()

wasyncore.dispatcher.__init__(self, sock, map=map)
if not self.connected:
# Sometimes can be closed quickly and getpeername fails.
self.handle_close()
Copy link
Author

@djay djay Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@digitalresistor @d-mauer I'm still not sure on this fix. I think I read somewhere how windows can sometimes fail on getpeername?
The other fix will still prevent the looping bug by letting it write and error out. This one will close it before it wastes the app time if indeed the connection really is closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If getpeername() fails on Windows then it would get self.connected set to False anyway, this would cause the bug. So trying to keep going after getpeername() failed is not sustainable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connect = False doesn't cause the bug alone. It also needs the request to be malformed as that prevents both reading and writing and maintainance from cleaning it up. So in someways the real bug is in handle_write

if not self.connected:
    return

The test i put in shows that in the most likely scenario that makes this occur it's trying to close the channel but is prevented from doing so by the above line.

The more I think about it @d-maurer is correct that this should be changed to self.closed or something that explicitly prevents a close from happening twice. Thats the safest minimal change.

@djay djay marked this pull request as draft September 25, 2023 04:37
@djay djay marked this pull request as ready for review February 5, 2024 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

100% cpu in mainthread due to not closing properly? (channel.connected == False)
2 participants