Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wsrelay] Give connection tasks time to clean up #14097

Merged
merged 1 commit into from
Jun 23, 2023

Conversation

relrod
Copy link
Member

@relrod relrod commented Jun 7, 2023

SUMMARY

When we close/cancel a connection to a web node, give the task time to clean up after itself and cleanly exit. Otherwise, the Python GC might clean up the task too early and this leads to ugly log messages like this: "Task was destroyed but it is pending!"

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

@relrod relrod marked this pull request as ready for review June 7, 2023 20:50
@fosterseth
Copy link
Member

were you able to reproduce the scenario that logged the "Task was destroyed.." message?

@relrod
Copy link
Member Author

relrod commented Jun 8, 2023

I wasn't able to reproduce it, but I am fairly confident that this would cause that in some conditions. Even if there's something else causing that, this is still something that I think needs to be fixed.

@relrod
Copy link
Member Author

relrod commented Jun 8, 2023

https://bugs.python.org/issue44665 - This is kind of a reference, if you need a citation 😄

And it specifically mentions the error we're seeing. The issue itself is about adding asyncio documentation to prevent doing exactly what the current code is doing (killing off the strong reference to the task before it is completed).

This seems to be a well-known phenomenon in python asyncio, you have to keep a strong reference around to tasks you create with create_task() or bad things can happen.

awx/main/wsrelay.py Outdated Show resolved Hide resolved
@AlanCoding
Copy link
Member

Needs a rebase.

I also kind of liked the callback method in the python bug you linked task.add_done_callback(lambda t: running_tasks.remove(t)), since it would allow instance ejection from the tracking dictionary. But this interest is almost entirely academic, as there's no good functional reason for such a change.

You should consider my comment about running these concurrently, but either way there is probably little risk because removing a node is an infrequent task and this is unlikely to choke because the main loop gets stalled waiting for the cancellations.

@AlanCoding
Copy link
Member

I see the error mentioned in issues #14057 and #14003 and both mention a fairly ancient resolution of the k8s log size issue. So I think we're good.

When we close/cancel a connection to a web node, give the task time to
clean up after itself and cleanly exit. Otherwise, the Python GC might
clean up the task too early and this leads to ugly log messages like
this: "Task was destroyed but it is pending!"

Signed-off-by: Rick Elrod <rick@elrod.me>
@relrod relrod force-pushed the wsrelay-collect-your-garbage branch from 4249f33 to 272b91c Compare June 22, 2023 15:49
@relrod relrod requested a review from AlanCoding June 22, 2023 16:34
Copy link
Member

@AlanCoding AlanCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, very async

@relrod relrod merged commit 255c2e4 into ansible:devel Jun 23, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants