Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix resource leak and improved wait_for_ready #2675

Merged

Conversation

RunDevelopment
Copy link
Member

I noticed that reloading the frontend would sometimes cause the host server to not respond anymore even though the worker was fine. The cause of this issue was a resource leak with our _session client session. Turns out, everything an aiohttp.ClientSession returns is a resource that must be disposed of. We were not doing that in a few methods, and aiohttp doesn't do auto-clean up apparently, so after a few leaks the entire backend host server would become unresponsive as _session would always timeout.

Leaks were hard to reproduce btw. I had to do 5 leaky requests to the worker's /nodes in parallel for to consistently trigger the bug.

The fix was to properly manage resource everywhere. I also improved wait_for_ready to not spam the worker with requests by (1) remembering whether we successfully connected to it before and (2) making processing concurrent calls to wait_for_ready sequentially which benefits from (1).

Copy link
Member

@joeyballentine joeyballentine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this might fix the weird stuff I was seeing the past few days

@joeyballentine joeyballentine merged commit 5b4fe11 into chaiNNer-org:main Mar 12, 2024
14 checks passed
@RunDevelopment RunDevelopment deleted the fix-backend-resource-leak branch March 12, 2024 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants