fix 100% cpu usage on linux while using sqs #1189

Merged on May 15, 2020 (1 commit)

Conversation

and800
Contributor

@and800 and800 commented May 13, 2020

fix celery/celery#5299

kombu uses cURL's multi_socket API, which is a non-blocking, callback-based API. It doesn't work well on Linux because of epoll's awkward file descriptor management. With epoll, you must always remove an FD from the epoll set before closing it, but when curl decides it no longer needs a particular socket, it closes the socket first and only afterwards informs you that the socket is closed, so you never get a chance to remove it from epoll in time. See https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html for more info about this API.
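To illustrate why closing before unregistering is a problem, here is a minimal sketch of one case documented in epoll(7). It is not taken from kombu or from this PR: the function name `stale_events_after_close` is hypothetical, and the `os.dup` call is an assumption standing in for any second reference to the same open file description (a forked worker process would be another). Which exact path triggers the busy loop in kombu is not stated here.

```python
import os
import select
import socket


def stale_events_after_close():
    """Return what epoll reports for an fd number that was already closed.

    Returns None on platforms without epoll (anything but Linux).
    """
    if not hasattr(select, "epoll"):
        return None

    reader, writer = socket.socketpair()
    ep = select.epoll()
    fd = reader.fileno()
    ep.register(fd, select.EPOLLIN)

    # Assumption for the demo: something else still references the same
    # open file description, so closing `fd` does not drop the registration.
    extra_ref = os.dup(fd)

    writer.send(b"x")  # make the socket readable
    reader.close()     # close the fd WITHOUT unregistering it first

    try:
        # Level-triggered epoll keeps reporting the closed fd number.
        return ep.poll(0.1)
    finally:
        ep.close()
        os.close(extra_ref)
        writer.close()


if __name__ == "__main__":
    print(stale_events_after_close())
```

Because the registration is level-triggered and the fd number can no longer be passed to `unregister`, an event loop that polls in this state would wake up on every iteration.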

In our case we periodically query an SQS queue, and curl performs the queries over a keep-alive HTTPS connection. But sometimes (about once every minute or two) SQS sends a response with a Connection: close header, and curl then closes the socket.

I suggest this workaround: before every curl invocation, remove every curl-related file descriptor from our selector. After we get control back from curl, add them back (most of the time curl will have updated the socket states, so we must take the new states into account). As a result, we end up using epoll in the old-fashioned select/poll way.
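The workaround described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: it uses Python's standard `selectors` module rather than kombu's hub, and `call_with_fds_detached`, `fake_transfer_step`, and `demo` are hypothetical names. The fake step stands in for the curl call, which may close sockets or change the events it wants.

```python
import selectors
import socket


def call_with_fds_detached(selector, tracked, action):
    """Run `action` while none of the `tracked` fds are registered.

    `tracked` maps fd -> event mask. `action` receives a copy of that
    mapping and returns the mapping to register afterwards.
    """
    for fd in tracked:
        try:
            selector.unregister(fd)
        except (KeyError, ValueError, OSError):
            pass  # already gone; nothing to detach
    updated = action(dict(tracked))
    for fd, events in updated.items():
        selector.register(fd, events)
    return updated


def demo():
    sel = selectors.DefaultSelector()
    a, peer_a = socket.socketpair()
    b, peer_b = socket.socketpair()
    tracked = {a.fileno(): selectors.EVENT_READ,
               b.fileno(): selectors.EVENT_READ}
    for fd, events in tracked.items():
        sel.register(fd, events)

    def fake_transfer_step(state):
        # Stand-in for the curl call: it closes one socket on its own and
        # switches the other one to wait for writability.
        state.pop(a.fileno())
        a.close()
        state[b.fileno()] = selectors.EVENT_WRITE
        return state

    tracked = call_with_fds_detached(sel, tracked, fake_transfer_step)
    registered = {key.fd: key.events for key in sel.get_map().values()}
    expected = {b.fileno(): selectors.EVENT_WRITE}

    sel.close()
    for s in (peer_a, b, peer_b):
        s.close()
    return registered, expected


if __name__ == "__main__":
    print(demo())
```

Since no fd is registered while the step runs, a socket closed inside it can never leave a stale registration behind; the cost is an extra unregister/register round trip per call.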

@auvipy
Member

auvipy commented May 13, 2020

Thanks for handling this. The tests would need to be updated too.

@auvipy auvipy added this to the 4.6.0 milestone May 13, 2020
@and800
Contributor Author

and800 commented May 13, 2020

yup, that's still a draft, not completely sure about it

@auvipy
Member

auvipy commented May 13, 2020

OK keep working on it

@and800 and800 force-pushed the master branch 2 times, most recently from 0f4de1d to 5b25033 Compare May 14, 2020 16:14
@and800 and800 marked this pull request as ready for review May 14, 2020 17:33
@and800
Contributor Author

and800 commented May 14, 2020

It seems to work well @auvipy

Member

@auvipy auvipy left a comment

LGTM

Linked issue: Celery worker using 100% CPU around epoll w/ prefork+SQS but still consuming tasks
2 participants