do not always retry write to pipe if we get blocking errors #519

Closed

@etamme etamme commented May 21, 2015

If you have a RabbitMQ event subscription, the event_rabbitmq module will shm_alloc rmq events and write pointers to the structs into the pipe.

If the node you are connected to goes down, the pipe starts to fill. When it reaches its maximum capacity (65535 bytes, or approximately 2700 events based on the 8-byte pointer), the write call returns EAGAIN, turning the while loop into an infinite loop until pointers start getting pulled off the pipe. This causes massive CPU consumption and blocks any process that generates an event for event_rabbitmq.
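To make the failure mode concrete, here is a minimal standalone sketch (not code from the module) that fills a non-blocking pipe with pointer-sized writes until `write()` fails with EAGAIN. The function name `count_pointer_writes_until_full` is made up for illustration, and the count it returns depends on the platform's pipe capacity, so it may not match the figure above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: fill a non-blocking pipe with pointer-sized writes
 * and return how many fit before write() fails with EAGAIN/EWOULDBLOCK.
 * Returns -1 on any other error. Nothing ever reads from the pipe, which
 * mimics a consumer stuck on an unreachable AMQP node. */
long count_pointer_writes_until_full(void)
{
    int fds[2];
    long count = 0;
    long result = -1;
    void *ptr = NULL; /* stand-in for a pointer to a shm-allocated event */

    if (pipe(fds) < 0)
        return -1;

    int flags = fcntl(fds[1], F_GETFL);
    if (flags >= 0 && fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == 0) {
        for (;;) {
            ssize_t rc = write(fds[1], &ptr, sizeof(ptr));
            if (rc == (ssize_t)sizeof(ptr)) {
                count++;        /* the pointer fit, keep going */
                continue;
            }
            if (rc < 0 && errno == EINTR)
                continue;       /* interrupted, not a full pipe */
            if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                result = count; /* pipe is full */
            break;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Once the pipe is in this state, a loop that retries unconditionally on EAGAIN can only spin until a reader drains at least one pointer.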

Attempts to publish to RabbitMQ time out after 3 minutes, based on the default system TCP timeout, so only one event is pulled from the pipe every 3 minutes while the AMQP node is down.

You can reproduce this issue by setting up a proxy with an event_rabbitmq subscription, adding an iptables rule to block access to the AMQP node, and sending traffic to the proxy that generates events until you hit the maximum pipe size (~2703 event pointers).

This commit changes the logic to simply retry the write 3 times, then abort.

…auses the proxy to lock up and consume cpu
@razvancrainea razvancrainea self-assigned this May 22, 2015
@josephfrazier

Note that event_xmlrpc has the same problem as well:

} while ((rc < 0 && (IS_ERR(EINTR)||IS_ERR(EAGAIN)||IS_ERR(EWOULDBLOCK)))

EDIT: Whoops, didn't read closely enough. The parentheses in the above example are arranged differently, so retries is honored regardless of error code.

@razvancrainea
Member

The PR was committed in the master branch, with a few changes. If everything is ok now, let me know so I can backport it and close the PR.

Thanks,
Răzvan

@razvancrainea
Member

Backported to 2.1. Closing the ticket.
