
Fix "max number of clients reached" errors and perf improvements #797

Merged
timgl merged 7 commits into master from redis-connections-error on May 19, 2020

Conversation

@timgl (Collaborator) commented on May 18, 2020

Changes

  • Add broker_pool_limit=0 (see explanation below)
  • Merge in minor improvements to the number of queries
  • Cache Team lookups by API token

This should solve Daz's issues.

Previously, Celery would spin up too many connections to Redis when many events were inserted.
See load test:
[screenshot: load test results]
Error message:
[screenshot: Redis "max number of clients reached" error]

Load testing

With 1 Standard-1x web dyno I was able to achieve an average of 20/sec.
[screenshot: load test results]

With 2 Standard-2x web dynos I was able to achieve 44/sec.
[screenshot: load test results]

After caching Teams there was no real difference in throughput, though response times and memory load were lower.
[screenshot: load test results]
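The throughput figures above come from load testing event insertion. The exact tool isn't shown in this PR, but a rough sketch of that kind of test could look like the following; the host, /capture/ endpoint path, api_key, and payload shape are placeholders for illustration, not taken from this PR.

```python
# Rough sketch of an event-insertion load test (not the script used in this PR).
# The host, endpoint path, and api_key are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

HOST = "https://posthog.example.com"
EVENT = {
    "api_key": "<project-api-token>",
    "event": "load_test_event",
    "properties": {"distinct_id": "load-test-user"},
}

def send_event(_):
    # Posts one event and returns the HTTP status code.
    return requests.post(f"{HOST}/capture/", json=EVENT, timeout=10).status_code

start = time.time()
with ThreadPoolExecutor(max_workers=20) as pool:
    statuses = list(pool.map(send_event, range(1000)))
elapsed = time.time() - start

print(f"{len(statuses) / elapsed:.1f} events/sec, status codes: {set(statuses)}")
```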

Each time, Redis used about 21 connections, so the advice would be to get a Redis plan with slightly more connections than that.
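For reference, one way to confirm the connection count on the Redis side is the INFO clients section; here is a minimal redis-py sketch (the URL is a placeholder):

```python
import redis  # redis-py client

# Placeholder URL; point this at the Redis instance under test.
r = redis.from_url("redis://localhost:6379")

# connected_clients needs to stay below the Redis plan's client limit;
# with this PR it hovered around ~21 under load.
print(r.info("clients")["connected_clients"])
```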

Checklist

  • All querysets/queries filter by Team (if applicable)
  • Backend tests (if applicable)

@timgl requested a review from EDsCODE (May 18, 2020 19:03)
@timgl temporarily deployed to posthog-redis-connectio-lfxp5v (May 18, 2020 19:04)
@timgl changed the title from Try broker_pool_limit to Fix "max number of clients reached" errors and perf improvements (May 18, 2020)
@timgl temporarily deployed to posthog-redis-connectio-lfxp5v (May 18, 2020 21:41)
@EDsCODE (Member) left a comment

Code LGTM. I'm not sure how to mimic the conditions for the client failure locally.

try:
    team_id = Team.objects.only('pk').get(api_token=token).pk
    if team_id:
        TEAM_ID_CACHE[token] = team_id
@EDsCODE (Member) commented on this diff:

does this cache ever get flushed?

@timgl (Collaborator, author) replied:

It does when the server gets restarted. However, the token is fairly static; I don't think there are many situations where you'd change it.
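To make the pattern under discussion concrete, here is a minimal sketch of a module-level team-ID cache, assuming TEAM_ID_CACHE is a plain dict defined next to the lookup; the helper name get_team_id_cached, the import path, and the DoesNotExist handling are illustrative, not the exact code in this PR.

```python
# Illustrative sketch only; helper name and error handling are assumptions.
from posthog.models import Team  # assumed import path

TEAM_ID_CACHE: dict = {}

def get_team_id_cached(token: str):
    # Repeat lookups for the same api_token are served from memory,
    # saving one Team query per captured event.
    if token in TEAM_ID_CACHE:
        return TEAM_ID_CACHE[token]
    try:
        team_id = Team.objects.only('pk').get(api_token=token).pk
        if team_id:
            TEAM_ID_CACHE[token] = team_id
        return team_id
    except Team.DoesNotExist:
        return None
```

Because the dict lives at module scope it persists for the lifetime of the process, which is why it is only "flushed" on a server restart.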

@@ -20,6 +20,10 @@
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

# Make sure Redis doesn't add too many connections
# https://stackoverflow.com/questions/47106592/redis-connections-not-being-released-after-celery-task-is-complete
app.conf.broker_pool_limit = 0
@EDsCODE (Member) commented:

Makes sense. Curious whether needing to open and close connections will become a bottleneck anyway, but the load testing seems to hold up.

@timgl (Collaborator, author) replied:

Hm, yes, I think it does slow things down a little, but at least it works, especially with the cheaper Redis instances.
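For context on the tradeoff being discussed: in Celery, setting broker_pool_limit to 0 (or None) disables the broker connection pool, so a connection is established and closed for every task publish rather than held open. A small sketch of the two options; the app object stands in for the existing one in celery.py, and the alternative value is illustrative.

```python
from celery import Celery

app = Celery("posthog")  # stands in for the existing app object in celery.py

# With the pool disabled, each .delay()/.apply_async() opens and closes its
# own Redis connection: the concurrent connection count stays low, at the
# cost of a little extra per-publish overhead.
app.conf.broker_pool_limit = 0

# Alternatively, a small finite pool keeps a few connections open, trading a
# handful of extra Redis clients for less connection churn:
# app.conf.broker_pool_limit = 2
```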

@timgl merged commit b69411f into master (May 19, 2020)
@timgl deleted the redis-connections-error branch (May 19, 2020 10:02)