
retries: add support for "silent" retries #295

Merged: 7 commits into master from silent-retries on Mar 30, 2020

Conversation

@Bogdanp (Owner) commented Mar 29, 2020

No description provided.

@Bogdanp (Owner, Author) commented Mar 29, 2020

cc @takhs91

@@ -484,7 +484,7 @@ def process_message(self, message):

if isinstance(e, RateLimitExceeded):
self.logger.error("Rate limit exceeded in message %s: %s.", message, e)
@jcass77 commented Mar 30, 2020

In a similar vein, I've always wondered why RateLimitExceeded is logged as an error.

I often use ConcurrentRateLimiter as a distributed mutex when I want only one actor at a time to be able to access a resource or perform some activity. This usually results in a flood of RateLimitExceeded errors, which seems unusual, since actors are expected to back off until they obtain the lock.
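
A minimal sketch of that mutex pattern, assuming a Redis rate-limiter backend; the actor and key names here are illustrative only:

import dramatiq
from dramatiq.rate_limits import ConcurrentRateLimiter
from dramatiq.rate_limits.backends import RedisBackend

backend = RedisBackend()
# With limit=1 the rate limiter behaves like a distributed mutex: one holder at a time.
DISTRIBUTED_MUTEX = ConcurrentRateLimiter(backend, "sync-resource-mutex", limit=1)

@dramatiq.actor
def sync_resource():
    # acquire() raises RateLimitExceeded when the mutex is already held,
    # and the worker currently logs that at error level before retrying.
    with DISTRIBUTED_MUTEX.acquire():
        ...  # exclusive access to the shared resource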

In the end I muted all of the RateLimitExceeded exceptions and added a custom logging middleware to ensure that message failures are logged once all of the retry attempts have been exhausted:

import logging

import dramatiq

logger = logging.getLogger(__name__)


class LogFailuresMiddleware(dramatiq.Middleware):
    def after_process_message(self, broker, message, *, result=None, exception=None):
        """
        Ensure that message failures (e.g. as a result of timeout or retry limits) are logged.

        Note that a Dramatiq message 'fails' only after all of its retry attempts have been
        exhausted (i.e. failures on individual retries will not be logged, only the last one
        that causes Dramatiq to give up).
        """
        if message.failed:
            logger.error(
                f"'{message}' failed after {message._message.options.get('retries', 0)} retries - giving up!",
                extra={"result": str(result), "exception": str(exception)},
            )

@jcass77 commented Mar 30, 2020

...maybe the canonical approach once this is merged will be to:

  1. set the raise_on_failure parameter to False when acquiring the mutex;
  2. raise the new Retry exception provided by this PR if the lock cannot be acquired, so that retry attempts are incremented but nothing is logged (sketched below);
  3. include a custom middleware (like the one above) for logging message failures once retries are exhausted.

@Bogdanp, any comments on this?
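
A minimal sketch of steps 1 and 2, reusing the illustrative mutex from above and assuming the new Retry exception is exported at the top level of the dramatiq package (otherwise it would come from dramatiq.errors):

import dramatiq
from dramatiq.rate_limits import ConcurrentRateLimiter
from dramatiq.rate_limits.backends import RedisBackend

backend = RedisBackend()
DISTRIBUTED_MUTEX = ConcurrentRateLimiter(backend, "sync-resource-mutex", limit=1)

@dramatiq.actor(max_retries=10)
def sync_resource():
    # 1. Don't raise RateLimitExceeded; acquire() yields a boolean instead.
    with DISTRIBUTED_MUTEX.acquire(raise_on_failure=False) as acquired:
        if not acquired:
            # 2. Ask for a "silent" retry: the retry counter is incremented,
            #    but nothing is logged at error level.
            raise dramatiq.Retry()
        ...  # exclusive access to the shared resource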

@Bogdanp (Owner, Author) commented Mar 30, 2020

The Retries middleware already logs a warning when a message's retries are exhausted, so I think your proposed middleware may be unnecessary. I've lowered the log level for the RateLimitExceeded case to debug, since you're right that it is unnecessarily noisy in the general case.
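
Roughly, the effect on the snippet under review is the following (a paraphrase of the change, not the exact diff from this PR):

if isinstance(e, RateLimitExceeded):
    # Previously logged with self.logger.error(...); now logged at debug level.
    self.logger.debug("Rate limit exceeded in message %s: %s.", message, e)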

Bogdanp added 4 commits on Mar 30, 2020:

  - Azure's having networking issues and there's no wheel for pylibmc for 3.8, so installing libmemcached-dev is making the builds unnecessarily flaky.
  - This should help detect any downstream issues with dependencies.
  - They're less flaky than they used to be, but they still cause failures.
@Bogdanp merged commit 90f2194 into master on Mar 30, 2020 (21 of 30 checks passed).
@Bogdanp deleted the silent-retries branch on Mar 30, 2020.
@dimrozakis mentioned this pull request on Mar 30, 2020.