retries: add support for "silent" retries #295
Conversation
cc @takhs91
dramatiq/worker.py (Outdated)
@@ -484,7 +484,7 @@ def process_message(self, message):

            if isinstance(e, RateLimitExceeded):
                self.logger.error("Rate limit exceeded in message %s: %s.", message, e)
In a similar vein, I've always wondered why RateLimitExceeded is logged as an error. I often use ConcurrentRateLimiter as a distributed mutex when I want only one actor at a time to be able to access a resource or perform some activity. This usually results in a flood of RateLimitExceeded errors, which seems odd given that actors are expected to back off until they obtain the lock.
In the end I muted all of the RateLimitExceeded exceptions and added a custom logging middleware to ensure that message failures are logged once all of the retry attempts have ultimately been exhausted:
import logging

import dramatiq

logger = logging.getLogger(__name__)


class LogFailuresMiddleware(dramatiq.Middleware):
    def after_process_message(self, broker, message, *, result=None, exception=None):
        """
        Ensure that message failures (e.g. as a result of timeout or retry limits) are logged.

        Note that a Dramatiq message 'fails' only after all of its retry attempts have been
        exhausted, i.e. failures on individual retries will not be logged, only the last one
        that causes Dramatiq to give up.
        """
        if message.failed:
            logger.error(
                f"'{message}' failed after {message._message.options.get('retries', 0)} retries - giving up!",
                extra={"result": str(result), "exception": str(exception)},
            )
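For context, the distributed-mutex pattern described above looks roughly like the following sketch. The backend, limiter key, and actor are made up for illustration; it assumes the ConcurrentRateLimiter and RedisBackend APIs from dramatiq's rate-limiting docs:

import dramatiq
from dramatiq.rate_limits import ConcurrentRateLimiter
from dramatiq.rate_limits.backends import RedisBackend

# A concurrent rate limiter with limit=1 behaves like a distributed mutex.
backend = RedisBackend()
DISTRIBUTED_MUTEX = ConcurrentRateLimiter(backend, "distributed-mutex", limit=1)


@dramatiq.actor
def one_at_a_time():
    # acquire() raises RateLimitExceeded when the lock is already held,
    # which is what produces the flood of error logs mentioned above.
    with DISTRIBUTED_MUTEX.acquire():
        ...  # work that must not run concurrently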
...maybe the canonical approach once this is merged will be to:
- set the raise_on_failure parameter to false when acquiring the mutex
- raise the new Retry exception provided by this PR if the lock cannot be acquired, so that retry attempts are incremented but nothing is logged
- include a custom middleware (like above) for logging message failures once retries are exhausted, roughly as sketched below
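A hypothetical sketch of that approach, reusing the illustrative DISTRIBUTED_MUTEX limiter from the earlier example and assuming the Retry exception added by this PR ends up exported from the top-level dramatiq package and accepts an optional delay in milliseconds:

import dramatiq


@dramatiq.actor(max_retries=20)
def one_at_a_time():
    # raise_on_failure=False makes acquire() yield a boolean instead of
    # raising RateLimitExceeded when the lock is already held.
    with DISTRIBUTED_MUTEX.acquire(raise_on_failure=False) as acquired:
        if not acquired:
            # Counts against the message's retry limit, but nothing is
            # logged as an error (assumes Retry takes an optional delay).
            raise dramatiq.Retry(delay=1000)

        ...  # the holder of the lock does the real work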
@Bogdanp, any comments on this?
The Retries middleware already logs a warning when a message's retries are exhausted, so I think your proposed middleware may be unnecessary. I've lowered the log level for the RateLimitExceeded case to debug, since you're right that it is unnecessarily noisy in the general case.
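In other words, the hunk quoted above presumably becomes something like this (a sketch, not the exact committed diff):

            if isinstance(e, RateLimitExceeded):
-               self.logger.error("Rate limit exceeded in message %s: %s.", message, e)
+               self.logger.debug("Rate limit exceeded in message %s: %s.", message, e)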
Azure's having networking issues, and there's no wheel for pylibmc on Python 3.8, so installing libmemcached-dev is making the builds unnecessarily flaky.
This should help detect any downstream issues with dependencies.
They're less flaky than they used to be but they still cause failures.