Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skipped messages can get inconsistent results stored #440

Closed
FinnLidbetter opened this issue Nov 2, 2021 · 8 comments · Fixed by #449
Closed

Skipped messages can get inconsistent results stored #440

FinnLidbetter opened this issue Nov 2, 2021 · 8 comments · Fixed by #449

Comments

@FinnLidbetter
Copy link
Contributor

FinnLidbetter commented Nov 2, 2021

If a message is dropped because of the Age Limit middleware, then getting the result of that message will be inconsistent depending on how soon after the message is dropped an attempt to get the result is made.

Not sure what the resolution should be here. I think it should either consistently store a None result, or consistently store an exception that is not Exception("unknown").

What OS are you using?

Ubuntu 20.04

What version of Dramatiq are you using?

1.11.0

What did you do?

Use a broker with the AgeLimit and Results middleware. Have a message hit its age limit and get its result in a blocking get_result call. Get the result for the same message again a second or two after the blocking get_result returns a result.

What did you expect would happen?

The result should not change.

What happened?

On the first get_result, the result is None.
On the second get_result, retrieving the result raises a ResultFailure exception with exception Exception("unknown")

This appears to be because:

  1. When the Age Limit expires, the AgeLimit middleware sets the failed flag on the MessageProxy before the message is processed
    def before_process_message(self, broker, message):
    actor = broker.get_actor(message.actor_name)
    max_age = message.options.get("max_age") or actor.options.get("max_age", self.max_age)
    if not max_age:
    return
    if current_millis() - message.message_timestamp >= max_age:
    self.logger.warning("Message %r has exceeded its age limit.", message.message_id)
    message.fail()
    return
  2. In worker.py this causes and emit_after of "process_message" with a result of None
    self.broker.emit_after("process_message", message, result=res)
  3. The Results middleware's after_process_message will then store this None result and no exception
    self.backend.store_result(message, result, result_ttl)
  4. After this happens, worker.py still has some post-processing of the message to do. Since the message was marked with failed, post_process_message will do an emit_after of "nack"
    self.broker.emit_after("nack", message)
  5. Since there was no exception, the results middleware will then replace the result at the message key in the result store with
    def after_nack(self, broker, message):
    store_results, result_ttl = self._lookup_options(broker, message)
    if store_results and message.failed:
    exception = message._exception or Exception("unknown")
    self.backend.store_exception(message, exception, result_ttl)

Sample Code

import dramatiq
import time

from dramatiq.brokers.rabbitmq import RabbitmqBroker
from dramatiq.results.backends import RedisBackend
from dramatiq.results import Results

result_backend = RedisBackend()
broker = RabbitmqBroker()
broker.add_middleware(Results(backend=result_backend))
dramatiq.set_broker(broker)

@dramatiq.actor(store_results=True, max_age=1)
def sample_actor():
  return 42

if __name__ == "__main__":
    message = sample_actor.send()
    result_1 = message.get_result(block=True, timeout=2000)
    print(result_1)
    time.sleep(2)
    # An Exception("unknown") will be raised here.
    result_2 = message.get_result()
    print(result_2)
@FinnLidbetter
Copy link
Contributor Author

If an approach can be agreed upon, I would be happy to submit a PR to fix it.

It seems like the Age Limit middleware should at least be raising SkipMessage(), to trigger any after_skip_message hooks instead of after_process_message. But I'm not sure that this is sufficient to fix the problem. Since SkipMessage is caught in worker.py, if the message is still marked as failed, but has no exception, then I think it will still cause Exception("unknown") to be stored by the Results middleware when the message is nack'd due to message failure.

@Bogdanp
Copy link
Owner

Bogdanp commented Nov 8, 2021

Maybe AgeLimit should both raise SkipMessage and set an exception on the message? I think in that situation, you'll always end up with a consistent result (i.e. an exception) and that exception will at least describe the problem.

@FinnLidbetter
Copy link
Contributor Author

Ok, what about raising SkipMessage in the AgeLimit middleware and then changing the except SkipMessage behaviour in _WorkerThread's process_message to stuff SkipMessage exceptions into the message, much like how BaseExceptions are stuffed into messages?

dramatiq/dramatiq/worker.py

Lines 495 to 504 in fa732f4

except SkipMessage:
self.logger.warning("Message %s was skipped.", message)
self.broker.emit_after("skip_message", message)
except BaseException as e:
# Stuff the exception into the message [proxy] so that it
# may be used by the stub broker to provide a nicer
# testing experience. Also used by the results middleware
# to pass exceptions into results.
message.stuff_exception(e)

If that sounds good, I can submit a PR for this.

@Bogdanp
Copy link
Owner

Bogdanp commented Nov 9, 2021

Yes, I think that would be fine.

FinnLidbetter added a commit to FinnLidbetter/dramatiq that referenced this issue Nov 11, 2021
Fixes Bogdanp#440

Stuff SkipMessage exceptions into the message proxy to ensure that
the result is consistent when used in conjunction with the Results
middleware.
Additionally, move the message.fail() call into the SkipMessage
handling to ensure that skipped messages are nack'd rather than
ack'd.

Note that if failing the message is left to the middleware,
then unintentionally omitting message.fail() can lead to no result
ever being stored by the results middleware for the message.
If it should be left to the middleware to decide to fail the skipped
message then it would be more sensible to have fail as the default.
This could be achieved by making the SkipMessage exception take an
optional "fail_message" parameter that defaults to True, and have
the message conditionally fail() inside the worker SkipMessage
exception handling. Custom middleware could then raise SkipMessage
with fail_message=False to override this behaviour and lead to the
message being skipped and ack'd rather than skipped and nack'd.
@FinnLidbetter
Copy link
Contributor Author

Hi @Bogdanp, any thoughts on the PR I put up for this? #449

@Bogdanp
Copy link
Owner

Bogdanp commented Jan 3, 2022

Hey @FinnLidbetter, I took a look at it a while back, but I need to think about it some more because it might not be the right approach after all. I'll try to find some time to look at it again this weekend or the next.

@FinnLidbetter
Copy link
Contributor Author

Ok no problem, that sounds good to me, thank you. Let me know if there's anything I can do to help.

@FinnLidbetter
Copy link
Contributor Author

Hi @Bogdanp, sorry to be a bother on this again. Have you had time to give this one some thought? I am happy to take another crack at this if you can articulate your concern with the approach used in #449

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants