Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publish with confirmation may raise error even when message was successfully published #80

Closed
enkelli opened this issue May 29, 2020 · 1 comment
Labels
Milestone

Comments

@enkelli
Copy link

enkelli commented May 29, 2020

Description

Hi, I think I've found one problem in delivery confirmation - in _publish_confirm method. There are three following important (and probably buggy) lines:

self._channel.write_frames(frames_out)
result = self._channel.rpc.get_request(confirm_uuid, raw=True)
self._channel.check_for_errors()

The first line sends data to RMQ, second one tries to get status of this action and the third line checks for errors. Where is the bug? That many things may happen between the write_frames and delivery checks. I've checked briefly implementation of those two methods (rpc.get_request and check_for_errors) and it seems that there is some other network communication that may fail. Which results to delivery-failed response while the messages was actually published.
This may be really unpleasant for a system that e.g. retries in case of a failed publish and duplicate messages are undesirable.

How you can verify this:

I've verified my theory using docker RMQ, but you can of course use custom RMQ server and somehow interrupt internet connection, see below.

  • Run rabbitmq docker (rabbitmq:3-management, expose ports 5672 and 15672 and run tests to ensure that it works. I did run only some tests: amqpstorm/tests/functional/generic_tests.py, using nose)
  • Tests should pass so we have to hack source files to prove my theory.
  • Go to the mentioned _publish_confirm method and put time.sleep(30) (30, more or less, so you have enough time to disable connection) after write_frames, before delivery checks.
  • I ran only one test now - nosetests amqpstorm/tests/functional/generic_tests.py -m test_functional_publish_and_get_a_large_message. I also wrapped publish call in this test to try-except block so I could print the thrown error in except block. it was AMQPConnectionError("Connection was closed by remote server: CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"
  • Now when test and source code is adjusted, you can run that one test, eventually open RMQ management in browser (you have to enable ports in docker) and wait a few seconds, the published message should appear there.
  • Now stop the running docker container. It has to be stopped before sleep, put to that publish method, expires.
  • And wait, when sleep finishes, it should print the error message.

I hope the description is clear. Otherwise, let me know, I'll try to answer your questions.

Definition of done

Error is raised only when a message was not delivered to the RMQ server.

@eandersson
Copy link
Owner

eandersson commented May 29, 2020

I believe this should solve this @enkelli. Can you +1 up it if it looks good to you?
https://github.com/eandersson/amqpstorm/pull/81/files

Based on testing it looks like the error check is only needed when the mandatory flag is set. I changed the code to only check for errors when that flag is set. I also changed it so that it only looks for publishing errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants