
SSL Retry #14

Closed
thejuan opened this issue Apr 26, 2016 · 17 comments

@thejuan

thejuan commented Apr 26, 2016

I regularly get [SSL: BAD_WRITE_RETRY] bad write retry (_ssl.c:1647) errors when running AMQP-Storm using SSL.

Looking at the code, I think it's because it isn't handling SSL retry exceptions from the socket:

http://stackoverflow.com/questions/2997218/why-am-i-getting-error1409f07fssl-routinesssl3-write-pending-bad-write-retr

I'll get a stack trace to see if it's on write or on do_handshake.
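For context, the rule behind the linked Stack Overflow question is that when OpenSSL cannot finish a write on a non-blocking socket, the write has to be retried with the same data; retrying with different arguments in between (for example, because another thread wrote something else) is one way to get `bad write retry`. A minimal sketch of that retry pattern, assuming Python 3's `ssl` module (`ssl.SSLWantWriteError` / `ssl.SSLWantReadError`); `ssl_send_all` is a hypothetical helper, not AMQP-Storm's code:

```python
import select
import ssl


def ssl_send_all(sock, data, timeout=5.0):
    """Send all of `data` over a non-blocking SSL socket (hypothetical helper).

    If OpenSSL signals that it needs the socket to become writable (or
    readable, e.g. during renegotiation), wait and then retry with the
    exact same remaining bytes, as OpenSSL requires.
    """
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view)
        except ssl.SSLWantWriteError:
            # Do not touch `view`: the retry must use the same data.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("socket did not become writable")
            continue
        except ssl.SSLWantReadError:
            readable, _, _ = select.select([sock], [], [], timeout)
            if not readable:
                raise TimeoutError("socket did not become readable")
            continue
        # Only advance after a successful write.
        view = view[sent:]
```

The key detail is that the buffer is only advanced after `send()` succeeds; on a "want" error the loop waits on `select` and retries the same bytes.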

@thejuan
Author

thejuan commented Apr 26, 2016

Sorry, this was before I upgraded to the latest version.

@thejuan thejuan closed this as completed Apr 26, 2016
@eandersson
Owner

Ah, so upgrading to 1.3.0 fixed it?

@thejuan
Author

thejuan commented Apr 26, 2016

Yes. But I also changed my consumer logic to better match the scalable example, so the problem may not have been present in the old release at all. It may have been my code :)

@eandersson
Owner

Let me know if you run into any new issues. I haven't had too many chances to test the SSL implementation.

@thejuan
Author

thejuan commented Apr 28, 2016

Still getting these errors consistently (2-3 times an hour) under SSL.
I also get "bad record mac".


Traceback (most recent call last):
  File "Daemon.py", line 120, in publish
    success = self.publish_channel.basic.publish(routing_key=routing_key, body=body, exchange="amq.topic")
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/basic.py", line 160, in publish
    return self._publish_confirm(send_buffer)
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/basic.py", line 297, in _publish_confirm
    result = self._channel.rpc.get_request(confirm_uuid, True)
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/base.py", line 171, in get_request
    self._wait_for_request(uuid)
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/base.py", line 193, in _wait_for_request
    self._adapter.check_for_errors()
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/channel.py", line 223, in check_for_errors
    raise why
AMQPConnectionError: [SSL: BAD_WRITE_RETRY] bad write retry (_ssl.c:1646)

Traceback (most recent call last):
  File "/usr/src/app/Subscriptions/Amqp.py", line 18, in start
    channel.start_consuming()
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/channel.py", line 138, in start_consuming
    self.process_data_events(to_tuple=to_tuple)
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/channel.py", line 163, in process_data_events
    for message in self.build_inbound_messages(break_on_empty=True):
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/channel.py", line 181, in build_inbound_messages
    self.check_for_errors()
  File "/usr/local/lib/python2.7/site-packages/amqpstorm/channel.py", line 223, in check_for_errors
    raise why
AMQPConnectionError: [SSL: BAD_WRITE_RETRY] bad write retry (_ssl.c:1646)

@thejuan thejuan reopened this Apr 28, 2016
@thejuan
Author

thejuan commented Apr 28, 2016

I've updated all my SSL libraries (running the official Python Docker image).
I'll see if the errors disappear after this.

EDIT: Errors are still occurring, but they are now all [Errno 14] Bad address.
They seem to correlate with a scheduled job that pushes out messages.
I have a single publish channel that all threads in our Web API share.

@eandersson
Owner

eandersson commented Apr 28, 2016

I am at a conference, but I'll take a look and I'll try to have it fixed by Monday.

@eandersson eandersson self-assigned this Apr 28, 2016
@eandersson eandersson added this to the 1.3.X milestone Apr 28, 2016
@eandersson eandersson added the bug label Apr 28, 2016
@eandersson
Owner

eandersson commented May 1, 2016

I am able to reproduce this, but could you paste your RabbitMQ logs as well? Also, are you running multiple channels (consumers)?

When I encountered this, I saw the following in my RabbitMQ logs:

SSL: connection: tls_record.erl:170:Fatal error: bad record mac

I think it may be a thread-safety issue of some sort. For some reason, when SSL is enabled, the frames may not be delivered in the correct sequence.

@thejuan
Author

thejuan commented May 2, 2016

I am running 5 consumer channels and a publisher channel. The publisher channel is the only channel accessed by multiple threads.

I also see those in my RabbitMQ logs:
= REPORT==== 2016-05-01 23:52:02 UTC === SSL: connection: ssl_cipher.erl:292:Fatal error: bad record mac

RabbitMQ 3.6.0, Erlang 18.2

@eandersson
Owner

I rewrote the I/O handling and was able to confirm that this is a threading issue. I'm not yet sure how to solve it without introducing unnecessary complexity.

eandersson added a commit that referenced this issue May 2, 2016
@eandersson
Owner

eandersson commented May 2, 2016

I pushed a fix. I honestly wanted to avoid having to introduce more locking, but reliability > performance. Let me know if you can try it out.
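The fix itself isn't shown in this thread. As a minimal sketch of the general idea (a hypothetical `LockedWriter` class, not AMQP-Storm's actual code), holding one `threading.Lock` for the entire time a frame is being written means bytes from two threads can never interleave on the socket, even when `send()` only accepts part of the buffer per call:

```python
import threading


class LockedWriter:
    """Hypothetical wrapper that serializes whole-frame writes to a socket."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()

    def write_frame(self, frame):
        view = memoryview(frame)
        # Hold the lock until every byte of this frame has been sent, so a
        # partial send from one thread can't be followed by another
        # thread's bytes before the frame is complete.
        with self._lock:
            while view:
                sent = self._sock.send(view)
                view = view[sent:]
```

The cost is that publishers on different threads wait for each other while a frame is being written, which is the reliability-over-performance trade-off mentioned above.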

@thejuan
Author

thejuan commented May 3, 2016

Deployed it; I'll report back after it's had some time to run.

@thejuan
Author

thejuan commented May 3, 2016

You could use an internal queue if you didn't want to lock on writes, assuming Python has a lockless multi-threaded queue implementation :)
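A minimal sketch of the queue-based alternative, assuming Python 3: producers hand complete frames to a `queue.Queue`, and a single background thread is the only code that touches the socket. Note that `queue.Queue` is thread-safe but uses locks internally, so this moves the locking rather than eliminating it. `QueuedWriter` is a hypothetical name, not part of AMQP-Storm:

```python
import queue
import threading


class QueuedWriter:
    """Hypothetical writer: many producer threads, one socket-writing thread."""

    _STOP = object()  # sentinel telling the writer thread to exit

    def __init__(self, sock):
        self._sock = sock
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write_frame(self, frame):
        # Producers only enqueue complete frames; they never touch the socket.
        self._queue.put(bytes(frame))

    def close(self):
        # Frames queued before the sentinel are still written (FIFO order).
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            frame = self._queue.get()
            if frame is self._STOP:
                break
            view = memoryview(frame)
            while view:
                sent = self._sock.send(view)
                view = view[sent:]
```

One thing this sketch leaves out: if `send()` raises in the writer thread, the error is lost. A real implementation would need to record it so callers can see it, similar to how the `check_for_errors()` calls in the tracebacks above re-raise connection errors.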

@thejuan
Author

thejuan commented May 3, 2016

Looks like that has done the trick! Thanks

@thejuan thejuan closed this as completed May 3, 2016
@eandersson
Owner

Glad it worked and thanks for confirming!

I'll test this out and release it officially in a couple of days, and I'll try to come up with a better solution for the next major release.

@thejuan
Author

thejuan commented May 4, 2016

24 hours without a single error. Still seems to be sending messages as well ;)

@eandersson
Owner

eandersson commented May 4, 2016

Awesome. I've officially pushed the new version out to PyPI as well.

Thanks again for reporting this!!

2 participants