[ServiceBus] Send message didn't recovered after network disconnect. #15473

Closed
cropse opened this issue Nov 20, 2020 · 5 comments
Labels
customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus

Comments

@cropse

cropse commented Nov 20, 2020

  • Package Name: azure-servicebus
  • Package Version: 7.0.0b8
  • Operating System: Mac 10.15.5
  • Python Version: 3.6.10

Describe the bug
Sending a message blocks indefinitely when the network is interrupted.

To Reproduce
Steps to reproduce the behavior, using a function adapted from the sample:

def send_batch_message(sender):
    batch_message = sender.create_message_batch()
    print('disconnect network here')
    time.sleep(15)
    for _ in range(10):
        try:
            batch_message.add_message(ServiceBusMessage("Session Message inside a ServiceBusMessageBatch", session_id=SESSION_ID))
        except ValueError:
            # ServiceBusMessageBatch object reaches max_size.
            # New ServiceBusMessageBatch object can be created here to send more data.
            break
    sender.send_messages(batch_message)
    print('success')
  1. Disconnect the network during time.sleep().
  2. Wait until execution reaches sender.send_messages.
  3. Reconnect the network.
  4. The call blocks indefinitely.

Expected behavior
The sender should have a mechanism to retry, or at least raise an exception.
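
Until the library raises on its own, the "raise an exception at least" behavior can be approximated from the caller's side. This is a minimal sketch, not part of azure-servicebus: `call_with_deadline` is a hypothetical helper that runs any blocking call in a daemon thread and raises `TimeoutError` if it has not returned by the deadline. It only detects the hang; the stuck thread is abandoned, not cancelled.

```python
import threading


def call_with_deadline(fn, deadline_s):
    """Run fn() in a daemon thread; raise TimeoutError if it does not finish in time."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(deadline_s)
    if worker.is_alive():
        # The call is still blocked; the daemon thread is abandoned, not killed.
        raise TimeoutError("call did not finish within {} s".format(deadline_s))
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

With the reproduction above, the send could be wrapped as `call_with_deadline(lambda: sender.send_messages(batch_message), 30)`, where the 30-second deadline is an arbitrary choice.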

@ghost ghost added needs-triage This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Nov 20, 2020
@ghost ghost removed the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Nov 20, 2020
@yunhaoling
Contributor

yunhaoling commented Nov 24, 2020

Hey @cropse, thanks for reaching out.

I tried to reproduce the issue locally but could not: on my machine the error is raised instead of the call blocking.
I'm working on a Windows machine, and this is how I disconnected the network:

  • When the program starts sleeping, I go to Control Panel\Network and Internet\Network Connections and disable all the networks.

Could you tell me which platform you're on and how you disconnect the network?

--------- My reproduction error log ----------

In azure-servicebus 7.0.0b8, disconnecting the network gives me ServiceBusConnectionError. My error log is as follows:

ConnectionClose('ErrorCodes.UnknownError: Connection in an unexpected error state.')
Traceback (most recent call last):
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\uamqp\authentication\cbs_auth.py", line 76, in create_authenticator
    self._connection.container_id)
  File ".\src/cbs.pyx", line 73, in uamqp.c_uamqp.CBSTokenAuth.__cinit__
ValueError: Unable to open CBS link.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\azure\servicebus\_base_handler.py", line 247, in _do_retryable_operation
    return operation(**kwargs)
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\azure\servicebus\_servicebus_sender.py", line 187, in _send
    self._open()
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\azure\servicebus\_servicebus_sender.py", line 175, in _open
    self._handler.open(connection=self._connection)
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\uamqp\client.py", line 259, in open
    self._build_session()
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\uamqp\client.py", line 214, in _build_session
    on_attach=self._on_attach)
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\uamqp\authentication\cbs_auth.py", line 82, in create_authenticator
    "Please confirm target hostname exists: {}".format(connection.container_id, connection.hostname))
uamqp.errors.AMQPConnectionError: Unable to open authentication session on connection b'SBSender-ffa68263-b4db-4604-ae1f-09b10aac2f56'.
Please confirm target hostname exists: b'xxx.servicebus.windows.net'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../azure-sdk-for-python/sdk/servicebus/azure-servicebus/samples/sync_samples/session_send_receive.py", line 68, in <module>
    send_batch_message(sender)
  File "../azure-sdk-for-python/sdk/servicebus/azure-servicebus/samples/sync_samples/session_send_receive.py", line 45, in send_batch_message
    sender.send_messages(batch_message)
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\azure\servicebus\_servicebus_sender.py", line 370, in send_messages
    require_last_exception=True
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\azure\servicebus\_base_handler.py", line 251, in _do_retryable_operation
    last_exception = self._handle_exception(exception, **kwargs)
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\azure\servicebus\_base_handler.py", line 228, in _handle_exception
    raise error
azure.servicebus.exceptions.ServiceBusConnectionError: Failed to open handler: Unable to open authentication session on connection b'SBSender-ffa68263-b4db-4604-ae1f-09b10aac2f56'.
Please confirm target hostname exists: b'xx.servicebus.windows.net'.

As you may know, we have just released the azure-servicebus 7.0.0 GA version, in which we improved our error design by switching to a condition-based approach.

I also tried to reproduce the issue in 7.0.0, which now gives me ServiceBusError. My error log is as follows:

ConnectionClose('ErrorCodes.UnknownError: Connection in an unexpected error state.')
end sleeping
Traceback (most recent call last):
  File "..\azure-sdk-for-python\sdk\servicebus\azure-servicebus\azure\servicebus\_base_handler.py", line 293, in _do_retryable_operation
    return operation(**kwargs)
  File "..\azure-sdk-for-python\sdk\servicebus\azure-servicebus\azure\servicebus\_servicebus_sender.py", line 238, in _send
    self._handler.send_message(message.message)
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\uamqp\client.py", line 725, in send_message
    running = self.do_work()
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\uamqp\client.py", line 397, in do_work
    return self._client_run()
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\uamqp\client.py", line 650, in _client_run
    self._connection.work()
  File "..\Anaconda3\envs\servicebus37\lib\site-packages\uamqp\connection.py", line 251, in work
    raise self._error
uamqp.errors.ConnectionClose: ErrorCodes.UnknownError: Connection in an unexpected error state.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../azure-sdk-for-python/sdk/servicebus/azure-servicebus/samples/sync_samples/session_send_receive.py", line 68, in <module>
    send_batch_message(sender)
  File "../azure-sdk-for-python/sdk/servicebus/azure-servicebus/samples/sync_samples/session_send_receive.py", line 45, in send_batch_message
    sender.send_messages(batch_message)
  File "..\azure-sdk-for-python\sdk\servicebus\azure-servicebus\azure\servicebus\_servicebus_sender.py", line 393, in send_messages
    require_last_exception=True
  File "..\azure-sdk-for-python\sdk\servicebus\azure-servicebus\azure\servicebus\_base_handler.py", line 297, in _do_retryable_operation
    last_exception = self._handle_exception(exception)
  File "..\azure-sdk-for-python\sdk\servicebus\azure-servicebus\azure\servicebus\_base_handler.py", line 252, in _handle_exception
    raise error
azure.servicebus.exceptions.ServiceBusError: Connection in an unexpected error state. Error condition: ErrorCodes.UnknownError.

@yunhaoling yunhaoling added the needs-author-feedback More information is needed from author to address the issue. label Nov 24, 2020
@cropse
Author

cropse commented Nov 26, 2020

I'm using a Mac as the test machine, so this is probably an issue specific to macOS.

Here is the entire code:

#!/usr/bin/env python

import time
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONNECTION_STR = ""
TOPIC_NAME = ''
SUBSCRIPTION_NAME = ''
SESSION_ID = ''


def send_batch_message(sender):
    batch_message = sender.create_message_batch()
    print('disconnect network here')
    time.sleep(15)
    for _ in range(10):
        try:
            batch_message.add_message(ServiceBusMessage("Session Message inside a ServiceBusMessageBatch", session_id=SESSION_ID))
        except ValueError:
            # ServiceBusMessageBatch object reaches max_size.
            # New ServiceBusMessageBatch object can be created here to send more data.
            break
    print('start to send message, reconnect your network!!')
    sender.send_messages(batch_message)
    print('success !!')


servicebus_client = ServiceBusClient.from_connection_string(conn_str=CONNECTION_STR, logging_enable=False, retry_total=10)
with servicebus_client:
    receiver = servicebus_client.get_subscription_receiver(
        topic_name=TOPIC_NAME,
        subscription_name=SUBSCRIPTION_NAME
    )
    sender = servicebus_client.get_topic_sender(TOPIC_NAME)
    with sender:
        send_batch_message(sender)

print("Receive is done.")

@ghost ghost added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. labels Nov 26, 2020
@yunhaoling
Contributor

Thanks for the code @cropse, I'll try it on my Mac.

@yunhaoling
Contributor

yunhaoling commented Dec 7, 2020

Hey @cropse, I can confirm the issue on macOS.

However, the issue seems to lie in the deeper C layer (azure-c-shared-utility), which is not able to detect that the network is down on macOS.
I need more time to investigate.

If you want the code to work in the meantime, you could probably add a local check to mitigate:

from azure.servicebus.exceptions import OperationTimeoutError

tried_time = 0
sent = False
with sender:
    while tried_time <= 3 and not sent:
        try:
            # Bound each attempt so a dead connection cannot block forever.
            sender.send_messages(message, timeout=10)
            sent = True
        except OperationTimeoutError:
            tried_time += 1
if not sent:
    # All retries failed: shut down the sender, create a new sender
    # and start the send flow again.
    pass
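
The "create a new sender and start the send flow again" step from the comment above can be sketched as a small loop. This is an illustration under assumptions, not SDK code: `send_with_fresh_sender` and its `make_sender` factory are hypothetical names, and the timeout exception class is passed in so the helper does not depend on a particular SDK version. It assumes each sender is a context manager whose `send_messages` accepts a `timeout` keyword, as in the snippet above.

```python
def send_with_fresh_sender(make_sender, message, timeout_error,
                           attempts=3, timeout_s=10):
    """Send with a bounded timeout, rebuilding the sender after each timeout.

    make_sender   -- hypothetical factory returning a context-manager sender
    timeout_error -- exception class that signals a timed-out send
    Returns the number of attempts used; re-raises the last timeout on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # A new sender per attempt, closed by the with-block either way.
            with make_sender() as sender:
                sender.send_messages(message, timeout=timeout_s)
            return attempt
        except timeout_error as exc:
            last_error = exc  # remember the failure and try again
    raise last_error
```

In the reproduction script this might be called as `send_with_fresh_sender(lambda: servicebus_client.get_topic_sender(TOPIC_NAME), batch_message, OperationTimeoutError)`. Whether closing a stuck sender returns promptly on macOS is not established in this thread.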

@yunhaoling
Contributor

Hey @cropse, we have enhanced the underlying C library so that it can detect an error I/O status on macOS.

uamqp v1.2.13, released today, includes the fix.
Please upgrade to the latest uamqp version (e.g. pip install uamqp --upgrade) to see if it solves your problem.

I'm closing the issue now. Feel free to reopen it if you still have trouble with the latest version.
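
To check whether an environment already has the release that carries the fix, the installed uamqp version string (for example as printed by `pip show uamqp`) can be compared against 1.2.13. A minimal sketch; `parse_version` and `has_macos_fix` are hypothetical helper names, and the parser only handles plain dotted versions with an optional non-numeric suffix.

```python
def parse_version(text):
    """Turn '1.2.13' into (1, 2, 13); a suffix such as 'b8' is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at the first non-digit, e.g. the 'b' in '0b8'
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def has_macos_fix(uamqp_version):
    """True if the given uamqp version is 1.2.13 or newer."""
    return parse_version(uamqp_version) >= (1, 2, 13)
```

Comparing tuples of integers avoids the string-comparison trap where "1.2.9" would sort after "1.2.13".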

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023