Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] ServiceClient's send() method got hung for 15 mins before it throws exception (java.io.IOException: Encountered exception during amqp connection: proton:io with description Operation timed out) #912

Closed
PremSahooESL opened this issue Sep 10, 2020 · 16 comments
Assignees
Labels
area-documentation fix checked in Fix checked into main or preview, but not yet released. IoTSDK

Comments

@PremSahooESL
Copy link

PremSahooESL commented Sep 10, 2020

AB#8236725

@PremSahooESL PremSahooESL changed the title [Bug Report] ServiceClient's send() method got hung for 15 mins before it throws execption. [Bug Report] ServiceClient's send() method got hung for 15 mins before it throws exception (java.io.IOException: Encountered exception during amqp connection: proton:io with description Operation timed out) Sep 10, 2020
@PremSahooESL
Copy link
Author

Scenario: In one of our IotHub client application, Multiple threads calling ServiceClient's send() method. all of a sudden send() method did not respond any thing till about 15 mins.

later as 15 mins passed, it threw exception (java.io.IOException: Encountered exception during amqp connection: proton:io with description Operation timed out)
.
Upon receiving above exception, IotHub client application closed the existing connection and reopened.
serviceClient.close();
serviceClient.open();

afterwards , it seemed all ok...it started sending messages to IotHub again.

@timtay-microsoft
Copy link
Member

Do you have any logs from the service client during all of this? And what version of the IoT service SDK are you using?

@timtay-microsoft timtay-microsoft added the customer input needed issue lacks enough data for a proper investigation label Sep 10, 2020
@timtay-microsoft
Copy link
Member

And are these all separate instances of service client being called in separate threads, or is it one instance being shared between multiple threads?

@vishal-kumar-solum
Copy link

vishal-kumar-solum commented Sep 10, 2020

we were using version 1.23.0 and only one instance is being shared between multiple threads (A singleton bean of ServiceClient is being created and shared among all the threads)

com.microsoft.azure.sdk.iot
iot-service-client
1.23.0

@timtay-microsoft
Copy link
Member

And have you seen this behavior more than once?

@PremSahooESL
Copy link
Author

We saw this behavior one time recently.

Sorry, during the issue occurred, SDK logs were disabled.

Is there any specific reason, why ServiceClient does not have timeout configuration?
As a part of improvement, is it possible to add 'timeout' as configurable?

@PremSahooESL
Copy link
Author

Also even if one instance of ServiceClinet is being shared between multiple threads in our application.
We have observed, when multiple threads call send() method during a peak time(around 100 calls per sec)
send() method takes so long time to respond.

is this behavior expected by SDK?

or do we need to surround send() method with 'synchronized' block in multiple threads env? Performance might be compromised in such case.

@timtay-microsoft
Copy link
Member

timtay-microsoft commented Sep 11, 2020

I can't quite tell from the exception you provided what happened, but as best as I can tell, there was a socket connect or read timeout within our AMQP library. Right now, there is no way to configure that option in our AMQP library, so there is no way to extend that configuration to you.

As for the general slowdown that you see more consistently, I can explain how the service client currently works and I should be able to help you optimize this a bit.

Right now, our service client will allow you to call send on a single instance from multiple different threads at once, but it has synchronization logic that effectively limits each service client instance to sending a single message at a time. So my first piece of advice is to use multiple instances of service client rather than one.

The send operation actually opens a connection, sends the messages, waits for the service to ack the message, and then tears down the connection per invocation of serviceClient.send(...), so there isn't a reason to keep only one instance of the service client around.

This may be a bit confusing, and we consider it a design flaw in our current service client code, but there is no way to open the connection, and leave it open for multiple send operations. So the best advice right now is to just use more service client instances. I suspect that doing this will also stop you from hitting the exception you mentioned.

@timtay-microsoft timtay-microsoft removed the customer input needed issue lacks enough data for a proper investigation label Sep 11, 2020
@PremSahooESL
Copy link
Author

Thanks for your suggestion...We will try to modify our code accordingly.

I have one question as below.
is there any restrictions on number of service client instances by multiple threads at a time?

@timtay-microsoft
Copy link
Member

There is no SDK side limit to the number of service client instances per thread, no. You may hit timeouts or throttles from the service or hit port limits from your physical device, but that will depend on your machine and on the scale of your IoT Hub. Experimentation will help you find the right balance

@PremSahooESL
Copy link
Author

PremSahooESL commented Sep 16, 2020

After Making changes in code with multiple instances of ServiceClient.
Sometimes we could see the same exception which we reported earlier. But right after exception, our code retries and it works.
Also during the exception, it did not hang for 15 mins rather it throws exceptions within 1 or 2 mins.
Also the other issue, during peak, it was getting slow. we could not see it anymore.

Exception=java.io.IOException: Encountered exception during amqp connection: proton:io with description Operation timed out

By the way May I know what is the default timeout value for the above exception?

@timtay-microsoft
Copy link
Member

If you are using one service client instance per thread, and you are seeing the timeout within 1 to 2 minutes now, then that is the range of time to expect the timeout to take. My suspicion as to why you saw 15 minutes before was because some of your threads were stuck waiting in line to execute the synchronized part of the SDK code for ~13 minutes and then timed out during the actual amqp transaction.

So it sounds like the only bug here is that the SDK synchronizes some blocks of code without notifying you in the javadocs. Is that a fair assessment?

@timtay-microsoft timtay-microsoft self-assigned this Sep 21, 2020
@PremSahooESL
Copy link
Author

Thanks for your quick response. After performing a load test. Probably we can get some idea. We will keep updating here in case we observe any further abnormality in ServiceClient.

timtay-microsoft added a commit that referenced this issue Sep 22, 2020
…ut hidden synchronization

issue #912 makes it apparent that we need to notify users of this hidden "feature"
@timtay-microsoft
Copy link
Member

Sounds good. In the meantime, I'll go ahead and update our javadocs to include this information about synchronization

timtay-microsoft added a commit that referenced this issue Sep 22, 2020
…ut hidden synchronization (#923)

issue #912 makes it apparent that we need to notify users of this hidden feature
@timtay-microsoft timtay-microsoft added area-documentation fix checked in Fix checked into main or preview, but not yet released. and removed bug labels Sep 22, 2020
@timtay-microsoft
Copy link
Member

The javadocs have been fixed as of service client version 1.26.0, so I'm closing this issue

@az-iot-builder-01
Copy link
Collaborator

@PremSahooESL, @timtay-microsoft, @vishal-kumar-solum, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-documentation fix checked in Fix checked into main or preview, but not yet released. IoTSDK
Projects
None yet
Development

No branches or pull requests

4 participants