[Bug Report] ServiceClient's send() method hangs for 15 minutes before throwing an exception (java.io.IOException: Encountered exception during amqp connection: proton:io with description Operation timed out) #912
Scenario: In one of our IoT Hub client applications, multiple threads call ServiceClient's send() method. All of a sudden, send() stopped responding for about 15 minutes. Once those 15 minutes had passed, it threw an exception (java.io.IOException: Encountered exception during amqp connection: proton:io with description Operation timed out). Afterwards everything seemed fine, and it started sending messages to IoT Hub again.
Do you have any logs from the service client from when this happened? And which version of the IoT service SDK are you using?
Also, are these separate service client instances, each called from its own thread, or is one instance shared among multiple threads?
We were using version 1.23.0 of com.microsoft.azure.sdk.iot, and only one instance is shared among multiple threads (a singleton ServiceClient bean is created and shared by all the threads).
And have you seen this behavior more than once?
We have seen it once, recently. Unfortunately, SDK logging was disabled when the issue occurred. Is there a specific reason why ServiceClient has no timeout configuration?
Also, in our application a single ServiceClient instance is shared among multiple threads. Is this usage expected by the SDK, or do we need to wrap send() in a synchronized block in a multithreaded environment? Performance might suffer in that case.
I can't tell exactly what happened from the exception you provided, but as far as I can tell, there was a socket connect or read timeout within our AMQP library. Right now there is no way to configure that timeout in our AMQP library, so we can't expose that setting to you.

As for the general slowdown that you see more consistently, I can explain how the service client currently works, which should help you optimize this a bit. Our service client lets you call send on a single instance from multiple threads at once, but it has synchronization logic that effectively limits each service client instance to sending one message at a time. So my first piece of advice is to use multiple service client instances rather than one.

Each invocation of serviceClient.send(...) opens a connection, sends the message, waits for the service to acknowledge it, and then tears down the connection, so there is no reason to keep only one service client instance around. This may be a bit confusing, and we consider it a design flaw in our current service client code, but there is currently no way to open the connection and leave it open for multiple send operations. So the best advice right now is to use more service client instances. I suspect that doing this will also stop you from hitting the exception you mentioned.
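To make the synchronization behavior described above concrete, here is a minimal, self-contained Java sketch. It does not use the IoT SDK: `FakeServiceClient` is a hypothetical stand-in whose `send` is a `synchronized` method that sleeps to simulate the open/send/ack/close cycle. Modeling the SDK's internal locking as a per-instance monitor is an assumption for illustration only; the real implementation may differ. The sketch counts how many sends overlap, so a shared instance should never exceed one concurrent send, while separate instances can overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedSendDemo {

    /** Hypothetical stand-in for ServiceClient; NOT the real SDK class. */
    static class FakeServiceClient {
        // Counters shared across all instances so overlap between instances is visible.
        static final AtomicInteger inFlight = new AtomicInteger();
        static final AtomicInteger maxInFlight = new AtomicInteger();
        static final AtomicInteger completed = new AtomicInteger();

        // synchronized: at most one thread per instance can be inside send() at a time.
        synchronized void send(String deviceId, String payload) throws InterruptedException {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            try {
                // Simulates: open connection -> send -> wait for ack -> close connection.
                Thread.sleep(20);
            } finally {
                inFlight.decrementAndGet();
                completed.incrementAndGet();
            }
        }

        static void reset() {
            inFlight.set(0);
            maxInFlight.set(0);
            completed.set(0);
        }
    }

    /** Starts `threads` senders that either share one client or each get their own. */
    static int run(int threads, boolean shareOneClient) throws InterruptedException {
        FakeServiceClient.reset();
        FakeServiceClient shared = new FakeServiceClient();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            FakeServiceClient client = shareOneClient ? shared : new FakeServiceClient();
            String deviceId = "device-" + i; // hypothetical device IDs
            Thread t = new Thread(() -> {
                try {
                    client.send(deviceId, "hello");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // Highest number of sends observed running at the same moment.
        return FakeServiceClient.maxInFlight.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared instance, max concurrent sends: " + run(8, true));
        System.out.println("one instance per thread, max concurrent sends: " + run(8, false));
    }
}
```

With the shared instance, `run(8, true)` always reports a maximum of 1, because every caller queues on the same monitor. With one instance per thread the number is usually higher, though the exact value depends on thread scheduling.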
Thanks for your suggestion. We will try to modify our code accordingly. I have one question, as follows.
No, there is no SDK-side limit on the number of service client instances per thread. You may hit timeouts or throttling from the service, or port limits on your physical device, but that depends on your machine and on the scale of your IoT Hub. Experimentation will help you find the right balance.
After changing our code to use multiple ServiceClient instances, we got this exception: Exception=java.io.IOException: Encountered exception during amqp connection: proton:io with description Operation timed out. By the way, may I know the default timeout value for this exception?
If you are using one service client instance per thread and you now see the timeout within 1 to 2 minutes, then that is the range of time to expect the timeout to take. My suspicion about why you saw 15 minutes before is that some of your threads were stuck waiting in line to execute the synchronized part of the SDK code for about 13 minutes and then timed out during the actual AMQP transaction. So it sounds like the only bug here is that the SDK synchronizes some blocks of code without documenting it in the javadocs. Is that a fair assessment?
Thanks for your quick response. We will probably get a better idea after performing a load test, and we will keep updating this issue if we observe any further abnormal behavior in ServiceClient.
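One way to follow the "one instance per thread" advice without passing clients around is a `ThreadLocal`. The sketch below uses a hypothetical `Client` class in place of `ServiceClient` so that it runs without the SDK. In a real application the initializer would create a service client instead, presumably via `ServiceClient.createFromConnectionString(connectionString, IotHubServiceClientProtocol.AMQPS)` in the 1.x SDK (check the javadocs for your version). It runs 40 tasks on a 4-thread pool and records which client each thread used.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadClients {

    /** Hypothetical stand-in for ServiceClient; not part of the IoT SDK. */
    static final class Client {
        final int id;
        Client(int id) { this.id = id; }
    }

    static final AtomicInteger created = new AtomicInteger();

    // Each thread lazily creates its own Client the first time it calls CLIENT.get().
    static final ThreadLocal<Client> CLIENT =
            ThreadLocal.withInitial(() -> new Client(created.incrementAndGet()));

    /** Runs `tasks` jobs on `threads` workers and records which client IDs each thread used. */
    static Map<String, Set<Integer>> runTasks(int threads, int tasks) throws InterruptedException {
        Map<String, Set<Integer>> idsByThread = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                Client client = CLIENT.get(); // a real app would call client.send(...) here
                idsByThread
                        .computeIfAbsent(Thread.currentThread().getName(), k -> ConcurrentHashMap.newKeySet())
                        .add(client.id);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return idsByThread;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Set<Integer>> result = runTasks(4, 40);
        System.out.println("threads used: " + result.size() + ", clients created: " + created.get());
    }
}
```

Each worker thread ends up with exactly one client that it reuses for every task. The sketch never closes its clients; a real application would need to close them, for example when the pool shuts down. With this design the pool size effectively becomes the number of concurrent sends, which is the knob to tune against the service throttles and port limits mentioned above.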
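Because the SDK exposes no send timeout, and, per the explanation above, time spent queued behind the synchronized code counts toward the delay a caller sees, a caller can at least bound how long it waits. The sketch below is a generic `java.util.concurrent` pattern, not an SDK feature: it runs the send on an executor and waits with `Future.get(timeout, unit)`. The `Callable` sends in `main` are hypothetical sleeps standing in for a real call, which might look like `() -> { serviceClient.send(deviceId, message); return null; }`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedSend {

    /**
     * Waits at most `timeoutMillis` for `send` to finish.
     * Returns true if it completed in time, false if the caller gave up.
     * Giving up does not guarantee the underlying send stops: cancel(true)
     * only interrupts the worker thread.
     */
    static boolean sendWithDeadline(ExecutorService executor, Callable<Void> send, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        Future<Void> future = executor.submit(send);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // request interruption of the stuck worker
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        // Hypothetical sends: one that finishes quickly, one that simulates a hang.
        Callable<Void> fast = () -> { Thread.sleep(10); return null; };
        Callable<Void> hung = () -> { Thread.sleep(5_000); return null; };
        System.out.println("fast completed: " + sendWithDeadline(executor, fast, 5_000));
        System.out.println("hung completed: " + sendWithDeadline(executor, hung, 100));
        executor.shutdownNow();
    }
}
```

Note the limitation: `future.cancel(true)` only interrupts the worker thread. If the underlying AMQP operation does not respond to interruption, that thread stays blocked until the library's own timeout fires, so an executor used this way can accumulate stuck threads. This bounds the caller's wait; it does not shorten the network timeout itself.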
…ut hidden synchronization issue #912 makes it apparent that we need to notify users of this hidden "feature"
Sounds good. In the meantime, I'll update our javadocs to include this information about synchronization.
The javadocs have been fixed as of service client version 1.26.0, so I'm closing this issue.
AB#8236725