-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Client blocks indefinitely on high volume #1185
Comments
Hi @steamstreetjon, thank you for the bug report. Without more information it's a little difficult to know how to proceed. Could you please provide a bit more detail to help us diagnose the issue?
|
I'll answer a few of this, but I'd like some help in understanding the best way to give you what you need. The 500k is a single lambda that is provisioned as normal. There are certainly spikes in activity, but the lambdas are pretty standard, triggered by EventBridge events. The initial tell was that the Lambda was timing out. As I dug in further I could see that it was happening at a specific point. I see the log for the request, but then nothing until the Lambda times out. Since it's event bridge, the event is retried shortly after and the exact same details work fine. Here's a screenshot of the logs. I can share more logging details, just didn't want to be too public about it, but you can see it gets to the GetItem POST and then hangs until the lambda is complete. For reference, when it runs the second time a minute later, the 'healthy' version gets a response and carries on: I tried to wrap the code in a I'd really like to help track this down with you, so let me know if you'd like to do some debugging or if there is some logging to enable to help you understand it better. Also, here's the configuration of the client right now (I added the timeouts to see if I could get the client to fail):
|
Thanks for the additional info. It's particularly curious to me that Unfortunately, there's still not enough info present to assist in root causing. We have various tests of the SDK in our release pipeline, including some which test high concurrency and high throughput, which have not exhibited this issue before. To dive deeper on this, would you be willing to provide repro code and full trace-level logs for a succeeding and failing request? If you have an AWS Support plan, you can submit a support case through the AWS Console with your logs/code which will make it available for us internally. If not, you can attach or paste your logs/code in this issue after redacting sensitive info like access keys, internal endpoints, etc. |
It looks like this issue has not been active for more than 5 days. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it. |
Describe the bug
Calls with the client (in this case to Dynamo), periodically block indefinitely.
Expected behavior
Calls shouldn't block and should return in a reasonable amount of time.
Current behavior
In my lambda that is invoked about 500,000/hr, there are periodic failures that occur when the SDK blocks indefinitely. This happens in only 5-10 invocations an hour, but that's enough that it makes the SDK unusable.
Steps to Reproduce
This is difficult to reproduce, as it only happens in high volume situations and very infrequently. But with no fix available, the failure rate is still too high to use the SDK. The problem occurs when using either the OKHttp client or the Crt client.
Possible Solution
There is no workaround. I have been unable to diagnose the problem, as the blocking occurs deep in the SDK.
Context
I can't use the Kotlin AWS SDK and would advise others to avoid it as well for any high volume application in Lambda.
AWS Kotlin SDK version used
1.0.37
Platform (JVM/JS/Native)
JVM
Operating System and version
Lambda Java 17
The text was updated successfully, but these errors were encountered: