[BUG] Cosmos hangs forever with CosmosEndToEndOperationLatencyPolicyConfig set #40786

Open
lnist opened this issue Jun 24, 2024 · 2 comments
Labels: Client, Cosmos, customer-reported, needs-team-attention, question, Service Attention

lnist commented Jun 24, 2024

Describe the bug
Some operations cause the Cosmos SDK to hang indefinitely, and others do not respect the timeout set by CosmosEndToEndOperationLatencyPolicyConfig.

The hangs appear to occur for operations that span multiple partitions.
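For reference, the behavior the report expects can be modeled without the SDK: a cross-partition operation fans out into several per-partition requests, and a single end-to-end deadline should bound the combined operation. The Java sketch below uses only the JDK; queryPartition, runWithDeadline, the partition count, and the delays are hypothetical illustrations, not Cosmos SDK internals. It applies CompletableFuture.orTimeout to the future that combines all partition requests, so a slow partition turns into a failure rather than an open-ended wait.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.IntStream;

class FanOutDeadline {

    // Hypothetical stand-in for one per-partition request that takes `delay` to answer.
    static CompletableFuture<Integer> queryPartition(int partition, Duration delay, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delay.toMillis()); // simulated partition latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            }
            return partition; // pretend each partition returns one result
        }, pool);
    }

    // Runs all partition requests and bounds the combined operation by one end-to-end deadline.
    // Returns the number of partition responses, or -1 if the deadline fired first.
    static int runWithDeadline(int partitions, Duration perPartitionDelay, Duration endToEnd) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<CompletableFuture<Integer>> parts = IntStream.range(0, partitions)
                    .mapToObj(p -> queryPartition(p, perPartitionDelay, pool))
                    .toList();
            CompletableFuture<Void> all = CompletableFuture.allOf(parts.toArray(new CompletableFuture<?>[0]));
            all.orTimeout(endToEnd.toMillis(), TimeUnit.MILLISECONDS).join();
            return parts.size();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return -1; // deadline elapsed: the caller gets a failure, not a hang
            }
            throw e;
        } finally {
            pool.shutdownNow(); // interrupt any partition request that is still sleeping
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithDeadline(4, Duration.ofMillis(50), Duration.ofSeconds(1)));    // 4
        System.out.println(runWithDeadline(4, Duration.ofMillis(2_000), Duration.ofSeconds(1))); // -1
    }
}
```

Bounding the combined future, rather than each request separately, is what makes the deadline end-to-end: no number of slow partitions can stretch it.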

To Reproduce
See this example repository and test: https://github.com/lnist/cosmos-sdk-hang/blob/main/src/test/java/cosmosTimeouts.java

In the test, fill in the Cosmos connection string and master key.

The test uses WireMock to simulate a delay in reaching the Cosmos backend. Because the Cosmos SDK insists on HTTPS, WireMock is set up with a self-signed certificate.
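When only the delay matters, the WireMock setup can be approximated with the JDK's built-in com.sun.net.httpserver.HttpServer. This is a hedged stand-in, not the repro: it serves plain HTTP on 127.0.0.1 rather than the HTTPS endpoint with a self-signed certificate that the Cosmos SDK needs, and startDelayedServer and timesOut are hypothetical helpers. It checks that a client-side request timeout fires before a delayed response arrives.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.Executors;

class DelayedBackend {

    // Hypothetical helper: a local server that waits `delay` before answering any request,
    // similar in spirit to a WireMock stub with a fixed delay.
    static HttpServer startDelayedServer(Duration delay) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); // port 0 = any free port
        server.createContext("/", exchange -> {
            try {
                Thread.sleep(delay.toMillis()); // simulated backend latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exchange.close();
                return;
            }
            byte[] body = "{}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Daemon handler threads, so a still-sleeping handler never keeps the JVM alive.
        server.setExecutor(Executors.newCachedThreadPool(task -> {
            Thread t = new Thread(task);
            t.setDaemon(true);
            return t;
        }));
        server.start();
        return server;
    }

    // Returns true if the client gave up with HttpTimeoutException before the response arrived.
    static boolean timesOut(Duration backendDelay, Duration clientTimeout) throws IOException, InterruptedException {
        HttpServer server = startDelayedServer(backendDelay);
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri).timeout(clientTimeout).GET().build();
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            return false;
        } catch (HttpTimeoutException e) {
            return true;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(timesOut(Duration.ofMillis(2_000), Duration.ofSeconds(1))); // true
        System.out.println(timesOut(Duration.ofMillis(100), Duration.ofSeconds(1)));   // false
    }
}
```

This only exercises the JDK HTTP client's own timeout; whether the Cosmos SDK honors its end-to-end timeout against such a delay is exactly what the linked test checks.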

When the tests are executed, all of them are expected to fail with a timeout from the Cosmos SDK. That does not happen.

The readAllContainers and properties tests both return the expected data, but only after more than the configured 1-second timeout has elapsed. They should fail instead.

The readNonDefaultPartitionKey, count, readAll, and writeBulk tests all respect the 1-second timeout when the DELAY parameter is set to 2_000, but they hang indefinitely (until the 1-minute test timeout) when DELAY is set to 10_000.
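The behaviors described above (data returned late, a timeout as expected, an indefinite hang) can be told apart in a test by waiting on the operation with a watchdog that is much longer than the SDK timeout. A stdlib-only sketch, assuming the operation is wrapped in a Supplier; Outcome, classify, and slowCall are hypothetical names, and slowCall merely simulates an SDK call with Thread.sleep.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class TimeoutOutcome {

    enum Outcome { COMPLETED_IN_TIME, COMPLETED_LATE, FAILED, HUNG }

    // Each operation runs on its own daemon thread so a hung call cannot keep the JVM alive.
    private static final Executor DAEMON_THREAD_PER_TASK = task -> {
        Thread t = new Thread(task, "operation-under-test");
        t.setDaemon(true);
        t.start();
    };

    // Waits up to `watchdog` for `operation`, which is expected to enforce `expectedTimeout` itself.
    static Outcome classify(Supplier<?> operation, Duration expectedTimeout, Duration watchdog) {
        long start = System.nanoTime();
        CompletableFuture<?> result = CompletableFuture.supplyAsync(operation, DAEMON_THREAD_PER_TASK);
        try {
            result.get(watchdog.toMillis(), TimeUnit.MILLISECONDS);
            Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
            return elapsed.compareTo(expectedTimeout) <= 0 ? Outcome.COMPLETED_IN_TIME : Outcome.COMPLETED_LATE;
        } catch (TimeoutException e) {
            return Outcome.HUNG;   // neither a result nor an error before the watchdog fired
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the operation threw, e.g. its own timeout exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Hypothetical stand-in for an SDK call: sleeps, then returns data or throws.
    static Supplier<String> slowCall(long millis, boolean failAtEnd) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            if (failAtEnd) {
                throw new IllegalStateException("simulated timeout after " + millis + " ms");
            }
            return "data";
        };
    }

    public static void main(String[] args) {
        Duration sdkTimeout = Duration.ofSeconds(1);
        Duration watchdog = Duration.ofSeconds(5);
        System.out.println(classify(slowCall(1_500, false), sdkTimeout, watchdog));  // COMPLETED_LATE
        System.out.println(classify(slowCall(1_000, true), sdkTimeout, watchdog));   // FAILED
        System.out.println(classify(slowCall(60_000, false), sdkTimeout, watchdog)); // HUNG
    }
}
```

With a real client, the outcome the report expects for every test is FAILED, with an elapsed time close to the configured 1 second.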

Note: The code includes a few configuration settings that I believe are redundant, but I kept them because they were used during extensive testing. A quick test without them suggests the issues also occur with default settings (apart, of course, from CosmosEndToEndOperationLatencyPolicyConfig).

Expected behavior
The API respects the configured timeout: operations that exceed it fail with a timeout instead of completing late or hanging.

Setup:

  • OS: Windows 11
  • IDE: IntelliJ
  • Library/Libraries: com.azure:azure-cosmos:4.61.1
  • Java version: 21
  • App Server/Environment: JUnit Jupiter test runner
  • Frameworks: N/A

Information Checklist

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@github-actions bot added the Client, Cosmos, customer-reported, needs-team-attention, question, and Service Attention labels on Jun 24, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @kushagraThapar @pjohari-ms @TheovanKraay.

@kushagraThapar
Member

@tvaron3 please take a look at this, thanks!
