Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cosmos DB - availability strategy fix with query #36786

Conversation

FabianMeiswinkel
Copy link
Member

@FabianMeiswinkel FabianMeiswinkel commented Sep 15, 2023

Description

This PR adds option to do hedging via availability strategy not just for point operations but also query operations.

When adding test coverage for this feature a few issues with regards to diagnostics in queries have been identified and also fixed in this PR:

  • Cardinality of CosmosDiagnosticsContext / Trace spans etc.
  • Before this PR a single query operation in the SDK returning a CosmosPagedFlux was always captured in a single CosmosDiagnosticsContext instance / and a single tracing span. The guiding principle is that there should be one ComsosDiagnosticsContext and trace span per user operation in the SDK. For queries it is not super clear whether you should treat creating a query and full draining it as a single user interaction or whether processing each page returned from the CosmosPagedFlux should be treated as the user interaction. For single partition queries returning only few records it is the same - but especially for long-running queries or for queries across dozens/hundreds of partitions it makes a big difference. When having only one CosmosDiagnosticsCOntext per query it means metrics/logging etc. will only happen at the end-of the query's lifecylce - it seems preferrable to instead emit them per page to allow monitoring while the long running query is still processing as well. After some offline discussions we decided to switch to emitting CosmosDiagnosticsContext per page - and this is also included in this PR.
  • Empty Page handling: This PR also fixes a gap in diagnostics for queries returning empty pages (either because a cross partition query does not return results for some of the partitions or when a query does not return any results). The per request statistics for these empty pages was missing - and as a result they would not have been reflected in metrics/diagnostics etc. This gap is also addressed in this PR
  • SequenceNumber - this PR adds a new property CosmosDiagnosticsContext.getSequnceNumber which is only populated for feed operations - the combination of CorrelatedActivityId and SequenceNumber will be able to reconstruct the order in which operations/traces etc. have happened.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

jeet1995 and others added 30 commits August 9, 2023 09:01
* Updated read hang grace and cancellation count threshold

* Updated CHANGELOG.md

* Updated CHANGELOG.md

* Updated CHANGELOG.md
* Updated release versions to 4.48.1 for azure-cosmos

* Updated release notes
…abianm/AvailabilityStrategyAllRegionExceptionFix
…6479)

* Refactoring the AvailabilityStrategyNoSuchElementFix

* Reacting to code review comments
…to users/fabianm/AvailabilityStrategyAllRegionExceptionFix
…to users/fabianm/AvailabilityStrategyAllRegionExceptionFix
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Copy link
Member Author

/azp run java - cosmos - tests

@FabianMeiswinkel
Copy link
Member Author

/azp run java - cosmos - spark

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Copy link
Member Author

/azp run java - cosmos - tests

@FabianMeiswinkel
Copy link
Member Author

/azp run java - cosmos - spark

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Copy link
Member Author

/azp run java - cosmos - tests

@FabianMeiswinkel
Copy link
Member Author

/azp run java - cosmos - spark

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Copy link
Member Author

/azp run java - cosmos - spark

@FabianMeiswinkel
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@xinlian12 xinlian12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@FabianMeiswinkel FabianMeiswinkel merged commit 59ec07c into Azure:main Sep 29, 2023
63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants