Optimized the partitioning strategy implementation details to avoid unnecessarily high RU usage #39438

FabianMeiswinkel · 2024-03-27T21:57:52Z

Description

This PR optimizes some of the partitioning strategy values to reduce the chance that RU consumption for queries or change feed requests is unnecessarily high when no EPK filtering is enabled for an account. Functionally, scoping Spark partitions to subsets of physical partitions is possible and increases parallelization, but it comes with the overhead of increased RU usage, and in many cases that overhead is unnecessary. So this PR avoids it in most cases (unless a partition merge happened) for all partitioning strategies except Aggressive and Custom.
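For context, a minimal sketch of how a reader of the connector might select a partitioning strategy. It is not code from this PR. The option keys are taken from the Cosmos DB Spark connector's configuration reference as I understand it; the object name `PartitioningConfigSketch`, the helper `readConfig`, and all angle-bracket values are hypothetical placeholders.

```scala
// Illustrative sketch only, not part of this PR.
// Assumption: option keys match the connector's configuration reference.
object PartitioningConfigSketch {
  // Strategy names as listed in the connector's configuration reference.
  val strategies: Set[String] = Set("Default", "Restrictive", "Aggressive", "Custom")

  // Hypothetical helper that builds the read options for a given strategy.
  def readConfig(strategy: String, targetedCount: Option[Int] = None): Map[String, String] = {
    require(strategies.contains(strategy), s"Unknown partitioning strategy: $strategy")
    val base = Map(
      "spark.cosmos.accountEndpoint" -> "<account-endpoint>", // placeholder
      "spark.cosmos.accountKey" -> "<account-key>",           // placeholder
      "spark.cosmos.database" -> "<database>",                // placeholder
      "spark.cosmos.container" -> "<container>",              // placeholder
      "spark.cosmos.read.partitioning.strategy" -> strategy
    )
    // The targeted partition count is only added for the Custom strategy.
    targetedCount match {
      case Some(n) if strategy == "Custom" =>
        base + ("spark.cosmos.partitioning.targetedCount" -> n.toString)
      case _ => base
    }
  }
}
```

In a Spark session the resulting map would typically be passed as `spark.read.format("cosmos.oltp").options(cfg).load()`. Per the description above, only Aggressive and Custom are still expected to scope Spark partitions to subsets of physical partitions routinely after this change.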

All SDK Contribution checklist:

The pull request does not introduce breaking changes.
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

xinlian12

LGTM, thanks

sdk/cosmos/azure-cosmos-spark_3_2-12/Samples/Scala/NYC-Taxi-Data/01_Batch.scala

kushagraThapar

Changes look good, however, I am wondering if this is a breaking change? And should we explicitly call it out in the changelog instead of calling it out as optimized?

FabianMeiswinkel · 2024-04-02T21:40:47Z

/azp run java - cosmos - spark

azure-pipelines · 2024-04-02T21:40:56Z

Azure Pipelines successfully started running 1 pipeline(s).

Optimized the partitioning strategy implementation details.

c69ee8c

FabianMeiswinkel requested review from kushagraThapar, kirankumarkolli, xinlian12, milismsft, aayush3011, simorenoh, jeet1995 and Pilchie as code owners March 27, 2024 21:57

github-actions bot added the Cosmos label Mar 27, 2024

Added Changelog and fixed samples

d59acfd

xinlian12 approved these changes Mar 27, 2024

xinlian12 reviewed Mar 27, 2024

sdk/cosmos/azure-cosmos-spark_3_2-12/Samples/Scala/NYC-Taxi-Data/01_Batch.scala

Update CosmosPartitionPlanner.scala

464b037

kushagraThapar approved these changes Apr 1, 2024

Update CosmosPartitionPlannerITest.scala

27f0452

FabianMeiswinkel merged commit ed43699 into Azure:main Apr 2, 2024
35 checks passed

xinlian12 pushed a commit to xinlian12/azure-sdk-for-java that referenced this pull request Apr 10, 2024

Revert "Optimized the partitioning strategy implementation details to avoid unnecessarily high RU usage (Azure#39438)"

This reverts commit ed43699.

f958a77
