…to users/fabianm/readManyByPK
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new readManyByPartitionKey API surface to the Java Cosmos SDK (sync + async) and wires it through the Spark connector to support PK-only reads (including partial HPK), with query-plan-based validation for custom queries.
Changes:
- Added public `readManyByPartitionKey` overloads in `CosmosAsyncContainer` / `CosmosContainer` and an internal `AsyncDocumentClient` + `RxDocumentClientImpl` implementation that groups PKs by physical partition and issues per-range queries.
- Introduced `ReadManyByPartitionKeyQueryHelper` to compose PK filters into user-provided SQL and added a new config knob for per-partition batching.
- Added Spark support (UDF + PK serialization/parsing helper + reader) and unit/integration tests for query composition and end-to-end behavior.
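The grouping and batching described in the changes above can be pictured with a rough sketch. Everything here is hypothetical: `PartitionKeyBatcher`, `groupByRange`, `toBatches`, the `rangeResolver` function, and the `maxPerBatch` parameter are illustrative names, not the code in `RxDocumentClientImpl` or the actual config knob. It only assumes that each PK can be mapped to a physical range id and that each batch would become one per-range query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from the SDK.
final class PartitionKeyBatcher {

    private PartitionKeyBatcher() {}

    /** Groups keys by the range id the (assumed) resolver returns, keeping first-seen order. */
    static <K> Map<String, List<K>> groupByRange(List<K> keys, Function<K, String> rangeResolver) {
        Map<String, List<K>> grouped = new LinkedHashMap<>();
        for (K key : keys) {
            grouped.computeIfAbsent(rangeResolver.apply(key), id -> new ArrayList<>()).add(key);
        }
        return grouped;
    }

    /** Splits one range's keys into batches of at most maxPerBatch keys. */
    static <K> List<List<K>> toBatches(List<K> keys, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += maxPerBatch) {
            int end = Math.min(start + maxPerBatch, keys.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }
}
```

In this sketch a caller would first call `groupByRange`, then `toBatches` on each group, and issue one query per resulting batch. How the real implementation resolves ranges and bounds concurrency is not shown here.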
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| sdk/cosmos/docs/readManyByPartitionKey-design.md | Design doc describing the new API, query validation, and Spark integration approach. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentQueryExecutionContextFactory.java | Adds a helper method to fetch query plans through the gateway for validation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java | Implements readManyByPartitionKey execution, validation, PK→range grouping, batching, and concurrency. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ReadManyByPartitionKeyQueryHelper.java | New helper to build SqlQuerySpec by appending PK filters and extracting table aliases. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Configs.java | Adds config/env accessors for max PKs per per-partition query batch. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/AsyncDocumentClient.java | Adds internal interface method for PK-only read-many. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosContainer.java | Adds sync readManyByPartitionKey overloads. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncContainer.java | Adds async readManyByPartitionKey overloads and wiring to internal client. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/ReadManyByPartitionKeyQueryHelperTest.java | Unit tests for SQL generation, alias extraction, and WHERE detection. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/ReadManyByPartitionKeyTest.java | Emulator integration tests for single PK + HPK, partial HPK, projections, and query validation. |
| sdk/cosmos/azure-cosmos-spark_3/src/test/scala/com/azure/cosmos/spark/ItemsPartitionReaderWithReadManyByPartitionKeyITest.scala | Spark integration test for reading by PKs and empty result behavior. |
| sdk/cosmos/azure-cosmos-spark_3/src/test/scala/com/azure/cosmos/spark/CosmosPartitionKeyHelperSpec.scala | Unit tests for PK string serialization/parsing helpers. |
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/udf/GetCosmosPartitionKeyValue.scala | Spark UDF to compute _partitionKeyIdentity values. |
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/ItemsPartitionReaderWithReadManyByPartitionKey.scala | Spark partition reader that calls new SDK API and converts results to rows. |
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/CosmosReadManyByPartitionKeyReader.scala | Spark reader that maps input rows to PKs and streams results via the partition reader. |
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/CosmosPartitionKeyHelper.scala | Helper for PK serialization/parsing used by the UDF and data source. |
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/CosmosItemsDataSource.scala | Adds Spark entry point to read-many by partition key, including PK extraction logic. |
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/CosmosConstants.scala | Adds _partitionKeyIdentity constant. |
Comments suppressed due to low confidence (1)
sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/ItemsPartitionReaderWithReadManyByPartitionKey.scala:1
- The error message has mismatched parentheses/quoting (`classOf<SparkRowItem])`) which makes it harder to read and search for. Suggest correcting it to a clean, unambiguous string (e.g., `classOf[SparkRowItem]`) to improve diagnosability.
// Copyright (c) Microsoft Corporation. All rights reserved.
…s/spark/ItemsPartitionReaderWithReadManyByPartitionKey.scala Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…s/spark/ItemsPartitionReaderWithReadManyByPartitionKey.scala Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ntation/RxDocumentClientImpl.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ntation/ReadManyByPartitionKeyQueryHelper.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…to users/fabianm/readManyByPK
@sdkReviewAgent
✅ Review complete (41:32). Posted 6 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage
@sdkReviewAgent
@sdkReviewAgent
✅ Review complete (34:12). Posted 5 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage
…/azure-sdk-for-java into users/fabianm/readManyByPK
@sdkReviewAgent |
…/azure-sdk-for-java into users/fabianm/readManyByPK
…/azure-sdk-for-java into users/fabianm/readManyByPK
Description
Adds a new readManyByPartitionKey API surface to the Java Cosmos SDK (sync + async) and wires it through the Spark connector to support PK-only reads (including partial HPK), with query-plan-based validation for custom queries.
Changes:
- Added public `readManyByPartitionKey` overloads in `CosmosAsyncContainer` / `CosmosContainer` and an internal `AsyncDocumentClient` + `RxDocumentClientImpl` implementation that groups PKs by physical partition and issues per-range queries.
- Introduced `ReadManyByPartitionKeyQueryHelper` to compose PK filters into user-provided SQL and added a new config knob for per-partition batching.
- Added Spark support (UDF + PK serialization/parsing helper + reader) and unit/integration tests for query composition and end-to-end behavior.
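To make "compose PK filters into user-provided SQL" concrete, here is a minimal sketch of one way such composition could work for a single-path partition key. It is not the `ReadManyByPartitionKeyQueryHelper` from this PR: `PkFilterSketch`, `appendPkFilter`, `Composed`, and the `@__pk` parameter prefix are all hypothetical, and the WHERE detection is deliberately naive (it assumes a simple query with no string literals containing `WHERE`, no subqueries, and no trailing `ORDER BY` or `GROUP BY`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the SDK's ReadManyByPartitionKeyQueryHelper.
final class PkFilterSketch {

    // Naive WHERE detection: ignores string literals, subqueries and trailing clauses.
    private static final Pattern WHERE =
        Pattern.compile("\\s+WHERE\\s+", Pattern.CASE_INSENSITIVE);

    /** Query text plus parameter names and values, in matching order. */
    static final class Composed {
        final String queryText;
        final List<String> parameterNames;
        final List<Object> parameterValues;

        Composed(String queryText, List<String> parameterNames, List<Object> parameterValues) {
            this.queryText = queryText;
            this.parameterNames = parameterNames;
            this.parameterValues = parameterValues;
        }
    }

    private PkFilterSketch() {}

    /** Appends "alias.pkProperty IN (...)" to baseQuery, AND-ing it with any existing condition. */
    static Composed appendPkFilter(String baseQuery, String alias, String pkProperty, List<?> pkValues) {
        if (pkValues.isEmpty()) {
            throw new IllegalArgumentException("at least one partition key value is required");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < pkValues.size(); i++) {
            names.add("@__pk" + i); // assumed parameter naming scheme
        }
        String filter = alias + "." + pkProperty + " IN (" + String.join(", ", names) + ")";

        Matcher matcher = WHERE.matcher(baseQuery);
        String text;
        if (matcher.find()) {
            // Parenthesize the user's condition so an OR inside it cannot bypass the PK filter.
            String head = baseQuery.substring(0, matcher.start());
            String condition = baseQuery.substring(matcher.end());
            text = head + " WHERE (" + condition + ") AND " + filter;
        } else {
            text = baseQuery + " WHERE " + filter;
        }
        return new Composed(text, names, new ArrayList<Object>(pkValues));
    }
}
```

Under these assumptions, `appendPkFilter("SELECT * FROM c", "c", "pk", Arrays.asList("a", "b"))` yields the text `SELECT * FROM c WHERE c.pk IN (@__pk0, @__pk1)`. Passing values as parameters rather than inlining them avoids escaping issues. A real helper would also need to handle hierarchical and partial keys, which this sketch does not attempt.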
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines