spark.cosmos.changeFeed.itemCountPerTriggerHint not honoured #483

Closed
roadruuner opened this issue Feb 13, 2023 · 1 comment

Comments

@roadruuner

I am using Spark Structured Streaming with azure-cosmos-spark_3-1_2-12 % "4.16.0".

I need to cap how much data is read per trigger.
I tried setting the following, but the limit is not honoured:
spark.cosmos.changeFeed.itemCountPerTriggerHint = 10
spark.cosmos.read.maxItem = 10

spark.cosmos.changeFeed.startFrom = "Beginning"
spark.cosmos.changeFeed.mode = "Incremental"

Using:
val changeFeedDF = spark.readStream
  .schema(customSchema)
  .format("cosmos.oltp.changeFeed")
  .options(readConfig)
  .load()

@FabianMeiswinkel
Member

Hi, itemCountPerTriggerHint lets you limit the maximum memory/resource consumption per micro-batch. It is only a hint because the change feed in Cosmos DB always returns at least all documents of a single atomic transaction (they all share the same LSN, the log sequence number, because they were modified in the same atomic transaction). So you will always get at least the documents of one atomic transaction per physical partition. From a memory footprint/resource consumption perspective that should be more than sufficient, because the number of document updates in a transaction is also capped (worst case, a single bulk/batch operation might update around 1-5 thousand documents if they are very small).
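
To illustrate why the setting can only be a hint, here is a toy model in plain Scala. It is not the connector's code: ChangeItem, takeBatch and LsnBatchSketch are hypothetical names, the feed is assumed to be ordered by LSN, and a single partition is assumed. The sketch takes items up to the hint and then extends the batch so that a group of items sharing an LSN is never split.

```scala
object LsnBatchSketch {
  // Hypothetical stand-in for a change feed document and its log sequence number.
  final case class ChangeItem(id: String, lsn: Long)

  // Takes up to `hint` items in feed order, then keeps going until the
  // transaction (same LSN) of the last taken item is complete.
  def takeBatch(feed: Seq[ChangeItem], hint: Int): Seq[ChangeItem] = {
    require(hint > 0, "hint must be positive")
    if (feed.isEmpty) Seq.empty
    else {
      val prefix = feed.take(hint)
      val lastLsn = prefix.last.lsn
      // Remaining items that belong to the same atomic transaction.
      val rest = feed.drop(hint).takeWhile(_.lsn == lastLsn)
      prefix ++ rest
    }
  }

  def main(args: Array[String]): Unit = {
    val feed = Seq(
      ChangeItem("a", 1L),
      ChangeItem("b", 2L), ChangeItem("c", 2L), ChangeItem("d", 2L),
      ChangeItem("e", 3L)
    )
    // A hint of 2 would cut the LSN 2 transaction in half, so the batch
    // grows to 4 items instead.
    println(takeBatch(feed, hint = 2).map(_.id).mkString(","))
    // prints: a,b,c,d
  }
}
```

In this model a hint of 10 against a transaction that touched 1,000 documents would still yield all 1,000 in one batch, which matches the "at least one atomic transaction" behaviour described above.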

The right repository for the Cosmos Spark Connector for Spark 3.* is this one: https://github.com/Azure/azure-sdk-for-java

Config details can be located here: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/docs/configuration-reference.md
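
For reference, a readConfig map using the settings mentioned in this thread might look roughly like the sketch below. The endpoint, key, database and container values are placeholders and not taken from the issue. Note that, as far as I know, the configuration reference spells the page-size option spark.cosmos.read.maxItemCount rather than spark.cosmos.read.maxItem, so please verify the key names against the reference for your connector version.

```scala
// Sketch only: placeholder values in angle brackets are assumptions.
val readConfig: Map[String, String] = Map(
  "spark.cosmos.accountEndpoint" -> "https://<account>.documents.azure.com:443/",
  "spark.cosmos.accountKey" -> "<account-key>",
  "spark.cosmos.database" -> "<database>",
  "spark.cosmos.container" -> "<container>",
  // Settings quoted in the issue.
  "spark.cosmos.changeFeed.startFrom" -> "Beginning",
  "spark.cosmos.changeFeed.mode" -> "Incremental",
  "spark.cosmos.changeFeed.itemCountPerTriggerHint" -> "10",
  // Key name as I understand the configuration reference; verify before use.
  "spark.cosmos.read.maxItemCount" -> "10"
)
```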

2 participants