spark.cosmos.changeFeed.itemCountPerTriggerHint not honoured #483

roadruuner · 2023-02-13T16:59:33Z

I am using Spark Structured Streaming with asuze-cosmos-spark_3-1_2-2 % "4.16.0"

I need to cap how much data I can read.
Tried setting the following:
spark.cosmos.changeFeed.itemCountPerTriggerHint = 10
spark.cosmos.read.maxItem = 10

spark.cosmos.changeFeed.startFrom = "Beginning"
spark.cosmos.changeFeed.mode = "Incremental"

Using
val changeFeedDF = spark.readStream
.schema(customSchema)
.format("cosmos.oltp.changeFeed")
.options(readConfig)
.load

FabianMeiswinkel · 2023-02-13T17:10:51Z

Hi, itemCountPerTriggerHint will allow you to modify the max. memory/resource consumption per micro batch. it is only a hint because change feed in Cosmos DB will always include at least all documents of a single atomic transaction (all sharing the same LSN - log sequence number, because they were modified in the same atomic transaction). So, you will always get at least the documents for a single atomic transaction per physical partition. But from a memory footprint/resource consumption perspective that should be more than sufficient - because the number of document updates in a transaction is also capped (worst case a single bulk/batch might update around 1-5 thousand documents if they are really very small)

The right repository for the Cosmos Spark Connector for Spark 3.* is this repo. https://github.com/Azure/azure-sdk-for-java

Config details can be located here: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/docs/configuration-reference.md

FabianMeiswinkel closed this as completed Feb 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spark.cosmos.changeFeed.itemCountPerTriggerHint not honoured #483

spark.cosmos.changeFeed.itemCountPerTriggerHint not honoured #483

roadruuner commented Feb 13, 2023

FabianMeiswinkel commented Feb 13, 2023

spark.cosmos.changeFeed.itemCountPerTriggerHint not honoured #483

spark.cosmos.changeFeed.itemCountPerTriggerHint not honoured #483

Comments

roadruuner commented Feb 13, 2023

FabianMeiswinkel commented Feb 13, 2023