
[Cosmos] Full Fidelity Change Feed changes for pull model #30161

Merged (8 commits) on Jul 29, 2022

Conversation

@simorenoh (Member) commented Jul 27, 2022

This PR contains the changes for using the pull-model full fidelity change feed, including tests and Spark changes.

Ported from the previously opened PR: #29799

Reference .Net SDK PR - Azure/azure-cosmos-dotnet-v3#3197

@azure-sdk (Collaborator)

API change check

APIView has identified API-level changes in this PR and created the following API reviews.

azure-cosmos

private def getAttributeNode(objectNode: ObjectNode, attributeName: String): JsonNode = {
  objectNode.get(attributeName) match {
    case jsonNode: JsonNode => jsonNode
    case _ => null
  }
}

Don't use null in Scala - it is considered a no-go; Option[T] is used instead.

I don't actually think you need this helper method. objectNode.get("name") is all this one is doing (when using Option instead of null)

But objectNode.get just returns a JsonNode - could be anything from Object to Array - that is why before we had separate helper methods also validating the expected return type.

You can switch to using objectNode.get directly instead, but you need to add validation that the returned JsonNode is of the expected type.
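The Option-based pattern the comment describes can be sketched like this (a minimal, self-contained illustration, not the SDK code; JsonLike/ObjNode/TextNode are stand-ins for Jackson's JsonNode/ObjectNode/TextNode):

```scala
sealed trait JsonLike
final case class TextNode(value: String) extends JsonLike
final case class ObjNode(fields: Map[String, JsonLike]) extends JsonLike

// Return Option instead of null, and validate the node type before extracting
// the value: a missing key and a wrong-typed node both map to None.
def getAttributeAsString(node: ObjNode, name: String): Option[String] =
  node.fields.get(name) match {
    case Some(TextNode(v)) => Some(v) // expected type: yield the value
    case _                 => None    // missing or wrong type: no null
  }
```

A caller can then write getAttributeAsString(doc, "lsn").getOrElse(defaultLsn) instead of chaining null checks.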


got it, will do.

def getChangeFeedLsn(objectNode: ObjectNode): String = {
  getAttributeNode(objectNode, MetadataJsonBodyAttributeName) match {
    case metadataNode: JsonNode =>
      metadataNode.get(MetadataLsnAttributeName) match {

I think this is identical to getAttributeAsString?

@kushagraThapar (Member) commented Jul 28, 2022

Actually, the only difference between them is that getAttributeAsString returns objectNode.toString() whereas this returns objectNode.asText(). I think these are different, because toString() is called on an object node, whereas asText() is called on a value node.
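The difference can be seen directly on Jackson nodes (an illustrative sketch assuming jackson-databind on the classpath, not the SDK code):

```scala
import com.fasterxml.jackson.databind.ObjectMapper

val mapper = new ObjectMapper()
val obj = mapper.readTree("""{"lsn":"42"}""")

// toString() serializes any node back to JSON text:
obj.toString            // {"lsn":"42"}
// asText() only yields a value on value nodes; on an ObjectNode it is empty:
obj.asText              // ""
// On the value node itself, asText() is the raw string, toString() the JSON form:
obj.get("lsn").asText   // 42
obj.get("lsn").toString // "42"
```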

@@ -113,6 +108,13 @@ private[cosmos] class CosmosRowConverter(
new GenericRowWithSchema(values.toArray, schema)
}

def fromObjectNodeToRowV1(schema: StructType,

I assume V1 is for Change Feed V1 - please make that explicit in the name, because in two weeks no one will remember what V1 and V2 stand for otherwise.

Something like fromObjectNodeToChangeFeedRowV1.


ack, will do.

var currentNode = getAttributeNode(objectNode, CurrentAttributeName)
if (currentNode == null || currentNode.isEmpty) {
  currentNode = getAttributeNode(objectNode, PreviousRawJsonBodyAttributeName)
}

Factor out finding the "right" payload node into a separate function to avoid code duplication.
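One way the factored-out helper could look (a hedged sketch; the helper name and the inlined attribute-name values are assumptions for illustration, and it returns Option rather than null per the earlier comment):

```scala
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import com.fasterxml.jackson.databind.node.ObjectNode

// Assumed stand-in values for the attribute-name constants used in the PR:
val CurrentAttributeName = "current"
val PreviousRawJsonBodyAttributeName = "previous"

// Try the "current" payload first; fall back to "previous" when current is
// missing or empty (e.g. for deletes).
def getPayloadNode(objectNode: ObjectNode): Option[JsonNode] =
  Option(objectNode.get(CurrentAttributeName)).filterNot(_.isEmpty)
    .orElse(Option(objectNode.get(PreviousRawJsonBodyAttributeName)))
```

Both row-conversion paths can then share this single lookup instead of duplicating the fallback logic.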


ack, will do.

@@ -735,6 +796,34 @@ private[cosmos] class CosmosRowConverter(
Option(objectNode.get(name)).map(convertToSparkDataType(dataType, _, schemaConversionMode)).orNull
}

private def convertStructToSparkDataTypeV1(schema: StructType,

Same comment regarding the V1 vs. V2 prefix as above.


ack, will do.

println("Input : ", inputArrayBuffer.mkString(","))
println("Output : ", outputArray.mkString(","))
if (inputArrayBuffer.length != outputArray.length) {
  return false
}

Long story short: never use return in Scala. This is one of the things in Scala that confused the heck out of me initially. return has different semantics depending on where the function is called from; if you google "return" and "scala" you'll find the details. Just make sure the last line produces the value.

To simplify here, I would rename the method to validateArraysUnordered and replace the checks where you return false with an assert (the xxx shouldEqual yyy used in other places).
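Following that suggestion, the check could be rewritten without return (a sketch with plain assert standing in for the test framework's shouldEqual):

```scala
// Order-insensitive comparison via element counts; the asserts replace the
// early `return false`, so no `return` keyword is needed.
def validateArraysUnordered[T](input: Seq[T], output: Seq[T]): Unit = {
  assert(input.length == output.length,
    s"length mismatch: ${input.length} vs ${output.length}")
  val counts = (s: Seq[T]) => s.groupBy(identity).map { case (k, v) => (k, v.size) }
  assert(counts(input) == counts(output), "element mismatch")
}
```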


ack, will do.

@@ -42,7 +42,11 @@ public interface Lease {
*/
String getTimestamp();

ChangeFeedState getContinuationState(
ChangeFeedState getIncrementalContinuationState(

Please rename these to getContinuationStateWireFormatV0 vs. getContinuationStateWireFormatV1 or similar?


This one actually has nothing to do with the change feed wire format. One is for getting the continuation state for incremental mode, and one for full fidelity mode.

I guess once I make the lease changes as well, then there will be a good separation of these continuation states based on different lease structure. I might need to create a new lease altogether, will update it then if that's okay with you.

@@ -19,8 +16,9 @@ import org.apache.spark.sql.catalyst.expressions.{GenericRowWithSchema, UnsafeMa
import org.apache.spark.sql.catalyst.util.ArrayData

import java.io.IOException
import java.time.{Instant, LocalDate, OffsetDateTime, ZoneOffset}
import java.sql.{Date, Timestamp}

Please add corresponding test coverage in CosmosRowConverterITest and CosmosRowConverterSpec - these were created by Matias and are one of the best sets of tests we have in all of the Spark connector. Having extensive coverage of the RowConverter functionality there has proven very useful.


ack, will do.

@FabianMeiswinkel (Member) left a comment

Overall looks good - the missing tests in CosmosRowConverterITest/CosmosRowConverterSpec and the two "Scala coding violations" summarize my requested changes.

@FabianMeiswinkel (Member) left a comment

LGTM now - thanks!


private val changeFeedRequestOptions = {
private val changeFeedRequestOptions = {

nit: remove the extra space

objectNode.get(MetadataJsonBodyAttributeName) match {
  case metadataNode: JsonNode =>
    metadataNode.get(TimeToLiveExpiredPropertyName) match {
      case valueNode: JsonNode =>

nit: indent

objectNode.get(MetadataJsonBodyAttributeName) match {
  case metadataNode: JsonNode =>
    metadataNode.get(OperationTypeAttributeName) match {
      case valueNode: JsonNode =>

nit: indent

// For multi-master, crts will be the latest resolution timestamp of any conflicts
private def parseTimestamp(objectNode: ObjectNode): Long = {
  objectNode.get(MetadataJsonBodyAttributeName) match {
    case metadataNode: JsonNode =>

What if objectNode does not have the MetadataJsonBodyAttributeName attribute?


It will always have the metadata JSON body attribute name.

private[spark] val defaultFullFidelityChangeFeedSchemaForInferenceDisabled = StructType(Seq(
  StructField(RawJsonBodyAttributeName, StringType, nullable=true),
  StructField(IdAttributeName, StringType, nullable=false),
  StructField(TimestampAttributeName, LongType, nullable=false),
  StructField(ETagAttributeName, StringType, nullable=false),
  StructField(LsnAttributeName, LongType, nullable=false),
  StructField(MetadataJsonBodyAttributeName, StringType, nullable=false),
  StructField(PreviousRawJsonBodyAttributeName, StringType, nullable=true),
  StructField(OperationTypeAttributeName, StringType, nullable=false),
  StructField(CrtsAttributeName, LongType, nullable=false),
  StructField(PreviousImageLsnAttributeName, LongType, nullable=true)
))

It is not nullable.

// If there is no continuation token, we start from now (which is the default).
// On the REST level, change feed uses IfNoneMatch/ETag instead of continuation.
request.getHeaders().put(HttpConstants.HttpHeaders.IF_NONE_MATCH,
    HttpConstants.HeaderValues.IF_NONE_MATCH_ALL);

Why do we not need to populate this here for incremental mode?


For incremental mode, this header is not mandatory; if it is not present, the default is to start from the beginning. For full fidelity, this header is mandatory.
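The rule described here can be summarized in a small sketch (the helper name and the literal "*" value standing in for IF_NONE_MATCH_ALL are assumptions, not the SDK code):

```scala
// Full fidelity must always send If-None-Match; incremental sends it only
// when starting from "now". An absent header means "start from the beginning".
def ifNoneMatchHeader(fullFidelity: Boolean, startFromNow: Boolean): Option[String] =
  if (fullFidelity || startFromNow) Some("*") else None
```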

@xinlian12 (Member) left a comment

LGTM, thanks

@kushagraThapar (Member)

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

5 participants