I'm not sure whether this is a CosmosDB bug or a CosmosDB connector bug, but a splitting partition should not cause the Spark job to fail. It seems to me that CosmosDB should split partitions behind the scenes, temporarily provisioning more RUs if necessary and using that reserved capacity to complete the split.
Failing that, the connector should back off and retry the read a little later, since the split actually seems to complete fairly quickly.
17/08/27 16:57:18 WARN TaskSetManager: Lost task 7.0 in stage 5.0 (TID 135, 10.244.4.8, executor 5): java.lang.IllegalStateException: com.microsoft.azure.documentdb.DocumentClientException: com.microsoft.azure.documentdb.DocumentClientException: {"Errors":["Partition key range is finishing split and is not available for reads/writes."]}
ActivityId: 5cbaf1b2-032c-4084-9bb4-ea7696f248df, Request URI: /apps/1240113d-9858-49b9-90cb-1219f9e1df77/services/081d9580-bcfd-4fd0-a8f6-8e4c66ae901c/partitions/47c830c4-5c3f-441c-a01c-0532c2b32a82/replicas/131481981197382710p//, StatusCode: Gone
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.next(DefaultQueryExecutionContext.java:86)
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.next(DefaultQueryExecutionContext.java:33)
at com.microsoft.azure.documentdb.internal.query.ProxyQueryExecutionContext.next(ProxyQueryExecutionContext.java:151)
at com.microsoft.azure.documentdb.internal.query.ProxyQueryExecutionContext.next(ProxyQueryExecutionContext.java:9)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at com.microsoft.azure.cosmosdb.spark.rdd.CosmosDBRDDIterator.next(CosmosDBRDDIterator.scala:208)
at com.microsoft.azure.cosmosdb.spark.rdd.CosmosDBRDDIterator.next(CosmosDBRDDIterator.scala:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:120)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:112)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: com.microsoft.azure.documentdb.DocumentClientException: com.microsoft.azure.documentdb.DocumentClientException: {"Errors":["Partition key range is finishing split and is not available for reads/writes."]}
ActivityId: 5cbaf1b2-032c-4084-9bb4-ea7696f248df, Request URI: /apps/1240113d-9858-49b9-90cb-1219f9e1df77/services/081d9580-bcfd-4fd0-a8f6-8e4c66ae901c/partitions/47c830c4-5c3f-441c-a01c-0532c2b32a82/replicas/131481981197382710p//, StatusCode: Gone
at com.microsoft.azure.documentdb.internal.directconnectivity.GoneAndRetryWithRetryPolicy.shouldRetry(GoneAndRetryWithRetryPolicy.java:67)
at com.microsoft.azure.documentdb.internal.RetryUtility.executeStoreClientRequest(RetryUtility.java:122)
at com.microsoft.azure.documentdb.internal.directconnectivity.ServerStoreModel.processMessage(ServerStoreModel.java:89)
at com.microsoft.azure.documentdb.DocumentClient$10.apply(DocumentClient.java:3021)
at com.microsoft.azure.documentdb.internal.RetryUtility.executeDocumentClientRequest(RetryUtility.java:58)
at com.microsoft.azure.documentdb.DocumentClient.doQuery(DocumentClient.java:3027)
at com.microsoft.azure.documentdb.DocumentQueryClientInternal.doQuery(DocumentQueryClientInternal.java:40)
at com.microsoft.azure.documentdb.internal.query.AbstractQueryExecutionContext.executeRequest(AbstractQueryExecutionContext.java:214)
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.executeOnce(DefaultQueryExecutionContext.java:137)
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.fillBuffer(DefaultQueryExecutionContext.java:101)
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.next(DefaultQueryExecutionContext.java:84)
... 20 more
Caused by: com.microsoft.azure.documentdb.DocumentClientException: {"Errors":["Partition key range is finishing split and is not available for reads/writes."]}
ActivityId: 5cbaf1b2-032c-4084-9bb4-ea7696f248df, Request URI: /apps/1240113d-9858-49b9-90cb-1219f9e1df77/services/081d9580-bcfd-4fd0-a8f6-8e4c66ae901c/partitions/47c830c4-5c3f-441c-a01c-0532c2b32a82/replicas/131481981197382710p//, StatusCode: Gone
at com.microsoft.azure.documentdb.internal.ErrorUtils.maybeThrowException(ErrorUtils.java:69)
at com.microsoft.azure.documentdb.internal.directconnectivity.HttpTransportClient.processResponse(HttpTransportClient.java:151)
at com.microsoft.azure.documentdb.internal.directconnectivity.HttpTransportClient.invokeStore(HttpTransportClient.java:121)
at com.microsoft.azure.documentdb.internal.directconnectivity.HttpTransportClient.invokeStore(HttpTransportClient.java:128)
at com.microsoft.azure.documentdb.internal.directconnectivity.TransportClient.invokeResourceOperation(TransportClient.java:12)
at com.microsoft.azure.documentdb.internal.directconnectivity.StoreReader.readFromStore(StoreReader.java:316)
at com.microsoft.azure.documentdb.internal.directconnectivity.StoreReader.createStoreReadResult(StoreReader.java:211)
at com.microsoft.azure.documentdb.internal.directconnectivity.StoreReader.readOneReplica(StoreReader.java:180)
at com.microsoft.azure.documentdb.internal.directconnectivity.StoreReader.readSession(StoreReader.java:87)
at com.microsoft.azure.documentdb.internal.directconnectivity.ConsistencyReader.readSession(ConsistencyReader.java:78)
at com.microsoft.azure.documentdb.internal.directconnectivity.ConsistencyReader.read(ConsistencyReader.java:53)
at com.microsoft.azure.documentdb.internal.directconnectivity.ReplicatedResourceClient.invoke(ReplicatedResourceClient.java:59)
at com.microsoft.azure.documentdb.internal.directconnectivity.ServerStoreModel$1.apply(ServerStoreModel.java:84)
at com.microsoft.azure.documentdb.internal.RetryUtility.executeStoreClientRequest(RetryUtility.java:113)
... 29 more
17/08/27 16:57:18 INFO TaskSetManager: Starting task 7.1 in stage 5.0 (TID 174, 10.244.3.5, executor 3, partition 7, PROCESS_LOCAL, 4783 bytes)
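The back-off-and-retry behaviour suggested above could look roughly like the sketch below. This is not the connector's actual API — `read_page` and `is_split_error` are hypothetical stand-ins for the connector's page-read call and for a check that the `DocumentClientException` carries the "Partition key range is finishing split" / `Gone` status:

```python
import time


def read_with_backoff(read_page, is_split_error, max_retries=5, base_delay=0.5):
    """Retry a read with exponential backoff while a partition split finishes.

    read_page      -- zero-argument callable performing the read (hypothetical)
    is_split_error -- predicate deciding whether an exception is the transient
                      "partition key range is finishing split" error
    """
    for attempt in range(max_retries):
        try:
            return read_page()
        except Exception as exc:
            # Re-raise anything that isn't the transient split error,
            # or if we've exhausted our retry budget.
            if not is_split_error(exc) or attempt == max_retries - 1:
                raise
            # Splits complete quickly, so a short exponential backoff
            # (0.5s, 1s, 2s, ...) should usually be enough.
            time.sleep(base_delay * (2 ** attempt))
```

With something like this wrapping the per-partition read, the task would ride out the split instead of surfacing the `IllegalStateException` and failing.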