Partition split error should not fail task #116

Closed
timfpark opened this issue Aug 27, 2017 · 2 comments

Comments

@timfpark
Member

I'm not sure whether this should be filed as a CosmosDB bug or a CosmosDB Connector bug, but a splitting partition should not cause the Spark job to fail. It seems to me that CosmosDB should split partitions behind the scenes, temporarily provisioning more RUs if necessary and using that reserved capacity for the split.

Failing that, it seems like the connector should back off and retry the read a bit later, since the split seems to complete pretty quickly.

17/08/27 16:57:18 WARN TaskSetManager: Lost task 7.0 in stage 5.0 (TID 135, 10.244.4.8, executor 5): java.lang.IllegalStateException: com.microsoft.azure.documentdb.DocumentClientException: com.microsoft.azure.documentdb.DocumentClientException: {"Errors":["Partition key range is finishing split and is not available for reads/writes."]}
ActivityId: 5cbaf1b2-032c-4084-9bb4-ea7696f248df, Request URI: /apps/1240113d-9858-49b9-90cb-1219f9e1df77/services/081d9580-bcfd-4fd0-a8f6-8e4c66ae901c/partitions/47c830c4-5c3f-441c-a01c-0532c2b32a82/replicas/131481981197382710p//, StatusCode: Gone
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.next(DefaultQueryExecutionContext.java:86)
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.next(DefaultQueryExecutionContext.java:33)
at com.microsoft.azure.documentdb.internal.query.ProxyQueryExecutionContext.next(ProxyQueryExecutionContext.java:151)
at com.microsoft.azure.documentdb.internal.query.ProxyQueryExecutionContext.next(ProxyQueryExecutionContext.java:9)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at com.microsoft.azure.cosmosdb.spark.rdd.CosmosDBRDDIterator.next(CosmosDBRDDIterator.scala:208)
at com.microsoft.azure.cosmosdb.spark.rdd.CosmosDBRDDIterator.next(CosmosDBRDDIterator.scala:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:120)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:112)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: com.microsoft.azure.documentdb.DocumentClientException: com.microsoft.azure.documentdb.DocumentClientException: {"Errors":["Partition key range is finishing split and is not available for reads/writes."]}
ActivityId: 5cbaf1b2-032c-4084-9bb4-ea7696f248df, Request URI: /apps/1240113d-9858-49b9-90cb-1219f9e1df77/services/081d9580-bcfd-4fd0-a8f6-8e4c66ae901c/partitions/47c830c4-5c3f-441c-a01c-0532c2b32a82/replicas/131481981197382710p//, StatusCode: Gone
at com.microsoft.azure.documentdb.internal.directconnectivity.GoneAndRetryWithRetryPolicy.shouldRetry(GoneAndRetryWithRetryPolicy.java:67)
at com.microsoft.azure.documentdb.internal.RetryUtility.executeStoreClientRequest(RetryUtility.java:122)
at com.microsoft.azure.documentdb.internal.directconnectivity.ServerStoreModel.processMessage(ServerStoreModel.java:89)
at com.microsoft.azure.documentdb.DocumentClient$10.apply(DocumentClient.java:3021)
at com.microsoft.azure.documentdb.internal.RetryUtility.executeDocumentClientRequest(RetryUtility.java:58)
at com.microsoft.azure.documentdb.DocumentClient.doQuery(DocumentClient.java:3027)
at com.microsoft.azure.documentdb.DocumentQueryClientInternal.doQuery(DocumentQueryClientInternal.java:40)
at com.microsoft.azure.documentdb.internal.query.AbstractQueryExecutionContext.executeRequest(AbstractQueryExecutionContext.java:214)
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.executeOnce(DefaultQueryExecutionContext.java:137)
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.fillBuffer(DefaultQueryExecutionContext.java:101)
at com.microsoft.azure.documentdb.internal.query.DefaultQueryExecutionContext.next(DefaultQueryExecutionContext.java:84)
... 20 more
Caused by: com.microsoft.azure.documentdb.DocumentClientException: {"Errors":["Partition key range is finishing split and is not available for reads/writes."]}
ActivityId: 5cbaf1b2-032c-4084-9bb4-ea7696f248df, Request URI: /apps/1240113d-9858-49b9-90cb-1219f9e1df77/services/081d9580-bcfd-4fd0-a8f6-8e4c66ae901c/partitions/47c830c4-5c3f-441c-a01c-0532c2b32a82/replicas/131481981197382710p//, StatusCode: Gone
at com.microsoft.azure.documentdb.internal.ErrorUtils.maybeThrowException(ErrorUtils.java:69)
at com.microsoft.azure.documentdb.internal.directconnectivity.HttpTransportClient.processResponse(HttpTransportClient.java:151)
at com.microsoft.azure.documentdb.internal.directconnectivity.HttpTransportClient.invokeStore(HttpTransportClient.java:121)
at com.microsoft.azure.documentdb.internal.directconnectivity.HttpTransportClient.invokeStore(HttpTransportClient.java:128)
at com.microsoft.azure.documentdb.internal.directconnectivity.TransportClient.invokeResourceOperation(TransportClient.java:12)
at com.microsoft.azure.documentdb.internal.directconnectivity.StoreReader.readFromStore(StoreReader.java:316)
at com.microsoft.azure.documentdb.internal.directconnectivity.StoreReader.createStoreReadResult(StoreReader.java:211)
at com.microsoft.azure.documentdb.internal.directconnectivity.StoreReader.readOneReplica(StoreReader.java:180)
at com.microsoft.azure.documentdb.internal.directconnectivity.StoreReader.readSession(StoreReader.java:87)
at com.microsoft.azure.documentdb.internal.directconnectivity.ConsistencyReader.readSession(ConsistencyReader.java:78)
at com.microsoft.azure.documentdb.internal.directconnectivity.ConsistencyReader.read(ConsistencyReader.java:53)
at com.microsoft.azure.documentdb.internal.directconnectivity.ReplicatedResourceClient.invoke(ReplicatedResourceClient.java:59)
at com.microsoft.azure.documentdb.internal.directconnectivity.ServerStoreModel$1.apply(ServerStoreModel.java:84)
at com.microsoft.azure.documentdb.internal.RetryUtility.executeStoreClientRequest(RetryUtility.java:113)
... 29 more

17/08/27 16:57:18 INFO TaskSetManager: Starting task 7.1 in stage 5.0 (TID 174, 10.244.3.5, executor 3, partition 7, PROCESS_LOCAL, 4783 bytes)
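
To make the back-off-and-retry suggestion concrete, here is a minimal Python sketch. It is not the connector's code: `TransientGoneError` and `read_with_backoff` are hypothetical names, the exception is only a stand-in for the `StatusCode: Gone` error in the trace above, and the attempt count and delays are arbitrary assumptions.

```python
import time


class TransientGoneError(Exception):
    """Hypothetical stand-in for the 'Gone' error raised while a partition key range is splitting."""


def read_with_backoff(read, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call read() and retry on TransientGoneError with exponential backoff.

    The last error is re-raised once max_attempts is exhausted, so a split
    that never completes still surfaces as a failure.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return read()
        except TransientGoneError:
            if attempt == max_attempts:
                raise
            sleep(delay)                       # wait before the next attempt
            delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay


# Usage: a fake read that fails twice (as if mid-split), then succeeds.
calls = {"n": 0}


def flaky_read():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientGoneError("Partition key range is finishing split")
    return ["doc-1", "doc-2"]


slept = []  # record the delays instead of actually sleeping
result = read_with_backoff(flaky_read, sleep=slept.append)
```

With a wrapper like this around the read, a short-lived split would cost a few delayed attempts inside the task rather than a lost task that Spark has to reschedule.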

@khdang khdang added the bug label Aug 29, 2017
@khdang
Member

khdang commented Aug 29, 2017

This is the same as #57.

@khdang khdang added this to To Do in Ganymede Aug 30, 2017
@khdang khdang added this to the 0.0.5-alpha milestone Aug 30, 2017
@khdang khdang self-assigned this Aug 30, 2017
@khdang khdang moved this from To Do to In Progress (v0.0.5-alpha) in Ganymede Aug 30, 2017
@khdang khdang moved this from In Progress to To Do in Ganymede Jan 3, 2018
@nomiero
Contributor

nomiero commented Feb 5, 2019

Fixed

@nomiero nomiero closed this as completed Feb 5, 2019
@khdang khdang removed their assignment Sep 24, 2020