test_delta_merge_match_delete_only java.lang.OutOfMemoryError: GC overhead limit exceeded #10530

Closed
gerashegalov opened this issue Mar 1, 2024 · 0 comments
Labels: bug (Something isn't working)

gerashegalov commented Mar 1, 2024

test_delta_merge_match_delete_only[10-['a', 'b']-False-(range(0, 5), range(0, 5))][DATAGEN_SEED=1709252634, INJECT_OOM, IGNORE_ORDER, ALLOW_NON_GPU(DeserializeToObjectExec,ShuffleExchangeExec,FileSourceScanExec,FilterExec,MapPartitionsExec,MapElementsExec,ObjectHashAggregateExec,ProjectExec,SerializeFromObjectExec,SortExec)]
24/03/01 01:05:02 WARN CloudStoreSpecificConf: Unknown cloud store file

Originally posted by @gerashegalov in #10522 (comment)
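For readers unfamiliar with the test name, here is a minimal PySpark sketch of what a "match delete only" Delta MERGE does. All names, paths, and session configs below are illustrative assumptions, not the spark-rapids test source; the range(0, 5) data mirrors the "(range(0, 5), range(0, 5))" parametrization in the failing test ID.

```python
# Hedged sketch: hypothetical table path/view names, assumes the delta-spark
# package is on the classpath. Not the actual spark-rapids test code.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("merge-match-delete-only-sketch")
         # Standard delta-spark session extensions/catalog configs.
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# Target and source both cover range(0, 5), so every target row matches.
spark.range(0, 5).withColumnRenamed("id", "a").write \
    .format("delta").mode("overwrite").save("/tmp/merge_target")
spark.range(0, 5).withColumnRenamed("id", "a").createOrReplaceTempView("src")

# A matched-delete-only MERGE: delete target rows whose keys appear in source.
spark.sql("""
    MERGE INTO delta.`/tmp/merge_target` AS t
    USING src AS s
    ON t.a = s.a
    WHEN MATCHED THEN DELETE
""")
```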

java.lang.OutOfMemoryError: GC overhead limit exceeded
[2024-03-01T01:09:47.187Z] 24/03/01 01:09:46 ERROR MergeIntoCommandEdge: Fatal error in MERGE with materialized source in attempt 1.
[2024-03-01T01:09:47.187Z] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4421.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4421.0 (TID 18072) (10.2.128.15 executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded
[2024-03-01T01:09:47.187Z] 
[2024-03-01T01:09:47.187Z] Driver stacktrace:
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3628)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3559)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3546)
[2024-03-01T01:09:47.187Z]      at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2024-03-01T01:09:47.187Z]      at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2024-03-01T01:09:47.187Z]      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3546)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1521)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1521)
[2024-03-01T01:09:47.187Z]      at scala.Option.foreach(Option.scala:407)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1521)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3875)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3787)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3775)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1245)
[2024-03-01T01:09:47.187Z]      at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[2024-03-01T01:09:47.187Z]      at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1233)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2959)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.collect.Collector.$anonfun$runSparkJobs$1(Collector.scala:286)
[2024-03-01T01:09:47.187Z]      at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[2024-03-01T01:09:47.187Z]      at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:282)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.collect.Collector.$anonfun$collect$1(Collector.scala:366)
[2024-03-01T01:09:47.187Z]      at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:363)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:117)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:124)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:126)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:114)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:94)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$computeResult$1(ResultCacheManager.scala:557)
[2024-03-01T01:09:47.187Z]      at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.qrc.ResultCacheManager.collectResult$1(ResultCacheManager.scala:545)
[2024-03-01T01:09:47.187Z]      at org.apache.spark.sql.execution.qrc.ResultCacheManager.computeResult(ResultCacheManager.scala:565)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:426)
[2024-03-01T01:09:47.188Z]      at scala.Option.getOrElse(Option.scala:189)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:419)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:313)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeCollectResult$1(SparkPlan.scala:519)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:516)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3628)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:4553)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:3595)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.Dataset.$anonfun$withAction$3(Dataset.scala:4544)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:959)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4542)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:283)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:511)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:210)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1138)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:153)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:460)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4542)
[2024-03-01T01:09:47.188Z]      at org.apache.spark.sql.Dataset.collect(Dataset.scala:3595)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.commands.merge.LowShuffleMergeExecutor.$anonfun$findTouchedFilesForLowShuffleMerge$1(LowShuffleMergeExecutor.scala:586)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.commands.MergeIntoCommandBase.$anonfun$recordMergeOperation$6(MergeIntoCommandBase.scala:411)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.util.ThreadLocalTagger.withTag(QueryTagger.scala:62)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.util.ThreadLocalTagger.withTag$(QueryTagger.scala:59)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.util.QueryTagger$.withTag(QueryTagger.scala:127)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.commands.MergeIntoCommandBase.executeThunk$1(MergeIntoCommandBase.scala:410)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.commands.MergeIntoCommandBase.$anonfun$recordMergeOperation$8(MergeIntoCommandBase.scala:429)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.NoopProgressReporter$.withStatusCode(ProgressReporter.scala:52)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.util.DeltaProgressReporterEdge.withStatusCode(DeltaProgressReporterEdge.scala:30)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.util.DeltaProgressReporterEdge.withStatusCode$(DeltaProgressReporterEdge.scala:25)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.commands.DeltaDMLCommandEdge.withStatusCode(DeltaDMLCommandEdge.scala:41)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.commands.MergeIntoCommandBase.$anonfun$recordMergeOperation$7(MergeIntoCommandBase.scala:429)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag(DeltaLogging.scala:196)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag$(DeltaLogging.scala:183)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.commands.DeltaDMLCommandEdge.withOperationTypeTag(DeltaDMLCommandEdge.scala:41)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$2(DeltaLogging.scala:160)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:265)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:263)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.commands.DeltaDMLCommandEdge.recordFrameProfile(DeltaDMLCommandEdge.scala:41)
[2024-03-01T01:09:47.188Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:159)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:571)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:666)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:684)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:426)
[2024-03-01T01:09:47.188Z]      at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:196)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:424)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:25)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:470)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:455)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:25)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:661)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:580)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:25)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:571)
[2024-03-01T01:09:47.188Z]      at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:540)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:25)
[2024-03-01T01:09:47.188Z]      at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:66)
[2024-03-01T01:09:47.189Z]      at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:148)
[2024-03-01T01:09:47.189Z]      at com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:72)
[2024-03-01T01:09:47.189Z]      at com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:59)
[2024-03-01T01:09:47.189Z]      at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:107)
[2024-03-01T01:09:47.189Z]      at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:433)
[2024-03-01T01:09:47.189Z]      at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:412)
[2024-03-01T01:09:47.189Z]      at com.databricks.sql.transaction.tahoe.commands.DeltaDMLCommandEdge.recordOperation(DeltaDMLCommandEdge.scala:41)
[2024-03-01T01:09:47.189Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:158)
[2024-03-01T01:09:47.189Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:148)
[2024-03-01T01:09:47.189Z]      at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:138)
[2024-03-01T01:09:47.189Z]      at com.databricks.sql.transaction.tahoe.commands.DeltaDMLCommandEdge.recordDeltaOperation(DeltaDMLCommandEdge.scala:41)
[2024-03-01T01:09:47.189Z]      at com.databricks.sql.transaction.tahoe.commands.MergeIntoCommandBase.recordMergeOperation(MergeIntoCommandBase.scala:426)
[2024-03-01T01:09:47.189Z]      at com.databricks.sql.transaction.tahoe.commands.MergeIntoCommandBase.recordMergeOperation$(MergeIntoCommandBase.scala:387)
[2024-03-01T01:09:47.189Z]      at com.databricks.sql.transaction.tahoe.commands.MergeIntoWithDeltaDMLCommandEdge.recordMergeOperation(MergeIntoCommandEdge.scala:60)
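
The bottom frames show Dataset.collect() being invoked from findTouchedFilesForLowShuffleMerge, i.e. the touched-files result is materialized in the driver JVM. Below is a minimal sketch of that collect-to-driver pattern; the DataFrame and row count are a hypothetical stand-in, not the Databricks code. With a large enough result and a small driver heap, the JVM spends most of its time in GC and raises this same "GC overhead limit exceeded" error.

```python
# Hedged sketch: a stand-in for the collect() visible at the bottom of the
# trace. The column name and row count are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("collect-pressure-sketch").getOrCreate()

# Every row returned by collect() is materialized in driver memory first;
# scale the row count up to turn this into driver-heap GC pressure.
touched = spark.range(0, 10_000_000).selectExpr("concat('part-', id) AS path")
rows = touched.collect()  # driver-side materialization happens here
print(len(rows))
```

If this pattern is the culprit, the usual first levers are a larger spark.driver.memory or shrinking what reaches collect(); whether either is the right fix for this test harness is left open here.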