[VL][UNIFFLE] DRA does not work in gluten with uniffle #7559

@wForget

Description

Backend

VL (Velox)

Bug description

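Dynamic resource allocation (DRA) does not work in Gluten with Uniffle: SparkContext initialization fails with the error shown after the confs.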

spark confs:

spark.shuffle.manager=org.apache.spark.shuffle.gluten.uniffle.UniffleShuffleManager;
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.RssShuffleDataIo;
spark.dynamicAllocation.shuffleTracking.enabled=false;
spark.dynamicAllocation.enabled=true;

error:

24/10/16 15:47:47 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Dynamic allocation of executors requires one of the following conditions: 1) enabling external shuffle service through spark.shuffle.service.enabled. 2) enabling shuffle tracking through spark.dynamicAllocation.shuffleTracking.enabled. 3) enabling shuffle blocks decommission through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled. 4) (Experimental) configuring spark.shuffle.sort.io.plugin.class to use a custom ShuffleDataIO who's ShuffleDriverComponents supports reliable storage.
	at org.apache.spark.ExecutorAllocationManager.validateSettings(ExecutorAllocationManager.scala:221)
	at org.apache.spark.ExecutorAllocationManager.<init>(ExecutorAllocationManager.scala:136)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:660)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888)
	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1093)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine$.createSpark(SparkSQLEngine.scala:303)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine$.main(SparkSQLEngine.scala:377)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine.main(SparkSQLEngine.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:738)
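
To make the exception easier to read, here is a minimal sketch of the four conditions it lists, modeled on a plain `Map` rather than Spark's real `SparkConf`. The class and method names are hypothetical, and this is not Spark's actual `validateSettings` code. The `reliableStorage` parameter stands in for whatever the configured `ShuffleDriverComponents` reports, and unset keys are assumed to default to false.

```java
import java.util.HashMap;
import java.util.Map;

public class DraValidationSketch {
    // Reads a boolean setting; a missing key is treated as false (assumption).
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // True when at least one of the four conditions named in the error message holds.
    static boolean dynamicAllocationAllowed(Map<String, String> conf, boolean reliableStorage) {
        boolean externalShuffleService = flag(conf, "spark.shuffle.service.enabled");
        boolean shuffleTracking = flag(conf, "spark.dynamicAllocation.shuffleTracking.enabled");
        boolean decommission = flag(conf, "spark.decommission.enabled")
                && flag(conf, "spark.storage.decommission.shuffleBlocks.enabled");
        return externalShuffleService || shuffleTracking || decommission || reliableStorage;
    }

    public static void main(String[] args) {
        // Only the settings from this report that the four conditions look at.
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.dynamicAllocation.enabled", "true");
        conf.put("spark.dynamicAllocation.shuffleTracking.enabled", "false");

        // Driver components report no reliable storage: no condition holds.
        System.out.println(dynamicAllocationAllowed(conf, false));
        // Driver components report reliable storage: condition 4 holds.
        System.out.println(dynamicAllocationAllowed(conf, true));
    }
}
```

With the confs from this report, none of conditions 1 to 3 is met, so everything depends on condition 4.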

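We should set RSS_ENABLED to true in UniffleShuffleManager, because Uniffle uses the RSS_ENABLED conf and the shuffle manager class to determine whether it supports reliable storage. With Gluten, the configured shuffle manager is Gluten's own UniffleShuffleManager, so RSS_ENABLED has to be set for that check to pass: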

https://github.com/apache/incubator-uniffle/blob/a36261296b05d72e4a774d9c9555cc12b922be97/client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleDriverComponents.java#L37-L42
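
The check behind that link can be sketched roughly as follows. This is a simplified model on a plain `Map`, not the actual Uniffle source. It rests on three assumptions: the RSS_ENABLED conf has the key `spark.rss.enabled`, Uniffle's own manager class is `org.apache.spark.shuffle.RssShuffleManager`, and the two checks are combined with a logical OR (which is consistent with the fix proposed here). The class `ReliableStorageSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ReliableStorageSketch {
    // Assumed names, not confirmed by this issue.
    static final String RSS_ENABLED_KEY = "spark.rss.enabled";
    static final String UNIFFLE_MANAGER = "org.apache.spark.shuffle.RssShuffleManager";

    // Reliable storage is reported when RSS is explicitly enabled
    // or when the configured shuffle manager is Uniffle's own class.
    static boolean supportsReliableStorage(Map<String, String> conf) {
        boolean rssEnabled =
                Boolean.parseBoolean(conf.getOrDefault(RSS_ENABLED_KEY, "false"));
        boolean uniffleManager =
                UNIFFLE_MANAGER.equals(conf.getOrDefault("spark.shuffle.manager", "sort"));
        return rssEnabled || uniffleManager;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Gluten's manager class, taken from the confs in this report.
        conf.put("spark.shuffle.manager",
                "org.apache.spark.shuffle.gluten.uniffle.UniffleShuffleManager");

        // The class name does not match and RSS_ENABLED is unset.
        System.out.println(supportsReliableStorage(conf));

        // What the proposed fix would do from inside the shuffle manager.
        conf.put(RSS_ENABLED_KEY, "true");
        System.out.println(supportsReliableStorage(conf));
    }
}
```

Under these assumptions, the Gluten manager class never matches the class-name check, so setting RSS_ENABLED is the only way for the driver components to report reliable storage.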

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

Labels: bug, triage
