[GLUTEN-4160][CORE] Fix ClassCastException occurred when fallback broadcast join and broadcast exchange #4166

Merged (1 commit) on Dec 27, 2023

@fyp711 fyp711 commented Dec 22, 2023

What changes were proposed in this pull request?

An exception occurred when I disabled BroadcastHashJoin. I expected the query to work after BroadcastHashJoin and BroadcastExchange fall back. To reproduce, run TPC-DS q31 with the following configuration:

--conf spark.gluten.sql.columnar.broadcastJoin=false
--conf spark.gluten.sql.columnar.broadcastExchange=false

Caused by: java.lang.ClassCastException: org.apache.spark.sql.execution.joins.LongHashedRelation cannot be cast to org.apache.spark.sql.execution.joins.BuildSideRelation
        at org.apache.spark.sql.execution.ColumnarSubqueryBroadcastExec.$anonfun$relationFuture$3(ColumnarSubqueryBroadcastExec.scala:89)
        at io.glutenproject.utils.Arm$.withResource(Arm.scala:25)
        at io.glutenproject.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
        at org.apache.spark.sql.execution.ColumnarSubqueryBroadcastExec.$anonfun$relationFuture$2(ColumnarSubqueryBroadcastExec.scala:77)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withExecutionId$1(SQLExecution.scala:145)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
        at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:143)
        at org.apache.spark.sql.execution.ColumnarSubqueryBroadcastExec.$anonfun$relationFuture$1(ColumnarSubqueryBroadcastExec.scala:75)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

(Fixes: #4160)
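
The trace above suggests that the broadcast value is downcast to `BuildSideRelation` without a type check, while a fallen-back broadcast exchange produces a vanilla `LongHashedRelation`. A minimal sketch of that failure mode follows. The types `BuildSideRelationLike`, `ColumnarRelation`, and `VanillaRelation` and the object `CastSketch` are hypothetical stand-ins invented for illustration, not the real Spark or Gluten classes, and this is not the code changed by the patch.

```scala
import scala.util.Try

// Hypothetical stand-ins, NOT the real Spark/Gluten classes.
// Plays the role of Gluten's BuildSideRelation.
trait BuildSideRelationLike { def describe: String }

final class ColumnarRelation extends BuildSideRelationLike {
  def describe: String = "columnar build side"
}

// Plays the role of vanilla Spark's LongHashedRelation: an unrelated type.
final class VanillaRelation

object CastSketch {
  // Unchecked downcast of a broadcast value whose static type is unknown.
  // Throws ClassCastException if the value is not a BuildSideRelationLike.
  def describeBuildSide(broadcastValue: Any): String =
    broadcastValue.asInstanceOf[BuildSideRelationLike].describe

  // Reports whether the downcast succeeds for the given value.
  def castSucceeds(broadcastValue: Any): Boolean =
    Try(describeBuildSide(broadcastValue)).isSuccess
}
```

In this sketch, `CastSketch.castSucceeds(new ColumnarRelation)` is true and `CastSketch.castSucceeds(new VanillaRelation)` is false, which mirrors a columnar consumer receiving a relation built by a vanilla exchange.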

How was this patch tested?

Ran TPC-DS q31 with the following configuration:
--conf spark.gluten.sql.columnar.broadcastJoin=false
--conf spark.gluten.sql.columnar.broadcastExchange=false

Run Gluten Clickhouse CI

fyp711 commented Dec 22, 2023

Hi @zhouyuan, could you help review this PR?

fyp711 commented Dec 25, 2023

Hi everyone, I fixed the failed scalastyle check.

fyp711 commented Dec 25, 2023

Hi, the centos7-test failure was due to low memory. How can I retry it?
Here are the relevant logs:

Not enough spark off-heap execution memory. Acquired: 8388608, granted: 4194304. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application.

@fyp711 fyp711 changed the title from "[GLUTEN-4160][CORE] Fix ClassCastException occurred when broadcastJoin is disabled (#4160)" to "[GLUTEN-4160][CORE] Fix ClassCastException occurred when broadcastJoin is disabled" on Dec 26, 2023
if (GlutenConfig.getConf.enableScanOnly) {
// Disable ColumnarSubqueryBroadcast for scan-only execution.
if (
GlutenConfig.getConf.enableScanOnly || !GlutenConfig.getConf.enableColumnarBroadcastExchange
Contributor

You disable columnar DPP when the columnar broadcast exchange is disabled. Why is this issue related to broadcastJoin being disabled?

Contributor Author

Because I used the following configuration for the test. I think the issue title should be changed to "when fallback broadcast join and broadcast exchange":
--conf spark.gluten.sql.columnar.broadcastJoin=false
--conf spark.gluten.sql.columnar.broadcastExchange=false

Contributor Author

@fyp711 fyp711 Dec 27, 2023

I updated the issue description, thanks.
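
The condition discussed in this review thread can be sketched as a small predicate. This is a simplified illustration, not the patched Gluten code: `SketchConf` and `SubqueryBroadcastGuard` are hypothetical names, only the two flag names are taken from the diff hunk, and it is an assumption that `enableColumnarBroadcastExchange` is the flag behind `spark.gluten.sql.columnar.broadcastExchange`.

```scala
// Hypothetical stand-in for the two GlutenConfig flags that appear in the diff hunk.
final case class SketchConf(
    enableScanOnly: Boolean,
    enableColumnarBroadcastExchange: Boolean)

object SubqueryBroadcastGuard {
  // True when the columnar subquery broadcast should be skipped, so that the
  // subquery keeps consuming whatever the vanilla broadcast exchange produces.
  def skipColumnarSubqueryBroadcast(conf: SketchConf): Boolean =
    conf.enableScanOnly || !conf.enableColumnarBroadcastExchange
}
```

With the reproduction settings from this PR (columnar broadcast exchange disabled, scan-only off), the predicate is true, so the columnar subquery broadcast would be skipped and no `BuildSideRelation` would be expected from a vanilla exchange.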

@fyp711 fyp711 changed the title from "[GLUTEN-4160][CORE] Fix ClassCastException occurred when broadcastJoin is disabled" to "[GLUTEN-4160][CORE] Fix ClassCastException occurred when fallback broadcast join and broadcast exchange" on Dec 27, 2023
fyp711 commented Dec 27, 2023

Hi @ulysses-you, the ubuntu2004-test-spark32-slow test failed due to network problems. Could you help retry or skip it? Thanks.

@ulysses-you ulysses-you merged commit 73351b9 into apache:main Dec 27, 2023
20 checks passed
Successfully merging this pull request may close these issues.

[VL] A ClassCastException occurs when fallback broadcast join and broadcast exchange
2 participants