[BUG] DPP is not working in Databricks env #3143
Comments
Is the CPU run here with AQE off or on?
The CPU run has AQE on, while the GPU run can only use AQE off. If I enable AQE for the GPU run, another bug is triggered that crashes the cluster, so I have to restart it.
@tgravescs I just quickly tested the CPU run on Databricks. My test shows that DPP happens even when AQE is off.
For some reason the GPU plan on Databricks is missing the SubqueryBroadcast that is used with DPP:
Normally this would look like:
Somehow the SubqueryBroadcast isn't there.
Yeah, so the plan that we get from Databricks doesn't even have the SubqueryBroadcast in it like Apache Spark does. They must be inserting it at some other point, because the CPU side eventually gets it inserted, but it's sometime after we see the plan.
Note: turning off our gpuBroadcastExchange makes DPP work on Databricks: spark.conf.set("spark.rapids.sql.exec.BroadcastExchangeExec", "false")
Please note you can work around this issue by disabling DPP.
Describe the bug
DPP (dynamic partition pruning) is not working in the Databricks environment.
I found this issue while analyzing NDS query performance on Databricks.
Steps/Code to reproduce bug
Below is a minimal reproduction:
GPU's physical plan:
CPU's physical plan:
As you can see, even though the dynamicpruningexpression keyword is present in the GPU plan, the filter is always true, so no partitions are actually pruned.
On a Spark standalone cluster, the GPU run does not have this issue.
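To check for this symptom in a captured plan, a rough text scan can help. The sketch below is only an illustration, not part of the RAPIDS plugin or Spark: the function name `classify_dpp` and its return labels are made up here, and it assumes the plan has been captured as a string (for example by copying the output of `df.explain()`), that a dropped pruning filter prints as `dynamicpruningexpression(true)`, and that an effective one is accompanied by a SubqueryBroadcast node.

```python
import re

# A pruning filter that was replaced by a literal true (assumed rendering).
_ALWAYS_TRUE = re.compile(r"dynamicpruningexpression\(\s*true\s*\)", re.IGNORECASE)
# Any dynamic pruning filter at all.
_ANY_DPP = re.compile(r"dynamicpruningexpression\(", re.IGNORECASE)
# The subquery node that normally feeds the pruning filter.
_SUBQUERY_BROADCAST = re.compile(r"SubqueryBroadcast", re.IGNORECASE)


def classify_dpp(plan_text: str) -> str:
    """Coarsely classify DPP in a physical plan string (hypothetical helper).

    Returns one of:
      "absent"      - no dynamic pruning filter appears in the plan
      "always_true" - every pruning filter is the literal true (the bug symptom)
      "active"      - a non-trivial filter and a SubqueryBroadcast are present
      "unknown"     - a filter is present but none of the above applies
    """
    total = len(_ANY_DPP.findall(plan_text))
    if total == 0:
        return "absent"
    if len(_ALWAYS_TRUE.findall(plan_text)) == total:
        return "always_true"
    if _SUBQUERY_BROADCAST.search(plan_text):
        return "active"
    return "unknown"
```

A plan fragment such as `PartitionFilters: [dynamicpruningexpression(true)]` would be reported as `"always_true"`. This is a string heuristic only, so a plan that mixes pruned and unpruned scans needs to be read by hand.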
Expected behavior
DPP should take effect in the Databricks environment, as it does on the CPU.
Environment details
Databricks runtime 8.2ML GPU
RAPIDS 21.10 snapshot / 21.06 GA
Additional context
Workaround:
Disable DPP by setting:
spark.sql.optimizer.dynamicPartitionPruning.enabled=false
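From a notebook, either workaround mentioned in this issue can be applied at runtime with `spark.conf.set`. The following is a minimal sketch under stated assumptions: the helper `apply_workaround` and the short names in `WORKAROUNDS` are hypothetical and exist only for illustration, while the two config keys are the ones quoted in this issue. `spark` is assumed to be the active SparkSession.

```python
# Config keys are taken from this issue; the dictionary names are made up.
WORKAROUNDS = {
    # Turn dynamic partition pruning off entirely.
    "disable_dpp": (
        "spark.sql.optimizer.dynamicPartitionPruning.enabled",
        "false",
    ),
    # Keep BroadcastExchangeExec off the GPU, which was reported above
    # to make DPP work again on Databricks.
    "cpu_broadcast_exchange": (
        "spark.rapids.sql.exec.BroadcastExchangeExec",
        "false",
    ),
}


def apply_workaround(spark, name):
    """Set the chosen workaround on the session and return the (key, value) used."""
    key, value = WORKAROUNDS[name]
    spark.conf.set(key, value)
    return key, value
```

Calling `apply_workaround(spark, "disable_dpp")` before running the query is equivalent to the setting above. The second option trades GPU broadcast for working DPP, so which one performs better likely depends on the query.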