Allow Shims to replace Hive execs #7486
Distro-specific shims are currently allowed to customize operator/exec/expr replacements in most cases. However, there is currently no good way to override any operator/exec/expr replacements done in `HiveProviderImpl`.
For instance, the `HiveTableScanExec` is currently provided unilaterally by `HiveProviderImpl`. There is no way for a Spark distribution's shim to block/override this exec's replacement; the exec will be replaced in all distros.
This change provides shims a way to specify a custom `HiveProvider` if required, where operator replacements may be customized.
Signed-off-by: MithunR <mythrocks@gmail.com>
8d69afa to 44de30e
Build
The build failure on 3.4 seems to be unrelated to this current change:
Apache Spark has removed […]. I should check if this is being addressed elsewhere.
@jlowe: Thank you for the suggestion earlier, to reorder the expressions/execs in […]. Unfortunately, […]
Build
That does not prevent shims from overriding `HiveTableScanExec`. Shims can, and do, have sources that are in different packages. See the stuff in `org.apache.spark` instead of `com.nvidia.spark` under the various shims, as one example. Since this override meta would be essentially private to the shim, it can be located in whichever package suits it.
sql-plugin/src/main/311until320-nondb/scala/com/nvidia/spark/rapids/shims/Spark31XShims.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
This reverts commit 8f1b84b.
This reverts commit 163c3aa.
This reverts commit 44de30e.
If a distro shim attempts to provide a replacement for Hive execs, e.g. `HiveTableScanExec`, it is likely to fail, because `GpuOverrides` currently applies execs derived from the HiveProvider last. This commit changes this order, so that the active shim has the last say in exec replacements. This should allow for shims to selectively replace or disable replacements for Hive operators.
Build
I have modified this patch based on @jlowe's suggestion. I agree that this works a little better:
I have changed the brief and description to match the change. An example of how a shim (say for CDH) may modify the replacement rules for a Hive exec (say […]
Distro-specific shims are currently allowed to customize operator/exec/expr replacements in most cases. However, replacements made specifically for exprs/execs from `org.apache.spark.sql.hive` (i.e. the ones handled by `HiveProvider`) do not succeed.
For instance, the `HiveTableScanExec` is currently provided by `HiveProviderImpl`. When a Spark distribution's shim attempts to block/override this exec's replacement, it does not succeed, because the `HiveProviderImpl` changes are applied last in `GpuOverrides`.
This change puts the Spark shim replacement rules at the end, so that the shim has the final say.
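To illustrate why the ordering matters, here is a minimal, self-contained sketch. It assumes the rule sets are combined as ordinary Scala `Map`s, where entries on the right-hand side of `++` replace earlier entries with the same key. `ExecRule`, `OverrideOrderSketch`, and the three rule maps are hypothetical stand-ins made up for this example; they are not the plugin's actual types or the real contents of `GpuOverrides`.

```scala
// Hypothetical stand-in for a replacement rule; not the real spark-rapids type.
final case class ExecRule(target: String, source: String, enabled: Boolean = true)

object OverrideOrderSketch {
  // Illustrative rules shared by all Spark versions.
  val commonExecs: Map[String, ExecRule] = Map(
    "ProjectExec" -> ExecRule("ProjectExec", "common"))

  // Illustrative rules contributed by the Hive provider.
  val hiveExecs: Map[String, ExecRule] = Map(
    "HiveTableScanExec" -> ExecRule("HiveTableScanExec", "hive-provider"))

  // Illustrative rules from a made-up distro shim that wants to
  // disable the Hive table scan replacement.
  val shimExecs: Map[String, ExecRule] = Map(
    "HiveTableScanExec" -> ExecRule("HiveTableScanExec", "shim", enabled = false))

  // Old order: Hive provider rules are merged last, so they win on conflicts.
  val shimThenHive: Map[String, ExecRule] = commonExecs ++ shimExecs ++ hiveExecs

  // New order: shim rules are merged last, so the shim has the final say.
  val hiveThenShim: Map[String, ExecRule] = commonExecs ++ hiveExecs ++ shimExecs

  def main(args: Array[String]): Unit = {
    println(shimThenHive("HiveTableScanExec").source) // prints "hive-provider"
    println(hiveThenShim("HiveTableScanExec").source) // prints "shim"
  }
}
```

With the shim rules merged last, the shim's entry for `HiveTableScanExec` replaces the Hive provider's entry, while rules the shim does not mention (here `ProjectExec`) are left untouched.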