[VL] GPU code shouldn't be running on CPU node when cudf is enabled #11828

@marin-ma

Backend

VL (Velox)

Bug description

Gluten currently enables GPU support through the configuration spark.gluten.sql.columnar.cudf=true. On a hybrid cluster with both CPU-only and GPU nodes, the CPU-only nodes fail when this configuration is set. We should disable the GPU code path on any node that lacks the CUDA runtime or has no available device.
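
A minimal sketch of the kind of node-level check described above, not Gluten's actual implementation. It assumes a Linux node and probes the CUDA driver API (`cuInit`, `cuDeviceGetCount` in `libcuda.so.1`) through `dlopen`, so the probe itself does not need CUDA at link time; a build that already links the CUDA runtime could call `cudaGetDeviceCount` instead. The helper names `hasUsableCudaDevice` and `effectiveCudfEnabled` are hypothetical.

```cpp
#include <dlfcn.h>

// Hypothetical helper: returns true only if the CUDA driver library loads,
// initializes, and reports at least one device. The library is loaded at
// runtime rather than linked, so this is safe to call on a CPU-only node.
inline bool hasUsableCudaDevice() {
  // libcuda.so.1 ships with the NVIDIA driver; it is absent on CPU-only nodes.
  void* handle = dlopen("libcuda.so.1", RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) {
    return false;
  }
  // Driver API entry points; CUresult is an enum, represented here as int.
  using CuInitFn = int (*)(unsigned int);
  using CuDeviceGetCountFn = int (*)(int*);
  auto cuInit = reinterpret_cast<CuInitFn>(dlsym(handle, "cuInit"));
  auto cuDeviceGetCount =
      reinterpret_cast<CuDeviceGetCountFn>(dlsym(handle, "cuDeviceGetCount"));

  bool usable = false;
  if (cuInit != nullptr && cuDeviceGetCount != nullptr) {
    int count = 0;
    // CUDA_SUCCESS is 0 in the driver API.
    usable = cuInit(0) == 0 && cuDeviceGetCount(&count) == 0 && count > 0;
  }
  if (!usable) {
    dlclose(handle);  // Keep the library loaded only when the GPU is usable.
  }
  return usable;
}

// Hypothetical helper: the GPU path is taken only when the user asked for it
// (spark.gluten.sql.columnar.cudf=true) AND this node can actually run it.
inline bool effectiveCudfEnabled(bool cudfConfigured, bool deviceAvailable) {
  return cudfConfigured && deviceAvailable;
}
```

With a check like this, a CPU-only node would evaluate `effectiveCudfEnabled(true, hasUsableCudaDevice())` to `false` and stay on the CPU path instead of failing.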

A second issue: the cudf configurations spark.gluten.sql.columnar.cudf and spark.gluten.sql.columnar.backend.velox.cudf.enableTableScan are currently passed into the runtime session configuration for the WholeStageResultIterator, even though they are immutable. We should remove them from the runtime session configuration and read their values from the backend configuration instead.
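
The split between the two configuration sources could look roughly like the sketch below. It is an illustration under assumptions, not the Gluten code: configurations are modeled as plain string maps, and the names `ConfigMap`, `immutableCudfKeys`, `stripImmutableCudfKeys`, and `backendFlag` are hypothetical. Only the two configuration keys come from this issue.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Assumption: both backend and session configuration are string-to-string maps.
using ConfigMap = std::unordered_map<std::string, std::string>;

// The two immutable cudf keys named in this issue.
inline const std::unordered_set<std::string>& immutableCudfKeys() {
  static const std::unordered_set<std::string> keys = {
      "spark.gluten.sql.columnar.cudf",
      "spark.gluten.sql.columnar.backend.velox.cudf.enableTableScan"};
  return keys;
}

// Returns a copy of the session configuration without the immutable cudf
// keys, so they never reach the per-query runtime configuration.
inline ConfigMap stripImmutableCudfKeys(const ConfigMap& sessionConf) {
  ConfigMap result;
  for (const auto& entry : sessionConf) {
    if (immutableCudfKeys().count(entry.first) == 0) {
      result.emplace(entry.first, entry.second);
    }
  }
  return result;
}

// Reads a boolean flag from the backend configuration, which is fixed at
// backend initialization. A missing key is treated as false.
inline bool backendFlag(const ConfigMap& backendConf, const std::string& key) {
  auto it = backendConf.find(key);
  return it != backendConf.end() && it->second == "true";
}
```

The design choice here is that the immutable values have a single source (the backend configuration), while the runtime session configuration carries only settings that can legitimately change per query.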

Gluten version

No response

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

Labels: bug, triage
