Handle minimum GPU architecture supported [databricks] #10540

parthosa · 2024-03-04T17:21:18Z

Fixes #10430. This PR ensures that Spark RAPIDS jobs run only on GPU architectures supported by the plugin's native libraries, without relying on manual configuration.

Changes:

- Processes the gpu_architectures property from the *version-info.properties files generated by the native builds.
- Verifies that the job is running on a GPU architecture supported by both the cuDF and JNI libraries, and throws an exception if it is not.
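A minimal sketch of the first step, for illustration only. It assumes the property holds a semicolon-separated list of numeric architectures such as 75;80;86 (the semicolon delimiter matches a later commit on this PR; the sample values are made up). The object and method names are hypothetical and not the plugin's actual code.

```scala
import java.io.StringReader
import java.util.Properties

object VersionInfoSketch {
  // Hypothetical helper: reads the gpu_architectures entry from the text of a
  // version-info properties file and returns the architectures as integers.
  def parseGpuArchitectures(propertiesText: String): Set[Int] = {
    val props = new Properties()
    props.load(new StringReader(propertiesText))
    val raw = Option(props.getProperty("gpu_architectures")).getOrElse(
      throw new IllegalStateException("gpu_architectures not found in version-info properties"))
    // Assumes a semicolon-separated list of purely numeric entries, e.g. "75;80;86".
    // CMake-style suffixes such as "80-real" would need to be stripped first.
    raw.split(";").iterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.toInt)
      .toSet
  }
}
```

Failing loudly when the property is missing keeps a mispackaged native library from silently disabling the check.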

Testing

Tested on a Dataproc VM with an NVIDIA P4 GPU (GPU architecture 6.1). The executor plugin now shuts down with:

24/03/06 17:44:58 WARN RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.
24/03/06 17:45:10 ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
java.lang.RuntimeException: Device architecture 61 is unsupported. Minimum supported architecture: 75.
        at com.nvidia.spark.rapids.RapidsPluginUtils$.checkGpuArchitectureInternal(Plugin.scala:366)
        at com.nvidia.spark.rapids.RapidsPluginUtils$.checkGpuArchitecture(Plugin.scala:375)
        at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:461)

Related PR

Add GPU architectures to the build-info file spark-rapids-jni#1840

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

jlowe

Converted to draft since this

parthosa · 2024-03-07T20:59:24Z

Need to wait for a new artefact for spark-rapids-jni

parthosa · 2024-03-12T15:21:14Z

build

gerashegalov

LGTM

jlowe

This is looking good, but I agree with @gerashegalov that this should have at least some tests. Refactoring checkGpuArchitecture to take the property set and the GPU major/minor architecture versions would make it easier to mock and test various scenarios of the core logic.
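A minimal sketch of the kind of pure, mockable check suggested here. Assumptions: the supported architectures have already been parsed into integers such as 75 and 86, the device's compute capability is passed in as major/minor, and only the minimum is enforced, mirroring the error in the Testing log. The names are hypothetical; the merged code may differ (later commits on this PR also changed the warning message and the exception class).

```scala
object GpuArchCheckSketch {
  // Combines a compute capability (major, minor) into the two-digit number used
  // in the Testing log, e.g. (6, 1) -> 61 and (7, 5) -> 75.
  def toArchitecture(major: Int, minor: Int): Int = major * 10 + minor

  // Hypothetical pure check: it reads neither the device nor any properties file,
  // so a test can supply any supported set and any major/minor pair.
  // It only enforces the minimum supported architecture.
  def checkGpuArchitecture(supported: Set[Int], major: Int, minor: Int): Unit = {
    require(supported.nonEmpty, "supported architecture set must not be empty")
    val arch = toArchitecture(major, minor)
    val minSupported = supported.min
    if (arch < minSupported) {
      throw new RuntimeException(s"Device architecture $arch is unsupported. " +
        s"Minimum supported architecture: $minSupported.")
    }
  }
}
```

With this shape, a test can call checkGpuArchitecture(Set(75, 80, 86), 6, 1) and expect the same message seen on the P4, while (8, 0) passes, without needing a real GPU.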

gerashegalov

LGTM, thank you @parthosa for working through all the issues.

gerashegalov · 2024-03-14T22:59:14Z

build

gerashegalov

LGTM

parthosa · 2024-03-15T05:22:09Z

Thank you @gerashegalov and @jlowe

Add conf for minimum supported CUDA and error handling

7b8eaea

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

parthosa added the feature request New feature or request label Mar 4, 2024

parthosa requested review from jlowe, revans2 and mythrocks March 4, 2024 17:21

parthosa self-assigned this Mar 4, 2024

kuhushukla previously approved these changes Mar 4, 2024

gerashegalov reviewed Mar 4, 2024

sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala (outdated, resolved)

jlowe previously approved these changes Mar 4, 2024

parthosa mentioned this pull request Mar 6, 2024

Add GPU architectures to the build-info file NVIDIA/spark-rapids-jni#1840

Merged

parthosa added 2 commits March 6, 2024 18:19

Revert "Add conf for minimum supported CUDA and error handling"

a77947f

This reverts commit 7b8eaea.

Verify the GPU architecture is supported by the plugin libraries

acd3371

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

parthosa dismissed stale reviews from jlowe and kuhushukla via acd3371 March 6, 2024 18:21

parthosa requested review from jlowe, gerashegalov and kuhushukla March 6, 2024 18:39

jlowe reviewed Mar 6, 2024

sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala (outdated, resolved)

gerashegalov reviewed Mar 6, 2024

sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala (outdated, resolved)

sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala (outdated, resolved)

Use semi-colon as delimiter and use intersection of supported gpu architectures

80285ec

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
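A small sketch of why the intersection is the set to check against: a GPU architecture is only safe if every native library was compiled for it. This assumes each library's gpu_architectures list has already been parsed into integers; the object name and sample values are hypothetical and illustrative only.

```scala
object ArchIntersectionSketch {
  // Hypothetical: one parsed architecture set per native library (e.g. cuDF and
  // spark-rapids-jni). Only architectures present in every set are usable.
  def commonArchitectures(perLibrary: Seq[Set[Int]]): Set[Int] = {
    require(perLibrary.nonEmpty, "need at least one library's architecture set")
    perLibrary.reduce(_ intersect _)
  }
}
```

For example, Set(70, 75, 80, 86) and Set(75, 80, 86, 90) intersect to Set(75, 80, 86), whose minimum (75) would then be the threshold the device has to meet.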

parthosa requested review from jlowe and gerashegalov March 6, 2024 23:01

jlowe marked this pull request as draft March 7, 2024 14:46

jlowe marked this pull request as ready for review March 7, 2024 14:48

jlowe previously approved these changes Mar 7, 2024

parthosa mentioned this pull request Mar 7, 2024

[FEA] Error out when running on an unsupported GPU architecture #10430

Closed

gerashegalov previously approved these changes Mar 7, 2024

kuhushukla previously approved these changes Mar 7, 2024

gerashegalov changed the title from "Handle minimum CUDA architecture supported" to "Handle minimum CUDA architecture supported [databricks]" Mar 8, 2024

gerashegalov self-requested a review March 12, 2024 12:29

jlowe reviewed Mar 12, 2024

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala (outdated, resolved)

Update RapidsConf.scala

592bed0

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

parthosa dismissed gerashegalov’s stale review via 592bed0 March 12, 2024 15:05

parthosa requested a review from jlowe March 12, 2024 15:43

jlowe reviewed Mar 12, 2024

sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala (outdated, resolved)

Update verification logic

3a1595b

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

gerashegalov reviewed Mar 12, 2024

sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala (outdated, resolved)

gerashegalov previously approved these changes Mar 12, 2024

Update warning message

8f730ba

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

parthosa dismissed gerashegalov’s stale review via 8f730ba March 12, 2024 21:56

parthosa requested review from gerashegalov and jlowe March 12, 2024 21:57

jlowe reviewed Mar 12, 2024

gerashegalov previously approved these changes Mar 12, 2024

Add unit tests and update warning message.

05f74d2

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

parthosa dismissed gerashegalov’s stale review via 05f74d2 March 14, 2024 21:25

Update exception class

75bd945

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

parthosa requested review from jlowe and gerashegalov March 14, 2024 21:41

jlowe previously approved these changes Mar 14, 2024

gerashegalov previously approved these changes Mar 14, 2024

sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala (outdated, resolved)

tests/src/test/scala/com/nvidia/spark/rapids/GpuArchitectureTestSuite.scala (outdated, resolved)

Address review comments

8673f2a

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

parthosa dismissed stale reviews from gerashegalov and jlowe via 8673f2a March 14, 2024 22:31

gerashegalov approved these changes Mar 15, 2024

gerashegalov merged commit 79c2a3b into NVIDIA:branch-24.04 Mar 15, 2024
43 checks passed
