
[FEA] create the plugin package capable of storing conflicting multiple versions of same named classes #3232

Closed
gerashegalov opened this issue Aug 14, 2021 · 2 comments · Fixed by #3381 or #3411
Labels: P0 Must have for release · Spark 3.1+ Bugs only related to Spark 3.1 or higher · task Work required that improves the product but is not user facing

Comments

@gerashegalov
Collaborator

Is your feature request related to a problem? Please describe.
This feature contributes to #3223. Spark does not provide link compatibility, so identical classes compiled against different Spark versions may produce incompatible bytecode. We need to store multiple copies of such classes in the plugin jar. The standard approach of ASM-based shading/relocation does not work well with Scala.

Describe the solution you'd like

This issue proposes an approach similar to ParallelWorldClassLoader. Instead of overriding findClass, in Spark we can lean on MutableURLClassLoader in conjunction with JarURLConnection JAR URLs.

The package will consist of three types of areas:

  1. a few publicly documented classes in the conventional layout
  2. a large fraction of classes whose bytecode is identical under all supported Spark versions
  3. a smaller fraction of classes that differ under one of the supported Spark versions
```
$ jar tvf rapids-4-spark_2.12.jar
com/nvidia/spark/SQLPlugin.class
spark3xx-common/com/nvidia/spark/rapids/CastExprMeta.class
spark301/org/apache/spark/sql/rapids/GpuUnaryMinus.class
spark311/org/apache/spark/sql/rapids/GpuUnaryMinus.class
spark320/org/apache/spark/sql/rapids/GpuUnaryMinus.class
```

So each shim can see a consistent parallel world without conflicts by referencing the common area plus exactly one version-specific directory.

E.g., the Spark 3.2.0 shim will use:

```
jar:file:/home/spark/rapids-4-spark_2.12-21.10.jar!/spark3xx-common/
jar:file:/home/spark/rapids-4-spark_2.12-21.10.jar!/spark320/
```

and the Spark 3.1.1 shim will use:

```
jar:file:/home/spark/rapids-4-spark_2.12-21.10.jar!/spark3xx-common/
jar:file:/home/spark/rapids-4-spark_2.12-21.10.jar!/spark311/
```
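
As a minimal sketch of the idea, assuming Scala and Spark's org.apache.spark.util.MutableURLClassLoader (the helper name shimClassLoader and the hard-coded paths are illustrative only, not the actual implementation):

```scala
import java.net.URL

import org.apache.spark.util.MutableURLClassLoader

object ShimLoaderSketch {
  // Build a loader whose search path is the common area plus exactly one
  // shim's "parallel world" inside the plugin jar. Both entries are
  // JarURLConnection-style jar: URLs ending in "!/<dir>/", which
  // URLClassLoader treats as directory-like bases inside the jar.
  def shimClassLoader(pluginJar: String, shimId: String,
      parent: ClassLoader): MutableURLClassLoader = {
    val urls = Array(
      new URL(s"jar:file:$pluginJar!/spark3xx-common/"),
      new URL(s"jar:file:$pluginJar!/$shimId/"))
    new MutableURLClassLoader(urls, parent)
  }
}
```

With this sketch, shimClassLoader("/home/spark/rapids-4-spark_2.12-21.10.jar", "spark320", getClass.getClassLoader) would resolve org.apache.spark.sql.rapids.GpuUnaryMinus from the spark320 world, never seeing the spark301 or spark311 copies.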

Describe alternatives you've considered

  • shading
  • JDK ParallelWorldClassLoader
  • replicating copies of the same classes under different package names

Additional context

  • this issue should be broken down into packaging and the actual class loader implementation
@gerashegalov gerashegalov added the feature request New feature or request label Aug 14, 2021
@gerashegalov gerashegalov added this to To do in Release 21.10 via automation Aug 14, 2021
@sameerz sameerz added the P0 Must have for release label Aug 15, 2021
gerashegalov added a commit that referenced this issue Sep 10, 2021
Signed-off-by: Gera Shegalov <gera@apache.org>

Contributes to #3232. Use MutableURLClassLoader in conjunction with JarURLConnection JAR URLs to create "parallel worlds" for each shim in a single jar file.

Assumes a package layout consisting of three types of areas

- a few publicly documented classes in the conventional layout
- a large fraction of classes whose bytecode is identical under all supported Spark versions
- a smaller fraction of classes that differ under one of the supported Spark versions, aka "parallel worlds" in the JDK's com.sun.istack.internal.tools.ParallelWorldClassLoader terminology

```
$ jar tvf rapids-4-spark_2.12.jar
com/nvidia/spark/SQLPlugin.class
spark3xx-common/com/nvidia/spark/rapids/CastExprMeta.class
spark301/org/apache/spark/sql/rapids/GpuUnaryMinus.class    
spark311/org/apache/spark/sql/rapids/GpuUnaryMinus.class
spark320/org/apache/spark/sql/rapids/GpuUnaryMinus.class
```
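
For contrast with the MutableURLClassLoader approach, here is a hedged sketch of the findClass-overriding technique that the ParallelWorldClassLoader terminology comes from (class and prefix names are illustrative, not the plugin's actual code):

```scala
import java.io.InputStream

// Illustrative sketch of the JDK ParallelWorldClassLoader technique:
// class bytes are read from a resource path rewritten with a per-world
// prefix such as "spark311/".
class ParallelWorldLoaderSketch(prefix: String, parent: ClassLoader)
    extends ClassLoader(parent) {

  override def findClass(name: String): Class[_] = {
    // e.g. org.apache.spark.sql.rapids.GpuUnaryMinus
    //   -> spark311/org/apache/spark/sql/rapids/GpuUnaryMinus.class
    val resource = prefix + name.replace('.', '/') + ".class"
    val in: InputStream = parent.getResourceAsStream(resource)
    if (in == null) throw new ClassNotFoundException(name)
    try {
      val bytes = in.readAllBytes() // JDK 9+
      defineClass(name, bytes, 0, bytes.length)
    } finally {
      in.close()
    }
  }
}
```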
@jlowe
Member

jlowe commented Sep 10, 2021

@tgravescs this is now complete after #3411, correct?

@gerashegalov gerashegalov added the Spark 3.1+ Bugs only related to Spark 3.1 or higher label Sep 13, 2021
@gerashegalov gerashegalov added this to the Aug 30 - Sept 10 milestone Sep 13, 2021
@gerashegalov
Collaborator Author

gerashegalov commented Sep 13, 2021

Completed via #3411

Release 21.10 automation moved this from To do to Done Sep 13, 2021
@sameerz sameerz added task Work required that improves the product but is not user facing and removed feature request New feature or request labels Sep 27, 2021