Could not initialize class com.nvidia.spark.ml.linalg.JniRAPIDSML #73

Open
pxLi opened this issue May 31, 2022 · 0 comments
Labels
bug Something isn't working P1

pxLi (Collaborator) commented May 31, 2022

The ML JNI mvn build is OK, but when we test it with the Spark plugin in a fresh environment (without the conda env used for the build), it throws this error:

[2022-05-31T03:28:40.517Z] 22/05/31 03:28:40 WARN TaskSetManager: 
Lost task 6.0 in stage 5.0 (TID 33) (10.233.109.181 executor 0): 
java.lang.UnsatisfiedLinkError: /tmp/librapidsml_jni.so5201224938898577270: 
libarrow_cuda.so.700: cannot open shared object file: 
No such file or directory
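One way to confirm which dependencies the loader cannot resolve on the test machine is to run `ldd` against the extracted `.so` and look for `not found` entries. The following is a minimal sketch, not part of the project: the function names `missing_libs` and `check_library` are hypothetical, and it assumes `ldd` is available on the host.

```python
import re
import subprocess

# Matches ldd lines of the form "<soname> => not found".
_NOT_FOUND = re.compile(r"^\s*(\S+)\s+=>\s+not found", re.MULTILINE)


def missing_libs(ldd_output: str) -> list[str]:
    """Return the sonames that ldd reports as unresolved."""
    return _NOT_FOUND.findall(ldd_output)


def check_library(path: str) -> list[str]:
    """Run ldd on a shared object (hypothetical helper) and list unresolved deps."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libs(result.stdout)
```

Calling `check_library` on the built `librapidsml_jni.so` in the fresh environment should list `libarrow_cuda.so.700` if the failure matches the log above.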

The only change we found is a new option in the CMake-generated link line when building with conda cudf-22.06.
Previously (before 22.06.00a220530, e.g. 22.06.00a220519):

LINK_LIBRARIES = -Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib  -Wl,-Bstatic  -lcudart_static  -lcusparse_static  
-lcusolver_static  -lculibos  -llapack_static  -Wl,-Bdynamic  /root/miniconda3/lib/libcudf.so  /usr/local/cuda/lib64/libcublas.so 
 /root/miniconda3/lib/libarrow.so.700.0.0  /root/miniconda3/lib/libarrow_cuda.so.700.0.0  -ldl  -lpthread  
/usr/local/cuda/lib64/libcudart.so  /usr/lib64/libcuda.so  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl

Now (22.06.00a220530):

LINK_LIBRARIES = -Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib  -Wl,-Bstatic  -lcudart_static  -lcusparse_static  
-lcusolver_static  -lculibos  -llapack_static  -Wl,-Bdynamic  /root/miniconda3/lib/libcudf.so  /usr/local/cuda/lib64/libcublas.so 
 /root/miniconda3/lib/libarrow.so.700.0.0  /root/miniconda3/lib/libarrow_cuda.so.700.0.0  -ldl  -lpthread 
/usr/local/cuda/lib64/libcudart.so /usr/lib64/libcuda.so -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl  
-Wl,-rpath-link,/root/miniconda3/lib

The new link line adds -Wl,-rpath-link,/root/miniconda3/lib. The conda environment for this build contains:

cudf                      22.06.00a220530 cuda_11_py38_gdcb04704b3_316    rapidsai-nightly
libcudf                   22.06.00a220530 cuda11_gdcb04704b3_316    rapidsai-nightly
arrow-cpp                 7.0.0           py38he106920_7_cuda    conda-forge
arrow-cpp-proc            3.0.0                      cuda    conda-forge
pyarrow                   7.0.0           py38h17143e8_7_cuda    conda-forge

Perhaps the dependency tree is messed up in the latest cudf package on conda?
Tests using ML JNI artifacts built against cudf packages before 22.06.00a220530 worked fine.
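The comparison of the two LINK_LIBRARIES lines can be automated instead of done by eye. This is a rough sketch under stated assumptions: `link_flags` and `added_flags` are hypothetical helper names, and the strings in the usage are abbreviated versions of the link lines quoted above, not the full lines.

```python
def link_flags(line: str) -> list[str]:
    """Split a 'LINK_LIBRARIES = ...' line into its individual flags."""
    _, sep, rhs = line.partition("=")
    # If there is no 'NAME =' prefix, treat the whole line as flags.
    return (rhs if sep else line).split()


def added_flags(before: str, after: str) -> list[str]:
    """Return flags present in 'after' but absent from 'before', in order."""
    seen = set(link_flags(before))
    return [flag for flag in link_flags(after) if flag not in seen]


# Abbreviated link lines (assumption: shortened for illustration).
old = "LINK_LIBRARIES = -Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib -ldl -lpthread"
new = old + " -Wl,-rpath-link,/root/miniconda3/lib"
print(added_flags(old, new))  # ['-Wl,-rpath-link,/root/miniconda3/lib']
```

The diff is set-based, so it ignores repeated flags such as `-ldl` and `-lpthread` and reports only flags that are entirely new.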

@pxLi pxLi added the bug Something isn't working label May 31, 2022
@GaryShen2008 GaryShen2008 added the P1 label Jun 6, 2022
wjxiz1992 added a commit to wjxiz1992/spark-rapids-ml that referenced this issue Jun 6, 2022
See issue: NVIDIA#73

Signed-off-by: Allen Xu <allxu@nvidia.com>
wjxiz1992 added a commit to wjxiz1992/spark-rapids-ml that referenced this issue Jun 6, 2022
due to issue: NVIDIA#73

Signed-off-by: Allen Xu <allxu@nvidia.com>
GaryShen2008 pushed a commit that referenced this issue Jun 24, 2022
* fall back cudf version to 22.04

due to issue: #73

Signed-off-by: Allen Xu <allxu@nvidia.com>

* refine

Signed-off-by: Allen Xu <allxu@nvidia.com>