Update dockerfile.gpu #6452

NisuSan · 2024-05-09T13:42:15Z

Create blank file.

NisuSan · 2024-05-09T13:42:30Z

It seems I've found the solution, so here are my questions:

Can I use my own Docker image with preinstalled drivers to speed up the entire initialization process? With nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 as the starting point, 10-20 minutes are spent just installing new drivers.
The same goes for LightGBM itself. Installation is quite time-consuming, so I could make a precompiled image with it.
The resulting image is about 10GB in total. It could be optimized by using Docker layers, so that the final layer would be about 3-4GB.

What do you think about these optimizations?

jameslamb · 2024-05-09T14:37:45Z

Please use an official image from NVIDIA (or some other official base image like ubuntu:latest), not one that you own.

Please keep compilation of LightGBM from source, not pulling from pre-compiled sources.

I strongly recommend that you try changing the base image (first FROM statement) and modifying the existing Dockerfile, instead of completely re-writing this from scratch. If you do continue with completely re-writing it, you should expect more review comments and a longer time until this is completed.

NisuSan · 2024-05-09T17:07:54Z

Understood. I will only change the base image, add the missing driver, and change the libnvidia-opencl.so.1 location (according to the new driver and new image) inside the required file.

NisuSan · 2024-05-09T20:18:09Z

@microsoft-github-policy-service agree

Change the base image, add missing driver and change the libnvidia-opencl.so.1 location (according to the new driver and new image) inside required file.

NisuSan · 2024-05-10T08:10:27Z

I've made all the changes, so what's next? :)

jameslamb · 2024-05-11T02:09:07Z

> I've made all the changes, so what's next?

Did you test this?

jameslamb · 2024-05-11T02:11:06Z

docker/gpu/dockerfile.gpu

@@ -88,7 +89,7 @@ RUN cd /usr/local/src && mkdir lightgbm && cd lightgbm && \

 ENV PATH /usr/local/src/lightgbm/LightGBM:${PATH}

-RUN /bin/bash -c "source activate py3 && cd /usr/local/src/lightgbm/LightGBM && sh ./build-python.sh install --precompile && source deactivate"
+RUN /bin/bash -c "source activate py3 && cd /usr/local/src/lightgbm/LightGBM && sh ./build-python.sh install --gpu && source deactivate"


This is not correct. lib_lightgbm.so has already been compiled a few lines up (the line running cmake --build build), so --precompile is necessary to build a Python package bundling it in.

Using --gpu makes that previous compilation unnecessary... and will not use the same OpenCL library and headers that were passed there.

This should be reverted.

NisuSan · 2024-05-17

Got it. I'll try to work on it today if I have spare time and check everything one more time.

NisuSan · 2024-05-20

If I switch back to --precompile, I get these errors:

[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Fatal] CUDA Tree Learner was not enabled in this build. Please recompile with CMake option -DUSE_CUDA=1

jameslamb · 2024-05-20

That error suggests to me that you're passing {"device": "cuda"} through parameters. That isn't appropriate for this image, where the library hasn't been built with -DUSE_CUDA=1.

In this Dockerfile, lib_lightgbm is being built only with -DUSE_GPU=1, which means you'd need to pass {"device": "gpu"} through params.
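The pairing described above can be sketched in a few lines of Python. This is only an illustration: the helper names `device_for_build` and `make_params` and the lookup table are hypothetical and not part of LightGBM. Only the CMake options (`-DUSE_GPU=1`, `-DUSE_CUDA=1`) and the `device` values (`gpu`, `cuda`) come from the discussion; the other parameters are placeholder choices.

```python
# Hypothetical lookup table: which "device" value matches which CMake option
# used when lib_lightgbm was compiled.
BUILD_FLAG_TO_DEVICE = {
    "USE_GPU": "gpu",    # OpenCL-based build (what this Dockerfile produces)
    "USE_CUDA": "cuda",  # CUDA-kernel build (not what this Dockerfile produces)
}


def device_for_build(build_flag: str) -> str:
    """Return the LightGBM "device" value that matches a CMake build option."""
    try:
        return BUILD_FLAG_TO_DEVICE[build_flag]
    except KeyError:
        raise ValueError(f"unknown build flag: {build_flag!r}") from None


def make_params(build_flag: str) -> dict:
    """Build a params dict whose "device" agrees with how the library was built."""
    return {
        "objective": "regression",  # placeholder objective for illustration
        "device": device_for_build(build_flag),
        "verbose": -1,
    }


# For an image built from this Dockerfile (-DUSE_GPU=1):
params = make_params("USE_GPU")
print(params["device"])
```

The resulting dict would then be passed to `lightgbm.train()` in the usual way; with a `-DUSE_GPU=1` build, any value other than `"gpu"` (or `"cpu"`) for `device` can be expected to fail like the error quoted above.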

NisuSan · 2024-05-21

I've tried different build variants, such as cmake -DUSE_GPU=1 or cmake -DUSE_CUDA=1, and then in the installation command I also tried all possible variants: sh ./build-python.sh install --gpu, sh ./build-python.sh install --cuda, and sh ./build-python.sh install --precompile. I even found your reply on StackOverflow and tried changing some installation steps, but it still didn't work.

The good news is that I fixed the missing files and driver in the Docker image, so now we just need to figure out how to install it properly :)

NisuSan · 2024-05-26

@jameslamb, @shiyu1994, today I decided to install it with a simple pip command, pip install --no-binary lightgbm --config-settings=cmake.define.USE_CUDA=ON 'lightgbm>=4.0.0', and after running code with device: cuda, I got the already known error from this issue. This gave me the idea that the problem with installation from source may be inside the build-python.sh or CMakeLists.txt files. Could you take a look at this if you can?

jameslamb · 2024-05-26

It's difficult for me to help you because you're reporting error messages but not showing the code you ran that led to them.

This Dockerfile is about the -DUSE_GPU version of LightGBM (OpenCL-based), not the -DUSE_CUDA version (CUDA kernels). Please keep it that way.

Stop passing -DUSE_CUDA or using {"device": "cuda"} with images built from this Dockerfile.
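As an illustration of the kind of code that makes an error report actionable, here is a minimal, hypothetical reproduction script. The function name `build_report`, the synthetic data, and the parameter choices other than `device` are assumptions made for this sketch, not anything taken from the PR. It records the Python and LightGBM versions, the exact params used, and whatever error LightGBM raises.

```python
import platform
import sys


def build_report(device: str = "gpu") -> str:
    """Train a tiny model and return a text report with versions and any error."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    try:
        # Imported lazily so the report still works when a package is missing.
        import numpy as np
        import lightgbm as lgb
    except ImportError as exc:
        lines.append(f"import failed: {exc}")
        return "\n".join(lines)

    lines.append(f"lightgbm: {lgb.__version__}")
    params = {"objective": "regression", "device": device, "verbose": -1}
    lines.append(f"params: {params}")

    # Small synthetic regression problem (illustrative data only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X[:, 0] + rng.normal(scale=0.1, size=200)
    try:
        lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=5)
        lines.append("result: training succeeded")
    except Exception as exc:  # record whatever LightGBM raised, verbatim
        lines.append(f"result: {type(exc).__name__}: {exc}")
    return "\n".join(lines)
```

Running `print(build_report("gpu"))` inside a container built from this Dockerfile and pasting the output into the thread would show both the code path and the resulting message in one place.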

Update dockerfile.gpu

7df7e7e

Create blank file

jameslamb added the in progress label May 9, 2024

Update dockerfile.gpu

24dcd56

Change the base image, add missing driver and change the libnvidia-opencl.so.1 location (according to the new driver and new image) inside required file.

NisuSan marked this pull request as ready for review May 10, 2024 08:13

NisuSan requested review from guolinke, jameslamb, shiyu1994, jmoralez and borchero as code owners May 10, 2024 08:13

jameslamb reviewed May 11, 2024

jameslamb added the fix label May 11, 2024
