
Library doesn't see GPU #1212

Closed
ostreech1997 opened this issue May 13, 2020 · 12 comments
@ostreech1997

Hi everyone, thanks for your library!
I use several BERT models, but I can't train them on a GPU. Here is the whole process:

  1. I install the DeepPavlov package into a Docker container
  2. I install tensorflow-gpu: pip install tensorflow-gpu==1.14.0
  3. I install the model's package requirements and download the model
  4. I move the Docker container to another machine with access to a GPU. This machine has CUDA and cuDNN.
    But when I train a model, it uses the CPU.
    I tried to check access to the GPU with tf.test.is_gpu_available(). It returns False.
    Maybe there is a mistake in this sequence of actions?
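For reference, the check mentioned above can be spelled out as a small helper (the API name is tf.test.is_gpu_available() in TF 1.x; guarding the import is my addition so the snippet also runs where TensorFlow is absent):

```python
def gpu_available() -> bool:
    """Report whether TensorFlow 1.x can see a CUDA GPU.

    Returns False both when no GPU is visible and when
    tensorflow itself is not installed.
    """
    try:
        import tensorflow as tf  # tensorflow-gpu==1.14.0 in this setup
    except ImportError:
        return False
    # TF 1.x API; TF 2.x replaces this with tf.config.list_physical_devices('GPU')
    return bool(tf.test.is_gpu_available())

print("GPU visible:", gpu_available())
```

If this prints False inside the container, the problem is usually the container runtime or a CPU-only TensorFlow build, not the model code.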
@IgnatovFedor
Collaborator

Hi, @ostreech1997
Could you please write which base image you use, all the commands used to build it, and the command used to run the image?
Also note that DeepPavlov already has a GPU Docker image.

@IgnatovFedor IgnatovFedor self-assigned this May 13, 2020
@ostreech1997
Author

ostreech1997 commented May 13, 2020

Hi, @IgnatovFedor
I use jupyter/base-notebook (https://hub.docker.com/r/jupyter/base-notebook/). I only download it and then run: docker run -d --name chatbot -p 9010:8888 jupyter/base-notebook
I read about your deeppavlov/base-gpu image, but I think it is not suitable for my task.

@IgnatovFedor
Collaborator

@ostreech1997, do you have nvidia-docker installed? You can use a GPU in a container only if you run it with the NVIDIA runtime and the image contains CUDA and cuDNN. To check that nvidia-docker is installed correctly, use nvidia-docker run nvidia/cuda:10.0-base nvidia-smi.
If you need Jupyter, build a new image based on deeppavlov/base-gpu using this Dockerfile:

FROM deeppavlov/base-gpu:0.9.1

RUN pip install jupyter

CMD jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root

After building the image with docker build -t dp-jupyter . you can run it with nvidia-docker run --rm -p 9010:8888 dp-jupyter. The GPU should become available.

@ostreech1997
Author

@IgnatovFedor thanks a lot! I built an image using your Dockerfile, and now I can use the GPU to train models. But it seems that the training process uses only one video card. Is it possible to use all video cards for training?

@IgnatovFedor
Collaborator

@ostreech1997, you're welcome. Unfortunately, DeepPavlov currently does not support using more than one GPU.

@ostreech1997
Author

Okay, got it.
Thanks anyway for your help and for the DeepPavlov library. It's very helpful!

@ostreech1997
Author

Hi, I have a new problem with the GPU. I want to train several models one after another, but after the first training run, the second model trains on the CPU.
nvidia-smi shows that the training process is still running, even though the model has already finished training.
I think I have to close this process myself somehow. How can I do it?

@ostreech1997 ostreech1997 reopened this May 15, 2020
@IgnatovFedor
Collaborator

@ostreech1997, could you show the code you use to train several models one after another?

@ostreech1997
Author

ostreech1997 commented May 15, 2020

@IgnatovFedor Right now I test training models in Jupyter. An example for the intent classifier:

import json

from deeppavlov import configs, train_model

with configs.classifiers.rusentiment_bert.open(encoding='utf8') as f:
    config_classifier = json.load(f)

config_classifier['metadata']['variables']['MODEL_PATH'] = '/base/.deeppavlov/models/classification_task/classification_intent/'
config_classifier['dataset_reader']["data_path"] = '/base/.deeppavlov/downloads/classification_task/classification_intent/'
config_classifier['dataset_reader']["train"] = 'train.csv'
config_classifier['dataset_reader']["test"] = 'test.csv'
config_classifier['dataset_reader']["x"] = 'name'
config_classifier['dataset_reader']["y"] = 'class'
config_classifier['dataset_reader']["sep"] = ';'
config_classifier['metadata']['download'] = [config_classifier['metadata']['download'][-1]]

model_clf = train_model(config_classifier, download=False)

When training finished, I checked nvidia-smi and saw this:


@ostreech1997
Author

I found that restarting the Jupyter kernel fixes this. Maybe there is no problem if I train models from a .py file.
I'll check it and let you know!

@ostreech1997
Author

Bad news: when I train models from a .py file, I have the same problem. For some reason, the GPU stays loaded...
Is there any way to free the GPU after a model's training?
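One common workaround for this TF 1.x behavior (my suggestion, not an official DeepPavlov API) is to run each training in its own child process: the driver reclaims all GPU memory when the process exits, so the next model starts with a clean GPU. A minimal sketch, where train_one is a hypothetical placeholder standing in for the real deeppavlov.train_model(config, download=False) call:

```python
import multiprocessing as mp

def train_one(config):
    """Placeholder for the real training call, e.g.
    deeppavlov.train_model(config, download=False).
    TF 1.x holds GPU memory until its process dies, so each
    training should run in a short-lived child process.
    """
    print("training with", config)

def train_sequentially(configs_list):
    """Train each config in a fresh process, one after another."""
    for cfg in configs_list:
        p = mp.Process(target=train_one, args=(cfg,))
        p.start()
        p.join()  # GPU memory is released when the child exits

if __name__ == "__main__":
    train_sequentially(["intent_config", "sentiment_config"])
```

This mirrors what restarting the Jupyter kernel does, but without manual intervention between models.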

@ostreech1997
Author

I think this problem belongs in another issue. Thanks a lot for your help @IgnatovFedor, now I can use the GPU for training.
