
Library doesn't see GPU #1212

Closed
ostreech1997 opened this issue May 13, 2020 · 12 comments
@ostreech1997

Hi everyone, thanks for your library!
I use several BERT models, but I can't train them on a GPU. Here is the whole process:

  1. I install the DeepPavlov package into a Docker container
  2. I install tensorflow-gpu: pip install tensorflow-gpu==1.14.0
  3. I install the model's package requirements and download the model
  4. I move the Docker container to another machine with access to a GPU. This machine has CUDA and cuDNN.
    But when I train a model, it uses the CPU.
    I tried to check access to the GPU with tf.test.is_gpu_available(). It returns False.
    Maybe there is a mistake in this sequence of actions?
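For reference, the check mentioned above can be spelled out as a small helper (the API name is tf.test.is_gpu_available() in TF 1.x; guarding the import is my addition so the snippet also runs where TensorFlow is absent):

```python
def gpu_available() -> bool:
    """Report whether TensorFlow 1.x can see a CUDA GPU.

    Returns False both when no GPU is visible and when
    tensorflow itself is not installed.
    """
    try:
        import tensorflow as tf  # tensorflow-gpu==1.14.0 in this setup
    except ImportError:
        return False
    # TF 1.x API; TF 2.x replaces this with tf.config.list_physical_devices('GPU')
    return bool(tf.test.is_gpu_available())

print("GPU visible:", gpu_available())
```

If this prints False inside the container, the problem is usually the container runtime or a CPU-only TensorFlow build, not the model code.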
@IgnatovFedor
Collaborator

Hi, @ostreech1997
Could you please write which base image you use, all the commands used to build it, and the command used to run the image?
Also note that DeepPavlov already has a GPU Docker image.

@IgnatovFedor IgnatovFedor self-assigned this May 13, 2020
@ostreech1997
Author

ostreech1997 commented May 13, 2020

Hi, @IgnatovFedor
I use jupyter/base-notebook (https://hub.docker.com/r/jupyter/base-notebook/). I only download it and then run: docker run -d --name chatbot -p 9010:8888 jupyter/base-notebook
I read about your deeppavlov/base-gpu image, but I think it is not suitable for my task.

@IgnatovFedor
Collaborator

@ostreech1997, do you have nvidia-docker installed? You can use a GPU in a container only if you run it with the NVIDIA runtime and the image contains CUDA and cuDNN. To check that nvidia-docker is installed correctly, use nvidia-docker run nvidia/cuda:10.0-base nvidia-smi.
If you need Jupyter, build a new image based on deeppavlov/base-gpu using this Dockerfile:

FROM deeppavlov/base-gpu:0.9.1

RUN pip install jupyter

CMD jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root

After building the image with docker build -t dp-jupyter . you can run it with nvidia-docker run --rm -p 9010:8888 dp-jupyter. The GPU should become available.

@ostreech1997
Author

@IgnatovFedor thanks a lot! I built an image using your Dockerfile, and now I can use the GPU to train models. But it seems that the training process uses only one video card. Is it possible to use all video cards for training?

@IgnatovFedor
Collaborator

@ostreech1997, you're welcome. Unfortunately, DeepPavlov currently does not support using more than one GPU.

@ostreech1997
Author

Okay, got it.
Thanks anyway for your help and for the DeepPavlov library. It's very helpful!

@ostreech1997
Author

Hi, I have a new problem with the GPU. I want to train several models one after another, but after the first training run, the second model trains on the CPU.
nvidia-smi shows that the training process is still running, even though the model has already finished training.
I think I have to close this process myself somehow. How can I do it?

@ostreech1997 ostreech1997 reopened this May 15, 2020
@IgnatovFedor
Collaborator

@ostreech1997, could you show the code you use to train several models one after another?

@ostreech1997
Author

ostreech1997 commented May 15, 2020

@IgnatovFedor Right now I test training models in Jupyter. An example for the intent classifier:

import json

from deeppavlov import configs, train_model

with configs.classifiers.rusentiment_bert.open(encoding='utf8') as f:
    config_classifier = json.load(f)

config_classifier['metadata']['variables']['MODEL_PATH'] = '/base/.deeppavlov/models/classification_task/classification_intent/'
config_classifier['dataset_reader']["data_path"] = '/base/.deeppavlov/downloads/classification_task/classification_intent/'
config_classifier['dataset_reader']["train"] = 'train.csv'
config_classifier['dataset_reader']["test"] = 'test.csv'
config_classifier['dataset_reader']["x"] = 'name'
config_classifier['dataset_reader']["y"] = 'class'
config_classifier['dataset_reader']["sep"] = ';'
config_classifier['metadata']['download'] = [config_classifier['metadata']['download'][-1]]

model_clf = train_model(config_classifier, download=False)

When training finished, I checked nvidia-smi and saw this:


@ostreech1997
Author

I found that restarting the Jupyter kernel fixes this. Maybe there is no problem if I train models from a .py file.
I'll check it and let you know!

@ostreech1997
Author

Bad news: when I train models from a .py file, I have the same problem. For some reason, the GPU stays loaded...
Is there any way to free the GPU after a model's training?
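One common workaround for this TF 1.x behavior (my suggestion, not an official DeepPavlov API) is to run each training in its own child process: the driver reclaims all GPU memory when the process exits, so the next model starts with a clean GPU. A minimal sketch, where train_one is a hypothetical placeholder standing in for the real deeppavlov.train_model(config, download=False) call:

```python
import multiprocessing as mp

def train_one(config):
    """Placeholder for the real training call, e.g.
    deeppavlov.train_model(config, download=False).
    TF 1.x holds GPU memory until its process dies, so each
    training should run in a short-lived child process.
    """
    print("training with", config)

def train_sequentially(configs_list):
    """Train each config in a fresh process, one after another."""
    for cfg in configs_list:
        p = mp.Process(target=train_one, args=(cfg,))
        p.start()
        p.join()  # GPU memory is released when the child exits

if __name__ == "__main__":
    train_sequentially(["intent_config", "sentiment_config"])
```

This mirrors what restarting the Jupyter kernel does, but without manual intervention between models.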

@ostreech1997
Author

I think this problem belongs in another issue. Thanks a lot for your help @IgnatovFedor, now I can use the GPU for training.
