This repository has been archived by the owner on Jul 1, 2021. It is now read-only.

Compatibility with RTX 3080 #13

Open
PlatinumYao opened this issue Jan 12, 2021 · 4 comments

Comments

@PlatinumYao

OS: Win 10
DeepLabCut Version: DeepLabCut-core tf 2.2 alpha
Anaconda env used: DLC-GPU (cloned the DLC-GPU env and uninstalled its CUDA and cuDNN packages)
TensorFlow version: TF 2.3, TF 2.4, or tf-nightly, installed with pip (see below)
CUDA version: 11.0 and 11.1 (see below)

Hi everyone,
First of all, I want to thank the DeepLabCut team! I have been using DLC for whisker tracking on an RTX 2060 for a while, and it has greatly helped my project.
Recently, I got an RTX 3080 in the lab. However, I had a hard time setting it up for DLC because of compatibility issues. First, I noticed that the RTX 3000 series does not support CUDA 10.x or earlier, so I installed CUDA 11.0 or CUDA 11.1 with the corresponding cuDNN on Windows. I also cloned the DLC-GPU conda environment and uninstalled its original CUDA and cuDNN packages to prevent conflicts.
TensorFlow supports CUDA 11.0 starting with TensorFlow 2.4, so I installed TensorFlow 2.4 or tf-nightly (2.5) in the conda environment via pip. I also tried TF 2.3 to check whether it is indeed incompatible with CUDA 11.x. I followed the notebook at
https://github.com/DeepLabCut/DeepLabCut-core/blob/tf2.2alpha/Colab_TrainNetwork_VideoAnalysis_TF2.ipynb
to install DeepLabCut-core tf 2.2 alpha and tf-slim, and then ran deeplabcut-core. However, I could not get training to start in any of these configurations.
Here is a summary:
CUDA | TensorFlow | Result
11.0 | 2.3 | TF cannot recognize the GPU because it looks for .dll files that only exist in CUDA 10.x
11.0 | 2.4 | TF recognizes the GPU, but training fails to start with an error message (see Note 1)
11.0 | tf-nightly | TF recognizes the GPU, but training fails to start with an error message (see Note 1)
11.1 | 2.4 | TF recognizes the GPU only with a workaround (see Note 2); training fails to start with no error message
11.1 | tf-nightly | TF recognizes the GPU only with a workaround (see Note 2); training fails to start with no error message
I also tested a simple TensorFlow script (https://www.tensorflow.org/tutorials/quickstart/advanced), and it seemed to run fine on the GPU in the last four configurations listed above.
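The first row of the summary matches TensorFlow's tested build configurations: each release is built against one specific CUDA release and looks for CUDA DLLs named for that release. A minimal lookup sketch, assuming the CUDA versions listed in TensorFlow's published tested-build table for release builds (2.3 → 10.1, 2.4 → 11.0, 2.5 → 11.2); nightly builds can differ, and `TESTED_CUDA`, `expected_cuda`, and `cuda_matches` are hypothetical helpers, not part of TensorFlow or DeepLabCut:

```python
from typing import Optional

# Hypothetical table: CUDA release each TensorFlow release was built against,
# assumed from TensorFlow's "tested build configurations" (release builds only).
TESTED_CUDA = {
    "2.3": "10.1",
    "2.4": "11.0",
    "2.5": "11.2",
}


def _major_minor(version: str) -> str:
    """Reduce a version string like "2.4.1" or "11.0.2" to "2.4" / "11.0"."""
    return ".".join(version.split(".")[:2])


def expected_cuda(tf_version: str) -> Optional[str]:
    """Return the CUDA release a TF version was built against, or None if unknown."""
    return TESTED_CUDA.get(_major_minor(tf_version))


def cuda_matches(tf_version: str, installed_cuda: str) -> bool:
    """True only if the installed CUDA major.minor equals the tested release."""
    expected = expected_cuda(tf_version)
    return expected is not None and expected == _major_minor(installed_cuda)
```

For example, `cuda_matches("2.4.1", "11.1")` is `False`, which is consistent with the CUDA 11.1 rows needing the DLL workaround before TensorFlow 2.4 would load at all.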

Note 1: The error message is "failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED", and Windows Task Manager showed VRAM usage exploding right after training started. I tried to cap memory use with "config.gpu_options.per_process_gpu_memory_fraction = 0.6", but unfortunately it did not help.
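In TensorFlow 2.x, `gpu_options.per_process_gpu_memory_fraction` only takes effect if that `ConfigProto` is actually passed to the `tf.compat.v1.Session` that runs training; setting it on a config object the training code never uses changes nothing. A sketch of the TF2-native alternative, assuming TensorFlow 2.4 and that it runs before anything touches the GPU: cap the first GPU with a logical-device memory limit. The helper names and the 10240 MiB default (the RTX 3080's 10 GB) are illustrative assumptions:

```python
def memory_limit_mb(total_mb: int, fraction: float) -> int:
    """Convert a per-process memory fraction into an absolute limit in MB."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def cap_first_gpu(fraction: float, total_mb: int = 10240) -> None:
    """Hypothetical helper: limit the first visible GPU to `fraction` of `total_mb`.

    Must run before TensorFlow initializes the GPU; afterwards TensorFlow
    raises RuntimeError because logical devices are already configured.
    """
    import tensorflow as tf  # imported lazily so memory_limit_mb has no TF dependency

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return  # nothing to cap on a CPU-only setup
    limit = memory_limit_mb(total_mb, fraction)
    tf.config.set_logical_device_configuration(
        gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=limit)]
    )
```

Calling `cap_first_gpu(0.6)` at the top of the training script would request roughly the same 60% cap as the setting above; whether that avoids the cuBLAS allocation failure on this card is untested.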

Note 2: TF could not recognize the GPU because it could not find "cusolver64_10.dll", which exists in CUDA 11.0 but is replaced by "cusolver64_11.dll" in CUDA 11.1. So I copied "cusolver64_11.dll" and renamed the copy to "cusolver64_10.dll". After that, TF recognizes the GPU, but training still does not start: Task Manager shows VRAM usage increasing (without exploding) once training begins, and after about 30 seconds ipython or python closes itself without any error message.
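The copy-and-rename workaround from Note 2 can be scripted so the original DLL stays untouched. A minimal sketch; `ensure_dll_alias` is a hypothetical helper, and it only satisfies TensorFlow's loader by file name, it does not make CUDA 11.1's cuSOLVER a tested configuration for that TensorFlow build:

```python
import shutil
from pathlib import Path


def ensure_dll_alias(cuda_bin: str, existing: str, expected: str) -> bool:
    """Copy `existing` to `expected` inside `cuda_bin` if `expected` is missing.

    Returns True if a copy was made, False if `expected` was already present.
    Raises FileNotFoundError if `existing` is missing too.
    """
    bin_dir = Path(cuda_bin)
    target = bin_dir / expected
    if target.exists():
        return False  # nothing to do; never overwrite a real DLL
    source = bin_dir / existing
    if not source.exists():
        raise FileNotFoundError(source)
    shutil.copy2(source, target)  # copy, not rename, so the original DLL remains
    return True
```

Assuming the default CUDA 11.1 install location (adjust to your machine; writing there usually needs an administrator prompt): `ensure_dll_alias(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin", "cusolver64_11.dll", "cusolver64_10.dll")`.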

I also carefully followed the suggestions in DeepLabCut/DeepLabCut#944. They were very useful, but I still cannot get my RTX 3080 to work.

Do you have any more suggestions that I could try?
Does anyone have a guide to setting up DLC-core on an RTX 3000 series card?

Thank you in advance

@dlramamurthy

Thanks for this post. I have been having exactly the same issues when trying to install on an RTX 3070.
I made an initial post which didn't go into the same level of detail as you have: DeepLabCut/DeepLabCut#1078
I would love to hear if you find out how to get it running!

@bobfromjapan

Hello. I'm an RTX 3000 user too!

This may not be the answer you are looking for, but I was able to run DeepLabCut 2.2b8 on my RTX 3080 using the tensorflow-directml package.
This is an UNOFFICIAL approach that relies on a package DeepLabCut does not normally use, and it runs slower than native CUDA, but I was able to confirm that it works.

I wrote a report about this on image.sc: https://forum.image.sc/t/fyi-deeplabcut-worked-on-radeon-gpu-rtx3080-using-tensorflow-directml/47700

@angelgho

OS: Win 10
Installation sequence:
Built CUDA following:
https://www.reddit.com/r/tensorflow/comments/jsalkw/rtx_3090_and_tensorflow_for_windows_10_step_by/
CUDA: 11.1
cuDNN: v8.0.5.39
Anaconda env used: DLC-GPU (latest, v2.1.10.2)
TensorFlow version: pip install tensorflow-gpu==2.4.1
deeplabcutcore: pip-installed first, later installed directly from GitHub (see below)
tf-slim: pip install tf-slim==1.1.0

Hi all,
Another RTX 3070 user checking in.

I found that the pip-installed deeplabcutcore seems to be an older version than the one on GitHub.
When I ran testscript.py with the pip-installed deeplabcutcore, importing deeplabcutcore raised many TensorFlow-related errors (for example, module 'tensorflow.python.framework.ops' has no attribute 'RegisterShape').
With the repository downloaded from GitHub, however, it works.
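When both a pip-installed copy and a GitHub checkout exist, it helps to confirm which one Python actually imports. A small sketch using only the standard library's `importlib.util.find_spec`; the helper name `module_origin` is hypothetical:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

Running `print(module_origin("deeplabcutcore"))` inside the DLC environment shows the path: a location under `site-packages` points to the pip copy, while a path inside the cloned repository points to the GitHub version.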

I also had the same VRAM explosion problem.
I suspect this is because I am also using the RTX 3070 as my display GPU.
I found a solution here:
tensorflow/tensorflow#46209
I added the following two lines right after "import tensorflow as tf" in Lib\site-packages\deeplabcutcore\pose_estimation_tensorflow\train.py:
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
It then worked well.
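An alternative to editing the installed train.py is TensorFlow's `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which turns on the same on-demand allocation for every GPU. A sketch of both variants, assuming TensorFlow 2.4; the function names are hypothetical, and either one must run before training touches the GPU:

```python
import os


def force_allow_growth_env() -> None:
    """Ask TensorFlow to grow GPU memory on demand via its environment variable.

    Call this before TensorFlow initializes the GPU (safest: before importing it).
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def enable_memory_growth() -> list:
    """Same effect through tf.config, applied to every visible GPU, not only the first."""
    import tensorflow as tf  # imported lazily; must still run before any GPU op

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus
```

Calling `force_allow_growth_env()` at the top of the training script, or running `set TF_FORCE_GPU_ALLOW_GROWTH=true` in the Anaconda prompt before starting Python, leaves the installed package untouched, so the change survives reinstalling deeplabcutcore.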

Hope it helps!
Best,
Chen

@F2AGLAXY

Hi Yao, I am new to DLC. I also use an RTX 2060, but I cannot get it to work. Could you share your configuration? Best,


5 participants