Cannot get RTX 3090 card to start training #944

cfernandezpa · 2020-10-05T15:57:49Z

OS: Win 10
DeepLabCut Version: 2.2b8
Anaconda env used: DLC-GPU (cloned from Alex's github)
WxPython version: 4.0.7.post2
Tensorflow version: many, installed with pip (see below)
Cuda version: 10 and 11

Hi everyone,

First of all, I wanted to thank all the authors for this amazing software!

I'm starting to work with DeepLabCut and after a few promising preliminary results with an "old" GPU (Turing architecture), we decided to upgrade to the recent Ampere architecture. Since it is also backwards compatible with old CUDA versions, we thought that it would be fine. However, after trying many combinations of Tensorflow and CUDA, I cannot make it to work. Here are the combinations I have tried so far:

Cuda | Tensorflow | Cudnn | Works?

10 | 1.15.2 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
10 | 1.15.0 | 7.6.5 | Same as with tf 1.15.2
10 | 1.14.0 | 7.6.5 | Does not detect GPU
11 | 1.15.0 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
11 | 1.13.1 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
11 | 1.14.0 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
11 | 1.15.4 | 7.6.5 | Does not detect GPU
11 | 1.15.2 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
*Tensorflow 1.13.1 does not detect the GPU either.

Using the combinations mentioned above that recognizes the GPU, and can print "Hello, Tensorflow", I ended up stuck at this screen (see code below).

I know that in the documentation says that CUDA 10.+ is not supported, but with the old card we had, it was running fine with CUDA 11. I have very limited knowledge about this, so not sure why/how it worked.

Reading in CUDA documentation it says that Ampere architecture is compatible with CUDA 10.2 or earlier. Also, according to Tensorflow documentation, Tensorflow 1.15 should be compatible with ampere. The only caveat is that it takes too long to start (up to 30 min) but that can be fixed by increasing the cuda cache size.

So, to me, the only thing left that could be giving issues is Cudnn. According to Nvidia, support for Ampere only appeared in Cudnn 8. However, as far as I know, Anaconda only supports up to Cudnn 7.6.5 on Windows. Apparently it has reached Cudnn 8 on Linux.

Code output

[Selecting multi-animal trainer
Config:
{'all_joints': [[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9],
[10],
[11],
[12]],
'all_joints_names': ['snout',
'cap',
'leftear',
'rightear',
'spine',
'lforepaw',
'rforepaw',
'lhindpaw',
'rhindpaw',
'tailbase',
'tailend',
'cornerofbox1',
'cornerofbox2'],
'batch_size': 8,
'crop_pad': 0,
'cropratio': 0.4,
'dataset': 'training-datasets\iteration-0\UnaugmentedDataSet_2CamTest9Oct4\2CamTest9_CF95shuffle3.pickle',
'dataset_type': 'multi-animal-imgaug',
'deterministic': False,
'display_iters': 500,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': 'C:\Users\RyC\anaconda3\envs\dlc-gpu\lib\site-packages\deeplabcut\pose_estimation_tensorflow\models\pretrained\resnet_v1_50.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 0.05,
'locref_stdev': 7.2801,
'log_dir': 'log',
'max_input_size': 1500,
'mean_pixel': [123.68, 116.779, 103.939],
'metadataset': 'training-datasets\iteration-0\UnaugmentedDataSet_2CamTest9Oct4\Documentation_data-2CamTest9_95shuffle3.pickle',
'min_input_size': 64,
'mirror': False,
'multi_step': [[0.0001, 7500], [5e-05, 12000], [1e-05, 200000]],
'net_type': 'resnet_50',
'num_joints': 13,
'num_limbs': 55,
'optimizer': 'adam',
'pafwidth': 20,
'pairwise_huber_loss': False,
'pairwise_loss_weight': 0.1,
'pairwise_predict': False,
'partaffinityfield_graph': [[5, 9],
[4, 7],
[1, 3],
[6, 9],
[4, 8],
[5, 6],
[2, 8],
[0, 7],
[8, 9],
[1, 6],
[0, 10],
[3, 7],
[0, 3],
[2, 5],
[2, 4],
[5, 8],
[1, 2],
[4, 9],
[6, 7],
[2, 9],
[3, 10],
[6, 10],
[8, 10],
[1, 5],
[3, 6],
[0, 4],
[1, 10],
[7, 10],
[4, 10],
[2, 6],
[4, 5],
[1, 4],
[2, 10],
[9, 10],
[3, 9],
[0, 5],
[1, 9],
[2, 3],
[0, 8],
[3, 5],
[0, 1],
[2, 7],
[7, 9],
[7, 8],
[5, 10],
[4, 6],
[6, 8],
[5, 7],
[3, 8],
[0, 6],
[1, 8],
[1, 7],
[0, 9],
[3, 4],
[0, 2]],
'partaffinityfield_predict': True,
'pos_dist_thresh': 17,
'project_path': 'C:\Users\RyC\2CamTest9-CF-2020-10-04',
'regularize': False,
'rotation': 25,
'rotratio': 0.4,
'save_iters': 10000,
'scale_jitter_lo': 0.5,
'scale_jitter_up': 1.25,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': 'C:\Users\RyC\2CamTest9-CF-2020-10-04\dlc-models\iteration-0\2CamTest9Oct4-trainset95shuffle3\train\snapshot',
'stride': 8.0,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
Activating limb prediction...
Starting with multi-animal imaug + adam pose-dataset loader.
Batch Size is 8
Getting specs multi-animal-imgaug 55 13
Initializing ResNet
Loading ImageNet-pretrained resnet_50
2020-10-05 10:40:16.943131: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-10-05 10:40:16.946595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:08:00.0
2020-10-05 10:40:16.946675: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-10-05 10:40:16.948226: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-10-05 10:40:16.948570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-10-05 10:40:16.948928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-10-05 10:40:16.949263: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-10-05 10:40:16.949302: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-10-05 10:40:16.949559: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-10-05 10:40:16.949840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-10-05 10:40:17.963045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-05 10:40:17.963140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-10-05 10:40:17.964083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-10-05 10:40:17.964440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22071 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:08:00.0, compute capability: 8.6)
Max_iters overwritten as 3000
Display_iters overwritten as 10
Save_iters overwritten as 50
Training parameters:
{'stride': 8.0, 'weigh_part_predictions': False, 'weigh_negatives': False, 'fg_fraction': 0.25, 'mean_pixel': [123.68, 116.779, 103.939], 'shuffle': True, 'snapshot_prefix': 'C:\Users\RyC\2CamTest9-CF-2020-10-04\dlc-models\iteration-0\2CamTest9Oct4-trainset95shuffle3\train\snapshot', 'log_dir': 'log', 'global_scale': 0.8, 'location_refinement': True, 'locref_stdev': 7.2801, 'locref_loss_weight': 0.05, 'locref_huber_loss': True, 'optimizer': 'adam', 'intermediate_supervision': False, 'intermediate_supervision_layer': 12, 'regularize': False, 'weight_decay': 0.0001, 'crop_pad': 0, 'scoremap_dir': 'test', 'batch_size': 8, 'dataset_type': 'multi-animal-imgaug', 'deterministic': False, 'mirror': False, 'pairwise_huber_loss': False, 'weigh_only_present_joints': False, 'partaffinityfield_predict': True, 'pairwise_predict': True, 'all_joints': [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]], 'all_joints_names': ['snout', 'cap', 'leftear', 'rightear', 'spine', 'lforepaw', 'rforepaw', 'lhindpaw', 'rhindpaw', 'tailbase', 'tailend', 'cornerofbox1', 'cornerofbox2'], 'cropratio': 0.4, 'dataset': 'training-datasets\iteration-0\UnaugmentedDataSet_2CamTest9Oct4\2CamTest9_CF95shuffle3.pickle', 'display_iters': 500, 'init_weights': 'C:\Users\RyC\anaconda3\envs\dlc-gpu\lib\site-packages\deeplabcut\pose_estimation_tensorflow\models\pretrained\resnet_v1_50.ckpt', 'max_input_size': 1500, 'metadataset': 'training-datasets\iteration-0\UnaugmentedDataSet_2CamTest9Oct4\Documentation_data-2CamTest9_95shuffle3.pickle', 'min_input_size': 64, 'multi_step': [[0.0001, 7500], [5e-05, 12000], [1e-05, 200000]], 'net_type': 'resnet_50', 'num_joints': 13, 'num_limbs': 55, 'pafwidth': 20, 'pairwise_loss_weight': 0.1, 'partaffinityfield_graph': [[5, 9], [4, 7], [1, 3], [6, 9], [4, 8], [5, 6], [2, 8], [0, 7], [8, 9], [1, 6], [0, 10], [3, 7], [0, 3], [2, 5], [2, 4], [5, 8], [1, 2], [4, 9], [6, 7], [2, 9], [3, 10], [6, 10], [8, 10], [1, 5], [3, 6], [0, 4], [1, 10], [7, 10], [4, 10], [2, 6], [4, 5], [1, 4], [2, 10], [9, 10], [3, 9], [0, 5], [1, 9], [2, 3], [0, 8], [3, 5], [0, 1], [2, 7], [7, 9], [7, 8], [5, 10], [4, 6], [6, 8], [5, 7], [3, 8], [0, 6], [1, 8], [1, 7], [0, 9], [3, 4], [0, 2]], 'pos_dist_thresh': 17, 'project_path': 'C:\Users\RyC\2CamTest9-CF-2020-10-04', 'rotation': 25, 'rotratio': 0.4, 'save_iters': 10000, 'scale_jitter_lo': 0.5, 'scale_jitter_up': 1.25}
Starting multi-animal training....
2020-10-05 10:40:27.731872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll]

Upon reading in some forums, some people have been succesful using Symlink in other applications, so I tried that with Cudnn64_7.dll and hardlinked to Cudnn64_8.dll inside DLC-GPU enviroment, but I have not been able to make it work. It shows an error saying that compute capabilities does not match.

Do you have any suggestion that I might try?

Many thanks in advance.

MMathisLab · 2020-10-11T13:57:54Z

We just got a 3090 in the lab this week; so we can test it. But in general what I would suggest is running our testscripts always as a first pass after installation.

https://www.youtube.com/watch?v=IOWtKn3l33s

https://github.com/DeepLabCut/DeepLabCut/tree/master/examples

cfernandezpa · 2020-10-12T02:39:01Z

Thank you very much for the reply. That is great news, hopefully you would be able to make it work! I won't be able to try testscripts this week but I'll do the other week for sure and I'll report back.

Thanks again!

MMathisLab · 2020-10-16T17:29:43Z

sorry we haven't gotten to this yet; but you might try our dev branch with TF2.x--> https://github.com/DeepLabCut/DeepLabCut-core/tree/tf2.2alpha

cfernandezpa · 2020-10-16T18:32:32Z

Hi,

I have tried the testscript with version 2.2b8 and it stops at the same point to when I tried with my data set.
I am going to try now with the Dev branch and see how it goes.

Thanks!

cfernandezpa · 2020-10-17T21:18:34Z

Hi,

I am currently trying with the dev branch and I was able to start training. However, it was far from ideal. First, I learned that TF2.2 does not work with CUDA 11 (let alone 11.1), so it won't recognize the GPU. So, I had to install CUDA 10.1 which is supposed to be the version that works with TF2.2. That change made the system to recognize the GPU. Then, training took a long time to start but it engaged the GPU as seen in Task manager. A warning message was shown about PTX compiling been done by the driver (I cannot find the original message in the training log), after which training started but it was very slow. Also, the reduction of the "loss" value after each iteration seems smaller than I remembered, but I have no objective way to confirm this. In any case, I was able to train for 10000 iterations which is a good progress.

I think one possible solution is to compile TF2.2 or 2.3 with CUDA 11.1 from sources, but I don't know how to do that in Windows. I found an article on how to do it for Linux (https://towardsdatascience.com/how-to-compile-tensorflow-2-3-with-cuda-11-1-8cbecffcb8d3). Could you please advice on this matter?

If I find anything else, I'll post it here.

Thanks!

cfernandezpa · 2020-10-17T23:59:55Z

Little update. I noticed that I had "gputouse=0", so I changed it to 1 and started much faster and it is training like 100X faster.

I'll keep you posted with any advances I make.

cfernandezpa · 2020-10-19T18:59:27Z

Hi,

I noticed you closed this issue, which is fair since I was able to train using DeepLabCutCore. However, I'm not sure about the validity of the results of the training as I'm unable to evaluate it with either this version or using the GUI with version 2.2b8; there is a Key error after evaluation started. Also, the available options in the Core version are limited as you know.

So, my question is, should there be another issue open to tackle DLC compatibility with RTX 3000 series cards? I'm willing to help as far as my skills allow.

Thanks!

MMathisLab · 2020-10-20T15:18:02Z

it's a good point; i'll reopen until it's really resolved; for now, also people can hopefully find the TF2.x branch!

However, I'm not sure about the validity of the results of the training as I'm unable to evaluate it with either this version or using the GUI with version 2.2b8;

correct - the branch is only up to date with 2.1.8.1! :) so when we roll up to 2.2x for TF that would work again.

cfernandezpa · 2020-10-31T03:17:05Z

Hi,

I have been testing some more and I have made some progress. I can confirm that the training works well with the following system settings:

Deeplabcutcore
CUDA 11.1
Cudnn 8.0.4.30
Drivers 456.71
Tensorflow tf-nightly-gpu 2.5.0.dev20201019

I had Deeplabcut and TF installed in a Python environment (not Anaconda) and I was able to train, evaluate, analyze and create a video. I enconunter an issue where the video analysis was running very low, which makes me think that the GPU was not fully engaged in this part.

Hopefully the full version, including the GUI would be available soon.

Thanks!

MMathisLab · 2020-12-05T17:34:21Z

Hi @cfernandezpa please also check out the blog post/ the branch is now working (and a colab notebook): http://www.mousemotorlab.org/deeplabcutblog/2020/11/23/rolling-up-to-tensorflow-2

In general, deeplabcutcore will be the package to use for TF2 support from now on; you can indeed have deeplabcut and deeplabcutcore in the same env, check out the colab on how to easily imort deeplabcutcore in the workflow; it's all the same functionality as "normal" dlc (and then deeplabcut is the GUIs, etc).

https://github.com/DeepLabCut/DeepLabCut-core/blob/tf2.2alpha/Colab_TrainNetwork_VideoAnalysis_TF2.ipynb

I think then I can close this issue, since it can support 3090 training now (woo hoo)

Gittinator · 2020-12-10T18:40:03Z

Is there a guide available on how get this set up? I'm also trying to use deeplabcut with an RTX 3090. I've got CUDA 11.1 on WIndows. I made a new conda environment, with python 3.7, and installed deeplabcutcore and tf-nightly-gpu.

When I go to import deeplabcutcore, it says: No module named 'tensorflow.contrib'. This seems like it wants TF1?

cfernandezpa · 2020-12-10T19:11:12Z

Hi @cfernandezpa please also check out the blog post/ the branch is now working (and a colab notebook): http://www.mousemotorlab.org/deeplabcutblog/2020/11/23/rolling-up-to-tensorflow-2

In general, deeplabcutcore will be the package to use for TF2 support from now on; you can indeed have deeplabcut and deeplabcutcore in the same env, check out the colab on how to easily imort deeplabcutcore in the workflow; it's all the same functionality as "normal" dlc (and then deeplabcut is the GUIs, etc).

https://github.com/DeepLabCut/DeepLabCut-core/blob/tf2.2alpha/Colab_TrainNetwork_VideoAnalysis_TF2.ipynb

I think then I can close this issue, since it can support 3090 training now (woo hoo)

Thank you very much for your message and for your work/support of DLC as well! That is Awesome news!

cfernandezpa · 2020-12-10T19:14:03Z

Is there a guide available on how get this set up? I'm also trying to use deeplabcut with an RTX 3090. I've got CUDA 11.1 on WIndows. I made a new conda environment, with python 3.7, and installed deeplabcutcore and tf-nightly-gpu.

When I go to import deeplabcutcore, it says: No module named 'tensorflow.contrib'. This seems like it wants TF1?

Hello,

I was not able to make it to work under a conda environment, so I made a Python environment. The difference is that under conda, CuDDN is limited to 7.6 (I think) and with the other one you can update to 8. I believe that it was the source of my problem. Try that and see if that solves your issue.

DuanWei-fudan · 2020-12-19T04:47:41Z

@Gittinator
I got the same problem. Do you solve it?
If you did,could you please tell you how can I solve it ?

SabriQ · 2020-12-25T03:04:15Z

Hi @cfernandezpa please also check out the blog post/ the branch is now working (and a colab notebook): http://www.mousemotorlab.org/deeplabcutblog/2020/11/23/rolling-up-to-tensorflow-2
In general, deeplabcutcore will be the package to use for TF2 support from now on; you can indeed have deeplabcut and deeplabcutcore in the same env, check out the colab on how to easily imort deeplabcutcore in the workflow; it's all the same functionality as "normal" dlc (and then deeplabcut is the GUIs, etc).
https://github.com/DeepLabCut/DeepLabCut-core/blob/tf2.2alpha/Colab_TrainNetwork_VideoAnalysis_TF2.ipynb
I think then I can close this issue, since it can support 3090 training now (woo hoo)

Thank you very much for your message and for your work/support of DLC as well! That is Awesome news!

@MMathisLab I met the same problem. my question is do I really need to link the GOOGLE-DRIVE firstly , following the "create a training dataset" in COLAB?
my labeled data is quite huge and somehow the uploading is quit slow.
I'm wondering what makes the COLAB neccessary to create and train the dataset if we already get a local GPU.
Thanks for your job. it do helps a lot.

DuanWei-fudan · 2020-12-25T07:16:47Z

When I used the GPU to train my network, the software stopped at the _start training..._I saw the GPU is working, but the iteration: 10 loss: 0.2167 lr: 0.005 disappeared. It shunted down after a well.
When I changed GPU to CPU, it worked normally. How can I solve it? I can't understand it.

MMathisLab · 2021-01-01T20:30:50Z

@SabriQ you can use the branch of dlc-core on your own machine, but see how to install it from the top of the colab notebook.

runninghsus · 2021-01-21T22:30:42Z

Hi @MMathisLab

I just want to comment on making this work.
First, for tensorflow to be built with the new NVIDIA RTX 3090, this reddit post contains a step-by-step tutorial:
https://www.reddit.com/r/tensorflow/comments/jsalkw/rtx_3090_and_tensorflow_for_windows_10_step_by/

Second, I used the easy-install for DLC-GPU for windows.

Finally, I installed
tf2: https://pypi.org/project/tf-nightly-gpu/2.4.0.dev20201019/
DeepLabCutCore: pip install deeplabcutcore

and basically just used import deeplabcutcore as deeplabcut for training steps.

Hope this helps!

DuanWei-fudan · 2021-02-27T12:56:42Z

@runninghsus hello，when I pip install deeplabcutcore,there are some mistakes:
_WARNING: Discarding https://files.pythonhosted.org/packages/26/04/8b381d5b166508cc258632b225adbafec49bbe69aa9a4fa1f1b461428313/matplotlib-3.0.3.tar.gz#sha256=e1d33589e32f482d0a7d1957bf473d43341115d40d33f578dad44432e47df7b7 (from https://pypi.org/simple/matplotlib/) (requires-python:>=3.5). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Collecting deeplabcutcore
Using cached deeplabcutcore-0.0b2-py3-none-any.whl (172 kB)
Using cached deeplabcutcore-0.0b1-py3-none-any.whl (171 kB)
ERROR: Cannot install deeplabcutcore==0.0b1, deeplabcutcore==0.0b2 and deeplabcutcore==0.0b3 because these package versions have conflicting dependencies.

The conflict is caused by:
deeplabcutcore 0.0b3 depends on matplotlib==3.0.3
deeplabcutcore 0.0b2 depends on matplotlib==3.0.3
deeplabcutcore 0.0b1 depends on matplotlib==3.0.3

To fix this you could try to:

loosen the range of package versions you've specified
remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies_

runninghsus · 2021-02-28T01:15:50Z

@DuanWei-fudan
Your issue does not seem familiar to me, could it be that your tf version is wrong? Check that by doing the following in ipython

import tensorflow as tf
tf.__version__

Regardless, you will have to use the alpha version, it's a different version that the one on pypi.
I also realized they merged the nightly-build into 2.4.1
So the steps I just had success last night, assuming you have 3090 and the proper driver installed following that reddit link I posted, was

upon doing the easy-install and activating DLC-GPU

pip install tensorflow==2.4.1
pip install git+https://github.com/DeepLabCut/DeepLabCut-core.git@tf2.2alpha
pip install tf_slim

and make sure when you run command lines you do
import deeplabcutcore as deeplabcut
with potentially necessary tf import:
import tensorflow as tf

all the steps that does not use GUI (labeling images, etc.) should work fine with deeplabcutcore (deeplabcut.create_training_dataset(), deeplabcut.train_network(), deeplabcut.analyze_videos(), deeplabcut.create_labeled_videos()) that is, the regular deeplabcut (tf==1) can be used to label images, the training, analyses and labeled videos can be done with deeplabcutcore (tf==2)

if you need help, I may write a blog post specifically on this. I'll keep this post updated with the blog link

DuanWei-fudan · 2021-02-28T06:46:49Z

@runninghsus
Well,thank you very much.I can train my network with CPU normally!!!!
So how can I use my GPU?

DuanWei-fudan · 2021-02-28T07:26:57Z

OMG!
I did it!
I can use my GPU right now!

xtzhou25 · 2021-03-04T10:19:48Z

@DuanWei-fudan
Hi, I met the same error when I pip install deeplabcutcore, so I tried pip install git+https://github.com/DeepLabCut/DeepLabCut-core.git@tf2.2alpha but it showed a similar error about matplotlib==3.0.3. Then I tried pip install matplotlib==3.0.3, but it became more confused because the same error occurred.
So I wonder what did you do to fix this problem. Thanks!

MMathisLab · 2021-03-16T12:42:10Z

@xtzhou25 be sure you use python 3.7, make a new conda env and pip install deeplabcutcore + pip install tf_slim as this was updated yesterday as well.

xtzhou25 · 2021-03-16T12:54:21Z

@xtzhou25 be sure you use python 3.7, make a new conda env and pip install deeplabcutcore + pip install tf_slim as this was updated yesterday as well.

Thanks! That works!
Python=3.7; tensorflow=2.4.1; tf-slim=1.1.0; wxpython=4.0.4 (and always set English as system language on Windows). Now I can train network by import deeplabcutcore and do the rest of works by python -m deeplabcut.

MaloM-CVision · 2021-03-19T13:26:49Z

Hi, I can't find a way to get the proper config. With tensorflow=2.4.1, Python=3.7.10, and deeplabcutcore i still have conflict with numpy version (tensorflow needs 1.19 and dlc needs 1.16). Can someone share his environment.yml, so that i can copy his environment that works ?
Thanks in advance

MMathisLab · 2021-03-19T14:21:31Z

you need to run tensorflow==2.4 to install it. but I would recommend 2.2

xtzhou25 · 2021-03-20T09:16:30Z

@MaloM-CVision
Hi, I double checked my env:
python==3.7.10 tensorflow==2.4.1 numpy==1.16.4
I successfully installed Deeplabcut==2.1.10.2 and Deeplabcutcore in this env.
But I also met some compatibility problem when using them. Most of the errors were about tensorflow version. I just googled the errors one by one and modified the documentx. It won't take too long. Now I still have some warnings occasionally, but it works on my 3090 now.

MMathisLab · 2021-03-20T09:59:58Z

@xtzhou25 indeed best not to have deeplabcut and deeplabcut core in the same environment! Just core with TF2. If you need guis, then just use the dlc-cpu conda file in a separate environment. You can open the project in both! :)

MaloM-CVision · 2021-03-22T11:05:36Z

thanks a lot @MMathisLab , i finally get my env running :)

NejcKejzar · 2021-03-22T20:30:59Z

Hi @MMathisLab and others! I've been pouring over this thread and still can't get my RTX 3070 GPU to work with deeplabcutcore.

Summary:
Specs: Ubuntu 20.04 LTS + NVIDIA GeForce RTX 3070

First, I made sure to remove all NVIDIA and CUDA files from the PC (there's been a lot of tinkering to try to make things work):

> sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*" 
> sudo apt-get --purge remove "*nvidia*"
> sudo apt-get autoremove
> sudo apt-get autoclean

Next, I installed the latest NVIDIA drivers: sudo apt-get install nvidia-driver-460 (exact version as verified with nvidia-smi: 460.32.03)
Next, I installed the latest CUDA 11.2 with the local .deb file, by following instructions here. After installation was successful, I added the following two lines to my ~/.bashrc file:

export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

I verified the installation with nvcc -V (returned Cuda compilation tools, release 11.2, V11.2.152 Build cuda_11.2.r11.2/compiler.29618528_0) as well as verifying installation by compiling examples. Running deviceQuery recognized the correct GPU and returned a test=PASS status.

Next, I installed the latest cuDNN8.1.1 (compatible with CUDA 11.2), by following installing from .tar instructions here. I noticed, that the instructions suggest copying the cuDNN files to /usr/local/cuda; I changed this to /usr/local/cuda-11.2.
After all this, I rebooted the PC and created a new conda environment with Python 3.7 as suggested here:

> conda create -n test4 python=3.7
> conda activate test4
> pip install tensorflow==2.4
> pip install deeplabcutcore
> pip install tf_slim

I verified, that the TF recognizes the GPU:

> import tensorflow as tf
> tf.__version__
'2.4.0'
> tf.config.experimental.list_physical_devices()

This last recognized the GPU, but did not find a certain library 'libcusolver.so.10':

This has been a known issue here. By hard-linking the missing libcusolver.so.10 with the installed libcusolver.so.11, this issue was solved and the GPU was successfully detected:

> cd $LD_LIBRARY_PATH
> sudo ln libcusolver.so.11 libcusolver.so.10  # hard link

Lastly, I got to my code, which I run from a Jupyter notebook. I am batch analyzing videos by calling:

import deeplabcutcore as dlc
dlc.analyze_videos(NN_config_path,
                   videos=[video_dir],
                   videotype='mp4',
                   shuffle=shuffle,
                   gputouse=0,
                   dynamic=(True, 0.5, 10),
                   destfolder=analysis_dir)

Running this engages the GPU as seen from nvidia-smi:

But returns quite a massive error log:

Looking at the terminal output reveals a more compact error:

This is where I don't know how to proceed. I double-checked my $PATH and $LD_LIBRARY_PATH variables to see they point to the right directories:

> echo $PATH
/usr/local/cuda-11.2/bin:/home/nejc_pc2/anaconda3/envs/test4/bin:/home/nejc_pc2/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

> echo $LD_LIBRARY_PATH
/usr/local/cuda-11.2/lib64

Checking the $LD_LIBRARY_PATH directory shows that the latest cublas libraries are there:

So, I am not sure why cublas would be giving this error. Have I missed anything in the above steps? I apologize for the long post, but hopefully, this will also help other Ubuntu users getting DLC to work with 3000-series GPUs.

UPDATE [SOLVED]:
Finally solved this issue! Allowing memory growth in the same Jupyter notebook cell from which dlc.analyze_videos is run, did the trick:

import deeplabcutcore as dlc
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
dlc.analyze_videos(NN_config_path,
                   videos=[video_dir],
                   videotype='mp4',
                   shuffle=shuffle,
                   gputouse=0,
                   dynamic=(True, 0.5, 10),
                   destfolder=analysis_dir)

Hope this helps somebody else as well!

NejcKejzar · 2021-04-12T15:31:00Z

UPDATE: I've managed to start analyzing videos by relying on conda's installation of tensorflow-gpu, which also automatically installs compatible cuda-toolkit and cudnn versions within the anaconda environment. Specifically:

> conda create -n test6 python=3.7
> conda activate test6
> conda install tensorflow-gpu
> pip install deeplabcutcore
> pip install tf_slim

In this installation the GPU is again recognized as checked with:

> import tensorflow as tf
> tf.__version__
'2.4.1'
> tf.config.experimental.list_physical_devices()
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

When I run dlc.analyze_videos as above, the above error does not arise - it seems DLC is frozen for some time (several minutes) first on Initializing ResNet and then on Starting to extract posture.... After a while, the videos start getting analyzed and all seems good - nvidia-smi shows the GPU is engaged (the majority of memory is allocated to the conda environment from which I am running the code as well as the power spikes to >100W/220W; rest power is around 15W/220W), and the analysis rapidly progresses (>130 it/s). However, when I open the generated .h5 files, they seem completely off:

This is what the actual result is:

I also notice a strange output includingmetadata.pickle which is not there in regular DLC2.1.9, but I guess this could be a feature of deeplabcutcore:

Lastly, I checked the installed cudatoolkit and cudnn in the conda environment:

> conda list cudnn
# packages in environment at /home/nejc_pc2/anaconda3/envs/test12:
#
# Name                    Version                   Build  Channel
cudnn                     7.6.5                cuda10.1_0

Now, this I found very strange, because TF2.4.1 should only work with latest cuda toolkit and cudnn. Just to double check, I also looked at the tensorflow installation in the conda environment:

> conda list tensorflow
# packages in environment at /home/nejc_pc2/anaconda3/envs/test12:
#
# Name                    Version                   Build  Channel
tensorflow                2.4.1           gpu_py37ha2e99fa_0  
tensorflow-base           2.4.1           gpu_py37h29c2da4_0  
tensorflow-estimator      2.4.1              pyheb71bc4_0  
tensorflow-gpu            2.4.1                h30adc30_0

So it seems to me that anaconda automatically installs the incorrect version of cuda toolkit and cudnn. I guess that is why people suggest installing cudnn and cuda toolkit manually and tensorflow with pip.

rlinus · 2021-04-16T14:55:58Z

Hi

I found a way to run the DeepLabCut version based on Tensorflow 1.x on my RTX3090. A docker container based on a tensorflow docker image from nvidia that comes with tensorflow 1.5 compiled with CUDA 11. You can test it with the following dockerfile that I made:

##########################################
# Dockerfile for DeepLabCut GPU training #
##########################################

#We use Tensorflow v1.5.5 for deeplabcut
FROM nvcr.io/nvidia/tensorflow:21.03-tf1-py3

# install needed tools
RUN apt-get update && apt-get install -y wget ffmpeg

# install deeplabcut
RUN python3 -m pip install deeplabcut --no-cache-dir

# download git repo
RUN git clone https://github.com/AlexEMG/DeepLabCut /root/DeepLabCut/

WORKDIR /root

#Instructions:
#cd in folder with this file and then build the docker image with
#   docker build -t dlc .
#then run the image and open interactive shell with:
#   docker run --gpus all -it --rm dlc /bin/bash
#run testscript
#   python3 /root/DeepLabCut/examples/testscript.py

ARHassett · 2021-05-13T08:44:51Z

OS: Win 10
DeepLabCut Version: 2.2b8
Anaconda env used: DLC-GPU (cloned from Alex's github)
WxPython version: 4.0.7.post2
Tensorflow version: many, installed with pip (see below)
Cuda version: 10 and 11

Hi everyone,

First of all, I wanted to thank all the authors for this amazing software!

I'm starting to work with DeepLabCut and after a few promising preliminary results with an "old" GPU (Turing architecture), we decided to upgrade to the recent Ampere architecture. Since it is also backwards compatible with old CUDA versions, we thought that it would be fine. However, after trying many combinations of Tensorflow and CUDA, I cannot make it to work. Here are the combinations I have tried so far:

Cuda | Tensorflow | Cudnn | Works?

10 | 1.15.2 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
10 | 1.15.0 | 7.6.5 | Same as with tf 1.15.2
10 | 1.14.0 | 7.6.5 | Does not detect GPU
11 | 1.15.0 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
11 | 1.13.1 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
11 | 1.14.0 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
11 | 1.15.4 | 7.6.5 | Does not detect GPU
11 | 1.15.2 | 7.6.5 | Recognizes GPU and run some tf test, but takes to long and ended up failing
*Tensorflow 1.13.1 does not detect the GPU either.

Using the combinations mentioned above that recognizes the GPU, and can print "Hello, Tensorflow", I ended up stuck at this screen (see code below).

I know that in the documentation says that CUDA 10.+ is not supported, but with the old card we had, it was running fine with CUDA 11. I have very limited knowledge about this, so not sure why/how it worked.

Reading in CUDA documentation it says that Ampere architecture is compatible with CUDA 10.2 or earlier. Also, according to Tensorflow documentation, Tensorflow 1.15 should be compatible with ampere. The only caveat is that it takes too long to start (up to 30 min) but that can be fixed by increasing the cuda cache size.

So, to me, the only thing left that could be giving issues is Cudnn. According to Nvidia, support for Ampere only appeared in Cudnn 8. However, as far as I know, Anaconda only supports up to Cudnn 7.6.5 on Windows. Apparently it has reached Cudnn 8 on Linux.

Code output
Upon reading in some forums, some people have been succesful using Symlink in other applications, so I tried that with Cudnn64_7.dll and hardlinked to Cudnn64_8.dll inside DLC-GPU enviroment, but I have not been able to make it work. It shows an error saying that compute capabilities does not match.

Do you have any suggestion that I might try?

Many thanks in advance.

It appears here that you are running multi-animal DLC using deeplabcutcore, is this correct? I'm having issues running any maDLC related functions at the moment with it.

mschart · 2021-05-19T09:25:30Z

rlinus' solution worked well for me on Ubuntu 20.04. See here for some more details.

MMathisLab · 2021-05-19T11:23:41Z

Thanks @mschart ! BTW, we have new DeepLabCut dockers here: https://github.com/stes/deeplabcut-docker so perhaps those we be most useful for the IBL workflow too.

rlinus · 2021-05-20T14:05:49Z

Thanks @mschart ! BTW, we have new DeepLabCut dockers here: https://github.com/stes/deeplabcut-docker so perhaps those we be most useful for the IBL workflow too.

Those Docker images are based on the official Google Tensorflow 1.15 builds, that do not work with RTX 30xx GPUs (because of noncompatible CUDA versions). The dockerfile that I posted works with RTX 30xx GPUs.

patakihara · 2021-06-04T12:57:46Z

Is there a guide available on how get this set up? I'm also trying to use deeplabcut with an RTX 3090. I've got CUDA 11.1 on WIndows. I made a new conda environment, with python 3.7, and installed deeplabcutcore and tf-nightly-gpu.

When I go to import deeplabcutcore, it says: No module named 'tensorflow.contrib'. This seems like it wants TF1?

Don't know if you managed to set this up or not, but after I figured it out I wrote up a short guide on how to run DeepLabCut on an RTX 3090. Here's the link: https://hackmd.io/@guilhermepata/r1U__n89O

ARHassett · 2021-06-04T13:01:09Z

This is really nice. Any luck getting multi-animal DLC working on it?

…

On Fri, Jun 4, 2021 at 1:58 PM guilhermepata ***@***.***> wrote: Is there a guide available on how get this set up? I'm also trying to use deeplabcut with an RTX 3090. I've got CUDA 11.1 on WIndows. I made a new conda environment, with python 3.7, and installed deeplabcutcore and tf-nightly-gpu. When I go to import deeplabcutcore, it says: No module named 'tensorflow.contrib'. This seems like it wants TF1? Don't know if you managed to set this up or not, but after I figured it out I wrote up a short guide on how to run DeepLabCut on an RTX 3090. Here's the link: ***@***.***/r1U__n89O — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#944 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ARWMSWKSQWFAUGBMVSBYHF3TRDEV3ANCNFSM4SE35PPA> .

-- Amy Hassett

Dashbrook · 2021-06-07T17:30:16Z

Is there a guide available on how get this set up? I'm also trying to use deeplabcut with an RTX 3090. I've got CUDA 11.1 on WIndows. I made a new conda environment, with python 3.7, and installed deeplabcutcore and tf-nightly-gpu.
When I go to import deeplabcutcore, it says: No module named 'tensorflow.contrib'. This seems like it wants TF1?

Don't know if you managed to set this up or not, but after I figured it out I wrote up a short guide on how to run DeepLabCut on an RTX 3090. Here's the link: https://hackmd.io/@guilhermepata/r1U__n89O

I just want to say that Solution 2 here worked beautifully for me! Thank you!

patakihara · 2021-06-08T11:14:58Z

This is really nice. Any luck getting multi-animal DLC working on it?
-- Amy Hassett

I haven't tested multi-animal 😕

MMathisLab · 2021-06-08T12:09:50Z

maDLC is not yet supported in DLC core; it will be supported soon though, in this repo.

ARHassett · 2021-06-08T13:51:02Z

Fantastic! I’ll wait for it so!

…

On Jun 8, 2021 at 1:10 p.m., <Mackenzie Mathis ***@***.***)> wrote: maDLC is not yet supported in DLC core; it will be supported soon though, in this repo. — You are receiving this because you commented. Reply to this email directly, view it on GitHub (#944 (comment)), or unsubscribe (https://github.com/notifications/unsubscribe-auth/ARWMSWNOAIISLEGFTVJB4E3TRYCB5ANCNFSM4SE35PPA).

AlexEMG · 2022-01-04T06:00:52Z

Just fyi - It's now supported as TF 2.* has been integrated in the main repo

MMathisLab assigned AlexEMG Oct 16, 2020

MMathisLab closed this as completed Oct 19, 2020

MMathisLab reopened this Oct 20, 2020

MMathisLab added backwards compatibility issues concerning prior to current versions tensorflow/training WORK IN PROGRESS! developers are currently working on this feature... stay tuned. labels Oct 20, 2020

MMathisLab added this to Deeplabcutcore in Unbundling DLC gui and DLCcore Dec 5, 2020

MMathisLab mentioned this issue Dec 6, 2020

InvalidArgumentError: LossTensor is inf or nan : Tensor had NaN values #1017

Closed

DuanWei-fudan mentioned this issue Dec 8, 2020

cannot import name 'GceClusterResolver' #1024

Closed

MMathisLab removed the WORK IN PROGRESS! developers are currently working on this feature... stay tuned. label Jan 1, 2021

MMathisLab assigned MMathisLab and unassigned AlexEMG Jan 6, 2021

PlatinumYao mentioned this issue Jan 12, 2021

Compatibility with RTX 3080 DeepLabCut/DeepLabCut-core#13

Open

Marti-R mentioned this issue Feb 20, 2021

Very slow training with RTX 3090 #1119

Closed

MMathisLab mentioned this issue Mar 13, 2021

All predictions place in top left-hand corner [ RTX 3*** does NOT work with TensorFlow 1.x! == odd errors! Please use deeplabcutcore ] #1142

Closed

MMathisLab closed this as completed Mar 16, 2021

Cannot get RTX 3090 card to start training #944

Cannot get RTX 3090 card to start training #944

Comments

cfernandezpa commented Oct 5, 2020

MMathisLab commented Oct 11, 2020

cfernandezpa commented Oct 12, 2020

MMathisLab commented Oct 16, 2020

cfernandezpa commented Oct 16, 2020

cfernandezpa commented Oct 17, 2020

cfernandezpa commented Oct 17, 2020

cfernandezpa commented Oct 19, 2020

MMathisLab commented Oct 20, 2020 • edited

cfernandezpa commented Oct 31, 2020

MMathisLab commented Dec 5, 2020 • edited

Gittinator commented Dec 10, 2020

cfernandezpa commented Dec 10, 2020

cfernandezpa commented Dec 10, 2020

DuanWei-fudan commented Dec 19, 2020

SabriQ commented Dec 25, 2020

DuanWei-fudan commented Dec 25, 2020

MMathisLab commented Jan 1, 2021

runninghsus commented Jan 21, 2021

DuanWei-fudan commented Feb 27, 2021

runninghsus commented Feb 28, 2021 • edited

DuanWei-fudan commented Feb 28, 2021

DuanWei-fudan commented Feb 28, 2021

xtzhou25 commented Mar 4, 2021 • edited

MMathisLab commented Mar 16, 2021

xtzhou25 commented Mar 16, 2021

MaloM-CVision commented Mar 19, 2021 • edited

MMathisLab commented Mar 19, 2021

xtzhou25 commented Mar 20, 2021

MMathisLab commented Mar 20, 2021

MaloM-CVision commented Mar 22, 2021

NejcKejzar commented Mar 22, 2021 • edited

NejcKejzar commented Apr 12, 2021 • edited

rlinus commented Apr 16, 2021

ARHassett commented May 13, 2021

mschart commented May 19, 2021

MMathisLab commented May 19, 2021

rlinus commented May 20, 2021

patakihara commented Jun 4, 2021

ARHassett commented Jun 4, 2021 via email

Dashbrook commented Jun 7, 2021

patakihara commented Jun 8, 2021

MMathisLab commented Jun 8, 2021

ARHassett commented Jun 8, 2021 via email

AlexEMG commented Jan 4, 2022

MMathisLab commented Oct 20, 2020 •

edited

MMathisLab commented Dec 5, 2020 •

edited

runninghsus commented Feb 28, 2021 •

edited

xtzhou25 commented Mar 4, 2021 •

edited

MaloM-CVision commented Mar 19, 2021 •

edited

NejcKejzar commented Mar 22, 2021 •

edited

NejcKejzar commented Apr 12, 2021 •

edited