Cannot get RTX 3090 card to start training #944
Comments
We just got a 3090 in the lab this week, so we can test it. But in general, what I would suggest is always running our test scripts as a first pass after installation: https://www.youtube.com/watch?v=IOWtKn3l33s https://github.com/DeepLabCut/DeepLabCut/tree/master/examples
Thank you very much for the reply. That is great news; hopefully you will be able to make it work! I won't be able to try the test scripts this week, but I'll do it next week for sure and report back. Thanks again!
Sorry we haven't gotten to this yet, but you might try our dev branch with TF 2.x --> https://github.com/DeepLabCut/DeepLabCut-core/tree/tf2.2alpha
Hi, I have tried the test script with version 2.2b8 and it stops at the same point as when I tried with my own data set. Thanks!
Hi, I am currently trying the dev branch and I was able to start training. However, it was far from ideal. First, I learned that TF 2.2 does not work with CUDA 11 (let alone 11.1), so it won't recognize the GPU. So, I had to install CUDA 10.1, which is supposed to be the version that works with TF 2.2. That change made the system recognize the GPU. Then, training took a long time to start, but it engaged the GPU as seen in Task Manager. A warning message was shown about PTX compilation being done by the driver (I cannot find the original message in the training log), after which training started, but it was very slow. Also, the reduction of the "loss" value after each iteration seems smaller than I remembered, but I have no objective way to confirm this. In any case, I was able to train for 10000 iterations, which is good progress. I think one possible solution is to compile TF 2.2 or 2.3 with CUDA 11.1 from source, but I don't know how to do that on Windows. I found an article on how to do it for Linux (https://towardsdatascience.com/how-to-compile-tensorflow-2-3-with-cuda-11-1-8cbecffcb8d3). Could you please advise on this matter? If I find anything else, I'll post it here. Thanks!
Little update: I noticed that I had "gputouse=0", so I changed it to 1; training started much faster and is now running about 100X faster. I'll keep you posted with any advances I make.
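For anyone puzzled by the jump above: gputouse is simply the index of the GPU that DLC should run on, and the same selection can be checked at the shell level via CUDA_VISIBLE_DEVICES before Python ever starts. A minimal sketch (the index 1 matches this poster's second card; adjust to yours):

```shell
# Expose only GPU index 1 to CUDA applications launched from this shell;
# passing gputouse=1 to DLC's training call has the equivalent effect.
export CUDA_VISIBLE_DEVICES=1
# Sanity-check that the variable reaches the Python process.
python -c 'import os; print(os.environ.get("CUDA_VISIBLE_DEVICES"))'
```

nvidia-smi shows which physical index maps to which card, so you can confirm which device the index refers to before a long training run.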
Hi, I noticed you closed this issue, which is fair since I was able to train using DeepLabCutCore. However, I'm not sure about the validity of the training results, as I'm unable to evaluate with either this version or the GUI in version 2.2b8; there is a KeyError after evaluation starts. Also, the available options in the Core version are limited, as you know. So, my question is: should another issue be opened to tackle DLC compatibility with RTX 3000-series cards? I'm willing to help as far as my skills allow. Thanks!
It's a good point; I'll reopen until it's really resolved. For now, people can hopefully also find the TF 2.x branch!
Correct - the branch is only up to date with 2.1.8.1! :) So when we roll up to 2.2x for TF, that will work again.
Hi, I have been testing some more and I have made some progress. I can confirm that training works well with the following system settings: Deeplabcutcore. I had DeepLabCut and TF installed in a plain Python environment (not Anaconda) and I was able to train, evaluate, analyze and create a video. I encountered an issue where the video analysis was running very slowly, which makes me think that the GPU was not fully engaged in this part. Hopefully the full version, including the GUI, will be available soon. Thanks!
Hi @cfernandezpa please also check out the blog post; the branch is now working (and a Colab notebook): http://www.mousemotorlab.org/deeplabcutblog/2020/11/23/rolling-up-to-tensorflow-2 In general, I think I can then close this issue, since 3090 training is now supported (woo hoo).
Is there a guide available on how to get this set up? I'm also trying to use DeepLabCut with an RTX 3090. I've got CUDA 11.1 on Windows. I made a new conda environment with Python 3.7, and installed deeplabcutcore and tf-nightly-gpu. When I go to import deeplabcutcore, it says: No module named 'tensorflow.contrib'. This seems like it wants TF1?
Thank you very much for your message and for your work/support of DLC as well! That is awesome news!
Hello, I was not able to make it work under a conda environment, so I made a plain Python environment. The difference is that under conda, cuDNN is limited to 7.6 (I think), while with the other one you can update to 8. I believe that was the source of my problem. Try that and see if it solves your issue.
@Gittinator
@MMathisLab I met the same problem. My question is: do I really need to link Google Drive first, following the "create a training dataset" step in Colab?
When I used the GPU to train my network, the software stopped at "start training...". I saw the GPU working, but the "iteration: 10 loss: 0.2167 lr: 0.005" output disappeared. It shut down after a while.
@SabriQ you can use the dlc-core branch on your own machine, but see how to install it at the top of the Colab notebook.
Hi @MMathisLab I just want to comment on making this work. Second, I used the easy-install for DLC-GPU for Windows. Finally, I installed it and basically just used import deeplabcutcore as deeplabcut for the training steps. Hope this helps!
@runninghsus hello, when I pip install deeplabcutcore, there are some errors: The conflict is caused by: To fix this you could try to:
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
@DuanWei-fudan
Regardless, you will have to use the alpha version; it's a different version than the one on PyPI. Upon doing the easy-install and activating DLC-GPU,
make sure that when you run command lines, you do all the steps that do not use a GUI there (labeling images, etc.); those should work fine with deeplabcutcore. If you need help, I may write a blog post specifically on this. I'll keep this post updated with the blog link.
@runninghsus
OMG!
@DuanWei-fudan
@xtzhou25 be sure you use Python 3.7, make a new conda env, and
Thanks! That works! |
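To collect the advice from this exchange in one place: the working recipes reported in this thread use a fresh Python 3.7 environment with deeplabcutcore and a CUDA 11-capable TF build. The commands below are only my illustration of that recipe (package names are from this thread; the exact combination is an assumption), not a command anyone above posted verbatim:

```shell
# Illustrative environment setup, assuming an RTX 30xx driver with CUDA 11.x.
conda create -n dlc-core python=3.7 -y
conda activate dlc-core
# deeplabcutcore plus a TF build that supports CUDA 11 (per this thread).
pip install deeplabcutcore tf-nightly-gpu
```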
Hi, I can't find a way to get a proper config. With tensorflow=2.4.1, Python=3.7.10, and deeplabcutcore, I still have a conflict with the numpy version (tensorflow needs 1.19 and DLC needs 1.16). Can someone share his
you need to run
@MaloM-CVision
@xtzhou25 indeed, best not to have deeplabcut and deeplabcutcore in the same environment! Just core with TF2. If you need GUIs, then just use the DLC-CPU conda file in a separate environment. You can open the project in both! :)
Thanks a lot @MMathisLab, I finally got my env running :)
Hi @MMathisLab and others! I've been poring over this thread and still can't get my RTX 3070 GPU to work with it. Summary:
I verified the installation with
This last step recognized the GPU, but did not find a certain library. This has been a known issue here. By hard-linking the missing
Running this engages the GPU as seen from But returns quite a massive error log: Looking at the terminal output reveals a more compact error: This is where I don't know how to proceed. I double-checked my
Checking the So, I am not sure why cublas would be giving this error. Have I missed anything in the above steps? I apologize for the long post, but hopefully this will also help other Ubuntu users getting DLC to work with 3000-series GPUs. UPDATE [SOLVED]:
Hope this helps somebody else as well!
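On the hard-linking fix mentioned in the post above: prebuilt TF wheels sometimes dlopen an older soname (for example libcusolver.so.10) than the one a CUDA 11 install provides (libcusolver.so.11), and an alias with the expected name satisfies the loader. The CUDA paths in the comment below are examples only; the runnable part demonstrates the same idea safely in a throwaway directory:

```shell
# Real-world form (example paths; adjust to your CUDA install, needs sudo):
#   sudo ln -s /usr/local/cuda-11.1/lib64/libcusolver.so.11 \
#              /usr/local/cuda-11.1/lib64/libcusolver.so.10
# Safe demonstration of the same aliasing in a temporary directory:
tmp=$(mktemp -d)
touch "$tmp/libcusolver.so.11"                           # stand-in for the real library
ln -s "$tmp/libcusolver.so.11" "$tmp/libcusolver.so.10"  # alias under the old soname
[ -e "$tmp/libcusolver.so.10" ] && echo "alias resolves"
```

Whether a hard link or a symlink is used matters little here; what the loader needs is a file with the soname it was compiled to expect.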
Hi, I found a way to run the DeepLabCut version based on TensorFlow 1.x on my RTX 3090: a Docker container based on a TensorFlow Docker image from NVIDIA that comes with TensorFlow 1.15 compiled with CUDA 11. You can test it with the following dockerfile that I made:
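The dockerfile contents did not survive in this copy of the thread, so here is only a rough illustration of the approach described (emphatically not the author's file): NVIDIA's NGC registry publishes TF1 images built against CUDA 11, onto which DLC can be layered. The image tag and pip line are my assumptions:

```dockerfile
# Illustrative sketch only - NOT the original poster's dockerfile.
# NGC tag is an example; NGC TF1 images from late 2020 ship TF 1.15 + CUDA 11.
FROM nvcr.io/nvidia/tensorflow:20.12-tf1-py3
# Layer DeepLabCut on top of NVIDIA's TensorFlow build.
RUN pip install deeplabcut
```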
It appears here that you are running multi-animal DLC using deeplabcutcore, is this correct? I'm having issues running any maDLC-related functions with it at the moment.
rlinus' solution worked well for me on Ubuntu 20.04. See here for some more details.
Thanks @mschart ! BTW, we have new DeepLabCut Dockers here: https://github.com/stes/deeplabcut-docker so perhaps those will be most useful for the IBL workflow too.
Those Docker images are based on the official Google TensorFlow 1.15 builds, which do not work with RTX 30xx GPUs (because of incompatible CUDA versions). The dockerfile that I posted works with RTX 30xx GPUs.
Don't know if you managed to set this up or not, but after I figured it out, I wrote up a short guide on how to run DeepLabCut on an RTX 3090. Here's the link: https://hackmd.io/@guilhermepata/r1U__n89O
This is really nice. Any luck getting multi-animal DLC working on it?
I just want to say that Solution 2 here worked beautifully for me! Thank you!
I haven't tested multi-animal 😕
maDLC is not yet supported in DLC core; it will be supported soon though, in this repo.
Fantastic! I’ll wait for it so!
Just FYI - it's now supported, as TF 2.* has been integrated into the main repo
OS: Win 10
DeepLabCut Version: 2.2b8
Anaconda env used: DLC-GPU (cloned from Alex's github)
WxPython version: 4.0.7.post2
Tensorflow version: many, installed with pip (see below)
Cuda version: 10 and 11
Hi everyone,
First of all, I wanted to thank all the authors for this amazing software!
I'm starting to work with DeepLabCut, and after a few promising preliminary results with an "old" GPU (Turing architecture), we decided to upgrade to the recent Ampere architecture. Since it is also backwards compatible with old CUDA versions, we thought that it would be fine. However, after trying many combinations of TensorFlow and CUDA, I cannot make it work. Here are the combinations I have tried so far:
Cuda | Tensorflow | Cudnn | Works?
10 | 1.15.2 | 7.6.5 | Recognizes GPU and runs some TF tests, but takes too long and ends up failing
10 | 1.15.0 | 7.6.5 | Same as with TF 1.15.2
10 | 1.14.0 | 7.6.5 | Does not detect GPU
11 | 1.15.0 | 7.6.5 | Recognizes GPU and runs some TF tests, but takes too long and ends up failing
11 | 1.13.1 | 7.6.5 | Recognizes GPU and runs some TF tests, but takes too long and ends up failing
11 | 1.14.0 | 7.6.5 | Recognizes GPU and runs some TF tests, but takes too long and ends up failing
11 | 1.15.4 | 7.6.5 | Does not detect GPU
11 | 1.15.2 | 7.6.5 | Recognizes GPU and runs some TF tests, but takes too long and ends up failing
*Tensorflow 1.13.1 does not detect the GPU either.
Using the combinations mentioned above that recognize the GPU and can print "Hello, TensorFlow", I end up stuck at the screen shown in the code output below.
I know the documentation says that CUDA 10.+ is not supported, but with the old card we had, it was running fine with CUDA 11. I have very limited knowledge about this, so I'm not sure why/how it worked.
Reading the CUDA documentation, it says that the Ampere architecture is compatible with CUDA 10.2 or earlier. Also, according to the TensorFlow documentation, TensorFlow 1.15 should be compatible with Ampere. The only caveat is that it takes too long to start (up to 30 min), but that can be fixed by increasing the CUDA cache size.
So, to me, the only thing left that could be causing issues is cuDNN. According to NVIDIA, support for Ampere only appeared in cuDNN 8. However, as far as I know, Anaconda only supports up to cuDNN 7.6.5 on Windows. Apparently it has reached cuDNN 8 on Linux.
Code output
Selecting multi-animal trainer
Config:
{'all_joints': [[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9],
[10],
[11],
[12]],
'all_joints_names': ['snout',
'cap',
'leftear',
'rightear',
'spine',
'lforepaw',
'rforepaw',
'lhindpaw',
'rhindpaw',
'tailbase',
'tailend',
'cornerofbox1',
'cornerofbox2'],
'batch_size': 8,
'crop_pad': 0,
'cropratio': 0.4,
'dataset': 'training-datasets\iteration-0\UnaugmentedDataSet_2CamTest9Oct4\2CamTest9_CF95shuffle3.pickle',
'dataset_type': 'multi-animal-imgaug',
'deterministic': False,
'display_iters': 500,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': 'C:\Users\RyC\anaconda3\envs\dlc-gpu\lib\site-packages\deeplabcut\pose_estimation_tensorflow\models\pretrained\resnet_v1_50.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 0.05,
'locref_stdev': 7.2801,
'log_dir': 'log',
'max_input_size': 1500,
'mean_pixel': [123.68, 116.779, 103.939],
'metadataset': 'training-datasets\iteration-0\UnaugmentedDataSet_2CamTest9Oct4\Documentation_data-2CamTest9_95shuffle3.pickle',
'min_input_size': 64,
'mirror': False,
'multi_step': [[0.0001, 7500], [5e-05, 12000], [1e-05, 200000]],
'net_type': 'resnet_50',
'num_joints': 13,
'num_limbs': 55,
'optimizer': 'adam',
'pafwidth': 20,
'pairwise_huber_loss': False,
'pairwise_loss_weight': 0.1,
'pairwise_predict': False,
'partaffinityfield_graph': [[5, 9],
[4, 7],
[1, 3],
[6, 9],
[4, 8],
[5, 6],
[2, 8],
[0, 7],
[8, 9],
[1, 6],
[0, 10],
[3, 7],
[0, 3],
[2, 5],
[2, 4],
[5, 8],
[1, 2],
[4, 9],
[6, 7],
[2, 9],
[3, 10],
[6, 10],
[8, 10],
[1, 5],
[3, 6],
[0, 4],
[1, 10],
[7, 10],
[4, 10],
[2, 6],
[4, 5],
[1, 4],
[2, 10],
[9, 10],
[3, 9],
[0, 5],
[1, 9],
[2, 3],
[0, 8],
[3, 5],
[0, 1],
[2, 7],
[7, 9],
[7, 8],
[5, 10],
[4, 6],
[6, 8],
[5, 7],
[3, 8],
[0, 6],
[1, 8],
[1, 7],
[0, 9],
[3, 4],
[0, 2]],
'partaffinityfield_predict': True,
'pos_dist_thresh': 17,
'project_path': 'C:\Users\RyC\2CamTest9-CF-2020-10-04',
'regularize': False,
'rotation': 25,
'rotratio': 0.4,
'save_iters': 10000,
'scale_jitter_lo': 0.5,
'scale_jitter_up': 1.25,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': 'C:\Users\RyC\2CamTest9-CF-2020-10-04\dlc-models\iteration-0\2CamTest9Oct4-trainset95shuffle3\train\snapshot',
'stride': 8.0,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
Activating limb prediction...
Starting with multi-animal imaug + adam pose-dataset loader.
Batch Size is 8
Getting specs multi-animal-imgaug 55 13
Initializing ResNet
Loading ImageNet-pretrained resnet_50
2020-10-05 10:40:16.943131: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-10-05 10:40:16.946595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:08:00.0
2020-10-05 10:40:16.946675: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-10-05 10:40:16.948226: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-10-05 10:40:16.948570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-10-05 10:40:16.948928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-10-05 10:40:16.949263: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-10-05 10:40:16.949302: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-10-05 10:40:16.949559: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-10-05 10:40:16.949840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-10-05 10:40:17.963045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-05 10:40:17.963140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-10-05 10:40:17.964083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-10-05 10:40:17.964440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22071 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:08:00.0, compute capability: 8.6)
Max_iters overwritten as 3000
Display_iters overwritten as 10
Save_iters overwritten as 50
Training parameters:
{'stride': 8.0, 'weigh_part_predictions': False, 'weigh_negatives': False, 'fg_fraction': 0.25, 'mean_pixel': [123.68, 116.779, 103.939], 'shuffle': True, 'snapshot_prefix': 'C:\Users\RyC\2CamTest9-CF-2020-10-04\dlc-models\iteration-0\2CamTest9Oct4-trainset95shuffle3\train\snapshot', 'log_dir': 'log', 'global_scale': 0.8, 'location_refinement': True, 'locref_stdev': 7.2801, 'locref_loss_weight': 0.05, 'locref_huber_loss': True, 'optimizer': 'adam', 'intermediate_supervision': False, 'intermediate_supervision_layer': 12, 'regularize': False, 'weight_decay': 0.0001, 'crop_pad': 0, 'scoremap_dir': 'test', 'batch_size': 8, 'dataset_type': 'multi-animal-imgaug', 'deterministic': False, 'mirror': False, 'pairwise_huber_loss': False, 'weigh_only_present_joints': False, 'partaffinityfield_predict': True, 'pairwise_predict': True, 'all_joints': [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]], 'all_joints_names': ['snout', 'cap', 'leftear', 'rightear', 'spine', 'lforepaw', 'rforepaw', 'lhindpaw', 'rhindpaw', 'tailbase', 'tailend', 'cornerofbox1', 'cornerofbox2'], 'cropratio': 0.4, 'dataset': 'training-datasets\iteration-0\UnaugmentedDataSet_2CamTest9Oct4\2CamTest9_CF95shuffle3.pickle', 'display_iters': 500, 'init_weights': 'C:\Users\RyC\anaconda3\envs\dlc-gpu\lib\site-packages\deeplabcut\pose_estimation_tensorflow\models\pretrained\resnet_v1_50.ckpt', 'max_input_size': 1500, 'metadataset': 'training-datasets\iteration-0\UnaugmentedDataSet_2CamTest9Oct4\Documentation_data-2CamTest9_95shuffle3.pickle', 'min_input_size': 64, 'multi_step': [[0.0001, 7500], [5e-05, 12000], [1e-05, 200000]], 'net_type': 'resnet_50', 'num_joints': 13, 'num_limbs': 55, 'pafwidth': 20, 'pairwise_loss_weight': 0.1, 'partaffinityfield_graph': [[5, 9], [4, 7], [1, 3], [6, 9], [4, 8], [5, 6], [2, 8], [0, 7], [8, 9], [1, 6], [0, 10], [3, 7], [0, 3], [2, 5], [2, 4], [5, 8], [1, 2], [4, 9], [6, 7], [2, 9], [3, 10], [6, 10], [8, 10], [1, 5], [3, 6], [0, 4], [1, 10], [7, 10], [4, 10], [2, 
6], [4, 5], [1, 4], [2, 10], [9, 10], [3, 9], [0, 5], [1, 9], [2, 3], [0, 8], [3, 5], [0, 1], [2, 7], [7, 9], [7, 8], [5, 10], [4, 6], [6, 8], [5, 7], [3, 8], [0, 6], [1, 8], [1, 7], [0, 9], [3, 4], [0, 2]], 'pos_dist_thresh': 17, 'project_path': 'C:\Users\RyC\2CamTest9-CF-2020-10-04', 'rotation': 25, 'rotratio': 0.4, 'save_iters': 10000, 'scale_jitter_lo': 0.5, 'scale_jitter_up': 1.25}
Starting multi-animal training....
2020-10-05 10:40:27.731872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
Reading some forums, some people have been successful using symlinks in other applications, so I tried that with cudnn64_7.dll, hard-linking it to cudnn64_8.dll inside the DLC-GPU environment, but I have not been able to make it work. It shows an error saying that the compute capabilities do not match.
Do you have any suggestions that I might try?
Many thanks in advance.