No such file or directory #35

rsicak · 2022-02-24T23:19:27Z

Hi, I have the same issue as others in this issue history.
I tried the suggested solution of setting DOWNLOAD_ALL=1 in the Dockerfile, but it does not work for me.
I have yolov4.weights in the right folder, under config/darknet/yolov4_default_weights/.
Any help? Thank you. Robert

hadikoub · 2022-03-02T17:51:47Z

Hello,
What build command did you use to build the Docker image?
Can you try rebuilding the Docker image with the build args specified and without cache, e.g.:

sudo docker build -f docker/Dockerfile -t darknet_yolov4_gpu:1 --build-arg GPU=1 --build-arg DOWNLOAD_ALL=1 --build-arg CUDNN=1 --build-arg CUDNN_HALF=0 --build-arg OPENCV=1 . --no-cache

rsicak · 2022-03-03T12:13:17Z

Hi hadikoub,
I built the Docker image as you suggested (DOWNLOAD_ALL=1, CUDNN_HALF=0 and --no-cache) and it works on an older machine with a GTX 1060 6GB; GPU utilization is almost 100 percent.
When I build the same image with CUDNN_HALF=1 on the machine with an RTX 3080, or on another machine with an A4000, the behavior is strange. After the container starts, only one core is busy for many minutes. After that something is running, but the GPU is not being used (according to nvidia-smi), and for about half an hour the log file contains only this:

CUDA-version: 10000 (11060), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 2

CUDNN_HALF=1

OpenCV version: 3.2.0

0 : compute_capability = 860, cudnn_half = 1, GPU: NVIDIA RTX A4000

layer filters size/strd(dil) input output

After half an hour, the log file contains something like:

v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 16, class_loss = -nan, iou_loss = -nan, total_loss = -nan

v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 17, class_loss = -nan, iou_loss = -nan, total_loss = -nan

v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 5, class_loss = -nan, iou_loss = -nan, total_loss = -nan

It seems there is some problem with modern Nvidia GPUs.
Robert
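
The header lines in the log above already hint at a version mismatch. As a rough sketch, the shell helpers below pull the two numbers out of the `CUDA-version:` line, decode them, and count NaN loss lines. This assumes (the thread does not state it) that the first number is the CUDA runtime the binary was built against, that the number in parentheses is the version supported by the driver, and that both use the major*1000 + minor*10 encoding. The function names are made up for illustration.

```shell
# Print "<runtime> <driver>" from the first "CUDA-version:" line on stdin.
# Assumption: the line looks like "CUDA-version: 10000 (11060), ...".
cuda_versions_from_log() {
  sed -n 's/.*CUDA-version: \([0-9][0-9]*\) (\([0-9][0-9]*\)).*/\1 \2/p' | head -n 1
}

# Decode major*1000 + minor*10 into "major.minor".
decode_cuda_version() {
  echo "$(( $1 / 1000 )).$(( ($1 % 1000) / 10 ))"
}

# Count lines on stdin whose total loss is NaN ("nan" or "-nan").
# grep -c exits non-zero when nothing matches, hence "|| true".
count_nan_losses() {
  grep -cE 'total_loss = -?nan' || true
}
```

For the header shown above, the decoded values would be 10.0 for the first number and 11.6 for the second.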

hadikoub · 2022-03-04T17:28:48Z

Hello Robert,

Glad that the solution is now working.
The other issue you are facing on the newer RTX 3080 and A4000 is not caused by the solution; it is due to Nvidia CUDA support on newer devices.
Currently, the Nvidia RTX 30 series supports CUDA 11.x only and is not backward compatible with older CUDA versions such as the one currently used in the solution (CUDA 10.0).
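
To make the compatibility point concrete, here is a minimal sketch of a lookup from GPU compute capability to the earliest CUDA toolkit release that can target it natively. The table covers only a few architectures and the function names are hypothetical. The encodings are assumptions chosen to match how the log prints these values: major*100 + minor*10 for compute capability (as in `compute_capability = 860`) and major*1000 + minor*10 for the CUDA version.

```shell
# Earliest CUDA toolkit (major*1000 + minor*10) with native support for a
# compute capability (major*100 + minor*10). Partial table, for illustration.
min_cuda_for_cc() {
  case "$1" in
    610) echo 8000 ;;   # Pascal, e.g. GTX 1060 (CUDA 8.0)
    750) echo 10000 ;;  # Turing (CUDA 10.0)
    800) echo 11000 ;;  # Ampere GA100 (CUDA 11.0)
    860) echo 11010 ;;  # Ampere GA10x, e.g. RTX 3080, RTX A4000 (CUDA 11.1)
    *)   return 1 ;;    # not in this sketch's table
  esac
}

# Print "ok", "too old", or "unknown" for a toolkit version and a GPU.
toolkit_supports_cc() {
  local toolkit="$1" cc="$2" required
  required="$(min_cuda_for_cc "$cc")" || { echo "unknown"; return 2; }
  if [ "$toolkit" -ge "$required" ]; then
    echo "ok"
  else
    echo "too old"
  fi
}
```

With the values from the log earlier in this thread, `toolkit_supports_cc 10000 860` prints `too old`, while the GTX 1060 case, `toolkit_supports_cc 10000 610`, prints `ok`.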

Please refer to:
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html

rsicak · 2022-03-04T18:02:17Z

Hi Hadi, thanks for the fast response. I will look into it. Best regards, Robert

hadikoub · 2022-03-04T18:30:04Z

Hello again @rsicak,

I've created a branch for CUDA 11 support named cuda11_support (link: https://github.com/BMW-InnovationLab/BMW-YOLOv4-Training-Automation/tree/cuda11_support), but it is still under testing, so stability is not fully guaranteed.

You can take a look at it if that is convenient for you.

rsicak · 2022-03-06T19:07:23Z

Hi, I have tried the new branch. The Docker image built and ran. Training worked for up to 300 iterations and then stopped with a cuDNN error.
I have CUDA 11.6, so I replaced "nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04" with the newer "nvidia/cuda:11.5.1-cudnn8-devel-ubuntu20.04" in the Dockerfile and rebuilt the Docker image. After that it works with the RTX 3080 and A4000 GPUs. Training on the sample dataset finished successfully after a few hours. It also works with dual A4000 GPUs.
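
The base-image swap described above could be scripted roughly as follows. This is only a sketch: `swap_base_image` is a hypothetical helper, and it assumes the Dockerfile has a line of the form `FROM <image>` with nothing else on it that needs to change. The `docker/Dockerfile` path in the usage comment is taken from the build command earlier in the thread and is assumed to be the same on the cuda11_support branch.

```shell
# Replace the base image on the FROM line of a Dockerfile.
# A backup copy is kept next to the file with a .bak suffix.
swap_base_image() {
  local dockerfile="$1" old="$2" new="$3"
  # "|" is the sed delimiter because image names contain "/".
  # Dots in $old act as regex wildcards, which is acceptable for this sketch.
  sed -i.bak "s|^FROM ${old}|FROM ${new}|" "$dockerfile"
}

# Example usage (not executed here):
#   swap_base_image docker/Dockerfile \
#     "nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04" \
#     "nvidia/cuda:11.5.1-cudnn8-devel-ubuntu20.04"
```

After the swap, the image still has to be rebuilt for the change to take effect.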

hadikoub · 2022-03-06T19:16:24Z

Hi,

Thank you for the suggestion. I'll try changing the Docker image to the suggested one and run some tests.

hadikoub added the "help wanted" label Mar 2, 2022

rsicak closed this as completed Mar 4, 2022

hadikoub reopened this Mar 4, 2022

hadikoub added the "enhancement" label Mar 9, 2022

hadikoub closed this as completed Dec 14, 2022
