No such file or directory #35

Closed
rsicak opened this issue Feb 24, 2022 · 7 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@rsicak

rsicak commented Feb 24, 2022

Hi, I have the same issue as others in this issue history.
I tried the solution of setting DOWNLOAD_ALL=1 in the Dockerfile, but it does not work for me.
I have yolov4.weights in the right folder, under config/darknet/yolov4_default_weights/.
Any help? Thank you. Robert
image

@hadikoub
Member

hadikoub commented Mar 2, 2022

Hello,
What build command did you use to build the Docker image?
Can you try rebuilding the Docker image with the build args specified and without cache? For example:

sudo docker build -f docker/Dockerfile -t darknet_yolov4_gpu:1 --build-arg GPU=1  --build-arg DOWNLOAD_ALL=1 --build-arg CUDNN=1 --build-arg CUDNN_HALF=0 --build-arg OPENCV=1 . --no-cache 
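Before rebuilding, it may also be worth confirming that the weights file is really where the build expects it. A minimal sketch, assuming the path quoted earlier in this thread and run from the repository root; `check_weights` is a hypothetical helper, not part of the project:

```shell
# Hypothetical pre-build check (not part of the repository).
# The default path is the one quoted in this thread; pass another path to override it.
check_weights() {
    weights="${1:-config/darknet/yolov4_default_weights/yolov4.weights}"
    # -s is true only if the file exists and is not empty
    if [ -s "$weights" ]; then
        echo "found: $weights"
    else
        echo "missing or empty: $weights" >&2
        return 1
    fi
}
```

Calling `check_weights` with no argument checks the default location; a non-zero exit status suggests the file is absent or was left empty by an interrupted download.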

@hadikoub added the help wanted label Mar 2, 2022
@rsicak
Author

rsicak commented Mar 3, 2022

Hi hadikoub,
I built the Docker image as you suggested (DOWNLOAD_ALL=1, CUDNN_HALF=0 and --no-cache) and it works on an older machine with a GTX 1060 6GB; the GPU is utilized at almost 100 percent.
When I built the same image with CUDNN_HALF=1 on a machine with an RTX 3080, or on another machine with an A4000, the behavior is strange. After the container starts, only one core is busy for many minutes. After that something is running, but the GPU is not working (according to nvidia-smi), and for about half an hour the log file contains only this:

CUDA-version: 10000 (11060), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 2

CUDNN_HALF=1

OpenCV version: 3.2.0

0 : compute_capability = 860, cudnn_half = 1, GPU: NVIDIA RTX A4000

layer filters size/strd(dil) input output

And after half an hour, the log file contains lines like:

v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 16, class_loss = -nan, iou_loss = -nan, total_loss = -nan

v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 17, class_loss = -nan, iou_loss = -nan, total_loss = -nan

v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 5, class_loss = -nan, iou_loss = -nan, total_loss = -nan

It seems there is some problem with modern Nvidia GPUs.
Robert
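For anyone hitting the same symptom, a quick way to tell whether training has diverged is to count the log lines whose total loss is NaN. A minimal sketch, assuming the log format shown above; `count_nan_losses` is a hypothetical helper and the log file path is whatever your setup writes to:

```shell
# Sketch: count training log lines whose total loss is NaN.
# Matches both "total_loss = -nan" and "total_loss = nan".
count_nan_losses() {
    # grep -c prints the number of matching lines; "|| true" keeps a zero
    # count (grep exit status 1) from being reported as a failure.
    grep -c -E 'total_loss = -?nan' "$1" || true
}
```

A non-zero count right from the first iterations, as in the log above, points to a setup problem rather than a training problem.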

@hadikoub
Member

hadikoub commented Mar 4, 2022

Hello Robert,

Glad that the solution is now working.
Regarding the other issue you are facing on the newer RTX 3080 and A4000: it is not caused by the solution; it is due to Nvidia CUDA support on newer devices.
Currently, the Nvidia RTX 30 Series supports CUDA 11.x only, without backward compatibility with older CUDA versions such as the one currently used in the solution (CUDA 10.0).
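The mismatch can be read off the log header posted above: `CUDA-version: 10000` is CUDA 10.0, while `compute_capability = 860` is compute capability 8.6, which CUDA toolkits support from 11.1 onward. A rough sketch of that check; the function names are hypothetical and the table covers only a few common capabilities:

```shell
# Sketch: minimum CUDA toolkit version (in the log's integer form, e.g. 11010 = 11.1)
# that supports a given compute capability (e.g. 860 = 8.6).
min_cuda_for_cc() {
    case "$1" in
        860) echo 11010 ;;      # Ampere GA10x (RTX 30 series, RTX A4000)
        800) echo 11000 ;;      # Ampere GA100
        750) echo 10000 ;;      # Turing
        700) echo 9000 ;;       # Volta
        600|610) echo 8000 ;;   # Pascal (the GTX 1060 is 6.1)
        *) return 1 ;;
    esac
}

# Compare the CUDA version from the log with what the GPU needs.
check_cuda_support() {
    runtime="$1"
    cc="$2"
    needed="$(min_cuda_for_cc "$cc")" || { echo "unknown compute capability: $cc"; return 2; }
    if [ "$runtime" -ge "$needed" ]; then
        echo "ok: CUDA $runtime supports compute capability $cc"
    else
        echo "too old: compute capability $cc needs CUDA >= $needed, found $runtime"
        return 1
    fi
}
```

With the values from the log, `check_cuda_support 10000 860` reports that the toolkit is too old, while `check_cuda_support 10000 610` (the GTX 1060) passes.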

Please refer to:

@rsicak
Author

rsicak commented Mar 4, 2022 via email

@rsicak closed this as completed Mar 4, 2022
@hadikoub
Member

hadikoub commented Mar 4, 2022

Hello again @rsicak,

I've created a branch for CUDA 11 support named cuda11_support (link: https://github.com/BMW-InnovationLab/BMW-YOLOv4-Training-Automation/tree/cuda11_support), but it's still under testing, so stability is not fully guaranteed.

You can take a look at it if that is convenient for you.

@hadikoub reopened this Mar 4, 2022
@rsicak
Author

rsicak commented Mar 6, 2022

Hi, I have tried the new branch. The Docker image built and then ran. Training worked up to 300 iterations and then stopped with a cuDNN error.
I have CUDA 11.6, so I replaced "nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04" with the newer "nvidia/cuda:11.5.1-cudnn8-devel-ubuntu20.04" in the Dockerfile and rebuilt the Docker image. After that it works with the RTX 3080 and A4000 GPUs. Training on the sample dataset finished fine after some hours. It also works with dual A4000 GPUs.
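The base image swap described above could be scripted roughly like this. It is only a sketch: `swap_base_image` is a hypothetical helper, and it assumes the Dockerfile has a line starting with `FROM` followed by the old image name.

```shell
# Sketch: replace the base image on the FROM line of a Dockerfile.
# Usage: swap_base_image <dockerfile> <old-image> <new-image>
swap_base_image() {
    dockerfile="$1"
    old="$2"
    new="$3"
    # Only lines starting with "FROM <old-image>" are rewritten.
    # -i.bak edits in place and keeps the original as <dockerfile>.bak.
    # Dots in the image name act as regex wildcards here, which is
    # harmless for this purpose.
    sed -i.bak "s|^FROM ${old}|FROM ${new}|" "$dockerfile"
}
```

For example, `swap_base_image docker/Dockerfile nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04 nvidia/cuda:11.5.1-cudnn8-devel-ubuntu20.04`, followed by a rebuild with `--no-cache` (the `docker/Dockerfile` path is taken from the build command earlier in the thread and is assumed to be the same on the branch).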

@hadikoub
Member

hadikoub commented Mar 6, 2022

Hi,

Thank you for the suggestion. I'll try changing the Docker base image to the suggested one and run some tests.

@hadikoub added the enhancement label Mar 9, 2022