
YOLOv3 low accuracy #5257

Open
Devin97 opened this issue Apr 18, 2020 · 22 comments
Devin97 commented Apr 18, 2020

Hey @AlexeyAB, I've trained the full YOLOv3 model on a pedestrian dataset from OpenImages [downloaded using OIDToolkitv4].

Train: 2400
Valid: 600
(Split from same dataset)

Trained for 6000 iterations with darknet53.conv.74 pretrained weights
batch = 64
subdivisions = 32
width & height = 416
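For scale, these settings can be converted into an approximate epoch count; a quick sketch (`epochs_trained` is a hypothetical helper, assuming each iteration consumes one full batch of 64 images):

```python
def epochs_trained(iterations, batch_size, train_images):
    """Rough epoch count: total images seen divided by dataset size.
    Assumes each iteration consumes exactly one batch."""
    return iterations * batch_size / train_images

# 6000 iterations at batch=64 over 2400 training images:
print(epochs_trained(6000, 64, 2400))  # → 160.0
```

160 epochs over only 2400 images is a lot of passes over a small dataset, which is consistent with the overfitting concern raised later in the thread.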

[training chart: chart_yolov3-custom-train]

  • What can I do to improve accuracy?
  • Tried to perform detection on a 1 min video clip, average FPS was around 16-18. What can I do to improve FPS?
@AlexeyAB (Owner) commented:

What can I do to improve accuracy?

  1. Use more training images
  2. Train this cfg https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1 with this pre-trained weights file https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV and train with flag -clear at the end of training command

Tried to perform detection on a 1 min video clip, average FPS was around 16-18. What can I do to improve FPS?

  1. Download the latest Darknet version, and recompile with GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1

  2. What GPU do you use?


Devin97 commented Apr 18, 2020

Thanks for replying. I'll train again with that config.

I'm using an NVIDIA Tesla T4 (1 GPU) on Google Cloud.

But I actually need it to be fast enough for an NVIDIA GTX 1050 Ti.


Devin97 commented Apr 18, 2020

Also, is there any documentation for all the hyperparameters?

@AlexeyAB (Owner) commented:

@Devin97
Read: https://github.com/AlexeyAB/darknet/wiki
Some will be added later.


Devin97 commented Apr 18, 2020

[screenshot: SharedScreenshot]

Hey @AlexeyAB

This time I have around 6000 images and I'm training for 6000 iterations.

Recompiled darknet with these enabled:
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1

After 1000 iterations, mAP is 0% and avg loss is -nan. Is this common?
Command:
./darknet detector train data/obj.data cfg/cd53paspp-gamma-train.cfg cd53paspp-gamma_final.weights -map -dont_show -clear


AlexeyAB commented Apr 19, 2020

Something is going wrong.

  • Did you change burn_in= or learning_rate= in the cfg-file? Don't do it.

  • Try setting these before the ########################## line in the cfg-file:

stopbackward=2000
train_only_bn=1

and train again


Devin97 commented Apr 19, 2020

Did you change burn_in= or learning_rate= in the cfg-file? Don't do it.
No, I haven't changed those parameters. I only changed batch=64, subdivisions=16, and classes=1

Also, is it OK to directly use the full pretrained weights? Or do I have to extract a portion of these weights using the "darknet partial" command?
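As an aside on the classes=1 change: the repo's training README also requires updating filters= in each [convolutional] layer immediately before a [yolo] layer. A small sketch of that rule (`yolo_conv_filters` is a hypothetical helper name):

```python
def yolo_conv_filters(classes, masks_per_scale=3):
    """filters = (classes + 5) * masks, per the Darknet training README;
    the 5 covers 4 box coordinates plus 1 objectness score."""
    return (classes + 5) * masks_per_scale

print(yolo_conv_filters(1))   # → 18 (single-class pedestrian detector)
print(yolo_conv_filters(80))  # → 255 (COCO default)
```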

@AlexeyAB (Owner) commented:

It's better to use partial:

darknet.exe partial cfg/cd53paspp-omega.cfg cd53paspp-omega_final.weights cd53paspp-omega.conv.137 137


Devin97 commented Apr 19, 2020

It's better to use partial:

darknet.exe partial cfg/cd53paspp-omega.cfg cd53paspp-omega_final.weights cd53paspp-omega.conv.137 137

I'm getting a segmentation fault error.
[screenshot: SharedScreenshot1]

Tried to train with the final weights again and added these parameters as you suggested:
stopbackward=2000
train_only_bn=1

Still facing the same issue

[screenshot: SharedScreenshot2]

[training chart: chart_cd53paspp-gamma-train]

Here's my cfg file

cd53paspp-gamma-train.txt

@AlexeyAB (Owner) commented:

  1. Don't use the obj.data file with the partial command:
    ./darknet partial cfg/cd53paspp-omega.cfg cd53paspp-omega_final.weights cd53paspp-omega.conv.137 137

  2. Use

stopbackward=2000
train_only_bn=1

  3. Set learning_rate=0.001


Devin97 commented Apr 19, 2020

I see, I'll train again with learning_rate=0.001.
What causes the avg loss to be NaN? Does it mean that something's wrong with the dataset?

@AlexeyAB (Owner) commented:

Exploding gradients (backward) / features (forward): https://machinelearningmastery.com/exploding-gradients-in-neural-networks/

To solve this:

  • reduce learning_rate=
  • reduce batch=
  • increase burn_in=
  • increase decay=
  • use max_delta=3 in [yolo] layers
  • use stopbackward=2000 and train_only_bn=1 at the last backbone layer
  • use less layers
  • fix your dataset
  • use another model
  • use gradient clipping
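To illustrate the last two of those remedies, here's a minimal plain-Python sketch (illustrative only, not darknet's actual code) of the elementwise clamp that max_delta= applies in a [yolo] layer and of generic norm-based gradient clipping:

```python
import math

def clip_delta(delta, max_delta=3.0):
    """Elementwise clamp, analogous to max_delta= in a [yolo] layer:
    any gradient component beyond the bound is truncated."""
    return [max(-max_delta, min(max_delta, d)) for d in delta]

def clip_by_norm(grad, threshold=1.0):
    """Global L2-norm clipping, the generic 'gradient clipping' remedy:
    rescale the whole gradient if its norm exceeds the threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return grad
    return [g * threshold / norm for g in grad]

print(clip_delta([5.0, -0.5]))  # → [3.0, -0.5]
```

Both keep a single outlier gradient from blowing up the weight update, which is why they help against the NaN loss seen above.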


Devin97 commented Apr 19, 2020

Hey @AlexeyAB
It's been more than 2000 iterations and I didn't get any "nan" avg loss, so that's good news. However, there's another problem:

[training chart: chart_cd53paspp-gamma-train]

Avg loss doesn't seem to be decreasing much. Is it normal for this to happen at the beginning of training with this cd53paspp config, or is something wrong?

[screenshot: Screenshot from 2020-04-20 01-06-15]

@AlexeyAB (Owner) commented:

avg loss for cd53paspp is higher than for yolov3, but the mAP is also higher.

Also, the lower the learning_rate=, the more stable the training, but the slower it is. So you should find the optimal learning rate.

  • you can keep learning_rate=0.001 and increase max_batches= and steps= to train longer
  • or you can try to add max_delta=3 in each [yolo] layer, and use a higher learning_rate=0.00261
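For reference, darknet also ramps the learning rate up over the first burn_in= iterations before holding it at learning_rate= (a sketch of that warm-up, as I understand it from darknet's scheduler, with power= defaulting to 4; the later steps= decay is omitted):

```python
def current_lr(iteration, learning_rate=0.001, burn_in=1000, power=4):
    """Warm-up ramp: lr * (i / burn_in) ** power during burn-in,
    then the base rate (until the steps= decay points, not modeled here)."""
    if iteration < burn_in:
        return learning_rate * (iteration / burn_in) ** power
    return learning_rate

print(current_lr(500))   # halfway through burn-in: 0.001 * 0.5**4 = 6.25e-05
print(current_lr(2000))  # past burn-in: the base rate, 0.001
```

This is why changing burn_in= was warned against earlier: the ramp is what keeps the first thousand updates small and stable.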


Devin97 commented Apr 19, 2020

My current configuration has
learning_rate = 0.001

and I've added max_delta = 3 in the [yolo] layers.

you can keep learning_rate=0.001 and increase max_batches= and steps= to train longer

I'll also try this out on next training session.

I'll post my results when the training is finished. Thanks for the help @AlexeyAB !


Devin97 commented Apr 20, 2020

These are my training results:
[training chart: chart_cd53paspp-gamma-train (1)]

Accuracy didn't improve; it's almost the same as with the yolov3 config.

Performed detection on a 1 min clip. The FPS is around 9, but detection is somewhat more stable compared to yolov3.

Video: https://drive.google.com/open?id=1mMe-S2XL2InaTjhzEQAc3O50fpyPRxlF

It might be overfitting.

@AlexeyAB (Owner) commented:

Try to train again:
with stopbackward=2000
but without train_only_bn=1

use max_delta=3 in [yolo] layers.
and use learning_rate=0.00261

And set 2x higher max_batches=12000 and steps=...


If avg loss NaN occurs, then train with learning_rate=0.001
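When doubling max_batches= like this, the repo's training guide places the steps= decay points at 80% and 90% of max_batches; a quick helper (`steps_for` is a hypothetical name) for that arithmetic:

```python
def steps_for(max_batches):
    """steps= values at 80% and 90% of max_batches,
    per the Darknet training README's rule of thumb."""
    return int(max_batches * 0.8), int(max_batches * 0.9)

print(steps_for(12000))  # → (9600, 10800)
```

So for max_batches=12000, steps=9600,10800 would follow that rule of thumb.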


Devin97 commented Apr 21, 2020

Try to train again:
with stopbackward=2000
but without train_only_bn=1

use max_delta=3 in [yolo] layers.
and use learning_rate=0.00261

And set 2x higher max_batches=12000 and steps=...

If avg loss NaN occurs, then train with learning_rate=0.001

So I'm training with the parameters above for 12000 iterations.
learning_rate = 0.00261 results in "nan" avg loss, so I reduced it back to 0.001.

While it's training, I wanted to ask: what is the expected FPS with the cd53paspp-gamma cfg? On the previous training I got around 9 FPS, which does make sense, since the model has more layers, detections are supposed to be slower. Is this correct?

@AlexeyAB (Owner) commented:

0.9x speed, but 1.3x higher accuracy.

@seojupal commented:

How did you get mAP to show on that chart?
For me, there's only loss.

@nanhui69 commented:

What can I do to improve accuracy?

  1. Use more training images
  2. Train this cfg https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1 with this pre-trained weights file https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV and train with flag -clear at the end of training command

Tried to perform detection on a 1 min video clip, average FPS was around 16-18. What can I do to improve FPS?

  1. Download the latest Darknet version, and recompile with GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1
  2. What GPU do you use?

@AlexeyAB what does the -clear flag mean? And is this flag necessary?


Devin97 commented Sep 18, 2020

@Jureong

How did you let mAP show on that chart?
for me, there's only loss.

Use the -map flag
