Loss can't be below 1. #829

Open
BlueAnthony opened this issue May 16, 2018 · 8 comments
@BlueAnthony

BlueAnthony commented May 16, 2018

The loss does not decrease below 1. It stops and jitters around some value, such as 5 to 6 or 12 to 13, once the iteration count reaches about 50.
I have already tried different base learning rates: 0.001, 0.0001, 0.00001, and 0.000001.
The loss starts at about 1000.
I have 2 classes (car and pedestrian), 3712 images for training, and 3769 images for validation.
I use yolov3.weights as the pretrained weights.
image

Thank you!!


I use the code from pjreddie/darknet and am trying to fine-tune from yolov3.weights.
The command I use is below.
"./darknet detector train cfg/kitti.data cfg/yolov3-kitti.cfg model/yolov3.weights"
Yes, I use random=1. My cfg is modified from the yolov3.cfg in pjreddie/darknet.

Why do I use these learning rates and steps?
Because yolov3.weights seems to remember its iteration count, max_batches must be larger than 500200 before fine-tuning will start at all.
The loss starts at about 1000 and stops decreasing at about iteration "500200+50".
Am I misunderstanding something?

@AlexeyAB Thank you very much for your patience.

@AlexeyAB
Owner

Use these params:

darknet/cfg/yolov3.cfg

Lines 18 to 23 in e29fcb7

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

And train for about 2000 iterations.
If that doesn't help, then something is wrong with the number of classes or with your dataset. Check it using this software: https://github.com/AlexeyAB/Yolo_mark

@BlueAnthony
Author

@AlexeyAB When I switch to "AlexeyAB/darknet", the command "./darknet detector train cfg/kitti.data cfg/yolov3-kitti.cfg model/yolov3.weights" saves the original model immediately, without training.
image

@ghost

ghost commented May 17, 2018

Your learning rate is about 1e-8, which is too small. If you'd like to use yolov3.weights as pre-trained weights, try the -clear option; the iteration count will then restart from 0.
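
To make the arithmetic behind that figure concrete, here is a minimal Python sketch of how the `steps` policy appears to derive the current rate. It is not Darknet's own code: the function name `current_rate` is hypothetical, the burn-in exponent of 4 is assumed to be the default, and the base rate of 0.000001 is assumed to be the smallest of the values tried above (the exact cfg is not shown in this thread).

```python
import math


def current_rate(batch_num, learning_rate, burn_in, steps, scales, power=4.0):
    """Approximate the learning rate at a given iteration for policy=steps.

    Hypothetical helper; `power=4.0` is an assumed default for the burn-in ramp.
    """
    # During burn-in the rate ramps up from 0 towards the base rate.
    if batch_num < burn_in:
        return learning_rate * (batch_num / burn_in) ** power
    # After burn-in, every step boundary already passed multiplies the rate
    # by the matching scale.
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if step > batch_num:
            return rate
        rate *= scale
    return rate


steps, scales = [400000, 450000], [0.1, 0.1]

# Resuming at iteration 500200: both step boundaries are already behind us,
# so the base rate is scaled by 0.1 twice.
resumed = current_rate(500200, 0.000001, 1000, steps, scales)

# After a reset, iteration 2000 is past burn-in and before the first step,
# so the full base rate applies.
restarted = current_rate(2000, 0.001, 1000, steps, scales)

print(f"resumed at 500200: {resumed:.1e}")
print(f"restarted, at 2000: {restarted:.1e}")
```

Under these assumptions, resuming at iteration 500200 multiplies any base rate by 0.01, which is why lowering the base rate only makes the stall worse.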

@BlueAnthony
Author

@panda9095 Thank you for your response.
Could you give me more detail about "-clear"? When does "-clear" restart the count?

@ghost

ghost commented May 17, 2018

@BlueAnthony ./darknet detector train cfg/kitti.data cfg/yolov3-kitti.cfg model/yolov3.weights -clear

By doing so, the iteration count will start from 0 instead of 500200. Then you can use @AlexeyAB's parameters for training.
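
To check where the 500200 comes from, the counter stored in the weights file can be inspected. The Python sketch below rests on assumptions rather than a documented format: it assumes the file starts with three 32-bit integers (major, minor, revision) followed by a "seen images" counter that is 64-bit when `major*10 + minor >= 2` (on a 64-bit build) and 32-bit otherwise, and that the iteration count is that counter divided by the cfg `batch` value. The function names are hypothetical.

```python
import struct
import sys


def read_darknet_header(path):
    """Return (major, minor, revision, seen) from the start of a weights file.

    Assumed layout: three little-endian int32 values, then the number of
    images seen so far as uint64 (newer files) or int32 (older files).
    """
    with open(path, "rb") as f:
        major, minor, revision = struct.unpack("<3i", f.read(12))
        if major * 10 + minor >= 2 and major < 1000 and minor < 1000:
            (seen,) = struct.unpack("<Q", f.read(8))
        else:
            (seen,) = struct.unpack("<i", f.read(4))
    return major, minor, revision, seen


def iterations_done(seen, batch):
    """Convert images seen into iterations, assuming `batch` is the cfg value."""
    return seen // batch


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python read_header.py <weights file> <batch from cfg>
    major, minor, revision, seen = read_darknet_header(sys.argv[1])
    print(f"version {major}.{minor}.{revision}, images seen: {seen}")
    print(f"iterations: {iterations_done(seen, int(sys.argv[2]))}")
```

If the assumptions hold, a file trained to the end of the stock yolov3.cfg schedule would report 500200 iterations with batch=64, and `-clear` amounts to ignoring that counter and starting again from 0.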

@BlueAnthony
Author

BlueAnthony commented May 17, 2018

@panda9095 Thank you very much for your help! I will try it.

@AlexeyAB
Owner

@BlueAnthony
Proper commands for training:

  1. ./darknet detector train cfg/kitti.data cfg/yolov3-kitti.cfg model/yolov3.weights -clear

  2. ./darknet detector train cfg/kitti.data cfg/yolov3-kitti.cfg darknet53.conv.74

  3. ./darknet detector train cfg/kitti.data cfg/yolov3-kitti.cfg yolov3.conv.105
    You can get the pre-trained file yolov3.conv.105 by using this command:
    ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.105 105

darknet.exe partial cfg/yolov3.cfg yolov3.weights yolov3.conv.105 105

@AbhimanyuAryan

@AlexeyAB does this mean that I can further train my last trained model (on my dataset) with new data?
