
some comparison #32

Closed
WongKinYiu opened this issue Jun 3, 2020 · 12 comments

@WongKinYiu
Owner

No description provided.

@WongKinYiu
Owner Author

WongKinYiu commented Jun 3, 2020

@amusi Hello,

I saw your article; here I provide some comparisons of the PyTorch versions of YOLOv3, YOLOv4, and YOLOv5. (All experiments are run on the same Tesla V100 GPU.)

PyTorch version

Train with YOLOv3 setting (416x416)

Trained on the COCO 2014 trainvalno5k set and tested on the COCO 2014 5k set.

YOLOv3-SPP:

yolov3-spp 43.1% AP @ 608x608
Model Summary: 152 layers, 6.29719e+07 parameters, 6.29719e+07 gradients
Speed: 6.8/1.6/8.3 ms inference/NMS/total per 608x608 image at batch-size 16

Train with YOLOv4 setting (512x512)

Trained on the COCO 2014 trainvalno5k set and tested on the COCO 2014 5k set.

YOLOv3-SPP:

yolov3-spp 43.6% AP @ 608x608
Model Summary: 152 layers, 6.29719e+07 parameters, 6.29719e+07 gradients
Speed: 6.8/1.6/8.3 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-YOSPP: (~YOLOv4(Leaky) backbone + YOLOv3 head)

cd53s-yospp 43.7% AP @ 608x608
Model Summary: 184 layers, 4.89836e+07 parameters, 4.89836e+07 gradients
Speed: 6.3/1.6/7.8 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-YOSPP-Mish: (~YOLOv4 backbone + YOLOv3 head)

cd53s-yospp-mish 44.3% AP @ 608x608
Model Summary: 184 layers, 4.89836e+07 parameters, 4.89836e+07 gradients
Speed: 7.9/1.6/9.6 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-PASPP: (~YOLOv4(Leaky))

cd53s-paspp 44.5% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 6.9/1.6/8.5 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-PASPP-Mish: (~YOLOv4)

cd53s-paspp-mish 45.0% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-PACSP:

cd53s-paspp-cspt 45.1% AP @ 608x608
Model Summary: 222 layers, 5.84596e+07 parameters, 5.84596e+07 gradients
Speed: 6.6/1.5/8.1 ms inference/NMS/total per 608x608 image at batch-size 16

Train with YOLOv5 setting (640x640)

Trained on the COCO 2017 train set and tested on the COCO 2017 val (5k) set.

YOLOv3-SPP:

yolov3-spp 45.5% AP @ 736x736
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5s:

yolov5s 33.1% AP @ 736x736
Model Summary: 99 layers, 6.99302e+06 parameters, 6.99302e+06 gradients
Speed: 2.2/2.1/4.4 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5m:

yolov5m 41.5% AP @ 736x736
Model Summary: 165 layers, 2.51928e+07 parameters, 2.51928e+07 gradients
Speed: 5.4/1.8/7.2 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5l:

yolov5l 44.2% AP @ 736x736
Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients
Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5x:

yolov5x 47.1% AP @ 736x736
Model Summary: 297 layers, 1.23102e+08 parameters, 1.23102e+08 gradients
Speed: 20.3/2.2/22.5 ms inference/NMS/total per 736x736 image at batch-size 16
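For a quick read of the listings above, the reported per-image totals can be converted to throughput. A minimal sketch (the timings are copied from the speed lines above; FPS is simply 1000 / total ms):

```python
# Convert the reported per-image times (ms, measured at batch-size 16,
# YOLOv5-setting runs above) into frames per second.
timings_ms = {
    "yolov3-spp": 12.6,
    "yolov5s": 4.4,
    "yolov5m": 7.2,
    "yolov5l": 13.5,
    "yolov5x": 22.5,
}

def fps(total_ms):
    """Frames per second implied by a per-image total time in ms."""
    return 1000.0 / total_ms

# Print fastest to slowest.
for name, t in sorted(timings_ms.items(), key=lambda kv: kv[1]):
    print(f"{name}: {fps(t):.1f} FPS")
```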

@AlexeyAB
Collaborator

AlexeyAB commented Jun 3, 2020

@WongKinYiu Hi,

It is obvious that CSPDarknet53s-PASPP-Mish (~YOLOv4) is much better than amusi's YOLOv5l (640x640) (batch-size 16):

  • CSPDarknet53s-PASPP-Mish: (~YOLOv4) 512x512/608x608: - 45.0% AP - Speed: 8.7/1.6/10.3 ms
  • YOLOv5l (640x640)/(736x736): - 44.2% AP - Speed: 11.3/2.2/13.5 ms

While our new YOLOv4 model is even much better:

  • CSPDarknet53s-PACSP: 45.1% AP - Speed: 6.6/1.5/8.1 ms

  1. Does it use inference-time data augmentation?
  2. Why is batch-size 16 used here?
  3. Is there a GitHub repo with amusi's YOLOv5l (640x640)?

Train with YOLOv5 setting (640x640)

Trained on the COCO 2017 train set and tested on the COCO 2017 val (5k) set.

YOLOv3-SPP:

yolov3-spp 45.5% AP
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16
  1. Is the better AP for YOLOv3-SPP achieved just by using the 640x640 network resolution, or by something else?

@WongKinYiu
Owner Author

WongKinYiu commented Jun 3, 2020

@AlexeyAB

  1. Does it use inference-time data augmentation?

No, there is no inference-time augmentation.

  1. Why is batch 16 used here?

I just follow the Ultralytics testing protocol with batch-size 16.

  1. Is there a GitHub repo with amusi's YOLOv5l (640x640)?

It is not amusi's repo; it is Ultralytics's new repo.

  1. Is the better AP for YOLOv3-SPP achieved just by using the 640x640 network resolution, or by something else?

There are some modifications in Ultralytics's new repo.
But yes, I think the main reason for the improvement is the 640x640 training.
Also, Ultralytics's new repo seems to use an affine transform instead of multi-resolution training,
so the new training won't use too much GPU RAM. (Need to check the code in detail.) training log details
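To illustrate why this saves GPU RAM: scale jitter is applied inside a canvas of fixed size, so the network resolution (and thus activation memory) never changes. This is only a minimal sketch of the idea, not the Ultralytics implementation; the function name, the nearest-neighbour resize, and the grey padding value are all illustrative assumptions.

```python
import numpy as np

def random_scale_affine(img, scale_jitter=0.5, rng=None):
    """Scale-jitter an image inside a fixed-size canvas.

    The output always has the input's shape, so the network input
    resolution (and GPU memory use) stays constant, unlike
    multi-resolution training which changes the tensor size.
    """
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    # scale_jitter=0.5 -> scale factor sampled from [0.5, 1.5]
    s = rng.uniform(1 - scale_jitter, 1 + scale_jitter)
    nh, nw = int(h * s), int(w * s)
    # crude nearest-neighbour resize (stand-in for a real resize op)
    ys = (np.arange(nh) / s).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / s).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    # paste (or crop) the scaled image into a canvas of the original size
    canvas = np.full_like(img, 114)  # grey padding, an arbitrary choice here
    ch, cw = min(nh, h), min(nw, w)
    canvas[:ch, :cw] = resized[:ch, :cw]
    return canvas
```

Whatever scale is drawn, the batch tensor shape is fixed, so peak memory is the same on every iteration.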

I am training CSPDarknet53-PACSP-(SAM)-Mish with darknet on MSCOCO 2017.

@AlexeyAB
Collaborator

AlexeyAB commented Jun 3, 2020

Also, Ultralytics's new repo seems to use an affine transform instead of multi-resolution training.

Yes:

  1. scale=0.5 https://github.com/ultralytics/yolov5/blob/391492ee5b56ef36424b4a9257c18f7c784a8f44/train.py#L44
  2. python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 16

Maybe we should use random=0 resize=1.5 instead of random=1 in Darknet too?
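In cfg terms, the suggestion might look like the fragment below. This is only a sketch using the two parameters named above; which section each parameter belongs in should be verified against the Darknet cfg parser before use.

```ini
# [net] section: keep a fixed base resolution and let resize= do scale jitter
[net]
width=512
height=512
resize=1.5

# [yolo] layers: disable multi-resolution (random input size) training
[yolo]
random=0
```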

@WongKinYiu
Owner Author

@AlexeyAB

OK, I will train with this setting on tiny-yolov4 with width=640 and height=640.
If this works well, users can use cheaper GPUs to train YOLO.

@WongKinYiu
Owner Author

@AlexeyAB Hello,

Yes, the AP benefits from 640x640 training.
CSPDarknet53s-YOSPP gets 12.5% faster model inference speed and 0.1% higher AP than YOLOv3-SPP.
CSPDarknet53s-YOSPP gets 19.5% faster model inference speed and 1.3% higher AP than YOLOv5l.

YOLOv3-SPP:

yolov3-spp: 45.5% AP @736x736
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16

CSPDarknet53s-YOSPP: (~YOLOv4(Leaky) backbone + YOLOv3 head)

cd53s-yospp: 45.6% AP @736x736
Model Summary: 225 layers, 4.90092e+07 parameters, 4.90092e+07 gradients
Speed: 9.1/2.0/11.1 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5l:

yolov5l 44.2% AP @ 736x736
Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients
Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16
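The 12.5% and 19.5% speedup figures follow directly from the inference times in the three listings above (1.3 ms saved on 10.4 ms, and 2.2 ms saved on 11.3 ms):

```python
# Relative inference-speed gains of cd53s-yospp, computed from the
# per-image inference times reported above (ms at batch-size 16).
yolov3_spp = 10.4
cd53s_yospp = 9.1
yolov5l = 11.3

gain_vs_v3 = (yolov3_spp - cd53s_yospp) / yolov3_spp * 100   # -> 12.5%
gain_vs_v5l = (yolov5l - cd53s_yospp) / yolov5l * 100        # -> ~19.5%
print(f"{gain_vs_v3:.1f}% faster than YOLOv3-SPP")
print(f"{gain_vs_v5l:.1f}% faster than YOLOv5l")
```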

@AlexeyAB
Collaborator

AlexeyAB commented Jun 8, 2020

@WongKinYiu Nice.

  • Does CSPDarknet53s give improvements for training on both Ultralytics and Darknet?
  • Interesting: what AP will a P6 model give that is trained at 640x640 and tested at 736x736?

@WongKinYiu
Owner Author

@AlexeyAB

  • Does CSPDarknet53s give improvements for training on both Ultralytics and Darknet?

I am not sure for Darknet because I have not trained it on ImageNet, but yes for Ultralytics.

  • Interesting: what AP will a P6 model give that is trained at 640x640 and tested at 736x736?

To achieve this goal I have to take a look at how to construct a P6 model using the new Ultralytics repository. Then I need to construct the YOLOv4 model; the repo does not currently support all of the blocks of YOLOv4.
(Or maybe I will directly modify the PyTorch code I am currently using.)
I think I will design a training scheme to train the P6 model on Darknet first.

@AlexeyAB
Collaborator

@WongKinYiu Hi,

Can you share cfg/weights files for this model?

CSPDarknet53s-PASPP-Mish: (~YOLOv4) - trained 512x512, tested 608x608

cd53s-paspp-mish 45.0% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16

@WongKinYiu
Owner Author

@AlexeyAB

cd53s-paspp-mish.cfg
cd53s-paspp-mish.pt

@clw5180

clw5180 commented Jun 13, 2020

Hi WongKinYiu, what does -PACSP mean? I can't find its config and weight files. Thanks a lot!

@WongKinYiu
Owner Author

Hello, PACSP means applying CSP to PANet. The model is still in the training process; I will release the .weights file after training finishes.
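Structurally, "applying CSP to PANet" means splitting the feature channels so that only part of them pass through the heavy transform stage before being concatenated back. The sketch below is only an illustration of that Cross Stage Partial idea; the names and the toy transform are assumptions, not the released cfg.

```python
import numpy as np

def csp_block(x, transform):
    """Cross Stage Partial connection: split the channels, send only
    one half through the (expensive) transform, then concatenate with
    the untouched half. Here axis 0 plays the role of the channel axis."""
    c = x.shape[0] // 2
    part1, part2 = x[:c], x[c:]   # channel split
    part2 = transform(part2)      # e.g. the PANet conv stack would go here
    return np.concatenate([part1, part2], axis=0)

# toy 'transform' standing in for a conv stack (it must preserve channel count)
x = np.arange(8.0).reshape(8, 1)
y = csp_block(x, lambda t: t * 2.0)
```

Because only half the channels go through the transform, the block cuts computation and parameters while keeping a gradient path through the untouched half, which matches the parameter reductions seen in the cd53s listings above.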
