bbox loss increases when using compute_ciou #7

Open
ginobilinie opened this issue May 16, 2020 · 8 comments

@ginobilinie commented May 16, 2020

Thanks for your great work.

I have called the compute_ciou function to generate the bbox loss,

self.bbox_loss = compute_ciou

_, bbox_loss = self.bbox_loss(bbox_pred, bbox_target, bbox_inside_weight, bbox_outside_weight, transform_weights=config.network.bbox_reg_weights)

However, I found that bbox_loss increases during training. I have checked compute_ciou, and I think the value I am using is indeed the loss rather than the CIoU. Can you please provide some comments?
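For reference, here is a minimal sketch of the CIoU loss as defined in the paper, L_CIoU = 1 − IoU + ρ²(b, b_gt)/c² + αv. This is only an illustration of the math: the function name, the (x1, y1, x2, y2) box format, and the return values below are assumptions and do not necessarily match the signature or return order of the repo's compute_ciou.

```python
import math
import torch

def ciou_loss_sketch(pred, target, eps=1e-7):
    # pred, target: [N, 4] boxes in (x1, y1, x2, y2) format (assumed layout).
    # Intersection area
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Union area and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # Squared distance between box centers (the DIoU penalty numerator)
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2

    # Squared diagonal of the smallest enclosing box
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha
    wp, hp = pred[:, 2] - pred[:, 0], (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    wt, ht = target[:, 2] - target[:, 0], (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)

    loss = 1 - iou + rho2 / c2 + alpha * v
    return loss.mean(), iou
```

By this definition a well-trained regressor should drive the loss toward 0 (IoU toward 1), so a loss that keeps rising points to something outside the formula itself, e.g. box decoding or loss weighting.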

@Zzh-tju (Owner) commented May 16, 2020

Please describe your problem in more detail, and post your terminal output.

@ginobilinie (Author) commented May 17, 2020

@Zzh-tju Thanks.

More description:
I am trying to use the CIoU/DIoU loss (simply calling compute_ciou or compute_diou in place of the original smooth_L1 loss) in the bbox regression branch of Mask R-CNN (I do not use it in the RPN bbox regression).

Here are some output examples; we can see that the bbox loss keeps getting bigger, even after more than 1500 iterations.

2020-05-16 13:58:04,031 | callback.py | line 40 : Batch [1120] Speed: 2.09 samples/sec Train-rpn_cls_loss=0.113447, rpn_bbox_loss=0.103411, rcnn_accuracy=0.965366, cls_loss=0.145091, bbox_loss=0.013169, mask_loss=0.421295
2020-05-16 13:58:32,343 | callback.py | line 40 : Batch [1140] Speed: 3.53 samples/sec Train-rpn_cls_loss=0.112570, rpn_bbox_loss=0.103330, rcnn_accuracy=0.965201, cls_loss=0.144857, bbox_loss=0.013289, mask_loss=0.420059
2020-05-16 13:59:00,609 | callback.py | line 40 : Batch [1160] Speed: 3.54 samples/sec Train-rpn_cls_loss=0.111402, rpn_bbox_loss=0.103031, rcnn_accuracy=0.965109, cls_loss=0.144442, bbox_loss=0.013391, mask_loss=0.418765
2020-05-16 13:59:28,599 | callback.py | line 40 : Batch [1180] Speed: 3.57 samples/sec Train-rpn_cls_loss=0.110362, rpn_bbox_loss=0.102688, rcnn_accuracy=0.965005, cls_loss=0.144045, bbox_loss=0.013488, mask_loss=0.417557
2020-05-16 13:59:57,988 | callback.py | line 40 : Batch [1200] Speed: 3.40 samples/sec Train-rpn_cls_loss=0.109447, rpn_bbox_loss=0.102511, rcnn_accuracy=0.964971, cls_loss=0.143690, bbox_loss=0.013563, mask_loss=0.416479, fcn_loss=2.902343,
2020-05-16 14:00:26,571 | callback.py | line 40 : Batch [1220] Speed: 3.50 samples/sec Train-rpn_cls_loss=0.108640, rpn_bbox_loss=0.102478, rcnn_accuracy=0.964896, cls_loss=0.143314, bbox_loss=0.013652, mask_loss=0.415290
2020-05-16 14:00:54,899 | callback.py | line 40 : Batch [1240] Speed: 3.53 samples/sec Train-rpn_cls_loss=0.108040, rpn_bbox_loss=0.102286, rcnn_accuracy=0.964789, cls_loss=0.143183, bbox_loss=0.013735, mask_loss=0.414629
2020-05-16 14:01:25,687 | callback.py | line 40 : Batch [1260] Speed: 3.25 samples/sec Train-rpn_cls_loss=0.107158, rpn_bbox_loss=0.101806, rcnn_accuracy=0.964729, cls_loss=0.142844, bbox_loss=0.013789, mask_loss=0.413583
2020-05-16 14:01:56,916 | callback.py | line 40 : Batch [1280] Speed: 3.20 samples/sec Train-rpn_cls_loss=0.106302, rpn_bbox_loss=0.101344, rcnn_accuracy=0.964675, cls_loss=0.142398, bbox_loss=0.013846, mask_loss=0.412258
2020-05-16 14:02:29,997 | callback.py | line 40 : Batch [1300] Speed: 3.02 samples/sec Train-rpn_cls_loss=0.105540, rpn_bbox_loss=0.101310, rcnn_accuracy=0.964535, cls_loss=0.142259, bbox_loss=0.013934, mask_loss=0.410907
2020-05-16 14:03:17,346 | callback.py | line 40 : Batch [1320] Speed: 2.11 samples/sec Train-rpn_cls_loss=0.104824, rpn_bbox_loss=0.101343, rcnn_accuracy=0.964492, cls_loss=0.141957, bbox_loss=0.013984, mask_loss=0.410117
2020-05-16 14:04:30,898 | callback.py | line 40 : Batch [1340] Speed: 1.36 samples/sec Train-rpn_cls_loss=0.104065, rpn_bbox_loss=0.100915, rcnn_accuracy=0.964418, cls_loss=0.141760, bbox_loss=0.014041, mask_loss=0.409114
2020-05-16 14:05:51,355 | callback.py | line 40 : Batch [1360] Speed: 1.24 samples/sec Train-rpn_cls_loss=0.103361, rpn_bbox_loss=0.100984, rcnn_accuracy=0.964272, cls_loss=0.141721, bbox_loss=0.014127, mask_loss=0.407994
2020-05-16 14:07:10,705 | callback.py | line 40 : Batch [1380] Speed: 1.26 samples/sec Train-rpn_cls_loss=0.102657, rpn_bbox_loss=0.100683, rcnn_accuracy=0.964276, cls_loss=0.141271, bbox_loss=0.014151, mask_loss=0.406887
2020-05-16 14:08:38,692 | callback.py | line 40 : Batch [1400] Speed: 1.14 samples/sec Train-rpn_cls_loss=0.101894, rpn_bbox_loss=0.100366, rcnn_accuracy=0.964249, cls_loss=0.140927, bbox_loss=0.014209, mask_loss=0.405810
2020-05-16 14:10:07,195 | callback.py | line 40 : Batch [1420] Speed: 1.13 samples/sec Train-rpn_cls_loss=0.101241, rpn_bbox_loss=0.100023, rcnn_accuracy=0.964270, cls_loss=0.140451, bbox_loss=0.014244, mask_loss=0.404589
2020-05-16 14:11:41,699 | callback.py | line 40 : Batch [1440] Speed: 1.06 samples/sec Train-rpn_cls_loss=0.100725, rpn_bbox_loss=0.100094, rcnn_accuracy=0.964232, cls_loss=0.140070, bbox_loss=0.014290, mask_loss=0.403361
2020-05-16 14:13:04,015 | callback.py | line 40 : Batch [1460] Speed: 1.21 samples/sec Train-rpn_cls_loss=0.100241, rpn_bbox_loss=0.100062, rcnn_accuracy=0.964128, cls_loss=0.139946, bbox_loss=0.014349, mask_loss=0.402280
2020-05-16 14:14:29,447 | callback.py | line 40 : Batch [1480] Speed: 1.17 samples/sec Train-rpn_cls_loss=0.099680, rpn_bbox_loss=0.100023, rcnn_accuracy=0.963948, cls_loss=0.140032, bbox_loss=0.014419, mask_loss=0.401260
2020-05-16 14:16:02,162 | callback.py | line 40 : Batch [1500] Speed: 1.08 samples/sec Train-rpn_cls_loss=0.099082, rpn_bbox_loss=0.100100, rcnn_accuracy=0.963881, cls_loss=0.139839, bbox_loss=0.014442, mask_loss=0.400566
....
2020-05-16 14:48:40,384 | callback.py | line 40 : Batch [1940] Speed: 1.09 samples/sec Train-rpn_cls_loss=0.089056, rpn_bbox_loss=0.097449, rcnn_accuracy=0.962810, cls_loss=0.136322, bbox_loss=0.015455, mask_loss=0.383530
2020-05-16 14:50:11,303 | callback.py | line 40 : Batch [1960] Speed: 1.10 samples/sec Train-rpn_cls_loss=0.088642, rpn_bbox_loss=0.097169, rcnn_accuracy=0.962809, cls_loss=0.136030, bbox_loss=0.015470, mask_loss=0.382931

@Zzh-tju (Owner) commented May 17, 2020

It seems that you are using a different detection repository.
Train for more iterations to see what happens.
Did you just replace the loss function without any other modification?
If so, what is your regression loss weight?

@ginobilinie (Author)

@Zzh-tju

Thanks.

Yes, I am using a Mask R-CNN repository. When I train for more iterations (currently 30k), the loss no longer increases but stays fixed at about 0.019; however, it does not decrease, either.

The loss weight for the regression is set to 1.

@Zzh-tju (Owner) commented May 18, 2020

In our experiments, the regression loss weight is set to 12 to balance it against the classification loss. Judging from the terminal output above, it is obvious that your classification and regression losses are very imbalanced.
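As a hedged illustration only (the combine_losses helper and its argument names are hypothetical, not this repo's API), up-weighting the regression term before summing the head losses could look like this, with 12 being the weight mentioned above:

```python
import torch

def combine_losses(rpn_cls_loss, rpn_bbox_loss, cls_loss, bbox_loss, mask_loss,
                   bbox_loss_weight=12.0):
    # Sum the Mask R-CNN head losses, up-weighting the CIoU regression term.
    # bbox_loss_weight=12 is the value reported above; with a weight of 1.0,
    # bbox_loss in the posted log is an order of magnitude smaller than cls_loss.
    return (rpn_cls_loss + rpn_bbox_loss + cls_loss
            + bbox_loss_weight * bbox_loss + mask_loss)

# Example with the magnitudes from the log above (Batch [1960]):
total = combine_losses(torch.tensor(0.0886), torch.tensor(0.0972),
                       torch.tensor(0.1360), torch.tensor(0.0155),
                       torch.tensor(0.3829))
```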

@ginobilinie (Author)

@Zzh-tju Thanks. I'll try to address the balance issue.

@ginobilinie (Author)

Hi, I have tried different weights for the CIoU loss; however, the performance decreased in every case. Do I need to pay attention to any other hyper-parameters? Thanks.

@Zzh-tju (Owner) commented May 27, 2020

More details would be helpful.
