A question about model training #22
Hello, thanks for such a wonderful work. After reading this paper, I have a question regarding model training.

According to the code, ground-truth annotations are still required during knowledge distillation to compute the classification loss, the regression loss, and the DFL. The paper mentions that removing the regression loss and the DFL causes only a small decrease in mAP.

But if I want to completely remove the dependency on ground-truth labels, how should I deal with the classification loss? Does this term affect the final mAP a lot? Would you please share some insights or results about this?
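For concreteness, here is a minimal sketch (PyTorch) of the three supervised terms I mean; each of them consumes targets derived from the GT boxes. All names here are placeholders, not the repository's actual code.

```python
# Minimal sketch of the three supervised loss terms; all names are
# placeholders, not the repository's actual code.
import torch.nn.functional as F

def supervised_losses(cls_logits, cls_targets, bbox_pred, bbox_target,
                      dist_pred, dfl_target):
    # Classification loss: needs per-location GT class labels.
    # (GFL actually uses quality focal loss; plain CE for brevity.)
    loss_cls = F.cross_entropy(cls_logits, cls_targets)

    # Bbox regression loss: needs GT box coordinates.
    # (The paper uses an IoU-based loss; L1 for brevity.)
    loss_bbox = F.l1_loss(bbox_pred, bbox_target)

    # DFL: cross-entropy between the predicted edge distribution and the
    # two integer bins bracketing the continuous GT offset (assumes
    # 0 <= dfl_target < n_bins - 1).
    tl = dfl_target.long()            # left bin index
    tr = tl + 1                       # right bin index
    wl = tr.float() - dfl_target      # weight on the left bin
    wr = dfl_target - tl.float()      # weight on the right bin
    loss_dfl = (F.cross_entropy(dist_pred, tl, reduction="none") * wl +
                F.cross_entropy(dist_pred, tr, reduction="none") * wr).mean()

    return loss_cls, loss_bbox, loss_dfl
```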
Comments

I removed the classification loss and it is training now.
Thanks for your reply, but I can't quite understand this sentence. Please let me express my question more clearly. Currently, the implementation of
I'm trying an experiment without cls_loss. By the way, why do KD methods remove the GT annotations? Is there any literature on this?

You can of course disable cls_loss, bbox_loss, and DFL; however, the label assignment still leverages the GT information (i.e., it decides where to distill). If you remove these three losses and distill on the full-map locations, then no GT information is used. But note that even if you do so, the teacher detector was itself trained with GT annotations.
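To make the two options concrete, here is a rough sketch (PyTorch; illustrative names, not the actual implementation) of the classification KD term applied either at GT-assigned positives or over the full map:

```python
# Rough sketch of *where* the classification KD term is applied; all
# names are illustrative. `pos_mask` comes from GT-based label
# assignment, so passing it still leaks GT information into training.
import torch.nn.functional as F

def cls_kd_loss(student_logits, teacher_logits, pos_mask=None, T=2.0):
    # pos_mask is None -> distill on the full feature map (no GT used);
    # pos_mask given   -> distill only at GT-assigned positive locations.
    if pos_mask is not None:
        student_logits = student_logits[pos_mask]
        teacher_logits = teacher_logits[pos_mask]
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
```

Under this sketch, a fully GT-free setting would call `cls_kd_loss(student_logits, teacher_logits)` with no mask and drop the three supervised terms entirely.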
The training settings are bbox_loss on positive locations and classification KD on full-map locations. Removing cls_loss causes a significant AP drop (7.4 points).
All my doubts are cleared. Thanks a lot!