Inconsistency of test-dev result #9

Closed

tjqansthd opened this issue Oct 8, 2021 · 5 comments
@tjqansthd

Hi, thanks for your great work!

I tested your pre-trained model (R-101 3x) on test-dev2017 via the COCO evaluation server.
When I extract only the mask results and save them to JSON files, the segmentation scores match the results reported in the paper. However, when I save the mask results together with the box results (generated by lines 511~514 in sotr.py) to JSON files, the scores (AP_s, AP_m, AP_l) differ from the paper. (A sketch of the two result-file formats is given after the two logs below.)

This is the result from the JSON file that contains only mask information:

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.612
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.434
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.102
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.731
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.328
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733

This is the result from the JSON file that contains mask information together with box information:

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.612
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.434
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.194
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.440
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.552
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.328
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733

As can be seen, the (AP_s, AP_m, AP_l) values of the first and second results are (0.102, 0.590, 0.731) and (0.194, 0.440, 0.552), respectively.
I don't know the detailed evaluation process of the COCO evaluation server, so I wonder why the presence or absence of box information causes a difference in segmentation AP.
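
For reference, here is a minimal sketch of the two result-file variants being compared. The field names follow the standard COCO results format; the IDs, the RLE string, the box values, and the file names below are placeholders, not values from the actual run:

```python
import json

# A standard COCO instance-segmentation result entry:
# "segmentation" is a compressed RLE, "score" the instance confidence.
mask_only = {
    "image_id": 397133,  # placeholder
    "category_id": 1,
    "segmentation": {"size": [427, 640], "counts": "<compressed RLE>"},
    "score": 0.97,
}

# The same entry with an extra "bbox" field in [x, y, w, h] format,
# as produced when box results are saved alongside the masks.
mask_with_box = dict(mask_only, bbox=[102.5, 118.3, 180.0, 250.7])

# Both variants are accepted by the evaluation server, yet they yield
# different area-based metrics, as the two logs above show.
with open("results_mask_only.json", "w") as f:
    json.dump([mask_only], f)
with open("results_mask_with_box.json", "w") as f:
    json.dump([mask_with_box], f)
```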

@QuLiao1117 (Collaborator)

Thanks for your attention. We have not investigated this issue in depth; perhaps you can find the reason in https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/PythonAPI/pycocotools/cocoeval.py#L163.

@tjqansthd (Author)

The code seems to compute IoU separately for the box case and the segmentation case. All of the overall AP values are the same; only the area-specific AP values differ... I haven't found exactly which part causes the problem.
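
For context, the branch in question (abbreviated from COCOeval.computeIoU in pycocotools at the commit linked above) selects masks or boxes purely by iouType, so with iouType='segm' the IoU itself is computed from the masks whether or not the results carry boxes:

```python
# Abbreviated from pycocotools/cocoeval.py, COCOeval.computeIoU().
if p.iouType == 'segm':
    g = [g['segmentation'] for g in gt]
    d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
    g = [g['bbox'] for g in gt]
    d = [d['bbox'] for d in dt]
else:
    raise Exception('unknown iouType for iou computation')
# compute iou between each dt and gt region
iscrowd = [int(o['iscrowd']) for o in gt]
ious = maskUtils.iou(d, g, iscrowd)
```

So the IoU computation alone does not explain why only the area-binned metrics differ.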

@easton-cau (Owner)

For more detailed testing questions, please contact the COCO contributors.

@tjqansthd (Author)

In my opinion, the loadRes function at line 305 of https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/PythonAPI/pycocotools/coco.py causes the difference in the area-based results. As can be seen at lines 331 and 341, ann['area'] is calculated differently depending on whether box information is present: ann['area'] = bb[2]*bb[3] when there is box information, and ann['area'] = maskUtils.area(ann['segmentation']) when there isn't.
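
To make the divergence concrete, here is a minimal, self-contained sketch (a toy mask, unrelated to the actual SOTR results) showing how far the two area definitions can drift apart for the same segmentation:

```python
import numpy as np
from pycocotools import mask as maskUtils

# Toy binary mask: two blobs whose joint bounding box is much larger
# than the mask itself (Fortran order, as pycocotools' encode expects).
m = np.zeros((100, 100), dtype=np.uint8, order='F')
m[10:20, 10:20] = 1
m[60:90, 60:90] = 1
rle = maskUtils.encode(m)

mask_area = float(maskUtils.area(rle))  # what loadRes uses without a bbox
bb = maskUtils.toBbox(rle)              # [x, y, w, h]
box_area = float(bb[2] * bb[3])         # what loadRes uses with a bbox

print(mask_area)  # 1000.0 (10*10 + 30*30 foreground pixels) -> "small" (< 32**2)
print(box_area)   # 6400.0 (80*80 bounding box)              -> "medium"
```

Since the COCO thresholds (32² and 96²) are applied to this area field when binning detections into small/medium/large, the same detection can land in different bins under the two conventions.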

When I run the evaluation of SOLOv2_R-101_3x from AdelaiDet on test-dev, it shows a similar trend: (AP_s, AP_m, AP_l) is (0.178, 0.427, 0.546) when there is box information (similar to the result in the SOLOv2 paper), and (0.095, 0.574, 0.714) when there isn't.

@QuLiao1117 (Collaborator)

Thank you very much for raising this issue. After reading the code, we confirmed that these two calculation methods do lead to differences in the computed areas. We have added a note to the README to explain this problem.
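
For readers who want the paper-style (mask-area) numbers from a results file that already contains boxes, one possible workaround (a sketch of our own, not an official tool; the file names are hypothetical) is to strip the bbox fields before evaluation, so that loadRes falls back to maskUtils.area:

```python
import json

# Drop "bbox" fields so that COCO.loadRes computes 'area' from the masks,
# matching the mask-only evaluation protocol used for the paper numbers.
def strip_boxes(in_path, out_path):
    with open(in_path) as f:
        results = json.load(f)
    for ann in results:
        ann.pop("bbox", None)  # keep segmentation/score, drop the box
    with open(out_path, "w") as f:
        json.dump(results, f)

strip_boxes("results_mask_with_box.json", "results_mask_only.json")
```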
