Inconsistency of test-dev result #9

Closed

tjqansthd opened this issue Oct 8, 2021 · 5 comments
@tjqansthd

Hi, thanks for your great work!

I tested your pre-trained model (R-101 3x) on test-dev2017 via the COCO evaluation server.
When I extract only the mask results and save them to JSON files, the segmentation scores match the results reported in the paper. However, when I save the mask results together with the box results (generated by lines 511~514 in sotr.py) to JSON files, the scores (AP_s, AP_m, AP_l) differ from the paper. (A sketch of the two result-file formats is given after the two logs below.)

This is the result from the JSON file that contains only mask information:

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.612
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.434
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.102
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.731
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.328
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733

This is the result from the JSON file that contains mask information together with box information:

overall performance
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.612
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.434
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.194
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.440
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.552
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.328
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733

As can be seen, the (AP_s, AP_m, AP_l) values of the first and second results are (0.102, 0.590, 0.731) and (0.194, 0.440, 0.552), respectively.
I don't know the detailed evaluation process of the COCO evaluation server, so I wonder why the presence or absence of box information causes a difference in segmentation AP.
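
For reference, here is a minimal sketch of the two result-file variants being compared. The field names follow the standard COCO results format; the IDs, the RLE string, the box values, and the file names below are placeholders, not values from the actual run:

```python
import json

# A standard COCO instance-segmentation result entry:
# "segmentation" is a compressed RLE, "score" the instance confidence.
mask_only = {
    "image_id": 397133,  # placeholder
    "category_id": 1,
    "segmentation": {"size": [427, 640], "counts": "<compressed RLE>"},
    "score": 0.97,
}

# The same entry with an extra "bbox" field in [x, y, w, h] format,
# as produced when box results are saved alongside the masks.
mask_with_box = dict(mask_only, bbox=[102.5, 118.3, 180.0, 250.7])

# Both variants are accepted by the evaluation server, yet they yield
# different area-based metrics, as the two logs above show.
with open("results_mask_only.json", "w") as f:
    json.dump([mask_only], f)
with open("results_mask_with_box.json", "w") as f:
    json.dump([mask_with_box], f)
```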

@QuLiao1117 (Collaborator)

Thanks for your attention. We have not investigated this issue in depth; perhaps you can find the reason in https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/PythonAPI/pycocotools/cocoeval.py#L163.

@tjqansthd (Author)

The code seems to compute IoU separately for the box case and the segmentation case. All of the overall AP values are the same; only the area-specific AP values differ... I haven't found exactly which part causes the problem.
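
For context, the branch in question (abbreviated from COCOeval.computeIoU in pycocotools at the commit linked above) selects masks or boxes purely by iouType, so with iouType='segm' the IoU itself is computed from the masks whether or not the results carry boxes:

```python
# Abbreviated from pycocotools/cocoeval.py, COCOeval.computeIoU().
if p.iouType == 'segm':
    g = [g['segmentation'] for g in gt]
    d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
    g = [g['bbox'] for g in gt]
    d = [d['bbox'] for d in dt]
else:
    raise Exception('unknown iouType for iou computation')
# compute iou between each dt and gt region
iscrowd = [int(o['iscrowd']) for o in gt]
ious = maskUtils.iou(d, g, iscrowd)
```

So the IoU computation alone does not explain why only the area-binned metrics differ.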

@easton-cau (Owner)

For more detailed testing questions, please contact the COCO contributors.

@tjqansthd (Author)

In my opinion, the loadRes function at line 305 of https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/PythonAPI/pycocotools/coco.py causes the difference in the area-based results. As can be seen at lines 331 and 341, ann['area'] is calculated differently depending on whether box information is present: ann['area'] = bb[2]*bb[3] when there is box information, and ann['area'] = maskUtils.area(ann['segmentation']) when there isn't.
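
To make the divergence concrete, here is a minimal, self-contained sketch (a toy mask, unrelated to the actual SOTR results) showing how far the two area definitions can drift apart for the same segmentation:

```python
import numpy as np
from pycocotools import mask as maskUtils

# Toy binary mask: two blobs whose joint bounding box is much larger
# than the mask itself (Fortran order, as pycocotools' encode expects).
m = np.zeros((100, 100), dtype=np.uint8, order='F')
m[10:20, 10:20] = 1
m[60:90, 60:90] = 1
rle = maskUtils.encode(m)

mask_area = float(maskUtils.area(rle))  # what loadRes uses without a bbox
bb = maskUtils.toBbox(rle)              # [x, y, w, h]
box_area = float(bb[2] * bb[3])         # what loadRes uses with a bbox

print(mask_area)  # 1000.0 (10*10 + 30*30 foreground pixels) -> "small" (< 32**2)
print(box_area)   # 6400.0 (80*80 bounding box)              -> "medium"
```

Since the COCO thresholds (32² and 96²) are applied to this area field when binning detections into small/medium/large, the same detection can land in different bins under the two conventions.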

When I run the evaluation of SOLOv2_R-101_3x from AdelaiDet on test-dev, it shows a similar trend: (AP_s, AP_m, AP_l) is (0.178, 0.427, 0.546) when there is box information (similar to the result in the SOLOv2 paper), and (0.095, 0.574, 0.714) when there isn't.

@QuLiao1117 (Collaborator)

Thank you very much for raising this issue. After reading the code, we confirmed that these two calculation methods do lead to differences in the computed areas. We have added a note to the README to explain this problem.
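
For readers who want the paper-style (mask-area) numbers from a results file that already contains boxes, one possible workaround (a sketch of our own, not an official tool; the file names are hypothetical) is to strip the bbox fields before evaluation, so that loadRes falls back to maskUtils.area:

```python
import json

# Drop "bbox" fields so that COCO.loadRes computes 'area' from the masks,
# matching the mask-only evaluation protocol used for the paper numbers.
def strip_boxes(in_path, out_path):
    with open(in_path) as f:
        results = json.load(f)
    for ann in results:
        ann.pop("bbox", None)  # keep segmentation/score, drop the box
    with open(out_path, "w") as f:
        json.dump(results, f)

strip_boxes("results_mask_with_box.json", "results_mask_only.json")
```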
