About obtaining UVO data set AP or boundary AP #114
Hi, we use an earlier version of UVO, v0.5. It has class-agnostic labels and an image/frame set.
I would like to ask: across data sets such as COCO, HQ-YTVIS, LVIS, and UVO, when selecting boxes by score, is COCO the only one where you use not just the detector's box score but also the mask score (predicted by SAM's IoU token)? Do HQ-YTVIS, LVIS, and UVO use only the box score output by the detector, without SAM's predicted score?
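To make the question concrete, here is a minimal sketch of the two selection strategies being asked about: ranking by the detector's box score alone versus fusing it with SAM's predicted mask IoU. The weighted geometric mean and the `alpha` weight are assumptions for illustration, not the paper's exact formula.

```python
def combined_score(box_score: float, mask_score: float, alpha: float = 0.5) -> float:
    """Fuse the detector's box confidence with SAM's IoU-token prediction.
    A weighted geometric mean is one common choice; the fusion rule and
    alpha are hypothetical here, not taken from the paper."""
    return box_score ** alpha * mask_score ** (1 - alpha)

def select_top_boxes(detections, k: int = 100, use_mask_score: bool = True):
    """Rank (box_score, mask_score, box) triples and keep the top k.

    With use_mask_score=False this reduces to ranking by the detector's
    box score alone, the alternative the question describes."""
    if use_mask_score:
        key = lambda d: combined_score(d[0], d[1])
    else:
        key = lambda d: d[0]
    return sorted(detections, key=key, reverse=True)[:k]
```

With fusion enabled, a box with a mediocre detector score but a high predicted mask IoU can outrank a high-confidence box whose mask is predicted to be poor.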
When running instance-segmentation inference across data sets such as COCO, should I resize the image proportionally so that the long side reaches 1024 and then zero-pad to 1024x1024, rather than resizing directly to 1024x1024? (The public HQ-SAM inference code on HQSeg-44K resizes images directly to 1024x1024.) Is any other data augmentation used during inference?
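For reference, the "resize the long side, then zero-pad to a square" preprocessing the question describes can be sketched as below. This is a dependency-free nearest-neighbor version for illustration; the actual SAM pipeline uses its own resize transform and interpolation, so treat the details here as assumptions.

```python
import numpy as np

def resize_longest_side_and_pad(image: np.ndarray, target: int = 1024) -> np.ndarray:
    """Resize so the longer side equals `target`, keeping the aspect ratio,
    then zero-pad the bottom/right to a square target x target canvas.
    Nearest-neighbor resampling keeps this sketch self-contained."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    padded = np.zeros((target, target) + image.shape[2:], dtype=image.dtype)
    padded[:new_h, :new_w] = resized  # content in the top-left, zeros elsewhere
    return padded
```

This preserves the aspect ratio, unlike resizing directly to 1024x1024, which stretches non-square images.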
We provided the COCO evaluation code here: #113 (comment)
I followed the COCO testing procedure and evaluated on the UVO data set, but SAM-L only reached 29.2, while the paper reports 29.7. Could you provide the configuration file used to evaluate UVO? Do you keep only the top 100 boxes output by the detector for UVO? When evaluating UVO, do you also use the score predicted by SAM's IoU token to select boxes? Which framework did you use to evaluate UVO, e.g. Detectron2 or MMDetection? UVO v0.5 has only the single "object" class, while FocalNet-DINO outputs 91 categories. When computing AP, do you treat predictions whose category IDs fall outside the 80 COCO categories as background and the rest as foreground, i.e. effectively a two-class evaluation?
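One common way to reconcile a class-aware detector with UVO's class-agnostic ground truth is to collapse every predicted category into a single "object" class before running COCO-style evaluation. The sketch below assumes COCO-format result dicts and a single category id of 1; the actual id must match the one in the UVO annotation file, and whether the authors used this exact remapping is an assumption.

```python
def to_class_agnostic(coco_results: list, object_category_id: int = 1) -> list:
    """Map every predicted category_id to one class-agnostic 'object' id,
    leaving the original result dicts untouched. `object_category_id=1`
    is a placeholder; it must agree with the UVO annotation file."""
    return [dict(r, category_id=object_category_id) for r in coco_results]
```

After this remapping, AP is computed over a single foreground class, so per-category confusion (e.g. FocalNet-DINO's 91 COCO ids) no longer affects the matching.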
I would also like to ask: since UVO is a video segmentation data set, how is instance segmentation performed on it? UVO contains two kinds of videos: the dense split, labeled with COCO categories, and the sparse split, labeled only as COCO or non-COCO. Since the FocalNet-DINO you used was trained on COCO, did you use the dense split to measure the numbers in the paper?
Thank you.