Description
I am attempting to deploy a finetuned Segment Anything model with a ViT-B backbone (instead of the original ViT-H). The Nuclio function initializes correctly and the interactor is functional, but the resulting mask quality is extremely poor. I suspect that the issue stems from the ONNX decoder.
For that reason I tried exporting the finetuned ViT-B model as an ONNX file. The outputs of the decoder.onnx provided with CVAT do not match those produced by the export script in the Segment Anything GitHub repository: the data types, dimensions, and set of outputs all differ.
Steps to Reproduce
- Finetune the Segment Anything model using a ViT-B backbone.
- Adjust `model_type` and `weights_path` in `model_handler.py`.
- Deploy the model using the Nuclio function in CVAT.
- Use the interactor in CVAT to generate masks.
- Observe that the quality of the masks is significantly degraded compared to when I use the finetuned model locally.
- Export the ONNX model for the finetuned SAM with the ViT-B backbone. The same issue occurs.
Observed ONNX Model Outputs
CVAT-provided decoder.onnx:
- masks: `uint8[Slicemasks_dim_0,Slicemasks_dim_1,Slicemasks_dim_2,Slicemasks_dim_3]`
- iou_predictions: `float32[Unsqueezeiou_predictions_dim_0,1]`
- low_res_masks: `float32[Unsqueezelow_res_masks_dim_0,1,Unsqueezelow_res_masks_dim_2,Unsqueezelow_res_masks_dim_3]`
- xtl, ytl, xbr, ybr: `int64`
My Exported Model:
- masks: `float32[Resizemasks_dim_0,Resizemasks_dim_1,Resizemasks_dim_2,Resizemasks_dim_3]`
- iou_predictions: `float32[Gemmiou_predictions_dim_0,4]`
- low_res_masks: `float32[Reshapelow_res_masks_dim_0,Reshapelow_res_masks_dim_1,Reshapelow_res_masks_dim_2,Reshapelow_res_masks_dim_3]`
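For anyone who wants to reproduce the comparison, a script along these lines can read the output signature of each decoder and list where they differ. It is only a sketch: `read_signature` and `diff_signatures` are hypothetical helper names, the two example dictionaries are transcribed by hand from the listings above, and the shape of the four box outputs is not shown in the listing, so an empty tuple (scalar) is assumed for them.

```python
def read_signature(path):
    """Return {output_name: (dtype_name, static_dims)} for an ONNX file.

    Requires the `onnx` package; it is imported lazily so that the
    comparison helper below can be used without it.
    """
    import onnx

    model = onnx.load(path)
    signature = {}
    for out in model.graph.output:
        tensor = out.type.tensor_type
        dtype = onnx.TensorProto.DataType.Name(tensor.elem_type)
        # Symbolic (dynamic) dimensions become None so that auto-generated
        # names such as "Slicemasks_dim_0" do not count as differences.
        dims = tuple(
            d.dim_value if d.HasField("dim_value") else None
            for d in tensor.shape.dim
        )
        signature[out.name] = (dtype, dims)
    return signature


def diff_signatures(reference, candidate):
    """List human-readable differences between two output signatures."""
    report = []
    for name in sorted(set(reference) | set(candidate)):
        if name not in candidate:
            report.append(f"{name}: missing from candidate")
        elif name not in reference:
            report.append(f"{name}: not present in reference")
        elif reference[name] != candidate[name]:
            report.append(f"{name}: {reference[name]} vs {candidate[name]}")
    return report


# Signatures transcribed from the listings in this issue.
cvat_decoder = {
    "masks": ("UINT8", (None, None, None, None)),
    "iou_predictions": ("FLOAT", (None, 1)),
    "low_res_masks": ("FLOAT", (None, 1, None, None)),
    # Shape not shown in the listing; scalar is an assumption.
    "xtl": ("INT64", ()),
    "ytl": ("INT64", ()),
    "xbr": ("INT64", ()),
    "ybr": ("INT64", ()),
}
my_export = {
    "masks": ("FLOAT", (None, None, None, None)),
    "iou_predictions": ("FLOAT", (None, 4)),
    "low_res_masks": ("FLOAT", (None, None, None, None)),
}

differences = diff_signatures(cvat_decoder, my_export)
for line in differences:
    print(line)
```

With real files, `diff_signatures(read_signature("decoder.onnx"), read_signature("my_export.onnx"))` would give the same kind of report; both file names are placeholders.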
Analysis & Suspicions
It appears that Segment Anything’s export process for the ViT-B decoder does not currently include bounding box outputs (xtl, ytl, xbr, ybr). Additionally, I am encountering issues when trying to quantize the model using the provided script in the Segment Anything repository.
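One possible reading of the difference is that the CVAT decoder keeps a single mask, binarizes it, and reports its bounding box, while the stock export returns four float mask logits and four IoU scores. The sketch below applies that interpretation as a post-processing step on the stock outputs. Everything here is an assumption rather than CVAT's actual export logic: `to_cvat_like` is a hypothetical name, the threshold of 0.0 is SAM's default and may not suit a finetuned model, and the box is taken to be the tight bounds of the selected mask. The export script in the Segment Anything repository also has a `--return-single-mask` option, which may account for the `[N,1]` shape.

```python
import numpy as np

# SAM's default mask logit threshold; assumed to still apply after finetuning.
MASK_THRESHOLD = 0.0


def to_cvat_like(masks, iou_predictions):
    """Reduce stock decoder outputs to one binary mask and its bounding box.

    masks:           float32 array of logits, shape [1, K, H, W]
    iou_predictions: float32 array, shape [1, K]
    Returns (uint8 mask of shape [H, W], (xtl, ytl, xbr, ybr) or None).
    """
    # Keep the candidate mask with the highest predicted IoU.
    best = int(np.argmax(iou_predictions[0]))
    binary = (masks[0, best] > MASK_THRESHOLD).astype(np.uint8)

    # Tight bounding box around the foreground pixels, if there are any.
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return binary, None
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return binary, box


# Synthetic example: only candidate 2 has a foreground region.
demo_masks = np.full((1, 4, 6, 8), -1.0, dtype=np.float32)
demo_masks[0, 2, 1:4, 2:5] = 1.0
demo_iou = np.array([[0.1, 0.2, 0.9, 0.3]], dtype=np.float32)

demo_mask, demo_box = to_cvat_like(demo_masks, demo_iou)
print(demo_mask.dtype, demo_mask.shape, demo_box)
```

If the CVAT decoder does something different (for example a different threshold or box convention), this step would need to change accordingly, which is why the original export script would be helpful.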
Questions / Request for Guidance
- Are there any recommended modifications or additional export steps to ensure compatibility when switching from a ViT-H to a ViT-B backbone?
- Could you please provide the script used to export the decoder.onnx model provided with CVAT?
- Any guidance or suggestions to resolve these discrepancies would be greatly appreciated.
Thank you!