Segment Anything troubles when using ViT-B Backbone

## Description

I am attempting to deploy a finetuned Segment Anything model with a ViT-B backbone (instead of the original ViT-H). While the Nuclio function initializes correctly and the interactor is functional, the resulting mask quality is extremely poor. I suspect that the issue stems from the ONNX decoder.

To test this, I exported the finetuned ViT-B model to ONNX myself. The output types and dimensions of the `decoder.onnx` used in CVAT are not the same as those produced by the [export script](https://github.com/facebookresearch/segment-anything/blob/main/scripts/export_onnx_model.py) provided in the Segment Anything GitHub repository.

## Steps to Reproduce

1. Finetune the Segment Anything model using a ViT-B backbone.
2. Adjust `model_type` and `weights_path` in `model_handler.py`.
3. Deploy the model using the Nuclio function in CVAT.
4. Use the interactor in CVAT to generate masks.
5. Observe that the quality of the masks is significantly degraded compared to when I use the finetuned model locally.
6. Export the finetuned ViT-B SAM model to ONNX; the same issue occurs.

## Observed ONNX Model Outputs

**CVAT-provided `decoder.onnx`:**

- **masks:** `uint8[Slicemasks_dim_0,Slicemasks_dim_1,Slicemasks_dim_2,Slicemasks_dim_3]`
- **iou_predictions:** `float32[Unsqueezeiou_predictions_dim_0,1]`
- **low_res_masks:** `float32[Unsqueezelow_res_masks_dim_0,1,Unsqueezelow_res_masks_dim_2,Unsqueezelow_res_masks_dim_3]`
- **xtl, ytl, xbr, ybr:** `int64`

**My Exported Model:**

- **masks:** `float32[Resizemasks_dim_0,Resizemasks_dim_1,Resizemasks_dim_2,Resizemasks_dim_3]`
- **iou_predictions:** `float32[Gemmiou_predictions_dim_0,4]`
- **low_res_masks:** `float32[Reshapelow_res_masks_dim_0,Reshapelow_res_masks_dim_1,Reshapelow_res_masks_dim_2,Reshapelow_res_masks_dim_3]`
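
For anyone who wants to reproduce the comparison without reading the graphs by eye, here is a minimal sketch. It assumes `onnxruntime` is installed for the `output_signature` helper. Both helper names are hypothetical and are not part of CVAT or Segment Anything. The example signatures are transcribed from the two listings above, with `"N"`, `"C"`, `"H"`, `"W"` standing in for the symbolic dimension names; the shape of the `xtl, ytl, xbr, ybr` outputs is not given above, so `()` is only a placeholder.

```python
def output_signature(model_path):
    """Return {output_name: (type_string, shape)} for an ONNX file.

    Assumes onnxruntime is installed; it is imported lazily so that
    diff_signatures below can be used without it.
    """
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return {out.name: (out.type, tuple(out.shape)) for out in session.get_outputs()}


def diff_signatures(reference, candidate):
    """List human-readable differences between two output signatures."""
    problems = []
    for name in sorted(set(reference) | set(candidate)):
        if name not in candidate:
            problems.append(f"{name}: missing from candidate")
            continue
        if name not in reference:
            problems.append(f"{name}: extra in candidate")
            continue
        ref_type, ref_shape = reference[name]
        cand_type, cand_shape = candidate[name]
        if ref_type != cand_type:
            problems.append(f"{name}: dtype {ref_type} vs {cand_type}")
        if len(ref_shape) != len(cand_shape):
            problems.append(f"{name}: rank {len(ref_shape)} vs {len(cand_shape)}")
            continue
        # Symbolic dims are strings (or None); only static dims are compared.
        for axis, (r, c) in enumerate(zip(ref_shape, cand_shape)):
            if isinstance(r, int) and isinstance(c, int) and r != c:
                problems.append(f"{name}: dim {axis} is {r} vs {c}")
    return problems


# Signatures transcribed from the listings above (symbolic dims abbreviated).
cvat_sig = {
    "masks": ("tensor(uint8)", ("N", "C", "H", "W")),
    "iou_predictions": ("tensor(float)", ("N", 1)),
    "low_res_masks": ("tensor(float)", ("N", 1, "H", "W")),
    "xtl": ("tensor(int64)", ()),  # shape not stated in the issue; placeholder
    "ytl": ("tensor(int64)", ()),
    "xbr": ("tensor(int64)", ()),
    "ybr": ("tensor(int64)", ()),
}
my_sig = {
    "masks": ("tensor(float)", ("N", "C", "H", "W")),
    "iou_predictions": ("tensor(float)", ("N", 4)),
    "low_res_masks": ("tensor(float)", ("N", "C", "H", "W")),
}

for line in diff_signatures(cvat_sig, my_sig):
    print(line)
```

With real files, `diff_signatures(output_signature("cvat_decoder.onnx"), output_signature("my_decoder.onnx"))` would do the same check; both file names are placeholders.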

## Analysis & Suspicions

It appears that Segment Anything’s export process for the ViT-B decoder does not currently include bounding box outputs (`xtl, ytl, xbr, ybr`). Additionally, I am encountering issues when trying to quantize the model using the provided script in the Segment Anything repository.
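
As a possible workaround while the original export script is unavailable, the missing outputs could be derived after inference. The sketch below is only my guess at what the CVAT decoder's extra outputs represent (one binarized `uint8` mask plus its tight bounding box); it is not the code CVAT uses. The function name `to_cvat_like_outputs` is hypothetical, the inclusive `xbr`/`ybr` convention is an assumption, and the threshold of 0.0 follows SAM's default `mask_threshold`. The upstream export script also has a `--return-single-mask` option, which may account for the `1` vs `4` difference in `iou_predictions`.

```python
import numpy as np


def to_cvat_like_outputs(masks, iou_predictions, threshold=0.0):
    """Pick the best mask, binarize it, and compute its bounding box.

    masks:           float array of shape [1, N, H, W] (mask logits)
    iou_predictions: float array of shape [1, N]
    Returns (uint8 mask of shape [H, W], (xtl, ytl, xbr, ybr) or None).
    """
    # Choose the candidate with the highest predicted IoU.
    best = int(np.argmax(iou_predictions[0]))
    binary = (masks[0, best] > threshold).astype(np.uint8)

    # Tight box around the foreground pixels (inclusive corners, by assumption).
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return binary, None  # empty mask: no box to report
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return binary, box


# Tiny synthetic example: 4 candidate masks, only index 2 has foreground.
demo_masks = -np.ones((1, 4, 6, 8), dtype=np.float32)
demo_masks[0, 2, 1:4, 2:6] = 1.0
demo_iou = np.array([[0.1, 0.2, 0.9, 0.3]], dtype=np.float32)

demo_mask, demo_box = to_cvat_like_outputs(demo_masks, demo_iou)
print(demo_mask.dtype, demo_box)
```

Doing this in Python keeps the stock export untouched, at the cost of the Nuclio handler having to know about the extra step.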

## Questions / Request for Guidance

- Are there any recommended modifications or additional export steps to ensure compatibility when switching from a ViT-H to a ViT-B backbone?
- Could you please provide the script used to export the ONNX decoder used in CVAT?
- Any guidance or suggestions to resolve these discrepancies would be greatly appreciated.

Thank you!

