
Segment Anything problems when using a ViT-B backbone #9136

@raphp-ait

Description

I am trying to deploy a fine-tuned Segment Anything model with a ViT-B backbone instead of the original ViT-H. The Nuclio function initializes correctly and the interactor works, but the resulting masks are of extremely poor quality. I suspect the ONNX decoder is the cause.
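One possible explanation for "it runs, but the masks are bad", assuming the decoder.onnx shipped with CVAT was exported from the stock ViT-H checkpoint: every SAM variant projects its image features down to the same 256x64x64 embedding, so a ViT-H decoder accepts ViT-B embeddings without any shape error even though its weights were trained for a different (and, here, not fine-tuned) encoder. The sketch below only restates that shape arithmetic from the segment_anything build configuration (image size 1024, patch size 16, prompt embedding dimension 256); the constant names and the helper `decoder_input_shapes` are hypothetical.

```python
# Encoder widths of the three SAM variants (internal to the image encoder).
ENCODER_EMBED_DIM = {"vit_b": 768, "vit_l": 1024, "vit_h": 1280}

IMAGE_SIZE = 1024       # input resolution of the SAM image encoder
PATCH_SIZE = 16         # ViT patch size shared by all three variants
PROMPT_EMBED_DIM = 256  # channels of the image embedding after the encoder neck


def decoder_input_shapes(model_type: str) -> dict:
    """Return the per-image tensor shapes the ONNX decoder expects."""
    if model_type not in ENCODER_EMBED_DIM:
        raise ValueError(f"unknown SAM model type: {model_type}")
    grid = IMAGE_SIZE // PATCH_SIZE  # 64x64 embedding grid
    return {
        # The neck projects 768/1024/1280 channels down to 256, so the
        # encoder width never reaches the decoder's inputs.
        "image_embeddings": (1, PROMPT_EMBED_DIM, grid, grid),
        "mask_input": (1, 1, 4 * grid, 4 * grid),
    }


for variant, width in ENCODER_EMBED_DIM.items():
    # Different encoder widths, identical decoder input shapes.
    print(variant, width, decoder_input_shapes(variant))
```

If this is the cause, the fix would be to export the decoder from the same fine-tuned checkpoint that the Nuclio function uses for the encoder, rather than reusing the ViT-H decoder.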

To check this, I exported my fine-tuned ViT-B model to ONNX myself. The outputs of the decoder.onnx shipped with CVAT do not have the same dimensions as those produced by the export script in the Segment Anything GitHub repository.

Steps to Reproduce

  1. Fine-tune the Segment Anything model with a ViT-B backbone.
  2. Adjust model_type and weights_path in model_handler.py.
  3. Deploy the model as a Nuclio function in CVAT.
  4. Use the interactor in CVAT to generate masks.
  5. Observe that the masks are significantly worse than when I run the fine-tuned model locally.
  6. Export the ONNX model for the fine-tuned ViT-B SAM; the same issues occur.

Observed ONNX Model Outputs

CVAT-provided decoder.onnx:

  • masks: uint8[Slicemasks_dim_0,Slicemasks_dim_1,Slicemasks_dim_2,Slicemasks_dim_3]
  • iou_predictions: float32[Unsqueezeiou_predictions_dim_0,1]
  • low_res_masks: float32[Unsqueezelow_res_masks_dim_0,1,Unsqueezelow_res_masks_dim_2,Unsqueezelow_res_masks_dim_3]
  • xtl, ytl, xbr, ybr: int64

My Exported Model:

  • masks: float32[Resizemasks_dim_0,Resizemasks_dim_1,Resizemasks_dim_2,Resizemasks_dim_3]
  • iou_predictions: float32[Gemmiou_predictions_dim_0,4]
  • low_res_masks: float32[Reshapelow_res_masks_dim_0,Reshapelow_res_masks_dim_1,Reshapelow_res_masks_dim_2,Reshapelow_res_masks_dim_3]
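The `4` in `iou_predictions` suggests the export returned all four candidate masks, whereas CVAT's decoder returns one (`[..., 1]`). In the segment_anything package, `SamOnnxModel(..., return_single_mask=True)` keeps only the best-scoring mask. The sketch below follows the structure of the repository's ONNX export script, with that flag switched on; the function name, default opset 17, and file paths are assumptions, and it does not reproduce CVAT's uint8 `masks` or box outputs.

```python
import warnings

# Output names used by the Segment Anything export script.
OUTPUT_NAMES = ["masks", "iou_predictions", "low_res_masks"]
# Only the number of prompt points varies between calls.
DYNAMIC_AXES = {
    "point_coords": {1: "num_points"},
    "point_labels": {1: "num_points"},
}


def export_sam_decoder(checkpoint: str, output: str,
                       model_type: str = "vit_b", opset: int = 17) -> None:
    """Export the prompt encoder + mask decoder of a (fine-tuned) SAM checkpoint."""
    # Heavy dependencies are imported lazily so this module can be
    # imported without torch or segment_anything installed.
    import torch
    from segment_anything import sam_model_registry
    from segment_anything.utils.onnx import SamOnnxModel

    sam = sam_model_registry[model_type](checkpoint=checkpoint)
    # return_single_mask=True keeps only the best mask, so iou_predictions
    # is [N, 1] instead of [N, 4].
    onnx_model = SamOnnxModel(model=sam, return_single_mask=True)

    embed_dim = sam.prompt_encoder.embed_dim
    embed_size = sam.prompt_encoder.image_embedding_size
    mask_input_size = [4 * x for x in embed_size]
    dummy_inputs = {
        "image_embeddings": torch.randn(1, embed_dim, *embed_size, dtype=torch.float),
        "point_coords": torch.randint(0, 1024, (1, 5, 2), dtype=torch.float),
        "point_labels": torch.randint(0, 4, (1, 5), dtype=torch.float),
        "mask_input": torch.randn(1, 1, *mask_input_size, dtype=torch.float),
        "has_mask_input": torch.tensor([1], dtype=torch.float),
        "orig_im_size": torch.tensor([1500, 2250], dtype=torch.float),
    }
    _ = onnx_model(**dummy_inputs)  # sanity forward pass before tracing

    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)
        warnings.filterwarnings("ignore", category=UserWarning)
        torch.onnx.export(
            onnx_model,
            tuple(dummy_inputs.values()),
            output,
            export_params=True,
            opset_version=opset,
            do_constant_folding=True,
            input_names=list(dummy_inputs.keys()),
            output_names=OUTPUT_NAMES,
            dynamic_axes=DYNAMIC_AXES,
        )
```

Example call (hypothetical paths): `export_sam_decoder("sam_vit_b_finetuned.pth", "decoder_vit_b.onnx")`.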

Analysis & Suspicions

It appears that the Segment Anything export process for the ViT-B decoder does not include the bounding-box outputs (xtl, ytl, xbr, ybr) that CVAT's decoder.onnx provides. I also run into problems when quantizing the model with the script provided in the Segment Anything repository.
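If the missing outputs are what CVAT needs, they can in principle be derived from the standard decoder output. The sketch below assumes (not confirmed from CVAT's code) that `masks` is the mask binarized at SAM's default threshold of 0.0 and that xtl/ytl/xbr/ybr is a tight, inclusive pixel box around its foreground; whether CVAT also crops the mask to that box is not verified. It uses numpy on the decoder output after inference; the helper name `binarize_and_box` is hypothetical, and the same operations could instead be folded into a torch wrapper before export.

```python
import numpy as np

MASK_THRESHOLD = 0.0  # SAM binarizes mask logits at 0.0 by default


def binarize_and_box(mask_logits: np.ndarray):
    """Turn one HxW mask of logits into a uint8 mask and a tight box.

    Returns (mask_uint8, (xtl, ytl, xbr, ybr)) with inclusive pixel
    coordinates, or (mask_uint8, None) if the mask is empty.
    """
    mask = (mask_logits > MASK_THRESHOLD).astype(np.uint8)
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground
    if xs.size == 0:
        return mask, None
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return mask, box


# Toy example: a 2x2 foreground block in a 4x5 logit map.
logits = np.full((4, 5), -1.0)
logits[1:3, 2:4] = 2.0
mask, box = binarize_and_box(logits)
print(box)  # (2, 1, 3, 2)
```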

Questions / Request for Guidance

  • Are there any recommended modifications or additional export steps to ensure compatibility when switching from a ViT-H to a ViT-B backbone?
  • Could you please share the script used to export CVAT's decoder.onnx?
  • Any guidance or suggestions to resolve these discrepancies would be greatly appreciated.
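On the quantization problem mentioned above: the Segment Anything export script quantizes with onnxruntime's `quantize_dynamic` and passes `optimize_model=True`. That argument may be rejected by newer onnxruntime releases, which is one possible cause, although this is a guess since the error message is not included here. A minimal sketch that calls `quantize_dynamic` without it, assuming a recent onnxruntime; the function and constant names are hypothetical.

```python
from pathlib import Path

# Arguments mirroring the Segment Anything script, minus optimize_model.
QUANTIZE_KWARGS = {"per_channel": False, "reduce_range": False}


def quantize_decoder(onnx_in: str, onnx_out: str) -> Path:
    """Dynamically quantize an exported SAM decoder to uint8 weights."""
    # Imported lazily so the module loads without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=onnx_in,
        model_output=onnx_out,
        weight_type=QuantType.QUInt8,
        **QUANTIZE_KWARGS,
    )
    return Path(onnx_out)
```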

Thank you!

Labels: question (Further information is requested)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions