Feature Idea: Incorporate "Segment Anything" #5984

Closed
M-Colley opened this issue Apr 5, 2023 · 14 comments · Fixed by #6019

@M-Colley

M-Colley commented Apr 5, 2023

Hello, it is great that you support out-of-the-box models like YOLOv7. Do you also plan to include the latest FAIR model, "Segment Anything"? I think that could be very helpful!

https://github.com/facebookresearch/segment-anything
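
For context, a minimal sketch of prompting SAM with the official `segment-anything` package (the checkpoint path, image file, and click coordinates below are placeholders, not CVAT code):

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a checkpoint (path is a placeholder) and build a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Compute the image embedding once, then prompt interactively as many times as needed.
image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click (label 1); negative clicks would use label 0.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # returns three candidate masks with IoU scores
)
```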

Kind regards

@timmermansjoy

@M-Colley since they support Hugging Face and Roboflow models, you could also just make the SAM model available there and then import it.

However, because this is such a strong model, they should add it to the built-in models, in my opinion.

@nmanovic
Contributor

nmanovic commented Apr 6, 2023

@M-Colley, we are discussing how to do that. I agree that the model is very strong. Thanks for the heads-up!

@nmanovic nmanovic added the models label Apr 6, 2023
@medphisiker

> @M-Colley, we are discussing how to do that. I agree that the model is very strong. Thanks for the heads-up!

Thank you, that would be fantastic!

@M-Colley
Author

Very cool!

I came across this additional project that combines BLIP, GroundingDINO, and Stable Diffusion: https://github.com/IDEA-Research/Grounded-Segment-Anything

It might also be worth taking a look at :)

Kind regards

@anuragxel

I wrote a simple labelling tool on top of SAM. I think CVAT really needs this as a feature; it will help a lot of people. Feel free to attribute and borrow helpers from my tool if needed:

https://github.com/anuragxel/salt

@bsekachev
Member

Hi guys, we implemented the first prototype here: #6008

This should work well on GPU for a self-hosted solution.
For our platform we are going to look for a different approach, because the current architecture will not scale to a large number of customers.

@modyngs

modyngs commented Apr 12, 2023

This one also works for video:
https://github.com/kadirnar/segment-anything-video

nmanovic pushed a commit that referenced this issue Apr 12, 2023
The idea of the PR is to finish #5990.
Deploy for GPU: ``./deploy_gpu.sh pytorch/facebookresearch/sam/nuclio/``
Deploy for CPU: ``./deploy_cpu.sh pytorch/facebookresearch/sam/nuclio/``


If you want to use a GPU, be sure you set up Docker following this
[guide](https://github.com/NVIDIA/nvidia-docker/blob/master/README.md#quickstart).

Resolved issue #5984, but the interface can probably be improved.

Co-authored-by: Alx-Wo <alexander.wolpert@googlemail.com>
@medphisiker

> Hi guys, we implemented the first prototype here: #6008
>
> This should work well on GPU for a self-hosted solution. For our platform we are going to look for a different approach, because the current architecture will not scale to a large number of customers.

Thank you very much for integrating this neural network! It works like f-BRS, but is much more accurate. It's great that it offers inference on both CPU and GPU.

@modyngs

modyngs commented Apr 15, 2023

@bsekachev
Is there any plan to implement it in tracker mode?
Thanks

@medphisiker

medphisiker commented Apr 17, 2023

> @bsekachev Is there any plan to implement it in tracker mode? Thanks

Also, there is a very cool XMem model for tracking masks (link).
Its video demonstrations look fantastic.
I wrote about it in this issue (link).

@descilla

descilla commented Apr 17, 2023

First of all, thank you for the quick integration of SAM. SAM really seems to be a huge breakthrough.

Unfortunately, at the moment, only positive and negative points can be used. However, SAM also supports the use of bounding boxes and the combination of bounding boxes and points.

I played around with it a bit (adjusted the serverless function) and was able to use bounding boxes. However, with the following limitations:

  • At least one additional point must always be set for the function to be "triggered".
  • The bounding box is only used and visible in the first iteration; it disappears when more points are added.

Of course, it could be that I have just misunderstood something, but I assume that these are limitations of the CVAT interface for serverless functions, as I could only find the three parameters min_pos_points, min_neg_points, and startswith_box.

Do you think there is hope that the CVAT interface can be adapted/expanded to make full use of SAM's capabilities? The use of (additional) bounding boxes seems to significantly improve the results in my use case.
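
For what it's worth, the upstream predictor already accepts a box together with point prompts, so this looks feasible on the model side; a rough sketch (the coordinates are made up, and `predictor` is a `SamPredictor` set up as in the snippet earlier in this thread, not the CVAT serverless interface):

```python
import numpy as np

# Combine a bounding box with positive/negative clicks in a single prompt.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[520, 360], [480, 400]]),
    point_labels=np.array([1, 0]),       # 1 = positive click, 0 = negative click
    box=np.array([400, 300, 700, 500]),  # box prompt in XYXY format
    multimask_output=False,
)
```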

@shortcipher3
Contributor

Track Anything would be super cool too:
https://github.com/gaomingqi/Track-Anything

bsekachev added a commit that referenced this issue May 11, 2023

### Motivation and context
Resolved #5984 
Resolved #6049
Resolved #6041

- Compatible only with ``sam_vit_h_4b8939.pth`` weights. To support other weights, the ONNX mask decoder needs to be re-exported with some custom model changes (see below), or the pre-exported decoders can be downloaded using the links below.
- The serverless function needs to be redeployed because its interface has been changed.

Decoders for other weights:
sam_vit_l_0b3195.pth:
[Download](https://drive.google.com/file/d/1Nb5CJKQm_6s1n3xLSZYso6VNgljjfR-6/view?usp=sharing)
sam_vit_b_01ec64.pth:
[Download](https://drive.google.com/file/d/17cZAXBPaOABS170c9bcj9PdQsMziiBHw/view?usp=sharing)

Changes made in the ONNX export code:
```
git diff scripts/export_onnx_model.py
diff --git a/scripts/export_onnx_model.py b/scripts/export_onnx_model.py
index 8441258..18d5be7 100644
--- a/scripts/export_onnx_model.py
+++ b/scripts/export_onnx_model.py
@@ -138,7 +138,7 @@ def run_export(

     _ = onnx_model(**dummy_inputs)

-    output_names = ["masks", "iou_predictions", "low_res_masks"]
+    output_names = ["masks", "iou_predictions", "low_res_masks", "xtl", "ytl", "xbr", "ybr"]

     with warnings.catch_warnings():
         warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)
bsekachev@DESKTOP-OTBLK26:~/sam$ git diff segment_anything/utils/onnx.py
diff --git a/segment_anything/utils/onnx.py b/segment_anything/utils/onnx.py
index 3196bdf..85729c1 100644
--- a/segment_anything/utils/onnx.py
+++ b/segment_anything/utils/onnx.py
@@ -87,7 +87,15 @@ class SamOnnxModel(nn.Module):
         orig_im_size = orig_im_size.to(torch.int64)
         h, w = orig_im_size[0], orig_im_size[1]
         masks = F.interpolate(masks, size=(h, w), mode="bilinear", align_corners=False)
-        return masks
+        masks = torch.gt(masks, 0).to(torch.uint8)
+        nonzero = torch.nonzero(masks)
+        xindices = nonzero[:, 3:4]
+        yindices = nonzero[:, 2:3]
+        ytl = torch.min(yindices).to(torch.int64)
+        ybr = torch.max(yindices).to(torch.int64)
+        xtl = torch.min(xindices).to(torch.int64)
+        xbr = torch.max(xindices).to(torch.int64)
+        return masks[:, :, ytl:ybr + 1, xtl:xbr + 1], xtl, ytl, xbr, ybr

     def select_masks(
         self, masks: torch.Tensor, iou_preds: torch.Tensor, num_points: int
@@ -132,7 +140,7 @@ class SamOnnxModel(nn.Module):
         if self.return_single_mask:
             masks, scores = self.select_masks(masks, scores, point_coords.shape[1])

-        upscaled_masks = self.mask_postprocessing(masks, orig_im_size)
+        upscaled_masks, xtl, ytl, xbr, ybr = self.mask_postprocessing(masks, orig_im_size)

         if self.return_extra_metrics:
             stability_scores = calculate_stability_score(
@@ -141,4 +149,4 @@ class SamOnnxModel(nn.Module):
             areas = (upscaled_masks > self.model.mask_threshold).sum(-1).sum(-1)
             return upscaled_masks, scores, stability_scores, areas, masks

-        return upscaled_masks, scores, masks
+        return upscaled_masks, scores, masks, xtl, ytl, xbr, ybr
```
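
As a standalone illustration of what the patch above computes (plain NumPy on a toy array, not SAM or CVAT code): the mask is thresholded, the tight bounding box of the non-zero region is found, and the cropped mask plus the box coordinates are returned.

```python
import numpy as np

# Toy binary mask of shape (1, 1, H, W), mimicking the thresholded decoder output.
mask = np.zeros((1, 1, 8, 10), dtype=np.uint8)
mask[0, 0, 2:5, 3:7] = 1

# Same idea as the patched mask_postprocessing: find the tight bounding box
# of the non-zero region and return the cropped mask plus its coordinates.
ys, xs = np.nonzero(mask[0, 0])
ytl, ybr = ys.min(), ys.max()
xtl, xbr = xs.min(), xs.max()
cropped = mask[:, :, ytl:ybr + 1, xtl:xbr + 1]

print(cropped.shape)       # (1, 1, 3, 4)
print(xtl, ytl, xbr, ybr)  # 3 2 6 4
```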

### How has this been tested?

### Checklist
- [x] I submit my changes into the `develop` branch
- [x] I have added a description of my changes into the
[CHANGELOG](https://github.com/opencv/cvat/blob/develop/CHANGELOG.md)
file
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [x] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [x] I have increased versions of npm packages if it is necessary
([cvat-canvas](https://github.com/opencv/cvat/tree/develop/cvat-canvas#versioning),
[cvat-core](https://github.com/opencv/cvat/tree/develop/cvat-core#versioning),
[cvat-data](https://github.com/opencv/cvat/tree/develop/cvat-data#versioning) and
[cvat-ui](https://github.com/opencv/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](
https://github.com/opencv/cvat/blob/develop/LICENSE) that covers the
project.
  Feel free to contact the maintainers if that's a concern.
@bsekachev
Member

Hi @descilla

Thank you for reporting. Let's open a dedicated issue about bounding-box support and why it is necessary.

@bsekachev
Member

Hi @shortcipher3

Let's also open a separate issue about a SAM tracker, if necessary.

mikhail-treskin pushed a commit to retailnext/cvat that referenced this issue Jul 1, 2023
mikhail-treskin pushed a commit to retailnext/cvat that referenced this issue Jul 1, 2023