Segment Anything (SAM)

Segment Anything takes inspiration from chat based LLM where prompting is an integral part.

It contains three components.

Image Encoder
Prompt Encoder
Mask Decoder

Model Architecture

Firstly, the input image is passed through the image encoder which produces a one-time embedding for the image.

A prompt decoder for points, boxes or text.

For points, x and y coordinates along with foreground and background information becomes input to the encoder.
For boxes, bounding box coordinates become the input of the encoder
For text, tokens become the input.

In case, we provide a mask as an input, it directly goes through a downsampling stage. The downsampling happens using 2D convolution layers. Then the model concatenates it with the image embedding to get the final vector.

Now, any vector that the model gets from the $prompt vector + image embedding$ passes through a lightweight decoder that creates the final segmentation mask.

1. SAM Image Encoder

Image Encoder is one of the most powerful components of SAM. It is built upon MAE pretrained ViT model.

2. Prompt Encoder

In this, points, boxes and text act as sparse inputs and masks act as dense inputs. The creators of SAM represent points and bounding boxes using positional encodings and sum it with learned embeddings. For text prompts, SAM uses the text encoder from CLIP. For masks as prompts, after downsampling, the embedding is summed element-wise with the input image embedding.

SAM Weights

As of now, there are three different scales of ViT models

ViT-B SAM (375 MB)
ViT-L SAM (1.25 GB)
ViT-H SAM (2.56 GB)

Inference

We can use any of these scale versions of SAM ViT for running inference on video, image etc.

Download the weights using hf hub link of the official fbaipublicfiles link like this

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

In this, we are using ViT-H SAM model. 2. Clone the repo using this

git clone https://github.com/facebookresearch/segment-anything.git

Run setup.py to install the module

Links

https://github.com/facebookresearch/segment-anything [Official Repo]
https://arxiv.org/pdf/2304.02643.pdf [Paper]
https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/ [Blog]
https://huggingface.co/ybelkada/segment-anything [HF Hub Weights]
https://www.kaggle.com/code/raghvender/segment-anything-sam-onnx [Kaggle Notebook]

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.gitignore		.gitignore
README.md		README.md
public.jpg		public.jpg
sam_image.py		sam_image.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Segment Anything (SAM)

Model Architecture

1. SAM Image Encoder

2. Prompt Encoder

SAM Weights

Inference

Links

About

Releases

Packages

Languages

Raghvender1205/SAM

Folders and files

Latest commit

History

Repository files navigation

Segment Anything (SAM)

Model Architecture

1. SAM Image Encoder

2. Prompt Encoder

SAM Weights

Inference

Links

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages