# SegMate Demo Notebook

Welcome to the SegMate demo notebook! In this notebook, we will showcase the capabilities of SegMate, a Segment Anything Model Toolkit developed by the AI Engineering team at the Vector Institute.

## SegMate: A Segment Anything Model Toolkit

SegMate is a powerful toolkit that utilizes the Segment Anything Model (SAM) developed by Meta AI. SAM is a promptable segmentation system capable of accurately "cutting out" any object from an image with just a single click. It exhibits zero-shot generalization to unfamiliar objects and images, eliminating the need for additional training.

## SAM Architecture

SAM utilizes a sophisticated architecture comprising three key components: the image encoder, the prompt encoder, and the mask decoder.

- **Image Encoder**: Captures essential features from the input image, extracting high-level representations that encode relevant information about objects and their context. This step allows SAM to understand the visual content of the image.

- **Prompt Encoder**: Processes user-provided prompts, such as bounding boxes, points, or text, and transforms them into meaningful representations. These representations guide SAM to understand the desired object to be segmented.

- **Mask Decoder**: Generates precise segmentation masks by leveraging the encoded information from both the image encoder and the prompt encoder. It efficiently processes the features and produces detailed object boundaries, enabling near real-time segmentation results.
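
The division of labour between the three components can be illustrated with a toy sketch. The class below is not SAM and does not use any SAM or SegMate code: `ToyPromptableSegmenter` and its methods are hypothetical names, and the "encoders" are trivial stand-ins. It only shows the data flow, in which the image is encoded once and the cached result is reused for every new prompt, which is what keeps per-prompt segmentation fast.

```python
from typing import List, Tuple

Image = List[List[float]]  # toy grayscale image with values in [0, 1]
Point = Tuple[int, int]    # (row, col) of a point prompt


class ToyPromptableSegmenter:
    """Toy stand-in for the encoder/decoder split; NOT SAM's real networks."""

    def __init__(self) -> None:
        self.encoder_calls = 0
        self._embedding: Image = []

    def set_image(self, image: Image) -> None:
        # "Image encoder": the expensive step, run once per image and cached.
        self.encoder_calls += 1
        self._embedding = [row[:] for row in image]

    def encode_prompt(self, point: Point) -> float:
        # "Prompt encoder": turn the click into a value the decoder can use.
        row, col = point
        return self._embedding[row][col]

    def decode_mask(self, prompt_value: float, tol: float = 0.1) -> List[List[int]]:
        # "Mask decoder": combine the cached embedding with the encoded prompt.
        return [
            [1 if abs(value - prompt_value) <= tol else 0 for value in row]
            for row in self._embedding
        ]

    def predict(self, point: Point) -> List[List[int]]:
        return self.decode_mask(self.encode_prompt(point))


image = [
    [0.9, 0.9, 0.1],
    [0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1],
]
segmenter = ToyPromptableSegmenter()
segmenter.set_image(image)             # encode once
mask_a = segmenter.predict((0, 0))     # click on the bright region
mask_b = segmenter.predict((2, 2))     # click on the dark region
print(mask_a)                          # [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(mask_b)                          # [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
print(segmenter.encoder_calls)         # 1
```

Two different prompts produce two different masks, while the image is encoded only once.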

## SegMate Features

- API for easy inference with SAM, supporting bounding box, points, and text prompts.
- Automatic masking without the need for prompts.
- API for zero-shot image segmentation with Grounding Dino using text prompts.
- API for finetuning SAM on custom datasets.

Now, let's dive into the demo and explore the powerful capabilities of SegMate!


Let's start by importing the necessary libraries:

In [1]:
import os
import sys

# Add the parent directory (the main repository directory) to the import path
# so that the local `segmate` package can be imported from this notebook.
parent_dir = os.path.abspath("..")
sys.path.append(parent_dir)

print(sys.path)

['/fs01/home/aditima/environment_project/SAT_SAM/example', '/pkgs/python-3.9.10/lib/python39.zip', '/pkgs/python-3.9.10/lib/python3.9', '/pkgs/python-3.9.10/lib/python3.9/lib-dynload', '', '/scratch/ssd004/scratch/aditima/segmate/lib/python3.9/site-packages', '/fs01/home/aditima/environment_project/SAT_SAM']


In [2]:
from segmate.segmate import SegMate
from segmate.object_detector import GroundingDINO
import segmate.utils as utils

import numpy as np
import torch



### Initializing SegMate

To start using SegMate, we need to create an instance of the SegMate class. Here, we create an instance called `sm` with the following parameters:

- `model_type`: Specifies the type of model to use. In this case, we are using the `vit_b` model. The options are `vit_b`, `vit_l` and `vit_h`.
- `checkpoint`: Specifies the path to the checkpoint file that contains the pre-trained weights of the model.
- `device`: Specifies the device to run the model on. In this case, we are using the `cuda` device for GPU acceleration.
- `object_detector`: Optional parameter that lets you provide an object detector, which is needed when you want to use the model with text prompts. The default is `None`, and you can always add an `object_detector` later.

This instance of the SegMate class serves as our toolkit for performing segmentation tasks with SAM. It encapsulates the model and provides convenient methods for inference and fine-tuning.

Let's create the instance and load the model:

In [3]:
sm = SegMate(model_type='vit_b', checkpoint='../../sam_vit_b.pth', device='cuda', object_detector=None)
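
If the notebook may run on a machine without a GPU, the `device` argument can be chosen at run time instead of being hard-coded. The helper below is a minimal sketch: `pick_device` is a hypothetical name, not part of SegMate, and it relies only on `torch.cuda.is_available()`.

```python
import importlib.util


def pick_device(preferred: str = "cuda") -> str:
    """Return `preferred` if it is usable, otherwise fall back to "cpu".

    Hypothetical helper for this notebook; not part of SegMate.
    """
    if preferred != "cuda":
        return preferred
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so CUDA cannot be used from Python.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# The result can then be passed to the constructor shown above, e.g.:
# sm = SegMate(model_type='vit_b', checkpoint='../../sam_vit_b.pth',
#              device=device, object_detector=None)
```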

### Fine-tuning on Building Image Segmentation Dataset

In this section, we will demonstrate how to create a PyTorch Dataset instance from the Building Image Segmentation (BIS) dataset and perform fine-tuning on this dataset using the SegMate toolkit.

To do this, we use the `BISDataset` class, a custom dataset class designed specifically for the Building Image Segmentation dataset. `BISDataset` takes a collection of BIS samples (`dataset`) as input, along with the preprocessing function (`preprocess`), the desired image size (`img_size`), and the device to be used (`device`). We first load the dataset and select a subset of samples, then build one `BISDataset` instance for training and one for testing.

Let's take a look at the code:

In [4]:
from datasets import load_dataset
import random

# Load the full dataset
dataset = load_dataset("keremberke/satellite-building-segmentation", "full")
train_data = dataset['train']
test_data = dataset['test']

The following code selects the desired number of training and testing samples, keeping only samples that are 500 pixels wide and 500 pixels high so that every sample has the same width, height, and mask size.

In [5]:

# Specify number of samples for train and test
num_train = 50
num_test = 100

# Random seed for reproducibility
random.seed(40)
        
# Select the first `num_samples` samples that are 500 pixels wide and 500 pixels high
def filter_samples(data, num_samples):
    selected_samples = []
    for sample in data:
        image = np.array(sample["image"])
        height = image.shape[0]  # image arrays are laid out as (height, width, ...)
        if sample['width'] == 500 and height == 500:
            selected_samples.append(sample)
        if len(selected_samples) >= num_samples:
            break
    return selected_samples

samples_train = filter_samples(train_data, num_train)
samples_test = filter_samples(test_data, num_test)
    
print(f'Number of Training Samples: {len(samples_train)}')
print(f'Number of Testing Samples: {len(samples_test)}')

Number of Training Samples: 50
Number of Testing Samples: 100


In [6]:
from segmate.dataset import BISDataset

# Create instances of BISDataset class
bis_dataset_test = BISDataset(dataset=samples_test,
                         preprocess=sm.sam.preprocess,
                         img_size=sm.sam.image_encoder.img_size,
                         device=sm.device)

bis_dataset_train = BISDataset(dataset=samples_train,
                         preprocess=sm.sam.preprocess,
                         img_size=sm.sam.image_encoder.img_size,
                         device=sm.device)

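Next, we define the loss function and the optimizer. Only the parameters of SAM's mask decoder are passed to the optimizer, so this optimizer does not update the image encoder or the prompt encoder.

In [7]: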
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(sm.sam.mask_decoder.parameters(), lr=1e-5)

#### Testing Before Fine-Tuning

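Before fine-tuning, we evaluate the pre-trained model on the test set with `sm.testing()` to establish a baseline.

In [8]: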
sm.testing(test_data=bis_dataset_test,
             original_input_size=500,
             criterion=criterion)

100%|██████████| 100/100 [00:23<00:00,  4.19it/s]


Mean Test Loss: 0.03415299972810317 for 100 samples
Mean Test Sørensen–Dice Coefficient: 0.9666725236177445 for 100 samples
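
For reference, the two reported metrics have simple standard definitions. The sketch below computes the mean squared error and the Sørensen–Dice coefficient, 2|A∩B| / (|A| + |B|), for a pair of small binary masks in plain Python. It is not SegMate's implementation: how `sm.testing()` thresholds predictions or averages over samples is not shown in this notebook, and the function names here are illustrative.

```python
from typing import List

Mask = List[List[float]]  # 2D mask with values in {0, 1}


def mse(pred: Mask, target: Mask) -> float:
    """Mean squared error over all pixels."""
    diffs = [
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(diffs) / len(diffs)


def dice_coefficient(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Sørensen–Dice coefficient; `eps` avoids division by zero for empty masks."""
    intersection = sum(
        p * t
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return (2.0 * intersection + eps) / (total + eps)


pred = [[1, 1], [0, 0]]     # predicted mask: two foreground pixels
target = [[1, 0], [0, 0]]   # ground truth: one foreground pixel

print(mse(pred, target))                          # 0.25
print(round(dice_coefficient(pred, target), 4))   # 0.6667
```

A Dice coefficient of 1 means the predicted and ground-truth masks overlap perfectly, and 0 means they do not overlap at all.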


#### Fine-Tuning

Next, we start the fine-tuning process using the `fine_tune()` method provided by the SegMate toolkit (`sm`). The `fine_tune()` method takes the training data (`train_data`), the original input size (`original_input_size`), the loss function (`criterion`), the optimizer (`optimizer`), and the number of epochs (`num_epochs`) as inputs. The learning rate is set on the optimizer itself.

Let's take a look at the code snippet:

In [9]:
sm.fine_tune(train_data=bis_dataset_train,
             original_input_size=500,
             criterion=criterion,
             optimizer=optimizer,
             num_epochs=5)

100%|██████████| 50/50 [00:27<00:00,  1.80it/s]


EPOCH: 0
Mean loss: 0.028884719838388265


100%|██████████| 50/50 [00:27<00:00,  1.81it/s]


EPOCH: 1
Mean loss: 0.030617920097429304


100%|██████████| 50/50 [00:27<00:00,  1.81it/s]


EPOCH: 2
Mean loss: 0.030749840054195374


100%|██████████| 50/50 [00:27<00:00,  1.81it/s]


EPOCH: 3
Mean loss: 0.03202223991975188


100%|██████████| 50/50 [00:27<00:00,  1.82it/s]

EPOCH: 4
Mean loss: 0.03247696016449481





#### Testing After Fine-Tuning

In [10]:
sm.testing(test_data=bis_dataset_test,
             original_input_size=500,
             criterion=criterion)

100%|██████████| 100/100 [00:22<00:00,  4.44it/s]

Mean Test Loss: 0.038229680205113255 for 100 samples
Mean Test Sørensen–Dice Coefficient: 0.9630606788396835 for 100 samples
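
To compare the two evaluation runs side by side, the metrics printed by the two `sm.testing()` calls can be differenced directly. The snippet below is a small sketch that only copies the values from the outputs above; the variable and function names are illustrative and not part of SegMate.

```python
# Metrics copied from the two `sm.testing()` outputs in this notebook.
before = {"mse": 0.03415299972810317, "dice": 0.9666725236177445}
after = {"mse": 0.038229680205113255, "dice": 0.9630606788396835}


def metric_deltas(before: dict, after: dict) -> dict:
    """Return after - before for every metric present in both runs."""
    return {name: after[name] - before[name] for name in before if name in after}


deltas = metric_deltas(before, after)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
# mse: +0.0041
# dice: -0.0036
```

A positive delta for MSE means the test loss went up, and a negative delta for Dice means the overlap with the ground-truth masks went down.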





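In this run, fine-tuning did not improve the model: the mean test MSE loss rose from about 0.0342 to 0.0382, and the Sørensen–Dice coefficient fell slightly from about 0.967 to 0.963. The mean training loss also rose over the five epochs (from about 0.0289 to 0.0325), which suggests that the settings used here (50 training samples, 5 epochs, learning rate 1e-5) would need further tuning to yield an improvement.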