# AutoMM for Semantic Segmentation - Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/image_segmentation/beginner_semantic_seg.ipynb)
[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/image_segmentation/beginner_semantic_seg.ipynb)


Semantic segmentation is a computer vision task whose goal is to produce a dense, pixel-wise segmentation map of an image, in which every pixel is assigned to a specific class or object. In other words, it groups the pixels of an image into distinct categories. For example, an autonomous vehicle needs to identify vehicles, pedestrians, traffic signs, pavement, and other road features.

The Segment Anything Model (SAM) is a foundation model pretrained on large-scale segmentation data with 1B masks and 11M images. Despite its excellent zero-shot performance on generic scenes, it struggles to generalize to specialized domains such as remote sensing, medical imagery, agriculture, and manufacturing because of domain shift. Fortunately, AutoMM can help bridge the domain gap by finetuning SAM on domain-specific data.

In this quick start, we'll show how to use AutoMM to finetune SAM. Once the data is prepared in [Pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) format, a single call to `MultiModalPredictor.fit()` will take care of the model training for you.


## Prepare Data

For demonstration purposes, we use a small subset of the ISIC 2017 skin lesion dataset (`tiny_isic2017`).
Each sample pairs a skin image with a ground-truth segmentation mask that marks the lesion region.
The subset is already split into training, validation, and test folders.

The dataset is downloaded from a URL and unzipped automatically:

In [None]:
!pip install autogluon.multimodal


In [None]:
import os
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from autogluon.core.utils.loaders import load_zip

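def file2id(folder_path, file_path, split_str="_"):
    # Build a sort key: the path relative to folder_path, without its extension,
    # truncated at the first occurrence of split_str (if present).
    image_id = os.path.normpath(os.path.relpath(file_path, start=folder_path))
    if split_str in image_id:
        image_id = os.path.splitext(image_id)[0].split(split_str)[0]
    else:
        image_id = os.path.splitext(image_id)[0]
    return image_id

def get_file_paths(directory, split_str="_"):
    # List the files in a directory, ordered by their file2id key,
    # and return them joined with the directory path.
    file_paths = sorted(os.listdir(directory), key=lambda file_path: file2id(directory, file_path, split_str))
    return [os.path.join(directory, file_path) for file_path in file_paths]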

zip_file = "https://automl-mm-bench.s3.amazonaws.com/unit-tests/tiny_isic2017.zip"
download_dir = "./automm_tutorial_semantic_seg"
load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "tiny_isic2017")
train_img_files = get_file_paths(os.path.join(data_dir, "train/ISIC-2017_Train"))
train_gt_files = get_file_paths(os.path.join(data_dir, "train/ISIC-2017_Training_Part1_GroundTruth"))
val_img_files = get_file_paths(os.path.join(data_dir, "val/ISIC-2017_Val"))
val_gt_files = get_file_paths(os.path.join(data_dir, "val/ISIC-2017_Validation_Part1_GroundTruth"))
test_img_files = get_file_paths(os.path.join(data_dir, "test/ISIC-2017_Test"))
test_gt_files = get_file_paths(os.path.join(data_dir, "test/ISIC-2017_Test_v2_Part1_GroundTruth"))

train_data = pd.DataFrame({"image": train_img_files, "label": train_gt_files})
val_data = pd.DataFrame({"image": val_img_files, "label": val_gt_files})
test_data = pd.DataFrame({"image": test_img_files, "label": test_gt_files})

Each DataFrame has 2 columns, **image** and **label**. The **image** column contains the paths of the images, and the **label** column contains the paths of the corresponding ground-truth masks. Each row represents a different sample.
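
Because images and masks are listed from separate folders, it can be worth confirming that row *i* of both columns refers to the same sample before training. The following is a minimal sketch and not part of AutoMM: `stem_id` and `check_pairs` are hypothetical helpers, and the naming scheme (a mask named like its image plus a `_segmentation` suffix) is an assumption that should be checked against the actual files.

```python
import os


def stem_id(path, suffix="_segmentation"):
    """Return the file name without extension and without an optional mask suffix."""
    stem = os.path.splitext(os.path.basename(path))[0]
    if suffix and stem.endswith(suffix):
        stem = stem[: -len(suffix)]
    return stem


def check_pairs(image_paths, mask_paths, suffix="_segmentation"):
    """Return (index, image_path, mask_path) for every row whose IDs differ."""
    if len(image_paths) != len(mask_paths):
        raise ValueError("image and mask lists have different lengths")
    return [
        (i, img, msk)
        for i, (img, msk) in enumerate(zip(image_paths, mask_paths))
        if stem_id(img, suffix) != stem_id(msk, suffix)
    ]


# Hypothetical file names, used only to illustrate the check.
images = ["train/ISIC_0000001.jpg", "train/ISIC_0000002.jpg"]
masks = ["gt/ISIC_0000001_segmentation.png", "gt/ISIC_0000002_segmentation.png"]
mismatches = check_pairs(images, masks)
print(mismatches)  # an empty list means every row is aligned
```

With the DataFrames built earlier, the same check could be run as `check_pairs(list(train_data["image"]), list(train_data["label"]))`.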


## Initialize AutoMM

In [None]:
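import uuid

from autogluon.multimodal import MultiModalPredictor

save_path = f"./tmp/{uuid.uuid4().hex}-automm_semantic_seg"
# problem_type selects the semantic segmentation pipeline instead of
# letting AutoMM infer a task from the label column.
predictor = MultiModalPredictor(problem_type="semantic_segmentation", label="label", path=save_path)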

## Zero Shot Evaluation

Before finetuning, we evaluate the pretrained model on the test data to get a zero-shot baseline:

In [None]:
scores = predictor.evaluate(test_data, metrics=["iou"])
print(scores)

## Train Model (Finetune SAM)

Now we finetune SAM on the training data with AutoMM:

In [None]:
predictor.fit(
    train_data=train_data,
    tuning_data=val_data,  # validation split prepared above
    time_limit=30,  # seconds
)

**label** is the name of the column that contains the target variable to predict; in our example it is "label", the column of mask paths. **path** indicates the directory where models and intermediate outputs are saved. Both are set when the predictor is created. We limit training to 30 seconds for demonstration purposes; you can change the training time through `time_limit` and other configurations. To customize AutoMM, please refer to [Customize AutoMM](../advanced_topics/customization.ipynb).


## Evaluate on Test Data

You can evaluate the finetuned model on the test dataset to see how it performs. The test IoU is:

In [None]:
scores = predictor.evaluate(test_data, metrics=["iou"])
print(scores)
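
The `iou` metric is intersection over union: the number of pixels that are foreground in both the prediction and the ground truth, divided by the number that are foreground in either. A minimal pure-Python sketch for binary masks follows. `binary_iou` is an illustrative helper, not AutoMM's implementation, and returning 1.0 when both masks are empty is a convention assumed here.

```python
def binary_iou(pred, target):
    """Intersection over union of two equally sized binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


pred = [
    [0, 1, 1],
    [0, 1, 0],
]
target = [
    [0, 1, 0],
    [0, 1, 1],
]
print(binary_iou(pred, target))  # 2 shared pixels / 4 pixels in the union = 0.5
```

A score of 1.0 means the predicted and ground-truth regions coincide exactly, and 0.0 means they do not overlap at all.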

## Predict on a New Image

Given an example image, let's visualize it first:

In [None]:
from IPython.display import Image, display

image_path = test_data.iloc[0]['image']
img = Image(filename=image_path)
display(img)

We can easily use the final model to `predict` the segmentation mask:

In [None]:
predictions = predictor.predict({'image': [image_path]})
print(predictions)
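
A quick sanity check on a predicted mask is the fraction of pixels marked as foreground. The sketch below assumes the mask for one image has already been converted to a 2D nested list of 0/1 values; the exact output layout of `predict` is not specified here, so that conversion is an assumption. `foreground_fraction` is a hypothetical helper, not an AutoMM function.

```python
def foreground_fraction(mask):
    """Fraction of pixels in a 2D binary mask (nested lists) that are non-zero."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    foreground = sum(1 for row in mask for value in row if value)
    return foreground / total


# Toy 3x4 mask standing in for a real prediction.
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
print(foreground_fraction(mask))  # 6 of 12 pixels are foreground = 0.5
```

A fraction close to 0 or 1 on an image that clearly contains both object and background suggests the mask deserves a closer look.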

If per-pixel probabilities are needed instead of a hard mask, you can call `predict_proba` with the same input.

## Save and Load

The trained predictor is automatically saved at the end of `fit()`, and you can easily reload it.

```{warning}

`MultiModalPredictor.load()` uses the `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data that will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source, or that could have been tampered with. **Only load data you trust.**

```
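
One way to act on this warning is to record a digest of a saved file when you create it and compare it before loading. The following is a minimal sketch using only the standard library. `sha256_of_file` and `is_unmodified` are hypothetical helpers, and the file used here is a temporary stand-in rather than a real AutoMM checkpoint. A digest check only detects changes made after the digest was recorded; it does not make an untrusted file safe.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, expected_digest):
    """True if the file's current digest equals the one recorded earlier."""
    return hmac.compare_digest(sha256_of_file(path), expected_digest)


# Temporary stand-in file; a real use would point at a saved model file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"model weights placeholder")
    artifact = tmp.name

recorded = sha256_of_file(artifact)  # store this somewhere you trust
before = is_unmodified(artifact, recorded)

with open(artifact, "ab") as f:  # simulate tampering by appending a byte
    f.write(b"!")
after = is_unmodified(artifact, recorded)

os.remove(artifact)  # clean up the stand-in file
print(before, after)  # True False
```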

In [None]:
loaded_predictor = MultiModalPredictor.load(save_path)
load_proba = loaded_predictor.predict_proba({'image': [image_path]})
print(load_proba)

The loaded predictor's outputs should be consistent with those of the original predictor, which confirms that the same model was restored.

## Other Examples

See [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) for more examples of AutoMM.

## Customization
To learn how to customize AutoMM, please refer to [Customize AutoMM](../advanced_topics/customization.ipynb).