*In this section, our goal is to quickly fine-tune a pretrained model on a small dataset and evaluate it on the dataset's test set. Both the training and test sets are in COCO format.*
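*As a quick refresher on the format: a COCO annotation file is one JSON document whose "images", "annotations" and "categories" lists are linked by ids, and each box is stored as [x_min, y_min, width, height] in pixels. The sketch below builds a tiny hand-written example (the file name, ids and box values are made up, not taken from the tutorial dataset) and converts its box to corner form.*

```python
# A minimal, hand-written COCO-style annotation (illustrative values only).
# Images, annotations and categories are linked through their ids.
coco = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 500, "height": 375}],
    "annotations": [
        # bbox is [x_min, y_min, width, height] in pixels
        {"id": 1, "image_id": 1, "category_id": 2, "bbox": [48, 240, 147, 131]},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "motorbike"},
    ],
}


def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, w, h] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Map category ids to readable names, then print each annotation in corner form
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
for ann in coco["annotations"]:
    print(id_to_name[ann["category_id"]], xywh_to_xyxy(ann["bbox"]))
# prints: motorbike [48, 240, 195, 371]
```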

*To start, let's install AutoGluon MultiModal:*

In [None]:
!pip install autogluon.multimodal



*Make sure mmcv and mmdet are installed:*

In [None]:
!pip install -U openmim
!mim install "mmcv==2.1.0"
!pip install "mmdet==3.2.0"


In [None]:
!pip install torch==2.0.0+cu117 torchaudio==2.0.0 torchvision==0.15.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117


In [None]:
from autogluon.multimodal import MultiModalPredictor



*Also import some other packages that will be used:*

In [None]:
import os
import time

from autogluon.core.utils.loaders import load_zip

*We have the sample dataset ready in the cloud. Let's download it:*

In [None]:
zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip"
download_dir = "./tiny_motorbike_coco"

load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "tiny_motorbike")
train_path = os.path.join(data_dir, "Annotations", "trainval_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "test_cocoformat.json")

*We select the "medium_quality" preset, which uses a YOLOX-large model pretrained on the COCO dataset. This preset is fast to fine-tune and to run inference with, and easy to deploy.*

In [None]:
presets = "medium_quality"

*We create the MultiModalPredictor with the selected preset. We need to set problem_type to "object_detection" and provide a sample_data_path from which the predictor infers the categories of the dataset. Here we provide train_path, but any other split of this dataset works as well. We also provide a path to save the predictor; if path is not specified, the predictor is saved to an automatically generated, timestamped directory under AutogluonModels.*
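*To make the role of sample_data_path concrete: a COCO file lists its classes in the "categories" field, so reading any split is enough to recover them. The helper below, read_coco_categories, is a hypothetical illustration of that idea in plain Python, not AutoGluon's implementation; in the notebook you would pass train_path to it.*

```python
import json
import os
import tempfile


def read_coco_categories(annotation_path):
    """Return category names from a COCO json file, ordered by category id.

    Hypothetical helper illustrating how categories can be inferred from a
    sample split; it is not AutoGluon's actual implementation.
    """
    with open(annotation_path) as f:
        coco = json.load(f)
    categories = sorted(coco.get("categories", []), key=lambda c: c["id"])
    return [c["name"] for c in categories]


# Demo with a throwaway file; in the tutorial you would pass train_path instead.
sample = {
    "images": [],
    "annotations": [],
    "categories": [{"id": 2, "name": "motorbike"}, {"id": 1, "name": "bicycle"}],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_cocoformat.json")
    with open(path, "w") as f:
        json.dump(sample, f)
    names = read_coco_categories(path)

print(names)  # ['bicycle', 'motorbike']
```

*Splits of one COCO dataset normally share the same "categories" list, which is why the test annotation file would serve equally well as the sample.*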

In [None]:
!pip install -U openmim
!mim install "mmcv==2.1.0"



In [None]:
# Init predictor
import uuid

model_path = f"./tmp/{uuid.uuid4().hex}-quick_start_tutorial_temp_save"

predictor = MultiModalPredictor(
    problem_type="object_detection",
    sample_data_path=train_path,
    presets=presets,
    path=model_path,
)

*Finetuning the Model*

In [None]:
import time  # Record timestamps to measure how long fine-tuning takes

start = time.time()
predictor.fit(train_path)  # Fine-tune the pretrained model on the training set
train_end = time.time()

print("This finetuning takes %.2f seconds." % (train_end - start))


*To evaluate the model we just trained, run the following code:*

In [None]:
# Evaluate the fine-tuned predictor (saved under model_path) on the test set
scores = predictor.evaluate(test_path)
eval_end = time.time()
print(scores)


*Print out the evaluation time*

In [None]:
print("The evaluation takes %.2f seconds." % (eval_end - train_end))
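*The evaluation time above is measured from train_end, which is recorded in the fine-tuning cell, so it also counts any time spent between running that cell and the evaluation cell. A minimal alternative, sketched below with only the standard library, times each step inside its own block with time.perf_counter(), a monotonic clock meant for measuring durations. The time.sleep calls are placeholders standing in for predictor.fit(train_path) and predictor.evaluate(test_path).*

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("finetune", timings):
    time.sleep(0.05)  # placeholder for predictor.fit(train_path)
with timed("evaluate", timings):
    time.sleep(0.02)  # placeholder for predictor.evaluate(test_path)

for label, seconds in timings.items():
    print("%s takes %.2f seconds." % (label, seconds))
```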

*We can load a new predictor from the previous save path (model_path), and we can also reset the number of GPUs to use if not all devices are available:*

In [None]:
# Load and reset num_gpus
new_predictor = MultiModalPredictor.load(model_path)
new_predictor.set_num_gpus(1)

In [None]:
# Evaluate new predictor
new_predictor.evaluate(test_path)
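*evaluate() compares the predicted boxes with the ground-truth boxes in test_path. Standard detection metrics such as COCO-style mAP count a prediction as a match only when its intersection over union (IoU) with a ground-truth box reaches a threshold (for example 0.5). This notebook does not show which metrics are reported, so the sketch below only illustrates IoU itself, for boxes in [x_min, y_min, x_max, y_max] form.*

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    # Corners of the overlapping rectangle (may be empty)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = [0, 0, 10, 10]
ground_truth = [5, 0, 15, 10]
print(iou(predicted, ground_truth))
```

*The two boxes overlap on half of each box's area, giving an IoU of 1/3, so at a 0.5 threshold this prediction would not count as a match.*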

In [None]:
pred = predictor.predict(test_path)
print(pred)

In [None]:
pred = predictor.predict(test_path, save_results=True)

*Visualizing Results*

In [None]:
!pip install opencv-python

In [None]:
from autogluon.multimodal.utils import ObjectDetectionVisualizer

conf_threshold = 0.4  # Specify a confidence threshold to filter out unwanted boxes
image_result = pred.iloc[30]

img_path = image_result.image  # Select an image to visualize

visualizer = ObjectDetectionVisualizer(img_path)  # Initialize the Visualizer
out = visualizer.draw_instance_predictions(image_result, conf_threshold=conf_threshold)  # Draw detections
visualized = out.get_image()  # Get the visualized image

from PIL import Image
from IPython.display import display
img = Image.fromarray(visualized, 'RGB')
display(img)
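*conf_threshold keeps only the boxes the model is reasonably sure about. The sketch below shows that filtering step in plain Python. It assumes each detection is a dict with "class", "bbox" and "score" keys, a hypothetical layout chosen for illustration, so inspect image_result in your own run to see the real structure. Whether the boundary value itself is kept (>= versus >) is also our choice here.*

```python
def filter_detections(detections, conf_threshold=0.4):
    """Keep detections whose score is at least conf_threshold.

    Each detection is assumed to be a dict with "class", "bbox" and "score"
    keys (hypothetical layout for illustration).
    """
    return [d for d in detections if d["score"] >= conf_threshold]


# Made-up detections for one image
detections = [
    {"class": "motorbike", "bbox": [12.0, 40.0, 210.0, 300.0], "score": 0.91},
    {"class": "person", "bbox": [80.0, 5.0, 160.0, 220.0], "score": 0.55},
    {"class": "car", "bbox": [300.0, 90.0, 420.0, 180.0], "score": 0.12},
]
kept = filter_detections(detections, conf_threshold=0.4)
print([d["class"] for d in kept])  # ['motorbike', 'person']
```

*Raising the threshold hides more low-confidence false positives, at the cost of possibly hiding real objects the model scored low.*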

*Testing on Your Own Data*

In [None]:
from autogluon.multimodal import download
image_url = "https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/detection/street_small.jpg"
test_image = download(image_url)

In [None]:
import json

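# Create an input file for the demo: a COCO-style json that lists only the image
data = {"images": [{"id": 0, "width": -1, "height": -1, "file_name": test_image}], "categories": []}
os.makedirs("input_data_for_demo", exist_ok=True)  # exist_ok avoids an error when the cell is re-run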
input_file = "input_data_for_demo/demo_annotation.json"
with open(input_file, "w+") as f:
    json.dump(data, f)

pred_test_image = predictor.predict(input_file)
print(pred_test_image)
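*To run the same JSON-based prediction on several images, the cell above can be generalized into a small helper. write_demo_coco is a hypothetical name, and the second file name in the example is a placeholder; following the cell above, width and height are left as -1 and categories is empty. The resulting file would then be passed to predictor.predict as before.*

```python
import json
import os
import tempfile


def write_demo_coco(image_paths, output_file):
    """Write a COCO-style json that lists images only, for inference.

    Hypothetical helper mirroring the demo cell: width/height are -1
    placeholders and the categories list is left empty.
    """
    data = {
        "images": [
            {"id": i, "width": -1, "height": -1, "file_name": path}
            for i, path in enumerate(image_paths)
        ],
        "categories": [],
    }
    # exist_ok avoids FileExistsError when the helper is called again
    os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
    with open(output_file, "w") as f:
        json.dump(data, f)
    return data


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "input_data_for_demo", "demo_annotation.json")
    written = write_demo_coco(["street_small.jpg", "another_image.jpg"], out)
    with open(out) as f:
        reloaded = json.load(f)

print(len(reloaded["images"]))  # 2
```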

In [None]:
pred_test_image = predictor.predict([test_image])
print(pred_test_image)