# AutoMM Detection

Our goal is to quickly finetune a pretrained model on a dataset in COCO format and evaluate it on the test set. Both the training and test sets are in COCO format.

In [None]:
from autogluon.multimodal import MultiModalPredictor
import os

We use a dataset in COCO format, so the input is the JSON annotation file of each dataset split. In this example, `trainval_cocoformat.json` is the annotation file of the train-and-validate split, and `test_cocoformat.json` is the annotation file of the test split.

In [None]:
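# Assumption: `download_dir` is the directory where the dataset was downloaded and extracted; adjust as needed.
download_dir = "./tiny_motorbike_coco"
data_dir = os.path.join(download_dir, "tiny_motorbike")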
train_path = os.path.join(data_dir, "Annotations", "trainval_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "test_cocoformat.json")
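
For orientation, a COCO-format annotation file is a JSON object with `images`, `annotations`, and `categories` lists. The sketch below builds a minimal one in memory and checks that every annotation refers to a known image and category. The file name, image size, and box values are made up for illustration, and `check_coco` is a hypothetical helper, not part of AutoGluon.

```python
import json

# A minimal, made-up COCO-style annotation structure (illustrative values only).
coco = {
    "images": [{"id": 0, "width": 500, "height": 375, "file_name": "example.jpg"}],
    "annotations": [
        # bbox is [x, y, width, height] in pixels, following the COCO convention
        {"id": 0, "image_id": 0, "category_id": 1,
         "bbox": [50.0, 60.0, 120.0, 80.0], "area": 9600.0, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "motorbike"}],
}


def check_coco(data):
    """Hypothetical helper: return a list of consistency problems found."""
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    problems = []
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        if len(ann["bbox"]) != 4 or ann["bbox"][2] <= 0 or ann["bbox"][3] <= 0:
            problems.append(f"annotation {ann['id']}: invalid bbox {ann['bbox']}")
    return problems


# Round-trip through JSON to mimic reading the structure from a file.
loaded = json.loads(json.dumps(coco))
print(check_coco(loaded))
```

An empty list means the structure is internally consistent. A real annotation file can be checked the same way after loading it with `json.load`.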

## Creating the MultiModalPredictor

We select the `"medium_quality"` preset, which uses a YOLOX-large model pretrained on the COCO dataset. This preset is fast to finetune and to run inference with, and it is easy to deploy. We also provide the `"high_quality"` preset with a DINO-Resnet50 model and the `"best_quality"` preset with a DINO-SwinL model. Both give much higher performance but are slower and use more GPU memory.

In [None]:
presets = 'medium_quality'

We create the MultiModalPredictor with the selected preset.
We need to set `problem_type` to `"object_detection"`
and provide a `sample_data_path` so the predictor can infer the categories of the dataset.
Here we provide `train_path`, but any other split of this dataset works as well.
You can also provide a `path` where the predictor will be saved.
If `path` is not specified, the predictor is saved to an automatically generated, timestamped directory under `AutogluonModels`.

In [None]:
predictor = MultiModalPredictor(
    problem_type="object_detection",
    sample_data_path=train_path,
    presets=presets,
)

## Finetuning the Model

The learning rate, number of epochs, and batch size are included in the preset, so there is no need to specify them.
Note that we use a two-stage learning rate during finetuning by default,
and the model head uses a 100x learning rate.
Using a two-stage learning rate, with a high learning rate only on the head layers, makes
the model converge faster during finetuning. It usually gives better performance as well,
especially on small datasets with hundreds or thousands of images.
Timing the fit process also gives a better sense of the finetuning speed.
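
To make the two-stage learning rate concrete, here is a minimal sketch in plain Python. It is not AutoGluon's implementation: the `build_param_groups` helper, the `head.` name prefix, and the base learning rate are assumptions chosen only for illustration. Parameters whose names mark them as part of the head get 100x the base learning rate, and all others keep the base value.

```python
def build_param_groups(param_names, base_lr, head_prefix="head.", head_multiplier=100.0):
    """Hypothetical helper: split parameter names into backbone and head groups.

    The head group receives base_lr * head_multiplier; the backbone keeps base_lr.
    """
    groups = {
        "backbone": {"lr": base_lr, "params": []},
        "head": {"lr": base_lr * head_multiplier, "params": []},
    }
    for name in param_names:
        key = "head" if name.startswith(head_prefix) else "backbone"
        groups[key]["params"].append(name)
    return groups


# Made-up parameter names for illustration.
names = [
    "backbone.conv1.weight",
    "backbone.layer1.weight",
    "head.cls.weight",
    "head.reg.weight",
]
groups = build_param_groups(names, base_lr=1e-4)
for group_name, group in groups.items():
    print(group_name, group["lr"], group["params"])
```

In a real training loop, groups like these would typically be handed to the optimizer as separate parameter groups, so the pretrained backbone changes slowly while the newly initialized head adapts quickly.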

In [None]:
predictor.fit(train_path)  # Fit
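
One simple way to measure how long the fit takes is to wrap the call with the standard library's `time.perf_counter`. The sketch below uses a hypothetical `timed` helper and a stand-in function instead of the real training call, so it runs without a GPU or dataset.

```python
import time


def timed(fn, *args, **kwargs):
    """Hypothetical helper: call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_fit(path):
    """Stand-in for a long-running call such as predictor.fit(train_path)."""
    time.sleep(0.01)
    return f"fitted on {path}"


result, elapsed = timed(fake_fit, "trainval_cocoformat.json")
print(f"The finetuning takes {elapsed:.2f} seconds.")
```

In this notebook, the same helper could wrap the real call, for example `_, elapsed = timed(predictor.fit, train_path)`.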

## Evaluation

To evaluate the model we just trained, run the following code.

The evaluation results are shown in the command line output.
The first line is mAP in the COCO standard, and the second line is mAP in the VOC standard (or mAP50). Note that we use the `"medium_quality"` preset to demonstrate fast finetuning; you can get better results on this dataset by simply using the `"high_quality"` or `"best_quality"` preset,
or by customizing your own model and hyperparameter settings.

In [None]:
predictor.evaluate(test_path)
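
As background for the two metrics: a predicted box counts as a match for a ground-truth box when their intersection over union (IoU) reaches a threshold. mAP50 uses the single threshold 0.5, while the COCO-style mAP averages over thresholds from 0.5 to 0.95. The sketch below computes IoU for boxes given as `[x1, y1, x2, y2]`. The `iou` helper and the sample boxes are illustrative and are not AutoGluon's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in [x1, y1, x2, y2] format."""
    # Corners of the overlapping rectangle (if any)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred_box = [0.0, 0.0, 2.0, 2.0]  # made-up prediction
gt_box = [1.0, 1.0, 3.0, 3.0]    # made-up ground truth
score = iou(pred_box, gt_box)    # intersection 1, union 7
print(f"IoU = {score:.3f}, match at mAP50 threshold: {score >= 0.5}")
```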

In [None]:
pred = predictor.predict(test_path, save_results=True)
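
The visualization section filters boxes by a confidence threshold. A minimal sketch of that filtering is shown below, assuming each row's `bboxes` entry is a list of dictionaries with `class`, `bbox`, and `score` keys. That layout is an assumption here, so inspect `pred.columns` and one row of your own output before relying on it. The `filter_boxes` helper and the sample values are made up for illustration.

```python
def filter_boxes(bboxes, conf_threshold=0.4):
    """Hypothetical helper: keep only boxes whose score reaches the threshold."""
    return [box for box in bboxes if box["score"] >= conf_threshold]


# Made-up detections for a single image (assumed layout, illustrative values).
sample_bboxes = [
    {"class": "motorbike", "bbox": [10.0, 20.0, 110.0, 90.0], "score": 0.92},
    {"class": "motorbike", "bbox": [30.0, 5.0, 60.0, 80.0], "score": 0.35},
    {"class": "motorbike", "bbox": [200.0, 40.0, 320.0, 150.0], "score": 0.40},
]

kept = filter_boxes(sample_bboxes, conf_threshold=0.4)
for box in kept:
    print(box["class"], box["score"], box["bbox"])
```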

## Visualizing Results

To visualize the detection bounding boxes, run the following:

In [None]:
from autogluon.multimodal.utils import ObjectDetectionVisualizer

conf_threshold = 0.4  # Specify a confidence threshold to filter out unwanted boxes
image_result = pred.iloc[30]

img_path = image_result.image  # Select an image to visualize

visualizer = ObjectDetectionVisualizer(img_path)  # Initialize the Visualizer
out = visualizer.draw_instance_predictions(image_result, conf_threshold=conf_threshold)  # Draw detections
visualized = out.get_image()  # Get the visualized image

from PIL import Image
from IPython.display import display
img = Image.fromarray(visualized, 'RGB')
display(img)

## Testing on Your Own Data
You can also predict on your own images using various input formats. The following is an example:

Download the example image:

In [None]:
from autogluon.multimodal import download
image_url = "https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/detection/street_small.jpg"
test_image = download(image_url)

Run inference on data in a JSON file in COCO format:

In [None]:
import json

# Create an input file for the demo
data = {"images": [{"id": 0, "width": -1, "height": -1, "file_name": test_image}], "categories": []}
os.makedirs("input_data_for_demo", exist_ok=True)  # Does not fail if the directory already exists
input_file = "input_data_for_demo/demo_annotation.json"
with open(input_file, "w") as f:
    json.dump(data, f)

pred_test_image = predictor.predict(input_file)
print(pred_test_image)

The predictor also accepts a list of image file names as input. To visualize the detections on the example image, run the following:

In [None]:
conf_threshold = 0.4  # Specify a confidence threshold to filter out unwanted boxes
image_result = pred_test_image.iloc[0]

img_path = image_result.image  # Select an image to visualize
visualizer = ObjectDetectionVisualizer(img_path)  # Initialize the Visualizer
out = visualizer.draw_instance_predictions(image_result, conf_threshold=conf_threshold)  # Draw detections
visualized = out.get_image()  # Get the visualized image

from PIL import Image
from IPython.display import display
img = Image.fromarray(visualized, 'RGB')
display(img)