title	titleSuffix	description	author	ms.author	ms.service	ms.subservice	ms.topic	ms.custom	ms.reviewer	ms.date
Prepare data for computer vision tasks	Azure Machine Learning	Image data preparation for Azure Machine Learning automated ML to train computer vision models on classification, object detection, and segmentation	vadthyavath	rvadthyavath	machine-learning	automl	how-to	template-how-to, update-code, sdkv2,	ssalgado	03/26/2024

Prepare data for computer vision tasks with automated machine learning

[!INCLUDE dev v2]

Important

Support for training computer vision models with automated ML in Azure Machine Learning is an experimental public preview feature. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, you learn how to prepare image data for training computer vision models with automated machine learning in Azure Machine Learning.

To generate models for computer vision tasks with automated machine learning, you need to bring labeled image data as input for model training in the form of an MLTable.

You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a different format (such as Pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and then create an MLTable. Alternatively, you can use Azure Machine Learning's data labeling tool to manually label images, and export the labeled data to use for training your AutoML model.

Prerequisites

Familiarize yourself with the accepted schemas for JSONL files for AutoML computer vision experiments.

Get labeled data

To train computer vision models with AutoML, you first need labeled training data. The images need to be uploaded to the cloud, and the label annotations need to be in JSONL format. You can either use the Azure Machine Learning data labeling tool to label your data, or start with prelabeled image data.

Using Azure Machine Learning Data Labeling tool to label your training data

If you don't have prelabeled data, you can use Azure Machine Learning's data labeling tool to manually label images. This tool automatically generates the data required for training in the accepted format.

It helps you create, manage, and monitor data labeling tasks for:

Image classification (multi-class and multi-label)
Object detection (bounding box)
Instance segmentation (polygon)

If you already have labeled data you want to use, you can export your labeled data as an Azure Machine Learning dataset and then access the dataset under the 'Datasets' tab in Azure Machine Learning studio. You can then pass this exported dataset as an input by using the azureml:<tabulardataset_name>:<version> format. Here's an example of how to pass an existing dataset as input for training computer vision models.

Azure CLI

[!INCLUDE cli v2]

training_data:
  path: azureml:odFridgeObjectsTrainingDataset:1
  type: mltable
  mode: direct

Python SDK

[!INCLUDE sdk v2]

from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.ai.ml import Input

# Training MLTable with v1 TabularDataset
my_training_data_input = Input(
    type=AssetTypes.MLTABLE, path="azureml:odFridgeObjectsTrainingDataset:1",
    mode=InputOutputModes.DIRECT
)

Studio

See the Azure CLI or Python SDK tab for an example.

Using prelabeled training data from local machine

If you have labeled data that you would like to use to train your model, you need to upload the images to Azure. You can upload your images to the default Azure Blob Storage of your Azure Machine Learning workspace and register them as a data asset.

The following script uploads the image data on your local machine at path "./data/odFridgeObjects" to a datastore in Azure Blob Storage. It then creates a new data asset with the name "fridge-items-images-object-detection" in your Azure Machine Learning workspace.

If a data asset with the name "fridge-items-images-object-detection" already exists in your Azure Machine Learning workspace, the script updates the version number of the data asset and points it to the new location where the image data was uploaded.

Azure CLI

[!INCLUDE cli v2]

Create a .yml file with the following configuration.

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: fridge-items-images-object-detection
description: Fridge-items images Object detection
path: ./data/odFridgeObjects
type: uri_folder

To upload the images as a data asset, run the following CLI v2 command with the path to your .yml file, workspace name, resource group, and subscription ID.

az ml data create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]

Python SDK

[!INCLUDE sdk v2]

[!Notebook-python[] (~/azureml-examples-main/sdk/python/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items/automl-image-object-detection-task-fridge-items.ipynb?name=upload-data)]

Studio

If you already have your data present in an existing datastore and want to create a data asset out of it, you can do so by providing the path to the data in the datastore, instead of providing the path of your local machine. Update the code above with the following snippet.

Azure CLI

[!INCLUDE cli v2]

Create a .yml file with the following configuration.

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: fridge-items-images-object-detection
description: Fridge-items images Object detection
path: azureml://subscriptions/<my-subscription-id>/resourcegroups/<my-resource-group>/workspaces/<my-workspace>/datastores/<my-datastore>/paths/<path_to_image_data_folder>
type: uri_folder

Python SDK

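from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

# Point the data asset at a folder that already exists in a datastore,
# instead of a folder on your local machine.
my_data = Data(
    path="azureml://subscriptions/<my-subscription-id>/resourcegroups/<my-resource-group>/workspaces/<my-workspace>/datastores/<my-datastore>/paths/<path_to_image_data_folder>",
    type=AssetTypes.URI_FOLDER,
    description="Fridge-items images Object detection",
    name="fridge-items-images-object-detection",
)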
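The datastore path above follows a fixed pattern, so it can help to assemble it from its parts rather than edit the long string by hand. The following is a minimal sketch that uses only string formatting. The helper name `build_datastore_uri` is hypothetical and isn't part of the Azure Machine Learning SDK.

```python
def build_datastore_uri(subscription_id, resource_group, workspace, datastore, path):
    """Assemble an azureml:// datastore URI in the form used by the snippet above.

    Hypothetical helper for illustration only; it isn't an SDK function.
    """
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        f"/paths/{path.lstrip('/')}"  # avoid a double slash if path starts with "/"
    )


# Placeholder values; replace them with your own identifiers.
uri = build_datastore_uri(
    subscription_id="<my-subscription-id>",
    resource_group="<my-resource-group>",
    workspace="<my-workspace>",
    datastore="<my-datastore>",
    path="<path_to_image_data_folder>",
)
print(uri)
```

You can then pass the resulting string as the `path` value of the data asset.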

Studio

Next, you need to get the label annotations in JSONL format. The schema of labeled data depends on the computer vision task at hand. Refer to schemas for JSONL files for AutoML computer vision experiments to learn more about the required JSONL schema for each task type.
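As an illustration of what one annotation line can look like, the following sketch builds a single object detection record and writes it to a JSONL file with the Python standard library. The field names (`image_url`, `label`, `topX`, `topY`, `bottomX`, `bottomY`, `isCrowd`) reflect the object detection schema as we understand it, so treat them as assumptions and check them against the schema article for your task type. The helper names, the image file name `1.jpg`, and the box values are hypothetical.

```python
import json
from pathlib import Path


def make_detection_record(image_url, boxes):
    """Build one annotation record for an object detection JSONL file.

    `boxes` is a list of (label, top_x, top_y, bottom_x, bottom_y) tuples.
    Coordinates are assumed to be normalized to the 0-1 range.
    """
    return {
        "image_url": image_url,
        "label": [
            {
                "label": name,
                "topX": top_x,
                "topY": top_y,
                "bottomX": bottom_x,
                "bottomY": bottom_y,
                "isCrowd": 0,
            }
            for name, top_x, top_y, bottom_x, bottom_y in boxes
        ],
    }


def write_jsonl(records, path):
    """Write one JSON object per line, which is what JSONL requires."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Hypothetical image path and bounding box, for illustration only.
record = make_detection_record(
    image_url=(
        "azureml://subscriptions/<my-subscription-id>/resourcegroups/<my-resource-group>"
        "/workspaces/<my-workspace>/datastores/<my-datastore>"
        "/paths/<path_to_image_data_folder>/1.jpg"
    ),
    boxes=[("can", 0.26, 0.40, 0.50, 0.87)],
)
write_jsonl([record], "train_annotations.jsonl")
```

Each image gets its own line in the file, and the `image_url` value points to the uploaded image in the datastore.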

If your training data is in a different format (such as Pascal VOC or COCO), helper scripts to convert the data to JSONL are available in notebook examples.

After you create the JSONL file by following the preceding steps, you can register it as a data asset by using the UI. Make sure you select the stream type in the schema section.
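The core of such a conversion is usually a coordinate change. The following sketch isn't the helper script from the notebook examples. It only illustrates the arithmetic, assuming the target format uses corner coordinates normalized by image width and height. COCO stores a box as `[x_min, y_min, width, height]` in pixels, and Pascal VOC stores `xmin`, `ymin`, `xmax`, `ymax` in pixels. The function names and sample values are hypothetical.

```python
def coco_bbox_to_normalized(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels
    to normalized corner coordinates."""
    x_min, y_min, box_width, box_height = bbox
    return {
        "topX": x_min / image_width,
        "topY": y_min / image_height,
        "bottomX": (x_min + box_width) / image_width,
        "bottomY": (y_min + box_height) / image_height,
    }


def voc_bbox_to_normalized(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert Pascal VOC pixel corners to normalized corner coordinates."""
    return {
        "topX": xmin / image_width,
        "topY": ymin / image_height,
        "bottomX": xmax / image_width,
        "bottomY": ymax / image_height,
    }


# The same box expressed in both source formats, on a 500 x 1000 pixel image.
coco_box = coco_bbox_to_normalized([50, 100, 200, 300], image_width=500, image_height=1000)
voc_box = voc_bbox_to_normalized(50, 100, 250, 400, image_width=500, image_height=1000)
print(coco_box)
print(voc_box)
```

Both calls describe the same rectangle, so they produce the same normalized coordinates.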

Using prelabeled training data from Azure Blob storage

If you have your labeled training data present in a container in Azure Blob storage, then you can access it directly from there by creating a datastore referring to that container.

Create MLTable

Once you have your labeled data in JSONL format, you can use it to create an MLTable as shown in this YAML snippet. MLTable packages your data into a consumable object for training.

paths:
  - file: ./train_annotations.jsonl
transformations:
  - read_json_lines:
        encoding: utf8
        invalid_lines: error
        include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info

You can then pass in the MLTable as a data input for your AutoML training job.
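If you generate the annotations in a script, you might also want to write the MLTable definition from the same script. The following sketch writes the YAML shown above to a file named `MLTable` inside a folder, using only the Python standard library. The folder name `./data/training-mltable-folder` and the helper name `write_mltable` are hypothetical. Because the definition refers to the annotations with a relative path, the sketch assumes that the JSONL file is placed in the same folder.

```python
from pathlib import Path

# Same definition as the YAML snippet above, with the annotations file name
# left as a placeholder.
MLTABLE_TEMPLATE = """\
paths:
  - file: ./{annotations_file}
transformations:
  - read_json_lines:
        encoding: utf8
        invalid_lines: error
        include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
"""


def write_mltable(folder, annotations_file="train_annotations.jsonl"):
    """Write an MLTable definition file into `folder` and return its path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    mltable_path = folder / "MLTable"  # the definition file is named MLTable
    mltable_path.write_text(
        MLTABLE_TEMPLATE.format(annotations_file=annotations_file),
        encoding="utf-8",
    )
    return mltable_path


mltable_path = write_mltable("./data/training-mltable-folder")
print(mltable_path)
```

The folder that contains the `MLTable` file, not the file itself, is what you reference as the MLTable input.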

Next steps

Train computer vision models with automated machine learning.
Train a small object detection model with automated machine learning.
Tutorial: Train an object detection model (preview) with AutoML and Python.
