# Train a classification model with Automated Machine Learning

There are many kinds of machine learning algorithms that you can use to train a model, and it's not always easy to determine the most effective algorithm for your particular data and prediction requirements. You can also significantly affect the predictive performance of a model by preprocessing the training data with techniques such as normalization and missing feature imputation. To find the best model for your requirements, you may need to try many combinations of algorithms and preprocessing transformations, which takes a lot of time and compute resources.

Azure Machine Learning enables you to automate the comparison of models trained using different algorithms and preprocessing options. You can access this capability through the visual interface in [Azure Machine Learning Studio](https://ml.azure.com) or through the Python SDK (v2). The Python SDK gives you greater control over the settings for the automated machine learning job, but the visual interface is easier to use.

## Before you start

You'll need the latest version of the **azure-ai-ml** package to run the code in this notebook. Run the cell below to install the package and verify the installed version.

> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.


In [1]:
!pip install azure-ai-ml
!pip show azure-ai-ml

Collecting azure-ai-ml
  Downloading azure_ai_ml-1.24.0-py3-none-any.whl (12.3 MB)
Collecting azure-storage-file-datalake>=12.2.0
  Downloading azure_storage_file_datalake-12.18.1-py3-none-any.whl (258 kB)
Collecting azure-monitor-opentelemetry
  Downloading azure_monitor_opentelemetry-1.6.4-py3-none-any.whl (23 kB)
Collecting marshmallow>=3.5
  Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)
Collecting pydash>=6.0.0
  Downloading pydash-8.0.5-py3-none-any.whl (102 kB)
Collecting strict

## Connect to your workspace

With the required SDK packages installed, you're now ready to connect to your workspace.

To connect to a workspace, you need three identifiers: a subscription ID, a resource group name, and a workspace name. Because you're working on a compute instance managed by Azure Machine Learning, you can use the default values to connect to the workspace.

In [2]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check whether the credential can obtain a token.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work.
    credential = InteractiveBrowserCredential()

In [3]:
# Get a handle to the workspace, using the identifiers in config.json
ml_client = MLClient.from_config(credential=credential)

Found the config file in: /config.json
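
The default values mentioned above come from a `config.json` file, which `MLClient.from_config` located in the output shown. As a minimal sketch, assuming you are working somewhere that does not provide this file, the snippet below writes one with placeholder values. The placeholders, the temporary folder, and the helper name `write_workspace_config` are illustrative assumptions, not part of the original notebook.

```python
import json
import tempfile
from pathlib import Path

# Placeholder identifiers (assumptions): replace them with your own values.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}


def write_workspace_config(folder, config):
    """Write the workspace identifiers to config.json in `folder`."""
    config_path = Path(folder) / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path


# A temporary folder is used here so the sketch does not overwrite a real file.
config_dir = tempfile.mkdtemp()
config_path = write_workspace_config(config_dir, workspace_config)

# Read the file back to confirm that it holds the three identifiers.
loaded = json.loads(config_path.read_text())
print(sorted(loaded))
```

With real values in place, passing the folder through the `path` argument, as in `MLClient.from_config(credential=credential, path=config_dir)`, should pick up the file.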


## Prepare data

You don't need to create a training script for automated machine learning, but you do need to load the training data.

This notebook uses a dataset containing details of accident victims.

To pass a dataset as an input to an automated machine learning job, the data must be in tabular form and include a target column. For the data to be interpreted as a tabular dataset, the input dataset must be an **MLTable**.

The cells below preview the data and register it as an MLTable data asset named `accident-training`. After registration, you can explore the data asset by navigating to the **Data** page. You then retrieve the data asset by specifying its name and version `1`.
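
An MLTable data asset points to a folder that holds a YAML file named `MLTable` alongside the data files. The notebook does not show that file, so the following is only a sketch of what it might contain for `accident.csv`. The delimiter, encoding, and header settings are assumptions, and `write_mltable_file` is a hypothetical helper that writes into a temporary folder.

```python
import tempfile
from pathlib import Path

# Assumed MLTable definition for a comma-delimited CSV file with a header row.
MLTABLE_TEMPLATE = """\
type: mltable
paths:
  - file: ./{csv_name}
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
"""


def write_mltable_file(folder, csv_name):
    """Write an MLTable definition that references `csv_name` in `folder`."""
    mltable_path = Path(folder) / "MLTable"
    mltable_path.write_text(MLTABLE_TEMPLATE.format(csv_name=csv_name))
    return mltable_path


# A temporary folder stands in for the folder that contains accident.csv.
demo_folder = tempfile.mkdtemp()
mltable_path = write_mltable_file(demo_folder, "accident.csv")
print(mltable_path.read_text())
```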

In [4]:
import pandas as pd
data = pd.read_csv('accident.csv')
data.head()

Unnamed: 0,Age,Gender,Speed_of_Impact,Helmet_Used,Seatbelt_Used,Survived
0,56,Female,27.0,No,No,1
1,69,Female,46.0,No,Yes,1
2,46,Male,46.0,Yes,Yes,0
3,32,Male,117.0,No,Yes,0
4,60,Female,40.0,Yes,Yes,0


In [5]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
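
# Folder that contains the MLTable definition file and accident.csv
local_path = './'

# Define the data asset; an MLTable asset points to a folder, not a single file
my_data = Data(
    path=local_path,
    type=AssetTypes.MLTABLE,
    description="MLTable pointing to accident.csv in the folder",
    name="accident-training"
)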

# Register the data asset in the workspace (uploads the folder contents)
ml_client.data.create_or_update(my_data)

Uploading 06 (5.66 MBs): 100%|██████████| 5664955/5664955 [00:00<00:00, 37743365.94it/s]



Data({'path': 'azureml://subscriptions/cda9116f-5326-4a9b-9407-bc3a4391c27c/resourcegroups/rg-dp100/workspaces/gabbyworkout/datastores/workspaceblobstore/paths/LocalUpload/a70919a9c0826f32cdb2fac14909a725/06/', 'skip_validation': False, 'mltable_schema_url': None, 'referenced_uris': ['./accident.csv'], 'type': 'mltable', 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'accident-training', 'description': 'MLTable pointing to accident.csv in the folder', 'tags': {}, 'properties': {}, 'print_as_yaml': False, 'id': '/subscriptions/cda9116f-5326-4a9b-9407-bc3a4391c27c/resourceGroups/rg-dp100/providers/Microsoft.MachineLearningServices/workspaces/gabbyworkout/data/accident-training/versions/1', 'Resource__source_path': '', 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/captgt0071/code/Users/captgt007/azure-ml-labs/Labs/06', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x7f924d287b20>, 'serialize': <msr

In [6]:
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml import Input

# Reference version 1 of the registered accident-training data asset as the job input
my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml:accident-training:1")

In [7]:
my_training_data_input

{'type': 'mltable', 'path': 'azureml:accident-training:1'}
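
The `path` value above follows the pattern `azureml:<name>:<version>`. As a small illustration, the hypothetical helper below (not part of the SDK) builds such a reference from a data asset name and version.

```python
def data_asset_uri(name, version):
    """Build a short-form reference to a registered data asset."""
    if not name:
        raise ValueError("The data asset name must not be empty.")
    return f"azureml:{name}:{version}"


print(data_asset_uri("accident-training", 1))  # azureml:accident-training:1
```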

## Configure automated machine learning job

Now, you're ready to configure the automated machine learning experiment.

When you run the code below, it will create an automated machine learning job that:

- Uses the compute cluster named `captgt0071`
- Sets `Survived` as the target column
- Sets `accuracy` as the primary metric
- Times out after `60` minutes of total training time
- Trains a maximum of `5` models
- Blocks the `LogisticRegression` algorithm, so no model is trained with it

In [8]:
from azure.ai.ml import automl

# configure the classification job
classification_job = automl.classification(
    compute="captgt0071",
    experiment_name="auto-ml-class-dev",
    training_data=my_training_data_input,
    target_column_name="Survived",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True
)

# set the limits (optional)
classification_job.set_limits(
    timeout_minutes=60, 
    trial_timeout_minutes=20, 
    max_trials=5,
    enable_early_termination=True,
)

# set the training properties (optional)
classification_job.set_training(
    blocked_training_algorithms=["LogisticRegression"], 
    enable_onnx_compatible_models=True
)
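
The value of `target_column_name` has to match a column in the training data. Assuming the match is exact, including letter case, a quick local check before submitting the job can catch a mismatch such as `survived` versus `Survived`. The sketch below uses only the standard library. `check_target_column` is a hypothetical helper, and the sample text mirrors the named columns from the data preview rather than the full file.

```python
import csv
import io


def check_target_column(csv_text, target_column):
    """Return the target column name if the CSV header contains it exactly."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if target_column in header:
        return target_column
    # Suggest a column that differs only by letter case, if one exists.
    matches = [col for col in header if col.lower() == target_column.lower()]
    hint = f" Did you mean {matches[0]!r}?" if matches else ""
    raise ValueError(f"Column {target_column!r} not found in {header}.{hint}")


# Sample rows based on the preview of accident.csv shown earlier.
sample = (
    "Age,Gender,Speed_of_Impact,Helmet_Used,Seatbelt_Used,Survived\n"
    "56,Female,27.0,No,No,1\n"
)
print(check_target_column(sample, "Survived"))
```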

## Run an automated machine learning job

OK, you're ready to go. Let's run the automated machine learning experiment.

> **Note**: This may take some time!

In [9]:
# Submit the AutoML job to the workspace
returned_job = ml_client.jobs.create_or_update(
    classification_job
)

# Get the studio URL for monitoring the job
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

Monitor your job at https://ml.azure.com/runs/lemon_feijoa_vnnqbxhh83?wsid=/subscriptions/cda9116f-5326-4a9b-9407-bc3a4391c27c/resourcegroups/rg-dp100/workspaces/gabbyworkout&tid=aef6e45c-850f-4f38-a10b-1df3ad33cdb0


While the job is running, you can monitor it in Azure Machine Learning Studio by opening the URL printed above.
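
If you prefer to wait for the job from the notebook, one option is to poll its status until it reaches a terminal state. The sketch below is a generic polling loop, not SDK code. `wait_for_job` is a hypothetical helper, the status names are assumed to match the ones Azure Machine Learning reports, and a fake status source stands in for the real job so the example runs without a workspace.

```python
import time

# Statuses assumed to mean that the job has finished.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30.0, max_polls=240, sleep=time.sleep):
    """Call `get_status` repeatedly until it returns a terminal status."""
    status = get_status()
    for _ in range(max_polls):
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
        status = get_status()
    # Give up after max_polls checks and return the last status seen.
    return status


# Fake status source for demonstration purposes only.
fake_statuses = iter(["Queued", "Running", "Running", "Completed"])
final_status = wait_for_job(
    lambda: next(fake_statuses), poll_seconds=0, sleep=lambda seconds: None
)
print(final_status)
```

Against a real workspace, the status source could be `lambda: ml_client.jobs.get(returned_job.name).status`. Calling `ml_client.jobs.stream(returned_job.name)` is an alternative that blocks and prints the job logs until the job finishes.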