# Run a training script with the Python SDK

You can use the Python SDK for Azure Machine Learning to submit scripts as jobs. By using jobs, you can easily keep track of the input parameters and outputs when training a machine learning model.

## Before you start

You'll need the latest version of the **azure-ai-ml** package to run the code in this notebook. Run the cell below to verify that it is installed.

> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.

```python
%pip show azure-ai-ml
```

```text
Name: azure-ai-ml
Version: 1.22.4
Summary: Microsoft Azure Machine Learning Client Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python
Author: Microsoft Corporation
Author-email: azuresdkengsysadmins@microsoft.com
License: MIT License
Location: c:\users\alienware\miniconda3\envs\py310\lib\site-packages
Requires: azure-common, azure-core, azure-mgmt-core, azure-storage-blob, azure-storage-file-datalake, azure-storage-file-share, colorama, isodate, jsonschema, marshmallow, msrest, opencensus-ext-azure, opencensus-ext-logging, pydash, pyjwt, pyyaml, strictyaml, tqdm, typing-extensions
Required-by: 
Note: you may need to restart the kernel to use updated packages.
```

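If you prefer to check the package from Python rather than with `pip show`, the standard library's `importlib.metadata` can report the installed version. This is a minimal sketch; the helper name `installed_version` is illustrative and not part of the Azure SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


azure_ml_version = installed_version("azure-ai-ml")
if azure_ml_version is None:
    print("azure-ai-ml is not installed; run: pip install azure-ai-ml")
else:
    print("azure-ai-ml version:", azure_ml_version)
```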



```python
# Imports
import configparser
import os

from azure.ai.ml import MLClient, command, Input
from azure.ai.ml.entities import Data
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
```

## Connect to your workspace

With the required SDK packages installed, now you're ready to connect to your workspace.

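To connect to a workspace, you need three identifiers: a subscription ID, a resource group name, and a workspace name. On a compute instance managed by Azure Machine Learning you could rely on the default values, but this notebook runs on a local machine. The cell below therefore reads service principal credentials and the subscription ID from a local **config.ini** file and exports them as environment variables (`DefaultAzureCredential` uses `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, and `AZURE_TENANT_ID` to sign in as the service principal). If that fails, it falls back to interactive browser sign-in, and it then lists the workspaces in the resource group to confirm the connection.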

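The connection code expects **config.ini** to contain an `[azure]` section with `client_id`, `client_secret`, `tenant_id`, and `subscription_id` keys. The sketch below shows that layout with placeholder values and a hypothetical helper, `load_azure_settings`, that reports every missing key at once instead of stopping at the first `KeyError`. The placeholder values and the helper are assumptions for illustration; the helper returns a dictionary rather than touching `os.environ`, so running the demo does not overwrite real credentials.

```python
import configparser

# Placeholder contents of config.ini (hypothetical values); keep the real file out of source control
EXAMPLE_CONFIG = """
[azure]
client_id = <service-principal-client-id>
client_secret = <service-principal-client-secret>
tenant_id = <tenant-id>
subscription_id = <subscription-id>
"""

# config.ini key -> environment variable name used by the connection code
REQUIRED_KEYS = {
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
    "tenant_id": "AZURE_TENANT_ID",
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
}


def load_azure_settings(config: configparser.ConfigParser) -> dict:
    """Return {environment variable: value} for the [azure] section (hypothetical helper).

    Raises KeyError listing every missing key, instead of stopping at the first one.
    """
    if "azure" not in config:
        raise KeyError("The 'azure' section is missing in config.ini")
    section = config["azure"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError("Missing keys in [azure]: " + ", ".join(missing))
    return {env_var: section[key] for key, env_var in REQUIRED_KEYS.items()}


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
settings = load_azure_settings(config)
print(settings["AZURE_TENANT_ID"])  # prints <tenant-id>

# With a real config.ini, os.environ.update(settings) would export the values
```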
```python
# Load the service principal credentials from a local config.ini file
config = configparser.ConfigParser()

# Absolute path to config.ini on the author's machine; change it to match your own setup
config_file_path = "G:/My Drive/Ingegneria/Data Science GD/My-Practice/my models/Azure ML/config.ini"
config.read(config_file_path)

# Check that the 'azure' section exists
if 'azure' in config:
    # Export the values as environment variables; DefaultAzureCredential reads
    # AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_TENANT_ID
    os.environ['AZURE_CLIENT_ID'] = config['azure']['client_id']
    os.environ['AZURE_CLIENT_SECRET'] = config['azure']['client_secret']
    os.environ['AZURE_TENANT_ID'] = config['azure']['tenant_id']
    os.environ['AZURE_SUBSCRIPTION_ID'] = config['azure']['subscription_id']

    # Try DefaultAzureCredential first and request a token to confirm that it works
    try:
        credential = DefaultAzureCredential()
        credential.get_token("https://management.azure.com/.default")
    except Exception:
        print("DefaultAzureCredential failed, falling back to InteractiveBrowserCredential.")
        credential = InteractiveBrowserCredential()

    # Initialize MLClient with the obtained credential
    ml_client = MLClient(
        credential=credential,
        subscription_id=os.environ['AZURE_SUBSCRIPTION_ID'],
        resource_group_name="rg-dp100-labs",
        workspace_name="mlw-dp100-labs"
    )

    # List the workspaces in the resource group to verify the connection
    try:
        for ws in ml_client.workspaces.list():
            print(ws.name)
    except Exception as e:
        print(f"Failed to list workspaces: {e}")

else:
    print("The 'azure' section is missing in the config.ini file.")
```

```text
mlw-dp100-labs
```


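The three identifiers combine into the workspace's Azure Resource Manager ID, the same prefix that appears in the `id` field of the data asset output later in this notebook. A small sketch of that format; the helper `workspace_resource_id` is hypothetical and the all-zero subscription ID is a placeholder.

```python
def workspace_resource_id(subscription_id: str, resource_group: str, workspace: str) -> str:
    """Build the ARM resource ID of an Azure Machine Learning workspace (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace}"
    )


ws_id = workspace_resource_id(
    "00000000-0000-0000-0000-000000000000",  # placeholder subscription ID
    "rg-dp100-labs",
    "mlw-dp100-labs",
)
print(ws_id)
```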
## Use the Python SDK to train a model

To train a model, you'll first create the **diabetes-training.py** script in the **src** folder. As training data, the script uses the **diabetes.csv** file in the same folder.

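```python
%%writefile src/diabetes-training.py
# import libraries
import argparse
import os

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# get the training data location from the optional --data argument: either the path to
# diabetes.csv or a folder that contains it. The default is diabetes.csv next to this script,
# which works both for a local %run and for a job that runs from the uploaded src folder.
parser = argparse.ArgumentParser()
parser.add_argument("--data", type=str,
                    default=os.path.join(os.path.dirname(os.path.abspath(__file__)), "diabetes.csv"))
args, _ = parser.parse_known_args()  # ignore any unrelated extra arguments
data_path = os.path.join(args.data, "diabetes.csv") if os.path.isdir(args.data) else args.data

# load the diabetes dataset
print("Loading Data...")
diabetes = pd.read_csv(data_path)

# separate features and labels
features = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure', 'TricepsThickness',
            'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']
X, y = diabetes[features].values, diabetes['Diabetic'].values

# split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# set regularization hyperparameter
reg = 0.01

# train a logistic regression model (scikit-learn's C is the inverse of the regularization rate)
print('Training a logistic regression model with regularization rate of', reg)
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
print('AUC: ' + str(auc))
```

```text
Overwriting src/diabetes-training.py
```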

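The script hard-codes the regularization rate (`reg = 0.01`) and passes `C=1/reg` to `LogisticRegression`, because scikit-learn's `C` is the inverse of regularization strength (here `C = 100`). To have each job record the rate as an input parameter, you could read it from the command line instead. A minimal sketch, assuming a hypothetical `--reg_rate` argument that the original script does not define:

```python
import argparse


def parse_reg_rate(argv=None) -> float:
    """Parse a hypothetical --reg_rate argument (default 0.01, as in the original script)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--reg_rate", type=float, default=0.01)
    args, _ = parser.parse_known_args(argv)  # ignore other arguments such as --data
    if args.reg_rate <= 0:
        parser.error("--reg_rate must be positive")
    return args.reg_rate


reg = parse_reg_rate(["--reg_rate", "0.1"])
C = 1 / reg  # inverse regularization strength expected by LogisticRegression
print(reg, C)
```

In the job definition, the value could then be passed as `--reg_rate ${{inputs.reg_rate}}` in the `command` string, with a matching `"reg_rate": 0.01` entry in `inputs`.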

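Before you submit a job, run the script locally to check that it works. Then register the **src** folder as a data asset and submit a command job that trains a classification model to predict diabetes.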

```python
# Run the training script locally to check that it works before submitting it as a job
%run src/diabetes-training.py
```

```text
Loading Data...
Training a logistic regression model with regularization rate of 0.01
Accuracy: 0.774
AUC: 0.8484909696999852
```

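The script reports two metrics. Accuracy is the fraction of test predictions that match the labels, which is what `np.average(y_hat == y_test)` computes. AUC can be read as the probability that a randomly chosen positive case gets a higher predicted score than a randomly chosen negative case, with ties counting half. The sketch below computes both with NumPy alone on a toy example; it illustrates the definitions by comparing every positive with every negative and is not a replacement for scikit-learn's `roc_auc_score`.

```python
import numpy as np


def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of predictions that equal the true labels."""
    return float(np.average(y_pred == y_true))


def pairwise_auc(y_true: np.ndarray, scores: np.ndarray) -> float:
    """AUC as P(score of a positive > score of a negative), counting ties as 0.5."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # one entry per positive/negative pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
y_pred = (scores >= 0.5).astype(int)  # threshold the scores at 0.5

print(accuracy(y_true, y_pred))      # 0.75
print(pairwise_auc(y_true, scores))  # 0.75
```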

```python
# Register the entire src folder (training script and diabetes.csv) as a data asset
data_asset = Data(
    path="./src",       # local path to the src folder
    type="uri_folder",  # register the asset as a folder
    name="src-folder-dataset",
    version="2"         # the version must be a string
)

# Upload the folder and register the data asset in the workspace
ml_client.data.create_or_update(data_asset)
```

```text
Data({'path': 'azureml://subscriptions/a90ed0cd-b0b9-4e3a-bd85-67272a44de15/resourcegroups/rg-dp100-labs/workspaces/mlw-dp100-labs/datastores/workspaceblobstore/paths/LocalUpload/262cdbadf171d4037c6e97e879d03619/src/', 'skip_validation': False, 'mltable_schema_url': None, 'referenced_uris': None, 'type': 'uri_folder', 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'src-folder-dataset', 'description': None, 'tags': {}, 'properties': {}, 'print_as_yaml': False, 'id': '/subscriptions/a90ed0cd-b0b9-4e3a-bd85-67272a44de15/resourceGroups/rg-dp100-labs/providers/Microsoft.MachineLearningServices/workspaces/mlw-dp100-labs/data/src-folder-dataset/versions/2', 'Resource__source_path': '', 'base_path': 'G:\\My Drive\\Ingegneria\\Data Science GD\\My-Practice\\my models\\Azure ML\\azure-ml-labs\\Labs\\02', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x0000013A6EC32C50>, 'serialize': <msrest.serialization.Serializer object
```

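A registered data asset can also be referenced by name and version with the `azureml:<name>:<version>` short form, which an `Input` accepts as its `path` in place of the full datastore URI. A tiny sketch that builds that string; the function name `data_asset_ref` is illustrative.

```python
def data_asset_ref(name: str, version: str) -> str:
    """Return the azureml:<name>:<version> reference for a registered data asset."""
    if not name or not version:
        raise ValueError("name and version must be non-empty")
    return f"azureml:{name}:{version}"


ref = data_asset_ref("src-folder-dataset", "2")
print(ref)  # azureml:src-folder-dataset:2
```

For example, `Input(type="uri_folder", path=ref)` would point a job at version 2 of the asset.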
```python
# Verify that the data asset exists and retrieve its path
dataset = ml_client.data.get(name="src-folder-dataset", version="2")
print("Dataset path:", dataset.path)
```

```text
Dataset path: azureml://subscriptions/a90ed0cd-b0b9-4e3a-bd85-67272a44de15/resourcegroups/rg-dp100-labs/workspaces/mlw-dp100-labs/datastores/workspaceblobstore/paths/LocalUpload/262cdbadf171d4037c6e97e879d03619/src/
```

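The path returned for the asset is a datastore URI of the form `azureml://subscriptions/<sub>/resourcegroups/<rg>/workspaces/<ws>/datastores/<datastore>/paths/<path>`. If you need the datastore name or the relative path, for example to find the upload in the storage account, a regular expression is enough. This sketch is based only on the URI format shown in the output above; `parse_datastore_uri` and `DatastorePath` are hypothetical names, and short forms such as `azureml:<name>:<version>` are deliberately rejected.

```python
import re
from typing import NamedTuple

DATASTORE_URI = re.compile(
    r"^azureml://subscriptions/(?P<subscription>[^/]+)"
    r"/resourcegroups/(?P<resource_group>[^/]+)"
    r"/workspaces/(?P<workspace>[^/]+)"
    r"/datastores/(?P<datastore>[^/]+)"
    r"/paths/(?P<path>.*)$",
    re.IGNORECASE,  # the service may write resourceGroups or resourcegroups
)


class DatastorePath(NamedTuple):
    subscription: str
    resource_group: str
    workspace: str
    datastore: str
    path: str


def parse_datastore_uri(uri: str) -> DatastorePath:
    """Split a long-form azureml:// datastore URI into its components."""
    match = DATASTORE_URI.match(uri)
    if match is None:
        raise ValueError(f"Not a long-form azureml datastore URI: {uri}")
    return DatastorePath(**match.groupdict())


parts = parse_datastore_uri(
    "azureml://subscriptions/a90ed0cd-b0b9-4e3a-bd85-67272a44de15/resourcegroups/rg-dp100-labs"
    "/workspaces/mlw-dp100-labs/datastores/workspaceblobstore"
    "/paths/LocalUpload/262cdbadf171d4037c6e97e879d03619/src/"
)
print(parts.datastore, parts.path)
```

In the notebook you would pass `dataset.path` instead of the literal string.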

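In a command job, `${{inputs.diabetes_data}}` inside the `command` string is a placeholder: when the job runs, Azure Machine Learning replaces it with the path where that input is made available on the compute target. The sketch below only emulates the substitution locally with a regular expression, to show what the script ends up receiving; it is not how the service implements it, `render_command` is a hypothetical name, and the mount path is a made-up example.

```python
import re

# Matches ${{inputs.<name>}}, tolerating spaces inside the braces
PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")


def render_command(command: str, inputs: dict) -> str:
    """Replace ${{inputs.<name>}} placeholders with values (local emulation only)."""
    def substitute(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"No value for input '{name}'")
        return str(inputs[name])

    return PLACEHOLDER.sub(substitute, command)


rendered = render_command(
    "python diabetes-training.py --data ${{inputs.diabetes_data}}",
    {"diabetes_data": "/mnt/example/src"},  # hypothetical mount path
)
print(rendered)  # python diabetes-training.py --data /mnt/example/src
```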
```python
# Define the command job
job = command(
    code="./src",
    command="python diabetes-training.py --data ${{inputs.diabetes_data}}",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="diabetes-pythonv2-train",
    experiment_name="diabetes-training",
    inputs={
        # The registered data asset is a folder (uri_folder), so declare the input type to match
        "diabetes_data": Input(type="uri_folder", path=dataset.path)
    }
)

# Submit the job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

# Then check the job status and logs in Azure Machine Learning studio
```

```text
Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Uploading src (0.53 MBs): 100%

Monitor your job at https://ml.azure.com/runs/bubbly_bucket_xvxz2m8xml?wsid=/subscriptions/a90ed0cd-b0b9-4e3a-bd85-67272a44de15/resourcegroups/rg-dp100-labs/workspaces/mlw-dp100-labs&tid=8bfc37bf-8e21-4420-841d-49303c72ec1a
```

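Instead of following the job in the studio, you can follow it from the notebook. `ml_client.jobs.stream()` prints the job's logs until the job finishes, and `ml_client.jobs.get()` returns the job entity, whose `status` holds its final state. The wrapper below is a minimal sketch with a hypothetical name, `wait_for_job`; it takes `ml_client` as a parameter, so the cell itself imports nothing. If the job fails, `stream()` may raise an exception instead of returning, depending on the SDK version.

```python
def wait_for_job(ml_client, job_name: str) -> str:
    """Stream a job's logs until it completes, then return its final status."""
    # jobs.stream blocks and prints log output while the job runs
    ml_client.jobs.stream(job_name)
    # jobs.get returns the job entity; its status reflects the final state
    return ml_client.jobs.get(job_name).status


# Usage in the notebook, after submitting the job:
# status = wait_for_job(ml_client, returned_job.name)
# print("Job finished with status:", status)
```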