# Diabetes classification - Automated Machine Learning (AutoML)

This is a basic example of supervised classification on tabular data using Azure Machine Learning automated ML (AutoML).

File information:

* **File name**: diabetes.csv
* **Features**:
  * **PatientID**: Patient identifier
  * **Pregnancies**: Number of times pregnant
  * **PlasmaGlucose**: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  * **DiastolicBloodPressure**: Diastolic blood pressure (mm Hg)
  * **TricepsThickness**: Triceps skin fold thickness (mm)
  * **SerumInsulin**: 2-hour serum insulin (mu U/ml)
  * **BMI**: Body mass index (weight in kg/(height in m)^2)
  * **DiabetesPedigree**: Diabetes pedigree function
  * **Age**: Age (years)
* **Target variable**:
  * **Diabetic**: 0 (not diabetic) or 1 (diabetic)
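Before registering anything, it can help to confirm that a local copy of the file has the columns listed above. The sketch below is a minimal, stdlib-only check. It assumes the header uses exactly these names (with the pregnancy column spelled `Pregnancies`), and `check_header` and `EXPECTED_COLUMNS` are hypothetical helpers, not part of any Azure SDK.

```python
import csv
import io

# Assumed column names, matching the list above; "Diabetic" is the label,
# "PatientID" is an identifier that is not used as a feature.
EXPECTED_COLUMNS = [
    "PatientID", "Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure",
    "TricepsThickness", "SerumInsulin", "BMI", "DiabetesPedigree", "Age", "Diabetic",
]
TARGET = "Diabetic"
ID_COLUMN = "PatientID"


def check_header(csv_text: str) -> list:
    """Validate the CSV header and return the feature column names."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Features are every column except the identifier and the label.
    return [c for c in header if c not in (ID_COLUMN, TARGET)]


# Hypothetical header-only input, for illustration.
sample = ",".join(EXPECTED_COLUMNS) + "\n"
print(check_header(sample))
```

The eight names it returns are the inputs AutoML trains on once `PatientID` is dropped and `Diabetic` is set as the target column.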

## 1. Login

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
ml_client = MLClient.from_config(credential = credential)
```

## 2. Prepare data

The data is stored in an Azure Storage account that allows anonymous read access. To make it easy to use, the file is registered in the workspace as a data asset of type MLTable.
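The `path` built in section 2.1 uses the `wasbs://` scheme, which places the container name before the `@` and the storage account in the host name. The hypothetical helper below shows how that address relates to the plain HTTPS URL of the same blob; it only builds strings and does not contact Azure.

```python
def blob_urls(account: str, container: str, blob: str) -> dict:
    """Build two common addresses for the same blob (string formatting only)."""
    return {
        # WASBS form: container@account, then the blob name
        "wasbs": f"wasbs://{container}@{account}.blob.core.windows.net/{blob}",
        # HTTPS form: account host, then container/blob in the path
        "https": f"https://{account}.blob.core.windows.net/{container}/{blob}",
    }


urls = blob_urls("stai20240731", "tabulardata", "diabetes.csv")
print(urls["wasbs"])
print(urls["https"])
```

Because the container allows anonymous reads, the HTTPS form should also be reachable from a browser, which is a quick way to check the file before creating the MLTable.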

```python
import mltable
from mltable import MLTableHeaders, MLTableFileEncoding
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
```

### 2.1. Create data asset (MLTable)

```python
# Identify storage location
storage_account = "stai20240731"
container_name = "tabulardata"
file_name = "diabetes.csv"
path = f"wasbs://{container_name}@{storage_account}.blob.core.windows.net/{file_name}"

# Create path for data files
paths = [{"file": path}]

# Define how to read the delimited file as an MLTable
tbl = mltable.from_delimited_files(
    paths = paths,
    delimiter = ",",
    header = MLTableHeaders.all_files_same_headers,
    infer_column_types = True,
    include_path_column = False,
    encoding = MLTableFileEncoding.utf8
)

# Drop columns that do not help training (PatientID is only an identifier)
tbl = tbl.drop_columns(["PatientID"])

# Show the first few records
print(tbl.show(5))

# Save the data loading steps in an MLTable file
mltable_folder = "./diabetes"
tbl.save(mltable_folder)

# Define the Data asset object
data_asset_name = "diabetes_tabular_mltable"
data_asset_version = "1.0"

my_data = Data(
    path = mltable_folder,
    type = AssetTypes.MLTABLE,
    description = "Diabetes dataset MLTable",
    name = data_asset_name,
    version = data_asset_version
)

# Create the data asset in the workspace
ml_client.data.create_or_update(my_data)
```
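`from_delimited_files` and `drop_columns` record loading steps in the MLTable; the file is actually read when the table is materialized, for example by `show` or `to_pandas_dataframe`. As a rough local analogue, the sketch below parses a small, made-up CSV with Python's `csv` module and drops `PatientID`. The helpers `infer` and `read_and_drop` are hypothetical, and the per-value casting is a simplification, not how MLTable infers column types.

```python
import csv
import io


def infer(value: str):
    """Crude per-value cast: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_and_drop(csv_text: str, drop: list) -> list:
    """Parse comma-delimited text with a header row, dropping the given columns."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")
    return [
        {col: infer(val) for col, val in row.items() if col not in drop}
        for row in reader
    ]


# Made-up rows for illustration only; they are not taken from diabetes.csv.
sample = "PatientID,Age,BMI,Diabetic\n1,30,22.5,0\n2,45,31.0,1\n"
rows = read_and_drop(sample, drop=["PatientID"])
print(rows)
```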

### 2.2. Read data asset

```python
# Get data asset
data_asset = ml_client.data.get(name = data_asset_name, version = data_asset_version)

# Read data asset
tbl = mltable.load(f"azureml:/{data_asset.id}")
df = tbl.to_pandas_dataframe()
df.head()
```
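`data_asset.id` is a full resource ID that starts with `/`, so `f"azureml:/{data_asset.id}"` yields an `azureml://subscriptions/...` URI. Section 4 uses the shorter `azureml:<name>:<version>` form instead. The sketch below builds both with hypothetical helpers; the ID shown is a placeholder whose exact shape is an assumption, so print `data_asset.id` to see the real value for your workspace.

```python
def long_form_uri(asset_id: str) -> str:
    """Prefix a resource ID (which starts with '/') to form an azureml:// URI."""
    if not asset_id.startswith("/"):
        raise ValueError("expected an ID starting with '/'")
    return f"azureml:/{asset_id}"


def short_form_uri(name: str, version: str) -> str:
    """Workspace-relative reference to a registered asset version."""
    return f"azureml:{name}:{version}"


# Placeholder subscription, resource group and workspace (hypothetical values).
asset_id = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.MachineLearningServices/workspaces/<ws>/data/"
    "diabetes_tabular_mltable/versions/1.0"
)
print(long_form_uri(asset_id))
print(short_form_uri("diabetes_tabular_mltable", "1.0"))
```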

```python
df.info()
```

## 3. Create compute resource

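The AutoML job in this example runs on an Azure Machine Learning compute cluster, so one is created first. With `min_instances = 0`, the cluster scales down to zero nodes when idle instead of keeping nodes running between jobs.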

```python
from azure.ai.ml.entities import AmlCompute

cc_name = "cc-standard-DS3-v2"

cluster_basic = AmlCompute(
    name = cc_name,
    type = "amlcompute",
    size = "STANDARD_DS3_v2",
    location = "eastus2",
    min_instances = 0,                  # scale to zero nodes when idle
    max_instances = 2,                  # at most two nodes run at once
    idle_time_before_scale_down = 120,  # seconds of idle time before scaling down
    tier = "dedicated",
)

ml_client.begin_create_or_update(cluster_basic).result()
```

## 4. Configure and submit the AutoML job

```python
from azure.ai.ml import automl
from azure.ai.ml import Input
```

```python
# Reference the registered data asset (name:version) as the training input
training_data_input = Input(type = AssetTypes.MLTABLE, path = f"azureml:{data_asset_name}:{data_asset_version}")
```

```python
# Configure the classification job

classification_job = automl.classification(
    compute = cc_name,
    experiment_name = "automl-diabetes-classification",
    training_data = training_data_input,
    target_column_name = "Diabetic",
    primary_metric = "accuracy",
    n_cross_validations = 5,
    enable_model_explainability = True
)
```
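`n_cross_validations = 5` asks AutoML to evaluate each candidate model with 5-fold cross-validation, scoring it with the primary metric, `accuracy`. The sketch below illustrates the technique in plain Python with hypothetical helpers: contiguous folds where each fold serves once as the validation set, plus the accuracy metric itself. AutoML's own splitting (for example shuffling or stratification) may differ, so this shows the idea rather than AutoML internals.

```python
def k_fold_indices(n: int, k: int) -> list:
    """Split range(n) into k contiguous folds; each fold is validation once."""
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, val))
        start += size
    return folds


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


folds = k_fold_indices(n=12, k=5)
print([len(val) for _, val in folds])          # validation fold sizes
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))    # 3 of 4 correct
```

Each candidate is trained on the `train` indices and scored on the matching `val` indices for every fold, and the fold scores are combined into one value used to rank the models.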

```python
# Set the limits
# Min iterations = 4

classification_job.set_limits(
    timeout_minutes = 20,
    trial_timeout_minutes = 10,
    max_trials = 5,
    enable_early_termination = True,
)
```
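The three limits interact: `max_trials` caps how many models are tried, `trial_timeout_minutes` caps each trial, and `timeout_minutes` caps the whole job. The hypothetical function below is a toy, sequential model of that interaction using made-up trial durations. It ignores concurrency (the cluster above allows up to two nodes), early termination, and setup overhead, so it is only a way to reason about the budget, not a description of AutoML's scheduler.

```python
def trials_within_limits(durations, timeout_minutes, trial_timeout_minutes, max_trials):
    """Toy sequential model; returns (trials started, minutes used)."""
    elapsed, started = 0.0, 0
    for minutes in durations:
        # Stop once the trial count or the total job budget is exhausted.
        if started >= max_trials or elapsed >= timeout_minutes:
            break
        # A trial is cut off at the per-trial timeout, the job at the total timeout.
        elapsed = min(elapsed + min(minutes, trial_timeout_minutes), timeout_minutes)
        started += 1
    return started, elapsed


# Hypothetical trial durations in minutes, with the limits set above.
print(trials_within_limits([4, 12, 3, 6, 2, 5], timeout_minutes=20,
                           trial_timeout_minutes=10, max_trials=5))
```

In this made-up run the total timeout, not `max_trials`, ends the job after four trials, which is one reason to size `timeout_minutes` against the expected trial length.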

```python
# Set the training properties

classification_job.set_training(
    allowed_training_algorithms = ["LogisticRegression", "DecisionTree"],
    enable_onnx_compatible_models = True
)
```

```python
# Submit the AutoML job
returned_job = ml_client.jobs.create_or_update(
    classification_job
)

# Get a URL to monitor the job in Azure Machine Learning studio
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)
```
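`create_or_update` returns as soon as the job is submitted. To wait for it in the notebook, `ml_client.jobs.stream(returned_job.name)` streams the job's logs until it finishes; alternatively, a small polling loop like the hypothetical `wait_for_job` below can check the status. It assumes the terminal states are `Completed`, `Failed` and `Canceled`, and takes the status getter as a callable so the loop can be tried without Azure.

```python
import time
from typing import Callable

# Assumed terminal job states.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 30.0,
                 timeout_seconds: float = 3600.0) -> str:
    """Poll get_status() until it returns a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

With the client from section 1, usage could look like `wait_for_job(lambda: ml_client.jobs.get(returned_job.name).status)`.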