# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [2]:
import logging
import os
import csv


import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset

from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

from azureml.pipeline.steps import AutoMLStep

from azureml.widgets import RunDetails

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.52.0


**Compute cluster**


In [25]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'project3'
project_folder = './automl-project3'

experiment=Experiment(ws, experiment_name)

amlcompute_cluster_name = "udacity"

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',  # for GPU, use "STANDARD_NC6"
                                                           # vm_priority='lowpriority',  # optional
                                                           min_nodes=1,
                                                           max_nodes=6)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)
    # Block until the new cluster has been provisioned
    compute_target.wait_for_completion(show_output=True)

run = experiment.start_logging()

Found existing cluster, use it.


## Dataset

### Overview
The dataset comes from Kaggle and contains the CO2 emissions of cars. A car's CO2 emissions vary with, for example, the number of cylinders, the engine size, and the number of gears in the gearbox.

https://www.kaggle.com/code/bhuviranga/linear-regression-co2-emissions/input


Get the data from blob storage

In [26]:
# azureml-core of version 1.0.72 or higher is required
# azureml-dataprep[pandas] of version 1.1.34 or higher is required

subscription_id = '7c03dd83-6b95-43b1-9f53-23dfd07e8803'
resource_group = 'AZP-102-Temp_AI-RG'
workspace_name = 'AZP-102_Temp_AI_ML_POC'

workspace = Workspace(subscription_id, resource_group, workspace_name)

dataset = Dataset.get_by_name(workspace, name='udacityProject3')
df = dataset.to_pandas_dataframe()

In [58]:
cleanedDataset = dataset.drop_columns(['Make', 'Model', 'Vehicle Class', 'Fuel Type', 'Fuel Consumption Comb (L/100 km)','Fuel Consumption Comb (mpg)', 'Transmission'])

In [59]:
dfCleaned = cleanedDataset.to_pandas_dataframe()
dfCleaned

| Unnamed: 0 | Engine Size(L) | Cylinders | Fuel Consumption City (L/100 km) | Fuel Consumption Hwy (L/100 km) | CO2 Emissions(g/km) |
|---|---|---|---|---|---|
| 0 | 2.00 | 4 | 9.90 | 6.70 | 196 |
| 1 | 2.40 | 4 | 11.20 | 7.70 | 221 |
| 2 | 1.50 | 4 | 6.00 | 5.80 | 136 |
| 3 | 3.50 | 6 | 12.70 | 9.10 | 255 |
| 4 | 3.50 | 6 | 12.10 | 8.70 | 244 |
| ... | ... | ... | ... | ... | ... |
| 7380 | 2.00 | 4 | 10.70 | 7.70 | 219 |
| 7381 | 2.00 | 4 | 11.20 | 8.30 | 232 |
| 7382 | 2.00 | 4 | 11.70 | 8.60 | 240 |
| 7383 | 2.00 | 4 | 11.20 | 8.30 | 232 |
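
The overview states that emissions vary with engine size. As a rough illustration only, the sketch below computes the Pearson correlation between `Engine Size(L)` and `CO2 Emissions(g/km)` on the first five rows of the preview above. Five rows are far too few to draw conclusions about the full dataset; the variable names and the `pearson` helper are introduced here for illustration and are not part of the original notebook.

```python
import math

# First five preview rows, copied from the table above
engine_size_l = [2.0, 2.4, 1.5, 3.5, 3.5]
co2_g_per_km = [196, 221, 136, 255, 244]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(engine_size_l, co2_g_per_km)
print(f"Pearson r on the five preview rows: {r:.2f}")
```

On the full data, `dfCleaned.corr()` would give the same kind of figure for every pair of numeric columns.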


## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [29]:
automl_settings = {
    "experiment_timeout_minutes": 25,
    "max_concurrent_iterations": 5,
    "primary_metric": 'r2_score',
    "additional_metrics": ['mean_absolute_error', 'root_mean_squared_error']
}


automl_config = AutoMLConfig(compute_target=compute_target,
                             task = "regression",
                             training_data=dataset,
                             label_column_name="CO2 Emissions(g/km)",   
                             path = project_folder,
                             enable_early_stopping= True,
                             featurization= 'auto',
                             debug_log = "automl_errors.log",
                             enable_voting_ensemble=False,
                             enable_stack_ensemble=False,
                             **automl_settings
                            )
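
The settings above name three regression metrics: `r2_score` as the primary metric, plus `mean_absolute_error` and `root_mean_squared_error`. As a reference for what these numbers mean, here is a minimal pure-Python sketch of the textbook definitions. It is not AutoML's implementation, which may differ in details such as weighting or normalization, and the `predicted` values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Textbook R^2, MAE and RMSE; assumes y_true is not constant."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares and total sum of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2_score": 1.0 - ss_res / ss_tot,
        "mean_absolute_error": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "root_mean_squared_error": math.sqrt(ss_res / n),
    }


# CO2 values from the first five preview rows; predictions are hypothetical
actual = [196, 221, 136, 255, 244]
predicted = [200, 215, 140, 250, 248]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

An R² of 1 means perfect predictions and an R² of 0 means the model does no better than always predicting the mean; MAE and RMSE are in the units of the label, here g/km.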

In [31]:
# TODO: Submit your experiment
automl_run = experiment.submit(automl_config) 
automl_run.wait_for_completion(show_output=True)



Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| project3 | AutoML_61370fab-2ee2-4b00-861a-b5d16fb17840 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  In order to accurately evaluate the model(s) trained by AutoML, we leverage a dataset that the model is not trained on. Hence, if the user doesn't provide an explicit validation dataset, a part of the training dataset is used to achieve this. For smaller datasets (fewer than 20,000 samples), cross-validation is leveraged, else a single hold-out set is split from the training data to serve as the validation dataset. Hence, for your input data we leverage cross-validation with 10 folds, if the number o



{'runId': 'AutoML_61370fab-2ee2-4b00-861a-b5d16fb17840',
 'target': 'udacity',
 'status': 'Completed',
 'startTimeUtc': '2023-08-09T10:03:32.121357Z',
 'endTimeUtc': '2023-08-09T10:21:08.181329Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'r2_score',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'udacity',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"c163a65e-49af-4b56-901e-aa52133f237a\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'regression',
  'dependencies_versions': '{"azureml-dataprep-native": "3
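
The data guardrails output above reports that, because no validation set was supplied and the dataset has fewer than 20,000 samples, AutoML evaluated models with cross-validation. The sketch below shows the general idea of a k-fold split in plain Python. It is an assumption-laden illustration: `kfold_indices` is a hypothetical helper, it does not shuffle, and AutoML's actual splitting logic is not shown in this notebook.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each of n_folds folds.

    Folds are contiguous and unshuffled; their sizes differ by at most one.
    """
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Toy example: 25 samples, 10 folds
for train_idx, val_idx in kfold_indices(25, 10):
    print(len(train_idx), "train /", len(val_idx), "validation")
```

Every sample lands in exactly one validation fold, so each model is scored only on rows it was not trained on, and the per-fold scores are then averaged.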

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [32]:
RunDetails(automl_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…



## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [47]:
# Get the best run and its fitted model
best_run, fitted_model = automl_run.get_output()

# Print the details of the best model
print("Best run details:", best_run)
print("Best model:", fitted_model)



  If you are loading a serialized model (like pickle in Python, RDS in R) generated by
  older XGBoost, please export the model by calling `Booster.save_model` from that version
  first, then load it back in current version. See:

    https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html

  for more details about differences between saving model and serializing.


Best run details: Run(Experiment: project3,
Id: AutoML_61370fab-2ee2-4b00-861a-b5d16fb17840_14,
Type: None,
Status: Completed)
Best model: RegressionPipeline(pipeline=Pipeline(memory=None,
                                     steps=[('dat

In [52]:
ws

Workspace.create(name='AZP-102_Temp_AI_ML_POC', subscription_id='7c03dd83-6b95-43b1-9f53-23dfd07e8803', resource_group='azp-102-temp_ai-rg')

In [51]:
from azureml.core import Workspace, Model
# Replace 'your_model_id' with the actual model ID
model_id = 'AutoML_61370fab-2ee2-4b00-861a-b5d16fb17840_14'

# Get the model object
model = Model(ws, model_id)

model.download(target_dir='./automl_best_model', exist_ok=True)

print("Model downloaded successfully")

WebserviceException: WebserviceException:
	Message: ModelNotFound: Model with name AutoML_61370fab-2ee2-4b00-861a-b5d16fb17840_14 not found in provided workspace
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "ModelNotFound: Model with name AutoML_61370fab-2ee2-4b00-861a-b5d16fb17840_14 not found in provided workspace"
    }
}
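
The `ModelNotFound` error above is expected: `Model(ws, name)` looks up a model that has already been registered in the workspace, whereas `AutoML_61370fab-2ee2-4b00-861a-b5d16fb17840_14` is the ID of the best child run, not a registered model name. One possible fix, sketched below, is to register the model from `best_run` first and then download it. The helper name `register_and_download` is hypothetical, and `outputs/model.pkl` is assumed to be where the AutoML run stored its model file; check the run's outputs in the studio before relying on that path.

```python
def register_and_download(best_run, model_name, target_dir,
                          model_path="outputs/model.pkl"):
    """Register the model produced by `best_run`, then download it.

    `best_run` is expected to be an azureml.core.Run (for example the first
    value returned by automl_run.get_output()). `model_path` is an assumption
    about where the run stored the serialized model.
    """
    # Run.register_model returns an azureml.core.Model
    model = best_run.register_model(model_name=model_name, model_path=model_path)
    # Model.download copies the registered files into target_dir
    model.download(target_dir=target_dir, exist_ok=True)
    return model


# Usage in the notebook (requires the Azure ML workspace and run objects):
# model = register_and_download(best_run, "best_automl_model", "./automl_best_model")
```

Once registered, the model can also be retrieved later with `Model(ws, "best_automl_model")`.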

In [38]:
#TODO: Save the best model
model_name = 'best_automl_model'
fitted_model.save(model_name)
print("Saved model: {}".format(model_name))

AttributeError: 'RegressionPipeline' object has no attribute 'save'
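
The `AttributeError` above occurs because the fitted pipeline returned by `get_output()` has no `save` method. A common alternative is to serialize the object yourself. The sketch below uses the standard-library `pickle` module and assumes the fitted pipeline is picklable; `save_model` and `load_model` are hypothetical helper names, and `joblib.dump` is a frequently used equivalent.

```python
import pickle


def save_model(model, path):
    """Serialize `model` to `path` with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a model previously written by save_model.

    Only unpickle files you trust, and load them in an environment with the
    same package versions that were used for training.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage in the notebook:
# save_model(fitted_model, "best_automl_model.pkl")
```

The XGBoost warning printed earlier is a reminder of the version caveat: pickled models may not load cleanly under a different library version.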

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service
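
For the request step, a rough sketch of building a scoring request with the standard library is shown below. Everything here is an assumption: `build_scoring_request` is a hypothetical helper, the URI is a placeholder, the `{"data": [...]}` payload shape depends on the scoring script that gets deployed, and the example row uses the column names from the cleaned table, whereas a model trained on the full dataset would expect all of its training columns.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Key-based authentication, if enabled on the service
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(scoring_uri, data=body, headers=headers)


# Feature values copied from the first preview row
example_rows = [{
    "Engine Size(L)": 2.0,
    "Cylinders": 4,
    "Fuel Consumption City (L/100 km)": 9.9,
    "Fuel Consumption Hwy (L/100 km)": 6.7,
}]

# Placeholder URI; replace it with the deployed service's scoring URI
request = build_scoring_request("http://localhost:8000/score", example_rows)
print(request.get_method(), request.full_url)

# Sending the request (only once a service is actually running):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```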

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
