Licensed under the MIT License.
# Train scikit-learn models with Azure Machine Learning Python SDK

<b>Note:</b>

Select Kernel = Python 3.6 - Azure ML when prompted.

In this article, learn how to run your scikit-learn training scripts with Azure Machine Learning.

The example scripts in this article classify iris flowers by training a machine learning model on scikit-learn's [iris dataset](https://archive.ics.uci.edu/ml/datasets/iris), which contains flower measurements rather than images.

Whether you're training a scikit-learn model from the ground up or bringing an existing model into the cloud, you can use Azure Machine Learning to scale out open-source training jobs using elastic cloud compute resources. You can build, deploy, version, and monitor production-grade models with Azure Machine Learning.

## Create Workspace Config File

Note: Skip this section if you are using a Compute Instance. Go to <b>Initialize a workspace</b>.

The workspace configuration file is a JSON file that tells the SDK how to communicate with your Azure Machine Learning workspace. 
The file is named config.json, and it has the following format:

![workspace-config-image](.././Images/3.png "workspace-config")
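As a rough illustration, the file can also be produced from Python. This sketch assumes the three keys that a downloaded config.json normally contains (`subscription_id`, `resource_group`, `workspace_name`). The helper name `write_workspace_config` is hypothetical, and the image above remains the reference for this lab.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the three workspace identifiers and return its path."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path
```

For example, `write_workspace_config(".", "<subscription-id>", "rg-user-XX", "aml-workspace-XX")` would write the file into the current directory, with the placeholders replaced by your own values.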

You can create config.json manually in the format above, or download it directly from your Machine Learning workspace as follows:
<br><br>
Enter https://portal.azure.com in web browser and sign in.
<br><br>
Click Resource groups -> rg-user-XX -> aml-workspace-XX
<br><br>
Click Download config.json

![workspace-config-download-image](.././Images/4.png "workspace-config-download")

Save the config.json file within the Azure ML Labs directory.

![workspace-config-save-image](.././Images/5.png "workspace-config-save")

Note: Please skip this step if already done.

## Initialize a workspace

Note: Ensure that Microsoft Edge is set as the default browser before running the cell below.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()

Note: This code snippet expects the workspace configuration to be saved in the current directory or its parent.
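If the call fails because no configuration is found, it can help to check where the file actually sits. The following is a simplified illustration only, not the SDK's own lookup logic: it walks from a starting directory up through its parents and returns the first config.json it finds. The function name `find_workspace_config` and the extra check of an `.azureml` subfolder are assumptions made for this sketch.

```python
from pathlib import Path


def find_workspace_config(start=".", filename="config.json"):
    """Return the first config file found in `start` or any parent directory, else None."""
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        # Check the directory itself, then an .azureml subfolder (assumed convention).
        for candidate in (directory / filename, directory / ".azureml" / filename):
            if candidate.is_file():
                return candidate
    return None
```

Calling `find_workspace_config()` from the notebook's directory shows which file, if any, would be the nearest candidate.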

A pop-up prompts you to sign in to Azure ML Service. Enter your credentials, and you will see the message below. Close the browser tab and return to the Jupyter notebook for Lab-03.

![login-image](.././Images/15.png "Login")

## Define custom environment - Create YAML file

Define the conda dependencies in a YAML file and save it as conda_dependencies.yml in the Lab-03 directory.

Note: Please use VSCode editor to create the YAML file.

![conda-yaml-image](.././Images/6.png "conda-yaml")
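If you prefer to create the file from Python rather than an editor, a minimal sketch follows. The dependency list here is an assumption (an unpinned `python`, `scikit-learn`, and the `azureml-defaults` pip package), not the lab's actual specification. Use the contents shown in the image above. The helper name `write_conda_file` is hypothetical.

```python
from pathlib import Path

# Assumed dependency list; replace it with the contents shown in the lab image.
CONDA_YAML = """\
name: sklearn-env
dependencies:
  - python
  - scikit-learn
  - pip
  - pip:
      - azureml-defaults
"""


def write_conda_file(directory=".", filename="conda_dependencies.yml"):
    """Save the conda specification as plain text and return the file path."""
    path = Path(directory) / filename
    path.write_text(CONDA_YAML)
    return path
```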

## Create custom environment

In [None]:
from azureml.core import Environment

# A forward slash avoids the invalid '\c' escape sequence and works on both Windows and Linux.
sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='./conda_dependencies.yml')

## Prepare training script

1. Review train_iris.py training script file using Visual Studio Code.

2. Complete the missing bits in the training code.

Note: If the script is left incomplete, the experiment run will fail.

## Create a ScriptRunConfig

Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on. Any arguments to your training script will be passed via command line if specified in the arguments parameter.

Replace my-cluster-name with the name of your compute cluster.

In [None]:
from azureml.core import ScriptRunConfig

# Replace <my-cluster-name> with the actual AML cluster name from your Compute list.
compute_target = ws.compute_targets['<my-cluster-name>']
src = ScriptRunConfig(source_directory='.',
                      script='train_iris.py',
                      arguments=['--kernel', 'linear', '--penalty', 1.0],
                      compute_target=compute_target,
                      environment=sklearn_env)
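On the script side, the values in `arguments` arrive as ordinary command-line options. The sketch below shows one way a training script could read `--kernel` and `--penalty` with `argparse`. The defaults and help texts are assumptions, and the parser in the lab's train_iris.py may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line options that ScriptRunConfig forwards to the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--kernel", type=str, default="linear",
                        help="kernel type for the model (assumed meaning)")
    parser.add_argument("--penalty", type=float, default=1.0,
                        help="penalty parameter of the error term (assumed meaning)")
    # argv=None makes argparse read sys.argv, which is what happens in a real run.
    return parser.parse_args(argv)
```

With the configuration above, the script would see `args.kernel == 'linear'` and `args.penalty == 1.0`.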

## Submit Experiment Run

As the run is executed, it goes through the following stages:

- Preparing: A Docker image is created according to the environment defined. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress. If a curated environment is specified instead, the cached image backing that curated environment is used.

- Scaling: The cluster attempts to scale up if the run requires more nodes than are currently available.

- Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted or copied, and the script is executed. Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run.

- Post-Processing: The ./outputs folder of the run is copied over to the run history.
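Because only the ./outputs folder is copied to the run history in the post-processing stage, a training script needs to write its artifacts there. The sketch below shows the pattern with the standard library's `pickle` as a stand-in. The file name model.joblib used in this lab suggests the actual script uses joblib instead, and the helper name `save_to_outputs` is hypothetical.

```python
import os
import pickle


def save_to_outputs(obj, filename, outputs_dir="outputs"):
    """Serialize `obj` into the outputs folder, creating the folder if needed."""
    os.makedirs(outputs_dir, exist_ok=True)
    path = os.path.join(outputs_dir, filename)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path
```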

In [None]:
from azureml.core import Experiment

run = Experiment(ws,'Lab-03-PythonSDK-Iris').submit(src)
run.wait_for_completion(show_output=True)

## Review the Experiment log in Azure ML Studio

1. Open Azure ML Studio (https://ml.azure.com) and navigate to Experiments.
2. Click on Lab-03-PythonSDK-Iris experiment to view the completed run.
3. Select the latest run number.
4. Explore Outputs + logs. The trained model is saved as model.joblib.
5. Explore Snapshot. You will see the content of the Lab-03 directory. Click Lab-03-PythonSDK-Iris and review the code that was run.

## Save and register the model

Once you've trained the model, you can save and register it to your workspace. Model registration lets you store and version your models in your workspace to simplify model management and deployment.

Register the model to your workspace with the following code. By specifying the parameters model_framework, model_framework_version, and resource_configuration, no-code model deployment becomes available. No-code model deployment allows you to directly deploy your model as a web service from the registered model, and the ResourceConfiguration object defines the compute resource for the web service.

In [None]:
from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

model = run.register_model(model_name='Lab-03-sklearn-iris', 
                           model_path='outputs/model.joblib',
                           model_framework=Model.Framework.SCIKITLEARN,
                           model_framework_version='0.19.1',
                           resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))

View the registered model:
1. Open Azure ML Studio (https://ml.azure.com).
2. Click on Models and view details of the Lab-03-sklearn-iris model that you have recently registered.

### --- End ---

In [7]:
# Increase the notebook width (IPython.core.display is deprecated in favor of IPython.display)
from IPython.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))