# Chapter 6 Prep Model Creation & Registration - Model Deployment

## In this notebook we will:

  - Connect to your workspace.
  - Create a virtual environment and leverage it in this notebook
  - Create Compute for running a job
  - Create a job
  - Configure your job
  - Run the command
  - Register the model

## Setting yourself up for success

- When creating a model, one of the major obstacles is having an environment with the required dependencies. We will create and register an Azure Machine Learning (AML) environment and also make it available on our compute instance. This lets us load, on the compute instance, the model we build on a compute cluster: the same packages and versions used to build the model will be used to consume it later in this notebook.

Steps to set up our environment include:
- Connecting to our workspace
- Defining and registering the environment
- Making the environment available to our compute instance
- Making the environment available to our Jupyter notebook

Let's get started.

Initially, select **Kernel** > **Change Kernel** > **Python 3.10 - SDK V2**

or, if you already set up the virtual environment in Chapter 4:

Select **Kernel** > **Change Kernel** > **job_env**


In [1]:
#import required libraries
import pandas as pd
import time
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml.entities import Environment, BuildContext

In [2]:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace  = "<AML_WORKSPACE_NAME>"

## Connecting to your workspace

Once you connect to your workspace, you will create a new CPU compute target and provide an environment to it.

- Configure your credential. We are using `DefaultAzureCredential`. It requests a token by trying multiple identities in turn, stopping once a token is found.

In [3]:
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)
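To illustrate the "stop once a token is found" behavior described above, here is a minimal, self-contained sketch of a credential chain. It is a toy model and not the `azure.identity` implementation; the provider functions and token strings are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple


# Hypothetical providers: each returns a token string, or None when that
# identity is not available in the current environment.
def from_environment() -> Optional[str]:
    return None  # pretend no service principal variables are set


def from_managed_identity() -> Optional[str]:
    return "token-from-managed-identity"


def from_cli_login() -> Optional[str]:
    return "token-from-cli"


def first_available_token(
    providers: Sequence[Callable[[], Optional[str]]]
) -> Tuple[str, str]:
    """Try each provider in order and stop at the first one that yields a token."""
    for provider in providers:
        token = provider()
        if token is not None:
            return provider.__name__, token
    raise RuntimeError("No credential in the chain could provide a token")


source, token = first_available_token(
    [from_environment, from_managed_identity, from_cli_login]
)
print(source, token)
```

In this sketch the third provider is never called, because the second one already returned a token.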

## Set up environment

### Creating environment from docker image with a conda YAML

Azure ML allows you to leverage curated environments, as well as to build your own environment from:

    - an existing Docker image
    - a base Docker image with a conda yml file to customize it
    - a Docker build context
    
We will proceed with creating an environment from a base Docker image plus a conda yml file.

In [4]:
import os
script_folder = os.path.join(os.getcwd(), "conda-yamls")
print(script_folder)
os.makedirs(script_folder, exist_ok=True)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/devamlcompute/code/Users/memasanz/Azure-Machine-Learning-Engineering/Chapter06/conda-yamls


### Create job environment in a yml file

The yml file below can be used to create the conda environment for running this notebook if the kernel `job_env` is not currently available to you.

In [5]:
%%writefile conda-yamls/job_env.yml
name: job_env
dependencies:
- python=3.10
- scikit-learn=1.1.3
- ipykernel
- matplotlib
- pandas
- pip
- pip:
  - mlflow==2.0.1
  - azure-ai-ml==1.1.2
  - mltable==1.0.0
  - azureml-mlflow==1.48.0

Writing conda-yamls/job_env.yml


The yml file below will be used for creating your model.  It is nearly the same as the job_env, but given our model will not be leveraging `mltable` we have excluded it from the model build environment.

In [6]:
%%writefile conda-yamls/job_env_for_build.yml
name: job_env
dependencies:
- python=3.10
- scikit-learn=1.1.3
- ipykernel
- matplotlib
- pandas
- pip
- pip:
  - mlflow==2.0.1
  - azure-ai-ml==1.1.2
  - azureml-mlflow==1.48.0

Writing conda-yamls/job_env_for_build.yml
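As a quick check that the two files differ only in `mltable`, the pip sections can be compared. The sketch below works on copies of the two YAML bodies held in strings so that it runs without any files on disk; the small line-based parser `pip_packages` is a hypothetical helper written for this layout, not a general YAML parser.

```python
JOB_ENV = """
name: job_env
dependencies:
- python=3.10
- scikit-learn=1.1.3
- ipykernel
- matplotlib
- pandas
- pip
- pip:
  - mlflow==2.0.1
  - azure-ai-ml==1.1.2
  - mltable==1.0.0
  - azureml-mlflow==1.48.0
"""

BUILD_ENV = """
name: job_env
dependencies:
- python=3.10
- scikit-learn=1.1.3
- ipykernel
- matplotlib
- pandas
- pip
- pip:
  - mlflow==2.0.1
  - azure-ai-ml==1.1.2
  - azureml-mlflow==1.48.0
"""


def pip_packages(conda_yaml_text: str) -> set:
    """Collect package names from the nested `- pip:` section of a conda YAML string."""
    packages = set()
    in_pip_section = False
    for line in conda_yaml_text.splitlines():
        stripped = line.strip()
        if stripped == "- pip:":
            in_pip_section = True
            continue
        if in_pip_section and line.startswith("  - "):
            # "- mlflow==2.0.1" -> "mlflow"
            packages.add(stripped[2:].split("==")[0])
        elif in_pip_section and stripped:
            in_pip_section = False  # a non-indented entry ends the pip section
    return packages


only_in_job_env = pip_packages(JOB_ENV) - pip_packages(BUILD_ENV)
print(sorted(only_in_job_env))
```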


### Use your virtual environment in this notebook

If the virtual environment `job_env` is not available when you go to `Kernel` -> `Change Kernel`, follow the instructions below to make it available to your Jupyter notebook. If you created the virtual environment in **Chapter 4**, you can use it now.


We can use that virtual environment on our compute instance and in this very Jupyter notebook.
Open a terminal session, change into your conda-yamls folder, and run the following commands:

```
cd Azure-Machine-Learning-Engineering/
cd Chapter06
cd conda-yamls/
conda env create -f job_env.yml
conda activate job_env
ipython kernel install --user --name job_env --display-name "job_env"
```
* After the environment has been made available to Jupyter, refresh this session (press F5 or use your browser's refresh button).

When you go to `Kernel` -> `Change Kernel`, `job_env` will be available to select. You will have to rerun the notebook from the beginning, but when you download the model, you will be using the correct versions of all libraries.

If you run the next cell and get the error message `No module named 'sklearn'`, the conda virtual environment described here has not been set up for this notebook.

In [7]:
import sklearn
import mlflow
import azure.ai.ml
print('sklearn: {}'.format(sklearn.__version__))
print('azure.ai.ml: {}'.format(azure.ai.ml._version.VERSION))

print("This notebook was created using sklearn: 1.1.3")
print("This notebook was created using azure.ai.ml: 1.1.2")

sklearn: 1.1.3
azure.ai.ml: 1.1.2
This notebook was created using sklearn: 1.1.3
This notebook was created using azure.ai.ml: 1.1.2
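The comparison above is done by eye. A minimal sketch of an automated check follows, using the standard library's `importlib.metadata`. The helper name `find_mismatches` and the stand-in lookup table are assumptions made for illustration; the demonstration uses fake version data so that it runs in any kernel.

```python
from importlib import metadata

# Versions this notebook was created with (taken from the cell above).
EXPECTED = {"scikit-learn": "1.1.3", "azure-ai-ml": "1.1.2"}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this kernel
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


# Demonstration with a stand-in lookup (hypothetical installed versions).
fake_installed = {"scikit-learn": "1.1.3", "azure-ai-ml": "1.2.0"}
demo = find_mismatches(EXPECTED, get_version=fake_installed.__getitem__)
print(demo)
```

Calling `find_mismatches(EXPECTED)` without the second argument checks the packages installed in the active kernel; an empty dictionary means the versions match.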


### Getting the most current and up-to-date base image

Default images are always changing.  
Note the base image is defined in the property `image` below.  These images are defined at [https://hub.docker.com/_/microsoft-azureml](https://hub.docker.com/_/microsoft-azureml)

The image selected for this notebook is `mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04`, but based on image availability, that will change in the future. In addition, note that the Python version specified in your conda environment file is `python=3.10`; this will evolve over time as well.

In [8]:
env_docker_conda = Environment(
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="conda-yamls/job_env_for_build.yml",
    name="job_base_for_build_env",
    description="Environment created from a Docker image plus Conda environment.",
)
env = ml_client.environments.create_or_update(env_docker_conda)



In [9]:
print(env.name)
print(env.version)

job_base_for_build_env
1


In the previous chapter, you registered a dataset. If you have not already registered it, the dataset has been added to this chapter and will be registered below.

In [10]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

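try:
    # look up the data asset registered in the previous chapter
    registered_data_asset = ml_client.data.get(name='titanic_prepped', version='1')
    print('data asset is registered')
except Exception:
    # the asset was not found, so register it from the local CSV
    print('register data asset')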
    my_data = Data(
        path="./prepped_data/titanic_prepped.csv",
        type=AssetTypes.URI_FILE,
        description="Titanic CSV",
        name="titanic_prepped",
        version="1",
    )

    ml_client.data.create_or_update(my_data)

data asset is registered


## Create Compute 

In [11]:
from azure.ai.ml.entities import AmlCompute

# specify aml compute name.
cpu_compute_target = "cpu-cluster"

try:
    ml_client.compute.get(cpu_compute_target)
except Exception:
    print("Creating a new cpu compute target...")
    compute = AmlCompute(
        name=cpu_compute_target, size="STANDARD_D2_V2", min_instances=0, max_instances=4, idle_time_before_scale_down = 3600
    )
    ml_client.compute.begin_create_or_update(compute)

## Creating code to generate Basic Model

We will first create a model by running a training script as a command job.

In [12]:
script_folder = os.path.join(os.getcwd(), "src")
print(script_folder)
os.makedirs(script_folder, exist_ok=True)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/devamlcompute/code/Users/memasanz/Azure-Machine-Learning-Engineering/Chapter06/src


## Create main.py file for running in your command

In [13]:
%%writefile ./src/main.py
import os
import argparse
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from mlflow.utils.environment import _mlflow_conda_env
from mlflow.tracking import MlflowClient
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder
from sklearn.metrics import roc_auc_score,roc_curve
from sklearn.metrics import accuracy_score, precision_score, recall_score


# define functions
def main(args):
    # enable auto logging
    current_run = mlflow.start_run()
    mlflow.sklearn.autolog(log_models=False)

    # read in data
    df = pd.read_csv(args.titanic_csv)
    model = model_train('Survived', df, args.randomstate)
    mlflow.end_run()

def model_train(LABEL, df, randomstate):
    print('df.columns = ')
    print(df.columns)
    
    df['Embarked'] = df['Embarked'].astype(object)
    df['Loc'] = df['Loc'].astype(object)
    df['Sex'] = df['Sex'].astype(object)
    df['Pclass'] = df['Pclass'].astype(float)
    df['Age'] = df['Age'].astype(float)
    df['Fare'] = df['Fare'].astype(float)
    df['GroupSize'] = df['GroupSize'].astype(float)

    y_raw           = df[LABEL]
    columns_to_keep = ['Embarked', 'Loc', 'Sex','Pclass', 'Age', 'Fare', 'GroupSize']
    X_raw           = df[columns_to_keep]

    print(X_raw.columns)
     # Train test split
    X_train, X_test, y_train, y_test = train_test_split(X_raw, y_raw, test_size=0.2, random_state=randomstate)
    
    #use Logistic Regression estimator from scikit learn
    lg = LogisticRegression(penalty='l2', C=1.0, solver='liblinear')
    preprocessor = buildpreprocessorpipeline(X_train)
    
    #estimator instance
    clf = Pipeline(steps=[('preprocessor', preprocessor),
                               ('regressor', lg)], verbose=True)

    model = clf.fit(X_train, y_train)
    
    print('type of X_test = ' + str(type(X_test)))
          
    y_pred = model.predict(X_test)
    
    print('*****X_test************')
    print(X_test)
    
    #get the active run.
    run = mlflow.active_run()
    print("Active run_id: {}".format(run.info.run_id))

    acc = model.score(X_test, y_test )
    print('Accuracy:', acc)
    MlflowClient().log_metric(run.info.run_id, "test_acc", acc)
    
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test,y_scores[:,1])
    print('AUC: ' , auc)
    MlflowClient().log_metric(run.info.run_id, "test_auc", auc)
    
    
    # Signature
    signature = infer_signature(X_test, y_test)

    # Conda environment
    custom_env =_mlflow_conda_env(
        additional_conda_deps=["scikit-learn==1.1.3"],
        additional_pip_deps=["mlflow<=1.30.0"],
        additional_conda_channels=None,
    )

    # Sample
    input_example = X_train.sample(n=1)

    # Log the model manually
    mlflow.sklearn.log_model(model, 
                             artifact_path="model", 
                             conda_env=custom_env,
                             signature=signature,
                             input_example=input_example)


    
    return model



def buildpreprocessorpipeline(X_raw):

    categorical_features = X_raw.select_dtypes(include=['object', 'bool']).columns
    numeric_features = X_raw.select_dtypes(include=['float','int64']).columns

    #categorical_features = ['Sex', 'Embarked', 'Loc']
    categorical_transformer = Pipeline(steps=[('onehotencoder', 
                                               OneHotEncoder(categories='auto', sparse=False, handle_unknown='ignore'))])


    #numeric_features = ['Pclass', 'Age', 'Fare', 'GroupSize']    
    numeric_transformer1 = Pipeline(steps=[('scaler1', SimpleImputer(missing_values=np.nan, strategy = 'mean'))])
    

    preprocessor = ColumnTransformer(
        transformers=[
            ('numeric1', numeric_transformer1, numeric_features),
            ('categorical', categorical_transformer, categorical_features)], remainder='drop')
    
    return preprocessor



def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--titanic-csv", type=str)
    parser.add_argument("--randomstate", type=int, default=42)

    # parse args
    args = parser.parse_args()
    print(args)
    # return args
    return args


# run script
if __name__ == "__main__":
    # parse args
    args = parse_args()

    # run main function
    main(args)

Writing ./src/main.py
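One detail of `parse_args` worth seeing in isolation: `argparse` stores `--titanic-csv` as `args.titanic_csv` (the dash becomes an underscore), and by default it also accepts any unambiguous prefix of a long option, such as `--titanic`. The sketch below reproduces the two arguments from `main.py`; the file name `data.csv` is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same two arguments as main.py
    parser = argparse.ArgumentParser()
    parser.add_argument("--titanic-csv", type=str)
    parser.add_argument("--randomstate", type=int, default=42)
    return parser


parser = build_parser()

# Full option name; the value is read back as args.titanic_csv
full = parser.parse_args(["--titanic-csv", "data.csv", "--randomstate", "0"])
print(full.titanic_csv, full.randomstate)

# An unambiguous prefix is accepted because allow_abbrev defaults to True
short = parser.parse_args(["--titanic", "data.csv"])
print(short.titanic_csv, short.randomstate)

# With allow_abbrev=False the prefix is no longer recognized
strict = argparse.ArgumentParser(allow_abbrev=False)
strict.add_argument("--titanic-csv", type=str)
_, unknown = strict.parse_known_args(["--titanic", "data.csv"])
print(unknown)
```

Spelling out the full option name in the job command avoids depending on prefix matching, which would stop working if another option starting with `--titanic` were added later.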


## Configure Command

- `display_name`: display name for the job
- `description`: description of the experiment
- `code`: path where the code is located
- `command`: command to run
- `inputs`: dictionary of name-value pairs, referenced in the command as `${{inputs.<input_name>}}`
    
    - To pass files or folders, use the `Input` class:
        
        - `type`: defaults to `uri_folder`; it can also be set to `uri_file`
        - `path`: path to the file or folder. It can be local or remote, using schemes such as `https`, `http`, and `wasb`
        - `mode`: how the data is delivered to the compute: `ro_mount` (default), `rw_mount`, or `download`
        - To use a registered Azure ML data asset, reference it by name and version, for example `Input(type='uri_folder', path='azureml:my_dataset:1')`

- `environment`: environment to be used by the compute when running the command
- `compute`: can be `local` or a specified compute name
- `distribution`: distribution to leverage for distributed training scenarios, including:
        
    - `Pytorch`
    - `TensorFlow`
    - `MPI`
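To make the `${{inputs.<input_name>}}` syntax concrete, the sketch below shows the idea of placeholder substitution using plain string handling. It is only an illustration of the concept, not how Azure ML resolves inputs internally; `render_command` and the mounted path are hypothetical.

```python
import re

# Matches ${{inputs.some_name}} and captures "some_name"
PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")


def render_command(template: str, inputs: dict) -> str:
    """Replace every ${{inputs.<name>}} placeholder with the matching input value."""

    def substitute(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"command references undefined input '{name}'")
        return str(inputs[name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "python main.py --titanic-csv ${{inputs.titanic}} "
    "--randomstate ${{inputs.randomstate}}"
)
rendered = render_command(
    template,
    {"titanic": "/mnt/data/titanic_prepped.csv", "randomstate": 0},  # hypothetical path
)
print(rendered)
```

A placeholder that names an input missing from the dictionary raises a `KeyError` in this sketch, which mirrors the kind of mistake to look for when a job fails to resolve its inputs.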
            

In [14]:
# create the command
from azure.ai.ml import command
from azure.ai.ml import Input

my_job = command(
    code="./src",  # local path where the code is stored
    command="python main.py --titanic-csv ${{inputs.titanic}} --randomstate ${{inputs.randomstate}}",
    inputs={
        "titanic": Input(
            type="uri_file",
            path="azureml:titanic_prepped:1",
        ),
        "randomstate": 0,
    },
    environment="job_base_for_build_env@latest",
    compute="cpu-cluster",
    display_name="sklearn-titanic",
    # description,
    # experiment_name
)

In [15]:
script_folder = os.path.join(os.getcwd(), "job")
print(script_folder)
os.makedirs(script_folder, exist_ok=True)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/devamlcompute/code/Users/memasanz/Azure-Machine-Learning-Engineering/Chapter06/job


## Run Command with SDK

In [16]:
# submit the command
returned_job = ml_client.create_or_update(my_job)

Uploading src (0.0 MBs): 100%|██████████| 4621/4621 [00:00<00:00, 109746.43it/s]



### Register the Model 

Using the Python SDK v2, we can register the model for use.

Parameters for model registration include:

- `path` - A remote URI or local path pointing at the model
- `name` - A string value
- `description` - A description for the model
- `type` - valid values include:
    - "custom_model"
    - "mlflow_model"
    - "triton_model"

* Instead of typing out the `type`, you can use `AssetTypes` from the `azure.ai.ml.constants` namespace, as we do below.




In [17]:
run_id = returned_job.name
print('runid:' + run_id)
experiment = returned_job.experiment_name
print("experiment:" + experiment)

runid:cyan_hair_shrrkkx651
experiment:Chapter06


### Checking on job status

When the job is created, the image is prepared and pushed to your Azure Container Registry. If the compute cluster is scaled down, it is spun up, the image is loaded onto the cluster, and the job starts. Because the image does not exist initially, the first job you submit will take some time to complete; future runs reuse the image and start right away, provided your compute cluster is up.

In [18]:
exp = mlflow.get_experiment_by_name(experiment)
last_run = mlflow.search_runs(exp.experiment_id, output_format="list")[-1]

if last_run.info.run_id != run_id:
    print('run ids were not the same - waiting for run id to update')
    time.sleep(5)
    exp = mlflow.get_experiment_by_name(experiment)
    last_run = mlflow.search_runs(exp.experiment_id, output_format="list")[-1]

while last_run.info.status == 'SCHEDULED':
  print('run is being scheduled')
  time.sleep(15)
  last_run = mlflow.search_runs(exp.experiment_id, output_format="list")[-1]

while last_run.info.status == 'RUNNING':
  print('job is being run')
  time.sleep(15)
  last_run = mlflow.search_runs(exp.experiment_id, output_format="list")[-1]

print("run_id:{}".format(last_run.info.run_id))
print('----------')
print("status:{}".format(last_run.info.status))

run is being scheduled
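The two `while` loops above wait indefinitely. A minimal sketch of the same polling idea with a timeout is shown below. The helper `wait_for_terminal_status` is hypothetical, the status names `SCHEDULED` and `RUNNING` come from the cell above, and `FINISHED` in the demonstration is assumed to be the terminal status of a successful run. The demonstration feeds in fake statuses and a no-op sleep so that it runs instantly.

```python
import time


def wait_for_terminal_status(
    get_status,
    pending=("SCHEDULED", "RUNNING"),
    poll_seconds=15.0,
    timeout_seconds=3600.0,
    sleep=time.sleep,
):
    """Call get_status() until it leaves the pending states or the timeout expires."""
    waited = 0.0
    status = get_status()
    while status in pending:
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited:.0f} seconds")
        print(f"run is {status.lower()}")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# Demonstration with a scripted sequence of statuses instead of a real job.
fake_statuses = iter(["SCHEDULED", "SCHEDULED", "RUNNING", "FINISHED"])
final_status = wait_for_terminal_status(
    lambda: next(fake_statuses), sleep=lambda seconds: None
)
print(final_status)
```

In this notebook, one option would be to pass `get_status=lambda: mlflow.get_run(run_id).info.status`, which looks the run up by its ID rather than by its position in the search results.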

## Register the Model

In the next notebook, we will get the model directly from the run, but you can also register a model from a run as shown below and review it in your Azure Machine Learning workspace.

In [19]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

run_model = Model(
    path="azureml://jobs/" + last_run.info.run_id  + "/outputs/artifacts/paths/model/",
    name="chapter6_titanic_model",
    description="Model created from run.",
    type=AssetTypes.MLFLOW_MODEL
)

ml_client.models.create_or_update(run_model) 

Model({'job_name': 'cyan_hair_shrrkkx651', 'is_anonymous': False, 'auto_increment_version': False, 'name': 'chapter6_titanic_model', 'description': 'Model created from run.', 'tags': {}, 'properties': {}, 'id': '/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourceGroups/aml-dev-rg/providers/Microsoft.MachineLearningServices/workspaces/aml-ws/models/chapter6_titanic_model/versions/1', 'Resource__source_path': None, 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/devamlcompute/code/Users/memasanz/Azure-Machine-Learning-Engineering/Chapter06', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x7f66c04d8970>, 'serialize': <msrest.serialization.Serializer object at 0x7f66c04d8a60>, 'version': '1', 'latest_version': None, 'path': 'azureml://subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourceGroups/aml-dev-rg/workspaces/aml-ws/datastores/workspaceartifactstore/paths/ExperimentRun/dcid.cyan_hair_shrrkkx651/model', 'datastore': None, 'utc

In [20]:
run_model

Model({'job_name': None, 'is_anonymous': False, 'auto_increment_version': True, 'name': 'chapter6_titanic_model', 'description': 'Model created from run.', 'tags': {}, 'properties': {}, 'id': None, 'Resource__source_path': None, 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/devamlcompute/code/Users/memasanz/Azure-Machine-Learning-Engineering/Chapter06', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7f66c04bfca0>, 'version': None, 'latest_version': None, 'path': 'azureml://jobs/cyan_hair_shrrkkx651/outputs/artifacts/paths/model/', 'datastore': None, 'utc_time_created': None, 'flavors': None, 'arm_type': 'model_version', 'type': 'mlflow_model'})

In [21]:
run_model.path

'azureml://jobs/cyan_hair_shrrkkx651/outputs/artifacts/paths/model/'
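The path printed above follows a fixed pattern: the job name, followed by the artifact path that was passed to `mlflow.sklearn.log_model`. A small sketch of a helper that builds it is shown below; `job_model_uri` is a hypothetical convenience function, not part of the SDK.

```python
def job_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build the azureml:// URI of a model folder logged by a job."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Strip stray slashes so the result always has exactly one trailing slash
    return f"azureml://jobs/{run_id}/outputs/artifacts/paths/{artifact_path.strip('/')}/"


uri = job_model_uri("cyan_hair_shrrkkx651")
print(uri)
```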