# TRAIN AND DEPLOY HOUSE PRICE PREDICTION MODEL, WITH AZURE MACHINE LEARNING SDK

## This file includes the following:
- Create and publish a pipeline using the Azure Machine Learning SDK
- Train and deploy a model using the Azure Machine Learning SDK

#### REFERENCE DOCS:
https://docs.microsoft.com/en-us/learn/



In [1]:
import azureml.core
from azureml.core import Workspace, Dataset, Environment, ScriptRunConfig, Experiment

from azureml.core.compute import ComputeTarget, AmlCompute



#### GET WORKSPACE

In [2]:
ws = Workspace.get(name='aml_workspace',
                   subscription_id='0e1c43f5-07fe-4a3e-a2be-743edb639c94',
                   resource_group='aml_resource_group')

#### UPLOAD DATA TO A DATASTORE

In [3]:

# Get the default datastore
default_ds= ws.get_default_datastore()

# Enumerate all datastores, indicating which is the default
for ds_name in ws.datastores:
    print(ds_name, "- Default =", ds_name == default_ds.name)
    
# Upload data to a datastore
default_ds.upload_files(files=['d:/my_functions/house-prices-advanced-regression-techniques/train.csv'], 
                         overwrite=True, show_progress=True )


#Create a tabular dataset from the path on the datastore (this may take a short while)
tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'train.csv'))

# Register the tabular dataset
tab_data_set = tab_data_set.register(workspace=ws, 
                                        name='housing dataset',
                                        )


"datastore.upload_files" is deprecated after version 1.0.69. Please use "FileDatasetFactory.upload_directory" instead. See Dataset API change notice at https://aka.ms/dataset-deprecation.


workspaceblobstore - Default = True
workspacefilestore - Default = False
workspaceartifactstore - Default = False
workspaceworkingdirectory - Default = False
Uploading an estimated of 1 files
Uploading d:/my_functions/house-prices-advanced-regression-techniques/train.csv
Uploaded d:/my_functions/house-prices-advanced-regression-techniques/train.csv, 1 files out of an estimated total of 1
Uploaded 1 files


#### CREATE COMPUTE

In [4]:


# Specify a name for the compute (unique within the workspace)
compute_name = 'aml-cluster'

# Define compute configuration
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',
                                                       min_nodes=0, max_nodes=4,
                                                       vm_priority='dedicated')

# Create the compute
aml_cluster = ComputeTarget.create(ws, compute_name, compute_config)
aml_cluster.wait_for_completion(show_output=True)


SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


#### CREATE ENVIRONMENT 

In [5]:

#Creating an environment by specifying packages

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment('training_environment')
deps = CondaDependencies.create(conda_packages=['scikit-learn','pandas','numpy','matplotlib'],
                                pip_packages=['azureml-defaults'])
env.python.conda_dependencies = deps

#Register environment
env.register(workspace=ws)



{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20211124.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "training_environment",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-

#### CREATE SCRIPTS FOR PIPELINE STEPS

First, let's create a folder for the script files we'll use in the pipeline steps.

In [11]:
import os
# Create a folder for the pipeline step files

os.makedirs('azure_pipeline', exist_ok=True)


Now let's create the first script, which reads data from the dataset and applies some simple pre-processing: it adds age features, drops outlier rows, fills missing numeric values with the column mean, and normalizes the numeric features so they're on a similar scale.

The script includes an argument named --prepped-data, which references the folder where the resulting data should be saved.

In [26]:
%%writefile 'azure_pipeline/perp_data.py'

# Import libraries
import os
import argparse
import pandas as pd
from azureml.core import Run
from sklearn.preprocessing import StandardScaler
from pandas.api.types import is_numeric_dtype

# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument('--ds', type=str, dest='dataset_id', help='raw dataset')
parser.add_argument('--prepped-data', type=str, dest='prepped_data',  help='Folder for results')
args = parser.parse_args()
save_folder = args.prepped_data

# Get the experiment run context
run = Run.get_context()

# load the data (passed as an input dataset)
print("Loading Data...")
dataset = run.input_datasets['my_dataset']
train = dataset.to_pandas_dataframe()

# Log raw row count
row_count = (len(train))
run.log('raw_rows', row_count)

# Add features
train['HouseAge']=train['YrSold']-train['YearBuilt']
train['GarageAge']=train['YrSold']-train['GarageYrBlt']
train['RemodAge']=train['YearRemodAdd']-train['YearBuilt']

#Remove columns
dropcols=[ 'YearBuilt', 'YearRemodAdd','GarageYrBlt','YrSold', '3SsnPorch','ScreenPorch', 'MiscVal', 'LowQualFinSF', 'MasVnrArea']
train= train.drop(dropcols, axis=1)


# Split into categorical and numerical features
def feature_list(df, dropcols,target, ordlist_cutoff=16):
    """
    Categorise feature columns.
    Arguments:
    df -- dataframe
    dropcols -- column name or list of column names to ignore
    target -- name of the target column, excluded from the numeric list
    ordlist_cutoff -- numeric columns with at most this many unique values are treated as categorical (ordinal)
    Returns: numcollist and catcollist corresponding to numeric and categorical dtypes"""

    numcollist=[col for col in df.drop(dropcols, axis=1).columns if is_numeric_dtype(df[col]) ]
    ordlist=[col for col in numcollist if df[col].nunique() <= ordlist_cutoff ]
    for col in ordlist:
        numcollist.remove(col)
    catcollist=[col for col in df.drop(dropcols, axis=1).columns if not is_numeric_dtype(df[col])]

    catcollist=catcollist+ ordlist
    numcollist.remove(target)
    return numcollist, catcollist

df=train
target='SalePrice'
numcollist, catcollist= feature_list(df, 'Id', target,ordlist_cutoff=16)

# DROP OUTLIERS FOR TRAIN SET
idx1=train[((train['LotArea']- train['LotArea'].mean())/train['LotArea'].std())>4].index.to_list()
idx2=train[((train['LotFrontage']- train['LotFrontage'].mean())/train['LotFrontage'].std())>4].index.to_list()
idx3=train[((train['EnclosedPorch']- train['EnclosedPorch'].mean())/train['EnclosedPorch'].std())>8].index.to_list()
idx4=train[((train['GarageArea']- train['GarageArea'].mean())/train['GarageArea'].std())>3.5].index.to_list()
idx5=train[(((train['GrLivArea']- train['GrLivArea'].mean())/train['GrLivArea'].std())>4.5) &( train['SalePrice']<300000)].index.to_list()
idx6=train[((train['1stFlrSF']- train['1stFlrSF'].mean())/train['1stFlrSF'].std())>8].index.to_list()
idx7=train[((train['TotalBsmtSF']- train['TotalBsmtSF'].mean())/train['TotalBsmtSF'].std())>10].index.to_list()
idx8=train[((train['BsmtFinSF2']- train['BsmtFinSF2'].mean())/train['BsmtFinSF2'].std())>8].index.to_list()
idx9=train[((train['BsmtFinSF1']- train['BsmtFinSF1'].mean())/train['BsmtFinSF1'].std())>10].index.to_list()
drop_rows_idx = set(idx1+idx2+idx3+idx4+idx5+idx6+idx7+idx8+idx9)

train=train.drop(drop_rows_idx, axis=0)

# Fill missing numeric values with the column mean
for col in numcollist:
    train[col]=train[col].fillna(train[col].mean())

# For now we work with the numerical features
# Normalize the numeric columns
scaler = StandardScaler()
train[numcollist]= scaler.fit_transform(train[numcollist])

# Log processed rows
row_count = (len(train))
run.log('processed_rows', row_count)

numcollist.append('SalePrice')

# Save the prepped data
print("Saving Data...")
os.makedirs(save_folder, exist_ok=True)
save_path = os.path.join(save_folder,'data.csv')
train[numcollist].to_csv(save_path, index=False, header=True)


# End the run
run.complete()

Overwriting azure_pipeline/perp_data.py
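
The outlier rule used in the script above keeps a row unless its z-score in a given column exceeds a per-column threshold. The following is a minimal standard-library sketch of that idea, not the script itself: the function name `zscore_outlier_indices` and the sample values are made up for illustration. Like the script, it only flags unusually high values.

```python
import statistics

def zscore_outlier_indices(values, threshold):
    """Return positions whose z-score is strictly above `threshold`."""
    mean = statistics.mean(values)
    # Sample standard deviation (ddof=1), the same convention as pandas .std()
    std = statistics.stdev(values)
    return [i for i, v in enumerate(values) if (v - mean) / std > threshold]

# Made-up values: 29 ordinary entries and one extreme entry at position 29
lot_area = [100 + (i % 5) for i in range(29)] + [1000]

flagged = zscore_outlier_indices(lot_area, threshold=4)
kept = [v for i, v in enumerate(lot_area) if i not in flagged]
print(flagged, len(kept))  # [29] 29
```

Note that with very few rows a single extreme value can never reach a high z-score, so thresholds such as 8 or 10 only make sense on reasonably large datasets.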


Now you can create the script for the second step, which will train a model. The script includes an argument named --training-data, which references the location where the prepared data was saved by the previous step.

In [27]:
%%writefile 'azure_pipeline/training_data.py'


# Import libraries
from azureml.core import Run, Model
import argparse
import pandas as pd
import numpy as np
import joblib
import os
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt


# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument("--training-data", type=str, dest='training_data', help='training data')
args = parser.parse_args()
training_data = args.training_data


# Get the experiment run context
run = Run.get_context()

# load the prepared data file in the training folder
print("Loading Data...")
file_path = os.path.join(training_data,'data.csv')
train = pd.read_csv(file_path)


# Separate features and labels
X=train.drop(['SalePrice'], axis=1)
y=train['SalePrice']

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a GB model
print('Training a Gradient Boosting Regressor model...')
model = GradientBoostingRegressor().fit(X_train, y_train)

# evaluate metrics
test_pred = model.predict(X_test)
rmse=np.sqrt(mean_squared_error(y_test, test_pred))
print('Root Mean Squared Error: ', rmse)
run.log('Root Mean Squared Error', rmse)

r_square= r2_score(y_test, test_pred)
print('R square: ', r_square)
run.log('R square', r_square)


# Plot predicted vs actual
fig = plt.figure(figsize=(6, 4))
plt.scatter(y_test, test_pred)
plt.xlabel('Actual Labels')
plt.ylabel('Predicted Labels')
plt.title('House Price Predictions')
plt.plot(y_test,y_test, color='magenta')

run.log_image(name = "Predictions", plot = fig)
plt.show()

# Save the trained model in the outputs folder
print("Saving model...")
os.makedirs('outputs', exist_ok=True)
model_file = os.path.join('outputs', 'housing_model.pkl')
joblib.dump(value=model, filename=model_file)

# Register the model
print('Registering model...')
Model.register(workspace=run.experiment.workspace,
               model_path = model_file,
               model_name = 'housing_model',
              )


run.complete()




Overwriting azure_pipeline/training_data.py
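
The two metrics logged by the training script can be written out by hand to make clear what they measure. This is a small standard-library sketch with made-up numbers; the helper names `rmse` and `r_squared` are illustrative and are not part of scikit-learn. Because the prep script leaves SalePrice unscaled, the RMSE reported by the pipeline is in the same units as the price.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up actual and predicted values
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 8.0]

print(rmse(actual, predicted))
print(r_squared(actual, predicted))  # 0.75
```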


#### PREPARE A COMPUTE ENVIRONMENT FOR THE PIPELINE STEPS

In this exercise, you'll use the same compute for both steps, but it's important to realize that each step is run independently; so you could specify different compute contexts for each step if appropriate.



In [28]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core import Environment
from azureml.core.runconfig import RunConfiguration


cluster_name = 'aml-cluster'
pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
training_env = Environment.get(workspace=ws, name='training_environment')

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Use the compute you created above. 
pipeline_run_config.target = pipeline_cluster

# Assign the environment to the run configuration
pipeline_run_config.environment = training_env

print ("Run configuration created.")

Run configuration created.


#### CREATE AND RUN A PIPELINE

Now you're ready to create and run a pipeline.

First you need to define the steps for the pipeline, and any data references that need to be passed between them. In this case, the first step must write the prepared data to a folder that the second step can read. Since the steps can run on remote compute (and could each run on different compute), the folder path must be passed as a data reference to a location in a datastore within the workspace.

The OutputFileDatasetConfig object is a special kind of data reference used for interim storage locations that can be passed between pipeline steps, so you'll create one and use it as the output for the first step and the input for the second step. Note that you need to pass it as a script argument so your code can access the datastore location it references.

In [29]:
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import PythonScriptStep

# Get the training dataset
tab_dataset= ws.datasets['housing dataset']

# Create an OutputFileDatasetConfig (temporary Data Reference) for data passed from step 1 to step 2
prepped_data = OutputFileDatasetConfig("prepped_data")


# Step 1, Run the data prep script
prep_step = PythonScriptStep(name = "Prepare Data",
                                source_directory = 'azure_pipeline',
                                script_name = 'perp_data.py',
                                arguments = ['--ds', tab_dataset.as_named_input('my_dataset'),
                                             '--prepped-data', prepped_data],
                                compute_target = pipeline_cluster,
                                runconfig = pipeline_run_config,
                                allow_reuse = True)



# Step 2, run the training script
train_step = PythonScriptStep(name = "Train and Register Model",
                                source_directory = 'azure_pipeline',
                                script_name = 'training_data.py',
                                arguments = ['--training-data', prepped_data.as_input()],
                                compute_target = pipeline_cluster,
                                runconfig = pipeline_run_config,
                                allow_reuse = True)

print("Pipeline steps defined")

Pipeline steps defined
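
The hand-off between the two steps can be mimicked locally to see why the folder is passed as a script argument. The sketch below is only an analogy under stated assumptions: a temporary directory stands in for the storage location behind OutputFileDatasetConfig, the functions `prep_step` and `train_step` are hypothetical stand-ins for the two scripts, and the CSV row is made up.

```python
import argparse
import csv
import os
import tempfile

def prep_step(argv):
    """Stand-in for the prep script: writes data.csv into the folder it is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--prepped-data', type=str, dest='prepped_data')
    args = parser.parse_args(argv)
    os.makedirs(args.prepped_data, exist_ok=True)
    with open(os.path.join(args.prepped_data, 'data.csv'), 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['LotArea', 'SalePrice'])
        writer.writerow([1000, 150000])  # made-up illustrative row

def train_step(argv):
    """Stand-in for the training script: reads data.csv from the folder it is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--training-data', type=str, dest='training_data')
    args = parser.parse_args(argv)
    with open(os.path.join(args.training_data, 'data.csv'), newline='') as f:
        return list(csv.DictReader(f))

# The temporary folder plays the role of the interim storage location
with tempfile.TemporaryDirectory() as shared_folder:
    prep_step(['--prepped-data', shared_folder])
    rows = train_step(['--training-data', shared_folder])

print(rows)  # [{'LotArea': '1000', 'SalePrice': '150000'}]
```

Neither function knows where the folder lives; each only receives a path. In the pipeline this is what lets the two steps run on different compute while sharing data.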


OK, you're ready to build the pipeline from the steps you've defined and run it as an experiment.

In [31]:
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.widgets import RunDetails

# Construct the pipeline
pipeline_steps = [prep_step, train_step]
pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
print("Pipeline is built.")

# Create an experiment and run the pipeline
experiment = Experiment(workspace=ws, name = 'housing-price-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
print("Pipeline submitted for execution.")
RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion(show_output=True)

Pipeline is built.
Created step Prepare Data [8211fdd0][c215ee2b-bd3b-4188-af03-d6e464544fc2], (This step will run and generate new outputs)
Created step Train and Register Model [d4b10e4e][c0c9bf49-a412-43f8-b3aa-0d17ed2f84c5], (This step will run and generate new outputs)
Submitted PipelineRun ede6e212-cbe1-4694-b743-9ed5040da0f9
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/ede6e212-cbe1-4694-b743-9ed5040da0f9?wsid=/subscriptions/0e1c43f5-07fe-4a3e-a2be-743edb639c94/resourcegroups/aml_resource_group/workspaces/aml_workspace&tid=a49bbff2-fe01-487f-9141-122113ada106
Pipeline submitted for execution.


_PipelineWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', …

PipelineRunId: ede6e212-cbe1-4694-b743-9ed5040da0f9
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/ede6e212-cbe1-4694-b743-9ed5040da0f9?wsid=/subscriptions/0e1c43f5-07fe-4a3e-a2be-743edb639c94/resourcegroups/aml_resource_group/workspaces/aml_workspace&tid=a49bbff2-fe01-487f-9141-122113ada106
PipelineRun Status: NotStarted
PipelineRun Status: Running


Expected a StepRun object but received <class 'azureml.core.run.Run'> instead.
This usually indicates a package conflict with one of the dependencies of azureml-core or azureml-pipeline-core.
Please check for package conflicts in your python environment







PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': 'ede6e212-cbe1-4694-b743-9ed5040da0f9', 'status': 'Completed', 'startTimeUtc': '2022-01-25T05:19:26.801672Z', 'endTimeUtc': '2022-01-25T05:24:03.377728Z', 'services': {}, 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}', 'azureml.continue_on_step_failure': 'False', 'azureml.pipelineComponent': 'pipelinerun'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://amlworkspace5708480350.blob.core.windows.net/azureml/ExperimentRun/dcid.ede6e212-cbe1-4694-b743-9ed5040da0f9/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=AYHi5xHy2HMbOj8K75iEMJpmd6jz%2BYmg42SPLxZGz%2FU%3D&skoid=7c70f442-ba12-4608-b995-7cda80df4f94&sktid=a49bbff2-fe01-487f-9141-122113ada106&skt=2022-01-25T05%3A09%3A28Z&ske=2022-01-26T13%3A19%3A28Z&sks=b&skv=2019-07-07&st=2022-01-25T05%3A09%3A28Z&se=2022-01-25T13%3A19%3A28Z&sp=r', 

'Finished'

In [32]:
for run in pipeline_run.get_children():
    print(run.name, ':')
    metrics = run.get_metrics()
    for metric_name in metrics:
        print('\t',metric_name, ":", metrics[metric_name])

Train and Register Model :
	 Root Mean Squared Error : 28506.082515964164
	 R square : 0.8820669915121216
	 Predictions : aml://artifactId/ExperimentRun/dcid.cb991434-9b78-4c5d-8a5a-771d695e0d4c/Predictions_1643088229.png
Prepare Data :
	 raw_rows : 1460
	 processed_rows : 1440


#### PUBLISH THE PIPELINE

After you've created and tested a pipeline, you can publish it as a REST service.

In [86]:
# Publish the pipeline from the run
published_pipeline = pipeline_run.publish_pipeline(
    name="diabetes-training-pipeline", description="Trains diabetes model", version="1.0")

published_pipeline

Name,Id,Status,Endpoint
diabetes-training-pipeline,78fae10d-e1cf-41e7-8bb0-1db05a2e5393,Active,REST Endpoint


Note that the published pipeline has an endpoint, which you can see in the Endpoints page (on the Pipeline Endpoints tab) in Azure Machine Learning studio. You can also find its URI as a property of the published pipeline object:

In [87]:
rest_endpoint = published_pipeline.endpoint
print(rest_endpoint)

https://centralindia.api.azureml.ms/pipelines/v1.0/subscriptions/0e1c43f5-07fe-4a3e-a2be-743edb639c94/resourceGroups/aml_resource_group/providers/Microsoft.MachineLearningServices/workspaces/aml_workspace/PipelineRuns/PipelineSubmit/78fae10d-e1cf-41e7-8bb0-1db05a2e5393


#### CALL THE PIPELINE ENDPOINT

To use the endpoint, client applications need to make a REST call over HTTP. This request must be authenticated, so an authorization header is required. A real application would authenticate with a service principal, but to test this out, we'll use the authorization header from the current connection to the Azure workspace, which you can get using the following code:

In [88]:
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()
print("Authentication header ready.")

Authentication header ready.


Now we're ready to call the REST interface. The pipeline runs asynchronously, so we'll get an identifier back, which we can use to track the pipeline experiment as it runs:

In [89]:
import requests

experiment_name = 'housing-price-pipeline'

rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": experiment_name})
run_id = response.json()["Id"]
run_id

'a9b67f00-bb9c-4956-9962-b0aed2cfb976'

Since you have the run ID, you can use it to wait for the run to complete.



In [90]:
from azureml.pipeline.core.run import PipelineRun

published_pipeline_run = PipelineRun(ws.experiments[experiment_name], run_id)
published_pipeline_run.wait_for_completion(show_output=True)

PipelineRunId: a9b67f00-bb9c-4956-9962-b0aed2cfb976
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/a9b67f00-bb9c-4956-9962-b0aed2cfb976?wsid=/subscriptions/0e1c43f5-07fe-4a3e-a2be-743edb639c94/resourcegroups/aml_resource_group/workspaces/aml_workspace&tid=a49bbff2-fe01-487f-9141-122113ada106

PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': 'a9b67f00-bb9c-4956-9962-b0aed2cfb976', 'status': 'Completed', 'startTimeUtc': '2022-01-24T10:35:54.440751Z', 'endTimeUtc': '2022-01-24T10:35:56.540831Z', 'services': {}, 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'Unavailable', 'runType': 'HTTP', 'azureml.parameters': '{}', 'azureml.continue_on_step_failure': 'False', 'azureml.pipelineComponent': 'pipelinerun', 'azureml.pipelineid': '78fae10d-e1cf-41e7-8bb0-1db05a2e5393'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://amlworkspace5708480350.blob.core.windows.net/azureml/Experiment

'Finished'

You can schedule runs of the published pipeline with Schedule.create(). A schedule can run at a fixed time interval (defined with a ScheduleRecurrence) or be triggered whenever the data in a monitored datastore changes.

#### DEPLOY THE MODEL AS A REAL TIME INFERENCING SERVICE

The first step is to register the model. The training step of the pipeline has already done this, so the following call is shown only for reference:

from azureml.core import Model

model = Model.register(workspace=ws,
                       model_name='house_price_prediction_model',
                       model_path='housing_model', # local path
                       description='A house price pred model')

In [91]:
model = ws.models['housing_model']

Next, we need to define two things for the service:

- A script to load the model and return predictions for submitted data.
- An environment in which the script will be run.

#### CREATE AN ENTRY SCRIPT

Create the entry script (sometimes referred to as the scoring script) for the service as a Python (.py) file. It must include two functions:

- init(): Called when the service is initialized.
- run(raw_data): Called when new data is submitted to the service.

Typically, you use the init function to load the model from the model registry, and use the run function to generate predictions from the input data. The following example script shows this pattern:

In [99]:
%%writefile 'azure_pipeline/load_model.py'

import json
import joblib
import numpy as np
import os

# Called when the service is loaded
def init():
    global model
    # Get the path to the registered model file and load it
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'housing_model.pkl')
    model = joblib.load(model_path)

# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    # Return the predictions in a JSON-serializable format
    return predictions.tolist()

Overwriting azure_pipeline/load_model.py
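
Before deploying, it can help to exercise the init()/run() contract locally. The sketch below follows the same pattern as the entry script but swaps in a hypothetical `StubModel` so that it runs without Azure, joblib, or the registered model file; the "prediction" it returns is just the sum of each row and has no meaning beyond the demonstration.

```python
import json

class StubModel:
    """Hypothetical stand-in for the unpickled regressor."""
    def predict(self, rows):
        # Pretend prediction: the sum of each row's features
        return [sum(row) for row in rows]

model = None

def init():
    """Called once when the service starts; the real script loads the model file here."""
    global model
    model = StubModel()

def run(raw_data):
    """Called per request: parse the JSON body, predict, return a JSON-serializable list."""
    data = json.loads(raw_data)['data']
    predictions = model.predict(data)
    return list(predictions)

init()
result = run(json.dumps({"data": [[1.0, 2.0], [0.5, 0.5]]}))
print(result)  # [3.0, 1.0]
```

The request body has the same shape as the one sent to the deployed service: a JSON object with a "data" key holding a list of feature rows.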


Create an environment

In [100]:
service_env = Environment(name='service-env')
python_packages = ['scikit-learn', 'numpy', 'pandas','azureml-defaults',  'azure-ml-api-sdk'] # whatever packages your entry script uses
for package in python_packages:
    service_env.python.conda_dependencies.add_pip_package(package)

Combine the script and environment in an InferenceConfig

In [101]:
from azureml.core.model import InferenceConfig

regression_inference_config = InferenceConfig(source_directory = 'azure_pipeline',
                                              entry_script='load_model.py',
                                                  environment=service_env)

Define the deployment configuration and deploy model

In [102]:
from azureml.core.webservice import AciWebservice
from azureml.core.model import Model

# Configure the web service container
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

# Deploy the model as a service

model = ws.models['housing_model']

print('Deploying model...')
service_name = "houseprice-service"
service = Model.deploy(ws, service_name, [model], regression_inference_config, deployment_config, overwrite=True)
service.wait_for_deployment(True)
print(service.state)

Deploying model...
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-01-24 16:14:53+05:30 Creating Container Registry if not exists.
2022-01-24 16:14:54+05:30 Building image..
2022-01-24 16:20:53+05:30 Generating deployment configuration.
2022-01-24 16:20:54+05:30 Submitting deployment to compute..
2022-01-24 16:20:59+05:30 Checking the status of deployment houseprice-service..
2022-01-24 16:25:35+05:30 Checking the status of inference endpoint houseprice-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


In [103]:
# Retrieve the service logs for troubleshooting
print(service.get_logs())

2022-01-24T10:54:44,855216100+00:00 - rsyslog/run 
2022-01-24T10:54:44,862271100+00:00 - iot-server/run 
2022-01-24T10:54:44,875589800+00:00 - gunicorn/run 
Dynamic Python package installation is disabled.
Starting HTTP server
2022-01-24T10:54:44,918836600+00:00 - nginx/run 
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T10:54:45,283592700+00:00 - iot-server/finish 1 0
2022-01-24T10:54:45,285280900+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 20.1.0
Listening at: http://127.0.0.1:31311 (72)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 101
SPARK_HOME not set. Skipping PySpark Initialization.
Initializing logger
2022-01-24 10:54:46,393 | root | INFO | Starting up app insights client
logging socket was found. logging is available.
logging socket was found. logging is available.
2022-01-24 10:54:46,401 | root | INFO | Starting up request id generator
2022-01-24 10:54:46,401 | root | INFO | Sta

Take a look at the workspace in Azure Machine Learning studio and view the Endpoints page, which shows the deployed services in your workspace.

You can also retrieve the names of web services in the workspace by running the following code:

In [104]:
for webservice_name in ws.webservices:
    print(webservice_name)

houseprice-service


#### USE THE WEB SERVICE 

With the service deployed, now you can consume it from a client application.

In [109]:
import json

x_new = [[-1.71882123,  0.82720887,  0.94761458,  0.69004471, -0.28877977,
       -1.14297759, -0.61039742, -0.96816863,  0.51214674, -0.28190747,
        0.05274251, -0.42980633, -0.24585151, -0.36365729, -0.68057648,
       -0.55502046, -0.47314739]]


# Convert the array to a serializable list in a JSON document
input_json = json.dumps({"data": x_new})

# Call the web service, passing the input data (the web service will also accept the data in binary format)
predictions = service.run(input_data = input_json)

# Get the prediction
print(predictions)




[183302.1255239806]


The code above uses the Azure Machine Learning SDK to connect to the containerized web service and use it to generate predictions from your house price regression model. In production, a model is likely to be consumed by business applications that do not use the Azure Machine Learning SDK, but simply make HTTP requests to the web service.

Let's determine the URL to which these applications must submit their requests:

In [110]:
endpoint = service.scoring_uri
print(endpoint)

http://2fd6e687-0935-4f60-bbae-fa78d6c8a179.centralindia.azurecontainer.io/score


Now that you know the endpoint URI, an application can simply make an HTTP request, sending the house feature data in JSON format, and receive back the predicted sale price.

In [115]:
import requests
import json

x_new = [[-1.71882123,  0.82720887,  0.94761458,  0.69004471, -0.28877977,
       -1.14297759, -0.61039742, -0.96816863,  0.51214674, -0.28190747,
        0.05274251, -0.42980633, -0.24585151, -0.36365729, -0.68057648,
       -0.55502046, -0.47314739]]

# Convert the array to a serializable list in a JSON document
input_json = json.dumps({"data": x_new})

# Set the content type
headers = { 'Content-Type':'application/json' }

predictions = requests.post(endpoint, input_json, headers = headers)
predictions.json()
#predicted = json.loads(predictions.json())

[183302.1255239806]
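
A client that cannot install the requests package could build the same call with the standard library. This is a hedged sketch rather than a tested client: the helper `build_scoring_request` is a made-up name, the URI is a placeholder instead of the real scoring endpoint, and the actual send is left commented out because it needs a live service.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, rows):
    """Build (but do not send) a POST request carrying the rows as a JSON document."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder URI and made-up feature values, not a real endpoint or real data
request = build_scoring_request("http://localhost:8080/score", [[0.1, -0.2, 0.3]])
print(request.get_method(), request.full_url)

# Sending requires a running service, so it is shown only as a comment:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```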

Delete the service
When you no longer need your service, you should delete it to avoid incurring unnecessary charges.



In [116]:
service.delete()
print ('Service deleted.')

Service deleted.
