# Training, Registering and Deploying a Linear Regression Model 

## 1. Set up the Environment

Import the relevant libraries and authenticate Jupyter with Azure. 

In [1]:
from azureml.core import Workspace

## 2. Feature Engineering

Connect to the dataset we created earlier. Convert this to a pandas dataframe. 

Inspect the dataset and one-hot-encode the categorical variables. 

In [3]:
# azureml-core of version 1.0.72 or higher is required
# azureml-dataprep[pandas] of version 1.1.34 or higher is required
from azureml.core import Workspace, Dataset

subscription_id = 'xxx'
resource_group = 'yyy'
workspace_name = 'zzz'

# Bind the workspace to ws, the name used by the registration and deployment cells below
ws = Workspace(subscription_id, resource_group, workspace_name)

dataset = Dataset.get_by_name(ws, name='parquet_CDS')
dataset.to_pandas_dataframe()

Unnamed: 0,TITLE,ARTIST,COUNTRY,COMPANY,PRICE,YEAR,COUNTRY_FULL
0,Empire Burlesque,Bob Dylan,USA,Columbia,10.9,1985,United States of America
1,Hide your heart,Bonnie Tyler,UK,CBS Records,9.9,1988,United Kingodm
2,Greatest Hits,Dolly Parton,USA,RCA,9.9,1982,United States of America
3,Still got the blues,Gary Moore,UK,Virgin records,10.2,1990,United Kingodm
4,Eros,Eros Ramazzotti,EU,BMG,9.9,1997,European Union
5,One night only,Bee Gees,UK,Polydor,10.9,1998,United Kingodm
6,Sylvias Mother,Dr.Hook,UK,CBS,8.1,1973,United Kingodm
7,Maggie May,Rod Stewart,UK,Pickwick,8.5,1990,United Kingodm
8,Romanza,Andrea Bocelli,EU,Polydor,10.8,1996,European Union
9,When a man loves a woman,Percy Sledge,USA,Atlantic,8.7,1987,United States of America


In [73]:
#one hot encoding

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')

df = dataset.to_pandas_dataframe()
x_col = ['COUNTRY', 'COMPANY', 'YEAR']
y_col = ['PRICE']
x_df = df.loc[:, x_col]
y_df = df.loc[:, y_col]

dfx_enc = enc.fit_transform(x_df)
dfx_enc

<26x42 sparse matrix of type '<class 'numpy.float64'>'
	with 78 stored elements in Compressed Sparse Row format>
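
To make the encoding step concrete, the following is a minimal pure-Python sketch of what one-hot encoding with handle_unknown='ignore' does. It is an illustrative stand-in, not scikit-learn's implementation: the helper names fit_categories and one_hot, the three-row sample, and the unseen company "Motown" are all assumptions made for this example.

```python
def fit_categories(rows):
    """Collect the sorted distinct values found in each column."""
    n_cols = len(rows[0])
    return [sorted({row[i] for row in rows}) for i in range(n_cols)]


def one_hot(row, categories):
    """Encode one row; a value not seen during fitting gives all zeros for its column."""
    encoded = []
    for value, known in zip(row, categories):
        encoded.extend(1.0 if value == k else 0.0 for k in known)
    return encoded


# Hypothetical (COUNTRY, COMPANY) sample based on rows of the table above
train_rows = [("USA", "Columbia"), ("UK", "CBS Records"), ("EU", "BMG")]
categories = fit_categories(train_rows)

seen = one_hot(("USA", "Columbia"), categories)
unseen = one_hot(("USA", "Motown"), categories)  # "Motown" was never fitted

print(categories)
print(seen)
print(unseen)
```

Each distinct value becomes its own column, which is why a small dataset with many distinct companies and years produces a wide, sparse matrix.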

## 3. Train and Test the Model

Use scikit-learn to train and test a linear regression model.

In [43]:
import os

from azureml.core import Dataset, Run
from sklearn.model_selection import train_test_split
#from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
# sklearn.externals.joblib is removed in 0.23
from sklearn import __version__ as sklearnver
from packaging.version import Version
if Version(sklearnver) < Version("0.23.0"):
    from sklearn.externals import joblib
else:
    import joblib

run = Run.get_context()

# Split X and y into training and test sets
x_train, x_test, y_train, y_test = train_test_split(dfx_enc, y_df, test_size=0.2, random_state=223)

data = {'train': {'X': x_train, 'y': y_train},
        'test': {'X': x_test, 'y': y_test}}

clf = LinearRegression().fit(data['train']['X'], data['train']['y'])
model_file_name = 'linear_regression.pkl'

print('Accuracy of Linear Regression Model on training set: {:.2f}'.format(clf.score(x_train, y_train)))
print('Accuracy of Linear Regression Model on test set: {:.2f}'.format(clf.score(x_test, y_test)))

# joblib.dump opens the target file itself, so no separate open() call is needed
os.makedirs('./outputs', exist_ok=True)
joblib.dump(value=clf, filename='outputs/' + model_file_name)

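Accuracy of Linear Regression Model on training set: 1.00
Accuracy of Linear Regression Model on test set: -0.56

These scores are R² values rather than accuracies. A training score of 1.00 alongside a negative test score indicates overfitting: with 42 one-hot-encoded columns and only 26 rows, the model fits the training rows exactly but predicts the held-out rows worse than a constant mean-price prediction would.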
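
The score method of LinearRegression returns the coefficient of determination, R² = 1 - SS_res / SS_tot, which is why the value can be negative. The sketch below computes it by hand; the function name r2_score here is a local helper written for this example, and the prices are made up for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices, for illustration only
y_true = [8.0, 10.0, 12.0]

perfect = r2_score(y_true, [8.0, 10.0, 12.0])     # predictions match exactly
mean_only = r2_score(y_true, [10.0, 10.0, 10.0])  # always predict the mean
poor = r2_score(y_true, [12.0, 10.0, 8.0])        # worse than predicting the mean

print(perfect, mean_only, poor)
```

A score of 0 corresponds to always predicting the mean of the evaluated targets, so any negative value means the model does worse than that baseline on the data being scored.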


## 4. Register the Model

### 4.1 Create the environment file
Define a conda environment YAML file with your training script dependencies and create an Azure ML environment.

In [25]:
%%writefile service_files/conda_dependencies.yml

dependencies:
- python=3.6.2
- scikit-learn
- pip:
  - azureml-defaults
  - packaging

Writing service_files/conda_dependencies.yml


In [27]:
from azureml.core import Environment

sklearn_env = Environment.from_conda_specification(name = 'sklearn-env', file_path = './service_files/conda_dependencies.yml')

### 4.2 Register the Trained Model
After successfully training a model, you must register it in your Azure Machine Learning workspace. 

To register a model from a local file, you can use the register method of the Model class as shown here:

In [44]:
from azureml.core import Model

linear_regression_model = Model.register(workspace=ws,
                       model_name='linear_regression',
                       model_path='outputs/linear_regression.pkl', # local path
                       description='A linear model')

Registering model linear_regression


### 4.3 Define an Inference Configuration
The model will be deployed as a service that consists of:

- A script to load the model and return predictions for submitted data.
- An environment in which the script will be run.

You must therefore define the script and environment for the service.

#### Creating an entry script
Create the entry script (sometimes referred to as the scoring script) for the service as a Python (.py) file. It must include two functions:

init(): Called when the service is initialized.  
run(raw_data): Called when new data is submitted to the service.  

Typically, you use the init function to load the model from the model registry, and use the run function to generate predictions from the input data. The following example script shows this pattern:

In [9]:
# this is put in the score.py script

import json
import joblib
import numpy as np
from azureml.core.model import Model

# Called when the service is loaded
def init():
    global model
    # Get the path to the registered model file and load it
    model_path = Model.get_model_path('linear_regression')
    model = joblib.load(model_path)

# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    # Return the predictions as any JSON serializable format
    return predictions.tolist()
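
Before deploying, it can help to exercise the init/run contract locally. The sketch below is one possible way to do that without Azure or a trained model: StubModel is a hypothetical stand-in that simply sums each input row, and this run is a NumPy-free variant of the script above that assumes the same {"data": [...]} payload shape.

```python
import json


class StubModel:
    """Hypothetical stand-in for the registered model: predicts the sum of each row."""

    def predict(self, rows):
        return [[float(sum(row))] for row in rows]


model = None


def init():
    # The real service would load the registered model file with joblib here
    global model
    model = StubModel()


def run(raw_data):
    # Same contract as score.py: JSON string in, JSON-serialisable predictions out
    data = json.loads(raw_data)["data"]
    return model.predict(data)


init()
payload = json.dumps({"data": [[0.0, 1.0, 0.0, 1.0]]})
result = run(payload)
print(result)
```

Checking that the return value survives json.dumps is a quick way to catch non-serialisable outputs before they surface as service errors.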

#### Combining the script and environment in an InferenceConfig
After creating the entry script and environment configuration file, you can combine them in an InferenceConfig for the service like this:

In [42]:
from azureml.core.model import InferenceConfig

lr_inference_config = InferenceConfig(runtime='python',
                                      source_directory='service_files',
                                      entry_script='score.py',
                                      conda_file='conda_dependencies.yml')

#### Define a deployment configuration
Now that you have the entry script and environment, you need to configure the compute to which the service will be deployed. If you are deploying to an AKS cluster, you must create the cluster and a compute target for it before deploying:

In [31]:
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='australiaeast')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)

Creating...........................................................................................
SucceededProvisioning operation finished, operation "Succeeded"


With the compute target created, you can now define the deployment configuration, which sets the target-specific compute specification for the containerized deployment:

In [33]:
from azureml.core.webservice import AksWebservice

lr_deploy_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                      memory_gb=1)

#### Deploy the model
After all of the configuration is prepared, you can deploy the model. The easiest way to do this is to call the deploy method of the Model class, like this:

In [45]:
from azureml.core.model import Model

model = ws.models['linear_regression']
service = Model.deploy(workspace=ws,
                       name = 'lr-service',
                       models = [model],
                       inference_config = lr_inference_config,
                       deployment_config = lr_deploy_config,
                       deployment_target = production_cluster)
service.wait_for_deployment(show_output = True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running............
Succeeded
AKS service creation operation finished, operation "Succeeded"


## 5. Consuming a real-time inferencing service
After deploying a real-time service, you can consume it from client applications to predict labels for new data cases.

Create a new dataset to pass to the model. Perform one-hot-encoding and serialise to JSON.

In [75]:
import pandas as pd
import json

# An array of new data cases
x_new = {'COUNTRY': ['USA'],
         'COMPANY': ['Columbia'],
         'YEAR': ['1984']}

        
dfx_new = pd.DataFrame(x_new, columns = ['COUNTRY', 'COMPANY', 'YEAR'])

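# Reuse the encoder fitted on the training data. With handle_unknown='ignore',
# a value not seen during fitting (here the YEAR '1984') is encoded as all zeros.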
xnew_enc = enc.transform(dfx_new).toarray()
xnew_enc

array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [87]:
import json
from json import JSONEncoder
import numpy

class NumpyArrayEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, numpy.ndarray):
            return obj.tolist()
        return JSONEncoder.default(self, obj)

# Serialization
numpyData = {"data": xnew_enc}
encodedNumpyData = json.dumps(numpyData, cls=NumpyArrayEncoder)  # use dump() to write array into file
print("Printing JSON serialized NumPy array")
print(encodedNumpyData)


Printing JSON serialized NumPy array
{"data": [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]}


### 5.1 Using the Azure Machine Learning SDK
For testing, you can use the Azure Machine Learning SDK to call a web service through the run method of a Webservice object that references the deployed service. Typically, you send data to the run method in JSON format.


In [90]:
# Call the web service, passing the input data
response = service.run(input_data = encodedNumpyData)
print('Price prediction:')
response

Price prediction:


[[10.124019709059656]]

### 5.2 Using a REST endpoint
In production, most client applications will not include the Azure Machine Learning SDK, and will consume the service through its REST interface. You can determine the endpoint of a deployed service in Azure Machine Learning studio, or by retrieving the scoring_uri property of the Webservice object in the SDK.

With the endpoint known, you can use an HTTP POST request with JSON data to call the service. The following example shows how to do this using Python:

In [112]:
import requests

endpoint = service.scoring_uri
print(endpoint)

primary_key, secondary_key = service.get_keys()

# Set the content type and the authentication key in the request headers
request_headers = { "Content-Type":"application/json",
                    "Authorization":"Bearer " + primary_key }

# Call the service
response = requests.post(url = endpoint,
                         data = encodedNumpyData,
                         headers = request_headers)

# Get the predictions from the JSON response
predictions = response.json()
print('Price prediction:')
predictions



http://104.209.92.63:80/api/v1/service/lr-service/score
Price prediction:


[[10.124019709059656]]
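
For clients that cannot install the requests package, the same POST can be built with the Python standard library. This is a minimal sketch under stated assumptions: build_scoring_request is a hypothetical helper, and the endpoint and key shown are placeholders rather than real values.

```python
import json
import urllib.request


def build_scoring_request(endpoint, key, rows):
    """Build (but do not send) a POST request carrying the rows as {"data": rows}."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + key}
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


# Placeholder endpoint and key, for illustration only
request = build_scoring_request("http://localhost/api/v1/service/lr-service/score",
                                "placeholder-key",
                                [[0.0, 1.0]])

print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and decode the response body with json.load; building the request separately keeps the payload and headers easy to inspect without a live service.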