# Set Up a Scoring Pipeline
Pipelines are reusable objects that you can use to transform, move, and score data.  Scoring pipelines are used for batch execution jobs.<br>
Use batch scoring jobs when you do not need a prediction immediately and some time lag is acceptable.<br>
Scoring pipelines are inherently slower than a real-time service on AKS because the compute cluster takes time to spin up.  However, they are much cheaper, because the cluster does not have to stay running between jobs.<br>
Typical scenarios for scoring pipelines are running a machine learning model on a daily, weekly, or monthly basis and using the output in other reports.

To learn more about scoring pipelines, click here:  https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-batch-scoring-classification

In [23]:
# Load Azure Libraries
from azureml.core import Datastore
from azureml.core.dataset import Dataset
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.experiment import Experiment
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.automl.core.featurization import FeaturizationConfig
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.explain.model._internal.explanation_client import ExplanationClient

# Load Libraries for Deployment
from azureml.core.model import Model
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import AksWebservice
from azureml.pipeline.steps import PythonScriptStep
from azureml.data.data_reference import DataReference
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.pipeline.core import Pipeline, PipelineData, PublishedPipeline, PipelineEndpoint
from azureml.core.runconfig import RunConfiguration, CondaDependencies, DEFAULT_CPU_IMAGE
from azureml.widgets import RunDetails

# Load libraries for math and data manipulation
import os
import math
import json
import logging
import requests
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [24]:
# Retrieve your workspace by name by filling in the lower case values between double quotes
ws = Workspace.get(name="ancient-rivers-ml-workspace",
        subscription_id="47a7ec0c-37ad-428b-9114-b87ea1057632",
        resource_group="xeek-ancient-rivers")

In [25]:
# Retrieve your Datastore by name by filling in the lower case values between double quotes
datastore_name = "ancientrivers"
datastore = Datastore.get(ws, datastore_name)

In [26]:
# Retrieve your Datasets by name by filling in the lower case values between double quotes
dataset_name_test  = "ancient-rivers-test-transformed"
dataset_name_train = "ancient-rivers-train-transformed"

# Load Data in as Tabular Datasets
testing_data  = Dataset.get_by_name(ws, dataset_name_test, version='latest')
training_data = Dataset.get_by_name(ws, dataset_name_train, version='latest')

In [27]:
# Convert your tabular dataset to pandas data frames
testTransformedDF = testing_data.to_pandas_dataframe()
trainTransformedDF = training_data.to_pandas_dataframe()

In [28]:
# Retrieve your Compute Targets for Running AutoML
cpu_compute_target = ComputeTarget(ws, 'cpu-cluster-h')
# Retrieve a GPU cluster for Deep Learning Runs
gpu_compute_target = ComputeTarget(ws, 'gpu-cluster')

In [29]:
# Retrieve your AutoML Model
model = Model(ws, 'AutoML37a31f86512')
# Assign a variable to your model name
model_name = model.name

In [30]:
# Retrieve your AutoML generated environment
environment = Environment.get(ws, 'automl-environment')

In [31]:
# Retrieve your AutoML generated entry script
entry_script = 'inference/score.py'

### First, you must create your scoring script.
A scoring script is simply a script that loads your model, reads in data, scores the data with the model, and saves the results to an output location.<br>If you used AutoML, you can use your score.py as a starting point, but you will need to modify it to read data from and write data to a specified location.
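
The load, score, and save flow described above can be sketched with the standard library alone. This is a minimal illustration, not the AutoML script: `ThresholdModel`, `score_file`, the cutoff value, and the two sample rows are all made up for the example, and a real scoring script would load a registered model instead of the stand-in class.

```python
import csv
import os
import tempfile


class ThresholdModel:
    """Hypothetical stand-in for a trained model: flags rows whose GR exceeds a cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, rows):
        return [1 if float(row["GR"]) > self.cutoff else 0 for row in rows]


def score_file(model, input_path, output_path):
    """Read a CSV, score every row, and write the rows plus a Label column."""
    with open(input_path, newline="") as source:
        rows = list(csv.DictReader(source))

    # Score the data and join the predictions back onto the original rows
    labels = model.predict(rows)
    for row, label in zip(rows, labels):
        row["Label"] = label

    fieldnames = list(rows[0].keys()) if rows else ["Label"]
    with open(output_path, "w", newline="") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demonstration in a throwaway directory with two made-up rows
work_dir = tempfile.mkdtemp()
input_path = os.path.join(work_dir, "input.csv")
output_path = os.path.join(work_dir, "prediction.csv")

with open(input_path, "w", newline="") as handle:
    handle.write("RowID,GR\n0,99.0\n1,120.5\n")

rows_scored = score_file(ThresholdModel(cutoff=100.0), input_path, output_path)

with open(output_path, newline="") as handle:
    scored_rows = list(csv.DictReader(handle))
```

Keeping the read, score, and write steps in one function makes it easy to try the logic on a small local file before wiring it into a pipeline.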


In [32]:
# Look at your AutoML Scoring Script
with open(entry_script) as inference_file:
    print(inference_file.read())

# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
import json
import pickle
import numpy as np
import pandas as pd
import azureml.train.automl
from sklearn.externals import joblib
from azureml.core.model import Model

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


input_sample = pd.DataFrame(data=[{'RowID': 0, 'GR': 99.0, 'smoothedGR': 104.0, 'FirstRow': 'First', 'smoothedGR1': None, 'smoothedGR-1': 104.0, 'smoothedGR2': None, 'smoothedGR-2': 104.1, 'smoothedGR3': None, 'smoothedGR-3': 104.1, 'smoothedGR4': None, 'smoothedGR-4': 104.2, 'smoothedDifference': 0.0, 'smoothedDifference_SMA': -2.8, 'smoothedDifference_SMA_4': None, 'smoothedDifference_SMA_8': None,

### Write your Scoring Script
The code below is based on AutoML.  Follow these steps:

1.  Import all of the libraries used to run your score.py file
2.  Pass in Arguments.  These are values that will be passed in via your pipeline script.  Model, Input and Output location are common arguments.
3.  Retrieve and load your model by name and set it to a global variable.
4.  Write your main function.  This code will score the data with the model and write the results to a location on your datastore.  However, this code can do whatever you wish.  You can transform your data here, write out graphs, and compute a number of metrics.
5.  Run your main function.
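
Step 2 can be tried out locally before the script ever runs in a pipeline. The sketch below uses the same three flags as the scoring script in this notebook and passes an explicit argument list to `parse_args`, which mimics what the pipeline step hands to the script. The helper name `build_parser` and the example values are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Create a parser with the three arguments the scoring script expects."""
    parser = argparse.ArgumentParser(description="Batch scoring arguments")
    parser.add_argument("--model-name", dest="model_name", required=True)
    parser.add_argument("--scoring-directory", dest="scoring_directory", required=True)
    parser.add_argument("--input-data", dest="input_data", required=True)
    return parser


# Hypothetical values; in a pipeline run these come from the step's arguments list
example_argv = [
    "--model-name", "my-model",
    "--scoring-directory", "/tmp/scored",
    "--input-data", "/tmp/input/data.csv",
]

args = build_parser().parse_args(example_argv)
print(args.model_name, args.scoring_directory, args.input_data)
```

Because every argument is marked `required=True`, a missing flag stops the script immediately with a usage message instead of failing later during scoring.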

In [34]:
%%writefile inference/ancient-rivers-scoring-script.py
# Pull in Libraries
import json
import pickle
import argparse
import os
from fbprophet import Prophet
import numpy as np
import pandas as pd
import azureml.train.automl
import joblib
from azureml.core.model import Model
from azureml.core import Workspace, Datastore, Dataset, Run

# Pass in Arguments
parser = argparse.ArgumentParser()
parser.add_argument('--model-name', dest="model_name", required=True)
parser.add_argument('--scoring-directory', dest="scoring_directory", required=True)
parser.add_argument('--input-data', dest="input_data", required=True)
args = parser.parse_args() 

# Retrieve the registered model and load it once at module level,
# so that main() can use it as a global
model_path = Model.get_model_path(model_name = args.model_name)
model = joblib.load(model_path)

def main():
    #create output directories if they do not exist
    os.makedirs(args.scoring_directory, exist_ok=True)

    # Pull in an input dataset
    DataPath = args.input_data

    # Convert to Pandas Dataframe
    DataDF = pd.read_csv(DataPath)

    # Score Data
    scoredDataResults = pd.Series(model.predict(DataDF))

    # Join Results to original data
    scoredData = DataDF
    scoredData['Label'] = scoredDataResults

    #Save Results
    scoredFileName = "prediction"
    scoredPath = os.path.join(args.scoring_directory, scoredFileName)
    scoredData.to_csv(scoredPath, index = False)
    
if __name__ == '__main__':
    main()

Overwriting inference/ancient-rivers-scoring-script.py


### Next, write an environment that supports the scoring script.  
Examine the packages that you used in the scoring script and run `pip freeze` to list their installed versions.<br>
Also examine the environment file that was originally used to create the model to identify its dependencies.

Learn more about environments here: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments
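
One way to examine the packages a scoring script uses is to parse its imports and look up the installed versions. The sketch below is an assumption-laden illustration: the helper names are made up, the sample source is a shortened stand-in for a real script, and `importlib.metadata` requires Python 3.8 or later, which is newer than the Python 3.6 environment shown in this notebook. Note also that import names do not always match distribution names (for example, `sklearn` is installed as `scikit-learn`), so a `None` result only means no distribution with that exact name was found.

```python
import ast
from importlib import metadata


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def installed_version(distribution):
    """Look up the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Shortened, hypothetical stand-in for the contents of a scoring script
script_source = """
import os
import numpy as np
import pandas as pd
from azureml.core.model import Model
"""

for module in sorted(imported_modules(script_source)):
    print(module, installed_version(module))
```

The printed list is a starting point for the conda and pip package lists; standard-library modules such as `os` show up with no version and can be ignored.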

In [35]:
# Examine your environment file
environment

{
    "name": "automl-environment",
    "version": "1",
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "python": {
        "userManagedDependencies": false,
        "interpreterPath": "python",
        "condaDependenciesFile": null,
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "conda-forge"
            ],
            "dependencies": [
                "python=3.6.2",
                {
                    "pip": [
                        "azureml-train-automl-runtime==1.0.85.5",
                        "inference-schema",
                        "azureml-explain-model==1.0.85",
                        "azureml-defaults==1.0.85.1"
                    ]
                },
                "numpy>=1.16.0,<=1.16.2",
                "pandas>=0.21.0,<=0.23.4",
                "scikit-learn>=0.19.0,<=0.20.3",
                "py-xgboost<=0.80",
                "fbprophet==0.5",
                

In [37]:
conda list

# packages in environment at /anaconda/envs/azureml_py36:
#
# Name                    Version                   Build  Channel
_anaconda_depends         2019.03                  py36_0  
_libgcc_mutex             0.1                        main  
absl-py                   0.9.0                     <pip>
adal                      1.2.2                     <pip>
alabaster                 0.7.12                   py36_0  
alembic                   1.3.2                     <pip>
anaconda                  custom                   py36_1  
anaconda-client           1.7.2                    py36_0  
anaconda-project          0.8.3                      py_0  
ansiwrap                  0.8.4                     <pip>
applicationinsights       0.11.9                    <pip>
asn1crypto                1.0.1                    py36_0  
astor                     0.8.1                     <pip>
astroid                   2.3.1                    py36_0  
astropy                   3.


Note: you may need to restart the kernel to use updated packages.


In [44]:
# Configure your environment by setting up a Run Configuration

# First, set your libraries here based on your scoring script and your environment file
# Pay attention to the dependencies in your environment file and the packages you use in your script
# If a library is in the conda list, set it as a conda package, otherwise, set it as a pip package
# Always use conda over pip when both are available, as conda takes care of underlying dependencies for you
cd = CondaDependencies.create(
    conda_packages=["py-xgboost==0.80", "numpy==1.16.2", "pandas==0.23.4", "psutil==5.6.3",
                    "fbprophet==0.5", "scikit-learn==0.20.3", "joblib==0.14.1"],
    pip_packages=["azureml-train-automl-runtime==1.0.85.5", "inference-schema",
                  "azureml-explain-model==1.0.85", "azureml-defaults==1.0.85.1"])

# Create a Run Configuration within a Docker Container and your environment settings by using the code below
amlcompute_run_config = RunConfiguration(conda_dependencies=cd)
amlcompute_run_config.environment.docker.enabled = True
amlcompute_run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE # Use DEFAULT_GPU_IMAGE for Deep Learning Jobs
amlcompute_run_config.environment.python.user_managed_dependencies = False # Set to False in Most Cases
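
The rule of thumb in the comments above (conda package if the library appears in the conda list, pip package otherwise) can be sketched as a small helper that reads `conda list` text output. The function names are hypothetical, and treating rows whose Build column is `<pip>` as pip packages is an assumption of this sketch. The two sample rows are taken from the `conda list` output shown earlier.

```python
def parse_conda_list(text):
    """Map package name -> (version, build) from `conda list` text output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        parts = line.split()
        if len(parts) >= 3:
            packages[parts[0]] = (parts[1], parts[2])
    return packages


def split_dependencies(wanted, packages):
    """Pin each wanted package and sort it into a conda list or a pip list."""
    conda_packages, pip_packages = [], []
    for name in wanted:
        if name not in packages:
            pip_packages.append(name)  # not in the conda list: leave unpinned for pip
            continue
        version, build = packages[name]
        pinned = f"{name}=={version}"
        if build == "<pip>":
            pip_packages.append(pinned)  # assumption: <pip> rows were installed by pip
        else:
            conda_packages.append(pinned)
    return conda_packages, pip_packages


conda_list_output = """
# Name                    Version                   Build  Channel
alabaster                 0.7.12                   py36_0
adal                      1.2.2                     <pip>
"""

wanted = ["alabaster", "adal", "inference-schema"]
conda_packages, pip_packages = split_dependencies(wanted, parse_conda_list(conda_list_output))
print(conda_packages, pip_packages)
```

The two resulting lists have the same shape as the `conda_packages` and `pip_packages` arguments used in the cell above, so they can be reviewed by hand and then passed along.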

### Specify Output Directories and Input Data
These are locations in your datastore that the pipeline reads from and writes to.  Each location is represented by a Data Reference object.

Learn more here:  https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.data_reference.datareference?view=azure-ml-py

In [45]:
# Create an input directory or file path

input_data = DataReference(
    datastore = datastore,
    data_reference_name="ancient_rivers_landing_zone",
    path_on_datastore="test/landing/input"
)

# Create an output directory or file path

scoring_directory = DataReference(
    datastore = datastore,
    data_reference_name="ancient_rivers_scored_zone",
    path_on_datastore="test/scored"
)

### Combine everything to create your own Pipeline Configuration using a Custom Python Script
A pipeline step built from a Python script requires a name, the scoring script's file name, the local directory that holds the scoring script, the arguments to pass into the script (such as your model name, input directory, and output directory), a compute target, a list of inputs, and a run configuration that specifies your environment.<br>In this case, your inputs are your input and output directories.<br><br>
The `outputs` setting is only needed to pass data to a later step in a multi-step pipeline, so it is left unset for this step.<br>

For more about Python Script Step configuration settings, click the link below:<br>
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py

In [46]:
batch_score_step = PythonScriptStep(
    name="ancient-rivers-scoring-step",
    source_directory = "inference",
    script_name = "ancient-rivers-scoring-script.py",
    arguments= ["--model-name", model_name,
                "--scoring-directory", scoring_directory,
                "--input-data", input_data
               ],
    compute_target=cpu_compute_target,
    inputs=[input_data, scoring_directory],
    #outputs=[output_dir],  # Only necessary if there's another step in the pipeline
    runconfig=amlcompute_run_config
)

### Run your Pipeline

In [47]:
# Create your pipeline
pipeline = Pipeline(workspace=ws, steps=[batch_score_step])
pipeline_run = Experiment(ws, 'ancient-rivers-scoring-pipeline').submit(pipeline,pipeline_parameters={}, show_output=True)

Created step ancient-rivers-scoring-step [5c117d8c][c21a1451-7155-4256-b1ee-bccb12856800], (This step is eligible to reuse a previous run's output)
Using data reference ancient_rivers_landing_zone for StepId [993106b5][2d9c60f6-6a01-4d6f-baa6-2977460c64e5], (Consumers of this data are eligible to reuse prior runs.)
Using data reference ancient_rivers_scored_zone for StepId [9886102c][7ad25d16-ef97-40e4-85a9-753e6923a4c7], (Consumers of this data are eligible to reuse prior runs.)
Submitted PipelineRun 9c415af0-45fc-4c1a-9f7c-f60c717be6a3
Link to Azure Machine Learning studio: https://ml.azure.com/experiments/ancient-rivers-scoring-pipeline/runs/9c415af0-45fc-4c1a-9f7c-f60c717be6a3?wsid=/subscriptions/47a7ec0c-37ad-428b-9114-b87ea1057632/resourcegroups/xeek-ancient-rivers/workspaces/ancient-rivers-ml-workspace


In [48]:
# GUI to see your Pipeline Run
RunDetails(pipeline_run).show() 


In [49]:
# Console logs for your Pipeline
pipeline_run.wait_for_completion(show_output=True)

PipelineRunId: 9c415af0-45fc-4c1a-9f7c-f60c717be6a3
Link to Portal: https://ml.azure.com/experiments/ancient-rivers-scoring-pipeline/runs/9c415af0-45fc-4c1a-9f7c-f60c717be6a3?wsid=/subscriptions/47a7ec0c-37ad-428b-9114-b87ea1057632/resourcegroups/xeek-ancient-rivers/workspaces/ancient-rivers-ml-workspace
PipelineRun Status: Running


StepRunId: 61a4cb8e-59c4-4fe1-8b55-b770560e3b7f
Link to Portal: https://ml.azure.com/experiments/ancient-rivers-scoring-pipeline/runs/61a4cb8e-59c4-4fe1-8b55-b770560e3b7f?wsid=/subscriptions/47a7ec0c-37ad-428b-9114-b87ea1057632/resourcegroups/xeek-ancient-rivers/workspaces/ancient-rivers-ml-workspace
StepRun( ancient-rivers-scoring-step ) Status: NotStarted
StepRun( ancient-rivers-scoring-step ) Status: Running

Streaming azureml-logs/20_image_build_log.txt
2020/03/01 23:29:24 Downloading source code...
2020/03/01 23:29:25 Finished downloading source code
2020/03/01 23:29:26 Creating Docker network: acb_default_network, driver: 'bridge'
2020/03/01 23:29:26


libgomp-9.2.0        | 816 KB    |            |   0%
libgomp-9.2.0        | 816 KB    | #########5 |  95%
libgomp-9.2.0        | 816 KB    | ########## | 100%

libcblas-3.8.0       | 10 KB     |            |   0%
libcblas-3.8.0       | 10 KB     | ########## | 100%

freetype-2.10.0      | 884 KB    |            |   0%
freetype-2.10.0      | 884 KB    | ########5  |  86%
freetype-2.10.0      | 884 KB    | ########## | 100%

binutils_impl_linux- | 9.1 MB    |            |   0%
binutils_impl_linux- | 9.1 MB    | #####5     |  55%
binutils_impl_linux- | 9.1 MB    | #######6   |  76%
binutils_impl_linux- | 9.1 MB    | #########3 |  93%
binutils_impl_linux- | 9.1 MB    | ########## | 100%

ncurses-6.0          | 920 KB    |            |   0%
ncurses-6.0          | 920 KB    | #######9   |  79%
ncurses-6.0          | 920 KB    | ########9  |  9

Verifying transaction: ...working... done
Executing transaction: ...working... 
done
Collecting azureml-train-automl-runtime==1.0.85.5
  Downloading azureml_train_automl_runtime-1.0.85.5-py3-none-any.whl (77 kB)
Collecting inference-schema
  Downloading inference_schema-1.0.1-py3-none-any.whl (18 kB)
Collecting azureml-explain-model==1.0.85
  Downloading azureml_explain_model-1.0.85-py3-none-any.whl (22 kB)
Collecting azureml-defaults==1.0.85.1
  Downloading azureml_defaults-1.0.85.1-py2.py3-none-any.whl (3.0 kB)
Collecting sklearn-pandas<=1.7.0,>=1.4.0
  Downloading sklearn_pandas-1.7.0-py2.py3-none-any.whl (10 kB)
Collecting azureml-automl-runtime==1.0.85.*
  Downloading azureml_automl_runtime-1.0.85.5-py3-none-any.whl (1.9 MB)
Collecting resource>=0.1.8
  Downloading Resource-0.2.1-py2.py3-none-any.whl (25 kB)
Collecting azureml-train-automl-client==1.0.85.*
  Downloading azureml_train_automl_client-1.0.85.4-py3-none-any.whl (69 kB)
Collecting azureml-core==1.0.85.*
  Downloading az

Collecting fusepy>=3.0.1; extra == "fuse"
  Downloading fusepy-3.0.1.tar.gz (11 kB)
Collecting pyarrow==0.15.*; extra == "pandas"
  Downloading pyarrow-0.15.1-cp36-cp36m-manylinux2010_x86_64.whl (59.2 MB)
Collecting smart-open>=1.8.1
  Downloading smart_open-1.9.0.tar.gz (70 kB)
Collecting keras2onnx
  Downloading keras2onnx-1.6.0-py3-none-any.whl (219 kB)
Collecting interpret-community==0.4.*
  Downloading interpret_community-0.4.1-py3-none-any.whl (23.4 MB)
Collecting click>=5.1
  Downloading Click-7.0-py2.py3-none-any.whl (81 kB)
Collecting Jinja2>=2.10
  Downloading Jinja2-2.11.1-py2.py3-none-any.whl (126 kB)
Collecting itsdangerous>=0.24
  Downloading itsdangerous-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting liac-arff>=2.1.1
  Downloading liac-arff-2.4.0.tar.gz (15 kB)
Collecting jsonschema
  Downloading jsonschema-3.2.0-py2.py3-none-any.whl (56 kB)
Collecting PyYAML
  Downloading PyYAML-5.3.tar.gz (268 kB)
Collecting requests-oauthlib>=0.5.0
  Downloading requests_oauthlib-1.3.0

  Created wheel for termcolor: filename=termcolor-1.1.0-py3-none-any.whl size=4830 sha256=ccbf52bacb4f9a54180bc095f04b27ef4a43d3fe5157cb7340fff0f5e4156ce1
  Stored in directory: /root/.cache/pip/wheels/93/2a/eb/e58dbcbc963549ee4f065ff80a59f274cc7210b6eab962acdc
Successfully built dill wrapt json-logging-py py-cpuinfo JsonSir JsonForm fusepy smart-open liac-arff PyYAML fire shap pyrsistent pycparser termcolor
Installing collected packages: scipy, sklearn-pandas, patsy, statsmodels, pmdarima, JsonSir, attrs, zipp, importlib-metadata, pyrsistent, jsonschema, JsonForm, PyYAML, python-easyconfig, resource, protobuf, typing-extensions, onnx, onnxconverter-common, skl2onnx, cloudpickle, distro, dotnetcore2, azureml-dataprep-native, fusepy, pyarrow, azureml-dataprep, boto, idna, chardet, urllib3, requests, docutils, jmespath, botocore, s3transfer, boto3, smart-open, gensim, lightgbm, termcolor, fire, keras2onnx, onnxmltools, dill, azureml-automl-core, py-cpuinfo, wheel, nimbusml, azureml-autom


Streaming azureml-logs/55_azureml-execution-tvmps_2a4a06821f6562f35440589203c0af982a1395217a9f15d07b9ef55f716c56e7_d.txt
2020-03-01T23:45:30Z Starting output-watcher...
2020-03-01T23:45:30Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_2c6e63ed5158ba5dbcb9b13d72e8814a
a1298f4ce990: Pulling fs layer
04a3282d9c4b: Pulling fs layer
9b0d3db6dc03: Pulling fs layer
8269c605f3f1: Pulling fs layer
6504d449e70c: Pulling fs layer
4e38f320d0d4: Pulling fs layer
b0a763e8ee03: Pulling fs layer
11917a028ca4: Pulling fs layer
a6c378d11cbf: Pulling fs layer
6cc007ad9140: Pulling fs layer
6c1698a608f3: Pulling fs layer
1460a62ff947: Pulling fs layer
a9c966eafa61: Pulling fs layer
0b4c4154ff8b: Pulling fs layer
333603dc44f9: Pulling fs layer
e8f479870769: Pulling fs layer
bd12bd79bfb9: Pulling fs layer
6cc007ad9140: Waiting
1460a62ff947: Waiting
a9c966eafa61: Waiting
6c1698a608f3: Waiting
8269c605f3f1: Waitin



PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '9c415af0-45fc-4c1a-9f7c-f60c717be6a3', 'status': 'Completed', 'startTimeUtc': '2020-03-01T23:28:55.791528Z', 'endTimeUtc': '2020-03-01T23:47:52.02492Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}'}, 'inputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://ancientriversm8752540079.blob.core.windows.net/azureml/ExperimentRun/dcid.9c415af0-45fc-4c1a-9f7c-f60c717be6a3/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=2Sm2AhOE5Bh29Ya05aa2ExIQfoyeAvpP4CEpqZPaqEQ%3D&st=2020-03-01T23%3A37%3A56Z&se=2020-03-02T07%3A47%3A56Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://ancientriversm8752540079.blob.core.windows.net/azureml/ExperimentRun/dcid.9c415af0-45fc-4c1a-9f7c-f60c717be6a3/logs/azureml/stderrlogs.txt?sv=2019-02-02&sr=b&sig=%2Ft6IhAcQfgfGN2ZOIUvTSD7NSQx0MSwyVvR84dnXORM%3D&st=2020-03-01T23%3A37%3A56Z&se=2020-03-02T07%3

'Finished'

### Publish your pipeline
Once your pipeline has run successfully, publish it for later reuse.  Publishing a pipeline creates a REST endpoint that you can call from other tools, such as Azure Data Factory.

In [50]:
# Match the name to your pipeline experiment
published_pipeline = pipeline_run.publish_pipeline(
    name="ancient-rivers-scoring-pipeline",
    description="Ancient Rivers Scoring Batch Execution Pipeline for ADF Use",
    version="1.0")

published_pipeline

| Name | Id | Status | Endpoint |
| --- | --- | --- | --- |
| ancient-rivers-scoring-pipeline | 2f6a87b0-bfcc-4f22-8190-8264589ab32b | Active | REST Endpoint |
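
A published pipeline is started by sending an authenticated POST request to its REST endpoint. The sketch below only builds the request and does not send it. It assumes the JSON body uses the `ExperimentName` and `ParameterAssignments` keys described in the Azure Machine Learning documentation; the helper name, the endpoint URL, and the token are placeholders. In a real notebook, the URL would typically come from `published_pipeline.endpoint` and the token from an Azure Active Directory authentication header.

```python
import json
import urllib.request


def build_trigger_request(rest_endpoint, bearer_token, experiment_name, parameters=None):
    """Build (but do not send) the POST request that starts a published pipeline."""
    body = {"ExperimentName": experiment_name}
    if parameters:
        body["ParameterAssignments"] = parameters
    return urllib.request.Request(
        rest_endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_trigger_request(
    rest_endpoint="https://example.invalid/pipelines/placeholder-id",  # placeholder URL
    bearer_token="placeholder-token",  # placeholder token
    experiment_name="ancient-rivers-scoring-pipeline",
)
print(request.get_method(), json.loads(request.data))
```

Separating request construction from sending makes it possible to inspect the payload first; passing the request to `urllib.request.urlopen` would submit it once real values are in place.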
