# Set Up a Scoring Pipeline
Pipelines are reusable objects that transform, move, and score data.  Scoring pipelines are used for batch execution jobs.<br>
Use a batch scoring job when you do not need a prediction immediately and some time lag is acceptable.<br>
Scoring pipelines are inherently slower than an AKS deployment because your compute cluster takes time to spin up.  However, they are much cheaper because the cluster does not need to stay running between jobs.<br>
Typical scenarios for scoring pipelines are when you run a machine learning model on a daily, weekly, or monthly basis and use the output in other reports.

To learn more about scoring pipelines, click here:  https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-batch-scoring-classification

In [4]:
# Load Azure Libraries
from azureml.core import Datastore
from azureml.core.dataset import Dataset
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.experiment import Experiment
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.automl.core.featurization import FeaturizationConfig
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.explain.model._internal.explanation_client import ExplanationClient

#Load Libraries for Deployment
from azureml.core.model import Model
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import AksWebservice
from azureml.pipeline.steps import PythonScriptStep
from azureml.data.data_reference import DataReference
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.pipeline.core import Pipeline, PipelineData, PublishedPipeline, PipelineEndpoint
from azureml.core.runconfig import RunConfiguration, DEFAULT_CPU_IMAGE

# Load libraries for math and data manipulation
import os
import math
import json
import logging
import requests
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Retrieve your workspace by name by filling in the lower case values between double quotes
ws = Workspace.get(name="<my-workspace>",
        subscription_id="<my-azure-subscription-id>",
        resource_group="<my-resource-group>")

In [None]:
# Retrieve your Datastore by name by filling in the lower case values between double quotes
datastore_name = "<my-datastore-name>"
datastore = Datastore.get(ws, datastore_name)

In [None]:
# Retrieve your refined datasets by name by filling in the lowercase values between double quotes
dataset_name_test = "<my-transformed-test-dataset-name>"
dataset_name_train = "<my-transformed-train-dataset-name>"

# Load Data in as Tabular Datasets
testing_data  = Dataset.get_by_name(ws, dataset_name_test, version='latest')
training_data = Dataset.get_by_name(ws, dataset_name_train, version='latest')

In [None]:
# Convert your tabular dataset to pandas data frames
testTransformedDF = testing_data.to_pandas_dataframe()
trainTransformedDF = training_data.to_pandas_dataframe()

In [None]:
# Retrieve your Compute Target for most Machine Learning Models
cpu_compute_target = ComputeTarget(ws, '<my-cpu-cluster>')
# Retrieve a GPU cluster if your model involves Deep Learning
gpu_compute_target = ComputeTarget(ws, '<my-gpu-cluster>')

In [None]:
# Retrieve your Model
model = Model(ws, '<my-model-name>')
# Assign a variable to your model name
model_name = model.name

In [7]:
# Retrieve your AutoML generated environment
environment = Environment.get(ws, 'automl-environment')

In [2]:
# Retrieve your AutoML generated entry script
entry_script = 'inference/score.py'

### First, you must create your scoring script.
A scoring script is simply a script that loads your model, reads in data, scores the data with the model, and saves the results to an output location.<br>If you used AutoML, you can use your score.py as a starting point, but you will need to modify it to read and write data at a specified location.


In [3]:
# Look at your AutoML Scoring Script
with open(entry_script) as inference_file:
    print(inference_file.read())

# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
import json
import pickle
import numpy as np
import pandas as pd
import azureml.train.automl
from sklearn.externals import joblib
from azureml.core.model import Model

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


input_sample = pd.DataFrame(data=[{'RowID': 0, 'GR': 99.0, 'smoothedGR': 104.0, 'FirstRow': 'First', 'smoothedGR1': None, 'smoothedGR-1': 104.0, 'smoothedGR2': None, 'smoothedGR-2': 104.1, 'smoothedGR3': None, 'smoothedGR-3': 104.1, 'smoothedGR4': None, 'smoothedGR-4': 104.2, 'smoothedDifference': 0.0, 'smoothedDifference_SMA': -2.8, 'smoothedDifference_SMA_4': None, 'smoothedDifference_SMA_8': None,

### Write your Scoring Script
The code below is based on AutoML.  Follow these steps:

1.  Import all of the libraries used to run your score.py file
2.  Pass in Arguments.  These are values that will be passed in via your pipeline script.  Model, Input and Output location are common arguments.
3.  Retrieve and load your model by name and set it to a global variable.
4.  Write your main function.  This code will score the data with the model and write the results to a location on your datastore.  However, this code can do whatever you wish.  You can transform your data here, write out graphs, and compute a number of metrics.
5.  Run your main function.
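
Step 2 can be tried locally before the script ever runs in a pipeline. The sketch below reuses the three argument names from the scoring script in this notebook and parses a hand-written argument list. The helper name `parse_scoring_args` and the example values are hypothetical stand-ins for what the pipeline step would supply at run time.

```python
import argparse


def parse_scoring_args(argv):
    """Parse the three arguments that the pipeline step passes to the scoring script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", dest="model_name", required=True)
    parser.add_argument("--scoring-directory", dest="scoring_directory", required=True)
    parser.add_argument("--input-data", dest="input_data", required=True)
    return parser.parse_args(argv)


# Hypothetical values; in a pipeline run these come from the step's `arguments` list
example_argv = [
    "--model-name", "example-model",
    "--scoring-directory", "/tmp/scored",
    "--input-data", "/tmp/input/data.csv",
]

args = parse_scoring_args(example_argv)
print(args.model_name, args.scoring_directory, args.input_data)
```

Passing an explicit list to `parse_args` instead of relying on `sys.argv` makes the parsing easy to check in a notebook cell.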


In [None]:
%%writefile inference/score.py
# Pull in Libraries
import json
import pickle
import argparse
import os
from fbprophet import Prophet
import numpy as np
import pandas as pd
import azureml.train.automl
import joblib
from azureml.core.model import Model
from azureml.core import Workspace, Datastore, Dataset, Run

# Pass in Arguments
parser = argparse.ArgumentParser()
parser.add_argument('--model-name', dest="model_name", required=True)
parser.add_argument('--scoring-directory', dest="scoring_directory", required=True)
parser.add_argument('--input-data', dest="input_data", required=True)
args = parser.parse_args() 

# Retrieve the model and load it at module level so that main() can use it
model_path = Model.get_model_path(model_name = args.model_name)
model = joblib.load(model_path)

def main():
    #create output directories if they do not exist
    os.makedirs(args.scoring_directory, exist_ok=True)

    # Pull in an input dataset
    DataPath = args.input_data

    # Convert to Pandas Dataframe
    DataDF = pd.read_csv(DataPath)

    # Score Data
    scoredDataResults = pd.Series(model.predict(DataDF))

    # Join Results to a copy of the original data so the input frame is left unchanged
    scoredData = DataDF.copy()
    scoredData['<my-prediction-column>'] = scoredDataResults

    #Save Results
    scoredFileName = "<my-scored-output-file-name>"
    scoredPath = os.path.join(args.scoring_directory, scoredFileName)
    scoredData.to_csv(scoredPath, index = False)
    
if __name__ == '__main__':
    main()
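
Before submitting the script to a cluster, it can help to rehearse the read, score, append, and write flow on a small local file. The sketch below is a standard-library approximation of that flow, not the AutoML script itself. `StubModel`, `score_file`, the column names, and the file names are all hypothetical, and the stub only stands in for a model object that exposes `predict`.

```python
import csv
import os
import tempfile


class StubModel:
    """Hypothetical stand-in for the registered model; returns the column count of each row."""

    def predict(self, rows):
        return [float(len(row)) for row in rows]


def score_file(model, input_path, scoring_directory,
               output_name="scored.csv", prediction_column="prediction"):
    """Read a CSV, score each row, append the prediction column, and write the result."""
    # Create the output directory if it does not exist
    os.makedirs(scoring_directory, exist_ok=True)

    with open(input_path, newline="") as f:
        rows = list(csv.DictReader(f))

    predictions = model.predict(rows)
    scored_rows = [dict(row, **{prediction_column: p}) for row, p in zip(rows, predictions)]

    scored_path = os.path.join(scoring_directory, output_name)
    fieldnames = (list(rows[0].keys()) if rows else []) + [prediction_column]
    with open(scored_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(scored_rows)
    return scored_path


# Build a tiny input file in a temporary directory and score it
work_dir = tempfile.mkdtemp()
input_path = os.path.join(work_dir, "input.csv")
with open(input_path, "w", newline="") as f:
    f.write("RowID,GR\n0,99.0\n1,101.5\n")

scored_path = score_file(StubModel(), input_path, os.path.join(work_dir, "out"))
with open(scored_path, newline="") as f:
    scored = list(csv.DictReader(f))
print(scored)
```

Swapping the stub for a real model and the `csv` calls for pandas gives the same shape as the `main()` function in the scoring script.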

### Next, write an environment that supports the scoring script.  
Examine the packages that you used in the scoring script and run `pip freeze` to list their installed versions.<br>
Also examine the environment file that was originally used to create the model to find its dependencies.

Learn more about environments here: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments
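
As an alternative to reading through the full `pip freeze` output, the versions of just the packages your scoring script imports can be looked up programmatically. This is a minimal sketch that assumes Python 3.8 or later, where `importlib.metadata` is in the standard library (the `azureml_py36` kernel shown in this notebook predates it). The helper names and the package list are illustrative.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_versions(package_names):
    """Return {package: version}, with None for packages that are not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_pins(versions):
    """Format the installed packages as 'name==version' strings."""
    return [f"{name}=={version}" for name, version in versions.items() if version is not None]


# Illustrative list; replace it with the packages your scoring script imports
packages = ["numpy", "pandas", "joblib", "definitely-not-installed-example"]
versions = installed_versions(packages)
print(as_pins(versions))
```

The resulting strings have the same `name==version` shape as the placeholders used in the run configuration cell.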

In [10]:
# Examine your environment file
environment

{
    "name": "automl-environment",
    "version": "1",
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "python": {
        "userManagedDependencies": false,
        "interpreterPath": "python",
        "condaDependenciesFile": null,
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "conda-forge"
            ],
            "dependencies": [
                "python=3.6.2",
                {
                    "pip": [
                        "azureml-train-automl-runtime==1.0.85.5",
                        "inference-schema",
                        "azureml-explain-model==1.0.85",
                        "azureml-defaults==1.0.85.1"
                    ]
                },
                "numpy>=1.16.0,<=1.16.2",
                "pandas>=0.21.0,<=0.23.4",
                "scikit-learn>=0.19.0,<=0.20.3",
                "py-xgboost<=0.80",
                "fbprophet==0.5",
                

In [11]:
conda list

# packages in environment at /anaconda/envs/azureml_py36:
#
# Name                    Version                   Build  Channel
_anaconda_depends         2019.03                  py36_0  
_libgcc_mutex             0.1                        main  
absl-py                   0.9.0                     <pip>
adal                      1.2.2                     <pip>
alabaster                 0.7.12                   py36_0  
alembic                   1.3.2                     <pip>
anaconda                  custom                   py36_1  
anaconda-client           1.7.2                    py36_0  
anaconda-project          0.8.3                      py_0  
ansiwrap                  0.8.4                     <pip>
applicationinsights       0.11.9                    <pip>
asn1crypto                1.0.1                    py36_0  
astor                     0.8.1                     <pip>
astroid                   2.3.1                    py36_0  
astropy                   3.


Note: you may need to restart the kernel to use updated packages.


In [None]:
# Configure your environment by setting up a Run Configuration

# First, set your libraries here based on your scoring script and your environment file
# Pay attention to the dependencies in your environment file and the packages you use in your script
# If a library is in the conda list, set it as a conda package; otherwise, set it as a pip package
# Prefer conda over pip when both are available, as conda takes care of underlying dependencies for you
cd = CondaDependencies.create(conda_packages =["<my-conda-package-1==version>","<my-conda-package-2==version>"],
                                pip_packages=["<my-pip-package-1==version>","<my-pip-package-2==version>"])

# Create a Run Configuration within a Docker Container and your environment settings by using the code below
amlcompute_run_config = RunConfiguration(conda_dependencies=cd)
amlcompute_run_config.environment.docker.enabled = True
amlcompute_run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE # For Deep Learning Jobs, import and use DEFAULT_GPU_IMAGE from azureml.core.runconfig
amlcompute_run_config.environment.python.user_managed_dependencies = False # Set to False in Most Cases
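
Sorting packages into conda and pip lists by hand is tedious for a long `conda list`. The sketch below parses text in the format shown in the `conda list` output above and treats entries whose build column reads `<pip>` as pip packages. That marker is an assumption based on the output in this notebook; other conda versions may label pip installs differently. The function name `split_conda_list` is hypothetical.

```python
def split_conda_list(conda_list_text):
    """Split `conda list` text into (conda_pins, pip_pins) as 'name==version' strings."""
    conda_packages, pip_packages = [], []
    for line in conda_list_text.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header rows
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # incomplete row, e.g. truncated output
        name, version, build = fields[0], fields[1], fields[2]
        pin = f"{name}=={version}"
        if build == "<pip>":
            pip_packages.append(pin)
        else:
            conda_packages.append(pin)
    return conda_packages, pip_packages


# A few rows copied from the conda list output shown earlier in this notebook
sample = """\
# packages in environment at /anaconda/envs/azureml_py36:
#
# Name                    Version                   Build  Channel
absl-py                   0.9.0                     <pip>
adal                      1.2.2                     <pip>
alabaster                 0.7.12                   py36_0
"""

conda_pins, pip_pins = split_conda_list(sample)
print("conda:", conda_pins)
print("pip:", pip_pins)
```

Only the packages your scoring script actually needs should be carried over into `CondaDependencies.create`; the full list of a notebook kernel is usually far larger than the scoring environment requires.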

### Specify Output Directories and Input Data
These are simply locations in your datastore that you read data from and write data to.  Each location is represented by a Data Reference object.

Learn more here:  https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.data_reference.datareference?view=azure-ml-py

In [None]:
# Create an input directory or file path

input_data = DataReference(
    datastore = datastore,
    data_reference_name="<my-data-reference-input-name>",
    path_on_datastore="<my-input-file-path>"
)

# Create an output directory or file path

scoring_directory = DataReference(
    datastore = datastore,
    data_reference_name="<my-data-reference-output-name>",
    path_on_datastore="<my-output-file-path>"
)

### Combine everything to create your own Pipeline Configuration using a Custom Python Script
A pipeline step built from a Python script requires a step name, the scoring script name, the local directory that holds the scoring script, the arguments to pass into the scoring script (like your model name, input directory, and output directory), a compute target, a list of inputs, and a run configuration specifying your environment.<br>  In this case, your inputs are your input and output directories.<br><br>
The outputs setting is only needed to pass data on to a later step in a multi-step pipeline, so it is left out for this step.<br>

For more about Python Script Step configuration settings, click the link below:<br>
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py

In [None]:
batch_score_step = PythonScriptStep(
    name="<my-pipeline-step-name>",
    source_directory = "inference",
    script_name = "<my-scoring-script>",
    arguments= ["--model-name", model_name,
                "--scoring-directory", scoring_directory,
                "--input-data", input_data
               ],
    compute_target=cpu_compute_target,
    inputs=[input_data, scoring_directory],
    #outputs=[output_dir],  # Only necessary if there's another step in the pipeline
    runconfig=amlcompute_run_config
)

### Run your Pipeline

In [None]:
# Create your pipeline
pipeline = Pipeline(workspace=ws, steps=[batch_score_step])
pipeline_run = Experiment(ws, '<my-pipeline-name>').submit(pipeline,pipeline_parameters={}, show_output=True)

In [None]:
# GUI to see your Pipeline Run
from azureml.widgets import RunDetails
RunDetails(pipeline_run).show() 

In [None]:
# Console logs for your Pipeline
pipeline_run.wait_for_completion(show_output=True)

### Publish your pipeline
Once your pipeline has run successfully, publish it for later reuse.  When you publish a pipeline, you create a REST endpoint that you can call from Azure Data Factory.

In [None]:
# Match the name to your pipeline experiment
published_pipeline = pipeline_run.publish_pipeline(
    name="<my-pipeline-name>", description="<my-pipeline-description>", version="1.0")

published_pipeline
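
The REST endpoint can also be called from plain Python. The sketch below only builds the POST request and does not send it. In this notebook the endpoint URL would come from `published_pipeline.endpoint` and the header from `InteractiveLoginAuthentication().get_authentication_header()`; the URL and token used here are placeholders, the helper name is hypothetical, and the `ExperimentName` body key follows the Azure ML documentation for triggering published pipelines, so verify it against your SDK version.

```python
import json
import urllib.request


def build_trigger_request(rest_endpoint, auth_header, experiment_name):
    """Build (but do not send) a POST request that would trigger a published pipeline."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    headers = dict(auth_header)
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(rest_endpoint, data=body, headers=headers, method="POST")


# Placeholder values; substitute published_pipeline.endpoint and a real authentication header
request = build_trigger_request(
    "https://example.invalid/pipelines/placeholder-id",
    {"Authorization": "Bearer <token>"},
    "<my-pipeline-name>",
)
print(request.get_method(), request.full_url)

# To submit the run for real, pass the request to urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it possible to inspect the payload before anything is submitted.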