# Flight Delay Demo - MLOps

## Configure Datasheets

Define helper functions to enable model data sheets.

In [None]:
from markdown import markdown

def get_tag(tags, tagname):
    # Return the tag value, or an empty string when the tag is missing
    if tagname not in tags:
        print('Missing tag ' + tagname)
        return ''
    return tags[tagname]

def get_datasheet(tags):
    # Collect every datasheet field from the model tags
    title = get_tag(tags, 'title')
    description = get_tag(tags, 'datasheet_description')
    details = get_tag(tags, 'details')
    date = get_tag(tags, 'date')
    modeltype = get_tag(tags, 'type')
    version = get_tag(tags, 'version')
    helpresources = get_tag(tags, 'help')
    usecase_primary = get_tag(tags, 'usecase_primary')
    usecase_secondary = get_tag(tags, 'usecase_secondary')
    usecase_outofscope = get_tag(tags, 'usecase_outofscope')
    dataset_description = get_tag(tags, 'dataset_description')
    motivation = get_tag(tags, 'motivation')
    caveats = get_tag(tags, 'caveats')

    datasheet = ''
    datasheet+=markdown(f'# {title} \n {description} \n')
    datasheet+=markdown(f'## Model Details \n {details} \n')
    datasheet+=markdown(f'### Model date \n {date} \n')
    datasheet+=markdown(f'### Model type \n {modeltype} \n')
    datasheet+=markdown(f'### Model version \n {version} \n')
    datasheet+=markdown(f'### Where to send questions or comments about the model \n Please send questions or concerns using [{helpresources}]({helpresources}) \n')
    datasheet+=markdown('## Intended Uses:\n')
    datasheet+=markdown(f'### Primary use case \n {usecase_primary} \n')
    datasheet+=markdown(f'### Secondary use case \n {usecase_secondary} \n')
    datasheet+=markdown(f'### Out of scope \n {usecase_outofscope} \n')
    datasheet+=markdown('## Evaluation Data:\n')
    datasheet+=markdown(f'### Datasets \n {dataset_description} \n')
    datasheet+=markdown(f'### Motivation \n {motivation} \n')
    datasheet+=markdown(f'### Caveats \n {caveats} \n')

    return datasheet
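To see how a missing tag is handled without needing a registered model, here is a minimal, SDK-free sketch. The `sample_tags` dictionary and the `lookup_tag` helper are hypothetical names used only for illustration; they show the fallback to an empty string that the datasheet helpers rely on.

```python
# Hypothetical tag dictionary; a registered model exposes a similar dict via `model.tags`.
sample_tags = {"title": "Flight Delay Model", "version": "1.0"}

def lookup_tag(tags, tagname):
    """Return the tag value, or an empty string when the tag is missing."""
    if tagname not in tags:
        print("Missing tag " + tagname)
        return ""
    return tags[tagname]

title = lookup_tag(sample_tags, "title")
caveats = lookup_tag(sample_tags, "caveats")  # not defined above, so this falls back to ''
header = f"# {title}\n"
print(repr(header), repr(caveats))
```

Because missing tags become empty strings, the datasheet still renders every section heading even when a model was registered with only a few tags.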

In [None]:
import warnings
warnings.filterwarnings("ignore")

import logging
logging.basicConfig(level = logging.ERROR)

## Connect to Workspace

In the next cell, we create a new Workspace config object using the `<subscription_id>`, `<resource_group_name>`, and `<workspace_name>`. This will fetch the matching Workspace and prompt you for authentication. Please click on the link and input the provided details.

For more information on **Workspace**, please visit: [Microsoft Workspace Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py)

`<subscription_id>` = You can get this ID from the landing page of your Resource Group.

`<resource_group_name>` = This is the name of your Resource Group.

`<workspace_name>` = This is the name of your Workspace.

In [None]:
from azureml.core.workspace import Workspace

try:
    # Get an instance of the Workspace
    ws = Workspace(
        subscription_id = '<subscription_id>',
        resource_group = '<resource_group_name>',
        workspace_name = '<workspace_name>')

    # Write the workspace config file
    ws.write_config()
    
    print('Library configuration succeeded')
except Exception as e:
    print(e)
    print('Workspace not found')

# Data Drift

Data drift is one of the top reasons model accuracy degrades over time. For machine learning models, data drift is the change in model input data that leads to model performance degradation. Monitoring data drift helps detect these model performance issues.

Causes of data drift include:

* Upstream process changes, such as a sensor being replaced that changes the units of measurement from inches to centimeters.
* Data quality issues, such as a broken sensor always reading 0.
* Natural drift in the data, such as mean temperature changing with the seasons.
* Change in relation between features, or covariate shift.
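The first cause in the list above can be illustrated with a small, SDK-free sketch. It measures how far the target mean has moved, in units of the baseline standard deviation. The readings, the helper name `standardized_mean_shift`, and the threshold of 3.0 are all assumptions made for this example; this is not the metric that Azure ML's drift monitor computes.

```python
from statistics import mean, stdev

def standardized_mean_shift(baseline, target):
    """Shift of the target mean, measured in baseline standard deviations."""
    return abs(mean(target) - mean(baseline)) / stdev(baseline)

# Hypothetical sensor readings: the baseline is in inches, the target was
# silently switched to centimeters (1 in = 2.54 cm).
baseline_inches = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9]
target_cm = [value * 2.54 for value in baseline_inches]

shift = standardized_mean_shift(baseline_inches, target_cm)
drifted = shift > 3.0  # illustrative threshold, not an Azure ML default
print(f"shift = {shift:.1f} baseline standard deviations, drifted = {drifted}")
```

A unit change like this moves the whole distribution, so even a crude per-feature statistic flags it; subtler causes such as covariate shift need a comparison across features.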

## Load Dataset

The first step is to load our data. The function `Dataset.get_by_name()` returns a registered Dataset from a given `workspace` and its registration `name`.

`workspace` = The existing AzureML workspace in which the Dataset was registered.

`name` = The registration name.

`TabularDataset.take()` = Returns a dataset containing the given number of records taken from the top of the dataset.

For more information on **Dataset**, please visit: [Microsoft Dataset Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-by-name-workspace--name--version--latest--)


In [None]:
from azureml.core import Dataset, Datastore

tabular = Dataset.get_by_name(ws, 'flightdelayweather_ds')

data = tabular.to_pandas_dataframe()
tabular.take(3).to_pandas_dataframe()

## Create AML Compute Cluster

First, check whether the cluster already exists; if it does, we can reuse it. To check, call the `ComputeTarget()` constructor with the current workspace and the name of the cluster. It raises a `ComputeTargetException` when no such cluster is found.

If the cluster does not exist, the next step is to build a configuration for the new AML cluster by calling `AmlCompute.provisioning_configuration()`. It takes as parameters the VM size and the maximum number of nodes that the cluster can scale up to. Then call `ComputeTarget.create()` with the workspace object, the cluster name, and the configuration object just created.

For more information on **ComputeTarget**, please visit: [Microsoft ComputeTarget Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.computetarget?view=azure-ml-py)

For more information on **AmlCompute**, please visit: [Microsoft AmlCompute Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.amlcompute.amlcompute?view=azure-ml-py)


**Note:** Please wait for the execution of the cell to finish before moving forward.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

### Create AML CPU Compute Cluster

try:
    compute_target = ComputeTarget(workspace=ws, name='cpucluster')
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_v2',
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, 'cpucluster', compute_config)

    compute_target.wait_for_completion(show_output=True)

## Create baseline for Data Drift Monitor

Specify a baseline dataset - usually the training dataset for a model. A target dataset - usually model input data - is compared over time to your baseline dataset.

`from_delimited_files` creates a TabularDataset to represent tabular data in delimited files (e.g. CSV and TSV).

`with_timestamp_columns` defines the timestamp columns for the dataset.

In [None]:
import pandas as pd

data_drift = tabular.to_pandas_dataframe()
data_drift = data_drift.dropna()  # dropna() returns a new frame, so keep the result
data_drift['Date'] = pd.to_datetime(dict(year=2008, month=data_drift.Month, day=data_drift.DayofMonth), errors='coerce')
data_drift = data_drift[data_drift['Date'].notna()]
file_name = 'flight_delay_ds_wDate.csv'
data_drift.to_csv(file_name, index=False)
data_store = Datastore.get_default(ws)
data_store.upload_files(['./' + file_name], overwrite=True)
datastore_path = [(data_store, file_name)]

drift_tabular = Dataset.Tabular.from_delimited_files(datastore_path)

# assign the timestamp attribute to a real or virtual column in the dataset
drift_tabular = drift_tabular.with_timestamp_columns('Date')

drift_tabular = drift_tabular.register(workspace=ws,
                           name='target',
                           create_new_version=True)

drift_tabular.take(3).to_pandas_dataframe()
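The `Date` column above is built from `Month` and `DayofMonth` with `errors='coerce'`, and rows with an invalid date are then filtered out. A minimal sketch of the same idea using only the standard library follows; the `build_date` helper and the sample rows are hypothetical and stand in for the pandas calls.

```python
from datetime import date

def build_date(year, month, day):
    """Return a date, or None when the combination is not a valid calendar day."""
    try:
        return date(year, month, day)
    except ValueError:
        return None

# Hypothetical (Month, DayofMonth) pairs; (2, 30) is not a real date.
rows = [(1, 15), (2, 30), (4, 1)]
dated = [(m, d, build_date(2008, m, d)) for m, d in rows]

# Keep only the rows with a usable timestamp, like the notna() filter above.
valid = [row for row in dated if row[2] is not None]
print(valid)
```

Dropping the rows that fail to parse matters because the timestamp column is what the drift monitor uses to slice the target dataset over time.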

## Create Data Drift Monitor

The DataDriftDetector class enables you to configure a data monitor object which then can be run as a job to analyze data drift. Data drift jobs can be run interactively or enabled to run on a schedule. 

`get_by_name` retrieves a unique DataDriftDetector object for a given workspace and name.

`create_from_datasets` creates a new DataDriftDetector object from a baseline tabular dataset and a target time series dataset.

For more information on **DataDriftDetector Class**, please visit: [Microsoft DataDriftDetector Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-datadrift/azureml.datadrift.datadriftdetector.datadriftdetector?view=azure-ml-py)

In [None]:
from azureml.datadrift import DataDriftDetector
from datetime import datetime

target = Dataset.get_by_name(ws, 'target')

# set the baseline dataset
baseline = target.time_before(datetime(2008, 4, 1))

try:
    # get data drift detector by name
    monitor = DataDriftDetector.get_by_name(ws, 'fd-drift-monitor')
except Exception:
    # set up data drift detector
    monitor = DataDriftDetector.create_from_datasets(ws, 'fd-drift-monitor', baseline, target, 
                                                          compute_target=compute_target, 
                                                          frequency='Week', 
                                                          feature_list=None, 
                                                          drift_threshold=0.6, 
                                                          latency=24)



columns  = list(baseline.take(1).to_pandas_dataframe())
exclude  = ['Month', 'DayofMonth', 'DayofWeek','Origin_dayl', 'Dest_dayl', 'Origin_srad', 'Dest_srad', 'Origin_swe', 'Dest_swe', 'Origin_tmax', 'Dest_tmax', 'Origin_tmin', 'Dest_tmin', 'Origin_vp', 'Dest_vp', '__index_level_0__']
features = [col for col in columns if col not in exclude]

# update data drift detector
monitor = monitor.update(feature_list=features)

backfill = monitor.backfill(datetime(2008, 4, 1), datetime(2008, 6, 1))

backfill.wait_for_completion(show_output=False, wait_post_processing=True)
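The cell above derives the baseline by cutting the target dataset at a date and then narrows the monitored columns with an exclusion list. The sketch below reproduces those two steps on plain Python records. The records are hypothetical, and treating the cutoff as strictly-before is an assumption of this sketch rather than a statement about how `time_before` handles the boundary.

```python
from datetime import datetime

# Hypothetical records standing in for rows of the registered 'target' dataset.
records = [
    {"Date": datetime(2008, 2, 10), "Distance": 304, "Month": 2},
    {"Date": datetime(2008, 3, 31), "Distance": 745, "Month": 3},
    {"Date": datetime(2008, 4, 1), "Distance": 361, "Month": 4},
    {"Date": datetime(2008, 5, 20), "Distance": 951, "Month": 5},
]

cutoff = datetime(2008, 4, 1)
# Strictly-before comparison: an assumption made for this sketch.
baseline_rows = [r for r in records if r["Date"] < cutoff]

# Keep only the columns that should be monitored, as the notebook does with `exclude`.
exclude = {"Month"}
features = [col for col in records[0] if col not in exclude]
print(len(baseline_rows), features)
```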

## Analyze historical data and backfill

See how the baseline dataset differs from the target dataset in the specified time period. The closer the drift magnitude is to 100%, the more the two datasets differ.

In [None]:
# get results from Python SDK (wait for backfills or monitor runs to finish)
results, metrics = monitor.get_output(start_time=datetime(year=2008, month=4, day=1))
# plot the results from Python SDK 
monitor.show(datetime(2008, 4, 1), datetime(2008, 6, 1))
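Once drift results are available, a common next step is to compare them against the threshold configured on the monitor. The following sketch assumes the results have already been reduced to one drift magnitude per week on a 0 to 1 scale; the `weekly_drift` values are invented for illustration and are not output from this notebook.

```python
# Hypothetical weekly drift magnitudes on a 0-1 scale (1.0 corresponds to 100%).
weekly_drift = {
    "2008-04-06": 0.21,
    "2008-04-13": 0.35,
    "2008-04-20": 0.64,
    "2008-04-27": 0.72,
}
drift_threshold = 0.6  # same value passed to the monitor earlier in this notebook

# Collect the weeks whose drift magnitude exceeds the threshold.
alerts = [week for week, value in weekly_drift.items() if value > drift_threshold]
print("Weeks above threshold:", alerts)
```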

# Train & Register

## Fetch latest model

Let's fetch the latest run for our experiment.

In [None]:
from azureml.core.experiment import Experiment
from azureml.train.automl.run import AutoMLRun

experiment = Experiment(ws, 'flight-delay-exp')
run = AutoMLRun(experiment, run_id=next(x for x in experiment.get_runs() if x.id.startswith('AutoML')).id)

In [None]:
best_run, fitted_model = run.get_output()
print(best_run)
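The run lookup above uses `next()` over a generator, which raises `StopIteration` when the experiment contains no AutoML run. A defensive variant can pass a default to `next()`. The sketch below uses a hypothetical `FakeRun` class in place of SDK run objects and assumes the runs are iterated most recent first.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class FakeRun:
    """Hypothetical stand-in for an SDK run; only the `id` attribute is modelled."""
    id: str

def first_automl_run(runs: Iterable[FakeRun]) -> Optional[FakeRun]:
    """Return the first run whose id starts with 'AutoML', or None if there is none."""
    return next((r for r in runs if r.id.startswith("AutoML")), None)

runs = [FakeRun("HD_1234"), FakeRun("AutoML_abcd"), FakeRun("AutoML_0001")]
selected = first_automl_run(runs)
print(selected)
```

Returning `None` lets the caller print a clear message such as "no AutoML run found" instead of failing with an unexplained `StopIteration`.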

## Register Model
Next, register the model obtained from the best run by calling `register_model()` on that run. The tags passed here are the fields that the datasheet helpers read later on.

In [None]:
# register the model for deployment
model = best_run.register_model(model_name='flight_delay_weather', 
                                model_path='outputs/model.pkl',
                                datasets=[(Dataset.Scenario.TRAINING, tabular)],
                                description='This model was developed by Microsoft to showcase the capabilities of Azure ML.',
                       tags={'title': 'Flight Delay Model',
    'datasheet_description':
"""
Last updated: October 2020

Based on the dataset from [Statistical Computing Statistical Graphics](http://stat-computing.org/dataexpo/2009/the-data.html)

""",
    'details': 'This model was developed for Microsoft.',
    'date': 'October 2020, trained on data that cuts off at the end of 2008.', 
    'type': 'Classification model',
    'version': '1.0',
    'help': 'https://www.azure.com/',
    'usecase_primary': 
"""
Developed for Flight Delay Demo.

""",
    'usecase_secondary':
"""
Field demos and marketing.

""",
    'usecase_outofscope':
"""
Do not use for production environments.

""",
    'dataset_description':
"""
The data comes originally from RITA where it is described in detail. You can download the data there, or from the bzipped csv files listed below. These files have derivable variables removed, are packaged in yearly chunks and have been more heavily compressed than the originals.

""",
    'motivation': 'Demo the main features behind the Azure ML Workspace environment',
    'caveats':
"""
"""})

print("Model name: " + model.name, "Model version: " + str(model.version), sep="\n")

# Traceability
## Git hash 
We can fetch and display the Git branch, commit, and repository associated with the model.

In [None]:
import json
import pandas as pd
from io import StringIO

# Run details capture configuration and exact Git commit used for the run
remote_run_df = pd.read_json(StringIO('[' + json.dumps(run.get_details()['properties']) + ']'), orient='columns')
remote_run_df[['azureml.git.branch','azureml.git.commit','azureml.git.repository_uri']].T
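The same three properties can also be read straight from the properties dictionary, without a round trip through JSON and pandas. The sketch below assumes a dictionary shaped like `run.get_details()['properties']`; the values shown are placeholders, not real output.

```python
# Hypothetical run properties; the values below are placeholders, not real output.
properties = {
    "azureml.git.branch": "main",
    "azureml.git.commit": "0000000000000000000000000000000000000000",
    "azureml.git.repository_uri": "https://example.com/org/repo.git",
    "other_property": "ignored",
}

git_keys = ["azureml.git.branch", "azureml.git.commit", "azureml.git.repository_uri"]
# dict.get returns None for keys that are absent, for example when the run
# was not submitted from a Git working directory.
git_info = {key: properties.get(key) for key in git_keys}

for key, value in git_info.items():
    print(f"{key}: {value}")
```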

## Trace back to model run

Run instance associated with the registered model.

In [None]:
# Trace back to the experiment
model.run

## Trace back to model dataset

Dataset instance associated with the registered model.

In [None]:
pd.DataFrame(
    {'Dataset Id': model.datasets['training'][0].id,
     'Name': model.datasets['training'][0].name }, index=[0])

## Model Datasheet

Datasheet associated with the registered model.

In [None]:
from IPython.display import display, Markdown

tags = model.tags
display(Markdown(get_datasheet(tags)))

# Cross-platform

## Get ONNX model

Open Neural Network Exchange (ONNX) can help optimize the inference of your machine learning model. Inference, or model scoring, is the phase where the deployed model is used for prediction, most commonly on production data.

Microsoft and a community of partners created ONNX as an open standard for representing machine learning models. Models from many frameworks including TensorFlow, PyTorch, scikit-learn, Keras, Chainer, MXNet, MATLAB, and SparkML can be exported or converted to the standard ONNX format. Once the models are in the ONNX format, they can be run on a variety of platforms and devices.

With Azure Machine Learning, you can use automated ML to build a Python model and have it converted to the ONNX format.

In [None]:
best_run, onnx_mdl = run.get_output(return_onnx_model=True)

## Save ONNX model

Save the ONNX model to a local directory. The saved model can be scored with ONNX Runtime, which targets both cloud and edge and works on Linux, Windows, and Mac. Written in C++, ONNX Runtime also has C, Python, C#, Java, and JavaScript (Node.js) APIs for usage in a variety of environments.

In [None]:
from azureml.automl.runtime.onnx_convert import OnnxConverter

onnx_fl_path = "./best_model.onnx"
OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)

## OSS Model Export

The `Model.get_model_path()` function returns the path to the model.

For more information on **Model Class**, please visit: [Microsoft Model Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py)

In [None]:
from azureml.core.model import Model
import joblib

oss_model_path = Model.get_model_path(model_name = 'flight_delay_weather', _workspace=ws)

# deserialize the model file back into a sklearn model
oss_model = joblib.load(oss_model_path)

## OSS Model Prediction

Let's try to perform a quick prediction over our OSS model.

In [None]:
from sklearn.preprocessing import LabelEncoder

training_data, validation_data = tabular.random_split(percentage=0.9, seed=1)
le = LabelEncoder()
Y = le.fit_transform(data['ArrDelay15'].values)
val = validation_data.to_pandas_dataframe()
val = val.drop(columns=['ArrDelay15'])
X_train = data.drop(columns=['ArrDelay15'])
X_test = data.drop(columns=['ArrDelay15'])
Y_train = Y


oss_model.predict(X_test.head(1))

## Register OSS Model

Our next step would be to register the OSS model from our local directory.

The `Model.register()` function registers a model with the provided workspace.

In [None]:
oss_path = "./oss_model/flight_delay_weather_.pkl"
# register the model for deployment
oss_model = Model.register(workspace = ws,
                            model_name='flight_delay_weather_', 
                            model_path=oss_path)

print("Model name: " + oss_model.name, "Model version: " + str(oss_model.version), sep="\n")

# Interpretability

Enabling the capability of explaining a machine learning model is important during two main phases of model development:

* During the training phase, as model designers and evaluators can use interpretability output of a model to verify hypotheses and build trust with stakeholders. They also use the insights into the model for debugging, validating model behavior matches their objectives, and to check for model unfairness or insignificant features.

* During the inferencing phase, as having transparency around deployed models empowers executives to understand "when deployed" how the model is working and how its decisions are treating and impacting people in real life.

## Interpretability during inference 

Use `automl_setup_model_explanations` to get the engineered explanations. The `fitted_model` can generate the following items:

* Featurized data from trained or test samples
* Engineered feature name lists
* Findable classes in your labeled column in classification scenarios


For more information on **Interpretability**, please visit: [Microsoft Interpretability Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl)

In [None]:
from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations

model_2_path = Model.get_model_path(model_name = 'flight_delay_weather', _workspace=ws)
model_2 = joblib.load(model_2_path)
automl_explainer_setup_obj = automl_setup_model_explanations(model_2, X=X_train, 
                                                             X_test=val, y=Y_train, 
                                                             task='classification')

## Instantiate MimicWrapper and explain


To generate an explanation for AutoML models, use the MimicWrapper class. You can initialize the MimicWrapper with these parameters:

* The explainer setup object
* Your workspace
* A surrogate model to explain the fitted_model automated ML model

The MimicWrapper also takes the automl_run object where the engineered explanations will be uploaded.

For more information on **MimicWrapper Class**, please visit: [Microsoft MimicWrapper Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-interpret/azureml.interpret.mimicwrapper?view=azure-ml-py)

In [None]:
from azureml.interpret import MimicWrapper

# Initialize the Mimic Explainer
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator,
                         explainable_model=automl_explainer_setup_obj.surrogate_model, 
                         init_dataset=automl_explainer_setup_obj.X_transform, run=best_run,
                         features=automl_explainer_setup_obj.engineered_feature_names, 
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes,
                         explainer_kwargs=automl_explainer_setup_obj.surrogate_model_params)

engineered_explanations = explainer.explain(['local', 'global'], eval_dataset=automl_explainer_setup_obj.X_test_transform)

## Register scoring_explainer as model

Use the `TreeScoringExplainer` to create the scoring explainer that'll compute the engineered feature importance values at inference time. You initialize the scoring explainer with the `feature_map` that was computed previously.

For more information on **TreeScoringExplainer Class**, please visit: [Microsoft TreeScoringExplainer Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-interpret/azureml.interpret.scoring.scoring_explainer.treescoringexplainer?view=azure-ml-py)

In [None]:
from azureml.interpret.scoring.scoring_explainer import TreeScoringExplainer, save

# Initialize the ScoringExplainer
scoring_explainer = TreeScoringExplainer(explainer.explainer, feature_maps=[automl_explainer_setup_obj.feature_map])

# Pickle scoring explainer locally
save(scoring_explainer, exist_ok=True)

# Register scoring explainer
try:
    best_run.upload_file('scoring_explainer.pkl', 'scoring_explainer.pkl')
except Exception:
    print('scoring_explainer.pkl already uploaded.')
scoring_explainer_model = best_run.register_model(model_name='scoring_explainer', model_path='scoring_explainer.pkl')
print("Model name: " + scoring_explainer_model.name, "Model version: " + str(scoring_explainer_model.version), sep="\n")

# Technical performance tracking

## Create Scoring File

Creating the scoring file is the next step before deploying the service. This file is responsible for the actual generation of predictions using the model. The values or scores generated can represent predictions of future values, but they might also represent a likely category or outcome.

The first thing to do in the scoring file is to fetch the model. This is done by calling `Model.get_model_path()` and passing the model name as a parameter.

After the model has been loaded, call `model.predict()` to start the scoring process.

For more information on **Machine Learning - Score**, please visit: [Microsoft Machine Learning - Score Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/machine-learning-score)


In [None]:
%%writefile score.py
import os
import json
import joblib
import time
import numpy as np
import pandas as pd
import azureml.automl.core
from azureml.core.model import Model
from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from azureml.monitoring import ModelDataCollector
 
input_sample = np.array([[3,30,7,820,930,'MQ',70,'DFW','LIT',304,32.89595056,-97.0372,'TX',34.72939611,-92.22424556,'AR',44236.8,44236.8,0.0,11.0,409.6,208.0,0.0,0.0,28.5,16.0,15.0,8.5,1720.0,1120.0]])
output_sample = np.array([1])
model = None
scoring_explainer = None
 
def init():
    global model, inputs_dc, prediction_dc, scoring_explainer
    print("model initialized " + time.strftime("%H:%M:%S"))

    try:
        scoring_explainer_path = Model.get_model_path("scoring_explainer")
        scoring_explainer = joblib.load(scoring_explainer_path)
        model_path = Model.get_model_path("flight_delay_weather")
    except Exception:
        # Fall back to the single model directory provided by the service
        model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), 'model.pkl')

    model = joblib.load(model_path)
    inputs_dc = ModelDataCollector('flight_delay', designation='inputs', feature_names=['Month', 'DayofMonth', 'DayOfWeek', 'CRSDepTime', 'CRSArrTime', 'UniqueCarrier', 'CRSElapsedTime', 'Origin', 'Dest', 'Distance', 'Origin_Lat', 'Origin_Lon', 'Origin_State', 'Dest_Lat', 'Dest_Lon', 'Dest_State', 'Origin_dayl', 'Dest_dayl', 'Origin_prcp', 'Dest_prcp', 'Origin_srad', 'Dest_srad', 'Origin_swe', 'Dest_swe', 'Origin_tmax', 'Dest_tmax', 'Origin_tmin', 'Dest_tmin', 'Origin_vp', 'Dest_vp'])
    prediction_dc = ModelDataCollector('flight_delay', designation='predictions', feature_names=['ArrDelay15'])
    
@input_schema('data', NumpyParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        df = pd.DataFrame(data, columns=['Month', 'DayofMonth', 'DayOfWeek', 'CRSDepTime', 'CRSArrTime', 'UniqueCarrier', 'CRSElapsedTime', 'Origin', 'Dest', 'Distance', 'Origin_Lat', 'Origin_Lon', 'Origin_State', 'Dest_Lat', 'Dest_Lon', 'Dest_State', 'Origin_dayl', 'Dest_dayl', 'Origin_prcp', 'Dest_prcp', 'Origin_srad', 'Dest_srad', 'Origin_swe', 'Dest_swe', 'Origin_tmax', 'Dest_tmax', 'Origin_tmin', 'Dest_tmin', 'Origin_vp', 'Dest_vp']) 
        result = model.predict(df)
        if scoring_explainer:
            automl_explainer_setup_obj = automl_setup_model_explanations(model, X_test=df, task='classification')
            local_importance_values = scoring_explainer.explain(automl_explainer_setup_obj.X_test_transform, get_raw=True)
        else:
            local_importance_values = []
        inputs_dc.collect(data) #this call is saving our input data into Azure Blob
        prediction_dc.collect(result) #this call is saving our output data into Azure Blob
    except Exception as e:
        result = str(e)
        print(result)
        return {"error": result}
    return {"result": result.tolist(), 'local_importance_values': local_importance_values}
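The scoring script follows a two-function contract: `init()` runs once to load the model into a global, and `run()` is called per request and returns either results or an error payload. The sketch below shows that contract in isolation. `ThresholdModel` is a hypothetical stand-in for the deserialized model, and the data collectors and explainer are left out.

```python
import json

class ThresholdModel:
    """Hypothetical model: predicts 1 when the first feature exceeds 5."""
    def predict(self, rows):
        return [1 if row[0] > 5 else 0 for row in rows]

model = None

def init():
    # Called once when the service starts: load the model into a global.
    global model
    model = ThresholdModel()

def run(data):
    # Called per request: score the rows and report failures as a payload.
    try:
        return {"result": model.predict(data)}
    except Exception as exc:
        return {"error": str(exc)}

init()
ok = run([[6.0, 21.0], [1.0, 3.0]])
bad = run([[]])  # an empty row triggers an IndexError inside predict
print(json.dumps(ok), json.dumps(bad))
```

Returning the error as a dictionary, rather than letting the exception propagate, keeps the response JSON-serializable so the caller can inspect what went wrong.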

## Register Dataset for Model Profiling

The next cell will create a JSON File from our registered training dataset. The purpose of the file is to create a load for the service in order to achieve an adequate profiling. The profiling dataset will be registered under the name `sample_request_data`.

The Datastore Class represents a storage abstraction over an Azure Machine Learning storage account.
The `get_default` function is used to get the default datastore for the workspace.

The `dataset_type_definitions` Module contains enumeration values used with Dataset.
The `PromoteHeadersBehavior` enumeration defines options for how column headers are processed when reading data from files to create a dataset.

For more information on **Datastore Class**, please visit: [Microsoft Machine Learning - Datastore Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py)

For more information on **dataset_type_definitions**, please visit: [Microsoft Machine Learning - dataset_type_definitions Module Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_type_definitions?view=azure-ml-py)


In [None]:
import json
from azureml.core import Datastore
from azureml.data import dataset_type_definitions

sample = val.head(10).values.tolist()
serialized_input_json = json.dumps({'data': sample})

# Repeat the same request 100 times, one request per line
dataset_content = '\n'.join([serialized_input_json] * 100)
file_name = 'sample_request_data.txt'
with open(file_name, 'w') as f:
    f.write(dataset_content)

# upload the txt file created above to the Datastore and create a dataset from it
data_store = Datastore.get_default(ws)
data_store.upload_files(['./' + file_name], target_path='sample_request_data', overwrite=True)
datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]
sample_request_data = Dataset.Tabular.from_delimited_files(
    datastore_path, separator='\n',
    infer_column_types=True,
    header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)
sample_request_data = sample_request_data.register(workspace=ws,
                                                   name='sample_request_data',
                                                   create_new_version=True)
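The profiling file is a plain text file with one JSON request per line. The sketch below builds such a file in a temporary directory and reads it back to confirm that every line parses to the same request body. The two-row `sample` is hypothetical; the notebook uses the first 10 validation rows.

```python
import json
import os
import tempfile

# Hypothetical two-row sample; the notebook uses the first 10 validation rows.
sample = [[6.0, 21.0, 6.0], [4.0, 2.0, 3.0]]
serialized = json.dumps({"data": sample})

# One request per line, repeated to generate load.
content = "\n".join([serialized] * 100)

path = os.path.join(tempfile.mkdtemp(), "sample_request_data.txt")
with open(path, "w") as f:
    f.write(content)

# Each line must parse back to the same request body.
with open(path) as f:
    requests = [json.loads(line) for line in f]
print(len(requests))
```

Using `'\n'` as the separator when the dataset is created is what makes each line, rather than each comma-separated value, become one record.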

## Profile model

Profile the model to get resource requirement recommendations.

Profiling will determine the CPU and memory the deployed service will need. Profiling tests the service that runs your model and returns information such as the CPU usage, memory usage, and response latency. It also provides a recommendation for the CPU and memory based on resource usage.


In [None]:
from azureml.core.model import InferenceConfig
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.webservice import AksWebservice
from azureml.core.profile import ModelProfile
from azureml.core.model import Model
from azureml.core import Environment
import time

ts = str(int(time.time()))
model_profile_name = "fd_model_" + ts
myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn==0.22.1'],pip_packages=['azureml-sdk[notebooks,automl]','azureml-defaults','inference-schema','azureml-monitoring','shap==0.39.0'])

with open("score-new.yml","w") as f:
    f.write(myenv.serialize_to_string())

myenv = Environment.from_conda_specification(name = "myenv",
                                             file_path = "score-new.yml")

# Create an inference config object based on score.py and the environment built from score-new.yml
inference_config = InferenceConfig(entry_script="score.py",
                                    environment=myenv)


input_dataset = Dataset.get_by_name(workspace=ws, name='sample_request_data')

profile = Model.profile(ws, model_profile_name, [model, scoring_explainer_model], inference_config, input_dataset=input_dataset)
profile.wait_for_completion(True)

profiling_details = profile.get_details()
profiling_details = pd.DataFrame.from_dict(profiling_details, orient='index', columns=['Profiling Result'])
profiling_details

# Deployment

## Deploy to Managed Endpoints

Create a new directory to hold the configuration files for deploying a managed endpoint.

In [None]:
import os

managed_endpoints = './managed-endpoints'

# Working directory
os.makedirs(managed_endpoints, exist_ok=True)

# Remove any stale .amlignore file
amlignore = os.path.join(managed_endpoints, ".amlignore")
if os.path.exists(amlignore):
    os.remove(amlignore)

## Prepare Scoring File

Copy the scoring file created earlier (`score.py`) into the managed endpoints directory so that it is packaged with the deployment.

In [None]:
!cp ./score.py $managed_endpoints

## Create the environment definition

The following file contains the details of the environment to host the model and code. 

In [None]:
%%writefile $managed_endpoints/score-new.yml
name: mlops-model-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy
  - pip
  - scikit-learn==0.22.1
  - scipy
  - pip:
    - azureml-defaults
    - azureml-sdk[notebooks,automl]
    - pandas
    - inference-schema[numpy-support]
    - joblib
    - numpy
    - scipy
    - azureml-monitoring
    - shap==0.39.0

## Define the endpoint configuration
Specific inputs are required to deploy a model on an online endpoint:

1. Model files.
1. The code that's required to score the model.
1. An environment in which your model runs.
1. Settings to specify the instance type and scaling capacity.

In [None]:
%%writefile $managed_endpoints/endpointconfig.yml
name: fd-mlops-endpoint
type: online
auth_mode: key
traffic:
  blue: 100

deployments:
  #blue deployment
  - name: blue
    model: azureml:flight_delay_weather:1
    code_configuration:
      code:
        local_path: ./
      scoring_script: score.py
    environment: 
      name: fd-mlops-env
      version: 1
      path: ./
      conda_file: file:./score-new.yml
      docker:
          image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
    instance_type: Standard_DS3_v2
    scale_settings:
      scale_type: manual
      instance_count: 1
      min_instances: 1
      max_instances: 2

## Deploy your managed online endpoint to Azure

This deployment might take up to 15 minutes, depending on whether the underlying environment or image is being built for the first time. Subsequent deployments that use the same environment will finish processing more quickly.

In [None]:
!az ml endpoint create -g [your resource group name] -w [your AML workspace name] -n fd-mlops-mng-endpoint -f ./managed-endpoints/endpointconfig.yml

## Generate a sample request JSON file

In [None]:
%%writefile $managed_endpoints/sample-request.json
{"data": [
[6.0,21.0,6.0,1330.0,1600.0,9.0,150.0,16.0,93.0,745.0,33.64044444,-84.42694444,8.0,40.69249722,-74.16866056,29.0,51148.8,53568.0,2.0,0.0,438.4,451.2,0.0,0.0,30.5,28.5,18.0,15.0,2040.0,1720.0],
[4.0,2.0,3.0,1910.0,2035.0,11.0,85.0,222.0,62.0,361.0,35.87763889,-78.78747222,25.0,39.99798528,-82.89188278,33.0,44928.0,45273.6,0.0,0.0,355.2,438.4,0.0,0.0,23.0,12.5,12.0,1.5,1400.0,680.0],
[1.0,3.0,4.0,935.0,1224.0,16.0,229.0,207.0,78.0,1302.0,39.87195278,-75.24114083,36.0,32.89595056,-97.0372,41.0,33177.6,35596.8,0.0,0.0,156.8,252.8,0.0,0.0,-2.0,6.5,-8.0,-4.5,320.0,440.0],
[4.0,3.0,4.0,1000.0,1252.0,16.0,172.0,207.0,206.0,951.0,39.87195278,-75.24114083,36.0,26.68316194,-80.09559417,7.0,45273.6,44582.4,0.0,4.0,425.6,220.8,0.0,0.0,12.0,28.0,0.5,22.5,640.0,2720.0],
[1.0,21.0,1.0,800.0,1045.0,15.0,105.0,198.0,129.0,589.0,41.979595,-87.90446417,12.0,38.94453194,-77.45580972,43.0,33868.8,34905.6,2.0,0.0,256.0,246.4,56.0,0.0,-7.0,-3.0,-17.0,-13.5,160.0,200.0],
[3.0,12.0,3.0,1640.0,1952.0,5.0,192.0,89.0,101.0,1065.0,40.69249722,-74.16866056,29.0,26.07258333,-80.15275,7.0,41817.6,42508.8,0.0,0.0,336.0,368.0,0.0,0.0,10.0,27.0,0.5,19.5,640.0,2280.0],
[3.0,19.0,3.0,1229.0,1346.0,6.0,77.0,151.0,76.0,214.0,40.77724306,-73.87260917,32.0,38.85208333,-77.03772222,43.0,42854.4,42854.4,22.0,0.0,204.8,307.2,0.0,0.0,10.0,15.0,4.0,6.5,800.0,960.0],
[4.0,18.0,5.0,1210.0,1503.0,4.0,173.0,139.0,169.0,944.0,40.63975111,-73.77892556,32.0,28.42888889,-81.31602778,7.0,47692.8,45964.8,0.0,0.0,524.8,508.8,0.0,0.0,22.5,26.0,5.5,11.5,920.0,1360.0],
[11.0,1.0,6.0,615.0,745.0,9.0,90.0,130.0,18.0,432.0,39.71732917,-86.29438417,13.0,33.64044444,-84.42694444,8.0,36633.6,38016.0,0.0,0.0,297.6,387.2,0.0,0.0,22.0,20.5,5.0,2.0,880.0,720.0],
[11.0,24.0,1.0,936.0,1123.0,8.0,107.0,208.0,77.0,602.0,33.43416667,-112.00805559999999,3.0,39.85840806,-104.6670019,5.0,35942.4,34214.4,0.0,0.0,297.6,291.2,0.0,0.0,27.5,17.0,10.0,-9.5,520.0,280.0]]}

## Invoke the endpoint to score data by using your model

You can use either the invoke command or a REST client of your choice to invoke the endpoint and score against it.

In [None]:
!az ml endpoint invoke -g [your resource group name] -w [your AML workspace name] -n fd-mlops-mng-endpoint --request-file ./managed-endpoints/sample-request.json
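
For the REST-client route, the sketch below builds the equivalent POST request with the standard library. The URL and key are placeholders, and `build_scoring_request` is a hypothetical helper; it assumes the endpoint accepts the key as a bearer token and expects the rows under a `data` field, as in the sample request file. The request is constructed but not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, rows):
    """Build (but do not send) a POST request carrying the rows under the "data" key."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values; substitute the endpoint's scoring URI and key.
request = build_scoring_request(
    "https://example.invalid/score", "<endpoint-key>", [[6.0, 21.0, 6.0]]
)
print(request.get_method(), request.full_url)

# To actually send it: urllib.request.urlopen(request).read()
```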

## Create/connect to the Kubernetes compute cluster

The `AksCompute Class` manages an Azure Kubernetes Service compute target in Azure Machine Learning.

The `ComputeTarget Class` is an abstract parent class for all compute targets managed by Azure Machine Learning. A compute target is a designated compute resource/environment where you run your training script or host your service deployment. 

The `ComputeTargetException` is an exception related to failures when creating, interacting with, or configuring a compute target. This exception is commonly raised for failures attaching a compute target, missing headers, and unsupported configuration values.

For more information on **AksCompute Class**, please visit: [Microsoft Machine Learning - AksCompute Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.aks.akscompute?view=azure-ml-py)

For more information on **ComputeTarget Class**, please visit: [Microsoft Machine Learning - ComputeTarget Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.computetarget?view=azure-ml-py)

For more information on **ComputeTargetException Class**, please visit: [Microsoft Machine Learning - ComputeTargetException Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.exceptions.computetargetexception?view=azure-ml-py)


In [None]:
from azureml.core.compute import AksCompute
from azureml.core.compute import ComputeTarget
from azureml.exceptions import ComputeTargetException

prov_config = AksCompute.provisioning_configuration(location='westus2')

try:
    aks_target = AksCompute(ws, 'flight-delay-aks')
except ComputeTargetException:
    # Create the cluster
    aks_target = ComputeTarget.create(workspace = ws, 
                            name = 'flight-delay-aks', 
                            provisioning_configuration = prov_config)
    aks_target.wait_for_completion(True)

## Deploy the model to Kubernetes

The first step is to define the dependencies that the service needs to run by calling `CondaDependencies.create()`. This function receives as parameters the pip and conda packages to install on the remote machine. Its output is then persisted to a `.yml` file that is used later in the process.

Now that the AKS cluster has been deployed and our CondaDependencies have been declared, it’s time to create an `InferenceConfig` object by calling its constructor and passing the runtime type, the path to the `entry_script` (score.py), and the `conda_file` (the previously created file that holds the environment dependencies).

Next, define the configuration of the web service to deploy. This is done by calling `AksWebservice.deploy_configuration()` and passing along the number of `cpu_cores` and `memory_gb` that the service needs.

Finally, in order to deploy the model and service to the created AKS cluster, the function `Model.deploy()` should be called, passing along the workspace object, a list of models to deploy, the defined inference configuration, deployment configuration, and the AKS object created in the step above.

For more information on **CondaDependencies**, please visit: [Microsoft CondaDependencies Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py)

For more information on **InferenceConfig**, please visit: [Microsoft InferenceConfig Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py)

For more information on **AksWebService**, please visit: [Microsoft AksWebService Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.akswebservice?view=azure-ml-py)

For more information on **Model**, please visit: [Microsoft Model Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py)


**Note:** Please wait for the execution of the cell to finish before moving forward.

In [None]:
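from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AksWebservice
from azureml.exceptions import WebserviceException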

inference_config = InferenceConfig(runtime= "python",
                                    entry_script="score.py",
                                    conda_file="score-new.yml")

deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, 
                                                        memory_gb = 1,
                                                        collect_model_data=True, 
                                                        enable_app_insights=True)

try:
    service = AksWebservice(ws, 'flight-delay-aml')
    print(service.state)
except WebserviceException:
    service = Model.deploy(ws, 
                            'flight-delay-aml', 
                            [model, scoring_explainer_model], 
                            inference_config, 
                            deployment_config, 
                            aks_target)

    service.wait_for_deployment(show_output = True)
    print(service.state)

## Connect to deployed webservice

With test data in hand, we can format it for the web service. First, obtain an instance of the web service by calling the `Webservice()` constructor with the Workspace object and the service name as parameters. The data is then sanitized to avoid sending unexpected columns to the web service. Finally, call the service via POST using the `requests` module: `requests.post()` takes as parameters the service URL, the test data, and a headers dictionary that carries the authentication key.

For more information on **Webservice**, please visit: [Microsoft Webservice Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice?view=azure-ml-py)

In [None]:
import json
import requests
import pandas as pd
from azureml.core.webservice import Webservice

aks_service = Webservice(ws, 'flight-delay-aml')

# prepare the test data
sample = val.sample(n=10, random_state=4).values.tolist()

headers = {'Content-Type':'application/json'}

if aks_service.auth_enabled:
    headers['Authorization'] = 'Bearer '+ aks_service.get_keys()[0]

output_df = []
for x in sample:    
    test_sample = json.dumps({'data': [x]})
    response = requests.post(aks_service.scoring_uri, data=test_sample, headers=headers)
    prediction = [response.json()['result'][0]]
    prediction.extend(x)
    output_df.append(prediction)
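
The sanitizing step mentioned above can be sketched as follows, assuming each record arrives as a dictionary keyed by column name. `sanitize` and the shortened `EXPECTED_COLUMNS` list are hypothetical; the real feature list has 30 columns, as shown in the presentation step.

```python
# Shortened for brevity; the service in this notebook expects 30 feature columns.
EXPECTED_COLUMNS = ["Month", "DayofMonth", "DayOfWeek"]


def sanitize(record, expected_columns):
    """Keep only the expected columns, in order; fail loudly if one is missing."""
    missing = [name for name in expected_columns if name not in record]
    if missing:
        raise KeyError("missing columns: " + ", ".join(missing))
    return [record[name] for name in expected_columns]


# The label column and any other extras are dropped; the order follows EXPECTED_COLUMNS.
raw = {"DayOfWeek": 6.0, "Month": 6.0, "DayofMonth": 21.0, "ArrDelay15": 1}
print(sanitize(raw, EXPECTED_COLUMNS))
```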

## Present scoring service predictions

Let's format our service responses and present them in a suitable way to our end users.

In [None]:
def highlight_delays(val):
    return 'background-color: yellow' if val == True else ''

predictions = pd.DataFrame(output_df, columns =['Prediction', 'Month', 'DayofMonth', 'DayOfWeek', 'CRSDepTime', 'CRSArrTime', 'UniqueCarrier', 'CRSElapsedTime', 'Origin', 'Dest', 'Distance', 'Origin_Lat', 'Origin_Lon', 'Origin_State', 'Dest_Lat', 'Dest_Lon', 'Dest_State', 'Origin_dayl', 'Dest_dayl', 'Origin_prcp', 'Dest_prcp', 'Origin_srad', 'Dest_srad', 'Origin_swe', 'Dest_swe', 'Origin_tmax', 'Dest_tmax', 'Origin_tmin', 'Dest_tmin', 'Origin_vp', 'Dest_vp'])
predictions = predictions.style.applymap(highlight_delays, subset=['Prediction'])
predictions

## Interpretability at inference time

Let's look at the most important features behind the prediction for the last sample scored above.

In [None]:
inference_interpretability = pd.DataFrame(response.json()['local_importance_values'], columns =['Month', 'DayofMonth', 'DayOfWeek', 'CRSDepTime', 'CRSArrTime', 'UniqueCarrier', 'CRSElapsedTime', 'Origin', 'Dest', 'Distance', 'Origin_Lat', 'Origin_Lon', 'Origin_State', 'Dest_Lat', 'Dest_Lon', 'Dest_State', 'Origin_dayl', 'Dest_dayl', 'Origin_prcp', 'Dest_prcp', 'Origin_srad', 'Dest_srad', 'Origin_swe', 'Dest_swe', 'Origin_tmax', 'Dest_tmax', 'Origin_tmin', 'Dest_tmin', 'Origin_vp', 'Dest_vp']).T
inference_interpretability.columns = ['Importance']
inference_interpretability = inference_interpretability.sort_values(by=['Importance'], ascending=False)
inference_interpretability.head(8)
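
The ranking performed by the cell above can be expressed without pandas. This is a plain-Python sketch: `top_features` is a hypothetical helper and the importance values are made up for illustration.

```python
def top_features(names, importances, k):
    """Return the k (name, importance) pairs with the largest importance values."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


names = ["Origin_prcp", "Distance", "CRSDepTime", "Dest_tmin"]
importances = [0.12, -0.03, 0.30, 0.05]  # made-up values, not real model output

for name, value in top_features(names, importances, 2):
    print(name, value)
```

Like the pandas version, this sorts by signed value, so features with large negative contributions sink to the bottom. Sorting by absolute value is an alternative when the magnitude of the effect matters more than its direction.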

## Docker Package

Create a model package in the form of a Docker image or Dockerfile build context. When deploying to Azure ML inferencing compute, this is done transparently; however, for other hosting targets, the Docker image can be built and exported automatically.

In [None]:
from azureml.core.model import Model

docker_package = Model.package(ws, 
                                [model, scoring_explainer_model], 
                                inference_config)

docker_package.wait_for_creation(show_output=True)

# Security
## Homomorphic Encryption

The encryption method used in this sample is homomorphic encryption. Homomorphic encryption allows for computations to be done on encrypted data without requiring access to a secret (decryption) key. The results of the computations are encrypted and can be revealed only by the owner of the secret key. 
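
To make the idea concrete, here is a toy sketch of computing a linear score on encrypted inputs. It uses the textbook Paillier scheme, picked only because it fits in a few lines of pure Python; it is not claimed to be the scheme used by the `encrypted-inference` package, and the tiny primes make it completely insecure. All function names are hypothetical, and weights and features are restricted to small non-negative integers.

```python
import math
import random


def modinv(a, n):
    """Modular inverse of a modulo n via the extended Euclidean algorithm."""
    old_r, r, old_s, s = a % n, n, 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("no modular inverse")
    return old_s % n


def keygen(p, q):
    """Return (public modulus n, private key) for primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = modinv(lam, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt integer 0 <= m < n using only the public modulus."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def encrypted_linear_score(n, weights, bias_ct, feature_cts):
    """Server side: compute Enc(bias + sum(w * x)) without any secret key."""
    n2 = n * n
    acc = bias_ct
    for w, c in zip(weights, feature_cts):
        # Multiplying ciphertexts adds plaintexts; raising to w multiplies by w.
        acc = (acc * pow(c, w, n2)) % n2
    return acc


rng = random.Random(0)
n, private_key = keygen(1009, 1013)  # toy primes, far too small for real use

# Client encrypts its features; the server only ever sees ciphertexts.
feature_cts = [encrypt(n, x, rng) for x in [3, 5, 2]]
score_ct = encrypted_linear_score(n, [2, 4, 1], encrypt(n, 7, rng), feature_cts)

# Only the holder of the private key can read the result: 2*3 + 4*5 + 1*2 + 7.
print(decrypt(n, private_key, score_ct))
```

The server-side function never touches `private_key`, which is the property the paragraph above describes: the computation runs on ciphertexts, and only the key owner can reveal the score.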

The `BlobServiceClient` Class is a client to interact with the Blob Service at the account level.
This client provides operations to retrieve and configure the account properties as well as list, create and delete containers within the account. We pass a connection string to the `from_connection_string` function to create the BlobServiceClient.


For more information on **BlobServiceClient**, please visit: [Microsoft BlobServiceClient Documentation](https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobserviceclient?view=azure-python)

In [None]:
%%writefile hom_score.py
import json
import time
import pandas as pd
import azureml.train.automl
from azureml.core.model import Model
import joblib
from azure.storage.blob import BlobServiceClient
from encrypted.inference.eiserver import EIServer

def init():
    global model
    print ("model initialized " + time.strftime("%H:%M:%S"))
    
    # the name of the registered model that we want to deploy
    model_path = Model.get_model_path(model_name = 'flight_delay_weather_')
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    
    global server
    server = EIServer(model.coef_, model.intercept_, verbose=True)

def run(raw_data):

    json_properties = json.loads(raw_data)

    key_id = json_properties['key_id']
    conn_str = json_properties['conn_str']
    container = json_properties['container']
    data = json_properties['data']

    # download the Galois keys from blob storage 
    blob_service_client = BlobServiceClient.from_connection_string(conn_str=conn_str)
    blob_client = blob_service_client.get_blob_client(container=container, blob=key_id)
    public_keys = blob_client.download_blob().readall()
    
    result = {}
    # make prediction
    result = server.predict(data, public_keys)

    # you can return any data type as long as it is JSON-serializable
    return result

## Homomorphic Encryption Dependencies

Create an environment for inferencing and add the `encrypted-inference` package as a pip dependency.

In [None]:
from azureml.core.model import InferenceConfig, Model
from azureml.core.dataset import Dataset
from azureml.core.conda_dependencies import CondaDependencies

azureml_pip_packages = ['azureml-defaults', 'azureml-contrib-interpret', 'azureml-core', 'azureml-telemetry',
                        'azureml-train-automl', 'azureml-interpret', 'azureml-dataprep','azureml-dataprep[fuse,pandas]','joblib',
                        'matplotlib','scikit-learn==0.22.1','seaborn','fairlearn','encrypted-inference==0.9','azure-storage-blob']

# Define dependencies needed in the remote environment
hom_myenv = CondaDependencies.create(pip_packages=azureml_pip_packages)

# Write dependencies to yml file
with open("hom_myenv.yml","w") as f:
    f.write(hom_myenv.serialize_to_string())

# Create an inference config object based on hom_score.py and hom_myenv.yml from the previous steps
inference_config = InferenceConfig(runtime= "python",
                                    entry_script="hom_score.py",
                                    conda_file="hom_myenv.yml")

## Deploy Homomorphic Encryption service to AKS

Deploying the HE-enabled service to AKS is no different from deploying other services.

In [None]:
deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, 
                                                        memory_gb = 1,
                                                        enable_app_insights=True)

model = Model(ws, name='flight_delay_weather_')

try:
    service = AksWebservice(ws, 'flight-delay-aml-hom')
    print(service.state)
except WebserviceException:
    service = Model.deploy(ws, 
                        'flight-delay-aml-hom', 
                        [model], 
                        inference_config, 
                        deployment_config, 
                        aks_target)

    service.wait_for_deployment(show_output = True)
    print(service.state)

## Create public and private keys

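To work with homomorphic encryption, we need to generate a private (secret) key and the matching public keys. The secret key stays with the client, while the public keys are uploaded to blob storage so that the service can compute on the encrypted data.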

In [None]:
import os
import azureml.core
from azureml.core import Workspace, Datastore
from encrypted.inference.eiclient import EILinearRegressionClient

# Create a new Encrypted inference client and a new secret key.
edp = EILinearRegressionClient(verbose=True)

public_keys_blob, public_keys_data = edp.get_public_keys()

datastore = ws.get_default_datastore()
container_name = datastore.container_name

# Create a local file and write the keys to it
with open(public_keys_blob, "wb") as public_keys:
    public_keys.write(public_keys_data)

# Upload the file to blob store
datastore.upload_files([public_keys_blob])

# Delete the local file
os.remove(public_keys_blob)

## Inspect raw data

Let's observe how our raw data looks before we encrypt it.

In [None]:
X_test_hom = Dataset.get_by_name(ws, 'flightdelayweather_ds_clean')
X_test_hom = X_test_hom.to_pandas_dataframe().drop(columns=['ArrDelay15'])

sample_index = 0
X_test_hom.iloc[sample_index].to_frame()

## Inspect encrypted data

Let's observe what our encrypted data looks like.

In [None]:
sample_data = (X_test_hom.to_numpy())
raw_data = edp.encrypt(sample_data[sample_index])

## Testing the Service with Encrypted data

With the encrypted test data ready, we can now call the web service. First, obtain an instance of the web service by calling the `Webservice()` constructor with the Workspace object and the service name as parameters.

For more information on **Webservice**, please visit: [Microsoft Webservice Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice?view=azure-ml-py)

In [None]:
import json
from azureml.core import Webservice

service = Webservice(ws, 'flight-delay-aml-hom')
service.update(enable_app_insights=True)

#pass the connection string for blob storage to give the server access to the uploaded public keys 
conn_str_template = 'DefaultEndpointsProtocol={};AccountName={};AccountKey={};EndpointSuffix=core.windows.net'
conn_str = conn_str_template.format(datastore.protocol, datastore.account_name, datastore.account_key)

#build the json 
data = json.dumps({"data": raw_data, "key_id" : public_keys_blob, "conn_str" : conn_str, "container" : container_name })
data = bytes(data, encoding='ASCII')

print ('Making an encrypted inference web service call ')
eresult = service.run(input_data=data)

print ('Received encrypted inference results')
print ('Encrypted results: ...', eresult[0][0:100], '...')

## Decrypting Service Response

The cell below uses the `decrypt()` function to decrypt the response from the deployed AKS service.

In [None]:
import numpy as np 

results = edp.decrypt(eresult)

prediction = 'On-Time'
if results[0] > 0:
    prediction = 'Delayed'

print ( ' Prediction : ', prediction)