# Flight Delay Demo - MLOps with MLflow

## Install prerequisites

Before running the notebook, install or upgrade the libraries it depends on.

In [None]:
!pip install --upgrade mlflow azureml-mlflow azureml-core azureml-datadrift markdown seaborn

## Configure Datasheets

Define helper functions that turn a registered model's tags into a rendered datasheet.

In [None]:
from markdown import markdown

def get_tag(tags, tagname):
    # Return the tag's value, or an empty string (with a warning) if the model lacks it
    if tagname not in tags:
        print('Missing tag ' + tagname)
    return tags.get(tagname, '')

def get_datasheet(tags):
    title = get_tag(tags, 'title')
    description = get_tag(tags, 'datasheet_description')
    details = get_tag(tags, 'details')
    date = get_tag(tags, 'date')
    modeltype = get_tag(tags, 'type')
    version = get_tag(tags, 'version')
    helpresources = get_tag(tags, 'help')
    usecase_primary = get_tag(tags, 'usecase_primary')
    usecase_secondary = get_tag(tags, 'usecase_secondary')
    usecase_outofscope = get_tag(tags, 'usecase_outofscope')
    dataset_description = get_tag(tags, 'dataset_description')
    motivation = get_tag(tags, 'motivation')
    caveats = get_tag(tags, 'caveats')

    datasheet = ''
    datasheet+=markdown(f'# {title} \n {description} \n')
    datasheet+=markdown(f'## Model Details \n {details} \n')
    datasheet+=markdown(f'### Model date \n {date} \n')
    datasheet+=markdown(f'### Model type \n {modeltype} \n')
    datasheet+=markdown(f'### Model version \n {version} \n')
    datasheet+=markdown(f'### Where to send questions or comments about the model \n Please send questions or concerns using [{helpresources}]({helpresources}) \n')
    datasheet+=markdown('## Intended Uses:\n')
    datasheet+=markdown(f'### Primary use case \n {usecase_primary} \n')
    datasheet+=markdown(f'### Secondary use case \n {usecase_secondary} \n')
    datasheet+=markdown(f'### Out of scope \n {usecase_outofscope} \n')
    datasheet+=markdown('## Evaluation Data:\n')
    datasheet+=markdown(f'### Datasets \n {dataset_description} \n')
    datasheet+=markdown(f'### Motivation \n {motivation} \n')
    datasheet+=markdown(f'### Caveats \n {caveats} \n')

    return datasheet
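
The helpers above convert each section to HTML with `markdown()`, which comes from the third-party `markdown` package. As a dependency-free illustration of the same idea, the sketch below assembles the raw Markdown for a few sections from a plain dict of tags, falling back to an empty string for missing tags. The function name `build_datasheet_markdown` and the shortened section list are hypothetical.

```python
# Hypothetical, dependency-free sketch: build a Markdown datasheet from model tags.
SECTIONS = [
    ("## Model Details", "details"),
    ("### Model type", "type"),
    ("### Model version", "version"),
]

def build_datasheet_markdown(tags):
    """Return a Markdown string; tags missing from the dict render as empty text."""
    parts = [f"# {tags.get('title', '')}", tags.get("datasheet_description", "")]
    for heading, key in SECTIONS:
        parts.append(heading)            # section heading
        parts.append(tags.get(key, ""))  # section body, empty if the tag is missing
    return "\n\n".join(parts)

print(build_datasheet_markdown({"title": "Flight Delay Model", "version": "1.0"}))
```

In a notebook, the string could be rendered with `display(Markdown(...))` from `IPython.display`.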

In [None]:
import warnings
warnings.filterwarnings("ignore")

import logging
logging.basicConfig(level = logging.ERROR)

## Set up working directory

The cell below creates our working directory. This will hold our generated scripts.

In [None]:
import os

project_folder = './scripts'

# Working directory
if not os.path.exists(project_folder):
    os.makedirs(project_folder)

## Training Script

Let's write our training script to the working directory.

The `sklearn.preprocessing.LabelEncoder` encodes target labels with value between 0 and n_classes-1.
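
Conceptually, label encoding sorts the distinct labels and replaces each label with its position in that sorted list. A plain-Python sketch of that mapping follows; the helper name `label_encode` is hypothetical, and in the training script `LabelEncoder.fit_transform` does this job for the `ArrDelay15` labels.

```python
def label_encode(values):
    """Map each label to its index among the sorted distinct labels (0 .. n_classes-1)."""
    classes = sorted(set(values))
    position = {label: index for index, label in enumerate(classes)}
    return classes, [position[value] for value in values]

classes, codes = label_encode(["late", "on_time", "late", "on_time", "on_time"])
print(classes)  # ['late', 'on_time']
print(codes)    # [0, 1, 0, 1, 1]
```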

The `sklearn.model_selection.train_test_split` function splits arrays or matrices into random train and test subsets.
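
The training script also passes `stratify=Y`, which keeps the share of each class roughly the same in the train and test subsets. A simplified plain-Python sketch of a stratified split follows; the helper `stratified_split` is hypothetical, and its per-class rounding differs in detail from scikit-learn's allocation.

```python
import random
from collections import defaultdict

def stratified_split(rows, labels, test_size=0.2, seed=123):
    """Shuffle each class separately and send about `test_size` of it to the test set."""
    rng = random.Random(seed)
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    train_idx, test_idx = [], []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])

rows = list(range(20))
labels = [0] * 16 + [1] * 4        # 80% on time, 20% delayed
X_train, X_test, y_train, y_test = stratified_split(rows, labels)
print(len(X_train), len(X_test))   # 16 4
print(sorted(y_test))              # [0, 0, 0, 1]
```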

The `sklearn.metrics.accuracy_score` is an accuracy classification score. In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

The `sklearn.metrics.confusion_matrix` function computes a confusion matrix to evaluate the accuracy of a classification.
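
In scikit-learn's convention, entry `[i][j]` of the confusion matrix counts samples whose true label is the i-th class and whose predicted label is the j-th class. Rows are therefore actual labels and columns are predicted labels, matching the axis labels in `analyze_model`. A plain-Python sketch with a hypothetical helper name:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

print(confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], labels=[0, 1]))
# [[1, 1], [1, 2]]
```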

The `sklearn.metrics.f1_score` function computes the F1 score, also known as balanced F-score or F-measure. The F1 score is the harmonic mean of precision and recall, F1 = 2 * (precision * recall) / (precision + recall); it reaches its best value at 1 and its worst at 0.

The `sklearn.metrics.precision_score` computes the precision. The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives.

The `sklearn.metrics.recall_score` computes the recall. The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
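
`analyze_model` calls these functions with `average="macro"`, which computes each score separately for every class and then takes the unweighted mean. A plain-Python sketch of precision, recall, F1, and their macro averages follows. The helper names are hypothetical. An empty denominator returns 0.0, which matches scikit-learn's default result, although scikit-learn also emits a warning in that case.

```python
def per_class_scores(y_true, y_pred, positive):
    """Precision, recall and F1 when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_scores(y_true, y_pred):
    """Unweighted mean of the per-class scores over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    return tuple(sum(s[k] for s in scores) / len(scores) for k in range(3))

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                     # 0.6
print(macro_scores(y_true, y_pred))
```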

The `sklearn.metrics.roc_auc_score` function computes the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

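The `sklearn.metrics.roc_curve` function computes the points of the Receiver Operating Characteristic (ROC) curve: the false positive rate and true positive rate at each score threshold.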
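
ROC AUC has a useful equivalent reading: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half. The brute-force plain-Python sketch below implements that pairwise definition. The helper name is hypothetical, and because it compares every pair it is quadratic in the number of samples, so treat it as an illustration only.

```python
def roc_auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs where the positive is scored higher; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```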

The `Model` class represents the result of machine learning training. A model can come from an Azure Machine Learning training run or from a training process outside of Azure. Regardless of how the model is produced, it can be registered in a workspace, where it is identified by a name and a version.

For more information on **Model Class**, please visit: [Microsoft Model Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py)

In [None]:
%%writefile $project_folder/utils.py
import os

import joblib
import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
import seaborn as sns
from azureml.core import Dataset, Model, Run, Workspace
from mlflow.utils.file_utils import TempDir
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

def split_dataset(X_raw, Y):
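    # A holds the carrier column as a separate frame, split alongside the features and labels.
    # Note: the model is trained on X_raw, so UniqueCarrier also stays in the features;
    # the scoring script and sample requests later in this notebook expect that column layout.
    A = X_raw[['UniqueCarrier']]

    le = LabelEncoder()
    Y = le.fit_transform(Y)

    X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(X_raw,
                                                        Y,
                                                        A,
                                                        test_size = 0.2,
                                                        random_state=123,
                                                        stratify=Y)

    # Reset the pandas indices so the split frames align positionally
    X_train = X_train.reset_index(drop=True)
    A_train = A_train.reset_index(drop=True)
    X_test = X_test.reset_index(drop=True)
    A_test = A_test.reset_index(drop=True)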

    return X_train, X_test, Y_train, Y_test, A_train, A_test 

def prepareDataset(X_raw):
    df = X_raw.to_pandas_dataframe()
    Y = df['ArrDelay15'].values
    synth_df = df.drop(columns=['ArrDelay15'])
    return synth_df, Y

def analyze_model(clf, X_test, Y_test, preds):
    ws = Workspace.from_config()

    mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
    experiment_name = 'flight_delay_with_mlflow'
    mlflow.set_experiment(experiment_name)

    with mlflow.start_run() as run:
        accuracy = accuracy_score(Y_test, preds)
        print('Accuracy', float(accuracy))
        mlflow.log_metric('Accuracy', float(accuracy))

        precision = precision_score(Y_test, preds, average="macro")
        print('Precision', float(precision))
        mlflow.log_metric('Precision', float(precision))

        recall = recall_score(Y_test, preds, average="macro")
        print('Recall', float(recall))
        mlflow.log_metric('Recall', float(recall))

        f1score = f1_score(Y_test, preds, average="macro")
        print('F1 Score', float(f1score))
        mlflow.log_metric('F1 Score', float(f1score))
        
        with TempDir() as tmp:
            local_path = tmp.path("model")
            mlflow.sklearn.save_model(clf, path=local_path)

            # Work around a bug where the saved requirements pin an invalid scikit-learn==0.0
            data = ''
            with open(os.path.join(local_path, 'requirements.txt'), 'r') as infile:
                data = infile.read().replace('scikit-learn==0.0\n', '')
            with open(os.path.join(local_path, 'requirements.txt'), 'w') as outfile:
                outfile.write(data)
            with open(os.path.join(local_path, 'conda.yaml'), 'r') as infile:
                data = infile.read().replace('  - scikit-learn==0.0\n', '')
            with open(os.path.join(local_path, 'conda.yaml'), 'w') as outfile:
                outfile.write(data)
            # End workaround

            mlflow.log_artifact(local_path)

        class_names = clf.classes_
        fig, ax = plt.subplots()
        tick_marks = np.arange(len(class_names))
        plt.xticks(tick_marks, class_names)
        plt.yticks(tick_marks, class_names)
        sns.heatmap(pd.DataFrame(confusion_matrix(Y_test, preds)), annot=True, cmap='YlGnBu', fmt='g')
        ax.xaxis.set_label_position('top')
        plt.tight_layout()
        plt.title('Confusion Matrix', y=1.1)
        plt.ylabel('Actual label')
        plt.xlabel('Predicted label')
        plt.show()
        fig.savefig("ConfusionMatrix.png")
        mlflow.log_artifact("ConfusionMatrix.png")
        plt.close()

        preds_proba = clf.predict_proba(X_test)[::,1]
        fpr, tpr, _ = roc_curve(Y_test, preds_proba, pos_label = clf.classes_[1])
        auc = roc_auc_score(Y_test, preds_proba)
        plt.plot(fpr, tpr, label="data 1, auc=" + str(auc))
        plt.legend(loc=4)
        plt.show()
        plt.close()

## Connect to Workspace

In the next cell, we create a Workspace object using your `<subscription_id>`, `<resource_group_name>`, and `<workspace_name>`. This fetches the matching workspace and may prompt you to authenticate; if it does, click the link and enter the code shown.

For more information on **Workspace**, please visit: [Microsoft Workspace Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py)

`<subscription_id>` = You can get this ID from the landing page of your Resource Group.

`<resource_group_name>` = This is the name of your Resource Group.

`<workspace_name>` = This is the name of your Workspace.

In [None]:
from azureml.core.workspace import Workspace

try:
    # Get instance of the Workspace and write it to config file
    ws = Workspace(
        subscription_id = '<subscription_id>',
        resource_group = '<resource_group_name>',
        workspace_name = '<workspace_name>')

    # Writes workspace config file
    ws.write_config()
    
    print('Library configuration succeeded')
except Exception as e:
    print(e)
    print('Workspace not found')

# Data Drift

Data drift is one of the top reasons model accuracy degrades over time. For machine learning models, data drift is the change in model input data that leads to model performance degradation. Monitoring data drift helps detect these model performance issues.

Causes of data drift include:

* Upstream process changes, such as a sensor being replaced that changes the units of measurement from inches to centimeters.
* Data quality issues, such as a broken sensor always reading 0.
* Natural drift in the data, such as mean temperature changing with the seasons.
* Change in relation between features, or covariate shift.
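
One simple way to quantify drift in a single numeric feature is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical cumulative distributions of the baseline and target values (0 means identical samples, 1 means no overlap). This is a swapped-in illustration, not the drift metric that Azure ML's `DataDriftDetector` computes, and the helper name `ks_statistic` is hypothetical. The example reproduces the inches-to-centimeters case from the list above.

```python
from bisect import bisect_right

def ks_statistic(baseline, target):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_baseline(x) - ECDF_target(x)|."""
    baseline, target = sorted(baseline), sorted(target)

    def ecdf(sample, x):
        # Fraction of the sample that is <= x
        return bisect_right(sample, x) / len(sample)

    # The gap only changes at observed values, so checking those points is enough
    points = sorted(set(baseline) | set(target))
    return max(abs(ecdf(baseline, x) - ecdf(target, x)) for x in points)

inches = [10.0, 12.0, 14.0, 16.0]
centimeters = [value * 2.54 for value in inches]  # same quantity, new units
print(ks_statistic(inches, inches))       # 0.0
print(ks_statistic(inches, centimeters))  # 1.0
```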

## Load Dataset

The first step is to get our data. `Dataset.get_by_name()` returns a registered Dataset, given a `workspace` and its registration `name`.

`workspace` = The existing Azure ML workspace in which the Dataset was registered.

`name` = The registration name.

`TabularDataset.take(count)` = Returns a new TabularDataset containing the first `count` records; here it is used to preview three rows.

For more information on **Dataset**, please visit: [Microsoft Dataset Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-by-name-workspace--name--version--latest--)


In [None]:
from azureml.core import Dataset, Datastore

tabular = Dataset.get_by_name(ws, 'flightdelayweather_ds')

data = tabular.to_pandas_dataframe()
tabular.take(3).to_pandas_dataframe()

## Create AML Compute Cluster

First, check whether the cluster already exists; if it does, we reuse it. The check is performed by calling the `ComputeTarget()` constructor with the current workspace and the name of the cluster.

If the cluster does not exist, create a configuration for the new AML cluster by calling `AmlCompute.provisioning_configuration()`, which takes the VM size and the maximum number of nodes the cluster can scale up to. Then call `ComputeTarget.create()` with the workspace object, the cluster name, and the configuration object.

For more information on **ComputeTarget**, please visit: [Microsoft ComputeTarget Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.computetarget?view=azure-ml-py)

For more information on **AmlCompute**, please visit: [Microsoft AmlCompute Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.amlcompute?view=azure-ml-py)


**Note:** Please wait for the execution of the cell to finish before moving forward.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

### Create AML CPU Compute Cluster

try:
    compute_target = ComputeTarget(workspace=ws, name='cpucluster')
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_v2',
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, 'cpucluster', compute_config)

    compute_target.wait_for_completion(show_output=True)

## Create baseline for Data Drift Monitor

Specify a baseline dataset - usually the training dataset for a model. A target dataset - usually model input data - is compared over time to your baseline dataset.

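The `Dataset.Tabular.from_delimited_files()` method creates a TabularDataset to represent tabular data in delimited files (e.g. CSV and TSV).

The `with_timestamp_columns()` method defines the timestamp columns for the dataset, which the data drift monitor uses to slice the target data over time.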

In [None]:
import pandas as pd

data_drift = tabular.to_pandas_dataframe()
data_drift = data_drift.dropna()
# Build a 2008 timestamp from Month and DayofMonth; invalid dates become NaT and are dropped
data_drift['Date'] = pd.to_datetime(dict(year=2008, month=data_drift.Month, day=data_drift.DayofMonth), errors='coerce')
data_drift = data_drift[data_drift['Date'].notna()]
file_name = 'flight_delay_ds_wDate.csv'
data_drift.to_csv(file_name, index=False)
data_store = Datastore.get_default(ws)
# Upload the CSV written above (in the current directory) to the default datastore
data_store.upload_files([file_name], overwrite=True)
datastore_path = [(data_store, file_name)]

drift_tabular = Dataset.Tabular.from_delimited_files(datastore_path)

# assign the timestamp attribute to a real or virtual column in the dataset
drift_tabular = drift_tabular.with_timestamp_columns('Date')

drift_tabular = drift_tabular.register(workspace=ws,
                                       name='target',
                                       create_new_version=True)

drift_tabular.take(3).to_pandas_dataframe()
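
`with_timestamp_columns` needs a real timestamp column. The cell above builds `Date` from `Month` and `DayofMonth`, assuming every row is from 2008, and drops rows whose combination is not a valid date. A minimal pandas sketch of that coerce-and-drop step on a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical sample: Feb 29 is valid in 2008 (a leap year); Feb 30 and Apr 31 are not
flights = pd.DataFrame({"Month": [1, 2, 2, 4], "DayofMonth": [15, 29, 30, 31]})

# Columns named year/month/day let pandas assemble a datetime; invalid dates become NaT
parts = pd.DataFrame({"year": 2008, "month": flights["Month"], "day": flights["DayofMonth"]})
flights["Date"] = pd.to_datetime(parts, errors="coerce")

# Keep only rows with a valid timestamp
flights = flights[flights["Date"].notna()]
print(flights)
```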

## Create Data Drift Monitor

The DataDriftDetector class enables you to configure a data monitor object which then can be run as a job to analyze data drift. Data drift jobs can be run interactively or enabled to run on a schedule. 

The `get_by_name` method retrieves a unique DataDriftDetector object for a given workspace and name.

The `create_from_datasets` method creates a new DataDriftDetector object from a baseline tabular dataset and a target time series dataset.

For more information on **DataDriftDetector Class**, please visit: [Microsoft DataDriftDetector Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-datadrift/azureml.datadrift.datadriftdetector.datadriftdetector?view=azure-ml-py)

In [None]:
from azureml.datadrift import DataDriftDetector
from datetime import datetime

target = Dataset.get_by_name(ws, 'target')

# set the baseline dataset
baseline = target.time_before(datetime(2008, 4, 1))

try:
    # get data drift detector by name
    monitor = DataDriftDetector.get_by_name(ws, 'fd-drift-monitor')
except Exception:
    # set up data drift detector
    monitor = DataDriftDetector.create_from_datasets(ws, 'fd-drift-monitor', baseline, target,
                                                     compute_target=compute_target,
                                                     frequency='Week',
                                                     feature_list=None,
                                                     drift_threshold=0.6,
                                                     latency=24)

columns = list(baseline.take(1).to_pandas_dataframe())
exclude = ['Month', 'DayofMonth', 'DayOfWeek', 'Origin_dayl', 'Dest_dayl', 'Origin_srad', 'Dest_srad', 'Origin_swe', 'Dest_swe', 'Origin_tmax', 'Dest_tmax', 'Origin_tmin', 'Dest_tmin', 'Origin_vp', 'Dest_vp', '__index_level_0__']
features = [col for col in columns if col not in exclude]

# update data drift detector
monitor = monitor.update(feature_list=features)

backfill = monitor.backfill(datetime(2008, 4, 1), datetime(2008, 6, 1))

backfill.wait_for_completion(show_output=False, wait_post_processing=True)

## Analyze historical data and backfill

See how the target dataset differs from the baseline dataset over the specified time period. The closer the drift magnitude is to 100%, the more the two datasets differ.

In [None]:
# get results from Python SDK (wait for backfills or monitor runs to finish)
results, metrics = monitor.get_output(start_time=datetime(year=2008, month=4, day=1))
# plot the results from Python SDK 
monitor.show(datetime(2008, 4, 1), datetime(2008, 6, 1))

# Train & Register

## Load Dataset

The first step is to get our data. `Dataset.get_by_name()` returns a registered Dataset, given a `workspace` and its registration `name`.

`workspace` = The existing Azure ML workspace in which the Dataset was registered.

`name` = The registration name.

`TabularDataset.take(count)` = Returns a new TabularDataset containing the first `count` records; here it is used to preview three rows.

For more information on **Dataset**, please visit: [Microsoft Dataset Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-by-name-workspace--name--version--latest--)

In [None]:
from azureml.core import Dataset

tabular = Dataset.get_by_name(ws, 'flightdelayweather_ds_clean')

data = tabular.to_pandas_dataframe()
tabular.take(3).to_pandas_dataframe()

## Train with sklearn

A `sklearn.pipeline.Pipeline` chains several steps so they can be cross-validated together while setting different parameters.

The `sklearn.linear_model.LogisticRegression` class implements regularized logistic regression using the ‘liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ solvers.

The `sklearn.preprocessing.StandardScaler()` function standardizes features by removing the mean and scaling to unit variance.
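
`StandardScaler` learns each feature's mean and standard deviation from the training data during `fit`. It uses the population form of the standard deviation, dividing by n. `transform` then applies `(x - mean) / std` to any data, including the test split. A plain-Python sketch for a single feature column follows. The helper names are hypothetical, and, as in scikit-learn, a zero-variance column is scaled by 1 instead of 0.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the centering and scaling constants from training values."""
    center = mean(column)
    scale = pstdev(column)  # population standard deviation (divide by n)
    return center, (scale if scale > 0 else 1.0)

def transform(column, center, scale):
    """Standardize values with constants learned from the training data."""
    return [(x - center) / scale for x in column]

center, scale = fit_scaler([1.0, 2.0, 3.0])   # center 2.0, scale ~0.8165
print(transform([1.0, 2.0, 3.0], center, scale))
print(transform([4.0], center, scale))        # a test value, scaled with training constants
```

Fitting only on the training split keeps test-set statistics from leaking into the model.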

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from scripts.utils import *
import numpy as np
import pandas as pd

# Convert the registered TabularDataset into features and labels
synth_df, Y = prepareDataset(tabular)

# Split dataset
X_train, X_test, Y_train, Y_test, A_train, A_test = split_dataset(synth_df, Y)

# Set up the scikit-learn pipeline: standardize the features, then fit logistic regression
clf = Pipeline(steps=[('scaler', StandardScaler()),
                      ('classifier', LogisticRegression(solver='liblinear', fit_intercept=True))])

model = clf.fit(X_train, Y_train)
preds = clf.predict(X_test)
analyze_model(clf, X_test, Y_test, preds)

## Fetch latest model

Let's fetch the latest run for our experiment.

In [None]:
from azureml.core import Run, Experiment
exp = Experiment(ws, 'flight_delay_with_mlflow')
run = next(exp.get_runs())
run

# Deployment

With the `azureml-mlflow` plugin installed, MLflow's deployment API can deploy models to Azure Kubernetes Service (AKS) and Azure Container Instances (ACI).

## Deploy to Azure Kubernetes Service (AKS)
To deploy your MLflow model to an Azure Machine Learning web service, MLflow must be configured with the Azure Machine Learning tracking URI so that it can connect to your workspace.

To deploy to AKS, first create an AKS cluster using the `ComputeTarget.create()` method. Creating a new cluster may take 20-25 minutes.

In [None]:
from azureml.core.compute import AksCompute
from azureml.core.compute import ComputeTarget
from azureml.exceptions import ComputeTargetException

prov_config = AksCompute.provisioning_configuration(location='westus2')

try:
    aks_target = AksCompute(ws, 'flight-delay-aks')
except ComputeTargetException:
    # Create the cluster
    aks_target = ComputeTarget.create(workspace = ws, 
                            name = 'flight-delay-aks', 
                            provisioning_configuration = prov_config)
    aks_target.wait_for_completion(True)

print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

Then, register and deploy the model in one step with MLflow's deployment client.

In [None]:
import json

import mlflow
from mlflow.deployments import get_deploy_client

deployment_config = {"computeType": "aks", "computeTargetName": "flight-delay-aks"}

with open('deployment_config.json', 'w', encoding='utf-8') as f:
    json.dump(deployment_config, f, ensure_ascii=False, indent=4)

# Create a deployment client for the Azure ML tracking URI
client = get_deploy_client(mlflow.get_tracking_uri())

# Artifact path of the model inside the run
model_path = "model"

# Point the deployment at the config file written above
deploy_path = "deployment_config.json"
test_config = {'deploy-config-file': deploy_path}

# "name" becomes the service name; the model is registered automatically under a
# name derived from it (later cells refer to it as fd-mlflow-aks-model)
client.create_deployment(model_uri='runs:/{}/{}'.format(run.id, model_path),
                         config=test_config,
                         name="fd-mlflow-aks")

## Deploy to Azure Container Instance (ACI)

The `mlflow.tracking.MlflowClient` class is a client of an MLflow Tracking Server that creates and manages experiments and runs, and of an MLflow Registry Server that creates and manages registered models and model versions.

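The `get_deploy_client` function returns a subclass of `mlflow.deployments.BaseDeploymentClient` exposing standard APIs for deploying models to the specified target. When no deployment configuration is passed, as in the next cell, the Azure ML plugin deploys to ACI by default.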

In [None]:
import mlflow
from mlflow.deployments import get_deploy_client

# Create a deployment client for the Azure ML tracking URI
client = get_deploy_client(mlflow.get_tracking_uri())

# Artifact path of the model inside the run
model_path = "model"

# "name" becomes the service name; the model is registered automatically under a derived name
client.create_deployment(model_uri='runs:/{}/{}'.format(run.id, model_path),
                         name="fd-mlflow-aci")

## Connect to deployed webservice

Now we can put the test data into a suitable format for the web service. First, obtain an instance of the web service by calling the `Webservice()` constructor with the Workspace object and the service name. The data is then sanitized by dropping the label column (`ArrDelay15`) so that no unexpected columns are sent to the service. Finally, call the service via POST using the `requests` module: `requests.post()` takes the scoring URL, the test data, and a headers dictionary that contains the authentication token when authentication is enabled.

For more information on **Webservice**, please visit: [Microsoft Webservice Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice?view=azure-ml-py)
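
The request-building part of the next cell can be factored into a small helper. The minimal sketch below only prepares the JSON body and headers, in the same `{'input_data': {'data': [row]}}` shape the notebook sends; nothing is sent over the network. The helper name `build_scoring_request` and the key value are hypothetical.

```python
import json

def build_scoring_request(row, api_key=None):
    """Return (body, headers) for scoring one row; adds a Bearer token if a key is given."""
    body = json.dumps({"input_data": {"data": [row]}})
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = "Bearer " + api_key
    return body, headers

body, headers = build_scoring_request([6.0, 21.0, 6.0], api_key="<service-key>")
print(body)     # {"input_data": {"data": [[6.0, 21.0, 6.0]]}}
print(headers)
```

With `requests`, the pair would then be passed as `requests.post(aks_service.scoring_uri, data=body, headers=headers)`.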

In [None]:
import json
import requests
import pandas as pd
from azureml.core.webservice import Webservice

aks_service = Webservice(ws, 'fd-mlflow-aks')

# prepare the test data
sample = data.drop(columns=['ArrDelay15']).sample(n=10, random_state=4).values.tolist()

headers = {'Content-Type':'application/json'}

if aks_service.auth_enabled:
    headers['Authorization'] = 'Bearer '+ aks_service.get_keys()[0]

output_df = []
for x in sample:    
    test_sample = json.dumps({'input_data': {'data': [x]}})
    response = requests.post(aks_service.scoring_uri + '?verbose=true', data=test_sample, headers=headers)
    prediction = [bool(response.json()[0])]
    prediction.extend(x)
    output_df.append(prediction)

## Present scoring service predictions

Let's format our service responses and present them in a suitable way to our end users.

In [None]:
def highlight_delays(val):
    return 'background-color: yellow' if val else ''

predictions = pd.DataFrame(output_df, columns =['Prediction', 'Month', 'DayofMonth', 'DayOfWeek', 'CRSDepTime', 'CRSArrTime', 'UniqueCarrier', 'CRSElapsedTime', 'Origin', 'Dest', 'Distance', 'Origin_Lat', 'Origin_Lon', 'Origin_State', 'Dest_Lat', 'Dest_Lon', 'Dest_State', 'Origin_dayl', 'Dest_dayl', 'Origin_prcp', 'Dest_prcp', 'Origin_srad', 'Dest_srad', 'Origin_swe', 'Dest_swe', 'Origin_tmax', 'Dest_tmax', 'Origin_tmin', 'Dest_tmin', 'Origin_vp', 'Dest_vp'])
predictions = predictions.style.applymap(highlight_delays, subset=['Prediction'])
predictions

## Optional: Deploy to Managed Endpoints

While the MLflow SDK does not provide out-of-the-box support for managed endpoints, you can deploy the registered MLflow model to this service with a few configuration files.

First, create a new directory to hold the configuration files for deploying a managed endpoint.

In [None]:
import os

managed_endpoints = './managed-endpoints'

# Working directory
os.makedirs(managed_endpoints, exist_ok=True)

# Remove any .amlignore so that no files in this folder are excluded from the upload
amlignore = os.path.join(managed_endpoints, ".amlignore")
if os.path.exists(amlignore):
    os.remove(amlignore)

## Optional: Create Scoring File

Creating the scoring file is the next step before deploying the service. This file is responsible for generating predictions with the model. The values it returns can represent predictions of future values, but they might also represent a likely category or outcome.

The scoring file first loads the model in `init()`: it builds the model path from the `AZUREML_MODEL_DIR` environment variable, which points to the registered model's files, and deserializes the pickle with `joblib`.

After the model has been loaded, `run()` parses each request and calls `model.predict()` to score it.

For more information on **Machine Learning - Score**, please visit: [Microsoft Machine Learning - Score Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/machine-learning-score)
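
Azure ML's inference server calls `init()` once when the container starts and `run()` for each request, passing the raw request body. The self-contained sketch below shows that contract and can be run locally. A hypothetical `StubModel` stands in for the unpickled scikit-learn pipeline, and it applies an arbitrary rule instead of making real predictions.

```python
import json

class StubModel:
    """Hypothetical stand-in for the unpickled scikit-learn pipeline."""
    def predict(self, rows):
        # Arbitrary illustrative rule: flag a delay when the first column exceeds 10
        return [row[0] > 10 for row in rows]

model = None

def init():
    # Called once at startup: load the model into a global
    global model
    model = StubModel()

def run(raw_request):
    # Called per request: parse JSON, score, and report errors instead of raising
    try:
        payload = json.loads(raw_request)
        result = model.predict(payload["data"])
    except Exception as exc:
        return {"error": str(exc)}
    return {"result": list(result)}

init()
print(run(json.dumps({"data": [[12.0], [3.0]]})))  # {'result': [True, False]}
print(run("not json"))
```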

In [None]:
%%writefile $managed_endpoints/score.py
import os
import json
import time

import joblib
import pandas as pd

def init():
    global model
    print("model initialized " + time.strftime("%H:%M:%S"))

    # AZUREML_MODEL_DIR points to the registered model's files;
    # MLflow saved the scikit-learn pipeline as model/model.pkl
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model/model.pkl")

    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    
def run(data):
    try:
        data = json.loads(data)
        df = pd.DataFrame(data['data'], columns=['Month', 'DayofMonth', 'DayOfWeek', 'CRSDepTime', 'CRSArrTime',
       'UniqueCarrier', 'CRSElapsedTime', 'Origin', 'Dest', 'Distance',
       'Origin_Lat', 'Origin_Lon', 'Origin_State', 'Dest_Lat', 'Dest_Lon',
       'Dest_State', 'Origin_dayl', 'Dest_dayl', 'Origin_prcp', 'Dest_prcp',
       'Origin_srad', 'Dest_srad', 'Origin_swe', 'Dest_swe', 'Origin_tmax',
       'Dest_tmax', 'Origin_tmin', 'Dest_tmin', 'Origin_vp', 'Dest_vp']) 
        result = model.predict(df)
    except Exception as e:
        result = str(e)
        print(result)
        return {"error": result}
    return {"result":result.tolist()}

## Optional: Create the environment definition

The following file contains the details of the environment to host the model and code. 

In [None]:
%%writefile $managed_endpoints/score-new.yml
name: fd-mlflow-mng-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy
  - pip
  - scikit-learn==0.22.1
  - scipy
  - pip:
    - azureml-defaults
    - azureml-sdk[notebooks,automl]
    - pandas
    - inference-schema[numpy-support]
    - joblib
    - numpy
    - scipy

## Optional: Define the endpoint configuration
Specific inputs are required to deploy a model on an online endpoint:

1. Model files.
1. The code that's required to score the model.
1. An environment in which your model runs.
1. Settings to specify the instance type and scaling capacity.

In [None]:
%%writefile $managed_endpoints/endpointconfig.yml
name: fd-mlflow-mng-endpoint
type: online
auth_mode: key
traffic:
  blue: 100

deployments:
  #blue deployment
  - name: blue
    model: azureml:fd-mlflow-aks-model:1
    code_configuration:
      code:
        local_path: ./
      scoring_script: score.py
    environment: 
      name: fd-mlflow-mng-env
      version: 1
      path: ./
      conda_file: file:./score-new.yml
      docker:
          image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
    instance_type: Standard_DS3_v2
    scale_settings:
      scale_type: manual
      instance_count: 1
      min_instances: 1
      max_instances: 2

## Optional: Deploy your managed online endpoint to Azure

This deployment might take up to 15 minutes, depending on whether the underlying environment or image is being built for the first time. Subsequent deployments that use the same environment will finish processing more quickly.

In [None]:
!az ml endpoint create -g [your resource group name] -w [your AML workspace name] -n fd-mlflow-mng-endpoint -f ./managed-endpoints/endpointconfig.yml

## Optional: Generate a sample request JSON file

Export some test data to a JSON file we can send to the endpoint.

In [None]:
%%writefile $managed_endpoints/sample-request.json
{"data": [
[6.0,21.0,6.0,1330.0,1600.0,9.0,150.0,16.0,93.0,745.0,33.64044444,-84.42694444,8.0,40.69249722,-74.16866056,29.0,51148.8,53568.0,2.0,0.0,438.4,451.2,0.0,0.0,30.5,28.5,18.0,15.0,2040.0,1720.0],
[4.0,2.0,3.0,1910.0,2035.0,11.0,85.0,222.0,62.0,361.0,35.87763889,-78.78747222,25.0,39.99798528,-82.89188278,33.0,44928.0,45273.6,0.0,0.0,355.2,438.4,0.0,0.0,23.0,12.5,12.0,1.5,1400.0,680.0],
[1.0,3.0,4.0,935.0,1224.0,16.0,229.0,207.0,78.0,1302.0,39.87195278,-75.24114083,36.0,32.89595056,-97.0372,41.0,33177.6,35596.8,0.0,0.0,156.8,252.8,0.0,0.0,-2.0,6.5,-8.0,-4.5,320.0,440.0],
[4.0,3.0,4.0,1000.0,1252.0,16.0,172.0,207.0,206.0,951.0,39.87195278,-75.24114083,36.0,26.68316194,-80.09559417,7.0,45273.6,44582.4,0.0,4.0,425.6,220.8,0.0,0.0,12.0,28.0,0.5,22.5,640.0,2720.0],
[1.0,21.0,1.0,800.0,1045.0,15.0,105.0,198.0,129.0,589.0,41.979595,-87.90446417,12.0,38.94453194,-77.45580972,43.0,33868.8,34905.6,2.0,0.0,256.0,246.4,56.0,0.0,-7.0,-3.0,-17.0,-13.5,160.0,200.0],
[3.0,12.0,3.0,1640.0,1952.0,5.0,192.0,89.0,101.0,1065.0,40.69249722,-74.16866056,29.0,26.07258333,-80.15275,7.0,41817.6,42508.8,0.0,0.0,336.0,368.0,0.0,0.0,10.0,27.0,0.5,19.5,640.0,2280.0],
[3.0,19.0,3.0,1229.0,1346.0,6.0,77.0,151.0,76.0,214.0,40.77724306,-73.87260917,32.0,38.85208333,-77.03772222,43.0,42854.4,42854.4,22.0,0.0,204.8,307.2,0.0,0.0,10.0,15.0,4.0,6.5,800.0,960.0],
[4.0,18.0,5.0,1210.0,1503.0,4.0,173.0,139.0,169.0,944.0,40.63975111,-73.77892556,32.0,28.42888889,-81.31602778,7.0,47692.8,45964.8,0.0,0.0,524.8,508.8,0.0,0.0,22.5,26.0,5.5,11.5,920.0,1360.0],
[11.0,1.0,6.0,615.0,745.0,9.0,90.0,130.0,18.0,432.0,39.71732917,-86.29438417,13.0,33.64044444,-84.42694444,8.0,36633.6,38016.0,0.0,0.0,297.6,387.2,0.0,0.0,22.0,20.5,5.0,2.0,880.0,720.0],
[11.0,24.0,1.0,936.0,1123.0,8.0,107.0,208.0,77.0,602.0,33.43416667,-112.00805559999999,3.0,39.85840806,-104.6670019,5.0,35942.4,34214.4,0.0,0.0,297.6,291.2,0.0,0.0,27.5,17.0,10.0,-9.5,520.0,280.0]]}

## Optional: Invoke the endpoint to score data by using your model

You can use either the invoke command or a REST client of your choice to invoke the endpoint and score against it.

In [None]:
!az ml endpoint invoke -g [your resource group name] -w [your AML workspace name] -n fd-mlflow-mng-endpoint --request-file ./managed-endpoints/sample-request.json

# Traceability

## Update Model

We can update the model registered during deployment with additional metadata, including the linked dataset.

In [None]:
from azureml.core import Model

model = Model(ws, 'fd-mlflow-aks-model')
model.update(description='This model was developed by Microsoft to showcase the capabilities of Azure ML.',
             tags={'title': 'Flight Delay Model',
    'datasheet_description':
"""
Last updated: October 2020

Based on a dataset from [Statistical Computing Statistical Graphics](http://stat-computing.org/dataexpo/2009/the-data.html)

""",
    'details': 'This model was developed for Microsoft.',
    'date': 'October 2020, trained on data that cuts off at the end of 2008.', 
    'type': 'Classification model',
    'version': '1.0',
    'help': 'https://www.azure.com/',
    'usecase_primary': 
"""
Developed for Flight Delay Demo.

""",
    'usecase_secondary':
"""
Field demos and marketing.

""",
    'usecase_outofscope':
"""
Do not use for production environments.

""",
    'dataset_description':
"""
The data comes originally from RITA, where it is described in detail and can be downloaded. The files used here have derivable variables removed, are packaged in yearly chunks, and have been more heavily compressed than the originals.

""",
    'motivation': 'Demo the main features behind the Azure ML Workspace environment',
    'caveats':
"""
"""})

model.add_dataset_references([(Dataset.Scenario.TRAINING, tabular)])

## Trace back to model dataset

Show the dataset associated with the registered model.

In [None]:
pd.DataFrame(
    {'Dataset Id': model.datasets['training'][0].id,
     'Name': model.datasets['training'][0].name }, index=[0])

## Model Datasheet

Datasheet associated with the registered model.

In [None]:
from IPython.display import display, HTML

tags = model.tags
# get_datasheet returns HTML (each section is converted with markdown()), so render it as HTML
display(HTML(get_datasheet(tags)))

# Projects

## Set tracking URI to Azure ML

The `mlflow.tracking` module provides a Python CRUD interface to MLflow experiments and runs.

We point MLflow at Azure ML by passing the workspace's tracking URI to `mlflow.set_tracking_uri()`.

In [10]:
import mlflow

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

## Configure the active experiment

The `mlflow.set_experiment()` function sets an experiment as active, creating it first if it does not exist.

In [11]:
experiment_name = "fs_with_mlflow_proj"
mlflow.set_experiment(experiment_name)

## Execute the project run

The `mlflow.projects.run()` function allows us to run an MLflow project. The project can be local or stored at a Git URI.
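
`mlflow.projects.run(uri=".")` expects an `MLproject` file in the current directory, and this notebook does not create one. The minimal sketch below writes one. It assumes a hypothetical entry-point script `train.py` and conda environment file `conda.yaml` in the same folder. Both names, and the project name, are placeholders rather than files this notebook generates.

```python
from pathlib import Path

# Hypothetical project definition; train.py and conda.yaml are placeholders
MLPROJECT = """\
name: flight-delay-demo

conda_env: conda.yaml

entry_points:
  main:
    command: "python train.py"
"""

def write_mlproject(folder="."):
    """Write an MLproject file into `folder` unless one already exists."""
    path = Path(folder) / "MLproject"
    if not path.exists():
        path.write_text(MLPROJECT)
    return path
```

Calling `write_mlproject(".")` before the run cell would create the file without overwriting an existing project definition.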

In [13]:
backend_config = {"USE_CONDA": True}
local_mlproject_run = mlflow.projects.run(uri=".", backend = "azureml", backend_config = backend_config)

# Optional: Portal

## Start MLflow portal

By starting the MLflow portal with the Azure ML tracking URI, we can use the MLflow experiments UI to explore data stored in Azure ML.

In [None]:
tracking_uri = ws.get_mlflow_tracking_uri()

In [None]:
!pip install mlflow

In [None]:
!mlflow ui --backend-store-uri "$tracking_uri"

## Access MLflow portal

The MLflow UI listens on port 5000 of the notebook VM, so it is not directly reachable from your machine. To access it locally, open an SSH tunnel:

`ssh azureuser@[your notebook IP] -L 5000:localhost:5000`

This is not the recommended approach for deploying MLflow in production; however, it is a simple, secure option for a demo environment.

Once connected, you can navigate to `localhost:5000` in your browser.