## Connect to your workspace

The code below uses the workspace configuration file (`config.json`) to connect to your workspace. The first time you run it in a notebook session, you may be prompted to sign in to Azure: open the https://microsoft.com/devicelogin link, enter the automatically generated code, and complete the sign-in. Once you have signed in, you can close the browser tab that opened and return to this notebook.

```python
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))
```
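`Workspace.from_config()` looks for `config.json` in the current directory and its parent directories. If you don't have that file, you can download it from your workspace's overview page in the Azure portal or create it yourself. A minimal sketch that writes one with the standard library; the three values are placeholders (assumptions) that you must replace with your own subscription ID, resource group, and workspace name.

```python
import json
from pathlib import Path

# Placeholder values: replace them with the details of your own workspace
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

config_path = Path("config.json")
if not config_path.exists():  # Don't overwrite a config file you already downloaded
    config_path.write_text(json.dumps(workspace_config, indent=2))
```

Run it once, with real values filled in, before the connection cell.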

## Challenge 1: Train and Register Model

Before we can deploy a real-time endpoint, we need a model to deploy. The cell below uploads the flight delay data, registers it as a dataset, sets up a training environment and compute cluster, trains a model, and registers it. Run the cell to complete this challenge. Be advised that it takes some time, because the environment and the cluster have to be provisioned before training can run.

```python
from azureml.core import Dataset, Environment, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.sklearn import SKLearn
from azureml.widgets import RunDetails
from sklearn.preprocessing import LabelEncoder


#######################################
# Working with data
#######################################
print(f"{'#'*50}\n# Working with data \n{'#'*50}")
# Get the default datastore
default_ds = ws.get_default_datastore()

default_ds.upload_files(files=['./data/flight_delays_data.csv'],  # Upload the flight delays CSV file from ./data
                        target_path='data/',  # Put it in a folder path in the datastore
                        overwrite=True,  # Replace existing files of the same name
                        show_progress=True)

if 'flight_delays_data' not in ws.datasets:
    # Create a tabular dataset from the path on the datastore (this may take a short while)
    csv_path = [(default_ds, 'data/flight_delays_data.csv')]
    tab_data_set = Dataset.Tabular.from_delimited_files(path=csv_path)

    # Register the tabular dataset
    try:
        tab_data_set = tab_data_set.register(workspace=ws,
                                             name='flight_delays_data',
                                             description='flight delays data',
                                             tags={'format': 'CSV'},
                                             create_new_version=True)
        print('Dataset registered.')
    except Exception as ex:
        print(ex)
else:
    print('Dataset already registered.')


###################################
# Feature Engineering
###################################
print(f"\n\n{'#'*50}\n# Feature Engineering \n{'#'*50}")
# Load the registered dataset into pandas and drop rows with missing values.
# Note: this local exploration does not feed the remote training run below,
# which receives the registered dataset rather than these DataFrames.
dataset = ws.datasets.get('flight_delays_data')
dataset = dataset.to_pandas_dataframe().dropna()

# Remove columns that leak the target, plus features that are not useful
target_leakers = ['DepDel15', 'ArrDelay', 'Cancelled', 'Year']
dataset.drop(columns=target_leakers, inplace=True)

# Treat the airport IDs and the label (ArrDel15) as categorical features
columns_as_categorical = ['OriginAirportID', 'DestAirportID', 'ArrDel15']
dataset[columns_as_categorical] = dataset[columns_as_categorical].astype('object')

# Select the categorical (object-dtype) columns with a boolean mask,
# so that only those columns get label-encoded
categorical_feature_mask = dataset.dtypes == object
categorical_cols = dataset.columns[categorical_feature_mask].tolist()
print('Categorical columns:', categorical_cols)

# Label-encode each categorical column. The single encoder is refit for every
# column, so afterwards le.classes_ only holds the mapping of the last column.
le = LabelEncoder()
dataset[categorical_cols] = dataset[categorical_cols].apply(lambda col: le.fit_transform(col))

# Drop any remaining null values
dataset.dropna(inplace=True)

# Time-based split: months 1-8 for training, months 9 and later for testing
train_ds, test_ds = dataset.loc[dataset['Month'] < 9], dataset.loc[dataset['Month'] >= 9]
train_count = train_ds.Month.count()
test_count = test_ds.Month.count()
print('Test data ratio (%):', (test_count / (test_count + train_count)) * 100)


#########################################
# Environment setup
#########################################
print(f"\n\n{'#'*50}\n# Environment setup \n{'#'*50}")
# Create a Python environment for the experiment
flight_delays_env = Environment("flight-delays-experiment-env")
flight_delays_env.python.user_managed_dependencies = False  # Let Azure ML manage dependencies
flight_delays_env.docker.enabled = True  # Use a Docker container

# Create a set of package dependencies (conda or pip as required)
flight_delays_packages = CondaDependencies.create(conda_packages=['scikit-learn'],
                                                  pip_packages=['azureml-defaults', 'azureml-dataprep[pandas]',
                                                                'matplotlib', 'seaborn'])

# Add the dependencies to the environment
flight_delays_env.python.conda_dependencies = flight_delays_packages
print(flight_delays_env.name, 'defined.')

# Register the environment
flight_delays_env.register(workspace=ws)
print(flight_delays_env.name, 'registered.')


##########################################
# Create remote training cluster
##########################################
print(f"\n\n{'#'*50}\n# Create remote training cluster \n{'#'*50}")

cluster_name = "aml-cluster"
try:
    # Get the cluster if it exists
    training_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, using it.')
except ComputeTargetException:
    # If not, create it
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS2_V2', max_nodes=2)
    training_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

training_cluster.wait_for_completion(show_output=True)


############################################################
# Train Model
############################################################
print(f"\n\n{'#'*50}\n# Training Model \n{'#'*50}")
# Get the registered environment
registered_env = Environment.get(ws, 'flight-delays-experiment-env')

# Set the script parameters
script_params = {
    '--regularization': 0.1
}
experiment_folder = 'flight_delays'

# Get the training dataset
flight_delays_ds = ws.datasets.get('flight_delays_data')

# Create an estimator (SKLearn estimators are deprecated in newer SDK v1
# releases in favour of ScriptRunConfig, but still work here)
estimator = SKLearn(source_directory=experiment_folder,
                    inputs=[flight_delays_ds.as_named_input('flight_delays_data')],
                    script_params=script_params,
                    compute_target=cluster_name,  # Run the experiment on the cluster created above
                    environment_definition=registered_env,
                    entry_script='flight_delays_training.py')

# Create an experiment
experiment = Experiment(workspace=ws, name='flight-delays-training')

# Run the experiment and show the run details while it is running
run = experiment.submit(config=estimator)
RunDetails(run).show()
run.wait_for_completion()


#########################################################
# Register the model
#########################################################
print(f"\n\n{'#'*50}\n# Registering the model \n{'#'*50}")
# Fetch the metrics logged by the training script once and attach them to the model
metrics = run.get_metrics()
run.register_model(model_path='outputs/flight_delays_model.pkl', model_name='flight_delays_model',
                   tags={'Training context': 'Parameterized SKLearn Estimator'},
                   properties={'AUC': metrics['AUC'], 'Accuracy': metrics['Accuracy']})
print(f"\n\n{'#'*50}\n# Model registered. \n{'#'*50}")
```
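In the feature-engineering step of the cell above, one `LabelEncoder` is refit for every categorical column, so afterwards `le.classes_` only describes the last column and the mappings for the other columns are lost. That matters if new data has to be encoded the same way at scoring time. A minimal pure-Python sketch of label encoding that keeps one mapping per column; like `LabelEncoder`, it numbers the sorted distinct values of a column from 0. The helper names and the sample rows are made up for illustration.

```python
def fit_label_encoders(rows, columns):
    """Learn one value -> code mapping per column (sorted distinct values numbered from 0)."""
    encoders = {}
    for col in columns:
        classes = sorted({row[col] for row in rows})
        encoders[col] = {value: code for code, value in enumerate(classes)}
    return encoders


def transform(rows, encoders):
    """Encode the categorical columns of each row; raises KeyError for unseen values."""
    encoded = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col, mapping in encoders.items():
            new_row[col] = mapping[row[col]]
        encoded.append(new_row)
    return encoded


# Hypothetical sample rows
flights = [
    {"OriginAirportID": 14747, "DestAirportID": 10397, "Month": 4},
    {"OriginAirportID": 10397, "DestAirportID": 14747, "Month": 4},
    {"OriginAirportID": 13930, "DestAirportID": 10397, "Month": 9},
]
encoders = fit_label_encoders(flights, ["OriginAirportID", "DestAirportID"])
print(transform(flights, encoders))
```

With pandas and scikit-learn, the same idea is a dictionary holding one fitted `LabelEncoder` per column, which can be saved next to the model and reused when scoring.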

## Challenge 2: Create an Entry Script and Execution Environment
We're going to create a web service to host this model. This requires some code and configuration files, so let's create a folder for them.

```python
import os

folder_name = 'flight_delays_service'

# Create a folder for the web service files
experiment_folder = './' + folder_name
os.makedirs(experiment_folder, exist_ok=True)

print(folder_name, 'folder created.')
```

The web service that hosts the model needs some Python code to load the input data, get the model from the workspace, and generate and return predictions. We'll save this code in an *entry script* that is deployed to the web service.

The web service will be hosted in a container, and the container needs to install any required Python dependencies when it is initialized. In this case, our scoring code requires **scikit-learn, matplotlib and seaborn**, so we'll create a .yml file that tells the container host to install these packages into the environment.
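The cell that creates this file is also missing here. A minimal sketch that writes a conda environment file with the standard library. `azureml-defaults` is included because deployed Azure ML environments need it to host the model as a web service; the file name `flight_delays_env.yml` and the environment name are assumptions.

```python
import os

folder_name = 'flight_delays_service'
os.makedirs(folder_name, exist_ok=True)

# Hypothetical file name; the deployment step must point at the same path
env_file = os.path.join(folder_name, 'flight_delays_env.yml')

# Conda specification: scikit-learn from conda, the rest via pip.
# azureml-defaults provides the web-service hosting components.
conda_spec = """name: flight-delays-service-env
dependencies:
  - scikit-learn
  - pip
  - pip:
    - azureml-defaults
    - matplotlib
    - seaborn
"""

with open(env_file, 'w') as f:
    f.write(conda_spec)

print('Saved dependency info in', env_file)
```

Pinning the same scikit-learn version that was used for training helps avoid problems when the pickled model is loaded in the container.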

## Challenge 3: Deploy model to a Web Service hosted on Azure Container Instance (ACI)
When you want to test a model deployment, or if your deployment is very low-scale and CPU-based, Azure Container Instances (ACI) is a good option. This fully managed service is the fastest and most straightforward way to run an isolated container in Azure; because it is fully managed, no cluster management or orchestration is required.

Unlike Azure Kubernetes Service (AKS), where you create the inference cluster before deploying, ACI containers are created on the fly at deployment time. This means you can go straight to deploying to ACI.
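The deployment cell is not part of this copy either. A minimal sketch of an ACI deployment with the v1 SDK (`azureml.core`), wrapped in a hypothetical helper `deploy_to_aci`. It assumes the entry script and conda file from Challenge 2 were saved as `score_flight_delays.py` and `flight_delays_env.yml` in `flight_delays_service`, and it enables key authentication so that the client in Challenge 4 can send a bearer key. The service name and the 1 CPU core / 1 GB sizing are also assumptions.

```python
import os


def deploy_to_aci(ws, service_name='flight-delays-service',
                  folder_name='flight_delays_service',
                  entry_script='score_flight_delays.py',
                  env_file_name='flight_delays_env.yml'):
    """Deploy the registered flight_delays_model to Azure Container Instances.

    All default argument values are assumptions; adjust them to match your files.
    """
    from azureml.core import Environment
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice

    # Build the container environment from the conda file written in Challenge 2
    service_env = Environment.from_conda_specification(
        name='flight-delays-service-env',
        file_path=os.path.join(folder_name, env_file_name))

    # Tell Azure ML which entry script to run and in which environment
    inference_config = InferenceConfig(source_directory=folder_name,
                                       entry_script=entry_script,
                                       environment=service_env)

    # A small ACI container with key-based authentication enabled
    deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1,
                                                           auth_enabled=True)

    model = ws.models['flight_delays_model']  # latest registered version
    service = Model.deploy(ws, service_name, [model], inference_config,
                           deployment_config, overwrite=True)
    service.wait_for_deployment(show_output=True)
    print(service.state)
    return service
```

In the notebook, `service = deploy_to_aci(ws)` starts the deployment; if it fails, `service.get_logs()` returns the container logs.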

## Challenge 4: Consume the Real-time Endpoint

Client applications need the URL of the endpoint to submit their requests to, as well as a key to authenticate with. Let's determine both.
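A minimal sketch, assuming the service was deployed under the hypothetical name `flight-delays-service` with key authentication enabled. `ws.webservices` maps service names to service objects, `scoring_uri` holds the endpoint URL, and `get_keys()` returns the primary and secondary keys.

```python
def get_endpoint_and_key(ws, service_name='flight-delays-service'):
    """Return the scoring URI and primary key of a deployed web service."""
    service = ws.webservices[service_name]  # look the service up by name
    primary_key, _secondary_key = service.get_keys()  # requires auth_enabled=True
    return service.scoring_uri, primary_key
```

In the notebook, `endpoint, primary_key = get_endpoint_and_key(ws)` defines the two variables that the request cell below relies on.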

Once you know the endpoint URI and key, an application can make an HTTP POST request that sends the flight data in JSON (or binary) format and receives the predicted class(es) in return.

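```python
import json

import requests

# endpoint and primary_key must already hold the service's scoring URI and authentication key
x_new = [[4, 19, 5, 4, 18, 36, 837, -3.0, 1138]]

# Convert the array to a serializable list in a JSON document
input_json = json.dumps({"data": x_new})

# Set the content type and the key-based authorization header
request_headers = {"Content-Type": "application/json",
                   "Authorization": "Bearer " + primary_key}

predictions = requests.post(endpoint, data=input_json, headers=request_headers)
predictions.raise_for_status()  # Fail loudly on HTTP errors instead of parsing an error body

# The service returns a JSON-encoded string, so the body is decoded twice
predicted_classes = json.loads(predictions.json())

for features, predicted_class in zip(x_new, predicted_classes):
    print(f"Flight {features} -> {predicted_class}")
```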