# Deploy Document Classification Custom Skill

This tutorial shows how to deploy a document classification custom skill for Cognitive Search. It uses the document classifier created by *01_Train_AML_Model.ipynb*; if you have not already done so, run that notebook first.

For more information on using custom skills with Cognitive Search, see the [custom skill interface documentation](https://docs.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface).

### 0.0 Important Variables you need to set for this tutorial

Enter your Azure Machine Learning workspace name, subscription ID, and resource group below, along with the settings for the Azure Kubernetes Service (AKS) cluster.


In [None]:
# Machine Learning Service Workspace configuration
my_workspace_name = ''
my_azure_subscription_id = ''
my_resource_group = ''

# Azure Kubernetes Service configuration
my_aks_location = 'eastus'
my_aks_compute_target_name = 'aks-comptarget'
my_aks_service_name = 'aks-service'     
my_leaf_domain_label = 'ssl1'   # web service url prefix
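
An empty setting usually surfaces later as a confusing Azure error, so a quick check before connecting can save time. The sketch below uses a hypothetical helper, `missing_settings` (not part of the Azure ML SDK); the dictionary mirrors the variable names from the cell above, with the defaults shown there.

```python
def missing_settings(settings):
    """Return the names of settings whose values are empty or whitespace only."""
    return [name for name, value in settings.items() if not str(value).strip()]

# In the notebook you would fill these from the variables defined above.
settings = {
    "my_workspace_name": "",
    "my_azure_subscription_id": "",
    "my_resource_group": "",
    "my_aks_location": "eastus",
}
missing = missing_settings(settings)
if missing:
    print("Please set:", ", ".join(missing))
```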

### 1.0 Import Packages

In [None]:
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import numpy as np

import azureml
from azureml.core import Workspace, Run

# display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

### 2.0 Connect to Workspace
Create a workspace object. If you already have a config.json file for your workspace, you can use `ws = Workspace.from_config()` instead.

In [None]:
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.get(name = my_workspace_name, resource_group = my_resource_group, subscription_id = my_azure_subscription_id)
print(ws.name, ws.location, ws.resource_group, sep = '\t')

### 3.0 Register Model
The training notebook saved the trained classifier to `outputs/newsgroup_classifier.pkl`.

Register the model in the workspace so that you (or other collaborators) can query, examine, and deploy this model.

In [None]:
model_name="newsgroup_classifier"

model = Model.register(model_path="outputs/newsgroup_classifier.pkl",
                        model_name=model_name,
                        tags={"data": "newsgroup", "document": "classification"},
                        description="document classifier for newsgroup20",
                        workspace=ws)

print(model.id)

### 4.0 Create Scoring Script
Create the scoring script, score.py, which the web service runs to apply the model to incoming requests.

The scoring script must define two functions:
- `init()`, which typically loads the model into a global object. It runs only once, when the Docker container starts.
- `run(input_data)`, which uses the model to predict a value for the input data. Inputs and outputs are typically serialized as JSON, but other formats are supported.

*The **run function** has been tailored to deploy the model as a custom skill: its inputs and outputs follow the Cognitive Search custom skill interface, and errors are returned per record in a format Cognitive Search can use.*

In [None]:
%%writefile score.py
import json
import numpy as np
import pandas as pd
import os
import pickle
import joblib

from azureml.core.model import Model

def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path(model_name='newsgroup_classifier')
    model = joblib.load(model_path)
    
def convert_to_df(my_dict):
    df = pd.DataFrame(my_dict["values"])
    data = df['data'].tolist()
    index = df['recordId'].tolist()
    return pd.DataFrame(data, index = index)

def run(raw_data):
    data = json.loads(raw_data)
    # Converting the input dictionary to a dataframe
    try:
        df = convert_to_df(data)
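    # Return an error message for each item in the batch if the data is not in the correct format
    except Exception:
        # pd.DataFrame(data) has no 'recordId' column, so read the IDs from the raw records instead
        records = data.get('values') if isinstance(data, dict) else None
        records = records if isinstance(records, list) else []
        index = [r.get('recordId') for r in records if isinstance(r, dict)]
        message = "Request for batch is not in correct format"
        output_list = [{'recordId': i, 'data': {}, "errors": [{'message': message}]} for i in index]
        return {'values': output_list}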
    
    output_list = []
    for index, row in df.iterrows():
        output = {'recordId': index, 'data': {}}
        try:
            output['data']['type'] = str(model.predict([row['content']])[0])
        # Returning exception if an error occurs
        except Exception as ex:
            output['errors'] = [{'message': str(ex)}]
        output_list.append(output)

    return {'values': output_list}    
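
Before deploying, it can help to exercise the custom skill request/response contract locally. The sketch below approximates the per-record logic of `run()` using only the standard library and a stub model, so it runs without Azure ML, pandas, or the trained classifier. `StubModel` and `run_local` are hypothetical stand-ins, not part of score.py; the stub's keyword rule is arbitrary and only exists to produce a label.

```python
import json

class StubModel:
    """Stand-in for the trained classifier: labels text mentioning 'orbit' as sci.space."""
    def predict(self, texts):
        return ["sci.space" if "orbit" in t.lower() else "comp.graphics" for t in texts]

def run_local(raw_data, model):
    """Approximate score.py's run(): one output record per input record, errors reported per record."""
    try:
        records = json.loads(raw_data)["values"]
    except (ValueError, KeyError, TypeError):
        # Malformed body: no recordIds can be recovered, so return an empty batch.
        return {"values": []}
    output_list = []
    for record in records:
        output = {"recordId": record.get("recordId"), "data": {}}
        try:
            output["data"]["type"] = str(model.predict([record["data"]["content"]])[0])
        except Exception as ex:
            # Report the failure for this record only; the rest of the batch still succeeds.
            output["errors"] = [{"message": str(ex)}]
        output_list.append(output)
    return {"values": output_list}

request = json.dumps({"values": [
    {"recordId": "0", "data": {"content": "The shuttle reached orbit."}},
    {"recordId": "1", "data": {}},  # missing 'content' triggers a per-record error
]})
print(json.dumps(run_local(request, StubModel()), indent=2))
```

Swapping `StubModel()` for the real model loaded with `joblib.load` would test actual predictions with the same harness.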

### 5.0 Create Environment and Inference Configuration

In [None]:
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core import Environment

pip = ["azureml-defaults", "scikit-learn", "pandas", "joblib"]
conda_deps = CondaDependencies.create(conda_packages=None, pip_packages=pip)

myenv = Environment(name='myenv')
myenv.python.conda_dependencies = conda_deps

In [None]:
from azureml.core.model import InferenceConfig

inf_config = InferenceConfig(entry_script='score.py', environment=myenv)

### 6.0 Create the Azure Kubernetes Service Compute Target
Estimated time to complete: about 10 minutes

Provision an Azure Kubernetes Service (AKS) cluster to host the web service. SSL is enabled because Cognitive Search only accepts HTTPS endpoints for custom skills.

In [None]:
# create AKS compute target
from azureml.core.compute import ComputeTarget, AksCompute

config = AksCompute.provisioning_configuration(location= my_aks_location)
config.enable_ssl(leaf_domain_label= my_leaf_domain_label, overwrite_existing_domain=True)

aks = ComputeTarget.create(ws, my_aks_compute_target_name, config)
aks.wait_for_completion(show_output=True)

# if you already have an AKS cluster, you can attach it instead of creating a new one:
#config = AksCompute.attach_configuration(resource_group= my_resource_group, cluster_name='enter cluster name here')
#config.enable_ssl(leaf_domain_label= my_leaf_domain_label, overwrite_existing_domain=True)
#aks = ComputeTarget.attach(ws, my_aks_compute_target_name, config)
#aks.wait_for_completion(show_output=True)

print(aks.ssl_configuration.cname, aks.ssl_configuration.status)

### 7.0 Define the Deployment Configuration

In [None]:
from azureml.core.webservice import AksWebservice, Webservice

# If deploying to a cluster configured for dev/test, ensure that it was created with enough
# cores and memory to handle this deployment configuration. Note that memory is also used by
# dependencies and AML components.

aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True, 
                                                autoscale_min_replicas=1, 
                                                autoscale_max_replicas=3, 
                                                autoscale_refresh_seconds=10, 
                                                autoscale_target_utilization=70,
                                                auth_enabled=True, 
                                                cpu_cores=1, memory_gb=2, 
                                                scoring_timeout_ms=5000, 
                                                replica_max_concurrent_requests=2, 
                                                max_request_wait_time=5000)
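
As a rough worked example of the capacity these settings allow, assuming `replica_max_concurrent_requests` caps concurrency per replica as its name suggests: with 1 to 3 replicas and at most 2 concurrent requests each, the service scores between 2 and 6 requests in parallel, and additional requests wait in a queue for up to `max_request_wait_time` (5000 ms) before being rejected. `concurrent_capacity` is a hypothetical helper, not part of the SDK.

```python
def concurrent_capacity(replicas, max_concurrent_per_replica):
    """Upper bound on requests scored in parallel across all replicas."""
    return replicas * max_concurrent_per_replica

# Values taken from aks_config above
min_capacity = concurrent_capacity(replicas=1, max_concurrent_per_replica=2)
max_capacity = concurrent_capacity(replicas=3, max_concurrent_per_replica=2)
print(f"Concurrent requests: {min_capacity} (1 replica) to {max_capacity} (3 replicas)")
```

Under this assumption, raising `autoscale_max_replicas` or `replica_max_concurrent_requests` raises the upper bound.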

### 8.0 Deploy a web service
Deploy the model as a web service on the AKS cluster, then retrieve the web service's HTTPS scoring endpoint and the key used to call it.

In [None]:
from azureml.core.model import Model

document_classifier = Model(ws, model_name)

# deploy the model as an AKS web service using the inference and deployment configurations

service = Model.deploy(workspace=ws,
                       name=my_aks_service_name,
                       models=[document_classifier],
                       inference_config=inf_config,
                       deployment_config=aks_config,
                       deployment_target=aks,
                       overwrite=True)


service.wait_for_deployment(show_output = True)

primary, secondary = service.get_keys()
print('Scoring Uri: ' + service.scoring_uri)
print('Primary key: ' + primary)
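
Because Cognitive Search only calls custom skills over HTTPS (see section 6.0), it is worth confirming the scoring URI before adding it to a skillset. A minimal standard-library sketch; `is_https_endpoint` is a hypothetical helper, and the URIs below are placeholders rather than real service addresses.

```python
from urllib.parse import urlparse

def is_https_endpoint(uri):
    """True if the URI uses the https scheme and names a host."""
    parsed = urlparse(uri)
    return parsed.scheme == "https" and bool(parsed.netloc)

# In the notebook you would pass service.scoring_uri instead of these placeholders.
print(is_https_endpoint("https://example.com/api/v1/service/aks-service/score"))
print(is_https_endpoint("http://example.com/score"))
```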

### 9.0 Test Deployed Service

#### 9.1 Import 20newsgroups Test Dataset

In [None]:
from sklearn.datasets import fetch_20newsgroups

categories = ['comp.graphics', 'sci.space']
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)

X_test = newsgroups_test.data
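# map numeric targets to category names (target_names follows the label order used by target)
y_test = [newsgroups_test.target_names[x] for x in newsgroups_test.target]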

#### 9.2 Format Data in Correct Structure for Cognitive Search
For more information on custom skills see this [link](https://docs.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface).

In [None]:
# send a random row from the test set to score (randint's upper bound is exclusive)
random_index = np.random.randint(0, len(X_test))

input_data = {"values":[{"recordId": "0", "data": {"content": newsgroups_test.data[random_index]}}]}
print(input_data)
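
The request above contains a single record, but score.py processes a list of records per request, each identified by its own `recordId`. To test that batch handling, a sketch of a hypothetical helper, `build_skill_request`, that wraps a list of document texts in the same structure; record IDs are simply the list positions as strings.

```python
import json

def build_skill_request(documents):
    """Wrap raw document texts in the custom skill request format."""
    return {
        "values": [
            {"recordId": str(i), "data": {"content": text}}
            for i, text in enumerate(documents)
        ]
    }

# Placeholder documents; in the notebook you could pass X_test[:5] instead.
request = build_skill_request(["First document text.", "Second document text."])
print(json.dumps(request, indent=2))
```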

#### 9.3 Send HTTP Request and View Results

In [None]:
import requests
import json

input_json = json.dumps(input_data)

headers = { 'Content-Type':'application/json'}
headers['Authorization']= f'Bearer {primary}'

# AKS services deployed with auth_enabled=True expect the service key as a Bearer token, as set above

resp = requests.post(service.scoring_uri, input_json, headers=headers)

print("POST to url", service.scoring_uri)
print("label:", y_test[random_index])
print("prediction:", resp.text)
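
`resp.text` shows the raw JSON. To compare predictions with labels programmatically, the response can be split into per-record predictions and errors. A sketch with a hypothetical helper, `parse_skill_response`, assuming the response shape produced by `run()` in score.py (`values` entries with `recordId`, `data.type`, and optional `errors`); the sample body is hand-written for illustration, not real service output.

```python
import json

def parse_skill_response(body):
    """Split a custom skill response into {recordId: label} and {recordId: [error messages]}."""
    predictions, errors = {}, {}
    for record in json.loads(body).get("values", []):
        record_id = record.get("recordId")
        if record.get("errors"):
            errors[record_id] = [e.get("message", "") for e in record["errors"]]
        else:
            predictions[record_id] = record.get("data", {}).get("type")
    return predictions, errors

# Hand-written example body; in the notebook you would pass resp.text.
sample = json.dumps({"values": [
    {"recordId": "0", "data": {"type": "sci.space"}},
    {"recordId": "1", "data": {}, "errors": [{"message": "Request for batch is not in correct format"}]},
]})
predictions, errors = parse_skill_response(sample)
print(predictions)  # {'0': 'sci.space'}
print(errors)
```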

### 10.0 Integrate the Custom Skill

In [None]:
print('Scoring Uri: ' + service.scoring_uri)
print('Primary key: ' + primary)

Nice work! You're now ready to add the custom skill to your skillset. 

Add the following skill to your skillset, replacing `<your-scoring-uri>` and `<your-primary-key>` with the values printed above:

```json
{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "A document classification custom skill",
    "uri": "<your-scoring-uri>",
    "httpHeaders": {
        "Authorization": "Bearer <your-primary-key>"
    },
    "batchSize": 1,
    "context": "/document",
    "inputs": [
      {
        "name": "content",
        "source": "/document/content"
      }
    ],
    "outputs": [
      {
        "name": "type",
        "targetName": "type"
      }
    ]
}
```
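
Rather than pasting the URI and key by hand, the same skill definition can be generated from the values printed above. A standard-library sketch; `build_skill_definition` is a hypothetical helper, and the URI and key below are placeholders.

```python
import json

def build_skill_definition(scoring_uri, primary_key, batch_size=1):
    """Return the WebApiSkill definition shown above as a Python dict."""
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "description": "A document classification custom skill",
        "uri": scoring_uri,
        "httpHeaders": {"Authorization": f"Bearer {primary_key}"},
        "batchSize": batch_size,
        "context": "/document",
        "inputs": [{"name": "content", "source": "/document/content"}],
        "outputs": [{"name": "type", "targetName": "type"}],
    }

# In the notebook: build_skill_definition(service.scoring_uri, primary)
skill = build_skill_definition("https://example.com/score", "<your-primary-key>")
print(json.dumps(skill, indent=2))
```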

Don't forget to also add an [output field mapping](https://docs.microsoft.com/azure/search/cognitive-search-output-field-mapping) to your indexer so that the skill's `type` output is written to the `type` field of the search index:

```json
{
  "sourceFieldName": "/document/type",
  "targetFieldName": "type"
}
```