Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Develop Scoring Script

In this notebook, we develop the scoring script and test it locally. The web service that we create later uses this 
script to call the model for scoring.

In [None]:
import sys
import pandas as pd

from azure_utils.utilities import text_to_json, get_auth
from azureml.core.model import Model
from azureml.core.workspace import Workspace
from dotenv import get_key, find_dotenv

In [None]:
sys.path.append('./scripts/')

In [None]:
env_path = find_dotenv(raise_error_if_not_found=True)

Let's load the workspace.

In [None]:
ws = Workspace.from_config(auth=get_auth(env_path))
print(ws.name, ws.resource_group, ws.location, sep="\n")

Let's retrieve the model registered earlier and download it.

In [None]:
model_name = 'question_match_model'
model_version = int(get_key(env_path, 'model_version'))
model = Model(ws, name=model_name, version=model_version)
print(model.name, model.version, model.url, sep="\n")

In [None]:
model.download(target_dir=".", exist_ok=True)

## Create Scoring Script

We use the `%%writefile` magic to write the contents of the cell below to `score.py`, which defines the `init` and `run` 
functions required by Azure Machine Learning (AML).
- The `init()` function typically loads the model into a global object.
- The `run(input_data)` function uses the model to predict a value based on `input_data`.
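
To make the `init`/`run` contract concrete before writing the real script, here is a minimal sketch of the same pattern. `EchoModel` is a hypothetical stand-in used only for illustration; it is not part of this project, and the real script loads `DuplicateModel` instead.

```python
import json


class EchoModel:
    """Hypothetical stand-in for a real model (illustration only)."""

    def score(self, text):
        return {"length": len(text)}


def init():
    # Load the model once and keep it in a module-level global.
    global model
    model = EchoModel()


def run(body):
    # Parse the JSON request, score the text, and return a JSON string.
    payload = json.loads(body)
    return json.dumps(model.score(payload["input"]))


init()
print(run(json.dumps({"input": "hello"})))  # prints {"length": 5}
```

Because the model lives in a global that `init()` sets once, each call to `run` only pays for parsing and scoring.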

In [None]:
%%writefile score.py

import pandas as pd
import json
from duplicate_model import DuplicateModel
import logging
import timeit as t

def init():
    """Load the model once at service start-up and keep it in a global."""
    logger = logging.getLogger("scoring_script")
    global model
    model_path = "model.pkl"
    questions_path = "./data_folder/questions.tsv"
    # Time the model load so that its cost shows up in the logs.
    start = t.default_timer()
    model = DuplicateModel(model_path, questions_path)
    end = t.default_timer()
    load_time_msg = "Model loading time: {0} ms".format(round((end - start) * 1000, 2))
    logger.info(load_time_msg)


def run(body):
    """Score the text stored under the "input" key of the JSON request body."""
    logger = logging.getLogger("scoring_script")
    json_load_text = json.loads(body)
    text_to_score = json_load_text["input"]
    # Time only the model call, not the JSON parsing.
    start = t.default_timer()
    resp = model.score(text_to_score)
    end = t.default_timer()
    logger.info("Prediction took {0} ms".format(round((end - start) * 1000, 2)))
    return json.dumps(resp)

Let's test the script by running `score.py`, which brings its imports and functions into the notebook's namespace.

In [None]:
%run score.py

In [None]:
dupes_test_path = './data_folder/dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
# Use the value in the fifth column of the first row as the text to score.
text_to_score = dupes_test.iloc[0, 4]
print(text_to_score)

Now, call `init()` to initialize the model.

In [None]:
init()
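
The timing messages in `init()` and `run()` are logged at INFO level. With Python's default logging configuration, only WARNING and above are shown, so these messages may not appear in the notebook output. The following is a minimal sketch of one way to surface them, together with the same `default_timer` timing pattern that the scoring script uses. `timed_ms` is a hypothetical helper introduced here for illustration.

```python
import logging
import timeit

# Show INFO-level records; this has no effect if the root logger
# already has handlers configured.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scoring_script")


def timed_ms(fn, *args):
    """Call fn(*args); return its result and the elapsed time in ms."""
    start = timeit.default_timer()
    result = fn(*args)
    elapsed_ms = round((timeit.default_timer() - start) * 1000, 2)
    logger.info("Call took %s ms", elapsed_ms)
    return result, elapsed_ms


total, elapsed = timed_ms(sum, [1, 2, 3])
```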

We convert the question text to JSON format and make a prediction.

In [None]:
json_text = text_to_json(text_to_score)
r = run(json_text)
r
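
Since `run` reads the text from the `"input"` key of the request body, a request can also be built by hand. This is a sketch under the assumption that `text_to_json` produces an equivalent payload; `to_request_body` is a hypothetical helper, not the function from `azure_utils`.

```python
import json


def to_request_body(text):
    # Hypothetical helper: wrap the text under the "input" key that run() reads.
    return json.dumps({"input": text})


body = to_request_body("How do I sort a list?")
print(json.loads(body)["input"])  # prints How do I sort a list?
```

Because `run` returns a JSON string, `json.loads(r)` converts the result back into Python objects for inspection.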

Next, we move on to [creating the Docker image that we will deploy](04_CreateImage.ipynb).