# Develop Model Driver

In this notebook, we develop the API that calls our model. The module initializes the model, transforms the input into the appropriate format, and defines the scoring method that produces the predictions. The API expects its input in JSON format. When a request is received, the API uses the request body to score the question text. The API has two main functions: the first loads the model and returns a scoring function; the second processes the question text and uses the first function to score it.

In [1]:
import pandas as pd
import logging
import json

We use the %%writefile magic to write the contents of the cell below to driver.py, which contains the driver methods.

In [2]:
%%writefile driver.py

import lightgbm as lgb
import timeit as t
import logging
from duplicate_model import DuplicateModel

model_path = 'model.pkl'
questions_path = 'questions.tsv'
logger = logging.getLogger("model_driver")

def create_scoring_func():
    """ Initialize Model Object 
    """   
    start = t.default_timer()
    DM = DuplicateModel(model_path, questions_path)
    end = t.default_timer()
    
    loadTimeMsg = "Model object loading time: {0} ms".format(round((end-start)*1000, 2))
    logger.info(loadTimeMsg)
    
    def call_model(text):
        preds = DM.score(text)  
        return preds
    
    return call_model

def get_model_api():
    """ Load the model once and return a function that scores input text
    """
    # The module-level logger defined above is reused here.
    scoring_func = create_scoring_func()
    
    def process_and_score(inputString):
        """ Classify the input using the loaded model
        """
        start = t.default_timer()

        # Wrap the predictions in a list so the response is a list of results.
        responses = [scoring_func(inputString)]

        # Measure the elapsed time once and reuse it in the log and the reply.
        elapsed_ms = round((t.default_timer() - start) * 1000, 2)
        
        logger.info("Predictions: {0}".format(responses))
        logger.info("Predictions took {0} ms".format(elapsed_ms))
        return (responses, "Computed in {0} ms".format(elapsed_ms))
    return process_and_score

def version():
    return lgb.__version__

Overwriting driver.py
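
The driver relies on a closure: the model object is built once, and the returned function keeps a reference to it across calls. The following is a minimal sketch of that pattern only. StubModel is a hypothetical stand-in for DuplicateModel (it loads no files and returns a made-up score), and the logger name model_driver_sketch is likewise an assumption chosen for illustration.

```python
import logging
import timeit as t

logger = logging.getLogger("model_driver_sketch")  # hypothetical logger name


class StubModel:
    """Hypothetical stand-in for DuplicateModel; it is not the real class."""

    instances = 0  # counts how many times the model is constructed

    def __init__(self, model_path, questions_path):
        # A real model would load files here; the stub only records the paths.
        StubModel.instances += 1
        self.model_path = model_path
        self.questions_path = questions_path

    def score(self, text):
        # Made-up result shaped like one entry of the driver's predictions.
        return [(1, 2, 0.5)]


def create_scoring_func(model_cls=StubModel):
    start = t.default_timer()
    model = model_cls("model.pkl", "questions.tsv")  # constructed exactly once
    elapsed_ms = round((t.default_timer() - start) * 1000, 2)
    logger.info("Model object loading time: %s ms", elapsed_ms)

    def call_model(text):
        # The closure keeps `model` alive between calls, so nothing is reloaded.
        return model.score(text)

    return call_model


score = create_scoring_func()
first = score("how do i parse json?")
second = score("what is a closure?")
```

Calling score twice constructs the stub only once, which is why the loading-time message is logged when the scoring function is created and not on each prediction.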


Let's test the module.

In [3]:
logging.basicConfig(level=logging.DEBUG)

Running driver.py brings its imports and functions into the notebook's namespace.

In [4]:
%run driver.py

Now, let's use one of the duplicate questions to test our driver.

In [5]:
dupes_test_path = 'dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
text_to_score = dupes_test.iloc[0,4]
text_to_score

'best way to json parsing using javascript.  possible duplicate: length of a javascript object (that is, associative array)  i have json in the following format  i tried using (data.student[i].length), which did not work (just to see what the length of the object is), and i also tried (data.user[i]) to no avail. i am basically very confused on how i can get the length of one of the objects in the json and how i can effectively display it. how can i do this?'

Here, we define a helper function that converts our text to the JSON format required by the Flask application that will use the functions in the driver.

In [6]:
def text_to_json(text):
    return json.dumps({'input': str(text)})

In [7]:
jsontext = text_to_json(text_to_score)
json_load_text = json.loads(jsontext)
body = json_load_text['input']
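
The round trip above mirrors what a web handler would do with a request body. As a rough sketch of that step using only the standard library, the function below parses a JSON string, extracts the input field, and passes it to a scoring function. The names handle_request and fake_predict, the error message, and the result and timing keys in the reply are all assumptions made for illustration; the notebook does not show the actual Flask application.

```python
import json


def handle_request(raw_body, predict):
    """Hypothetical handler: parse the JSON body and score its 'input' field."""
    try:
        payload = json.loads(raw_body)
        text = payload["input"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed JSON, a missing key, or a non-object payload all land here.
        return json.dumps({"error": "expected a JSON object with an 'input' field"})

    # `predict` is expected to return (responses, timing_message),
    # the same shape that process_and_score returns in driver.py.
    predictions, timing = predict(text)
    return json.dumps({"result": predictions, "timing": timing})


def fake_predict(text):
    # Made-up scorer used only so the sketch runs without the real model.
    return ([[(1, 2, 0.5)]], "Computed in 0 ms")


reply = json.loads(handle_request(json.dumps({"input": "some question"}), fake_predict))
bad = json.loads(handle_request("not json", fake_predict))
```

Note that JSON has no tuple type, so the prediction tuples come back as lists after serialization.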

In [8]:
predict_for = get_model_api()

INFO:model_driver:Model object loading time: 347.89 ms


In [9]:
resp = predict_for(body)

INFO:model_driver:Predictions: [[(11922383, 11922384, 0.984836747218761), (5223, 6700, 0.26940691358778396), (126100, 4889658, 0.15567350448589748), (4935632, 4935684, 0.08335138920111888), (171251, 171256, 0.007303045770366217), (19590865, 19590901, 0.006725269198891133), (684672, 684692, 0.0013128070346465882), (7364150, 7364307, 0.0011194899608325862), (4616202, 4616273, 0.0009635338081120141), (14220321, 14220323, 0.0005954235787411567), (979256, 979289, 0.0004864703339893272), (12953704, 12953750, 0.00045879016038158983), (901115, 901144, 0.0002602969237632822), (122102, 122704, 0.00016247621327218932), (45015, 5686237, 0.00016141301883164434), (1129216, 1129270, 8.615037671166863e-05), (1068834, 1144249, 7.799950428193527e-05), (6491463, 6491621, 6.776432884270919e-05), (85992, 86014, 4.427277240582091e-05), (728360, 728694, 2.1073103785470865e-05), (359788, 359910, 1.625803284659823e-05), (7837456, 14853974, 1.5838816951097937e-05), (1527803, 1527820, 1.5495191431020978e-05), (2
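
The value returned by predict_for is a tuple holding the list of responses and a timing message. A small sketch of how one might filter it is shown below, using the first three entries copied from the log above. Treating the third element of each entry as a score to threshold is an assumption, as are the helper name top_matches, the 0.5 cutoff, and the placeholder timing string.

```python
# First three entries copied from the logged predictions; the rest are omitted.
responses = [[
    (11922383, 11922384, 0.984836747218761),
    (5223, 6700, 0.26940691358778396),
    (126100, 4889658, 0.15567350448589748),
]]
resp = (responses, "Computed in 0 ms")  # timing string is a placeholder


def top_matches(resp, threshold=0.5):
    """Hypothetical helper: keep entries whose third element meets the threshold."""
    all_responses, _timing = resp
    preds = all_responses[0]  # the driver wraps a single prediction list
    return [entry for entry in preds if entry[2] >= threshold]


best = top_matches(resp)
```

With these three entries and a 0.5 cutoff, only the first entry remains in best.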

Next, we move on to building our Docker image.