# Test deployed web application

This notebook takes a few duplicate questions and scores them against the web application deployed on AKS.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import json

from utilities import text_to_json

Get the external IP address of the web application running on the AKS cluster.

In [2]:
service_json = !kubectl get service azure-ml -o json
service_dict = json.loads(''.join(service_json))
app_url = service_dict['status']['loadBalancer']['ingress'][0]['ip']
app_url

'137.117.41.76'
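The lookup above indexes straight into `status.loadBalancer.ingress`, which raises a `KeyError` if the load balancer has not been assigned an address yet. A minimal sketch of a more defensive lookup follows; the helper name `external_ip` and the trimmed-down sample documents are assumptions made for illustration, not part of the original notebook.

```python
import json


def external_ip(service_json_text):
    """Return the first load-balancer ingress IP, or None if none is assigned yet."""
    service = json.loads(service_json_text)
    # While the external IP is still pending, 'ingress' is typically absent.
    ingress = service.get('status', {}).get('loadBalancer', {}).get('ingress') or []
    if not ingress:
        return None
    return ingress[0].get('ip')


# Hypothetical, heavily trimmed service descriptions used only for illustration.
sample = json.dumps({'status': {'loadBalancer': {'ingress': [{'ip': '137.117.41.76'}]}}})
pending = json.dumps({'status': {'loadBalancer': {}}})

print(external_ip(sample))
print(external_ip(pending))
```

In the notebook, the argument would be `''.join(service_json)`; a `None` result suggests waiting and re-running the `kubectl` cell.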

Quickly check if the web application is working.

In [3]:
scoring_url = 'http://{}/score'.format(app_url)
version_url = 'http://{}/version'.format(app_url)
health_url = 'http://{}/'.format(app_url)

In [5]:
!curl $health_url

Healthy

In [6]:
!curl $version_url # Reports the lightgbm version

2.1.2

Let's use one of the duplicate questions to test our web service.

In [7]:
dupes_test_path = 'dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
text_to_score = dupes_test.iloc[0,4]
text_to_score

"javascript arrays as objects.  possible duplicate: length of javascript object (ie. associative array) loop through javascript object    i'm trying to make an array, where each item has some name and value. the code above doesn't work. tryed to make an object, but it doesn't have a length property - no for loop.  is it possible to use arrays in this context?"

In [8]:
jsontext = text_to_json(text_to_score)
jsontext[:100]

'{"input": "javascript arrays as objects.  possible duplicate: length of javascript object (ie. assoc'
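The implementation of `text_to_json` lives in `utilities` and is not shown here. Judging only from the output above, it appears to wrap the text in a JSON object under an `input` key. The sketch below reproduces that shape; `text_to_json_sketch` is a hypothetical stand-in, and the real helper may do more (for example, text cleaning).

```python
import json


def text_to_json_sketch(text):
    """Wrap raw question text in a JSON object under the 'input' key (assumed format)."""
    return json.dumps({'input': text})


question = 'javascript arrays as objects.'
payload = text_to_json_sketch(question)
print(payload)

# The payload round-trips back to the original text.
print(json.loads(payload)['input'] == question)
```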

In [9]:
headers = {'content-type': 'application/json'}
# Send the request twice: the first call takes a little longer because the model is being loaded.
r = requests.post(scoring_url, data=jsontext, headers=headers)
%time r = requests.post(scoring_url, data=jsontext, headers=headers)
print(r)
r.json()

CPU times: user 1.08 ms, sys: 818 µs, total: 1.9 ms
Wall time: 207 ms
<Response [200]>


{'result': [[[11922383, 11922384, 0.4618297837830524],
   [750486, 750506, 0.10187547714586409],
   [5223, 6700, 0.0638246129572683],
   [85992, 86014, 0.04240792341938436],
   [4255472, 4255480, 0.026144231768798966],
   [684672, 684692, 0.02453308688418635],
   [126100, 4889658, 0.016040256778800346],
   [6491463, 6491621, 0.013490367187470405],
   [1584370, 1584377, 0.011515613015373344],
   [171251, 171256, 0.009396552126284062],
   [4616202, 4616273, 0.009032739115879984],
   [1885557, 1885660, 0.008370193039577081],
   [111102, 111111, 0.0071304530036020745],
   [2274242, 2274327, 0.005718410320988048],
   [7364150, 7364307, 0.0034374790002485965],
   [19590865, 19590901, 0.0034328185480412433],
   [14028959, 8716680, 0.00309644289435943],
   [840781, 840808, 0.002855357692136627],
   [20279484, 20279485, 0.002639894956650536],
   [695050, 695053, 0.002557405523119375],
   [8495687, 8495740, 0.002151802183642379],
   [7837456, 14853974, 0.0016722794698331785],
   [12953704, 12953

Let's try a few more duplicate questions and display their top 3 original matches.

In [10]:
dupes_to_score = dupes_test.iloc[:5,4]

In [11]:
results = [requests.post(scoring_url, data=text_to_json(text), headers=headers) for text in dupes_to_score]

Let's print the top 3 matches for each duplicate question.

In [12]:
[r.json()['result'][0][:3] for r in results]

[[[11922383, 11922384, 0.4618297837830524],
  [750486, 750506, 0.10187547714586409],
  [5223, 6700, 0.0638246129572683]],
 [[14220321, 14220323, 0.8221692467477977],
  [11922383, 11922384, 0.6581886437186476],
  [5223, 6700, 0.6432238154719481]],
 [[14220321, 14220323, 0.9645868077283122],
  [27928, 27943, 0.9222570573208747],
  [13840429, 13840431, 0.7895954858417572]],
 [[27928, 27943, 0.8253809764787108],
  [14220321, 14220323, 0.35839553716891404],
  [23667086, 23667087, 0.2802815688638847]],
 [[203198, 1207393, 0.989876477409501],
  [152975, 153047, 0.5379549302867603],
  [12829963, 12830031, 0.22370925120587182]]]
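Slicing the first three entries works because the service appears to return the triples sorted by descending score. If that ordering cannot be relied on, sorting explicitly is a cheap safeguard. The sketch below assumes each triple is two IDs followed by a score, which matches the output above but is not documented here; `top_matches` is a hypothetical helper.

```python
def top_matches(payload, k=3):
    """Return the k highest-scoring triples from a parsed scoring response."""
    triples = payload['result'][0]
    # Sort by the third element (assumed to be the match score), highest first.
    return sorted(triples, key=lambda t: t[2], reverse=True)[:k]


# Values copied from the response above, deliberately shuffled for the example.
sample = {'result': [[[750486, 750506, 0.10187547714586409],
                      [11922383, 11922384, 0.4618297837830524],
                      [85992, 86014, 0.04240792341938436],
                      [5223, 6700, 0.0638246129572683]]]}

for first_id, second_id, score in top_matches(sample):
    print(first_id, second_id, round(score, 3))
```

In the notebook this would be called as `top_matches(r.json())` for each response in `results`.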

Next, let's quickly check the request-response latency of the deployed model on the AKS cluster.

In [13]:
text_data = list(map(text_to_json, dupes_to_score)) # Convert each question to a JSON payload

In [14]:
timer_results = []
for text in text_data:
    # -r 1: a single repeat; -o: return the timing result object; -q: suppress printed output
    res = %timeit -r 1 -o -q requests.post(scoring_url, data=text, headers=headers)
    timer_results.append(res.best)

In [15]:
timer_results

[0.30018480587750673,
 0.2426607357338071,
 0.2597067950293422,
 0.2025966914370656,
 0.23379836697131395]

In [16]:
print('Average time taken: {0:4.2f} ms'.format(10**3 * np.mean(timer_results)))

Average time taken: 247.79 ms
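With only five samples the mean is a rough indicator, so it can help to look at the median and the worst case as well. The sketch below uses the timings recorded above and the standard library only; `summarize_latency` is a hypothetical helper, not part of the original notebook.

```python
import statistics


def summarize_latency(seconds):
    """Summarise request timings (given in seconds) as milliseconds."""
    return {
        'mean_ms': 1000 * statistics.mean(seconds),
        'median_ms': 1000 * statistics.median(seconds),
        'max_ms': 1000 * max(seconds),
    }


# Timings copied from the timer_results output above.
timings = [0.30018480587750673,
           0.2426607357338071,
           0.2597067950293422,
           0.2025966914370656,
           0.23379836697131395]

summary = summarize_latency(timings)
for name, value in summary.items():
    print('{}: {:.2f}'.format(name, value))
```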


We have tested that the model works and we can now move on to the [next notebook to get a sense of its throughput](07_Speed_Test_WebApp.ipynb).