# Test the deployed web application

In this notebook, we test the web application locally by running it in a container started from the Docker image we built previously.

In [1]:
import numpy as np
import requests
import pandas as pd
import json

from utilities import text_to_json

In [2]:
docker_login = 'fboylu'
image_name = docker_login + '/mlaksdep'

Run the Docker container in the background and publish port 80.

In [3]:
%%bash --bg -s "$image_name"
docker run -p 80:80 $1

Wait a few seconds for the application to spin up and then check that everything works.

In [2]:
!curl "http://0.0.0.0:80/"

Healthy

In [3]:
!curl "http://0.0.0.0:80/version"

2.1.2
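Instead of waiting a fixed few seconds, the health check can be polled until the service answers. The following is a minimal sketch, not part of the original notebook: `wait_until_ready` and its `check`, `timeout`, and `interval` parameters are hypothetical names introduced for illustration.

```python
import time


def wait_until_ready(check, timeout=30.0, interval=0.5):
    """Call check() repeatedly until it returns a truthy value.

    Returns True as soon as check() succeeds, or False once `timeout`
    seconds have passed. Exceptions raised by check() (for example a
    refused connection while the container is still starting) are
    treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # Service is not accepting connections yet.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In this notebook, the check could be `lambda: requests.get('http://0.0.0.0:80/').ok`, which is truthy once the health endpoint responds with a successful status code.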

Now, let's use one of the duplicate questions to test our driver.

In [4]:
dupes_test_path = 'dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
text_to_score = dupes_test.iloc[0,4]
text_to_score

'check length of multidimensional arrays with javascript.  possible duplicate: length of javascript associative array   i want to check the length of a multidimensional array but i get "undefined" as the return. i\'m assuming that i am doing something wrong with my code but i can\'t see anything odd about it.  thoughts? could this have something to do with scope? the array is declared and set outside of the function. could this have something to do with json? i created the array from an eval() statement. why does the dummy array work just fine?'

In [6]:
jsontext = text_to_json(text_to_score)
jsontext[:100]

'{"input": "check length of multidimensional arrays with javascript.  possible duplicate: length of j'
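The output above suggests that `text_to_json` wraps the text in a JSON object with a single `input` key. The real helper lives in `utilities` and is not shown here, so the function below is only an assumed equivalent under the hypothetical name `text_to_json_sketch`.

```python
import json


def text_to_json_sketch(text):
    """Assumed stand-in for utilities.text_to_json.

    Serializes the text as {"input": <text>}; json.dumps takes care of
    escaping quotes and other special characters in the question body.
    """
    return json.dumps({'input': text})
```

Using `json.dumps` rather than string concatenation matters here because the questions contain quotes and apostrophes that would otherwise produce an invalid request body.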

In [7]:
headers = {'content-type': 'application/json'}
%time r = requests.post('http://0.0.0.0:80/score', data=jsontext, headers=headers)
print(r)
r.json()

CPU times: user 5.19 ms, sys: 805 µs, total: 5.99 ms
Wall time: 46.3 ms
<Response [200]>


{'result': [[[14220321, 14220323, 0.7865442773746284],
   [23667086, 23667087, 0.466772423845678],
   [111102, 111111, 0.12491377361611108],
   [750486, 750506, 0.08508909345471119],
   [5223, 6700, 0.059068876328743365],
   [11922383, 11922384, 0.0479570594875083],
   [1885557, 1885660, 0.025266629671571887],
   [3127429, 3127440, 0.018891165795017057],
   [19590865, 19590901, 0.012545292402904666],
   [500431, 500459, 0.01089350367598299],
   [2421911, 2421949, 0.010131126805885545],
   [5767325, 5767357, 0.008165211185900259],
   [1451009, 1451043, 0.005940954791771819],
   [4255472, 4255480, 0.005158237254440684],
   [1584370, 1584377, 0.004937012694565438],
   [20279484, 20279485, 0.004803617362417362],
   [6491463, 6491621, 0.004235244442278276],
   [951021, 951057, 0.0037432571359726405],
   [85992, 86014, 0.0029606498940158263],
   [237104, 1473742, 0.0029091044017371024],
   [1359469, 1359507, 0.0028757792274005027],
   [950087, 950146, 0.0027901909903594924],
   [8228281, 822

Let's try a few more duplicate questions and display their top 3 original matches.

In [8]:
dupes_to_score = dupes_test.iloc[:5,4]

In [9]:
url = 'http://0.0.0.0:80/score'
results = [requests.post(url, data=text_to_json(text), headers=headers) for text in dupes_to_score]

Let's print the top 3 matches for each duplicate question.

In [10]:
[r.json()['result'][0][:3] for r in results]

[[[14220321, 14220323, 0.7865442773746284],
  [23667086, 23667087, 0.466772423845678],
  [111102, 111111, 0.12491377361611108]],
 [[5223, 6700, 0.7397060379835241],
  [4616202, 4616273, 0.6256758901283033],
  [11922383, 11922384, 0.103836658977746]],
 [[14220321, 14220323, 0.965174975937841],
  [27928, 27943, 0.9644578504536946],
  [1069666, 1069840, 0.016080867202196546]],
 [[14220321, 14220323, 0.25864867021509896],
  [750486, 750506, 0.07433038465049964],
  [1771786, 1771824, 0.04370333110706829]],
 [[31044, 31047, 0.9714481598389331],
  [2631001, 2631198, 0.2401596039527375],
  [750486, 750506, 0.10560619440762473]]]
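Each match in the response is a triple of two integer IDs and a score. The sketch below turns the parsed response into labeled records. It assumes, without confirmation from this notebook, that the two integers are the original question's ID and its answer's ID; `top_matches` and the keys `question_id`, `answer_id`, and `score` are hypothetical names.

```python
def top_matches(payload, k=3):
    """Return the k highest-scoring matches from a parsed /score response.

    `payload` is the dict returned by r.json(), shaped like
    {'result': [[[id, id, score], ...]]}. The meaning attached to the
    two IDs below is an assumption, not something the service documents.
    """
    rows = payload['result'][0]
    # Sort explicitly so the result does not depend on the server's ordering.
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return [
        {'question_id': q, 'answer_id': a, 'score': s}
        for q, a, s in ranked[:k]
    ]
```

For example, `top_matches(results[0].json())` would give the three best candidates for the first duplicate question as dictionaries rather than bare lists.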

Next, let's measure the request latency of the locally running Docker container.

In [11]:
text_data = list(map(text_to_json, dupes_to_score)) # Convert each question to a JSON request body

In [12]:
timer_results = []
for text in text_data:
    # -r 1: one repeat; -o: return the TimeitResult; -q: suppress printed output
    res = %timeit -r 1 -o -q requests.post(url, data=text, headers=headers)
    timer_results.append(res.best)

In [13]:
timer_results

[0.04046588679775596,
 0.03142287442460656,
 0.038978354539722204,
 0.03583723120391369,
 0.03897201484069228]

In [14]:
print('Average time taken: {0:4.2f} ms'.format(10**3 * np.mean(timer_results)))

Average time taken: 37.14 ms
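With only five measurements, the mean alone can hide an outlier, so it may help to report the median and range as well. This is a small sketch using the standard library; `summarize_latencies` is a hypothetical helper, not something the notebook defines.

```python
import statistics


def summarize_latencies(seconds):
    """Summarize per-request timings (given in seconds) in milliseconds."""
    ms = [1000.0 * s for s in seconds]
    return {
        'mean_ms': statistics.mean(ms),
        'median_ms': statistics.median(ms),
        'min_ms': min(ms),
        'max_ms': max(ms),
    }
```

Calling `summarize_latencies(timer_results)` on the timings above yields the same mean as the cell above, alongside the median, fastest, and slowest request.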


In [15]:
%%bash
# Note: this stops every running container on the machine, not only the one started above.
docker stop $(docker ps -q)

b50cafe01592


We can now [deploy our web application](05_Deploy_On_AKS.ipynb) on AKS.