# Load test the deployed web application

This notebook pulls some questions from the test set and scores them against the deployed web application. We submit requests asynchronously, which should reduce the contribution of network latency to the total run time.

In [1]:
import asyncio
import json
import random
import urllib.request
from timeit import default_timer

import aiohttp
from tqdm import tqdm
import requests
import pandas as pd

In [2]:
print(aiohttp.__version__) 

3.3.2


We will test our deployed service with 100 calls, with at most 4 requests in flight at any time. We have deployed only one pod on one node, so increasing the number of concurrent calls does not really increase throughput. Feel free to try different values and see how the service responds.

In [3]:
NUMBER_OF_REQUESTS = 100  # Total number of requests
CONCURRENT_REQUESTS = 4   # Number of requests at a time
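
To make the effect of the concurrency limit concrete, here is a minimal sketch that replaces the HTTP call with `asyncio.sleep` and records how many simulated requests are in flight at once. The names `simulated_request`, `measure_peak`, and `tracker` are illustrative and not part of this notebook, and the sketch assumes it is run as a standalone script (inside a notebook with an already running event loop, `asyncio.run` would need to be replaced by `await`).

```python
import asyncio


async def simulated_request(sem, tracker):
    # Stand-in for one HTTP call; the semaphore caps how many run at once.
    async with sem:
        tracker["in_flight"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["in_flight"])
        await asyncio.sleep(0.01)  # pretend to wait on the network
        tracker["in_flight"] -= 1


async def measure_peak(total, limit):
    # Launch `total` simulated requests and report the highest concurrency seen.
    sem = asyncio.Semaphore(limit)
    tracker = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(simulated_request(sem, tracker) for _ in range(total)))
    return tracker["peak"]


peak = asyncio.run(measure_peak(total=20, limit=4))
print(peak)
```

The peak never exceeds the limit passed to the semaphore, which is the role `CONCURRENT_REQUESTS` plays in the cells further down.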

Get the IP address of our service.

In [4]:
service_json = !kubectl get service azure-ml -o json
service_dict = json.loads(''.join(service_json))
app_url = service_dict['status']['loadBalancer']['ingress'][0]['ip']
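
The lookup above assumes the load balancer already has an external IP. While the IP is still pending, the `ingress` list is typically absent and the indexing raises an error. A hedged sketch of a more defensive lookup follows; `extract_ingress_ip` is a hypothetical helper, and the two JSON snippets are trimmed-down, made-up examples rather than real `kubectl` output.

```python
import json


def extract_ingress_ip(service_dict):
    """Return the first load balancer IP, or None if none is assigned yet."""
    status = service_dict.get("status", {})
    ingress = status.get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        return None
    return ingress[0].get("ip")


# Hypothetical examples of the relevant part of `kubectl get service ... -o json`.
ready = json.loads('{"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}')
pending = json.loads('{"status": {"loadBalancer": {}}}')

print(extract_ingress_ip(ready))
print(extract_ingress_ip(pending))
```

Returning `None` lets the caller decide whether to wait and retry instead of failing with a `KeyError`.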

In [5]:
scoring_url = 'http://{}/score'.format(app_url)
version_url = 'http://{}/version'.format(app_url)
health_url = 'http://{}/'.format(app_url)

In [6]:
!curl $health_url

Healthy

In [7]:
!curl $version_url # Reports the lightgbm version

2.1.2

In [8]:
dupes_test_path = 'dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
dupes_to_score = dupes_test.iloc[:NUMBER_OF_REQUESTS,4]

In [9]:
def text_to_json(text):
    return json.dumps({'input':'{0}'.format(text)})
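
As a quick check of what `text_to_json` sends, the sketch below round-trips a made-up question through the function. It also shows one thing to watch for: because the value is passed through `str.format`, a missing value such as `float('nan')` is sent as the literal text `nan`.

```python
import json


def text_to_json(text):
    # Same definition as in the cell above.
    return json.dumps({'input': '{0}'.format(text)})


# The question text here is an invented example, not taken from dupes_test.tsv.
payload = text_to_json('How do I sort an array with "quotes"?')
decoded = json.loads(payload)
print(decoded['input'])

# A missing value is stringified rather than rejected.
nan_payload = text_to_json(float('nan'))
print(nan_payload)
```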

In [10]:
url_list = [[scoring_url, jsontext] for jsontext in dupes_to_score.apply(text_to_json)]

In [11]:
def decode(result):
    return json.loads(result.decode("utf-8"))

In [12]:
async def fetch(url, session, data, headers):
    start_time = default_timer()
    async with session.request('post', url, data=data, headers=headers) as response:
        resp = await response.read()
        elapsed = default_timer() - start_time
        return resp, elapsed

In [13]:
async def bound_fetch(sem, url, session, data, headers):
    # Getter function with semaphore.
    async with sem:
        return await fetch(url, session, data, headers)

In [14]:
async def await_with_progress(coros):
    results=[]
    for f in tqdm(asyncio.as_completed(coros), total=len(coros)):
        result = await f
        results.append((decode(result[0]),result[1]))
    return results
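
Note that `asyncio.as_completed` yields results in the order the requests finish, not the order they were submitted, so `results[0]` is not necessarily the response to the first question. A small sketch with simulated delays illustrates this; `delayed` and `collect_in_completion_order` are illustrative names, and the sketch assumes it runs as a standalone script.

```python
import asyncio


async def delayed(label, delay):
    # Stand-in for a request that takes `delay` seconds to complete.
    await asyncio.sleep(delay)
    return label


async def collect_in_completion_order():
    # The slow task is submitted first but finishes last.
    tasks = [
        asyncio.ensure_future(delayed("slow", 0.05)),
        asyncio.ensure_future(delayed("fast", 0.01)),
    ]
    results = []
    for f in asyncio.as_completed(tasks):
        results.append(await f)
    return results


order = asyncio.run(collect_in_completion_order())
print(order)
```

If responses need to be matched back to their inputs, one option is to have each task return its input index alongside the response.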

In [15]:
async def run(url_list, num_concurrent=CONCURRENT_REQUESTS):
    headers = {'content-type': 'application/json'}
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(num_concurrent)

    # Create a client session so that we don't open a new connection
    # for each request.
    async with aiohttp.ClientSession() as session:
        for url, data in url_list:
            # pass Semaphore and session to every POST request
            task = asyncio.ensure_future(bound_fetch(sem, url, session, data, headers))
            tasks.append(task)
        return await await_with_progress(tasks)

Below we run the 100 requests against our deployed service.

In [16]:
loop = asyncio.get_event_loop()
start_time = default_timer()
complete_responses = loop.run_until_complete(asyncio.ensure_future(run(url_list, num_concurrent=CONCURRENT_REQUESTS)))
elapsed = default_timer() - start_time
print('Total Elapsed {}'.format(elapsed))
print('Avg time taken {0:4.2f} ms'.format(1000*elapsed/len(url_list)))

100%|██████████| 100/100 [00:07<00:00, 14.16it/s]

Total Elapsed 7.0668306006118655
Avg time taken 70.67 ms
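
The average printed above is wall-clock time divided by the number of requests, so with several requests in flight it reflects throughput rather than the latency of a single call. Each entry of `complete_responses` also carries the elapsed time measured inside `fetch`, which can be summarized separately. The sketch below assumes that `(decoded_response, elapsed_seconds)` shape; `summarize_latencies` is a hypothetical helper and the sample timings are invented.

```python
import statistics


def summarize_latencies(responses):
    # Each item is (decoded_response, elapsed_seconds); convert to milliseconds.
    latencies_ms = sorted(1000 * elapsed for _, elapsed in responses)
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": latencies_ms[-1],
    }


# Invented timings standing in for complete_responses.
sample = [({"result": "..."}, 0.25), ({"result": "..."}, 0.5), ({"result": "..."}, 0.75)]
summary = summarize_latencies(sample)
print(summary)
```

In the notebook, `summarize_latencies(complete_responses)` would give the per-request view.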





In [35]:
# Example response
complete_responses[0]

({'result': "([[(27928, 27943, 0.9806319857860937), (1726630, 1726662, 0.039778853781766224), (23667086, 23667087, 0.03596640261893073), (3059044, 3059129, 0.023800730662433985), (1458633, 3439981, 0.016286266565376985), (14220321, 14220323, 0.012335680702969764), (18082, 1830844, 0.012222658506013475), (3384504, 3384534, 0.009659842507799827), (901115, 901144, 0.008119897990625796), (13840429, 13840431, 0.007713470308071854), (3127429, 3127440, 0.006995784296409811), (784929, 784946, 0.006296776815135907), (4616202, 4616273, 0.006055237070916387), (750486, 750506, 0.005344779961669449), (201183, 201471, 0.005244422811033432), (1789945, 1789952, 0.00498123639766631), (1267283, 1267338, 0.004466097982226653), (11922383, 11922384, 0.004321992388219521), (7486085, 7486130, 0.004281227623528716), (149055, 149150, 0.004130159219589328), (5891840, 5891929, 0.0040389688246748575), (171251, 171256, 0.004036971551964096), (3224834, 3224854, 0.003957822472214618), (2901102, 2901298, 0.0039296042

In [30]:
no_questions = len(eval(complete_responses[0][0]['result'])[0][0])

In [33]:
num_successful = [len(eval(i[0]['result'])[0][0]) for i in complete_responses].count(no_questions)
print('Successful {} out of {}'.format(num_successful, len(url_list)))

Successful 100 out of 100
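
The cells above use `eval` to turn the `result` string back into Python objects. Since that string comes from a network response, a safer alternative is `ast.literal_eval`, which only accepts literals. The sketch below assumes the string has the same outer shape as the example response (a tuple whose first element is a nested list of tuples); `count_candidates` is a hypothetical helper, and the sample string, including its trailing field, is made up.

```python
import ast


def count_candidates(result_text):
    # Parse the literal and count the tuples in the innermost list.
    parsed = ast.literal_eval(result_text)
    return len(parsed[0][0])


# Invented string mimicking the shape of the 'result' field.
sample_result = "([[(1, 2, 0.9), (3, 4, 0.1)]], 'hypothetical trailing field')"
print(count_candidates(sample_result))

# Anything that is not a plain literal is rejected instead of executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```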


To tear down the cluster and all related resources, go to the last section of the [deploy on AKS notebook](04_DeployOnAKS.ipynb).