# Load Test deployed web application

This notebook load tests the deployed web application by sending it sample questions from the test set. We submit requests asynchronously, which should reduce the contribution of latency to the measured throughput.

In [2]:
import os
from timeit import default_timer
import pandas as pd

from azureml.core.webservice import AksWebservice
from azureml.core.workspace import Workspace
from dotenv import get_key, find_dotenv
from utilities import get_auth
from urllib.parse import urlparse


In [3]:
env_path = find_dotenv(raise_error_if_not_found=True)

In [None]:
ws = Workspace.from_config(auth=get_auth(env_path))
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep="\n")

Let's retrieve the web service.

In [5]:
aks_service_name = get_key(env_path, 'aks_service_name')
aks_service = AksWebservice(ws, name=aks_service_name)

We will test our service with at most 4 concurrent requests at any time. Because we have deployed only one pod on one node, increasing the number of concurrent calls does not really increase throughput. Feel free to try different values and see how the service responds.

In [None]:
CONCURRENT_REQUESTS = 4   # Number of requests at a time

Get the scoring URL and API key of the service.

In [7]:
scoring_url = aks_service.scoring_uri
api_key = aks_service.get_keys()[0]
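The API key is sent to the service as a bearer token, in the same way as the locustfile further down does it. A minimal sketch of the headers, assuming key-based authentication; the `build_headers` helper and the placeholder key are hypothetical and used only for illustration:

```python
def build_headers(api_key):
    """Return the HTTP headers for a key-authenticated scoring request."""
    return {
        "content-type": "application/json",
        "Authorization": "Bearer {}".format(api_key),
    }


# Placeholder key for illustration only; in this notebook you would pass api_key.
headers = build_headers("not-a-real-key")
print(headers)
```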

Below we are going to use [Locust](https://locust.io/) to load test our deployed model. First we need to write the locustfile.

In [9]:
%%writefile locustfile.py
from locust import HttpLocust, TaskSet, task
import os
import pandas as pd
from utilities import text_to_json
from itertools import cycle


_NUMBER_OF_REQUESTS = int(os.getenv('NUMBER_OF_REQUESTS', 100))  # env vars arrive as strings
dupes_test_path = './data_folder/dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
dupes_to_score = dupes_test.iloc[:_NUMBER_OF_REQUESTS,4]
_SCORE_PATH = os.getenv('SCORE_PATH', "/score")
_API_KEY = os.getenv('API_KEY')


class UserBehavior(TaskSet):
    def on_start(self):
        print('Running setup')
        self._text_generator = cycle(dupes_to_score.apply(text_to_json))
        self._headers = {
            "content-type": "application/json",
            "Authorization": "Bearer {}".format(_API_KEY),
        }
        
    @task
    def score(self):
        self.client.post(_SCORE_PATH, data=next(self._text_generator), headers=self._headers)


class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    # min and max time to wait before repeating task
    min_wait = 10
    max_wait = 200

Overwriting locustfile.py
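The locustfile wraps the payloads in `itertools.cycle` so that each simulated user can keep sending requests for the whole test, even after it has gone through every row once. A minimal sketch of that pattern; the payload strings and their `{"input": ...}` shape are made up for illustration and are not the real output of `text_to_json`:

```python
from itertools import cycle

# Hypothetical stand-ins for the JSON payloads built from dupes_test.tsv.
payloads = ['{"input": "q1"}', '{"input": "q2"}', '{"input": "q3"}']

# cycle() yields the items in order and restarts from the first one
# once the sequence is exhausted, so next() never raises StopIteration.
text_generator = cycle(payloads)

first_five = [next(text_generator) for _ in range(5)]
print(first_five)
```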


Below we define the Locust command we want to run. We hatch users at a rate of 10 per second and the whole test lasts 1 minute. Feel free to adjust the parameters below and see how the results differ. The results of the test will be saved to two CSV files: **modeltest_requests.csv** and **modeltest_distribution.csv**.

In [10]:
parsed_url = urlparse(scoring_url)
cmd = "locust -H {host} --no-web -c {users} -r {rate} -t {duration} --csv=modeltest --only-summary".format(
    host="{url.scheme}://{url.netloc}".format(url=parsed_url),
    users=CONCURRENT_REQUESTS,  # concurrent users
    rate=10,                    # hatch rate (users / second)
    duration='1m',              # test duration
)
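Locust takes the host (scheme and network location) through `-H`, while the request path is handed to the locustfile separately through the `SCORE_PATH` environment variable in the next cell. The sketch below shows how `urlparse` splits a scoring URI into those two parts. The URL is a made-up example: the address comes from a range reserved for documentation and is not the real service.

```python
from urllib.parse import urlparse

# Hypothetical scoring URI, for illustration only.
example_url = "http://203.0.113.10/api/v1/service/askservice/score"
parsed = urlparse(example_url)

# Host part, as passed to `locust -H`.
host = "{url.scheme}://{url.netloc}".format(url=parsed)

# Path part, as passed to the locustfile via SCORE_PATH.
score_path = parsed.path

print(host)
print(score_path)
```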

In [11]:
! API_KEY={api_key} SCORE_PATH={parsed_url.path} PYTHONPATH={os.path.abspath('../')} {cmd}

[2019-05-07 18:14:59,817] msvole2/INFO/locust.main: Run time limit set to 60 seconds
[2019-05-07 18:14:59,817] msvole2/INFO/locust.main: Starting Locust 0.11.0
[2019-05-07 18:14:59,817] msvole2/INFO/locust.runners: Hatching and swarming 4 clients at the rate 10 clients/s...
[2019-05-07 18:14:59,818] msvole2/INFO/stdout: Running setup
[2019-05-07 18:14:59,818] msvole2/INFO/stdout: 
[2019-05-07 18:15:00,108] msvole2/INFO/stdout: Running setup
[2019-05-07 18:15:00,108] msvole2/INFO/stdout: 
[2019-05-07 18:15:00,385] msvole2/INFO/stdout: Running setup
[2019-05-07 18:15:00,386] msvole2/INFO/stdout: 
[2019-05-07 18:15:00,665] msvole2/INFO/stdout: Running setup
[2019-05-07 18:15:00,665] msvole2/INFO/stdout: 
[2019-05-07 18:15:00,945] msvole2/INFO/locust.runners: All locusts hatched: WebsiteUser: 4
[2019-05-07 18:15:59,817] msvole2/INFO/locust.main: Time limit reached. Stopping Locust.
[2019-05-07 18:15:59,818] msvole2/INFO/locust.main: Shutting down (exit code 0), bye.
[2019-05-07 18:15:59,81

Here are the summary results of our test and, below that, the distribution information for those tests.

In [12]:
pd.read_csv("modeltest_requests.csv")

Unnamed: 0,Method,Name,# requests,# failures,Median response time,Average response time,Min response time,Max response time,Average Content Size,Requests/s
0,POST,/api/v1/service/askservice/score,806,0,180,184,86,436,181,13.62
1,,Total,806,0,180,184,86,436,181,13.62


In [13]:
pd.read_csv("modeltest_distribution.csv")

Unnamed: 0,Name,# requests,50%,66%,75%,80%,90%,95%,98%,99%,100%
0,POST /api/v1/service/askservice/score,806,180,200,210,220,250,270,290,310,440
1,Total,806,180,200,210,220,250,270,290,310,440
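As a rough cross-check, the reported throughput can be compared with a back-of-the-envelope estimate. The sketch below assumes each simulated user alternates between one request and one pause, and that pauses are spread evenly between `min_wait` and `max_wait`. Both are simplifying assumptions, not something the test output confirms.

```python
users = 4                      # CONCURRENT_REQUESTS
avg_response_s = 184 / 1000    # average response time from modeltest_requests.csv
min_wait_s = 10 / 1000         # min_wait from the locustfile
max_wait_s = 200 / 1000        # max_wait from the locustfile

# Assumed mean pause between tasks (midpoint of the wait range).
mean_wait_s = (min_wait_s + max_wait_s) / 2

# Each user completes one request per (response time + pause).
cycle_time_s = avg_response_s + mean_wait_s
estimated_rps = users / cycle_time_s

print(round(estimated_rps, 2))
```

The estimate comes out at roughly 13.8 requests per second, close to the 13.62 reported above. This suggests that, at this concurrency, the pauses between tasks limit throughput almost as much as the service's response time does.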


To tear down the cluster and all related resources go to the [tear down the cluster](07_TearDown.ipynb) notebook.