# AI Platform Prediction Load Testing using Locust

This notebook demonstrates how to perform load testing of AI Platform Prediction using [Locust](https://locust.io), an open-source load testing tool.


### Load testing environment

The following diagram depicts the load testing environment used in this example.

![Test harness](images/aipp-locust.png)


In this environment, Locust runs in distributed mode on a GKE cluster. The Locust master and workers are deployed to the cluster as Kubernetes [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) using a custom Docker image derived from the baseline [locustio/locust](https://hub.docker.com/r/locustio/locust) image. The custom image incorporates the [locustfile](locust/locust-image/tasks.py) script and its dependencies.

The script simulates calls to the `predict` method of the AI Platform Prediction REST endpoint. The parameters of the method (project, model, model version, and signature) and the test instances passed in the method's body are retrieved from a Cloud Storage location at the start of each test.
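As an illustration of what each simulated call contains, the sketch below assembles the URL and JSON body of a single `predict` request from a test configuration and one test data item, using the formats described later in this notebook. It is a hypothetical helper, not the code in `tasks.py`. It assumes the global `ml.googleapis.com/v1` endpoint and leaves out authentication (a real call also needs an OAuth 2.0 bearer token, for example one obtained through `google.auth`). The project and version values in the usage lines are placeholders.

```python
import json
from typing import Tuple

# Assumption: the global AI Platform Prediction endpoint (v1 REST API).
PREDICT_URL_TEMPLATE = (
    "https://ml.googleapis.com/v1/projects/{project}"
    "/models/{model}/versions/{version}:predict"
)


def build_predict_request(test_config: dict, test_item: dict) -> Tuple[str, str]:
    """Hypothetical helper: return the URL and JSON body of one predict call.

    `test_config` holds `project_id`, `model`, and `version`; `test_item`
    holds a list of `instances` and the model `signature` to call.
    """
    url = PREDICT_URL_TEMPLATE.format(
        project=test_config["project_id"],
        model=test_config["model"],
        version=test_config["version"],
    )
    body = {
        "instances": test_item["instances"],
        # Selects which serving signature of the model handles the request
        "signature_name": test_item["signature"],
    }
    return url, json.dumps(body)


url, body = build_predict_request(
    {"project_id": "my-project", "model": "ResNet101", "version": "v1"},
    {"signature": "serving_preprocess", "instances": [{"b64": "/9j/"}]},
)
print(url)
print(body)
```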

In addition to simulating requests, the script logs test statistics managed by the Locust master to [Cloud Logging](https://cloud.google.com/logging). 
The log entries created by the script are used to define a set of [Log-based metrics](https://cloud.google.com/logging/docs/logs-based-metrics) that complement standard [AI Platform Prediction metrics](https://cloud.google.com/monitoring/api/metrics_gcp#gcp-ml). 

Load tests can be configured, started, and stopped using Locust's [web interface](https://docs.locust.io/en/stable/quickstart.html#locust-s-web-interface). The web interface is enabled on the Locust master and exposed through a Kubernetes [Service](https://kubernetes.io/docs/concepts/services-networking/service/) configured as an external load balancer.

The progress of the tests can be monitored using [Locust's web interface](https://docs.locust.io/en/stable/quickstart.html#locust-s-web-interface) and/or a Cloud Monitoring [dashboard](https://cloud.google.com/monitoring/dashboards). The advantage of a Cloud Monitoring dashboard is that it can combine AI Platform Prediction metrics with custom Locust log-based metrics. You can find an example dashboard template in the `dashboard_template` folder.

After a test completes, the test's metrics are retrieved from Cloud Monitoring and consolidated into a Pandas dataframe to facilitate comprehensive post-mortem analysis.  The `04-analyze-test.ipynb` notebook demonstrates how to use Pandas and Matplotlib to analyze and interpret test runs. 


## Install prerequisites

In [None]:
%pip install -U locust "google-cloud-monitoring<2.0.0" "google-cloud-logging<2.0.0" "google-cloud-monitoring-dashboards<2.0.0"

**You may need to restart the kernel to use updated packages!**

In [1]:
import base64
import os
import time
import datetime
import json
import requests

import numpy as np
import pandas as pd

import google.auth

from typing import List

from google.api_core.exceptions import GoogleAPICallError 

from google.cloud import logging_v2
from google.cloud.logging_v2 import MetricsServiceV2Client
from google.cloud.logging_v2 import LoggingServiceV2Client

from google.cloud.monitoring_dashboard.v1.types import Dashboard
from google.cloud.monitoring_dashboard.v1 import DashboardsServiceClient
from google.cloud.monitoring_v3 import MetricServiceClient
from google.cloud.monitoring_v3.query import Query
from google.cloud.monitoring_v3.types import TimeInterval

from google.protobuf.json_format import ParseDict



## Configuring and deploying the test environment

### Creating log-based metrics

In this section of the notebook you will use the [Python Cloud Logging client library](https://googleapis.dev/python/logging/latest/v2.html) to create a set of custom log-based metrics. The metrics are based on the log entries generated by the example locustfile script. The script writes the log entries into the Cloud Logging log named `locust`.

Each log entry includes a set of key-value pairs encoded as a JSON payload. The metrics are based on a subset of the keys in the log entry.

Key | Value
----|------
test_id | The ID of a test
model | The AI Platform Prediction model name
model_version | The AI Platform Prediction model version
latency | The 95th percentile response time, calculated over a 10-second sliding window
num_requests | The total number of requests since the test started
num_failures | The total number of failed requests since the test started
user_count | The number of simulated users
rps | The current number of requests per second
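To make the shape of these entries concrete, here is a hypothetical `jsonPayload` built from the keys in the table, plus the `signature` key that `create_locust_metric` uses as a label. The values are made up, the `p95_latency_ms` stats key is an illustrative name, and `extract_field` is a stand-in for what a value extractor such as `EXTRACT(jsonPayload.latency)` selects, not a Cloud Logging API.

```python
import json


def make_log_payload(test_id, model, model_version, signature, stats):
    """Hypothetical helper: build the jsonPayload of one `locust` log entry."""
    return {
        "test_id": test_id,
        "model": model,
        "model_version": model_version,
        "signature": signature,
        "latency": stats["p95_latency_ms"],  # 95th percentile over a 10-second window
        "num_requests": stats["num_requests"],
        "num_failures": stats["num_failures"],
        "user_count": stats["user_count"],
        "rps": stats["rps"],
    }


def extract_field(log_entry: dict, field: str):
    """Mimic EXTRACT(jsonPayload.<field>) for a single, non-nested field."""
    return log_entry["jsonPayload"][field]


entry = {
    "logName": "projects/my-project/logs/locust",
    "jsonPayload": make_log_payload(
        "test-3-2020-08-13", "ResNet101", "batching_150", "serving_preprocess",
        {"p95_latency_ms": 235.0, "num_requests": 184, "num_failures": 0,
         "user_count": 8, "rps": 4.9},
    ),
}
print(json.dumps(entry, indent=2))
print(extract_field(entry, "latency"))
```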


Refer to the [Cloud Logging API reference](https://googleapis.dev/python/logging/latest/v2.html) for more information about the API.

#### Define a helper function that creates a custom log metric

In [2]:
from google.api_core.exceptions import NotFound


def create_locust_metric(
    metric_name: str,
    log_path: str,
    value_field: str,
    bucket_bounds: List[int]):
    """Creates a log-based DISTRIBUTION metric from a jsonPayload field.

    The metric extracts `value_field` from the log entries written to
    `log_path` and is labeled with the `test_id` and `signature` fields.
    Relies on the `logging_client`, `project_id`, and `parent` globals
    defined in the next cell.
    """
    metric_path = logging_client.metric_path(project_id, metric_name)
    log_entry_filter = 'resource.type=global AND logName={}'.format(log_path)
    
    metric_descriptor = {
        'metric_kind': 'DELTA',
        'value_type': 'DISTRIBUTION',
        'labels': [
            {
                'key': 'test_id',
                'value_type': 'STRING'
            },
            {
                'key': 'signature',
                'value_type': 'STRING'
            }
        ]
    }
    
    bucket_options = {
        'explicit_buckets': {
            'bounds': bucket_bounds
        }
    }
    
    value_extractor = 'EXTRACT(jsonPayload.{})'.format(value_field)
    label_extractors = {
        'test_id': 'EXTRACT(jsonPayload.test_id)',
        'signature': 'EXTRACT(jsonPayload.signature)'
    }
    
    metric = logging_v2.types.LogMetric(
        name=metric_name,
        filter=log_entry_filter,
        value_extractor=value_extractor,
        bucket_options=bucket_options,
        label_extractors=label_extractors,
        metric_descriptor=metric_descriptor,
    )
    
    try:
        logging_client.get_log_metric(metric_path)
        print('Metric: {} already exists'.format(metric_path))
    except NotFound:
        # Create the metric only if it does not exist; other API errors,
        # such as permission errors, are raised instead of being masked.
        logging_client.create_log_metric(parent, metric)
        print('Created metric {}'.format(metric_path))

#### Create a logging client.

In [3]:
log_name = 'locust'

creds, project_id = google.auth.default()
logging_client = MetricsServiceV2Client(credentials=creds)

parent = logging_client.project_path(project_id)
log_path = LoggingServiceV2Client.log_path(project_id, log_name)

#### Create a metric to track Locust users.

In [4]:
metric_name = 'locust_users'
value_field = 'user_count'
bucket_bounds = [1, 16, 32, 64, 128]

create_locust_metric(metric_name, log_path, value_field, bucket_bounds)

Metric: projects/mlops-dev-env/metrics/locust_users already exists


#### Create a metric to track response times.

In [5]:
metric_name = 'locust_latency'
value_field = 'latency'
bucket_bounds = [1, 50, 100, 200, 500]

create_locust_metric(metric_name, log_path, value_field, bucket_bounds)

Metric: projects/mlops-dev-env/metrics/locust_latency already exists
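The `bucket_bounds` argument defines explicit histogram buckets for the distribution metric: N bounds produce N + 1 buckets, including an underflow bucket below the first bound and an overflow bucket above the last one. The sketch below maps a value to its bucket index, assuming the convention that each bucket includes its lower bound and excludes its upper bound. It illustrates the bucketing only; it is not code that Cloud Logging runs.

```python
from bisect import bisect_right
from typing import List


def bucket_index(bounds: List[float], value: float) -> int:
    """Index of the explicit bucket that `value` falls into.

    With N sorted bounds there are N + 1 buckets:
      0           -> (-inf, bounds[0])           underflow
      i (1..N-1)  -> [bounds[i-1], bounds[i])
      N           -> [bounds[N-1], +inf)         overflow
    """
    # bisect_right counts the bounds that are <= value, which is the bucket index
    return bisect_right(bounds, value)


latency_bounds = [1, 50, 100, 200, 500]
for latency_ms in [0.5, 1, 75, 200, 1207.5]:
    print(latency_ms, "->", bucket_index(latency_bounds, latency_ms))
```

With these bounds, every latency above 500 ms, such as 1207.5 ms, shares the single overflow bucket.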


#### Create a metric to track total failures

In [6]:
metric_name = 'num_failures'
value_field = 'num_failures'
bucket_bounds = [1, 1000]

create_locust_metric(metric_name, log_path, value_field, bucket_bounds)

Metric: projects/mlops-dev-env/metrics/num_failures already exists


#### Create a metric to track total requests

In [7]:
metric_name = 'num_requests'
value_field = 'num_requests'
bucket_bounds = [1, 1000]

create_locust_metric(metric_name, log_path, value_field, bucket_bounds)

Metric: projects/mlops-dev-env/metrics/num_requests already exists


#### List metrics

In [8]:
# Materialize the pager once so that the metrics are listed with a single pass
metrics = list(logging_client.list_log_metrics(parent))

if not metrics:
    print("There are no log-based metrics defined in the project")
else:
    for element in metrics:
        print(element.metric_descriptor.name)

projects/mlops-dev-env/metricDescriptors/logging.googleapis.com/user/locust_latency
projects/mlops-dev-env/metricDescriptors/logging.googleapis.com/user/locust_users
projects/mlops-dev-env/metricDescriptors/logging.googleapis.com/user/num_failures
projects/mlops-dev-env/metricDescriptors/logging.googleapis.com/user/num_requests


### Creating a Cloud Monitoring dashboard

The `dashboard_template` folder contains an example monitoring dashboard template that combines standard AI Platform Prediction metrics with the log-based metrics defined in the previous steps. You can use the [Python Client for Cloud Monitoring Dashboards API](https://googleapis.dev/python/monitoring-dashboards/latest/index.html) to create a dashboard based on the template.
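If you want to build or extend a template programmatically instead of editing JSON by hand, the sketch below assembles a minimal dashboard dictionary with two XY-chart widgets. The field names follow the Dashboards API v1 JSON representation as we understand it (`gridLayout`, `widgets`, `xyChart`, `dataSets`, `timeSeriesQuery`, `timeSeriesFilter`); the display name, widget titles, and aligners are assumptions, so compare the result against `dashboard_template/aipp-monitoring.json` before relying on it. A dictionary like this could be parsed with `ParseDict` the same way the notebook parses the template.

```python
import json


def xy_chart_widget(title: str, metric_type: str, aligner: str) -> dict:
    """Build one XY-chart widget that plots a single metric type per minute."""
    return {
        "title": title,
        "xyChart": {
            "dataSets": [
                {
                    "timeSeriesQuery": {
                        "timeSeriesFilter": {
                            "filter": 'metric.type="{}"'.format(metric_type),
                            "aggregation": {
                                "alignmentPeriod": "60s",
                                "perSeriesAligner": aligner,
                            },
                        }
                    },
                    "plotType": "LINE",
                }
            ]
        },
    }


# Hypothetical two-widget template combining an AIPP metric and a Locust metric
sketch_template = {
    "displayName": "AIPP and Locust (sketch)",
    "gridLayout": {
        "columns": 2,
        "widgets": [
            xy_chart_widget(
                "Predictions per second",
                "ml.googleapis.com/prediction/prediction_count",
                "ALIGN_RATE",
            ),
            xy_chart_widget(
                "Locust client latency",
                "logging.googleapis.com/user/locust_latency",
                "ALIGN_PERCENTILE_50",
            ),
        ],
    },
}
print(json.dumps(sketch_template, indent=2))
```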

#### Load the dashboard template

In [11]:
dashboard_service_client = DashboardsServiceClient(credentials=creds)
parent = 'projects/{}'.format(project_id)

dashboard_template_file = 'dashboard_template/aipp-monitoring.json'
with open(dashboard_template_file) as f:
    dashboard_template = json.load(f)

#### Prepare a dashboard protobuf

In [10]:
dashboard_proto = Dashboard()
dashboard_proto = ParseDict(dashboard_template, dashboard_proto)

#### Create the dashboard in Cloud Monitoring

In [None]:
dashboard = dashboard_service_client.create_dashboard(parent, dashboard_proto)

#### List custom dashboards

In [12]:
for dashboard in dashboard_service_client.list_dashboards(parent):
    print('Dashboard name: {}, Dashboard ID: {}'.format(dashboard.display_name, dashboard.name))

Dashboard name: test, Dashboard ID: projects/881178567352/dashboards/0d899081-d606-417d-9b0d-772ebc737dd2
Dashboard name: AI Platform Prediction and Locust, Dashboard ID: projects/881178567352/dashboards/bd5e5d15-a726-4097-ba84-45d526040a1e


### Deploying Locust to a GKE cluster

Before proceeding, you need access to a GKE cluster. The described deployment process can deploy Locust to any GKE cluster, as long as there are enough compute resources to support your Locust configuration. The default configuration follows Locust's best practices and requests one processor core and 4Gi of memory for the Locust master, and one processor core and 2Gi of memory for each Locust worker. As you run your tests, it is important to monitor the resource utilization of the master and the workers and fine-tune the allocated resources as required.

The deployment process has been streamlined using [Kustomize](https://kustomize.io/). As described in the following steps, you can fine-tune the baseline configuration by modifying the default `kustomization.yaml` and `patch.yaml` files in the `locust/manifests` folder.
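To size the cluster, you can add up the requests described above. The sketch below computes the totals for one master and a given number of workers (the `kustomization.yaml` shown later defaults to 32), plus a rough lower bound on the node count. The per-node allocatable CPU and memory are inputs you have to look up for your machine type; the 7.5 cores and 26 GiB used in the example are hypothetical, and the bound ignores system pods and per-pod bin-packing, so treat it as a minimum.

```python
import math
from typing import Tuple

# Per-pod resource requests described in the text
MASTER_CPU, MASTER_MEM_GI = 1, 4
WORKER_CPU, WORKER_MEM_GI = 1, 2


def locust_requests(num_workers: int) -> Tuple[int, int]:
    """Total CPU cores and GiB of memory requested by the master plus the workers."""
    cpu = MASTER_CPU + num_workers * WORKER_CPU
    mem = MASTER_MEM_GI + num_workers * WORKER_MEM_GI
    return cpu, mem


def min_nodes(num_workers: int, node_cpu: float, node_mem_gi: float) -> int:
    """Lower bound on node count, given allocatable CPU and memory per node."""
    cpu, mem = locust_requests(num_workers)
    return max(math.ceil(cpu / node_cpu), math.ceil(mem / node_mem_gi))


print(locust_requests(32))
print(min_nodes(32, node_cpu=7.5, node_mem_gi=26))
```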



#### Install Kustomize

The configuration files depend on the latest version of Kustomize. 

##### Download Kustomize:

In [13]:
!curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash

{Version:kustomize/v3.8.1 GitCommit:0b359d0ef0272e6545eda0e99aacd63aef99c4d0 BuildDate:2020-07-16T00:58:46Z GoOs:linux GoArch:amd64}
kustomize installed to current directory.


##### Move Kustomize to a folder on your path
The following command moves the Kustomize executable to `/usr/local/bin`. Modify the command if you prefer to move it to another directory on your `PATH`.

In [14]:
!sudo mv kustomize /usr/local/bin

#### Set credentials to access your GKE cluster

Use the `gcloud` command to set credentials for your GKE cluster. Make sure to update the `cluster_name` and `cluster_zone` variables with values that reflect your environment.

In [15]:
cluster_name = 'locust'
cluster_zone = 'us-central1-a'

!gcloud container clusters get-credentials {cluster_name} --zone {cluster_zone}

Fetching cluster endpoint and auth data.
kubeconfig entry generated for locust.


#### Build the Locust image

The first step is to build a Docker image that will be used to deploy the Locust master and worker pods. The image is derived from the [baseline locust.io image](https://hub.docker.com/r/locustio/locust) and embeds the locustfile and the file's dependencies.

In [17]:
!tail -n 5 locust/locust-image/Dockerfile

FROM locustio/locust
WORKDIR /tasks
COPY tasks.py .
RUN pip install -U google-auth google-cloud-storage google-cloud-logging python-dotenv



In [None]:
image_uri = 'gcr.io/{}/locust'.format(project_id)

!gcloud builds submit --tag {image_uri} locust/locust-image

#### Update the manifests

Before proceeding with deployment, you need to update the default manifests. The manifests are located in the `locust/manifests` folder. You will modify two files: `kustomization.yaml` and `patch.yaml`.

##### Set the name of the custom Locust image

You need to update the `kustomization.yaml` file with a reference to the custom image you created in the previous step.

Update the `newName` field in the `images` section of the `kustomization.yaml` file.

In [18]:
!sed -n '22,26p' locust/manifests/kustomization.yaml


images:
- name: locustio/locust
  newName: gcr.io/mlops-dev-env/locust:latest



##### Set the number of worker pods 

The default configuration deploys 32 worker pods. If you want to change it, modify the `count` field in the `replicas` section of the `kustomization.yaml` file.

In [19]:
!sed -n '26,30p' locust/manifests/kustomization.yaml


replicas:
- name: locust-worker
  count: 32
    


##### Set the GCS bucket for the test configuration and data files

As described in more detail later in this notebook, every time you start a test, the locustfile script attempts to retrieve a test configuration file and a test data file from a GCS location. You need to configure the name of the GCS bucket hosting the files and the names of the files in `kustomization.yaml`.

Modify the `configMapGenerator` section of the file. Specifically, set the `LOCUST_TEST_BUCKET`, `LOCUST_TEST_CONFIG`, and `LOCUST_TEST_DATA` literals to the GCS bucket name, the path of the test config file within the bucket, and the path of the test data file within the bucket, respectively.

In [20]:
!sed -n '35,52p' locust/manifests/kustomization.yaml


configMapGenerator:
- name: test-config-locations
  literals:
    - LOCUST_TEST_BUCKET=mlops-dev-workspace
    - LOCUST_TEST_CONFIG=test-config/test-config.json
    - LOCUST_TEST_DATA=test-config/test-data.json
  options:
    disableNameSuffixHash: true

##### Modify the node pool that hosts Locust master and workers

By default, master and worker pods are deployed to the `default-pool` node pool. If you want to change it (recommended), update the name of the node pool in the `patch.yaml` file. The name of the node pool is a value of the `values` field in the `matchExpressions` section.

In [21]:
!tail -n 17 locust/manifests/patch.yaml


apiVersion: apps/v1
kind: Deployment
metadata:
  name: not-important
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                - locust     


#### Deploy Locust

You are now ready to deploy Locust.

In [22]:
!kustomize build locust/manifests | kubectl apply -f -

configmap/test-config-locations created
service/locust-master created
deployment.apps/locust-master created
deployment.apps/locust-worker created


## Running load tests

Load tests can be configured, started, monitored, and stopped using Locust's [web interface](https://docs.locust.io/en/stable/quickstart.html#locust-s-web-interface).

In our deployment, the web interface is exposed by an external load balancer. You can access the interface using the following URL:

```
http://[EXTERNAL-IP]:8089
```

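where `[EXTERNAL-IP]` can be retrieved with the following command. Provisioning the external load balancer can take a few minutes; until it is ready, the `EXTERNAL-IP` column shows `<pending>`, as in the output below.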

In [23]:
!kubectl get service locust-master

NAME            TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                                        AGE
locust-master   LoadBalancer   10.0.2.0     <pending>     8089:30492/TCP,5557:31531/TCP,5558:31961/TCP   37s
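Instead of reading the table, you can extract the address from the Service's JSON representation. The helper below is a sketch that reads `status.loadBalancer.ingress[0].ip` (where Kubernetes reports a LoadBalancer Service's external IP) from the output of `kubectl get service locust-master -o json`, and returns `None` while no IP has been assigned. The IP address in the example is a documentation placeholder.

```python
import json
from typing import Optional


def locust_ui_url(service_json: str, port: int = 8089) -> Optional[str]:
    """Return the Locust web UI URL, or None while no external IP is assigned."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress or "ip" not in ingress[0]:
        return None
    return "http://{}:{}".format(ingress[0]["ip"], port)


pending = '{"status": {"loadBalancer": {}}}'
ready = '{"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}'
print(locust_ui_url(pending))
print(locust_ui_url(ready))
```

In the notebook, you could capture the command output with IPython's `output = !kubectl get service locust-master -o json` and call `locust_ui_url('\n'.join(output))`.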


### Configure a Locust test

At the start of each test, the locustfile script attempts to retrieve test data and a test configuration from a GCS location. Both the test data and the test configuration are formatted as JSON.

The test data is an array of JSON objects, where each object includes a list of instances and a model signature. If the array contains more than one object, Locust users randomly pick a list of instances and the associated signature for each call to the `predict` method of the AI Platform Prediction endpoint.

The test configuration is a JSON object with a project ID, a model name, a model version, and a test ID.
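A quick local check of both structures before uploading them can catch typos in key names. The sketch below validates the keys used by the `test_config` and `test_data` cells in this section and shows one way a simulated user could pick a random item. The random pick is an assumption about the locustfile's behavior, not a copy of `tasks.py`, and the config values are placeholders.

```python
import random
from typing import List

# Keys used by the test configuration in this notebook
REQUIRED_CONFIG_KEYS = {"test_id", "project_id", "model", "version"}


def validate_test_files(test_config: dict, test_data: List[dict]) -> None:
    """Raise ValueError if the test config or test data has an unexpected shape."""
    missing = REQUIRED_CONFIG_KEYS - set(test_config)
    if missing:
        raise ValueError("test config is missing keys: {}".format(sorted(missing)))
    if not isinstance(test_data, list) or not test_data:
        raise ValueError("test data must be a non-empty JSON array")
    for i, item in enumerate(test_data):
        if "signature" not in item or not item.get("instances"):
            raise ValueError("test data item {} needs 'signature' and 'instances'".format(i))


def pick_test_item(test_data: List[dict], rng: random.Random) -> dict:
    """Pick one (signature, instances) item at random, as a simulated user might."""
    return rng.choice(test_data)


config = {"test_id": "test-1", "project_id": "my-project", "model": "ResNet101", "version": "v1"}
data = [
    {"signature": "serving_preprocess", "instances": [{"b64": "/9j/"}]},
    {"signature": "serving_preprocess", "instances": [{"b64": "/9j/"}, {"b64": "/9j/"}]},
]
validate_test_files(config, data)
print(pick_test_item(data, random.Random(0)))
```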

#### Specify the GCS bucket for test data and test configuration

In [24]:
test_config_bucket = 'mlops-dev-workspace'

#### Prepare test data

In this example, we use the **ResNet101** model developed in the `01-prepare-for-serving.ipynb` notebook and deployed to AI Platform Prediction in the `02-deploy-to-aipp.ipynb` notebook. We prepare the instances for the `serving_preprocess` signature of the model using a couple of JPEG images from the `test_images` folder.

In [25]:
image_folder = 'test_images'
images = []
for image_name in os.listdir(image_folder):
    with open(os.path.join(image_folder, image_name), 'rb') as f:
        images.append(f.read())

In [26]:
single_instance = [{'b64': base64.b64encode(images[0]).decode('utf-8')}]
two_instances = [{'b64': base64.b64encode(image).decode('utf-8')} for image in images] 
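AI Platform Prediction expects binary inputs such as JPEG bytes to be base64-encoded and wrapped in a JSON object with a single `b64` key, which is what the two cells above do. The small round trip below, on a made-up three-byte payload, shows that the wrapping is lossless and that the leading bytes of a JPEG file produce the `/9j/` prefix visible in the test data later in this notebook.

```python
import base64


def to_b64_instance(data: bytes) -> dict:
    """Wrap raw bytes in the {"b64": ...} object used for binary inputs."""
    return {"b64": base64.b64encode(data).decode("utf-8")}


def from_b64_instance(instance: dict) -> bytes:
    """Inverse of to_b64_instance."""
    return base64.b64decode(instance["b64"])


jpeg_start = b"\xff\xd8\xff"  # JPEG files start with these bytes
instance = to_b64_instance(jpeg_start)
print(instance)
assert from_b64_instance(instance) == jpeg_start
```

Base64 encodes every 3 bytes as 4 characters, so the JSON test data is roughly a third larger than the raw images.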

In [27]:
test_data = [
        {
            'signature': 'serving_preprocess',
            'instances': single_instance
        },
        {
            'signature': 'serving_preprocess',
            'instances': two_instances
        }
    ]

In [28]:
test_data_local_file = 'test-data.json'
test_data_gcs_file = 'test-config/test-data.json'

with open(test_data_local_file, 'w') as f:
    json.dump(test_data, f)
    
!gsutil cp {test_data_local_file} gs://{test_config_bucket}/{test_data_gcs_file}

Copying file://test-data.json [Content-Type=application/json]...
/ [1 files][242.8 KiB/242.8 KiB]                                                
Operation completed over 1 objects/242.8 KiB.                                    


#### Prepare test config

Make sure to update the following mapping with values that reflect your environment. The `test_id` is an arbitrary value used to match the custom log-based metric records to a given test run. Use a different value each time you start a test.

In [29]:
test_config = {
    'test_id': 'test-3-2020-08-13',
    'project_id': 'mlops-dev-env',
    'model': 'ResNet101',
    'version': 'batching_150'
}
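To avoid reusing a `test_id` by accident, you could derive it from a run number and the current date, following the `test-<run>-<date>` pattern used in this notebook. The helper below is hypothetical; any scheme that yields a unique string per run works.

```python
import datetime
from typing import Optional


def make_test_id(run_number: int, date: Optional[datetime.date] = None) -> str:
    """Hypothetical helper: build IDs such as 'test-3-2020-08-13'."""
    date = date or datetime.date.today()
    return "test-{}-{}".format(run_number, date.isoformat())


print(make_test_id(3, datetime.date(2020, 8, 13)))
```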

In [30]:
test_config_local_file = 'test-config.json'
test_config_gcs_file = 'test-config/test-config.json'

with open(test_config_local_file, 'w') as f:
    json.dump(test_config, f)

!gsutil cp {test_config_local_file} gs://{test_config_bucket}/{test_config_gcs_file}

Copying file://test-config.json [Content-Type=application/json]...
/ [1 files][  112.0 B/  112.0 B]                                                
Operation completed over 1 objects/112.0 B.                                      


#### Double check the test data and config in GCS

In [31]:
!gsutil cat gs://{test_config_bucket}/{test_config_gcs_file}

{"test_id": "test-3-2020-08-13", "project_id": "mlops-dev-env", "model": "ResNet101", "version": "batching_150"}

In [32]:
!gsutil cat -r 0-150 gs://{test_config_bucket}/{test_data_gcs_file}

[{"signature": "serving_preprocess", "instances": [{"b64": "/9j/4AAQSkZJRgABAQEAYABgAAD//gBGRmlsZSBzb3VyY2U6IGh0dHA6Ly9jb21tb25zLndpa2ltZWRpYS5vcmcvd2l

#### Run and monitor tests

You are now ready to run the tests. Use the Locust web UI to start and monitor the tests. To see the consolidated view of AI Platform Prediction performance metrics and Locust client metrics use the Cloud Monitoring dashboard created in the previous step.

## Retrieving and consolidating test results

Locust's web interface and a Cloud Monitoring dashboard provide a cursory view into performance of a tested AI Platform Prediction model version. A more thorough analysis can be performed by consolidating metrics collected during a test and using data analytics and visualization tools.

In this section, you will retrieve the metrics captured in Cloud Monitoring and consolidate them into a single Pandas dataframe. The `04-analyze-test-results.ipynb` notebook demonstrates how to analyze the consolidated results using Pandas and Matplotlib.

You will use the Python Cloud Monitoring client library. Refer to the [Cloud Monitoring API reference](https://googleapis.dev/python/monitoring/latest/gapic/v3/api.html) for more information about the API.

### List available AI Platform Prediction metrics

In [33]:
creds, project_id = google.auth.default()
client = MetricServiceClient(credentials=creds)

project_path = client.project_path(project_id)
metric_filter = 'metric.type=starts_with("ml.googleapis.com/prediction")'

for descriptor in client.list_metric_descriptors(project_path, filter_=metric_filter):
    print(descriptor.type)

ml.googleapis.com/prediction/error_count
ml.googleapis.com/prediction/latencies
ml.googleapis.com/prediction/online/accelerator/duty_cycle
ml.googleapis.com/prediction/online/accelerator/memory/bytes_used
ml.googleapis.com/prediction/online/cpu/utilization
ml.googleapis.com/prediction/online/memory/bytes_used
ml.googleapis.com/prediction/online/network/bytes_received
ml.googleapis.com/prediction/online/network/bytes_sent
ml.googleapis.com/prediction/online/replicas
ml.googleapis.com/prediction/online/target_replicas
ml.googleapis.com/prediction/prediction_count
ml.googleapis.com/prediction/response_count
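The `retrieve_metrics` helper defined later narrows these metrics with `Query.select_resources` and `Query.select_metrics`, which translate into Cloud Monitoring filter expressions. As a rough sketch of what such filters look like, the functions below build them by hand. The `cloudml_model_version` resource type is an assumption, the label names come from the ones the notebook queries (`model_id`, `version_id`, `test_id`), and the exact string the client library generates may differ (for example, in spacing or in using `resource.label` rather than `resource.labels`).

```python
def aipp_metric_filter(metric_type: str, model: str, version: str) -> str:
    """Filter for one AI Platform Prediction metric of one model version."""
    return (
        'metric.type="{}" AND resource.type="cloudml_model_version" '
        'AND resource.labels.model_id="{}" AND resource.labels.version_id="{}"'
    ).format(metric_type, model, version)


def locust_metric_filter(metric_type: str, test_id: str) -> str:
    """Filter for one log-based Locust metric of one test run."""
    return 'metric.type="{}" AND metric.labels.test_id="{}"'.format(metric_type, test_id)


print(aipp_metric_filter("ml.googleapis.com/prediction/latencies", "ResNet101", "batching_150"))
print(locust_metric_filter("logging.googleapis.com/user/locust_latency", "test-3-2020-08-13"))
```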


### List custom log-based metrics

In [34]:
metric_filter = 'metric.type=starts_with("logging.googleapis.com/user")'

for descriptor in client.list_metric_descriptors(project_path, filter_=metric_filter):
    print(descriptor.type)

logging.googleapis.com/user/locust_latency
logging.googleapis.com/user/locust_users
logging.googleapis.com/user/num_failures
logging.googleapis.com/user/num_requests


### Retrieve test metrics

#### Define a helper function that retrieves test metrics from Cloud Monitoring

In [35]:
def retrieve_metrics(client, project_id, start_time, end_time, model, model_version, test_id, log_name):
    """
    Retrieves test metrics from Cloud Monitoring.
    """
    def _get_aipp_metric(metric_type: str, labels: List[str]=[], metric_name=None)-> pd.DataFrame:
        """
        Retrieves a specified AIPP metric.
        """
        query = Query(client, project_id, metric_type=metric_type)
        query = query.select_interval(end_time, start_time)
        query = query.select_resources(model_id=model)
        query = query.select_resources(version_id=model_version)
        
        if metric_name:
            labels = ['metric'] + labels 
        df = query.as_dataframe(labels=labels)
        
        if not df.empty:
            if metric_name:
                df.columns = df.columns.set_levels([metric_name], level=0)
            # Align timestamps to whole minutes so that metrics can be joined
            df = df.set_index(df.index.round('min'))
        
        return df
    
    def _get_locust_metric(metric_type: str, labels: List[str]=[], metric_name=None)-> pd.DataFrame:
        """
        Retrieves a specified custom log-based metric.
        """
        query = Query(client, project_id, metric_type=metric_type)
        query = query.select_interval(end_time, start_time)
        query = query.select_metrics(log=log_name)
        query = query.select_metrics(test_id=test_id)
        
        if metric_name:
            labels = ['metric'] + labels 
        df = query.as_dataframe(labels=labels)
        
        if not df.empty:
            if metric_name:
                df.columns = df.columns.set_levels([metric_name], level=0)
            # Replace each distribution value with its mean
            df = df.apply(lambda row: [metric.mean for metric in row])
            df = df.set_index(df.index.round('min'))
        
        return df
    
    # Retrieve GPU duty cycle
    metric_type = 'ml.googleapis.com/prediction/online/accelerator/duty_cycle'
    metric = _get_aipp_metric(metric_type, ['replica_id', 'signature'], 'duty_cycle')
    df = metric

    # Retrieve CPU utilization
    metric_type = 'ml.googleapis.com/prediction/online/cpu/utilization'
    metric = _get_aipp_metric(metric_type, ['replica_id', 'signature'], 'cpu_utilization')
    if not metric.empty:
        df = df.merge(metric, how='outer', right_index=True, left_index=True)
    
    # Retrieve prediction count
    metric_type = 'ml.googleapis.com/prediction/prediction_count'
    metric = _get_aipp_metric(metric_type, ['replica_id', 'signature'], 'prediction_count')
    if not metric.empty:
        df = df.merge(metric, how='outer', right_index=True, left_index=True)
    
    # Retrieve responses per second
    metric_type = 'ml.googleapis.com/prediction/response_count'
    metric = _get_aipp_metric(metric_type, ['replica_id', 'signature'], 'response_rate')
    if not metric.empty:
        metric = (metric/60).round(2)
        df = df.merge(metric, how='outer', right_index=True, left_index=True)
    
    # Retrieve backend latencies
    metric_type = 'ml.googleapis.com/prediction/latencies'
    metric = _get_aipp_metric(metric_type, ['latency_type', 'replica_id', 'signature'])
    if not metric.empty:
        metric = metric.apply(lambda row: [round(latency.mean/1000,1) for latency in row])
        metric.columns = metric.columns.set_names(['metric', 'replica_id', 'signature'])
        # Prefix the unique top-level values (the latency types) with "Latency: "
        level_values = ['Latency: ' + value for value in metric.columns.levels[0]]
        metric.columns = metric.columns.set_levels(level_values, level=0)
        df = df.merge(metric, how='outer', right_index=True, left_index=True)
    
    # Retrieve Locust latency
    metric_type = 'logging.googleapis.com/user/locust_latency'
    metric = _get_locust_metric(metric_type, ['replica_id', 'signature'], 'Latency: client')
    if not metric.empty:
        metric = metric.round(2).replace([0], np.nan)
        df = df.merge(metric, how='outer', right_index=True, left_index=True)
    
    # Retrieve Locust user count
    metric_type = 'logging.googleapis.com/user/locust_users'
    metric = _get_locust_metric(metric_type, ['replica_id', 'signature'], 'User count')
    if not metric.empty:
        metric = metric.round()
        df = df.merge(metric, how='outer', right_index=True, left_index=True)
    
    # Retrieve Locust num_failures
    metric_type = 'logging.googleapis.com/user/num_failures'
    metric = _get_locust_metric(metric_type, ['replica_id', 'signature'], 'Num of failures')
    if not metric.empty:
        metric = metric.round()
        df = df.merge(metric, how='outer', right_index=True, left_index=True)
    
    # Retrieve Locust num_requests
    metric_type = 'logging.googleapis.com/user/num_requests'
    metric = _get_locust_metric(metric_type, ['replica_id', 'signature'], 'Num of requests')
    if not metric.empty:
        metric = metric.round()
        df = df.merge(metric, how='outer', right_index=True, left_index=True)

    return df
    

#### Retrieve metrics for a specific test and time period.

Update the following variables with the values used to configure the test whose metrics you want to retrieve.

In [36]:
model = 'ResNet1'
model_version = 'batching_100'
log_name = 'locust'
test_id = 'test-2-2020-08-13'
test_start_time = datetime.datetime.fromisoformat('2020-08-13T12:50:00-07:00')
test_end_time = datetime.datetime.fromisoformat('2020-08-13T14:20:00-07:00')


In [37]:
df = retrieve_metrics(client, project_id, test_start_time, test_end_time, model, model_version, test_id, log_name)
df

| time (UTC) | duty_cycle / resnb3624f-batc48cbf2-7dd9b997c4-dzxr7 | cpu_utilization / resnb3624f-batc48cbf2-7dd9b997c4-dzxr7 | prediction_count | response_rate | Latency: api server | Latency: model | Latency: network | Latency: overhead | Latency: total | Latency: client | User count | Num of failures | Num of requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-08-13 19:51:00 | 0.00 | 0.100362 | | | | | | | | | | | |
| 2020-08-13 19:52:00 | 0.00 | 0.100406 | | | | | | | | | | | |
| 2020-08-13 19:53:00 | 0.00 | 0.099228 | 0.0 | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | | | |
| 2020-08-13 19:54:00 | 0.00 | 0.098050 | 218.0 | 1.07 | 32.9 | 255.6 | 15.0 | 47.9 | 303.5 | 1207.50 | 8.0 | 0.0 | 184.0 |
| 2020-08-13 19:55:00 | 0.08 | 0.152498 | 318.0 | 4.93 | 24.4 | 119.4 | 10.5 | 34.9 | 154.3 | 241.82 | 8.0 | 0.0 | 416.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-08-13 21:16:00 | 0.56 | 0.544746 | 5490.0 | 90.80 | 1.7 | 149.7 | 9.4 | 11.2 | 160.9 | 235.00 | 152.0 | 0.0 | 253670.0 |
| 2020-08-13 21:17:00 | 0.57 | 0.614392 | 5457.0 | 91.32 | 1.7 | 149.7 | 9.2 | 10.9 | 160.5 | 237.50 | 152.0 | 0.0 | 259212.0 |
| 2020-08-13 21:18:00 | 0.57 | 0.684038 | 5424.0 | 90.40 | 1.7 | 148.9 | 9.4 | 11.1 | 160.1 | 235.00 | 152.0 | 0.0 | 264758.0 |
| 2020-08-13 21:19:00 | 0.56 | 0.645338 | 5456.0 | 90.95 | 1.8 | 148.2 | 10.3 | 12.0 | 160.2 | 238.75 | 152.0 | 0.0 | 270986.0 |


The retrieved dataframe uses [hierarchical indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) for column names because some metrics contain multiple time series. For example, the GPU `duty_cycle` metric includes a separate time series for each GPU used in the deployment (identified by `replica_id`). The top level of the column index is the metric name, the second level is the `replica_id`, and the third level is the model `signature`.

All metrics are aligned on the same timeline: `retrieve_metrics` rounds every timestamp to the nearest minute and joins the metrics with an outer merge on the time index.
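The sketch below reproduces both ideas on two tiny, made-up frames: timestamps a few seconds apart are rounded to the minute and joined with an outer merge, as `retrieve_metrics` does, and the three-level column index is then sliced by metric name and flattened into plain strings. The `replica-a` ID, the `metric_frame` helper, and the values are illustrative only.

```python
import pandas as pd

LEVELS = ["metric", "replica_id", "signature"]


def metric_frame(column: tuple, timestamps, values) -> pd.DataFrame:
    """Hypothetical helper: one-column frame with a three-level column index."""
    columns = pd.MultiIndex.from_tuples([column], names=LEVELS)
    return pd.DataFrame([[v] for v in values], index=pd.to_datetime(timestamps), columns=columns)


gpu = metric_frame(("duty_cycle", "replica-a", ""),
                   ["2020-08-13 19:54:02", "2020-08-13 19:55:01"], [0.08, 0.56])
client = metric_frame(("Latency: client", "", ""),
                      ["2020-08-13 19:54:58", "2020-08-13 19:56:03"], [241.8, 235.0])

# Round to whole minutes, then outer-join on the time index
gpu = gpu.set_index(gpu.index.round("min"))
client = client.set_index(client.index.round("min"))
merged = gpu.merge(client, how="outer", left_index=True, right_index=True)
print(merged)

# Select all series of one metric by the top column level
print(merged["duty_cycle"])

# Flatten the column index, dropping empty levels
flat = merged.copy()
flat.columns = [" / ".join(part for part in col if part) for col in merged.columns]
print(flat.columns.tolist())
```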



### Serialize the metrics dataframe

You can save the consolidated metrics for later analysis by serializing the dataframe in Python `pickle` format.

In [None]:
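results_path = 'test_results/{}.gzip'.format(test_id)

# Make sure the output folder exists before writing the file
os.makedirs(os.path.dirname(results_path), exist_ok=True)
df.to_pickle(results_path)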
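pandas infers the compression of a pickle file from its suffix (for example, `.gz`), and `.gzip` is not one of the recognized suffixes, so the file above is written uncompressed. If you want a compressed file, use a recognized suffix or pass `compression='gzip'`, and read it back with the matching setting. The round trip below, on a small made-up frame with the same three-level columns, shows that `read_pickle` restores the hierarchical column index intact.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame(
    [[0.56, 235.0]],
    index=pd.to_datetime(["2020-08-13 21:16:00"]),
    columns=pd.MultiIndex.from_tuples(
        [("duty_cycle", "replica-a", ""), ("Latency: client", "", "")],
        names=["metric", "replica_id", "signature"],
    ),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test-1.pkl.gz")
    frame.to_pickle(path)            # gzip compression inferred from ".gz"
    restored = pd.read_pickle(path)  # compression inferred again on read

pd.testing.assert_frame_equal(frame, restored)
print(restored.columns.names)
```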

## Next steps

The `04-analyze-tests.ipynb` notebook demonstrates how to use Pandas and Matplotlib to perform a detailed analysis of the load testing runs.

## Cleaning up

### Remove the Locust deployment

In [16]:
!kustomize build locust/manifests | kubectl delete -f -

configmap "test-config-locations" deleted
service "locust-master" deleted
deployment.apps "locust-master" deleted
deployment.apps "locust-worker" deleted


### Delete the log-based metrics

In [None]:
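from google.api_core.exceptions import NotFound

creds, project_id = google.auth.default()

logging_client = MetricsServiceV2Client(credentials=creds)

# Delete only the log-based metrics created by this notebook; other
# log-based metrics in the project are left untouched.
locust_metric_names = ['locust_users', 'locust_latency', 'num_failures', 'num_requests']

for metric_name in locust_metric_names:
    metric_path = logging_client.metric_path(project_id, metric_name)
    try:
        logging_client.delete_log_metric(metric_path)
        print("Deleted metric: ", metric_path)
    except NotFound:
        print("Metric not found: ", metric_path)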

### Delete the dashboard

In [None]:
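dashboard_service_client = DashboardsServiceClient(credentials=creds)
parent = 'projects/{}'.format(project_id)

# Delete the dashboard(s) whose display name matches the template used to
# create the dashboard earlier in this notebook. delete_dashboard expects
# the dashboard's resource name, not the project path.
for listed_dashboard in dashboard_service_client.list_dashboards(parent):
    if listed_dashboard.display_name == dashboard_proto.display_name:
        dashboard_service_client.delete_dashboard(listed_dashboard.name)
        print('Deleted dashboard: ', listed_dashboard.name)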