diff --git a/walkthroughs/howto-k8s-appmesh-load-test/.gitignore b/walkthroughs/howto-k8s-appmesh-load-test/.gitignore new file mode 100644 index 00000000..15b6da23 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/.gitignore @@ -0,0 +1,3 @@ +./logs +.idea/ +scripts/data diff --git a/walkthroughs/howto-k8s-appmesh-load-test/README.md b/walkthroughs/howto-k8s-appmesh-load-test/README.md new file mode 100644 index 00000000..a73c1a28 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/README.md @@ -0,0 +1,246 @@ +# AppMesh K8s Load Test +This walkthrough demonstrates how to load test AppMesh on EKS. It can be used as a tool for further load testing in different mesh configurations. +We use [Fortio](https://github.com/fortio/fortio) to generate the load. This load test targets AppMesh on EKS, therefore, we +need [aws-app-mesh-controller-for-k8s](https://github.com/aws/aws-app-mesh-controller-for-k8s) to run it. Note that the load test runs as part of the controller integration tests; hence, +we need the controller repo in this walkthrough. The key components of this load test are: + + +* Configuration JSON: This specifies the details of the mesh, such as Virtual Nodes and their backends, a list of parameters for the load generator, e.g., queries per second (QPS), the duration of each experiment, and a list of metrics (and their corresponding query logic) that need to be captured in the load test. +The details of `config.json` can be found in [Step 3: Configuring the Load Test](#step-3:-configuring-the-load-test). +* Driver script: This bash script (`scripts/driver.sh`) sets up port-forwarding of the Prometheus service and starts the load test as part of the AppMesh K8s Controller integration tests. +* AppMesh K8s Controller: The K8s Controller for AppMesh [integration testing code](https://github.com/aws/aws-app-mesh-controller-for-k8s/tree/master/test/e2e/fishapp/load) is the +entry point of our load test.
It handles creation of a meshified app with Virtual Nodes, Virtual Services, backends etc. It also cleans up resources and spins down the mesh after +finishing the test. The list of unique values under the adjacency list `backends_map` in `config.json` determines the number of Virtual Nodes that need to be created, and the map +values provide the backend/edge connections of each node’s virtual service. The services corresponding to these backend connections are configured as environment variables +at the time of creation of the deployment. The *Custom Service* looks for this environment variable when re-routing incoming HTTP requests. +* Custom service: The custom service script (`scripts/request_handler.py`) runs on each pod; it receives incoming requests and makes calls to its “backend” services according to the `backends_map` +in `config.json`. This is a simple HTTP server that runs on each pod, handles incoming requests, and in turn routes them to its backend services. The backend info is +initialized as an environment variable at the time of pod creation. The custom service script is mounted onto the deployment using *ConfigMaps* +(see [createConfigMap](https://github.com/aws/aws-app-mesh-controller-for-k8s/blob/420a437f68e850a32f395f9ecd4917d62845d25a/test/e2e/fishapp/load/dynamic_stack_load_test.go) for +more details) to reduce development time (this avoids creating Docker containers, pushing them to the registry, etc.). If the responses from all of its backends are SUCCESS/200 OK, it +returns a 200. If any one of the responses is a failure, it returns a 500 HTTP error code. If it does not have any backends, it automatically returns a 200 OK. +* Fortio: The [Fortio](https://github.com/fortio/fortio) load generator simulates traffic by making HTTP requests to a given endpoint in the mesh at the +requested QPS for the requested duration. The default endpoint in the mesh is defined in `URL_DEFAULT` under `scripts/constants.py`.
Since Fortio needs to access an endpoint within +the mesh, we install Fortio inside the mesh with its own Virtual Node, K8s service and deployment. See the `fortio.yaml` file for more details. The K8s service is then port-forwarded +to the local machine so that REST API calls can be sent from there. +* AppMesh-Prometheus: Prometheus scrapes the required Envoy metrics from each pod at a specified interval during the load test. It has its own query language, [*PromQL*](https://prometheus.io/docs/prometheus/latest/querying/operators/), which is helpful +for aggregating metrics at different granularities before exporting them. +* Load Driver: The load driver script (`scripts/load_driver.py`) reads the list of tests from `config.json`, triggers the load, fetches the metrics from the Prometheus server using +its APIs, and writes them to persistent storage such as S3. This way, we have access to historical data even if the Prometheus server spins down for some reason. The API endpoints support +PromQL queries, so aggregate metrics can be fetched directly instead of collecting raw metrics and writing separate code to aggregate them. The start and end timestamps of each +test are recorded, and the metrics are queried using this time range. +* S3 storage for metrics: Experiments are uniquely identified by their `test_name` defined in `config.json`. Multiple runs of the same experiment are identified by their run +*timestamps* (in YYYYMMDDHHMMSS format). Hence, there is a 1:1 mapping between the `test_name` and the set of config parameters in the JSON. Metrics are stored inside these +subfolders along with a metadata file specifying the parameter values used in the experiment. The list of metrics can be found under `metrics` in `config.json`.
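The backend fan-out behavior of the custom service described above can be sketched roughly as follows. This is a simplified, synchronous sketch, not the actual implementation: the real `scripts/request_handler.py` is a Flask app that calls its backends asynchronously with aiohttp, and the `BACKENDS` environment variable name used here is purely illustrative (the controller integration test sets its own variable at deployment time).

```python
import os
import urllib.request

# Illustrative only: a comma-separated list of backend service URLs injected
# into the pod environment (the real variable name is set by the controller
# integration test when it creates the deployment).
BACKENDS = [b for b in os.environ.get("BACKENDS", "").split(",") if b]


def handle_request() -> int:
    """Return 200 only if every backend responds with 200; otherwise 500."""
    if not BACKENDS:
        # Leaf node with no backends: always succeed.
        return 200
    for url in BACKENDS:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status != 200:
                    return 500
        except OSError:
            # Connection failures and HTTP errors count as a failed backend.
            return 500
    return 200
```

With the example `backends_map` below, a leaf node such as `"3"` or `"4"` would return 200 immediately, while node `"0"` returns 200 only if both of its backend services succeed.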
+ + +Following is a flow diagram of the load test: + +![Flow Diagram](./load_test_flow_dg.png "Flow Diagram") + +## Step 1: Prerequisites + +[//]: # (The following commands can be used to create an ec2 instance to run this load test. Make sure you already created the security-group, subnet, vpc and elastic IP if you need it.) +[//]: # (Follow this https://docs.aws.amazon.com/cli/latest/userguide/cli-services-ec2-instances.html#launching-instances for more details.) +[//]: # (```shell) +[//]: # (aws ec2 run-instances --image-id ami-0534f435d9dd0ece4 --count 1 --instance-type t2.xlarge --key-name color-app-2 --security-group-ids sg-09581640015241144 --subnet-id subnet-056542d0b479a259a --associate-public-ip-address) +[//]: # (```) +### 1.1 Tools +We need to install the following tools first: +- Make sure you have the latest version of [AWS CLI v2](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) or [AWS CLI v1](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv1.html) installed (at least version `1.18.82`). +- Make sure to have `kubectl` [installed](https://kubernetes.io/docs/tasks/tools/install-kubectl/), at least version `1.13`. +- Make sure to have `jq` [installed](https://stedolan.github.io/jq/download/). +- Make sure to have `helm` [installed](https://helm.sh/docs/intro/install/). +- Install [eksctl](https://eksctl.io/). Please make sure you have version `0.21.0` or above installed + ```sh + curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp + + sudo mv -v /tmp/eksctl /usr/local/bin + ``` + + ```sh + eksctl version + 0.127.0 + ``` + +- Make sure you have [Python 3.9+](https://www.python.org/downloads/) installed. This walkthrough is tested with Python 3.9.6. + ```shell + python3 --version + Python 3.9.6 + ``` +- Make sure [pip3](https://pip.pypa.io/en/stable/installation/) is installed.
+ ```shell + pip3 --version + pip 21.2.4 + ``` +- Make sure [Go](https://go.dev/doc/install) is installed. This walkthrough is tested with go1.18. +- Make sure [Ginkgo](https://onsi.github.io/ginkgo/) v1.16.5 or later is installed. + ```shell + go install github.com/onsi/ginkgo/ginkgo@v1.16.5 + ``` + ```shell + ginkgo version + Ginkgo Version 1.16.5 + ``` + + +### 1.2 Installing AppMesh Controller for EKS +Follow this [walkthrough: App Mesh with EKS](../eks/) for details about the AppMesh Controller for EKS. Don't forget to authenticate with your +AWS account in case you get an `AccessDeniedException` or `GetCallerIdentity STS` error. + +1. Make sure you have cloned the [AWS AppMesh controller repo](https://github.com/aws/aws-app-mesh-controller-for-k8s). We will need this controller repo +path (`CONTROLLER_PATH`) in [step 2](#step-2:-set-environment-variables). + + ``` + git clone https://github.com/aws/aws-app-mesh-controller-for-k8s.git + ``` +2. Create an EKS cluster with `eksctl`. Following is an example command to create a cluster with the name `appmeshtest`: + + ```sh + eksctl create cluster \ + --name appmeshtest \ + --nodes-min 2 \ + --nodes-max 3 \ + --nodes 2 \ + --auto-kubeconfig \ + --full-ecr-access \ + --appmesh-access + # ... + # [✔] EKS cluster "appmeshtest" in "us-west-2" region is ready + ``` + +3. Update the `KUBECONFIG` environment variable according to the output of the above `eksctl` command: + + ```sh + export KUBECONFIG=~/.kube/eksctl/clusters/appmeshtest + ``` + If you need to update the `kubeconfig` file, you can follow this [guide](https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html) and run the following: + ```shell + aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME # in this example, $AWS_REGION is us-west-2 and $CLUSTER_NAME is appmeshtest + ``` + +4.
Run the following set of commands to install the App Mesh controller: + + ```sh + helm repo add eks https://aws.github.io/eks-charts + helm repo update + kubectl create ns appmesh-system + kubectl apply -k "https://github.com/aws/eks-charts/stable/appmesh-controller/crds?ref=master" + helm upgrade -i appmesh-controller eks/appmesh-controller --namespace appmesh-system + ``` + +### 1.3 Load Test Setup +Clone this repository and navigate to the `walkthroughs/howto-k8s-appmesh-load-test` folder. All the commands henceforth are assumed to be run from the same directory as this `README`. +1. Run the following command to install all Python dependencies required for this test: + ```shell + pip3 install -r requirements.txt + ``` +2. Install "appmesh-prometheus". You may follow the [App Mesh Prometheus](https://github.com/aws/eks-charts/tree/master/stable/appmesh-prometheus) chart for installation support. + + ```sh + helm upgrade -i appmesh-prometheus eks/appmesh-prometheus --namespace appmesh-system + ``` +3. Load test results will be stored in an S3 bucket, so give your `S3_BUCKET` a unique name in `scripts/constants.py`. + +## Step 2: Set Environment Variables +We need to set a few environment variables before starting the load tests. + +```bash +export CONTROLLER_PATH= +export CLUSTER_NAME= +export KUBECONFIG= +export AWS_REGION=us-west-2 +export VPC_ID= +``` +You can change these `env` variables in the `vars.env` file and then apply them using: `source ./vars.env`. + + + + +## Step 3: Configuring the Load Test +All parameters of the mesh, load tests and metrics can be specified in `config.json`. + +* `backends_map` -: The mapping from each Virtual Node to its backend Virtual Services. For each unique node name in `backends_map`, +a VirtualNode, Deployment, Service and VirtualService (with its VirtualNode as its target) are created at runtime.
An example `backends_map` follows: + ``` + "backends_map": { + "0": ["1", "2"], + "1": ["3"], + "2": ["4"] + }, + ``` + where the virtual node names are `"0"`, `"1"`, `"2"`, `"3"` and `"4"`. + +* `load_tests` -: Array of different test configurations that need to be run on the mesh. + * `test_name`: Name of the experiment. This name will be used to store the experiment results in S3. + * `url`: The service endpoint that Fortio (the load generator) will hit. The `url` format is: `http://service-<virtual-node-name>.tls-e2e.svc.cluster.local:9080/`. + For example, based on the above `backends_map`, if we want to send the load traffic to the first virtual node `"0"`, then the `url` will look like: + `http://service-0.tls-e2e.svc.cluster.local:9080/`. + * `qps`: Total queries per second Fortio sends to the endpoints. + * `t`: How long the test will run. + * `c`: Number of simultaneous connections Fortio uses to hit the endpoints. + * Optionally, you can add more load generation parameters by following the [Fortio documentation](https://github.com/fortio/fortio). + +* `metrics` -: Map of metric name to the corresponding metric [PromQL logic](https://prometheus.io/docs/prometheus/latest/querying/operators/). + +### Description of other files +- `load_driver.py` -: Script which reads `config.json`, triggers the load tests, reads metrics from Prometheus and writes them to S3. Called from within Ginkgo. +- `fortio.yaml` -: Spec of the Fortio components which are created during runtime. +- `request_handler.py` and `request_handler_driver.sh` -: The custom service that runs in each of the pods to handle and route incoming requests according +to the mapping in `backends_map`. +- `configmap.yaml` -: ConfigMap spec to mount the above request_handler* files into the cluster instead of creating Docker containers. +Don't forget to use the absolute path of `request_handler_driver.sh`. +- `cluster.yaml` -: An optional, example EKS cluster config file.
This `cluster.yaml` can be used to create an EKS cluster by running `eksctl create cluster -f cluster.yaml`. + + + +## Step 4: Running the Load Test +Run the driver script using the command below: + +```sh +/bin/bash scripts/driver.sh +``` + +The driver script performs the following steps: +1. Checks that the environment variables required to run this load test are set. +2. Port-forwards the Prometheus service to the local machine. +3. Runs the Ginkgo test, which is the entry point of our load test. +4. Kills the Prometheus port-forward after the load test is done. + + +## Step 5: Analyze the Results +All the test results are saved into the `S3_BUCKET` which was specified in `scripts/constants.py`. Optionally, you can run `scripts/analyze_load_test_data.py` to visualize the results. +The `analyze_load_test_data.py` script will: +* First download all the load test results from the `S3_BUCKET` into the `scripts/data` directory, then +* Plot the actual QPS (queries per second) Fortio sent to the first VirtualNode against the maximum memory consumed by that VirtualNode's container. + +## Step 6: Clean-up + +After the load test is finished, the mesh (including its dependent resources such as virtual nodes, services etc.) and the corresponding Kubernetes +namespace (currently this load test uses the `tls-e2e` namespace) are cleaned up automatically. However, if the test is interrupted, for example by pressing +Ctrl+C, the automatic cleanup may not finish. In that case we have to manually clean up the mesh and the namespace. +- Delete the namespace: + ```sh + kubectl delete ns tls-e2e + ``` +- The name of the mesh created by our load test is `$CLUSTER_NAME` followed by a 6-character alphanumeric random string.
So search for the exact mesh name by running: + ```sh + kubectl get mesh --all-namespaces + ``` + Then delete the mesh: + ```shell + kubectl delete mesh <mesh-name-from-above> + ``` + +- Delete the controller and Prometheus: + + ```shell + helm delete appmesh-controller -n appmesh-system + helm delete appmesh-prometheus -n appmesh-system + kubectl delete ns appmesh-system + ``` +- Finally, delete the EKS cluster to free all compute, networking, and storage resources, using: + + ```sh + eksctl delete cluster --name $CLUSTER_NAME # In our case $CLUSTER_NAME is appmeshtest + ``` \ No newline at end of file diff --git a/walkthroughs/howto-k8s-appmesh-load-test/cluster.yaml b/walkthroughs/howto-k8s-appmesh-load-test/cluster.yaml new file mode 100644 index 00000000..43ee0281 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/cluster.yaml @@ -0,0 +1,15 @@ +--- +apiVersion: eksctl.io/v1alpha5 +kind: ClusterConfig + +metadata: + name: basic-cluster + region: us-west-2 + +nodeGroups: + - name: ng-1 + instanceType: m5.4xlarge + desiredCapacity: 4 + volumeSize: 80 + ssh: + allow: true # will use ~/.ssh/id_rsa.pub as the default ssh key diff --git a/walkthroughs/howto-k8s-appmesh-load-test/config.json b/walkthroughs/howto-k8s-appmesh-load-test/config.json new file mode 100644 index 00000000..a3422d34 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/config.json @@ -0,0 +1,22 @@ +{ + "IsTLSEnabled": true, + "IsmTLSEnabled": false, + "ReplicasPerVirtualNode": 1, + "ConnectivityCheckPerURL": 400, + "backends_map": { + "0": ["1", "2"], + "1": ["3"], + "2": ["4"] + }, + "load_tests": [ + {"test_name": "experiment_x", "url": "http://service-0.tls-e2e.svc.cluster.local:9080/", "qps": "5000", "t": "10s", "c":"400"} + ], + "metrics": { + "envoy_ingress_rate_by_replica_set": "sum(rate(envoy_cluster_upstream_rq{job=\"appmesh-envoy\",kubernetes_pod_name=~\"node-.*\",envoy_cluster_name=~\"cds_ingress.*\"}[10s])) by
(kubernetes_pod_name)", + "envoy_2xx_requests_rate_by_replica_set": "sum(rate(envoy_cluster_upstream_rq{job=\"appmesh-envoy\",kubernetes_pod_name=~\"node-.*\",envoy_response_code=~\"2.*\",envoy_cluster_name=~\"cds_ingress.*\"}[10s])) by (kubernetes_pod_name)", + "envoy_4xx_requests_rate_by_replica_set": "sum(rate(envoy_cluster_upstream_rq{job=\"appmesh-envoy\",kubernetes_pod_name=~\"node-.*\",envoy_response_code=~\"4.*\",envoy_cluster_name=~\"cds_ingress.*\"}[10s])) by (kubernetes_pod_name)", + "envoy_5xx_requests_rate_by_replica_set": "sum(rate(envoy_cluster_upstream_rq{job=\"appmesh-envoy\",kubernetes_pod_name=~\"node-.*\",envoy_response_code=~\"5.*\",envoy_cluster_name=~\"cds_ingress.*\"}[10s])) by (kubernetes_pod_name)", + "envoy_memory_MB_by_replica_set": "sum(label_replace(container_memory_working_set_bytes{container=\"envoy\",pod=~\"node-.*\"}, \"replica_set\", \"$1\", \"pod\", \"(node-.*-.*)-.*\")) by (replica_set) / (1024*1024)", + "envoy_cpu_usage_seconds_by_replica_set": "sum(label_replace(container_cpu_usage_seconds_total{container=\"envoy\",pod=~\"node-.*\"}, \"replica_set\", \"$1\", \"pod\", \"(node-.*-.*)-.*\")) by (replica_set)" + } +} \ No newline at end of file diff --git a/walkthroughs/howto-k8s-appmesh-load-test/configmap.yaml b/walkthroughs/howto-k8s-appmesh-load-test/configmap.yaml new file mode 100644 index 00000000..c36eeca9 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/configmap.yaml @@ -0,0 +1,7 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: scripts-configmap +data: + request_handler_driver.sh: "/scripts/request_handler_driver.sh" diff --git a/walkthroughs/howto-k8s-appmesh-load-test/fortio.yaml b/walkthroughs/howto-k8s-appmesh-load-test/fortio.yaml new file mode 100644 index 00000000..804aa60c --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/fortio.yaml @@ -0,0 +1,57 @@ +--- +apiVersion: appmesh.k8s.aws/v1beta2 +kind: VirtualNode +metadata: + name: fortio + namespace: tls-e2e +spec: + podSelector: + 
matchLabels: + app: fortio + listeners: + - portMapping: + port: 8080 + protocol: http + backends: + - virtualService: + virtualServiceRef: + name: service-0 # Replace with one of the fishap VS + namespace: tls-e2e + serviceDiscovery: + dns: + hostname: fortio.tls-e2e.svc.cluster.local +--- +apiVersion: v1 +kind: Service +metadata: + name: fortio + namespace: tls-e2e +spec: + ports: + - port: 8080 + name: http + selector: + app: fortio +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: fortio + namespace: tls-e2e +spec: + replicas: 1 + selector: + matchLabels: + app: fortio + template: + metadata: + labels: + app: fortio + spec: + containers: + - name: app + image: fortio/fortio + imagePullPolicy: Always + ports: + - containerPort: 8080 + args: ["server"] \ No newline at end of file diff --git a/walkthroughs/howto-k8s-appmesh-load-test/load_test_flow_dg.png b/walkthroughs/howto-k8s-appmesh-load-test/load_test_flow_dg.png new file mode 100644 index 00000000..b91f035e Binary files /dev/null and b/walkthroughs/howto-k8s-appmesh-load-test/load_test_flow_dg.png differ diff --git a/walkthroughs/howto-k8s-appmesh-load-test/requirements.txt b/walkthroughs/howto-k8s-appmesh-load-test/requirements.txt new file mode 100644 index 00000000..c9603025 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/requirements.txt @@ -0,0 +1,7 @@ +altair==4.2.0 +boto3==1.26.14 +botocore==1.29.14 +matplotlib==3.5.3 +numpy==1.21.6 +pandas==1.3.5 +requests==2.28.1 \ No newline at end of file diff --git a/walkthroughs/howto-k8s-appmesh-load-test/scripts/analyze_load_test_data.py b/walkthroughs/howto-k8s-appmesh-load-test/scripts/analyze_load_test_data.py new file mode 100644 index 00000000..ba99971e --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/scripts/analyze_load_test_data.py @@ -0,0 +1,106 @@ +import csv +import json +import os +import subprocess +from collections import OrderedDict +from pathlib import Path +from pprint import pprint + +import 
matplotlib.pyplot as plt +import numpy as np + +from constants import S3_BUCKET + +DIR_PATH = os.path.dirname(os.path.realpath(__file__)) +DATA_PATH = os.path.join(DIR_PATH, 'data') + +NODE_NAME = "0" + + +def get_s3_data(): + res = subprocess.run(["aws sts get-caller-identity"], shell=True, stdout=subprocess.PIPE, + universal_newlines=True) + out = res.stdout + print("Caller identity: {}".format(out)) + + command = "aws s3 sync s3://{} {}".format(S3_BUCKET, DATA_PATH) + print("Running the following command to download S3 load test results: \n{}".format(command)) + res = subprocess.run([command], shell=True, stdout=subprocess.PIPE, universal_newlines=True) + out = res.stdout + print(out) + + +def plot_graph(actual_QPS_list, node_0_mem_list): + x_y_tuple = [(x, y) for x, y in sorted(zip(actual_QPS_list, node_0_mem_list))] + X, Y = zip(*x_y_tuple) + xpoints = np.array(X) + ypoints = np.array(Y) + + plt.figure(figsize=(10, 5)) + plt.bar(xpoints, ypoints, width=20) + plt.ylabel('Node-{} (MiB)'.format(NODE_NAME)) + plt.xlabel('Actual QPS') + print("Plotting graph...") + plt.show() + + +def read_load_test_data(): + all_files_list = [x for x in os.listdir(DATA_PATH) if os.path.isdir(os.path.join(DATA_PATH, x))] + qps_mem_files_list = [] + for exp_f in all_files_list: + result = [os.path.join(dp, f) for dp, dn, filenames in os.walk(os.path.join(DATA_PATH, exp_f)) for f in + filenames + if "fortio.json" in f or "envoy_memory_MB_by_replica_set.csv" in f] + qps_mem_files_list.append(result) + + actual_qps_list = [] + node_0_mem_list = [] + experiment_results = {} + for qps_or_mem_f in qps_mem_files_list: + attrb = {} + for f in qps_or_mem_f: + if "fortio.json" in f: + with open(f) as json_f: + j = json.load(json_f) + actual_qps = j["ActualQPS"] + actual_qps_list.append(actual_qps) + attrb["ActualQPS"] = actual_qps + else: + with open(f) as csv_f: + c = csv.reader(csv_f, delimiter=',', skipinitialspace=True) + node_0_mem = [] + node_found = False + for line in c: + if "node-" 
+ NODE_NAME in line[0]: + node_0_mem.append(float(line[2])) + node_found = True + if not node_found: + raise Exception("Node not found: {} in experiment file: {}".format(NODE_NAME, f)) + max_mem = max(node_0_mem) + node_0_mem_list.append(max_mem) + attrb["max_mem"] = max_mem + key = Path(f) + experiment_results[os.path.join(key.parts[-3], key.parts[-2])] = attrb + + # for research purpose + sorted_experiment_results = OrderedDict() + for k, v in sorted(experiment_results.items(), key=lambda item: item[1]['max_mem']): + sorted_experiment_results[k] = v + print("Experiment results sorted:") + pprint(sorted_experiment_results) + + return actual_qps_list, node_0_mem_list + + +def plot_qps_vs_container_mem(): + actual_qps_list, node_0_mem_list = read_load_test_data() + + plot_graph(actual_qps_list, node_0_mem_list) + + +if __name__ == '__main__': + node_name = input('Enter the node name (or press enter for default node "0"): ') + if node_name != "": + NODE_NAME = node_name + get_s3_data() + plot_qps_vs_container_mem() diff --git a/walkthroughs/howto-k8s-appmesh-load-test/scripts/constants.py b/walkthroughs/howto-k8s-appmesh-load-test/scripts/constants.py new file mode 100644 index 00000000..4d064ee3 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/scripts/constants.py @@ -0,0 +1,10 @@ +URL_DEFAULT = "http://service-0.tls-e2e.svc.cluster.local:9080/path-0" +QPS_DEFAULT = "100" +DURATION_DEFAULT = "30s" +CONNECTIONS_DEFAULT = "1" + +FORTIO_RUN_ENDPOINT = 'http://localhost:9091/fortio/rest/run' +PROMETHEUS_QUERY_ENDPOINT = 'http://localhost:9090/api/v1/query_range' + +# give your s3 bucket a unique name +S3_BUCKET = "-appmeshloadtester" diff --git a/walkthroughs/howto-k8s-appmesh-load-test/scripts/driver.sh b/walkthroughs/howto-k8s-appmesh-load-test/scripts/driver.sh new file mode 100644 index 00000000..61a14af9 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/scripts/driver.sh @@ -0,0 +1,64 @@ +#!/usr/bin/env bash + +err() { + msg="Error: $1" + 
echo "${msg}" + code=${2:-"1"} + exit ${code} +} + +exec_command() { + eval "$1" + if [ $? -eq 0 ]; then + echo "'$1' command Executed Successfully" + else + err "'$1' command Failed" + fi +} + +check_version() { + eval "$1" +} + +# sanity check +if [ -z "${CONTROLLER_PATH}" ]; then + err "CONTROLLER_PATH is not set" +fi + +if [ -z "${KUBECONFIG}" ]; then + err "KUBECONFIG is not set" +fi + +if [ -z "${CLUSTER_NAME}" ]; then + err "CLUSTER_NAME is not set" +fi + +if [ -z "${AWS_REGION}" ]; then + err "AWS_REGION is not set" +fi + +if [ -z "${VPC_ID}" ]; then + err "VPC_ID is not set" +fi + +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )" +APPMESH_LOADTESTER_PATH="$(dirname "$DIR")" +echo "APPMESH_LOADTESTER_PATH -: $APPMESH_LOADTESTER_PATH" + +# Prometheus port forward +echo "Port-forwarding Prometheus" +kubectl --namespace appmesh-system port-forward service/appmesh-prometheus 9090 & +pid=$! + +# call ginkgo +echo "Starting Ginkgo test. This may take a while! So hang tight and do not close this window" +cd $CONTROLLER_PATH && ginkgo -v -r --focus "DNS" "$CONTROLLER_PATH"/test/e2e/fishapp/load -- --cluster-kubeconfig=$KUBECONFIG \ +--cluster-name=$CLUSTER_NAME --aws-region=$AWS_REGION --aws-vpc-id=$VPC_ID \ +--base-path=$APPMESH_LOADTESTER_PATH + +# kill prometheus port forward +echo "Killing Prometheus port-forward" +kill -9 $pid +[ $? -eq 0 ] && echo "Killed Prometheus port-forward" || echo "Error when killing Prometheus port forward" + +cd "$DIR" || exit \ No newline at end of file diff --git a/walkthroughs/howto-k8s-appmesh-load-test/scripts/load_driver.py b/walkthroughs/howto-k8s-appmesh-load-test/scripts/load_driver.py new file mode 100644 index 00000000..b6b033d6 --- /dev/null +++ b/walkthroughs/howto-k8s-appmesh-load-test/scripts/load_driver.py @@ -0,0 +1,200 @@ +import io +import json +import logging +import os +import sys +import time +from datetime import datetime + +import boto3 +import pandas as pd +import requests +from
botocore.exceptions import ClientError + +from constants import * + + +def check_valid_request_data(request_data): + logging.info("Validating Fortio request parameters") + if "url" not in request_data: + logging.warning("URL not provided. Defaulting to {}".format(URL_DEFAULT)) + request_data["url"] = URL_DEFAULT + if "t" not in request_data: + logging.warning("Duration (t) not provided. Defaulting to {}".format(DURATION_DEFAULT)) + request_data["t"] = DURATION_DEFAULT + if "qps" not in request_data: + logging.warning("qps not provided. Defaulting to {}".format(QPS_DEFAULT)) + request_data["qps"] = QPS_DEFAULT + if "c" not in request_data: + logging.warning("# Connections (c) not provided. Defaulting to {}".format(CONNECTIONS_DEFAULT)) + request_data["c"] = CONNECTIONS_DEFAULT + + logging.info("Updated request data -: {}".format(request_data)) + return request_data + + +def run_fortio_test(test_data): + fortio_request_data = test_data.copy() + test_name = fortio_request_data.pop("test_name") + logging.info("Running test -: {}".format(test_name)) + fortio_request_data = check_valid_request_data(fortio_request_data) + fortio_response = requests.post(url=FORTIO_RUN_ENDPOINT, json=fortio_request_data) + + if fortio_response.ok: + fortio_json = fortio_response.json() + logging.info("Successful Fortio run -: {}".format(test_name)) + return fortio_json + else: + logging.error("Fortio response code = {}".format(fortio_response.status_code)) + fortio_response.raise_for_status() + + +def query_prometheus_server(metric_name, metric_logic, start_ts, end_ts, step="10s"): + logging.info("Querying prometheus server for metric = {} using logic = {}".format(metric_name, metric_logic)) + prometheus_response = requests.post(url=PROMETHEUS_QUERY_ENDPOINT, + data={"query": metric_logic, "start": start_ts, "end": end_ts, "step": step}) + if prometheus_response.ok: + logging.info("Successfully queried Prometheus for metric -: {}".format(metric_name)) + else: + logging.error("Error while 
querying Prometheus for metric -: {}".format(metric_name))
+    prometheus_response.raise_for_status()
+
+    return prometheus_response.json()
+
+
+def prometheus_json_to_df(prometheus_json, metric_name):
+    data = pd.json_normalize(prometheus_json, record_path=['data', 'result'])
+    try:
+        # Split values into separate rows
+        df = data.explode('values')
+        # Split [ts, val] into separate columns
+        split_df = pd.DataFrame(df['values'].to_list(), columns=['timestamp', metric_name], index=df.index)
+        metrics_df = pd.concat([df, split_df], axis=1)
+        metrics_df.drop(columns='values', inplace=True)
+        metrics_df['timestamp'] = pd.to_numeric(metrics_df['timestamp'])
+
+        # Normalize timestamps so each metric series starts at t=0
+        groupby_column = [col for col in metrics_df.columns if col.startswith("metric")][0]
+        metrics_df['normalized_ts'] = metrics_df['timestamp'] - metrics_df.groupby(groupby_column).timestamp.transform('min')
+        logging.info("Normalized DataFrame -: {}".format(metrics_df.head(30)))
+    except KeyError:
+        logging.warning("Metrics response is empty. Returning empty DataFrame")
+        metrics_df = pd.DataFrame(columns=["metric.", "timestamp", metric_name, "normalized_ts"])
+
+    return metrics_df
+
+
+def write_to_s3(s3_client, data, folder_path, file_name):
+    response = s3_client.put_object(Bucket=S3_BUCKET, Key="{}/{}".format(folder_path, file_name), Body=data)
+    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
+
+    if status == 200:
+        logging.info("Successful write of ({}/{}) to S3. Status - {}".format(folder_path, file_name, status))
+    else:
+        logging.error("Error writing ({}/{}) to S3. Response Metadata -: {}".format(folder_path, file_name,
+                                                                                    response['ResponseMetadata']))
+        raise IOError("S3 Write Failed. ResponseMetadata -: {}".format(response['ResponseMetadata']))
+
+
+def get_s3_client(region=None, is_creds=False):
+    try:
+        if region is None:
+            s3_client = boto3.client('s3')
+        elif is_creds:
+            cred = {
+                "credentials": {
+                    "accessKeyId": os.environ['AWS_ACCESS_KEY_ID'],
+                    "secretAccessKey": os.environ['AWS_SECRET_ACCESS_KEY'],
+                    "sessionToken": os.environ['AWS_SESSION_TOKEN'],
+                }
+            }
+            s3_client = boto3.client('s3',
+                                     aws_access_key_id=cred['credentials']['accessKeyId'],
+                                     aws_secret_access_key=cred['credentials']['secretAccessKey'],
+                                     aws_session_token=cred['credentials']['sessionToken'],
+                                     region_name=region)
+        else:
+            s3_client = boto3.client('s3', region_name=region)
+    except ClientError as e:
+        logging.error(e)
+        return None
+    return s3_client
+
+
+def create_bucket_if_not_exists(s3_client, bucket_name, region=None):
+    """Create an S3 bucket in a specified region if it does not already exist.
+
+    If a region is not specified, the bucket is created in the S3 default
+    region (us-east-1).
+
+    :param s3_client: Boto3 S3 client used for the existence check and creation
+    :param bucket_name: Bucket to create
+    :param region: String region to create bucket in, e.g., 'us-west-2'
+    :return: True if bucket exists or was created, else False
+    """
+    try:
+        s3_client.head_bucket(Bucket=bucket_name)
+        logging.info("No need to create as bucket {} already exists.".format(bucket_name))
+    except ClientError:
+        # Bucket does not exist (or is inaccessible); try to create it
+        try:
+            if region is None:
+                s3_client.create_bucket(Bucket=bucket_name)
+            else:
+                location = {'LocationConstraint': region}
+                s3_client.create_bucket(Bucket=bucket_name, CreateBucketConfiguration=location)
+        except ClientError as e:
+            logging.error(e)
+            return False
+    return True
+
+
+if __name__ == '__main__':
+    config_file = sys.argv[1]
+    BASE_PATH = sys.argv[2]
+    LOGS_FOLDER = os.path.join(BASE_PATH, "logs")
+    os.makedirs(LOGS_FOLDER, exist_ok=True)
+    driver_ts = datetime.today().strftime('%Y%m%d%H%M%S')
+    log_file = os.path.join(LOGS_FOLDER, "load_driver_{}.log".format(driver_ts))
+    logging.basicConfig(filename=log_file,
+                        format='%(asctime)s %(message)s',
+                        datefmt='%m/%d/%Y %I:%M:%S %p', level=logging.INFO)
+
+    logging.info("driver_ts = {}".format(driver_ts))
+    with open(config_file, "r") as f:
+        config = json.load(f)
+    logging.info("Loaded config file")
+
+    region = os.environ['AWS_REGION']
+    s3_client = get_s3_client()
+    create_bucket_if_not_exists(s3_client=s3_client, bucket_name=S3_BUCKET, region=region)
+
+    for test in config["load_tests"]:
+        logging.info("Writing config to S3")
+        write_to_s3(s3_client, json.dumps(config, indent=4), "{}/{}".format(test['test_name'], driver_ts),
+                    "config.json")
+        start_ts = int(time.time())
+        fortio_json = run_fortio_test(test)
+        # Write Fortio response to S3
+        logging.info("Writing Fortio response to S3")
+        write_to_s3(s3_client, json.dumps(fortio_json, indent=4), "{}/{}".format(test['test_name'], driver_ts),
+                    "fortio.json")
+        end_ts = int(time.time())
+
+        logging.info("start_ts -: {}, end_ts -: {}".format(start_ts, end_ts))
+
+        for metric_name, metric_logic in config['metrics'].items():
+            metrics_json = query_prometheus_server(metric_name, metric_logic, start_ts, end_ts)
+            metrics_df = prometheus_json_to_df(metrics_json, metric_name)
+            # Write to S3
+            logging.info("Writing Metrics dataframe to S3")
+            s3_folder_path = "{}/{}".format(test['test_name'], driver_ts)
+            file_name = "{}.csv".format(metric_name)
+            csv_buffer = io.StringIO()
+            metrics_df.to_csv(csv_buffer, index=False)
+            write_to_s3(s3_client, csv_buffer.getvalue(), s3_folder_path, file_name)
+            csv_buffer.close()
+
+        logging.info("Finished exporting all metrics for {}. Sleeping for 10s before starting next test".format(
+            test['test_name']))
+        # Sleep 10s between tests
+        time.sleep(10)
diff --git a/walkthroughs/howto-k8s-appmesh-load-test/scripts/request_handler.py b/walkthroughs/howto-k8s-appmesh-load-test/scripts/request_handler.py
new file mode 100644
index 00000000..e934931c
--- /dev/null
+++ b/walkthroughs/howto-k8s-appmesh-load-test/scripts/request_handler.py
@@ -0,0 +1,58 @@
+from flask import Flask, jsonify
+import os
+import logging
+
+import aiohttp
+import asyncio
+
+app = Flask(__name__)
+
+async def fetch(session, url):
+    async with session.get(url) as response:
+        # Read the body before the connection is released back to the pool
+        await response.read()
+        return response
+
+async def fetch_all(backends):
+    async with aiohttp.ClientSession() as session:
+        tasks = []
+        for url in backends:
+            tasks.append(fetch(session, url))
+        # return_exceptions=True so one failing backend call does not cancel the rest
+        responses = await asyncio.gather(*tasks, return_exceptions=True)
+        return responses
+
+@app.route('/health', methods=['GET'])
+def health():
+    return f"Alive. Backends -: {os.getenv('BACKENDS')}"
+
+@app.route('/', methods=['GET'])
+def wrk():
+    if os.getenv('BACKENDS'):
+        backends = os.getenv('BACKENDS').split(",")
+    else:
+        backends = []
+
+    if backends:
+        responses = asyncio.run(fetch_all(backends))
+        for r in responses:
+            if isinstance(r, Exception):
+                print(f"Backend call failed -: {r}")
+            else:
+                print(f"Status = {r.status}, reason = {r.reason}, real_url = {r.real_url}")
+
+        # Return 200 only if every backend call succeeded with HTTP 200
+        success = all(not isinstance(r, Exception) and r.status == 200 for r in responses)
+        retcode = 200 if success else 500
+        msg = "backends Success" if retcode == 200 else "error when calling backends"
+    else:
+        retcode = 200
+        msg = "Success"
+
+    print(msg, retcode)
+
+    return jsonify(msg), retcode
+
+if __name__ == "__main__":
+    logging.getLogger().setLevel(logging.INFO)
+    app.run(host='0.0.0.0', debug=True, threaded=True)
diff --git a/walkthroughs/howto-k8s-appmesh-load-test/scripts/request_handler_driver.sh b/walkthroughs/howto-k8s-appmesh-load-test/scripts/request_handler_driver.sh
new file mode 100644
index 00000000..8379c4c4
--- /dev/null
+++ b/walkthroughs/howto-k8s-appmesh-load-test/scripts/request_handler_driver.sh
@@ -0,0 +1,10 @@
+#!/usr/bin/env bash
+echo "Entered handler"
+# Install dependencies (asyncio ships with Python 3, so only flask and aiohttp are needed)
+pip3 install flask
+echo "Completed flask"
+pip3 install aiohttp
+echo "Completed aiohttp"
+
+# Run the flask server, binding to all interfaces so other pods can reach it
+FLASK_APP=request_handler.py flask run --host=0.0.0.0 -p 9080
diff --git a/walkthroughs/howto-k8s-appmesh-load-test/vars.env b/walkthroughs/howto-k8s-appmesh-load-test/vars.env
new file mode 100644
index 00000000..bfe59de0
--- /dev/null
+++ b/walkthroughs/howto-k8s-appmesh-load-test/vars.env
@@ -0,0 +1,5 @@
+export CONTROLLER_PATH=
+export CLUSTER_NAME=
+export KUBECONFIG=
+export AWS_REGION=us-west-2
+export VPC_ID=
\ No newline at end of file