
log-loss-framework: all test materials developed for AWSLogs benchmark #670

Merged

Conversation

PettitWesley
Contributor

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@PettitWesley PettitWesley requested a review from a team as a code owner June 5, 2023 01:36
@PettitWesley
Contributor Author

Original review and comments here were addressed: #630

12000
"
export BUFFER_SIZES="
1m
Contributor

It would be better to have a comment stating the unit for the buffer size. Is it 1 MB?

Contributor Author

added

Comment on lines 25 to 40
{
"name": "SIZE_IN_KB",
"value": "${SIZE_IN_KB}"
},
{
"name": "TOTAL_SIZE_IN_MB",
"value": "${TOTAL_SIZE_IN_MB}"
},
{
"name": "THROUGHPUT_IN_KB",
"value": "${THROUGHPUT}"
},
{
"name": "CYCLE_TIME_IN_SECONDS",
"value": "1"
}
Contributor

Should we fix the indentation?

Contributor

Also, why are the others environment variables while this one is hardcoded to "1"?

Contributor Author

CYCLE_TIME_IN_SECONDS doesn't change in any of the tests, so I didn't add it as an env var.

Contributor Author

fixed indentation

Comment on lines 69 to 85
{
"name": "TOTAL_SIZE_IN_MB",
"value": "${TOTAL_SIZE_IN_MB}"
},
{
"name": "THROUGHPUT_IN_KB",
"value": "${THROUGHPUT}"
},
{
"name": "TEST_NAME",
"value": "${TEST_NAME}"
},
{
"name": "CW_LOG_GROUP_NAME",
"value": "awslogs-benchmarking-output"
},
{
Contributor

Same indentation issue here.

Contributor Author

fixed indentation

Comment on lines 5 to 28
export THROUGHPUTS="
1000
2000
3000
4000
4500
5000
6000
7000
8000
9000
10000
12000
"
export BUFFER_SIZES="
1m
2m
4m
6m
8m
12m
"
Contributor

Can't we move these into a shared util script? Why do they need to be redefined here, since ecs_launcher.sh has the same variables?

Contributor Author

This is a simple, quick script to make launching easy for the AWSLogs project. It may never be used again, so I don't think it's worth it.

@zwj102030
Contributor

zwj102030 commented Jun 22, 2023

This seems like a fairly large PR. It would be better to break it into smaller ones.

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
matthewfala
Contributor

Looks great! Lots of insightful metrics are collected on the tests. Organizing the result data as a CSV, where each line represents a different test with all metric dimensions, is creative and impactful: it allows drilling down on dimensions like buffer and throughput as well as obtaining aggregate statistics.

I left comments on the portions of the code that I felt may need a second look. There are also some style comments, which I stopped adding halfway through since I'm not sure whether you're concerned about style in these scripts.
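
For readers skimming the thread, a minimal sketch of the kind of drill-down that CSV layout enables. The file name and column names here (results.csv, percent_lost, buffer, throughput_in_kb) are assumptions for illustration, not the benchmark's actual result schema.

# Sketch only: column and file names below are assumptions, not the real schema.
import csv
from collections import defaultdict

def mean_loss_by(rows, dimension):
    # Average percent_lost grouped by one dimension (e.g. buffer or throughput).
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(float(row["percent_lost"]))
    return {k: sum(v) / len(v) for k, v in groups.items()}

with open("results.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print(mean_loss_by(rows, "buffer"))            # drill down on buffer size
print(mean_loss_by(rows, "throughput_in_kb"))  # drill down on throughput
print(sum(float(r["percent_lost"]) for r in rows) / len(rows))  # aggregate over all tests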

// perform CW insights query for test name
// output is comma delimited, output insights query result as CSV
// simple python code can parse the CSV
fmt.Printf("%s %s - %s, percent lost, %d, number_lost, %d, total_input_record, %d, duplicates, %d, group=%s stream=%s TOTAL_SIZE_IN_MB=%s, SIZE_IN_KB=%s, THROUGHPUT_IN_KB=%s, %s, %s, %s, %s, %s, first_lost=%d, last_lost=%d",
Contributor

I really like how task metadata is included along with all of the logging statistics. It allows all the data to be dumped into one CW stream making the data easily queryable rather than having a separate stream for each test/result set.

}
}
}
// fmt.Printf("\n")
Contributor

Consider removing this commented-out line.

fmt.Printf(".")
}

/* sleep between GetLogEvents calls to proactively reduce TPS against CW frontend */
Contributor

If you want to be consistent with the comment style elsewhere in the file, use // here.

func exitErrorf(msg string, args ...interface{}) {
    fmt.Fprintf(os.Stderr, msg+"\n", args...)
    os.Exit(1)
}
Contributor

The logic of this file makes sense. I like the use of a map, and storing the duplicated log count as the difference between the counted logs and the number of logs appearing in the map. The histogram idea is also nice. The comments in the file start with a mix of upper and lower case; by convention it may be better to start all of them the same way, though this is really minor and not essential.
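
The duplicate-counting idea is easy to miss on a first read, so here is a minimal sketch of it (in Python for brevity; the actual validator is Go, and the names below are assumptions).

# Sketch only: record_ids is a hypothetical list of sequence numbers read back
# from CloudWatch; expected_total is how many records the generator emitted.
def count_duplicates_and_loss(record_ids, expected_total):
    seen = {}
    for rid in record_ids:
        seen[rid] = seen.get(rid, 0) + 1
    counted = len(record_ids)       # every fetched event, duplicates included
    unique = len(seen)              # distinct records that appear in the map
    duplicates = counted - unique   # the difference is the duplicated-log count
    lost = expected_total - unique  # expected records that never arrived
    return duplicates, lost

print(count_duplicates_and_loss([1, 2, 2, 3, 5], expected_total=5))  # -> (1, 1)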


if len(sys.argv) < 3:
    # Created due to the large number of test cases that passed after triple+ re-validation
    print("Usage: merges at least two result files, taking the more successful run when duplicates are found. Can specify any number of files in any order.")
Contributor

I don't see any mention of duplicates in the code. It looks like it's just taking, per task definition, the sample with the lowest number_lost metric.
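
For context, a minimal sketch of that behaviour as described above, keeping, per test key, the input row with the lowest number_lost. File handling and column names here are assumptions for illustration, not the script's actual interface.

# Sketch only: merge any number of result CSVs, keeping the most successful
# (lowest number_lost) row seen for each test.
import csv
import sys

best = {}
for path in sys.argv[1:]:
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = row["test_name"]  # hypothetical key column
            if key not in best or int(row["number_lost"]) < int(best[key]["number_lost"]):
                best[key] = row

for row in best.values():
    print(row)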

if (burst_enabled != NULL) {
    burstSizeInMB = atoi(burst_enabled);
    burstThroughputInKb = atoi(getenv("BURST_THROUGHPUT_IN_KB"));
    if ((burstSizeInMB * 2 * 1000) > totalSizeInKb) {
Contributor

I don't see why this limit needs to be imposed. It seems burstSizeInMB only needs to be less than totalSizeInKb / 1000, rather than less than half of that.
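
As a worked example under a hypothetical run with totalSizeInKb = 10000 (10 MB): the current check rejects any burst above 5 MB, since 6 * 2 * 1000 = 12000 > 10000, whereas reading the limit as burstSizeInMB <= totalSizeInKb / 1000 would allow bursts up to 10 MB.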

Comment on lines +177 to +187
def aggregation_key(data):
    key = ""
    for field in aggregate_on:
        val = data[field]
        key += val
    if data['buffer'] > 1:
        key += 'non-default-buffer'
    data['key'] = key
    return key


Contributor

Function redefinition: aggregation_key is also defined above. I'm guessing this is the one you want to keep? Consider removing one of the implementations.

Comment on lines +128 to +129
end = index - 1
summary = line[:end]
Contributor

I would expect this to be:

end = index
summary = line[:end]

Because end is not included in the slice. I suppose the last value, the test case name from the summary section added by analyze.py, is really garbage and isn't parsed out anyway.

@@ -0,0 +1,17 @@
FROM alpine as build-env
Contributor

Interesting choice. I suppose Alpine can be used instead of Amazon Linux 2 as long as we don't distribute it.

aws ecs --region ${REGION} run-task --cli-input-json "$RUN_TASK" >> "$OUTPUT_FILE"
echo "Started 10 tasks"
sleep 100
aws ecs --region ${REGION} run-task --cli-input-json "$RUN_TASK" >> "$OUTPUT_FILE"
Contributor

So 20 tasks are started for each type of test, right? So 6 * 2 * 12 * 20 = 2880 tasks. That's a lot of tasks!

@PettitWesley PettitWesley changed the base branch from mainline to develop March 13, 2024 21:05
@PettitWesley PettitWesley merged commit d38bb57 into aws:develop Mar 13, 2024
@PettitWesley PettitWesley deleted the log-loss-framework-awslogs-final-code branch March 13, 2024 21:05