
[BEAM-10602] Fix load test metrics in Grafana dashboard #12499

Merged (2 commits into apache:master, Aug 12, 2020)

Conversation

mxm (Contributor) commented Aug 7, 2020

This reverts the recent changes to the dashboards and adds a commit which adds a latency and checkpoint duration panel.

Also, it modifies the Flink streaming tests to write into the _pardo_1 table. This way, the results will show up in the dashboard together with all the other Runners' data.

Post-Commit Tests Status (on master branch)

[table of per-runner build status badges for the Go, Java, Python, and XLang SDKs across Dataflow, Flink, Samza, Spark, and Twister2]

Pre-Commit Tests Status (on master branch)

[table of build status badges for the Java, Python, Go, and Website pre-commit jobs, portable and non-portable]

See .test-infra/jenkins/README for the trigger phrases, status, and links of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

[build status badge: Build python source distribution and wheels]

See CI.md for more information about GitHub Actions CI.

…streaming metrics in Grafana dashboard"

This reverts commit cdc2475, reversing
changes made to 835805d.

Revert "Merge pull request apache#12451: [BEAM-10602] Use python_streaming_pardo_5 table for latency results"

This reverts commit 2f47b82, reversing
changes made to d971ba1.
mxm (Contributor, Author) commented Aug 7, 2020

R: @tysonjh

@@ -619,5 +855,5 @@
   "variables": {
     "list": []
   },
-  "version": 2
+  "version": 8
 }
Contributor

nit: add a newline?

Contributor Author

This file is auto-generated but I could add a newline :)

tysonjh (Contributor) left a comment

Looks good, but reviewing configs is not easy. Is there a way I can see it rendered? I launched it locally but didn't have any data.

@@ -161,12 +164,13 @@ def streamingScenarios = { datasetName ->
   test : 'apache_beam.testing.load_tests.pardo_test',
   runner : CommonTestProperties.Runner.PORTABLE,
   pipelineOptions: [
-    job_name : 'load-tests-python-flink-streaming-pardo-5-' + now,
+    job_name : 'load-tests-python-flink-streaming-pardo-1-' + now,
Contributor

I have mixed feelings about it. This test is not load-tests-python-flink-batch-pardo-1; it runs in streaming mode. There are more differences between them: batch-pardo-1 uses 10 iterations, while this test uses 5, and batch-pardo-1 has 0 counters vs. 3 counters here. Because of that, I think we should stay with the previous job_name: load-tests-python-flink-streaming-pardo-5.

The general idea behind load tests is that we run the same configuration on different runners, in different SDKs, and in different modes (batch or streaming). The Grafana dashboards for load tests were designed with that convention in mind. If you choose Java and streaming from the list, Grafana will pull data from these measurements: java_streaming_pardo_1, java_streaming_pardo_2, and so on. Your streaming tests are a bit problematic, because they are not run on Dataflow or in batch mode. Also, they have no Java counterpart.

That being said, I can think of two solutions:

  1. Add more charts. We would end up with a total of six charts. The fifth and the sixth chart would be empty in most cases (for Java and for batch).
  2. Create a separate, more specific version of dashboard just for these two tests (streaming-pardo-5 and streaming-pardo-6). Leave "ParDo Load Tests" dashboard intact.

@mxm What do you think?

mxm (Contributor, Author) commented Aug 10, 2020

Note, this is just the job name. More important is the table we are writing to, further down. Unfortunately, the Grafana setup forces me to do that. I would rather not change this, but the Grafana setup is very inflexible and, in this regard, a regression from the old framework we used: https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056

> Your streaming tests are a bit problematic, because they are not being run on Dataflow and batch.

To be honest, I don't fully understand your point. In order for the dropdown menus to work properly, i.e. choosing the SDK and the mode (batch/streaming), this change is required, because the table name is composed of $sdk_$mode_. The test parameters looked identical to me for Dataflow/Flink. If the iterations don't match, we can adjust that. The input is already the same.

Adding more charts would be another option. We would have to remove the streaming dropdown and just add one chart per streaming and batch run. I think that is the best option; it gives us a bit more flexibility.
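For illustration, the measurement-naming convention discussed above can be sketched in Python. The helper below is hypothetical, not code from Beam; it only mirrors the $sdk_$mode_ composition described in this thread:

```python
def measurement_name(sdk: str, mode: str, test: str, case_no: int) -> str:
    """Compose an InfluxDB measurement name following the convention
    the dashboards rely on (hypothetical helper, not actual Beam code)."""
    return f"{sdk}_{mode}_{test}_{case_no}"

# Grafana's SDK/mode dropdowns select measurements such as:
print(measurement_name("java", "streaming", "pardo", 1))    # java_streaming_pardo_1
print(measurement_name("python", "streaming", "pardo", 1))  # python_streaming_pardo_1
```

This is why renaming the Flink streaming test's table to the _pardo_1 suffix makes its results appear alongside the other runners' data.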

Contributor Author

@kamilwu If you agree, I'd remove the streaming/batch dropdown and just add a new chart for the streaming mode. I suppose that is a better migration path because there are no other streaming load tests at the moment.

Contributor

> there are no other streaming load tests at the moment.

Not quite. At the moment, we have streaming load tests for Java (Dataflow only). Apart from that, I'm investigating running other Python load tests (ParDo 1, 2, 3 and 4) in streaming mode too.

Contributor

@kkucharc is actually working on streaming load tests for Python on Dataflow and she's already prepared a PR: #12435. We would like to show metrics from these new tests too.

Contributor

If not, then I'm fine with adding new charts (I suppose you meant "chart"; a "dashboard" is a different kind of thing) and removing the selector for batch/streaming.

Contributor Author

> @mxm Do you think it is possible to adjust those parameters so that pardo-5 can become pardo-1 and pardo-6 can become pardo-2, pardo-3 or pardo-4? The main advantage of this solution is that we wouldn't have to modify dashboards at all. The old version would just work.

That was the original idea in this PR, which I understood you didn't like. pardo_5 became pardo_1. As for pardo_6, that's not possible, because it measures the checkpoint duration and should be a separate panel.

> If not, then I'm fine with adding new charts (I suppose you'd meant "chart", "dashboard" is a different kind of thing) and removing the selector for batch/streaming.

Yes, I meant panel, corrected above.

Contributor

> As for pardo_6, that's not possible because it measures the checkpoint duration and should be a separate panel.

I see. Then let's do it the other way (adding new charts and removing the selector). Thank you.

Contributor Author

I went the other route you suggested and adjusted the parameters for the load tests. Adding more panels seemed like a good idea, but it also adds significant noise to the dashboard.

Contributor Author

As for the latency / checkpoint duration: I think they are good panels to have, and they are applicable to many runners. I'd like to keep them where they are, so we can follow the performance regression guidelines in the release guide.

mxm (Contributor, Author) commented Aug 10, 2020

@tysonjh You should be able to run this locally with the backup data which is automatically retrieved from the GCS bucket when you run docker-compose up. Basically, the changes here will restore the old behavior + add a latency/checkpoint panel + combine the Flink ParDo results with the ones from Dataflow/Spark in one panel.

tysonjh (Contributor) commented Aug 11, 2020

> @tysonjh You should be able to run this locally with the backup data which is automatically retrieved from the GCS bucket when you run docker-compose up. Basically, the changes here will restore the old behavior + add a latency/checkpoint panel + combine the Flink ParDo results with the ones from Dataflow/Spark in one panel.

Got it, thanks. I had to add some extra steps to the wiki for setting up the InfluxDB Data Source, and I now have graphs showing up. Please let me know when the comments are resolved so I can take another look.

…rDo Load Test

The Flink streaming tests were reported in a separate table and made available
through this dashboard: https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056

Turns out, this is not optimal for the new Grafana-based dashboard. We have to
change the table name because the query capability of InfluxDB is very limited.

This way the results will be shown together with the other runners' load test results.
mxm (Contributor, Author) commented Aug 12, 2020

I'm going to merge this to bring the metrics back to the dashboard. For now, I think this solution is best. We can think about moving latency/checkpoint duration to a separate dashboard, but I think it is best to have all the metrics for the release performance regression check in one place.

mxm merged commit d2ef73e into apache:master on Aug 12, 2020
mxm (Contributor, Author) commented Aug 12, 2020

@tysonjh or @kamilwu Please have another look and let me know. You can do this online now, once the changes have been deployed. Note that for the first ParDo panel, the results for Flink will still have to be populated over the next few days.

kamilwu (Contributor) commented Aug 12, 2020

Thanks @mxm. Just a couple of thoughts after reviewing the panels:

  1. There is something wrong with the legend (fifth and sixth panels affected): it says "$tag_metric" for both data series. This can be fixed by leaving the ALIAS BY field empty. Just in case, here's the documentation that explains how aliasing works: https://grafana.com/docs/grafana/latest/features/datasources/influxdb/#alias-patterns
  2. We should fix the parameter values in the title of the fifth panel after adjusting those parameters in the job definition.

+1 for keeping latency/checkpoint duration where they are now
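As a hedged illustration of the fix in point 1, an InfluxDB panel target in the dashboard JSON might look roughly like this. The measurement name and exact field layout here are assumptions based on Grafana's dashboard JSON model, not copied from the actual dashboard:

```json
{
  "targets": [
    {
      "measurement": "python_streaming_pardo_1",
      "alias": "",
      "groupBy": [{ "type": "tag", "params": ["metric"] }]
    }
  ]
}
```

With "alias" left empty, Grafana falls back to auto-generated series names derived from the grouping tag's values, instead of showing a literal "$tag_metric" label for every series.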

mxm (Contributor, Author) commented Aug 28, 2020

Thanks @kamilwu. Here is the fix for the two issues you mentioned: #12717
