[BEAM-10602] Fix load test metrics in Grafana dashboard #12499
Conversation
…streaming metrics in Grafana dashboard"

This reverts commit cdc2475, reversing changes made to 835805d.

Revert "Merge pull request apache#12451: [BEAM-10602] Use python_streaming_pardo_5 table for latency results"

This reverts commit 2f47b82, reversing changes made to d971ba1.

R: @tysonjh
@@ -619,5 +855,5 @@
   "variables": {
     "list": []
   },
-  "version": 2
+  "version": 8
 }
nit: add a newline?
This file is auto-generated but I could add a newline :)
Looks good, but reviewing configs is not easy. Is there a way I can see it? I launched it locally but didn't have any data.
@@ -161,12 +164,13 @@ def streamingScenarios = { datasetName ->
         test : 'apache_beam.testing.load_tests.pardo_test',
         runner : CommonTestProperties.Runner.PORTABLE,
         pipelineOptions: [
-          job_name : 'load-tests-python-flink-streaming-pardo-5-' + now,
+          job_name : 'load-tests-python-flink-streaming-pardo-1-' + now,
I have mixed feelings about it. This test is not load-tests-python-flink-batch-pardo-1, but its streaming counterpart, and there are more differences between them: batch-pardo-1 uses 10 iterations while this test uses 5, and batch-pardo-1 has 0 counters vs. 3 counters here. Because of that, I think we should stay with the previous job_name: load-tests-python-flink-streaming-pardo-5.

The general idea behind load tests is that we run the same configuration on different runners, in different SDKs and in different modes (batch or streaming). The Grafana dashboards for load tests were designed with that convention in mind. If you choose java and streaming from the list, Grafana will pull data from these measurements: java_streaming_pardo_1, java_streaming_pardo_2, and so on. Your streaming tests are a bit problematic, because they are not being run on Dataflow or in batch mode. Also, they have no Java counterpart.
That being said, I can think of two solutions:
- Add more charts. We would end up with a total of six charts. The fifth and the sixth chart would be empty in most cases (for Java and for batch).
- Create a separate, more specific version of dashboard just for these two tests (streaming-pardo-5 and streaming-pardo-6). Leave "ParDo Load Tests" dashboard intact.
@mxm What do you think?
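As a side note, the measurement naming convention described above can be sketched with a small helper. This is a hypothetical illustration, not code from the Beam repository; the name `measurement_name` and its signature are assumptions made for this sketch:

```python
# Hypothetical helper, not part of the Beam codebase. It only illustrates the
# "<sdk>_<mode>_<test>_<n>" InfluxDB measurement naming convention the
# Grafana load-test dashboards rely on.
def measurement_name(sdk: str, mode: str, test: str, number: int) -> str:
    """Compose a measurement name, e.g. 'java_streaming_pardo_1'."""
    return f"{sdk}_{mode}_{test}_{number}"

# With "java" and "streaming" selected in the dropdowns, the dashboard
# would read from java_streaming_pardo_1, java_streaming_pardo_2, and so on:
names = [measurement_name("java", "streaming", "pardo", n) for n in (1, 2, 3, 4)]
```

A measurement that does not follow this composition (such as one named after a streaming-only test with no batch or Java counterpart) falls outside what the dropdowns can select.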
Note, this is just the job name. More important is the table we are writing to further down. Unfortunately, the Grafana setup forces me to do that. I would rather not change this but the Grafana setup is very inflexible and in this regard a regression from the old framework we used: https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> Your streaming tests are a bit problematic, because they are not being run on Dataflow and batch.

I don't fully understand your point, to be honest. In order for the dropdown menus to work properly, i.e. choosing the SDK and the mode (batch/streaming), this change is required because the table name is composed of the $sdk_$mode_ prefix. The test parameters looked identical to me for Dataflow/Flink. If the iterations don't match, we can adjust that. The input is already the same.

Adding more charts would be another option. We would have to remove the streaming dropdown and just add one chart per streaming and batch run. I think that is the best option. It gives us a bit more flexibility.
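To illustrate the effect of the rename under discussion (a hypothetical sketch, not Beam code; the actual name composition happens in the load-test framework's metrics publisher):

```python
# Hypothetical illustration, not Beam code. The shared ParDo dashboard reads
# measurements named "<sdk>_<mode>_pardo_<n>", so renaming the Flink streaming
# test from pardo-5 to pardo-1 moves its results into a measurement the
# existing dashboard already queries alongside the other runners.
def measurement(sdk: str, mode: str, n: int) -> str:
    return f"{sdk}_{mode}_pardo_{n}"

before = measurement("python", "streaming", 5)  # only the old dashboard read this
after = measurement("python", "streaming", 1)   # picked up by the shared dashboard
```

This is why the change touches both the job name and, more importantly, the table the results are written to.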
@kamilwu If you agree, I'd remove the streaming/batch dropdown and just add a new chart for the streaming mode. I suppose that is a better migration path because there are no other streaming load tests at the moment.
> there are no other streaming load tests at the moment.
Not quite. At the moment, we have streaming load tests for Java (Dataflow only). Apart from that, I'm investigating running other Python load tests (ParDo 1, 2, 3 and 4) in streaming mode too.
If not, then I'm fine with adding new charts (I suppose you meant "chart"; "dashboard" is a different kind of thing) and removing the selector for batch/streaming.
@mxm Do you think it is possible to adjust those parameters so that pardo-5 can become pardo-1 and pardo-6 can become pardo-2, pardo-3 or pardo-4? The main advantage of this solution is that we wouldn't have to modify the dashboards at all. The old version would just work.
That was the original idea in this PR, which I understood you didn't like. pardo_5 became pardo_1. As for pardo_6, that's not possible because it measures the checkpoint duration and should be a separate panel.

> If not, then I'm fine with adding new charts (I suppose you meant "chart"; "dashboard" is a different kind of thing) and removing the selector for batch/streaming.

Yes, I meant panel, corrected above.
> As for pardo_6, that's not possible because it measures the checkpoint duration and should be a separate panel.

I see. Then, let's go the other way (adding new charts and removing the selector). Thank you.
I went the other route you suggested and adjusted the parameters for the load tests. Adding more panels seemed like a good idea but it also adds significant noise to the dashboard.
As for the latency / checkpoint duration: I think they are good panels to have, and they are applicable to many runners. I'd like to keep them where they are so we can follow the performance regression guidelines in the release guide.

@tysonjh You should be able to run this locally with the backup data, which is automatically retrieved from the GCS bucket when you run

Got it, thanks. I had to add some extra steps to the wiki for setting up the InfluxDB Data Source, and now have graphs showing up. Please let me know when the comments are resolved so I can take another look.
…rDo Load Test

The Flink streaming tests were reported in a separate table and made available through this dashboard: https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056

Turns out, this is not optimal for the new Grafana-based dashboard. We have to change the table name because the query capability of InfluxDB is very limited. This way the results will be shown together with the other Runners' load test results.
I'm going to merge this to bring the metrics back to the dashboard. For now, I think this solution is the best one. We can think about moving latency/checkpoint duration to a separate dashboard, but I think it is best to have all the metrics for the release performance regression check in one place.
Thanks @mxm. Just a couple of thoughts after reviewing the panels: +1 for keeping latency/checkpoint duration where they are now.
This reverts the recent changes to the dashboards and adds a commit which adds a latency and a checkpoint duration panel. Also, it modifies the Flink streaming tests to write into the _pardo_1 table. This way, the results will show up in the dashboard together with all the other Runners' data.

Post-Commit Tests Status (on master branch)

Pre-Commit Tests Status (on master branch)

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI.