[BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests #11555
Conversation
R: @Ardagan Could you run docker-compose and take a look at the new dashboards? There is no data in the database yet, so the charts will be empty, but I'm open to any suggestions regarding layout, naming, etc.
It could be useful to add instructions on how to access the dashboards to the PR description. I took a brief look at the dashboards; here are my ideas:
These are my initial thoughts. R: @aaltay
Comments from Ahmet:
@tysonjh FYI
@Ardagan Thanks for your suggestions!
Done. I pushed the modified version to the website (http://metrics.beam.apache.org) so that you can see what's changed. I also pushed some data to InfluxDB to make the review process easier.
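For context, data points like the ones pushed here end up in InfluxDB, whose write API accepts the line protocol (`measurement,tags fields timestamp`). A minimal sketch of that format follows; the measurement, tag, and field names are illustrative only, not the actual Beam schema:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format a single point in InfluxDB line protocol:
    measurement,tag1=v1,... field1=v1,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

# Hypothetical benchmark point; names are assumptions for illustration.
point = to_line_protocol(
    "java_jdbc_io_it",                       # measurement (hypothetical)
    {"test_id": "example"},                  # tags index the series
    {"read_time": 12.3, "write_time": 45.6}, # fields carry the values
    1588600000000000000,                     # nanosecond timestamp
)
```

A line built this way can be POSTed to InfluxDB's `/write` endpoint or passed to a client library's write call.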
Do you mean X axis or Y axis?
I tried this out, but four data series (read_time ×2 and write_time ×2) in a single chart were not very readable. I think it's better to keep them separated.
What kind of description do you expect? We have some documentation on what tests are executed on the Beam website [1] and cwiki [2]. If something needs to be improved, let's improve the website/wiki content. I'd prefer to avoid duplicating the website/wiki content in Grafana descriptions, because it'd be hard to keep them in sync. [1] https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests
I agree. Created a JIRA ticket to track the effort: https://issues.apache.org/jira/browse/BEAM-9889
This would be feasible if we introduced Kapacitor (a component responsible for detecting anomalies). We could write Kapacitor alerts back into InfluxDB and visualize them in a summary panel in Grafana. This is not part of this PR, but I plan to introduce anomaly detection this month.
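The anomaly-detection idea can be sketched in plain Python. This illustrates the concept only, not Kapacitor's actual TICKscript interface, and the window size and threshold are arbitrary choices:

```python
from statistics import mean, stdev

def detect_anomalies(values, window=5, threshold=3.0):
    """Flag indices whose value deviates from the mean of the
    preceding `window` points by more than `threshold` standard
    deviations (a simple z-score check)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies
```

In a Kapacitor setup the equivalent rule would run continuously over the metric stream and write alert points back into InfluxDB for Grafana to display.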
Grafana has a feature called Data links [1] that could be used here. But the biggest challenge is getting the Jenkins job ID for a specific data point. When a Python or Java test sends its metrics to InfluxDB/BigQuery, it has no knowledge of the Jenkins job that executes it. Without reworking how metrics are sent, this functionality will be difficult to implement. @Ardagan Any thoughts? [1] https://grafana.com/docs/grafana/latest/reference/datalinks/
I believe we can get the Jenkins job ID via env.JOB_NAME, but IIUC this will require updating the test metric reporting logic and the database. We can add a JIRA ticket to do this improvement in a separate PR.
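Jenkins exposes JOB_NAME (along with BUILD_ID and BUILD_URL) as environment variables to every build step, so the metrics publisher could pick them up without any pipeline changes. A minimal sketch of the idea (the function name and returned keys are illustrative, not existing Beam code):

```python
import os

def jenkins_metadata():
    """Collect Jenkins build identifiers from the environment, if present.
    Jenkins sets JOB_NAME, BUILD_ID and BUILD_URL for every build;
    outside of Jenkins these variables are simply absent, so the
    returned dict is empty and metrics publishing is unaffected."""
    return {
        key.lower(): os.environ[key]
        for key in ("JOB_NAME", "BUILD_ID", "BUILD_URL")
        if key in os.environ
    }
```

The resulting dict could be merged into the tags of each published data point, which would then make Grafana data links back to the Jenkins build possible.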
Some dashboards seem to be missing data, but IIUC that's because not all data has been migrated yet.
Please, get feedback from @aaltay before merging.
I do not see the new dashboard here. How can I find it? I see these three:
Some comments:
I might be missing other issues as well. If they are easy to fix later, we can fix what has been identified, merge, and ask for feedback on the dev@ list.
/cc @chamikaramj @tysonjh @kennknowles -- optional review request, if you would like to take a quick look at the new benchmarks at http://metrics.beam.apache.org. (Instructions from @Ardagan: To find dashboards, click "Home" or the current dashboard name at the top left; this opens a drop-down list with the full set of dashboards.)
Hey Kamil,
Didn't know about this. Thanks, this makes things much easier. |
Some of the Python on Flink tests are currently turned off. The Kafka IO dashboard is empty because the job has been flaky for some time, and the Go benchmarks are empty because the Go tests aren't implemented yet. I believe none of the other dashboards are missing data.
Tests are not publishing new metrics yet; this is work in progress: #11534, #11567 and #11577. I'm fairly sure this will be done by the end of this week. As for the Spark data, the Spark tests were introduced only recently.
It was a deliberate change. This is the only test in the Java IO IT dashboard that reports a different kind of metric (copies_per_sec rather than read_time or write_time). I can change the color if you think all colors should be the same.
We have only a few Python IO IT tests at the moment. If the IO IT dashboards had the same Python/Java selectors as the Load Tests dashboards, most of the charts would be empty.
Sure, I can take care of it. It's true that the navigation is a bit complicated at the moment.
Thanks for all the comments. I will merge this PR tomorrow if there are no further action points.
No, different colors make sense for different metrics. |
This LGTM. I believe the only open comment is about adding a landing page, but otherwise I do not have additional comments. |
The landing page will be created in a separate PR. I think this one can be closed now. Thanks! |
Thank you! Please cc me in the PR, I am interested in that change. |
A short description of how to access the new dashboards locally:
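A sketch of the likely steps, assuming the metrics stack lives under `.test-infra/metrics` in the Beam repository and Grafana serves on its default port (both assumptions, not confirmed in this thread):

```shell
# From the Beam repository root (path is an assumption):
cd .test-infra/metrics

# Build and start the Grafana/InfluxDB stack in the background.
docker-compose up -d

# Grafana's default port is 3000; open it in a browser:
#   http://localhost:3000
# Use the "Home" drop-down at the top left to browse the dashboards.
```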