Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-8133] Publishing results of Nexmark tests to InfluxDB #11956

Merged
merged 4 commits into from Jun 25, 2020

Conversation

kamilwu
Copy link
Contributor

@kamilwu kamilwu commented Jun 9, 2020


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 10, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 10, 2020

Run Direct Runner Nexmark Tests

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 10, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 10, 2020

Run Direct Runner Nexmark Tests

@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch from fab9cff to 5fedf7e Compare June 10, 2020 15:23
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 10, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 10, 2020

Run Direct Runner Nexmark Tests

@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch from 5fedf7e to e89172c Compare June 11, 2020 17:26
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 11, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 11, 2020

Run Direct Runner Nexmark Tests

1 similar comment
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 11, 2020

Run Direct Runner Nexmark Tests

@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch from e89172c to 1586c75 Compare June 15, 2020 11:08
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 15, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 15, 2020

Run Direct Runner Nexmark Tests

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 15, 2020

R: @iemejia Could you take a look?

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 15, 2020

cc: @tysonjh

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 15, 2020

New results will be displayed in Grafana once this pull request is merged. The results of tests executed by phase triggering are written to different measurement (or table).

results.forEach(
map ->
metricBuilder
.append(map.get("measurement"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it would make sense to extract the appends of keys and values to a method, but I can't find a nice and clean way of doing it. @kamilwu do you have any thoughts about it?

Copy link
Contributor Author

@kamilwu kamilwu Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a method getKV that would return a String in this format: "key=value", e.g. "runner=DataflowRunner"? This would reduce the number of appends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense, thanks!

@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch from 1586c75 to 92d802f Compare June 18, 2020 13:45
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 18, 2020

Run Direct Runner Nexmark Tests

@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch from 92d802f to 116ad4a Compare June 19, 2020 09:09
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 19, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 19, 2020

Run Direct Runner Nexmark Tests

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 19, 2020

Run Java AvroIO Performance Test HDFS

@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch from 116ad4a to b52dc6d Compare June 19, 2020 12:59
Copy link
Member

@iemejia iemejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far. I let some questions, just minor stuff. it was not clear to me if we are conserving the same schema for compatibility reasons with BigQuery because the goal is to support both for some time, if it is not the case we should probably get rid of the BigQuery bits.

@@ -24,6 +24,7 @@ metadata:
data:
init-script.iql: |
CREATE RETENTION POLICY "a_year" ON "beam_test_metrics" DURATION 52w REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "forever" ON "beam_test_metrics" DURATION INF REPLICATION 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

final ImmutableMap<String, String> schema =
ImmutableMap.<String, String>builder()
.put("timestamp", "timestamp")
.put("runtimeSec", "float")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the goal is to improve the existing use case can we make this an integer and use ms instead to make it more precise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I think we need to change wording to runtimeMs as well, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good idea

ImmutableMap.<String, String>builder()
.put("timestamp", "timestamp")
.put("runtimeSec", "float")
.put("eventsPerSec", "float")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use this one? it looks with runtimeMs + numResults this is not needed anymore or we can deduce it if someone cares.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll remove it from implementation but not here (we want to preserve compatibility with BQ), I'll change it in influx publisher, thanks for the info!

savePerfsToBigQuery(
BigQueryResultsPublisher.create(options.getBigQueryDataset(), schema),
options,
actual,
start);
}

if (options.getExportSummaryToInfluxDB()) {
final long timestamp = start.getMillis() / 1000; // seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I thought timestamps in Influxe were in ms well probably we don't need that level of precision for the start timestamp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default precision is nanoseconds. In case of nexmark results we changed it and use seconds instead

return new HttpPost(
        settings.host + "/write?db=" + settings.database + "&" + retentionPolicy + "&precision=s");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've thought that we don't really need milliseconds. Even seconds are probably more than enough

Copy link
Member

@iemejia iemejia Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you guys are right seconds is ok for the execution timestamp and ms is good for the benchmark run time

@@ -42,8 +42,8 @@
PROCESSING_TIME_WINDOWS(12), // Query "12"

// Other non-numbered queries
BOUNDED_SIDE_INPUT_JOIN,
SESSION_SIDE_INPUT_JOIN;
BOUNDED_SIDE_INPUT_JOIN(13),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -19,14 +19,19 @@

import static java.nio.charset.StandardCharsets.UTF_8;
import static java.util.Objects.requireNonNull;
import static java.util.stream.Collectors.joining;
import static org.apache.beam.repackaged.core.org.apache.commons.lang3.StringUtils.isBlank;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not depend on repackaged commons-lang3 this will probably be removed in the future so better add the explicit commons-lang3 import and corresponding classes.


final StringBuilder metricBuilder = new StringBuilder();
results.stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The original code with the strings looks uglier but somehow is easier to understand in a single read (so easier to maintain), the new one requires a lot of methods and jumping back and forth in code for not much. Can we go back to the older approach

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm...sure we can.

@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch 2 times, most recently from 91d0081 to 12f2cbe Compare June 25, 2020 11:17
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Direct Runner Nexmark Tests

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Java AvroIO Performance Test HDFS

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Dataflow Runner Nexmark Tests


import java.io.IOException;
import java.util.Collection;
import java.util.Map;
import org.apache.beam.repackaged.core.org.apache.commons.lang3.StringUtils;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh this commons import escaped here too. Can you use the non repackaged version please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah..I would swear I changed it...thanks!

Copy link
Member

@iemejia iemejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM module the fix on the commons.lang3 import.
Thanks !

@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch from 12f2cbe to abfb1dd Compare June 25, 2020 14:53
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Dataflow Runner Nexmark Tests

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Beam Metrics deployment

* changed unit from seconds to milliseconds
* renamed `runtimeSec` field to `runtimeMs`
@kamilwu kamilwu force-pushed the nexmark-influxdb-publisher branch from abfb1dd to e89324f Compare June 25, 2020 15:38
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run Dataflow Runner Nexmark Tests

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run JavaPortabilityApiJava11 PreCommit

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Run JavaPortabilityApi PreCommit

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 25, 2020

Thanks @iemejia!

@kamilwu kamilwu merged commit 3f1db41 into apache:master Jun 25, 2020
@kamilwu kamilwu deleted the nexmark-influxdb-publisher branch June 25, 2020 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants