Add InfluxDB metrics support #2972

tomncooper · 2018-08-02T13:52:29Z

This PR adds a new metrics sink for InfluxDB. It has options for batch sending (at the flush frequency) or sending as soon as metrics are received. I have updated the website source with documentation and added config stubs to all the metric-sinks.yaml files I could find.

This has been tested with the Local scheduler, but not any others.
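For readers unfamiliar with the two sending modes described above, here is a minimal sketch of how a sink might switch between buffering until flush and sending immediately. It is not the code from this PR: the class name `BufferingSinkSketch`, the `Consumer`-based sender, and the use of plain strings as points are all assumptions made for illustration. Only the `batchEnabled` flag name is taken from the diff quoted later in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffers points when batching is enabled,
// otherwise hands each point to the sender as soon as it arrives.
final class BufferingSinkSketch {
  private final boolean batchEnabled;
  private final Consumer<List<String>> sender; // stands in for the real InfluxDB client
  private final List<String> buffer = new ArrayList<>();

  BufferingSinkSketch(boolean batchEnabled, Consumer<List<String>> sender) {
    this.batchEnabled = batchEnabled;
    this.sender = sender;
  }

  // Called once per incoming metric point.
  void processRecord(String point) {
    if (batchEnabled) {
      buffer.add(point);
    } else {
      sender.accept(Collections.singletonList(point));
    }
  }

  // Called at the sink's flush frequency; a no-op when batching is off
  // or when nothing has been buffered.
  void flush() {
    if (batchEnabled && !buffer.isEmpty()) {
      sender.accept(new ArrayList<>(buffer));
      buffer.clear();
    }
  }
}
```

With batching enabled, nothing reaches the sender until `flush()` is called. With it disabled, every `processRecord` call results in one send.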

nwangtw · 2018-08-06T15:48:45Z

heron/metricsmgr/src/java/org/apache/heron/metricsmgr/sink/InfluxDBSink.java

+import org.influxdb.InfluxDBFactory;
+import org.influxdb.dto.BatchPoints;
+import org.influxdb.dto.Point;
+import org.influxdb.BatchOptions;


nwangtw · 2018-08-06T15:49:51Z

heron/metricsmgr/src/java/org/apache/heron/metricsmgr/sink/InfluxDBSink.java

+    serverHost = (String) conf.get(SERVER_HOST_KEY);
+
+    // The InfluxDB client expects the host connection string to begin with http or https
+    if(!serverHost.contains("http://") & !serverHost.contains("https://")){


startsWith() and && might be better?

And please add a space after "if" and before "{"
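A minimal sketch of what the suggested change could look like. The helper class `InfluxHostNormalizer` and the choice to prefix `http://` when no scheme is present are assumptions for illustration; the body of the original `if` block is not shown in the hunk above.

```java
// Hypothetical helper illustrating the review suggestion.
final class InfluxHostNormalizer {
  private InfluxHostNormalizer() { }

  // Returns the host unchanged if it already starts with an HTTP(S) scheme.
  // Otherwise prefixes "http://" (an assumed default for this sketch).
  static String withScheme(String serverHost) {
    // startsWith() only matches a scheme at the beginning of the string,
    // and && short-circuits, so the second check is skipped when the first fails.
    if (!serverHost.startsWith("http://") && !serverHost.startsWith("https://")) {
      return "http://" + serverHost;
    }
    return serverHost;
  }

  public static void main(String[] args) {
    System.out.println(withScheme("localhost:8086"));
    System.out.println(withScheme("https://influx.example.com:8086"));
  }
}
```

Compared with `contains()`, `startsWith()` does not accept a string that merely has `http://` somewhere in the middle.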

nwangtw · 2018-08-06T15:54:03Z

heron/metricsmgr/src/java/org/apache/heron/metricsmgr/sink/InfluxDBSink.java

+   * @param context context object containing information on the Topology and Heron system this sink
+   *                is registered with.
+   */
+  public void init(Map<String, Object> conf, SinkContext context) {


This function is a bit long. I suggest refactoring some of its blocks into smaller methods.

nwangtw · 2018-08-06T15:54:22Z

heron/metricsmgr/src/java/org/apache/heron/metricsmgr/sink/InfluxDBSink.java

+   */
+	public void flush() {
+	  if(batchEnabled) {
+      LOG.info("Flushing buffered metrics to InfluxDB");


tomncooper · 2018-08-08T08:45:34Z

@nwangtw Hope those changes address your comments. Thanks for looking at this for me.

Code0x58 · 2018-08-11T20:37:22Z

Out of curiosity, is there a pull based alternative to this which doesn't require changes? I was inclined to look as this introduces a lot of dependencies. Here's a post from May 2017 that made me think it might be an option:

Kapacitor & Pull

With this release, we’ve integrated Prometheus’ service discovery and scraping code into Kapacitor. That means that any service discovery target that works with Prometheus will work with Kapacitor. Combined with a TICK script, you can use Kapacitor to monitor Prometheus scrape targets, write data into InfluxDB, and perform filtering, transformation and other tasks. With Kapacitor’s user defined functions (UDFs) it becomes trivial to pull in advanced anomaly detection and custom logic.

tomncooper · 2018-08-13T09:46:40Z

I agree the added dependencies are excessive. I could rewrite the client to do raw HTTP requests, but I would simply be replicating the work of the Influx client.

To cut down on dependencies, could I package the influx sink on maven central and have it as a single external dependency? Or is that overkill?

The Prometheus/Kapacitor option is a great solution (wish I had found it before I wrote the Influx sink). However, it does require Kapacitor (a stream processing engine) to be set up, so we have a stream processing engine to monitor a stream processing engine ("who watches the watchmen", "it's turtles all the way down", etc etc).

I think this highlights an issue with the centralised metrics sink design in Heron. In Storm the sink is part of the topology and is included in the deployed fat JAR, so is just pulled in as needed by the topology developers. I am wondering if there may be a way to do an "on demand" version of sinks for Heron? Probably not as they are part of the metrics manager.

nwangtw · 2018-08-13T15:44:02Z

"I could rewrite the client to do raw HTTP requests but I would simply be replicating the work of Influx client."

@Code0x58 made a good point. For a metrics sink, it is quite strange to me that so many dependencies are required. In theory, a sink client should be just a simple wrapper for a few HTTP requests, so it may only need an HTTP lib (and/or thrift requests but still). I am wondering how these dependencies are used and if there is a simpler solution.

tomncooper · 2018-08-13T15:57:38Z

Well, most of the dependencies come from the HTTP lib that the InfluxDB Java client uses (com_squareup_okhttp3). It looks excessive because I have to list every sub-dependency in the WORKSPACE file. Is there a way for Bazel to resolve the Java dependencies itself? Then I would just need to include the InfluxDB client.

If adding these deps really is a hard no, then I can rewrite the sink to just do HTTP requests on the InfluxDB line protocol using the Apache Common client we are already including.
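To give a rough idea of the raw-HTTP alternative mentioned above, here is a sketch that formats one point in InfluxDB 1.x line protocol and posts it to the `/write` endpoint. It is not code from this PR. The class and method names are hypothetical, and it uses the JDK's `HttpURLConnection` rather than the Apache client mentioned above, purely to keep the example free of external dependencies.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a dependency-free InfluxDB 1.x writer.
final class LineProtocolSketch {
  private LineProtocolSketch() { }

  // Tag keys, tag values and field keys: escape commas, equals signs and spaces.
  static String escape(String s) {
    return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
  }

  // Measurement names: escape commas and spaces only.
  static String escapeMeasurement(String s) {
    return s.replace(",", "\\,").replace(" ", "\\ ");
  }

  // Builds "measurement,tag=value,... field=value timestamp" for one float field.
  // Tags are sorted by key. NaN and infinite values are not handled here.
  static String toLine(String measurement, Map<String, String> tags,
                       String field, double value, long timestampNanos) {
    StringBuilder sb = new StringBuilder(escapeMeasurement(measurement));
    for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
      sb.append(',').append(escape(tag.getKey())).append('=').append(escape(tag.getValue()));
    }
    sb.append(' ').append(escape(field)).append('=').append(value);
    sb.append(' ').append(timestampNanos);
    return sb.toString();
  }

  // POSTs newline-separated lines to <baseUrl>/write?db=<database>
  // and returns the HTTP status code (InfluxDB 1.x answers 204 on success).
  static int write(String baseUrl, String database, String body) throws IOException {
    URL url = new URL(baseUrl + "/write?db=" + URLEncoder.encode(database, "UTF-8"));
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    int status = conn.getResponseCode();
    conn.disconnect();
    return status;
  }
}
```

A real sink would still need retries, authentication and batching on top of this, which is part of what the official client already provides.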

nwangtw · 2018-08-13T16:17:19Z

I see.

It is not a hard no, just makes maintenance a bit harder. Personally I am ok (it seems this is the official influxDB client), if they are necessary.

tomncooper and others added 19 commits August 9, 2017 11:45

Added .vagrant to gitignore

d6dcc55

Merge branch 'master' of https://github.com/twitter/heron

0517bed

Merge branch 'master' of github.com:tomncooper/heron

1f7a0a1

Merge branch 'master' of https://github.com/twitter/heron

cb31800

Merge branch 'master' of github.com:tomncooper/heron

360a2af

Merge branch 'master' of https://github.com/twitter/heron

5ad3586

Merge branch 'master' of https://github.com/twitter/heron

f8bb3f6

Merge branch 'master' of https://github.com/twitter/heron

2773267

Merge branch 'master' of https://github.com/twitter/heron

f534cb1

Merge branch 'master' of https://github.com/twitter/heron

06b0db3

Merge branch 'master' of https://github.com/apache/incubator-heron

fd330dd

Merge branch 'master' of https://github.com/apache/incubator-heron

466e0f5

Merge branch 'master' of https://github.com/apache/incubator-heron

18e34ee

Merge branch 'master' of https://github.com/apache/incubator-heron

e5baf4d

Merge branch 'master' of https://github.com/apache/incubator-heron

55538ef

Merge branch 'master' of https://github.com/apache/incubator-heron

53ff368

Merge branch 'master' of https://github.com/apache/incubator-heron

5e9daf7

Merge branch 'master' of https://github.com/apache/incubator-heron

201c385

Merge branch 'master' of https://github.com/apache/incubator-heron

e580842

tomncooper changed the title ~~Add InfluxDB metirics support~~ Add InfluxDB metrics support Aug 2, 2018

nwangtw reviewed Aug 6, 2018

tomncooper added 4 commits August 9, 2018 14:45

Added .vagrant to gitignore

1ee842a

Merge branch 'master' of github.com:tomncooper/heron

9a3d031

Added .vagrant to gitignore

332ab5c

Merge branch 'master' of github.com:tomncooper/heron

d39480f

tomncooper added 9 commits August 15, 2018 15:10

Merge branch 'master' of https://github.com/apache/incubator-heron

6782ff7

Added Influx sink code and deps to workspace

7be30f8

Completed InfluxDB sink and added deps and config stubs

1576772

Added InfluxDB information to website

1617b2c

Changed sink init so that each topo has its own DB

83e9e85

Altered the influx DB sink so that each topology has its own database on the influx server. Updated configs and documentation to match this change.

Separated out the source into host, component and instance ID tags

ecc525c

Changed the source tag so that it is parsed, as it is in the Tmaster sink, into separate tags for HostPort, component and instance ID tags.

Added error handling to source name parsing in sink

a78596a

Edits in Influx sink after feedback

c7f86ed

Updated tests

9132e56
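As an illustration of the source-splitting commits listed above (ecc525c and a78596a), here is a minimal sketch of parsing a source string into separate tags with a fallback for malformed input. It is not the PR's code. The `host:port/component/instanceId` layout, the tag names, and the fallback behaviour are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits a metrics source string into separate tags.
final class SourceTagSketch {
  private SourceTagSketch() { }

  // Assumes the source looks like "host:port/component/instanceId".
  // Anything else is kept whole under a single "source" tag instead of failing.
  static Map<String, String> toTags(String source) {
    Map<String, String> tags = new LinkedHashMap<>();
    String safeSource = source == null ? "" : source;
    String[] parts = safeSource.split("/");
    boolean wellFormed = parts.length == 3
        && !parts[0].isEmpty() && !parts[1].isEmpty() && !parts[2].isEmpty();
    if (wellFormed) {
      tags.put("host_port", parts[0]);
      tags.put("component", parts[1]);
      tags.put("instance_id", parts[2]);
    } else {
      tags.put("source", safeSource);
    }
    return tags;
  }
}
```

Keeping the unparsed string as a single tag means a malformed source still produces a data point rather than an exception in the sink.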

tomncooper closed this Oct 31, 2018
