Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Add InfluxDB metrics support #2972

Closed
wants to merge 32 commits into from
Closed

Add InfluxDB metrics support #2972

wants to merge 32 commits into from

Conversation

tomncooper
Copy link
Contributor

@tomncooper tomncooper commented Aug 2, 2018

This PR adds a new metrics sink for InfluxDB. It has options for batch sending (at the flush frequency) or sending as soon as metrics are received. I have updated the website source with documentation and added config stubs to all the metric-sinks.yaml files I could find.

This has been tested with the Local scheduler, but not any others.

@tomncooper tomncooper changed the title Add InfluxDB metirics support Add InfluxDB metrics support Aug 2, 2018
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.BatchPoints;
import org.influxdb.dto.Point;
import org.influxdb.BatchOptions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort

serverHost = (String) conf.get(SERVER_HOST_KEY);

// The InfluxDB client expects the host connection string to begin with http or https
if(!serverHost.contains("http://") & !serverHost.contains("https://")){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startsWith() and && might be better?

And please add a space after "if" and before "{"

* @param context context object containing information on the Topology and Heron system this sink
* is registered with.
*/
public void init(Map<String, Object> conf, SinkContext context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a bit long. suggest to refactor some blocks into smaller ones.

*/
public void flush() {
if(batchEnabled) {
LOG.info("Flushing buffered metrics to InfluxDB");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad indent

@tomncooper
Copy link
Contributor Author

@nwangtw Hope those changes address your comments. Thanks for looking at this for me.

@Code0x58
Copy link
Contributor

Out of curiosity, is there a pull based alternative to this which doesn't require changes? I was inclined to look as this introduces a lot of dependencies. Here's a post from May 2017 that made me think it might be an option:

Kapacitor & Pull

With this release, we’ve integrated Prometheus’ service discovery and scraping code into Kapacitor. That means that any service discovery target that works with Prometheus will work with Kapacitor. Combined with a TICK script, you can use Kapacitor to monitor Prometheus scrape targets, write data into InfluxDB, and perform filtering, transformation and other tasks. With Kapacitor’s user defined functions (UDFs) it becomes trivial to pull in advanced anomaly detection and custom logic.

@tomncooper
Copy link
Contributor Author

I agree the added dependencies are excessive. I could rewrite the client to do raw HTTP requests but I would simply be replicating the work of Influx client.

To cut down on dependencies, could I package the influx sink on maven central and have it as a single external dependency? Or is that overkill?

The Prometheus/Kapacitor option is a great solution (wish I had found it before I wrote the Influx sink). However, it does require Kapacitor (a stream processing engine) to be setup, so we have a stream processing engine to monitor a stream processing engine ("who watches the watchmen", "it's turtles all the way down", etc etc).

I think this highlights an issue with the centralised metrics sink design in Heron. In Storm the sink is part of the topology and is included in the deployed fat JAR, so is just pulled in as needed by the topology developers. I am wondering if there may be a way to do an "on demand" version of sinks for Heron? Probably not as they are part of the metrics manager.

@nwangtw
Copy link
Contributor

nwangtw commented Aug 13, 2018

"I could rewrite the client to do raw HTTP requests but I would simply be replicating the work of Influx client."

@Code0x58 made a good point. For a metrics sink, it is quite strange to me that so many dependencies are required. In theory, a sink client should be just a simple wrapper for a few HTTP requests, so it may only need a HTTP lib (and/or thrift requests but still). I am wondering how these dependencies are used and if there is a simpler solution.

@tomncooper
Copy link
Contributor Author

Well most of the dependencies come from the http lib that the influxDB java client is using (com_squareup_okhttp3). It looks excessive as I have to list every sub dependency in the WORKSPACE file. Is there a way for bazel to resolve the java dependencies itself? Then I would just need to include the InfluxDB client.

If adding these deps really is a hard no, then I can rewrite the sink to just do HTTP requests on the InfluxDB line protocol using the Apache Common client we are already including.

@nwangtw
Copy link
Contributor

nwangtw commented Aug 13, 2018

I see.

It is not a hard no, just makes maintenance a bit harder. Personally I am ok (it seems this is the official influxDB client), if they are necessary.

@tomncooper tomncooper closed this Oct 31, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants