Adding ParserSpec for Influx Line Protocol #5440

njhartwell · 2018-02-28T20:02:44Z

This is a fairly complete parser for the current version of Influx Line Protocol (https://docs.influxdata.com/influxdb/v1.4/write_protocols/line_protocol_tutorial/), which is a common format for sending time series metric data.

b-slim · 2018-02-28T20:06:40Z

extensions-contrib/influx-extensions/src/main/java/io/druid/data/input/influx/InfluxParser.java

+    return new Double(raw);
+  }
+
+  // TODO: Support returning numeric value?


can you please fix this?

This would allow true/false to be converted to an int 1/0, which would require exposing an additional config param and might not be very useful. If you think it's necessary / highly useful, I could add it, but my first inclination is to wait and see if anyone needs it before complicating the code and config. If it would be sufficient to just remove the TODO, that would be my preference.

b-slim · 2018-02-28T20:08:18Z

Could you please fill out the CLA (http://druid.io/community/cla.html)?

njhartwell · 2018-02-28T22:03:15Z

Hi @b-slim, thanks for the quick feedback. I will work with my employer (Target) to get the ball rolling on signing the corporate CLA. It might take a while but I will let you know as soon as I hear from the authorities.

jon-wei · 2018-03-06T19:45:54Z

extensions-contrib/influx-extensions/src/main/java/io/druid/data/input/influx/InfluxParser.java

+
+  private void parseTimestamp(String timestamp, Map<String, Object> dest)
+  {
+    if (timestamp.length() < 7) {


Can you add a comment about the nanosecond->millisecond conversion here?

Some other comments:

Should timestamps under 1 millisecond be treated as invalid? To me it seems like 0 would be a better return value for that case, or rounding the number to the nearest millisecond.

The truncation could let bad timestamps slip by. For example, a timestamp like "152036516185300abcd" is probably bad data, but this conversion would accept it.

Thanks for the feedback @jon-wei; I'll add a comment and improve default handling there. Still waiting on the Man to sign the CLA 😁; should be pretty soon.
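To make the two concerns above concrete, here is a minimal sketch of a nanosecond-to-millisecond conversion that validates the whole string before dropping the last six digits. `TimestampSketch` and `nanosToMillis` are hypothetical names, not the code in `InfluxParser`, and the sketch assumes non-negative timestamps and returns 0 for anything under one millisecond.

```java
final class TimestampSketch
{
  // Hypothetical helper for illustration; not the method in InfluxParser.
  static long nanosToMillis(String timestamp)
  {
    if (timestamp == null || timestamp.isEmpty()) {
      throw new IllegalArgumentException("empty timestamp");
    }
    // Check every character first, so that trailing garbage such as
    // "152036516185300abcd" is rejected rather than silently truncated away.
    for (int i = 0; i < timestamp.length(); i++) {
      char c = timestamp.charAt(i);
      if (c < '0' || c > '9') {
        throw new IllegalArgumentException("non-numeric timestamp: " + timestamp);
      }
    }
    // Fewer than 7 digits means less than 1,000,000 ns, i.e. under 1 ms.
    if (timestamp.length() < 7) {
      return 0L;
    }
    // Dropping the last 6 digits divides by 10^6 (nanoseconds -> milliseconds).
    return Long.parseLong(timestamp.substring(0, timestamp.length() - 6));
  }
}
```

With this shape, `nanosToMillis("1520722030000000000")` yields `1520722030000`, while a digit string too long for a `long` still fails, because `NumberFormatException` is a subclass of `IllegalArgumentException`.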

- Remove extraneous TODO
- Better handling of parse errors (e.g. invalid timestamp)
- Handle sub-millisecond timestamps

njhartwell · 2018-03-08T22:23:36Z

CLA was signed and submitted; I made one additional commit to address PR feedback (removed todo, better handling of short / invalid timestamps) but did not squash to make it easier to see what changed -- I can squash if desired.

drcrallen · 2018-03-10T23:44:33Z

super cool, @njhartwell can you please make sure to add documentation for the extension?

jon-wei · 2018-03-12T21:30:24Z

...ons-contrib/influx-extensions/src/test/java/io/druid/data/input/influx/InfluxParserTest.java

+        ),
+        testCase(
+            "truncated timestamp",
+            "foo,region=us-east-1,host=127.0.0.1 m=1.0,n=3.0,o=500i 123",


I think a timestamp like "500i 123" should throw a ParseException instead of being treated as 0. Can you add a check on the truncated portion to make sure that it's also numeric?

The 500i there is the (integer) value for the measurement named o and 123 is a valid timestamp. Invalid timestamps are caught by the parser since a valid line has to end with a number, newline or EOF.

Ah okay, sorry, misread it, I see you have a case already for that type of failure in testParseFailures
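For readers who trip over the same thing: in line protocol, a trailing `i` marks an integer field value, so `o=500i` is a field and `123` is the timestamp. Below is a rough sketch of how a single field value token could be classified. `FieldValueSketch` and `parseFieldValue` are hypothetical names, not the parser in this PR, and the sketch ignores escape sequences inside quoted strings.

```java
final class FieldValueSketch
{
  // Hypothetical helper for illustration; not the code in InfluxParser.
  static Object parseFieldValue(String raw)
  {
    // Double-quoted values are strings (escape handling omitted in this sketch).
    if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
      return raw.substring(1, raw.length() - 1);
    }
    // Boolean spellings accepted by line protocol.
    switch (raw) {
      case "t": case "T": case "true": case "True": case "TRUE":
        return Boolean.TRUE;
      case "f": case "F": case "false": case "False": case "FALSE":
        return Boolean.FALSE;
      default:
        break;
    }
    // A trailing 'i' marks an integer, e.g. "500i".
    if (raw.endsWith("i")) {
      return Long.parseLong(raw.substring(0, raw.length() - 1));
    }
    // Everything else is treated as a float; Double.parseDouble is more
    // lenient than line protocol (it accepts "NaN", for example).
    return Double.parseDouble(raw);
  }
}
```

Under these assumptions, `500i` becomes a `Long`, `1.0` a `Double`, and a bare `i` fails with a `NumberFormatException`.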

jon-wei · 2018-03-12T22:01:55Z

LGTM, I'll merge after docs, the merge will auto-squash

njhartwell · 2018-03-13T15:57:03Z

Thanks @jon-wei! Just added docs

josephglanville

Docs nits.

josephglanville · 2018-03-13T16:11:00Z

docs/content/development/extensions-contrib/influx.md

+
+To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-influx-extensions`.
+
+This extension enables Druid to parse [InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/v1.5/write_protocols/line_protocol_tutorial/), a popular text-based timeseries metric serialization format. 


Perhaps "parse the InfluxDB Line Protocol" would read slightly better.

josephglanville · 2018-03-13T16:12:00Z

docs/content/development/extensions-contrib/influx.md

+```cpu,application=dbhost=prdb123,region=us-east-1 usage_idle=99.24,usage_user=0.55 1520722030000000000```
+
+which contains four parts:
+  - measurement: A string indicating what kind of measurement is represented (e.g. cpu, network, web_requests)


I would stick with "name" over "kind" here. It's also the nomenclature in the Influx docs that you linked above.

josephglanville · 2018-03-13T16:13:20Z

docs/content/development/extensions-contrib/influx.md

+  - measurements: one or more key-value pairs; values can be numeric, boolean, or string
+  - timestamp: nanoseconds since Unix epoch (the parser truncates it to milliseconds)
+
+The parser extracts these fields into a map, giving the measurement the key `measurement` and the timestamp the key `_ts`. The tag and measurement keys are copied verbatum, so users should take care to avoid name collisions. It is up to the ingestion spec to decide which fields should be treated as dimensions and which should be treated as metrics (typically tags correspond to dimensions and values correspond to measurements).


s/verbatum/verbatim/

"metrics" and "measurements" seem to be used interchangeably here but I believe it should always be "metrics" no?

Thanks for the feedback @josephglanville. Fixed first three as you suggested. measurements is the term used in the influx line protocol to refer to the second set of k/v pairs and I stuck with that in the docs here to avoid ambiguity with Druid metrics since they don't necessarily line up--you might have some influx measurements that you ingest as dimensions. That suggestion did lead me to fix this line "typically tags correspond to dimensions and values correspond to measurements" which I updated to "typically tags correspond to dimensions and measurements correspond to metrics".
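To illustrate the flattening described in the docs, here is a simplified sketch of turning one line into a map with the `measurement` and `_ts` keys. `InfluxMapSketch` and `toMap` are hypothetical names; the sketch assumes a line with no escaped spaces or commas, keeps every value as a raw string, and is not the parser added in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class InfluxMapSketch
{
  // Hypothetical helper: "<measurement>[,tag=value...] <field=value,...> <timestamp-ns>"
  static Map<String, Object> toMap(String line)
  {
    String[] parts = line.split(" ");
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected 3 space-separated parts: " + line);
    }
    Map<String, Object> out = new LinkedHashMap<>();
    String[] head = parts[0].split(",");
    out.put("measurement", head[0]);
    // Tag keys are copied verbatim, so a tag named "measurement" would
    // overwrite the entry above (the name collision the docs warn about).
    for (int i = 1; i < head.length; i++) {
      putPair(head[i], out);
    }
    for (String field : parts[1].split(",")) {
      putPair(field, out);
    }
    // Nanoseconds since the epoch, truncated to milliseconds.
    out.put("_ts", Long.parseLong(parts[2]) / 1_000_000L);
    return out;
  }

  private static void putPair(String pair, Map<String, Object> out)
  {
    int eq = pair.indexOf('=');
    if (eq <= 0) {
      throw new IllegalArgumentException("expected key=value: " + pair);
    }
    out.put(pair.substring(0, eq), pair.substring(eq + 1));
  }
}
```

For the input `cpu,host=prdb123,region=us-east-1 usage_idle=99.24,usage_user=0.55 1520722030000000000`, the map holds `measurement`, `host`, `region`, `usage_idle`, `usage_user`, and `_ts`. The ingestion spec then decides which of those keys are dimensions and which are metrics.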

josephglanville · 2018-03-13T23:46:09Z

LGTM. 👍

njhartwell · 2018-03-26T21:16:49Z

@jon-wei do you need anything else on this or can it be merged? Thanks!

jon-wei · 2018-03-26T21:28:51Z

@njhartwell thanks for the contrib!

gianm · 2018-03-27T18:36:29Z

Hi @njhartwell, can you please fill out the Druid CLA here: http://druid.io/community/cla.html

njhartwell · 2018-03-27T19:33:43Z

Just did, although this contribution is on behalf of Target Corp. which I am told has already signed the corporate CLA.

gianm · 2018-03-27T20:38:28Z

Got it, thanks for making that clear.

leventov · 2018-11-12T16:40:19Z

extensions-contrib/influx-extensions/src/main/java/io/druid/data/input/influx/InfluxParser.java

+
+  private Object parseQuotedString(String text)
+  {
+    return text.substring(1, text.length() - 1).replaceAll("\\\\\"", "\"");


Is this the right number of back slashes (5)? Maybe just 4 needed?

Yes, five is correct; with four, the code is a syntax error. The first four are read into the Java string as two backslashes, and the fifth is paired with the double quote, so that the string passed to the regex library is \\" which matches the string \", i.e. an escaped quote.
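A small standalone check of that explanation, as a sketch: `UnescapeSketch` is a hypothetical class that copies the expression from the diff and compares it with the non-regex `String.replace`, which needs only three backslashes because no regex layer is involved.

```java
final class UnescapeSketch
{
  // Same expression as in the diff: Java reads "\\\\\"" as the three
  // characters \\" and the regex engine reads those as "a literal
  // backslash followed by a double quote".
  static String unquoteRegex(String text)
  {
    return text.substring(1, text.length() - 1).replaceAll("\\\\\"", "\"");
  }

  // Equivalent literal replacement: "\\\"" is the two characters \" and
  // String.replace does not interpret them as a regex.
  static String unquoteLiteral(String text)
  {
    return text.substring(1, text.length() - 1).replace("\\\"", "\"");
  }

  public static void main(String[] args)
  {
    String raw = "\"say \\\"hi\\\"\""; // the characters: "say \"hi\""
    System.out.println(unquoteRegex(raw));
    System.out.println(unquoteLiteral(raw));
  }
}
```

Both methods strip the surrounding quotes and turn each escaped quote into a plain one.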

Adding ParserSpec for Influx Line Protocol

8c51d8c

b-slim reviewed Feb 28, 2018

jon-wei reviewed Mar 6, 2018

Addressing PR feedback

c2c9e03

- Remove extraneous TODO
- Better handling of parse errors (e.g. invalid timestamp)
- Handle sub-millisecond timestamps

jon-wei reviewed Mar 12, 2018

Adding documentation for Influx parser

b0f8d10

josephglanville reviewed Mar 13, 2018

Fixing docs

d0e4186

jon-wei merged commit ea30c05 into apache:master Mar 26, 2018

njhartwell deleted the influx-parser-spec branch July 25, 2018 21:00

gianm mentioned this pull request Aug 23, 2018

Adding licenses and enable apache-rat-plugin. #6215

Merged

dclim added this to the 0.13.0 milestone Oct 8, 2018

dclim mentioned this pull request Oct 10, 2018

Druid 0.13.0-incubating release notes #6442

Closed

leventov reviewed Nov 12, 2018
