key_value ignoring whitespace #4920

Closed
robp1234 opened this Issue Jul 18, 2018 · 4 comments

robp1234 commented Jul 18, 2018

I was directed here from the community forum, as this behaviour appears to be a bug (https://community.graylog.org/t/alert-processing-query/5964).

I have a set of data with key-value pairs separated by a colon followed by a space (": "). The whitespace is being ignored, so the data is split on every colon. This breaks values that contain timestamps (e.g. 12:34:34).

Expected Behaviour

I have a data set coming into Graylog and passing through a pipeline, where a rule extracts the field names and data values. A sample message:

email: rob@email.com, event: user login, time: 2018-07-18T08:02:19Z, user_id: 12345

My processing rule looks like this:

    key_value(
        value: to_string($message.message),
        delimiters: ",",
        kv_delimiters: ":",
        allow_dup_keys: true,
        handle_dup_keys: ","
    )

This breaks, as it splits on the timestamp. I changed kv_delimiters to this:

kv_delimiters: ": "

This includes the whitespace, but the behaviour does not change. The split is still done on the colon alone, breaking timestamps.

I tried escaping the whitespace with \ but that results in an error.

Context

The separator is always a ": " but : can occur in a data field, as can whitespace. Being able to split on ": " would be extremely useful.

This is Graylog 2.4.5.

thanks
Rob

@joschi joschi added the bug label Jul 18, 2018

Contributor

radykal-com commented Jul 20, 2018

Hello, I'm trying to simulate the bug and noticed another bug.

Using the message in your example:

"email: rob@email.com, event: user login, time: 2018-07-18T08:02:19Z, user_id: 12345"

Not only is the timestamp truncated at the first colon inside it, but the event is also truncated at the space character, so the event is mapped to just "user" and "login" is lost.
Can you confirm that this second bug also happens in your environment?

Contributor

radykal-com commented Jul 20, 2018

Continuing with this, the problem resides in the class:

https://github.com/Graylog2/graylog2-server/blob/master/graylog2-server/src/main/java/org/graylog/plugins/pipelineprocessor/functions/strings/KeyValue.java

Graylog uses CharMatcher from the Guava library, which is built to match single characters, not strings.
To solve this, KeyValue would have to stop using CharMatcher and pass the delimiter strings directly to Splitter instead of CharMatcher instances. Otherwise, a multi-character delimiter string causes a split at any one of its characters rather than only where the whole string matches.
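
To make the difference concrete, here is a minimal Python sketch of the two splitting semantics described above. It is not Graylog's or Guava's code; the function names `split_on_any_char` and `split_on_string` are hypothetical and only illustrate "any character of the delimiter" versus "the whole delimiter string".

```python
import re


def split_on_any_char(text, chars):
    """Split wherever ANY single character from `chars` occurs (character-set matching)."""
    pattern = "[" + re.escape(chars) + "]"
    # Drop the empty pieces produced by adjacent delimiter characters.
    return [piece for piece in re.split(pattern, text) if piece]


def split_on_string(text, delimiter):
    """Split only where the whole `delimiter` string occurs."""
    return text.split(delimiter)


pair = "time: 2018-07-18T08:02:19Z"
print(split_on_any_char(pair, ": "))  # ['time', '2018-07-18T08', '02', '19Z']
print(split_on_string(pair, ": "))    # ['time', '2018-07-18T08:02:19Z']
```

With character-set matching, adding the space to the delimiter does not narrow the match; it only adds one more character to split on, which is consistent with the behaviour reported in this issue.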

Contributor

radykal-com commented Jul 21, 2018

Another approach would be to fix the current behaviour so that the inner splitter is applied only at the first occurrence of the kv_delimiters char; later occurrences inside the value string would then be left alone.

This way it can keep working with CharMatcher.

So, how should I try to fix it?
Allow delimiters to be multi-char strings or just ensure that with single chars it works as expected?

Contributor

radykal-com commented Jul 21, 2018

I decided to just fix the behaviour while keeping CharMatcher. After this fix, setting kv_delimiters to ":" (without the space) will work as expected.
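
The "split each entry only once" idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual change in KeyValue.java: the function `parse_key_values` is hypothetical, it supports only one single-character delimiter of each kind, it trims surrounding whitespace from keys and values, and it ignores duplicate-key handling.

```python
def parse_key_values(message, delimiter=",", kv_delimiter=":"):
    """Parse 'key: value' entries, splitting each entry at the FIRST kv_delimiter only."""
    fields = {}
    for entry in message.split(delimiter):
        # partition() splits at the first occurrence, so later colons stay in the value.
        key, found, value = entry.partition(kv_delimiter)
        if not found:
            continue  # entry has no key/value delimiter; skip it
        # Assumption: surrounding whitespace is trimmed from keys and values.
        fields[key.strip()] = value.strip()
    return fields


message = "email: rob@email.com, event: user login, time: 2018-07-18T08:02:19Z, user_id: 12345"
print(parse_key_values(message))
```

Because only the first colon in each entry separates key from value, the timestamp `2018-07-18T08:02:19Z` and the value `user login` both come through intact.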

@bernd bernd closed this in #4927 Aug 10, 2018

@bernd bernd added this to the 3.0.0 milestone Aug 10, 2018

kroepke added a commit to Graylog2/graylog-plugin-pipeline-processor that referenced this issue Sep 17, 2018

Fix KeyValue rule function to split entries once (#249)
Prevent splitting values if delimiter chars are found inside values

Fixes Graylog2/graylog2-server#4920

(cherry picked from Graylog2/graylog2-server@cfcb622 in Graylog2/graylog2-server#4927)