key_value ignoring whitespace #4920

robp1234 opened this issue Jul 18, 2018 · 5 comments · Fixed by #4927

I was directed here from the community forum, as this behaviour appears to be a bug.

I have a set of data with key-value pairs separated by a colon followed by a space (": "). The space is being ignored, so the data is split on the colon alone. This breaks the data, as it contains timestamps (e.g. 12:34:34).

Expected Behaviour

I have a data set coming into Graylog and passing through a pipeline, where a rule extracts the field names and data values.

email:, event: user login, time: 2018-07-18T08:02:19Z, user_id: 12345

My processing rule looks like this:

                        value: to_string($message.message),
                        delimiters: ",",
                        kv_delimiters: ":",
                        handle_dup_keys: ","

This breaks, as it splits on the timestamp. I changed kv_delimiters to this:

kv_delimiters: ": "

This includes the whitespace, but the behaviour does not change. The split is still done exclusively on the colon, which breaks the timestamps.

I tried escaping the whitespace with \, but that results in an error.


The separator is always a ": " but : can occur in a data field, as can whitespace. Being able to split on ": " would be extremely useful.

This is Graylog 2.4.5.


@joschi joschi added the bug label Jul 18, 2018

Hello, I'm trying to simulate the bug and noticed another bug.

Using the message in your example:

"email:, event: user login, time: 2018-07-18T08:02:19Z, user_id: 12345"

Not only does the timestamp get truncated at the first colon inside it, but the event value also gets truncated at the space character, so the event is mapped to just "user" and "login" is lost.
Can you confirm that this second bug also occurs in your environment?


Continuing with this, the problem resides in the class:

Graylog is using CharMatcher from the Guava library, which is built to match single characters, not strings.
To solve this, we would need to stop using CharMatcher and pass the delimiter strings directly to the Splitter instead of CharMatcher instances. Otherwise, a multi-character delimiter causes a split at any one of its characters rather than only where the whole string matches.
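To make the difference concrete, here is a minimal Python sketch of the two semantics described above. It is not Graylog's Java code and does not use Guava; the helper names `split_on_any_char` and `split_on_string` are made up for illustration, and dropping empty pieces in the character-set case is an assumption.

```python
import re

def split_on_any_char(text, chars):
    """Split wherever ANY single character from `chars` occurs (character-set semantics)."""
    pattern = "[" + re.escape(chars) + "]"
    # Assumption for this sketch: drop empty pieces produced by adjacent delimiter characters.
    return [piece for piece in re.split(pattern, text) if piece]

def split_on_string(text, delimiter):
    """Split only where the whole `delimiter` string occurs."""
    return text.split(delimiter)

pair = "time: 2018-07-18T08:02:19Z"

by_chars = split_on_any_char(pair, ": ")   # treats ':' and ' ' as two separate delimiters
by_string = split_on_string(pair, ": ")    # treats ': ' as one delimiter

print(by_chars)   # ['time', '2018-07-18T08', '02', '19Z']
print(by_string)  # ['time', '2018-07-18T08:02:19Z']
```

With character-set semantics the colons inside the timestamp are split points too, which matches the symptom reported in this issue.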


radykal-com commented Jul 21, 2018

Another approach would be to fix the current behaviour so that the inner splitter is applied only at the first occurrence of the kv_delimiter character; later occurrences inside the value string would then be left alone.

This way it can keep working with CharMatcher.

So, how should I try to fix it?
Allow delimiters to be multi-char strings or just ensure that with single chars it works as expected?
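As a rough illustration of the "first occurrence only" option, here is a Python sketch run against the example message from this issue. It is not the actual fix; `parse_key_values` is a hypothetical helper, and trimming whitespace around keys and values is an assumption of the sketch.

```python
def parse_key_values(message, pair_delimiter=",", kv_delimiter=":"):
    """Parse key-value pairs, splitting each pair only at the FIRST kv_delimiter."""
    fields = {}
    for pair in message.split(pair_delimiter):
        # partition() splits at the first occurrence only, so any later
        # delimiter characters stay inside the value.
        key, found, value = pair.partition(kv_delimiter)
        if not found:
            continue  # no key/value delimiter in this piece, skip it
        # Assumption for this sketch: surrounding whitespace is trimmed.
        fields[key.strip()] = value.strip()
    return fields

message = "email:, event: user login, time: 2018-07-18T08:02:19Z, user_id: 12345"
fields = parse_key_values(message)

print(fields["time"])   # 2018-07-18T08:02:19Z
print(fields["event"])  # user login
```

With this approach a single-character kv_delimiter of ":" is enough, because the colons inside the timestamp are never considered as split points.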


I decided to just fix the behaviour while keeping CharMatcher. After this fix, setting kv_delimiters to ":" (without the space) will work as expected.

@bernd bernd added this to the 3.0.0 milestone Aug 10, 2018
kroepke pushed a commit to Graylog2/graylog-plugin-pipeline-processor that referenced this issue Sep 17, 2018
Prevent splitting values if delimiter chars are found inside values

Fixes Graylog2/graylog2-server#4920

(cherry picked from Graylog2/graylog2-server@cfcb622ce in Graylog2/graylog2-server#4927)

zez3 commented May 6, 2020

@robp1234 Did you perhaps have '?' in your field values?
