Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail fast and loud for invalid GELF messages #3972

Merged
merged 5 commits into from Jul 5, 2017
Merged

Fail fast and loud for invalid GELF messages #3972

merged 5 commits into from Jul 5, 2017

Conversation

@joschi
Copy link
Contributor

@joschi joschi commented Jul 3, 2017

Instead of waiting for a later stage and "silently" dropping (logged on DEBUG)
invalid messages, GelfCodec now actively checks for the existence and validity
of mandatory GELF message fields (such as "version", "host", "short_message", and
"timestamp", according to the GELF spec).

Refs http://docs.graylog.org/en/2.2/pages/gelf.html
Fixes #3970

Jochen Schalanda
Instead of waiting for a later stage and "silently" dropping (logged on DEBUG)
invalid messages, `GelfCodec` now actively checks for the existence and validity
of mandatory GELF message fields (such as "version", "host", "short_message", and
"timestamp", according to the GELF spec).

Refs http://docs.graylog.org/en/2.2/pages/gelf.html
Fixes #3970
@joschi joschi added this to the 2.3.0 milestone Jul 3, 2017
Jochen Schalanda
There are still many GELF client libraries out there using either "1.0" or
no value at all for the GELF "version" field.
@bernd bernd self-assigned this Jul 3, 2017
Copy link
Member

@bernd bernd left a comment

We cannot add this because until now, we accepted minimal GELF messages like these:

  • {"message":"hello"}
  • {"short_message":"hello"}

If we drop messages like these now, lots of libraries and custom code will break.

@joschi
Copy link
Contributor Author

@joschi joschi commented Jul 3, 2017

@bernd An alternative would be to allow messages with any fields in DecodingProcessor, so that invalid (GELF) message, e. g. the ones with missing "short_message" or "message" field respectively, won't be dropped at a later stage.

Anyway, we should clarify this in the GELF "specification" in the documentation because right now it's non-obvious (and impossible to find out without reading code of each respective Graylog release) to know why a message was discarded.

@joschi joschi closed this Jul 3, 2017
@joschi joschi deleted the issue-3970 branch Jul 3, 2017
@joschi joschi removed the ready-for-review label Jul 3, 2017
@bernd
Copy link
Member

@bernd bernd commented Jul 4, 2017

@joschi Can't we use the check you implemented for the short_message and message field? That would already help #3970, no?

I am not sure about adjusting the GELF spec. The problem is that we did a really bad implementation of the spec. (the irony) Telling people that our implementation accepts basically everything as long as a short_message or message field exists might motivate people to actually do that instead of following the spec. 😉 IMHO this makes the problem worse for them in the future when we have GELF 2.0 and do a proper implementation. I would rather use the check and log message you implemented to check for the short_message and message field. (as mentioned above)

@bernd
Copy link
Member

@bernd bernd commented Jul 4, 2017

@joschi Any comments?

@joschi joschi restored the issue-3970 branch Jul 5, 2017
@joschi joschi reopened this Jul 5, 2017
@joschi joschi added the in progress label Jul 5, 2017
Jochen Schalanda added 2 commits Jul 5, 2017
Jochen Schalanda
Jochen Schalanda
@joschi joschi added ready-for-review and removed in progress labels Jul 5, 2017
final JsonNode hostNode = jsonNode.path("host");
if (hostNode.isMissingNode()) {
log.warn(prefix + "is missing mandatory \"host\" field.");
return;

This comment has been minimized.

@bernd

bernd Jul 5, 2017
Member

Shouldn't we check for the short_message / message fields first before returning? As it is now, a missing host field would skip the short_message / message field checks.

This comment has been minimized.

@joschi

joschi Jul 5, 2017
Author Contributor

Correct, this is missing the alternative case.

Jochen Schalanda
@bernd
bernd approved these changes Jul 5, 2017
@bernd
Copy link
Member

@bernd bernd commented Jul 5, 2017

@joschi Thank you! 👍

@bernd bernd merged commit 090805b into master Jul 5, 2017
4 checks passed
4 checks passed
ci-web-linter Jenkins build graylog-pr-linter-check 1801 has succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
graylog-project/pr Jenkins build graylog-project-pr-snapshot 309 has succeeded
Details
@bernd bernd deleted the issue-3970 branch Jul 5, 2017
@bernd bernd removed the ready-for-review label Jul 5, 2017
@NicholasMeacoe
Copy link

@NicholasMeacoe NicholasMeacoe commented Jul 24, 2017

Hi, when trying out the RC1 build noticed this was now stopping pretty much all of the traffic shipped over UDP via Gelf4Net. The datetime object it passes over contains the timezone, and the code change to only accept a number kicks the message out. If we get Gelf4Net changed, will we need to send in another field to indicate the timezone? We have packets coming over the wire from UK, US and APAC.

Cheers,
Nick

@joschi
Copy link
Contributor Author

@joschi joschi commented Jul 26, 2017

@NicholasMeacoe This has been addressed in #4027/#4031/#4034.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.