Invalid gelf message causes processing to stop #2627

Closed
colmaengus opened this Issue Aug 8, 2016 · 4 comments

3 participants

colmaengus commented Aug 8, 2016

Expected Behavior

I would expect bad messages to be logged and dropped rather than blocking message processing.

Current Behavior

As soon as the bad message is received, processing stops, the journal starts filling up, and no further messages are written to Elasticsearch. Restarting Graylog has no impact.

Steps to Reproduce (for bugs)

I don't have an easy way to reproduce this, but it occurs in our setup when our logs contain base64-encoded images (not intentionally; they appear due to Kafka logging).
(see the log snippet below)

Context

Your Environment

  • Graylog Version: 2.1.0-beta.2
  • Elasticsearch Version: 2.3.3
  • MongoDB Version: 3
  • Operating System: Ubuntu

2016-08-05 18:41:43,359 INFO : org.graylog2.inputs.InputStateListener - Input [GELF UDP/5783c5fbcff47e000122fc78] is now RUNNING
2016-08-05 18:41:43,668 ERROR: org.graylog2.shared.buffers.processors.DecodingProcessor - Unable to decode raw message 4163be40-5b3c-11e6-8f49-0242ac110007 (journal offset 350) encoded as gelf received from unknown source.
2016-08-05 18:41:43,678 ERROR: org.graylog2.shared.buffers.processors.DecodingProcessor - Error processing message RawMessage{id=4163be40-5b3c-11e6-8f49-0242ac110007, journalOffset=350, codec=gelf, payloadSize=37319, timestamp=2016-08-05T18:41:43.460Z}
java.util.zip.ZipException: incorrect data check
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164) ~[?:1.8.0_91]
	at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[?:1.8.0_91]
	at com.google.common.io.ByteStreams.copy(ByteStreams.java:110) ~[graylog.jar:?]
	at com.google.common.io.ByteStreams.toByteArray(ByteStreams.java:168) ~[graylog.jar:?]
	at org.graylog2.plugin.Tools.decompressZlib(Tools.java:190) ~[graylog.jar:?]
	at org.graylog2.inputs.codecs.gelf.GELFMessage.getJSON(GELFMessage.java:55) ~[graylog.jar:?]
	at org.graylog2.inputs.codecs.GelfCodec.decode(GelfCodec.java:110) ~[graylog.jar:?]
	at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:136) ~[graylog.jar:?]
	at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:82) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:58) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:35) [graylog.jar:?]
	at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:139) [graylog.jar:?]
	at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
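
The trace fails inside `java.util.zip.InflaterInputStream.read`, reached from `Tools.decompressZlib` while `GELFMessage.getJSON` inflates the payload. A zlib stream ends with an Adler-32 checksum of the uncompressed data, and `ZipException: incorrect data check` is how the JDK reports that this checksum does not match. The standalone sketch below (the class and method names are made up; it is not Graylog's code) reproduces the same exception by flipping one bit in the trailer of a zlib-compressed, GELF-like JSON payload. It only illustrates what the exception means; why the real messages arrived corrupted is not established in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

public class CorruptZlibDemo {

    // Compress text into the zlib format: 2-byte header, deflate data, 4-byte Adler-32 trailer.
    static byte[] zlibCompress(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Inflate a zlib payload by reading an InflaterInputStream to the end,
    // the same JDK stream that sits at the top of the stack trace.
    static String zlibDecompress(byte[] payload) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(payload))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = zlibCompress("{\"version\":\"1.1\",\"host\":\"example\",\"short_message\":\"hi\"}");
        System.out.println(zlibDecompress(payload)); // intact payload round-trips

        // Flip one bit in the last byte, which belongs to the Adler-32 trailer.
        byte[] corrupted = payload.clone();
        corrupted[corrupted.length - 1] ^= 0x01;
        try {
            zlibDecompress(corrupted);
        } catch (ZipException e) {
            System.out.println("ZipException: " + e.getMessage());
        }
    }
}
```

The deflate data itself is untouched, so the failure only surfaces when the checksum is verified at the end of the stream, which is consistent with the exception being raised from `InflaterInputStream.read`.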

bernd added the bug label Aug 8, 2016

bernd added this to the 2.1.0 milestone Aug 8, 2016

bernd added S2 P2 labels Aug 8, 2016

bernd self-assigned this Aug 8, 2016

bernd (Bernd Ahlers, Member) commented Aug 8, 2016

@colmaengus Do you have any other inputs that receive messages on that Graylog system or is the input with the faulty messages your only one that receives messages?


colmaengus commented Aug 8, 2016

The logs are coming from a Docker container via Fluentd. When I get time, I'll try to isolate the source of the error to see if that helps you reproduce the blockage.

colmaengus commented Aug 8, 2016

We only have one input right now, and Fluentd is using it.

bernd added a commit that referenced this issue Aug 9, 2016

Mark message offset as committed in case of a decoding error
This fixes an edge case where the journal grows when there is only one
input and no message can be decoded.

Fixes #2627

bernd (Member) commented Aug 9, 2016

@colmaengus Thank you for the update!

You are running into an edge case that happens when there is only one input and no message can be decoded correctly. In that case the journal just grows because we are not committing the processed offset back to the journal.

If you had another input receiving messages that can be decoded, this issue wouldn't happen.
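
To make the edge case concrete, here is a toy model in Java. Everything in it (`ToyJournal`, `process`, the `commitOnError` flag, the string decoder) is hypothetical and far simpler than Graylog's journal. It assumes, as the explanation implies, that committing offset N marks every earlier entry as processed. With only undecodable messages nothing is ever committed, while a single decodable message, such as one arriving on a second input, moves the committed offset past earlier failures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class CommitOnDecodeError {

    // Hypothetical stand-in for the journal's committed offset. Commits are assumed to be
    // monotonic: committing offset N marks every entry up to N as processed.
    static final class ToyJournal {
        private long committedOffset = -1; // -1 means nothing has been committed yet

        void markCommitted(long offset) {
            committedOffset = Math.max(committedOffset, offset);
        }

        long committedOffset() {
            return committedOffset;
        }
    }

    // Decodes the payloads stored at journal offsets 0..n-1 and returns the final committed offset.
    // With commitOnError == false, a failed decode is logged and skipped without committing.
    static long process(List<String> payloads, Function<String, String> decoder, boolean commitOnError) {
        ToyJournal journal = new ToyJournal();
        for (int offset = 0; offset < payloads.size(); offset++) {
            try {
                String decoded = decoder.apply(payloads.get(offset));
                // A real pipeline would hand `decoded` on to processing and outputs here.
                journal.markCommitted(offset);
            } catch (RuntimeException e) {
                System.err.println("Unable to decode message at offset " + offset + ": " + e.getMessage());
                if (commitOnError) {
                    journal.markCommitted(offset); // drop the message but still advance the journal
                }
            }
        }
        return journal.committedOffset();
    }

    public static void main(String[] args) {
        // Hypothetical decoder that rejects any payload starting with "bad".
        Function<String, String> decoder = payload -> {
            if (payload.startsWith("bad")) {
                throw new IllegalArgumentException("incorrect data check");
            }
            return payload;
        };

        List<String> onlyBad = Arrays.asList("bad-1", "bad-2", "bad-3");
        System.out.println(process(onlyBad, decoder, false)); // -1: nothing committed, journal only grows
        System.out.println(process(onlyBad, decoder, true));  // 2: bad entries dropped and committed

        // One decodable message advances the commit past the earlier bad entry.
        List<String> mixed = Arrays.asList("bad-1", "ok", "bad-2");
        System.out.println(process(mixed, decoder, false));   // 1
    }
}
```

In this model, committing on error is what the fix's commit message describes ("Mark message offset as committed in case of a decoding error"): the undecodable message is still logged and dropped, which matches the expected behavior in the original report.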

This will be fixed with #2643.

joschi closed this in #2643 Aug 9, 2016

joschi added a commit that referenced this issue Aug 9, 2016

Mark message offset as committed in case of a decoding error (#2643)
This fixes an edge case where the journal grows when there is only one
input and no message can be decoded.

Fixes #2627

kroepke added triaged and removed triaged labels Sep 21, 2016
