New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probable concurrency bug in GelfChunkAggregator.checkForCompletion () #1561

Closed
dheygere opened this Issue Nov 13, 2015 · 6 comments

Comments

Projects
None yet
3 participants
@dheygere

dheygere commented Nov 13, 2015

Hello,

I investigated on #1544 a little bit further.

For a message with 6 chunks I have the following behaviour:

  • reception of the first chunks...
  • reception of chunk number 3
    • checkForCompletion detects that we have every chunks
    • it fails to construct a full message. Chunks 4, 5 and 6 are not yet in entry.payloadArray
  • reception of chunk number 6
    • checkForCompletion detects that we have every chunks
    • it fails to construct a full message. Chunks 1, 2 and 3 are not yet in entry.payloadArray

I noticed that the test if (chunkWatermark == sequenceCount) may return true multiple times for the same message.

Somehow, incrementAndGet() returns the same value multiple times event though the operation should be atomic.

In graylog.conf we have:

processor_wait_strategy = blocking
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
@dheygere

This comment has been minimized.

dheygere commented Nov 13, 2015

This is probably not the cause, but the method equals() of ChunkEntry will always return false.
Line 255 tries to compare two AtomicInteger:

 if (!chunkSlotsWritten.equals(that.chunkSlotsWritten)) return false;

This comparison is always false (cf. Why are two AtomicIntegers never equal?). But I don't see how that would affect the rest of the code.

Similarly, hashCode() cannot use chunkSlotsWritten.hashCode() (and maybe also payloadArray.hashCode())

      public int hashCode() {
            int result = chunkSlotsWritten.hashCode(); // invalid
            result = 31 * result + (int) (firstTimestamp ^ (firstTimestamp >>> 32));
            result = 31 * result + payloadArray.hashCode(); // also invalid ?
            return result;
        }
@dheygere

This comment has been minimized.

dheygere commented Nov 13, 2015

I found the problem: it's in the GELF Client we are using 😖
Gelf4Net's UDP appender generates consecutive messages with the same messageIds when logging at high frequency.

I don't know what you want to do with my comments on GelfChunkAggregator. Maybe add some defensive code when receiving the same chunk number for the same messageId. Something like:

if (!entry.payloadArray.compareAndSet(chunk.getSequenceNumber(), null, chunk)) {
    log.error("Duplicated messageId {}", chunk.getId());
    return null;
}
@kroepke

This comment has been minimized.

Member

kroepke commented Nov 19, 2015

I agree with #1561 (comment) that we should add some defensive code for that scenario because it is either a programming error in the client, or a maliciously crafted packet.

I'm not overly sure about the error log level, because that can easily flood the server log, too.
At the very least we should add an additional counter metric, I think.

What do you think @joschi ?

@joschi

This comment has been minimized.

Contributor

joschi commented Nov 19, 2015

I'm not overly sure about the error log level, because that can easily flood the server log, too.

This error should not happen often and if it does it's a sign of a broken client, so I think it's ok to flood the logs with those messages if this error occurs.

At the very least we should add an additional counter metric, I think.

Yes.

@kroepke

This comment has been minimized.

Member

kroepke commented Nov 19, 2015

Agreed on the error level.

joschi added a commit that referenced this issue Nov 26, 2015

Fix GelfChunkAggregator.ChunkEntry#equals() and #hashCode()
Refs #1561

 - Add detection of duplicate chunks in GelfChunkAggregator
 - Remove ChunkEntry.{chunkSlotsWritten,payloadArray} from equals() and hashCode()

joschi added a commit that referenced this issue Nov 26, 2015

Fix GelfChunkAggregator.ChunkEntry#equals() and #hashCode()
Refs #1561

 - Add detection of duplicate chunks in GelfChunkAggregator
 - Remove ChunkEntry.{chunkSlotsWritten,payloadArray} from equals() and hashCode()

(cherry picked from commit a748fa6)

joschi added a commit that referenced this issue Nov 26, 2015

Fix GelfChunkAggregator.ChunkEntry#equals() and #hashCode()
Refs #1561

 - Add detection of duplicate chunks in GelfChunkAggregator
 - Remove ChunkEntry.{chunkSlotsWritten,payloadArray} from equals() and hashCode()

(cherry picked from commit a748fa6)
@kroepke

This comment has been minimized.

Member

kroepke commented Nov 26, 2015

fixed by referenced PR, which will be in 1.3 and 2.0.

Many thanks for the investigation!

@kroepke kroepke closed this Nov 26, 2015

@kroepke kroepke added this to the 1.3.0 milestone Nov 26, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment