Probable concurrency bug in GelfChunkAggregator.checkForCompletion () #1561
Comments
This is probably not the cause, but the equals() method of ChunkEntry will return false for any two distinct instances, because of this line:
if (!chunkSlotsWritten.equals(that.chunkSlotsWritten)) return false;
AtomicInteger does not override equals(), so the call compares object identity and fails for two different AtomicInteger objects even when they hold the same value (cf. Why are two AtomicIntegers never equal?). But I don't see how that would affect the rest of the code. Similarly, hashCode() cannot rely on these fields:
public int hashCode() {
    int result = chunkSlotsWritten.hashCode(); // invalid
    result = 31 * result + (int) (firstTimestamp ^ (firstTimestamp >>> 32));
    result = 31 * result + payloadArray.hashCode(); // also invalid ?
    return result;
} |
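To make the equals() point concrete, here is a minimal standalone sketch. The class and method names (AtomicEqualityDemo, sameByEquals, sameByValue) are made up for illustration and are not part of Graylog; only AtomicInteger itself is real JDK API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not taken from the Graylog code base.
class AtomicEqualityDemo {

    // AtomicInteger inherits equals() from Object, so this compares identity.
    static boolean sameByEquals(int a, int b) {
        return new AtomicInteger(a).equals(new AtomicInteger(b));
    }

    // Comparing the wrapped values is what an equals() implementation
    // would have to do if the counter were meant to take part in equality.
    static boolean sameByValue(int a, int b) {
        return new AtomicInteger(a).get() == new AtomicInteger(b).get();
    }

    public static void main(String[] args) {
        System.out.println(sameByEquals(3, 3)); // prints "false"
        System.out.println(sameByValue(3, 3));  // prints "true"
    }
}
```

The same reasoning applies to hashCode(): it is also inherited from Object, so it says nothing about the current value of the counter.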
I found the problem: it's in the GELF client we are using. I don't know what you want to do with my comments on GelfChunkAggregator. Maybe add some defensive code for the case where the same chunk number is received twice for the same messageId. Something like:
if (!entry.payloadArray.compareAndSet(chunk.getSequenceNumber(), null, chunk)) {
    log.error("Duplicated messageId {}", chunk.getId());
    return null;
} |
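The idea behind that snippet can be tried in isolation. The following is only a sketch under assumptions: DuplicateChunkDemo, its nested Chunk class, and the offer() method are hypothetical stand-ins, not the real GelfChunkAggregator types. It relies only on AtomicReferenceArray.compareAndSet, which fills a slot only if it is still null.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical stand-in for the per-message chunk storage.
class DuplicateChunkDemo {

    // Hypothetical minimal chunk: a message id plus a sequence number.
    static final class Chunk {
        final String id;
        final int sequenceNumber;

        Chunk(String id, int sequenceNumber) {
            this.id = id;
            this.sequenceNumber = sequenceNumber;
        }
    }

    private final AtomicReferenceArray<Chunk> slots;

    DuplicateChunkDemo(int sequenceCount) {
        this.slots = new AtomicReferenceArray<>(sequenceCount);
    }

    // Returns true if the chunk was stored, false if its slot was already
    // taken, i.e. the same sequence number arrived twice for this message.
    boolean offer(Chunk chunk) {
        return slots.compareAndSet(chunk.sequenceNumber, null, chunk);
    }

    public static void main(String[] args) {
        DuplicateChunkDemo entry = new DuplicateChunkDemo(2);
        System.out.println(entry.offer(new Chunk("m1", 0))); // prints "true"
        System.out.println(entry.offer(new Chunk("m1", 0))); // prints "false"
        System.out.println(entry.offer(new Chunk("m1", 1))); // prints "true"
    }
}
```

Because the check and the write happen in one atomic step, two threads delivering the same sequence number cannot both succeed, so the duplicate can be logged and dropped before any counter is incremented.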
I agree with #1561 (comment) that we should add some defensive code for that scenario because it is either a programming error in the client, or a maliciously crafted packet. I'm not overly sure about the error log level, because that can easily flood the server log, too. What do you think @joschi ? |
This error should not happen often and if it does it's a sign of a broken client, so I think it's ok to flood the logs with those messages if this error occurs.
Yes. |
Agreed on the error level. |
Refs #1561 - Add detection of duplicate chunks in GelfChunkAggregator - Remove ChunkEntry.{chunkSlotsWritten,payloadArray} from equals() and hashCode()
Fixed by the referenced PR, which will be in 1.3 and 2.0. Many thanks for the investigation! |
Hello,
I investigated #1544 a little further.
For a message with 6 chunks I have the following behaviour:
I noticed that the test if (chunkWatermark == sequenceCount) may evaluate to true multiple times for the same message.
Somehow, incrementAndGet() returns the same value multiple times even though the operation should be atomic.
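As a sanity check on the atomicity assumption, the sketch below hammers a single AtomicInteger from several threads and counts the distinct values returned. WatermarkDemo and countDistinct are hypothetical names for illustration and not Graylog code. If every call on one shared counter returns a unique value, repeated values are more likely to come from separate counters, or from the same chunk being counted twice, than from incrementAndGet() itself.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not taken from the Graylog code base.
class WatermarkDemo {

    // Runs `threads` workers that each increment one shared counter
    // `perThread` times and returns how many distinct values were observed.
    static int countDistinct(int threads, int perThread) {
        AtomicInteger watermark = new AtomicInteger();
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(watermark.incrementAndGet());
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Every increment yields a unique value, so this prints 4000.
        System.out.println(countDistinct(4, 1000));
    }
}
```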
In graylog.conf we have: