New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MultiMessageCodec interface #1307

Merged
merged 6 commits into from Jul 20, 2015

Conversation

Projects
None yet
4 participants
@bernd
Member

bernd commented Jul 16, 2015

It makes it possible to decode a list of Message objects from a RawMessage instead of just a single one. This is useful for protocols like NetFlow or collectd where one packet contains multiple messages.

The new interface is needed to keep the Codec interface compatible. A breaking change is only possible with the next major version.

Adjust DecodingProcessor and ProcessBufferProcessor to handle a list of Message objects.


I created another branch with a different implementation here:

https://github.com/Graylog2/graylog2-server/compare/feature-message-list-codec-v1

The v1 branch tries to minimize the message list usage, this one (v2) converts the single message to a list as soon as possible to avoid special cases all over the place.

I benchmarked both branches agains master and it looks like there isn't much difference. All branches have been benchmarked twice.

decoder-bench

I prefer this diff. Any opinions?

Add MessageListCodec interface.
It makes it possible to decode a list of Message objects from a
RawMessage instead of just a single one. This is useful for protocols
like NetFlow or collectd where one packet contains multiple messages.

The new interface is needed to keep the Codec interface compatible. A
breaking change is only possible with the next major version.

Adjust DecodingProcessor and ProcessBufferProcessor to handle a list of
Message objects.
@lennartkoopmann

This comment has been minimized.

Member

lennartkoopmann commented Jul 16, 2015

I initially cringed about the instanceof and the costs I feared it would add but the benchmarks look like we add almost no measurable processing complexity to it.

The need for the feature/change is definitely given. I was stumbling upon this with other inputs already, too. Thanks!

}
}
private void dispatchMessage(final Message msg) {
incomingMessages.mark();
final Timer.Context tcx = processTime.time();

This comment has been minimized.

@joschi

joschi Jul 16, 2015

Contributor

Minor issue: tcx should be used in a try-with-resources block below.

@@ -66,6 +77,7 @@ public void setMessage(@Nullable final Message msg)
public void setRaw(@Nonnull RawMessage raw) {
this.raw = raw;
setMessage(null);
setMessageList(null);

This comment has been minimized.

@joschi

joschi Jul 16, 2015

Contributor

Nulling messageList should also be mentioned in the Javadoc of this method.

@@ -45,6 +46,7 @@ public MessageEvent newInstance()
private RawMessage raw;
private Message msg;
private List<Message> msgList;

This comment has been minimized.

@joschi

joschi Jul 16, 2015

Contributor

Could we make this field a Collection<Message> and just call it messages? I don't think there's any advantage in using a List here.

This comment has been minimized.

@bernd

bernd Jul 17, 2015

Member

Aye.

import javax.annotation.Nullable;
import java.util.List;
public interface MessageListCodec extends Codec {

This comment has been minimized.

@joschi

joschi Jul 16, 2015

Contributor

Maybe MultiMessageCodec would describe the purpose of this interface better.

This comment has been minimized.

@bernd

bernd Jul 17, 2015

Member

Aye, thanks!

@@ -45,6 +46,7 @@ public MessageEvent newInstance()
private RawMessage raw;
private Message msg;

This comment has been minimized.

@joschi

joschi Jul 16, 2015

Contributor

It would probably make sense to get rid of the single message field, express getMessage() and setMessage() as delegates to getMessageList() and setMessageList(), and mark the singular methods as deprecated.

This comment has been minimized.

@bernd

bernd Jul 17, 2015

Member

I kept that because the OutputBuffer is only using the single message field and does not need a list of messages.

This comment has been minimized.

@joschi

joschi Jul 17, 2015

Contributor

I think that's a disaster waiting to happen. Either msg or msgList is not null but never both. So you have to know exactly, which one has been set before or always check both.

This comment has been minimized.

@bernd

bernd Jul 17, 2015

Member

Maybe we should just have two different MessageEvent classes. One for the ProcessBuffer with a collection of messages and one for the OutputBuffer with a single message.

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015

Member

That might make sense, but let's wait a little before making that change.
If message filters eventually want to synthesize messages on their own, they need a place to inject those, too.

bernd added some commits Jul 17, 2015

Refactor after PR comments.
- Use the more generic Collection instead of List in MessageEvent.
- Rename getMessageList/setMessageList methods to getMessages/setMessages.
- Adjust JavaDoc for setRaw in MessageEvent.
- Rename MessageListCodec to MultiMessageCodec.
@bernd

This comment has been minimized.

Member

bernd commented Jul 17, 2015

Updated after comments.

public interface MultiMessageCodec extends Codec {
@Nullable
List<Message> decodeMessages(@Nonnull RawMessage rawMessage);

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015

Member

Also Collection<Message> as in the event class?

This comment has been minimized.

@bernd

bernd Jul 17, 2015

Member

Oupts, good catch.

@bernd

This comment has been minimized.

Member

bernd commented Jul 17, 2015

Updated after comments.

@@ -76,17 +81,20 @@ public void onEvent(MessageEvent event, long sequence, boolean endOfBatch) throw
try {
// always set the result of processMessage, even if it is null, to avoid later stages to process old messages.
// basically this will make sure old messages are cleared out early.
event.setMessage(processMessage(event.getRaw()));
event.setMessages(processMessage(event.getRaw()));

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015

Member

Changing it this way will force two new object allocations for each message (and single messages are the norm).
Could we refactor this so that we don't have to do those two allocations and simply change the ProcessBufferProcessor#onEvent to figure out whether to invoke the MessageFilter more than once?

messages = ((MultiMessageCodec) codec).decodeMessages(raw);
} else {
final Message message = codec.decode(raw);
messages = message == null ? null : Collections.singletonList(message);

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015

Member

first allocation (see above comment)

return null;
}
final List<Message> processedMessages = Lists.newArrayListWithCapacity(messages.size());

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015

Member

second allocation

for (final Message message : messageList) {

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015

Member

this would simply change into if (event.getMsg() == null) { for ...} else {dispatchMessage(event.getMsg())}, right?

Avoid collection allocation and iteration when possible.
This adds a few more null checks but avoids collection allocation and
iteration in a few places.

@bernd bernd changed the title from Add MessageListCodec interface to Add MultiMessageCodec interface Jul 20, 2015

@bernd

This comment has been minimized.

Member

bernd commented Jul 20, 2015

Updated to avoid collection allocations where possible.

@joschi

This comment has been minimized.

Contributor

joschi commented Jul 20, 2015

LGTM. 👍

@joschi joschi added this to the 1.2.0 milestone Jul 20, 2015

@joschi joschi self-assigned this Jul 20, 2015

joschi added a commit that referenced this pull request Jul 20, 2015

@joschi joschi merged commit bbd12e0 into master Jul 20, 2015

2 checks passed

ci Jenkins build graylog2-server-integration-pr 66 has succeeded
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@joschi joschi deleted the feature-message-list-codec-v2 branch Jul 20, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment