Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MultiMessageCodec interface #1307

Merged
merged 6 commits into from Jul 20, 2015
Merged

Add MultiMessageCodec interface #1307

merged 6 commits into from Jul 20, 2015

Conversation

@bernd
Copy link
Member

@bernd bernd commented Jul 16, 2015

It makes it possible to decode a list of Message objects from a RawMessage instead of just a single one. This is useful for protocols like NetFlow or collectd where one packet contains multiple messages.

The new interface is needed to keep the Codec interface compatible. A breaking change is only possible with the next major version.

Adjust DecodingProcessor and ProcessBufferProcessor to handle a list of Message objects.


I created another branch with a different implementation here:

https://github.com/Graylog2/graylog2-server/compare/feature-message-list-codec-v1

The v1 branch tries to minimize the message list usage, this one (v2) converts the single message to a list as soon as possible to avoid special cases all over the place.

I benchmarked both branches agains master and it looks like there isn't much difference. All branches have been benchmarked twice.

decoder-bench

I prefer this diff. Any opinions?

It makes it possible to decode a list of Message objects from a
RawMessage instead of just a single one. This is useful for protocols
like NetFlow or collectd where one packet contains multiple messages.

The new interface is needed to keep the Codec interface compatible. A
breaking change is only possible with the next major version.

Adjust DecodingProcessor and ProcessBufferProcessor to handle a list of
Message objects.
@lennartkoopmann
Copy link
Member

@lennartkoopmann lennartkoopmann commented Jul 16, 2015

I initially cringed about the instanceof and the costs I feared it would add but the benchmarks look like we add almost no measurable processing complexity to it.

The need for the feature/change is definitely given. I was stumbling upon this with other inputs already, too. Thanks!

}
}

private void dispatchMessage(final Message msg) {
incomingMessages.mark();
final Timer.Context tcx = processTime.time();

This comment has been minimized.

@joschi

joschi Jul 16, 2015
Contributor

Minor issue: tcx should be used in a try-with-resources block below.

@@ -66,6 +77,7 @@ public void setMessage(@Nullable final Message msg)
public void setRaw(@Nonnull RawMessage raw) {
this.raw = raw;
setMessage(null);
setMessageList(null);

This comment has been minimized.

@joschi

joschi Jul 16, 2015
Contributor

Nulling messageList should also be mentioned in the Javadoc of this method.

@@ -45,6 +46,7 @@ public MessageEvent newInstance()

private RawMessage raw;
private Message msg;
private List<Message> msgList;

This comment has been minimized.

@joschi

joschi Jul 16, 2015
Contributor

Could we make this field a Collection<Message> and just call it messages? I don't think there's any advantage in using a List here.

This comment has been minimized.

@bernd

bernd Jul 17, 2015
Author Member

Aye.

import javax.annotation.Nullable;
import java.util.List;

public interface MessageListCodec extends Codec {

This comment has been minimized.

@joschi

joschi Jul 16, 2015
Contributor

Maybe MultiMessageCodec would describe the purpose of this interface better.

This comment has been minimized.

@bernd

bernd Jul 17, 2015
Author Member

Aye, thanks!

@@ -45,6 +46,7 @@ public MessageEvent newInstance()

private RawMessage raw;
private Message msg;

This comment has been minimized.

@joschi

joschi Jul 16, 2015
Contributor

It would probably make sense to get rid of the single message field, express getMessage() and setMessage() as delegates to getMessageList() and setMessageList(), and mark the singular methods as deprecated.

This comment has been minimized.

@bernd

bernd Jul 17, 2015
Author Member

I kept that because the OutputBuffer is only using the single message field and does not need a list of messages.

This comment has been minimized.

@joschi

joschi Jul 17, 2015
Contributor

I think that's a disaster waiting to happen. Either msg or msgList is not null but never both. So you have to know exactly, which one has been set before or always check both.

This comment has been minimized.

@bernd

bernd Jul 17, 2015
Author Member

Maybe we should just have two different MessageEvent classes. One for the ProcessBuffer with a collection of messages and one for the OutputBuffer with a single message.

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015
Member

That might make sense, but let's wait a little before making that change.
If message filters eventually want to synthesize messages on their own, they need a place to inject those, too.

bernd added 2 commits Jul 17, 2015
- Use the more generic Collection instead of List in MessageEvent.
- Rename getMessageList/setMessageList methods to getMessages/setMessages.
- Adjust JavaDoc for setRaw in MessageEvent.
- Rename MessageListCodec to MultiMessageCodec.
@bernd
Copy link
Member Author

@bernd bernd commented Jul 17, 2015

Updated after comments.


public interface MultiMessageCodec extends Codec {
@Nullable
List<Message> decodeMessages(@Nonnull RawMessage rawMessage);

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015
Member

Also Collection<Message> as in the event class?

This comment has been minimized.

@bernd

bernd Jul 17, 2015
Author Member

Oupts, good catch.

@bernd
Copy link
Member Author

@bernd bernd commented Jul 17, 2015

Updated after comments.

@@ -76,17 +81,20 @@ public void onEvent(MessageEvent event, long sequence, boolean endOfBatch) throw
try {
// always set the result of processMessage, even if it is null, to avoid later stages to process old messages.
// basically this will make sure old messages are cleared out early.
event.setMessage(processMessage(event.getRaw()));
event.setMessages(processMessage(event.getRaw()));

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015
Member

Changing it this way will force two new object allocations for each message (and single messages are the norm).
Could we refactor this so that we don't have to do those two allocations and simply change the ProcessBufferProcessor#onEvent to figure out whether to invoke the MessageFilter more than once?

messages = ((MultiMessageCodec) codec).decodeMessages(raw);
} else {
final Message message = codec.decode(raw);
messages = message == null ? null : Collections.singletonList(message);

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015
Member

first allocation (see above comment)

return null;
}

final List<Message> processedMessages = Lists.newArrayListWithCapacity(messages.size());

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015
Member

second allocation


for (final Message message : messageList) {

This comment has been minimized.

@kroepke

kroepke Jul 17, 2015
Member

this would simply change into if (event.getMsg() == null) { for ...} else {dispatchMessage(event.getMsg())}, right?

This adds a few more null checks but avoids collection allocation and
iteration in a few places.
@bernd bernd changed the title Add MessageListCodec interface Add MultiMessageCodec interface Jul 20, 2015
@bernd
Copy link
Member Author

@bernd bernd commented Jul 20, 2015

Updated to avoid collection allocations where possible.

@joschi
Copy link
Contributor

@joschi joschi commented Jul 20, 2015

LGTM. 👍

@joschi joschi added this to the 1.2.0 milestone Jul 20, 2015
@joschi joschi self-assigned this Jul 20, 2015
joschi added a commit that referenced this pull request Jul 20, 2015
Add MultiMessageCodec interface
@joschi joschi merged commit bbd12e0 into master Jul 20, 2015
2 checks passed
2 checks passed
@garybot2
ci Jenkins build graylog2-server-integration-pr 66 has succeeded
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@joschi joschi deleted the feature-message-list-codec-v2 branch Jul 20, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants