Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drools applies not to all messages, if more than 1 processor enabled #2119

Closed
hc4 opened this issue Apr 21, 2016 · 16 comments
Closed

Drools applies not to all messages, if more than 1 processor enabled #2119

hc4 opened this issue Apr 21, 2016 · 16 comments
Assignees
Labels
Milestone

Comments

@hc4
Copy link
Contributor

hc4 commented Apr 21, 2016

Problem description

If I set processbuffer_processors to more than 1, Drools will process not every message

Steps to reproduce the problem

  1. Set processbuffer_processors in server.conf to 3 (for example)
  2. Create simple Drools rule, which will just create new field for every message:
    rule "Test Rule" when m : Message() then m.addField("test", "test_value"); end
  3. Push some messages
  4. Not all messages will have new field

Environment

  • Graylog Version:2.0 beta3
  • Elasticsearch Version:2.3.1
@kroepke
Copy link
Member

kroepke commented Apr 21, 2016

I cannot reproduce this issue, just tried with exactly that setup on current master and everything looks fine.
Are there any exceptions in your log?

@hc4
Copy link
Contributor Author

hc4 commented Apr 21, 2016

No errors in log.
Maybe the problem is in BeatsInput plugin, which I'm currently using.
Also I use TopBeat as data source (it generates hundreds of messages in single packet)
Also I have one extractor, which I can not edit - will try to remove it and reproduce problem.

@kroepke kroepke self-assigned this Apr 21, 2016
@hc4
Copy link
Contributor Author

hc4 commented Apr 21, 2016

Removed all extractors and disconnected all pipelines. Still no luck.
I've attached csv export of saved messages.
Time is differend for processed and unprocessed messages, because I also change timestamp by Drools (actually it could be problem maybe)

graylog-search-result-relative-300.txt

@kroepke
Copy link
Member

kroepke commented Apr 21, 2016

The input plugin should not have anything to do with it, however, it is also not built for Graylog 2.0 yet, so there might be incompatibilities.

Can you try to reproduce this issue with the random http message generator input?

@hc4
Copy link
Contributor Author

hc4 commented Apr 21, 2016

Checked with random HTTP generator and it works.
Maybe the problem with BeatsTransport?

@kroepke
Copy link
Member

kroepke commented Apr 21, 2016

Ok, then it sounds like there's an issue with the beat input.

It could very well be that some fields have a . in them? Those are no longer compatible with elasticsearch and the Graylog behavior changed after beta.3, iirc: 3953af2

@hc4
Copy link
Contributor Author

hc4 commented Apr 21, 2016

I fixed that by changing all dots to "_" in input.
Is it possible, that Input incorrectly pushes messages to queue?

@kroepke
Copy link
Member

kroepke commented Apr 21, 2016

All messages from an input are processed in the same manner, I don't know the beats input plugin that well, but there's usually no way to avoid processing them.
With the _ is the issue resolved?

@hc4
Copy link
Contributor Author

hc4 commented Apr 21, 2016

No. I've faced dot problem on first Graylog 2.0 start and fixed it in first place

@kroepke kroepke added this to the 2.1.0 milestone Apr 25, 2016
@hc4
Copy link
Contributor Author

hc4 commented May 4, 2016

created log with trace level while reproducing problem (GA version still have this bug)
Also attached CSV export. Rows with topbeat_proc_cpu_system field are bad (don't know how to export GUID to CSV)

gaylog_bug_trace.txt
gaylog_bug_data.txt

For example.
good message: ad48ae13-121a-11e6-b5bc-005056b37533
bad message: ad3eea10-121a-11e6-b5bc-005056b37533

@hc4
Copy link
Contributor Author

hc4 commented May 4, 2016

Reproduced with Random HTML generator

Generator settings:

sleep: 1
override_source:
source: example.org
sleep_deviation: 2
throttling_allowed:

Rule:

rule "Test Rule"
    when
        m : Message( true )
    then
        m.addField("test", "1");
end

image

@ghost
Copy link

ghost commented May 12, 2016

I can also confirm that drool rules are not applied to all messages in Graylog 2.0.0.

  • Graylog server 2.0.0 (2dc6c03)
  • The issue did not exist on 1.x.
  • Both SyslogUDP and GELF UDP inputs are affected (that's all of the inputs we're using Drool rules with)
  • We do not use the pipeline processor. The system was just upgraded from a 1.x install.
  • @hc4 's observation that setting processbuffer_processors=1 works around the issue is correct.

@kroepke
Copy link
Member

kroepke commented May 13, 2016

I've ran a longer test to try to reproduce this and once in a while it looks like Drools itself is not matching any rules:

2016-05-13 13:00:47,581 ERROR: org.graylog2.rules.DroolsRulesSession - Should have fired rules! Thread 30

This happens across all processor threads (I am running three in this case), without an obvious pattern.

The weird thing about this issue is, that nothing about Drools has changed since 1.2, in fact nothing has changed since July 2014. I'll keep digging, though.

@hc4
Copy link
Contributor Author

hc4 commented May 13, 2016

Could you check if RulesFilter are singleton, or created per processor thread? (it is hard for me to check without IDE )
I think, that RulesFilter is not thread safe and must be called only from single thread (because Drools are not thread safe itself).

Also i found, that extractors processed before drools according to that code:

        Multibinder<MessageFilter> messageFilters = Multibinder.newSetBinder(binder(), MessageFilter.class);
        messageFilters.addBinding().to(StaticFieldFilter.class);
        messageFilters.addBinding().to(ExtractorFilter.class);
        messageFilters.addBinding().to(RulesFilter.class);
        messageFilters.addBinding().to(StreamMatcherFilter.class);

Could it be made configurable throught "Message Processors Configuration" page?

@kroepke
Copy link
Member

kroepke commented May 13, 2016

I have found the issue and will push a fix shortly.
It is because the OrderedMessageProcessors was a singleton, but shouldn't be. Previously each thread was injected its own copy, which is how it was supposed to, but afterwards, the instances were shared.

All filters can cope with that, except Drools. Its sessions need to be isolated.

We will not be making the filters configurable, because that interface will be deprecated soon. Instead the Drools support will move into a plugin in an upcoming version and implement MessageProcessor instead.

kroepke added a commit that referenced this issue May 13, 2016
When introducing the MessageProcessor interface, the processing threads accidentally shared the instances (and by induction the MessageFilter instances as well).
That posed no problem for most of the filters, because they do not rely on shared state, but the Drools filter does and could skip messages (because of Drools itself returning early)

This change uses a Provider to get the OrderedMessageProcessor instances explicitly and those do not get shared across threads.

fixes #2119
fixes #2188
@kroepke
Copy link
Member

kroepke commented May 13, 2016

Also to avoid confusion, the binding code quoted above does not influence the execution order, filters are ordered by a numeric priority which is hardcoded.

@kroepke kroepke modified the milestones: 2.0.1, 2.1.0, 2.0.2 May 13, 2016
joschi pushed a commit that referenced this issue May 17, 2016
When introducing the MessageProcessor interface, the processing threads accidentally shared the instances (and by induction the MessageFilter instances as well).
That posed no problem for most of the filters, because they do not rely on shared state, but the Drools filter does and could skip messages (because of Drools itself returning early)

This change uses a Provider to get the OrderedMessageProcessor instances explicitly and those do not get shared across threads.

Fixes #2119, fixes #2188
@bernd bernd closed this as completed in ca59a00 May 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants