Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add filename based sorting to pdbtool merge in a similar vein to what run-parts does #294

Closed
faxm0dem opened this issue Nov 6, 2014 · 15 comments · Fixed by #2664
Closed
Labels

Comments

@faxm0dem
Copy link
Contributor

faxm0dem commented Nov 6, 2014

https://bugzilla.balabit.com/show_bug.cgi?id=211

This issue is still open and IMHO very important.
The use case I have is when merging a pam_unix ruleset which matches PROGRAM==sshd and an openssh ruleset matching PROGRAM==sshd there are unpredictable results unless both rulesets are in the same file.

@bazsi
Copy link
Collaborator

bazsi commented Nov 11, 2014

I've tried to reproduce the issue in the original bugzilla ticket, but so far without success. db-parser() tries to find the longest match if there are multiple rules with the same prefix.

this is what I've done with the PostgreSQL sample:

patterndb:

<patterndb version='3' pub_date='2010-02-22'>

 <ruleset name='testset' id='1'>
   <patterns>
     <pattern>pgsql</pattern>
   </patterns>
   <rule provider='test' id='11' class='system'>
     <patterns>
       <pattern>xlog: backup pg_xlog/@SET:xid:0123456789ABCDEF@</pattern>
     </patterns>
   </rule>
 </ruleset>
 <ruleset name='testset' id='2'>
   <patterns>
     <pattern>pgsql</pattern>
   </patterns>
   <rule provider='test' id='12' class='system'>
     <patterns>
       <pattern>xlog: backup pg_xlog/@SET:xid:0123456789ABCDEF@ failed</pattern>
     </patterns>
   </rule>
 </ruleset>

</patterndb>

It always matched the longer pattern, regardless of the order of the ruleset blocks. I've tested it via:

$ pdbtool match -p pgsql.xml  -M 'xlog: backup pg_xlog/000000010000014700000076 failed' -P pgsql

@bazsi
Copy link
Collaborator

bazsi commented Nov 11, 2014

I'll check the ssh example.

@bazsi
Copy link
Collaborator

bazsi commented Nov 11, 2014

Do you have an sshd sample message that I should test with?

@bazsi
Copy link
Collaborator

bazsi commented Nov 11, 2014

I think I've found a sample message:

          <pattern>Accepted @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @ESTRING:: @@ANYSTRING:usracct.service@</pattern>

patterndb would merge these patterns iff the variable extraction is the same. If they are different, then the order in which the patterns were loaded is significant.

I think we could (and probably should) report this as an error and refuse to load the patterns or explicitly ignore the second and subsequent conflicting rules.

What do you think?

@bazsi
Copy link
Collaborator

bazsi commented Nov 12, 2014

I was giving this issue a bit more thought, and I think. I came up with a way to solve this issue.

The problem is caused by the fact the patterndb doesn't do an exhaustive search of the rules, but rather it tries to locate the first one to match.

However rules can overlap, sometimes two distinct rules use different parsers to match the same input. This complicates things, as this is not immediately obvious.

Also, dbparser is willing to accept a partial match, if the rule matches the prefix of the message.

The problem is at its worst, when two rules collide (with a different pattern), and at the same time the first of these is partial. I would expect that the longer pattern prevails, but that only happens currently if the prefix of the two rules are completely identical.

The solution I came up with is twofold:

  1. to always prefer a complete match over a partial match.
    This slows down dbparser a bit, but more specific rules will clearly be preferred.
  2. To add conflict search capabilities to pdbtool test.

Pdbtool test would do an exhaustive search of all potentially matches, and warn if there are more than one.

What do you think?

@faxm0dem
Copy link
Contributor Author

What is a partial match? do you have an example?

@bazsi
Copy link
Collaborator

bazsi commented Nov 12, 2014

A partial match is when the message is longer than the pattern.

It happens when dbparser feels there are no further rules to process (for a
longer match), but in reality there might be other rules, which use
slightly different parsers.

That's why prioritizing complete matches help, as you can always add a more
specific rule.
On Nov 12, 2014 1:26 PM, "Fabien Wernli" notifications@github.com wrote:

What is a partial match? do you have an example?


Reply to this email directly or view it on GitHub
#294 (comment).

@faxm0dem
Copy link
Contributor Author

you mean longer than in bytes?
Sorry, still confused - do you have an example?

@bazsi
Copy link
Collaborator

bazsi commented Nov 12, 2014

Sorry, I was typing my last entry from my phone, and that's why it was so concise. Sorry about that.

If you have this as a pattern:

   <pattern>This is a message</pattern>

Then if you have a message, which has the pattern above as a prefix would match, as long as there's no better rule.

msg="This is a message with a tail"

Would match the rule above, unless you have a more specific rule. The issue is that the radix tree behind patterndb may not be perfectly right whether there's a more specific rule.

As long as the prefix of two rules are completely the same, patterndb would properly merge them internally:

   <pattern>This is a @ESTRING:foo: @</pattern>
   <pattern>This is a @ESTRING:foo: @with a tail</pattern>

In this case, db-parser would always match the 2nd rule, as the ESTRING parser is merged and they fork out two two alternatives with the literal string following the parser node.

However, if any of the parameters of the parsers differ, for instance the name of the name-value pair, they wouldn't get merged:

   <pattern>This is a @ESTRING:foo1: @</pattern>
   <pattern>This is a @ESTRING:foo2: @with a tail</pattern>

This would already fork into two branches at the parser nodes, and db-parser iterates those sequentially. In this case the first rule would match (partially, as the "with a tail" portion is not in the rule, only its prefix matches), however db-parser accepts this as a hit.

With the change I propose (and which is implemented on the branch f/patterndb-prefer-complete-match), db-parser would always aim for a complete match first, and only if that is not found does it fall back to partial matching.

Is this clearer now?
Thanks for the feedback.

@faxm0dem
Copy link
Contributor Author

yes, much clearer thanks for taking the time :-)
What's still not clear in my mind is how rulesets relate to one another. You mention "rule merging". Does that happen independantly of the rulesets?

@bazsi
Copy link
Collaborator

bazsi commented Nov 12, 2014

It happens in the data structure behind db-parser. It's a radix tree:

http://en.wikipedia.org/wiki/Radix_tree

Although the plain radix tree is extended with parser nodes that extract
information from the input and represent a series of characters.

Bazsi

On Wed, Nov 12, 2014 at 3:21 PM, Fabien Wernli notifications@github.com
wrote:

yes, much clearer thanks for taking the time :-)
What's still not clear in my mind is how rulesets relate to one another.
You mention "rule merging". Does that happen independantly of the rulesets?


Reply to this email directly or view it on GitHub
#294 (comment).

@ihrwein
Copy link
Contributor

ihrwein commented Mar 9, 2015

@faxm0dem do you have any questions? I'm checking the old pending issues.

@faxm0dem
Copy link
Contributor Author

faxm0dem commented Mar 9, 2015

sure @Baszi can you confirm this is solved using recent changes in pdb?

@bazsi
Copy link
Collaborator

bazsi commented Mar 10, 2015

Well, rule conflicts were improved a lot with a recent fix and the complete
match approach taken in db-parser. But I am afraid it is so much we can do,
if there are two independent rules in two separate files, the load order
will matter and only one of them will match (the first one).

I am afraid update-patterndb will not sort the input filenames which it
should, so you could at least prioritize one file over another by naming it
something else.

So we should probably file an issue to sort the input files in pdbtool
merge, it is not an issue if you are using a filesystem that implicitly
sorts filenames within directories (which it does for a number of
filesystems), however this is not the best thing to rely on.

I would suggest to rename this issue to or refile a new issue with the
title: "add filename based sorting to pdbtool merge in a simimlar vein to
what run-parts does"

Bazsi

On Mon, Mar 9, 2015 at 5:12 PM, Fabien Wernli notifications@github.com
wrote:

sure @Baszi can you confirm this is solved using recent changes in pdb?


Reply to this email directly or view it on GitHub
#294 (comment).

@faxm0dem
Copy link
Contributor Author

Fair enough

@faxm0dem faxm0dem changed the title dbparser pattern order breaks validation add filename based sorting to pdbtool merge in a simimlar vein to what run-parts does Mar 11, 2015
@faxm0dem faxm0dem changed the title add filename based sorting to pdbtool merge in a simimlar vein to what run-parts does add filename based sorting to pdbtool merge in a similar vein to what run-parts does Mar 11, 2015
@presidento presidento added the bug label Jun 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants