Add support for new operator rxGlobal #2396

martinhsv · 2020-09-09T04:00:12Z

No description provided.

dune73 · 2020-09-09T08:33:32Z

Hey @martinhsv, you did not provide a description. But is this the operator that is going to do global matching after the standard rx operator returns to the previous behavior with the limit to the matches?

The name pretty much speaks for itself, but I just want to confirm what I am deducing.

martinhsv · 2020-09-09T13:46:56Z

Yes, that is correct. In June the rx operator was adjusted to do only a single full match, then exit.

This new rxGlobal operator is meant to support the use case where one wishes to do multiple full captures from some content. Although such use cases are expected to be rare, it was felt that it would be beneficial for rule writers to have a tool available to enable emulation of pcre's '/g' (global) behaviour.

dune73 · 2020-09-09T13:58:42Z

Thank you for the confirmation. I certainly like more options with rule writing.

zimmerle · 2020-09-10T12:23:32Z

Related issues: #2394, #2336, #2393, #2348, and #2012.

zimmerle · 2020-09-11T13:00:15Z

The SecRule language tests cases were update to support this new operator.

owasp-modsecurity/secrules-language-tests#9

zimmerle · 2020-09-11T13:20:50Z

Little modifications on the pull request:

removed updates on the TX test case.
updated test sub-module (Add support for new operator rxGlobal (secrule tests) secrules-language-tests#9).
updated the generated parser files.

I have it running on our QA as: v3/dev/pull_2396
Waiting for the QA results to continue the merging.

Merging this may imply in more work on: #2012. The other two patches #2394, and #2336 needs to be visited as well. They all somewhat tackle regex support.

mirkodziadzka-avi · 2020-09-16T12:23:49Z

Some questions / proposal:

I think it would make sense to limit the number of repeated searches here. Maybe to 10 as default and changeable by a control-action - if the user wants more. An unlimited search will create the same situation as in ModSecurity, Regular Expressions and Disputed CVE-2020-15598 #2401 again.
Can you please share an example for a use case for this new operator?

martinhsv · 2020-09-16T14:08:33Z

Hi @mirkodziadzka-avi ,

Having global matching within the standard rx operator had several disadvantages, including:

the rx operator has a great many applications for which single-match-then-exit is sufficient (and global match confers no additional benefit)
the cases where global matching is beneficial are expected to be considerably more rare
there was no alternative whereby one could specify that one wanted single-match-then-exit (albeit one could craft rules to specifically consume the remainder of the string beyond the intended match)
for many rule writers, the global matching functionality would be unexpected, and hence they might not take that implementation detail into account when writing rules

Part of the thinking here is that we will now have a separate operator that does not impact the simpler, and far more common use cases that can be handled with rx. The guidance will be that rule writers should use rx rather than rxGlobal, unless they actually need the more extensive capture functionality provided by the latter operator. In other words, a more narrowly used operator, where rule writers have a greater awareness of the consequences that go along with greater power, can make any remaining risk reasonable.

All that said, your suggestion about having a configurable limit on the maximum number of repeated matches done by a 'global' match is still a reasonable thing to think about. However, since ModSecurity already has quite a few knobs and switches that seem to be little used, some points that I would consider to make sure that a new configuration wouldn't merely add to clutter:

the rxGlobal operator's global matching functionality is particularly useful when the number of matches is non-small. If one is trying to do an unknown (but small) number of matches within a string, then one could use the ordinary rx operator but with pattern nesting for the possible 2nd and 3rd matches (if up to 3 matches are expected).
in many cases where one has a compelling use case for rxGlobal, it's probable that the rule writer will have a better idea of what is a reasonable maximum size for the variable in question, in which case an accompanying rule can be written to limit length

Again, that's not to say that your suggestion shouldn't be considered -- merely that there are some reasons to give pause. Input from the community on questions like this is invaluable

In terms of the use cases where rxGlobal might be useful, we recently saw a use case where a particular POST argument had parseable content (parseable by a regex but not by ModSecurity's builtin parsers), where individual parsed items could be detected as SQL injection via detectSQLi -- but the POST argument as a whole could not be so detected.

More broadly, any time you have substrings that you can capture in a regex, but where it's useful for the substrings themselves to be examined by a non-regex operator (for example detectSQLi, detectXSS, ... perhaps pmFromFile, ...), then the rxGlobal operator could be used to capture substrings into TX variables, and then examine those variables against the non-regex operator.

But, as emphasized earlier, these cases are expected to be rare.

mirkodziadzka-avi · 2020-09-16T14:47:22Z

Hi @martinhsv

Thanks for the answer

the rxGlobal operator's global matching functionality is particularly useful when the number of matches is non-small. If one is trying to do an unknown (but small) number of matches within a string, then one could use the ordinary rx operator but with pattern nesting for the possible 2nd and 3rd matches (if up to 3 matches are expected).

I agree that this is possible, but it make the regex even more unreadable. Especially if you need more than 3 possible elements.

In terms of the use cases where rxGlobal might be useful, we recently saw a use case where a particular POST argument had parseable content (parseable by a regex but not by ModSecurity's builtin parsers), where individual parsed items could be detected as SQL injection via detectSQLi -- but the POST argument as a whole could not be so detected.

So, the idea is to use a single regex as a body parser and use TX:/\d+/ instead of ARGS in further rules?

I see the use case for your example, so thanks for pointing it out.

Just curious about your opinion: Wouldn't it be more useful (as a future project) to allow writing body parsers in Lua which then would populate ARGS?

zimmerle · 2020-09-16T16:21:39Z

Just curious about your opinion: Wouldn't it be more useful (as a future project) to allow writing body parsers in Lua which then would populate ARGS?

TX and ENV collections can be populated by Lua. The first will keep the results in memory till the end of request, second will share info among the same process and lasts till deleted.

martinhsv · 2020-09-16T16:23:46Z

So, the idea is to use a single regex as a body parser and use TX:/\d+/ instead of ARGS in further rules?

Yes, that's the basic idea.

zimmerle · 2020-10-21T11:56:14Z

v3/dev/pull_2396 branch was rebase on v3/master. Having it on QA one more time.

zimmerle · 2020-10-23T17:56:47Z

Having the test case on the test-cases execution list -
https://github.com/SpiderLabs/ModSecurity/blob/acd158f8f0df1c344e459a8c5ca6b3c95e1125c9/Makefile.am#L173-L175

On QA again.

airween

Some thoughts on the return value of pcre_exec.

airween · 2020-10-24T14:21:13Z

src/utils/regex.cc

+                }
+                prev_match_zero_length = false;
+            } else {
+                // normal case; no match on most recent scan (with options=0).  We are done.


Just my 2 cents...

Here you check the value of rc is greater than 0 or not - if not (eg. if it's definitely 0) you treat it that it's not matched.

Referring to the PCRE API, the value what you're looking for is -1.

I'm sure the chances are very-very-very small to get the value 0 (considering the value of the OVECTOR, which is 900), but may be here would be good to make some error handling. "no match on most recent" is only when the value of rc is -1. If it's 0, then it means the ovector out of space. And there many more different error - which also have a very-very slim chance.

Just a few thoughts...

Hi @airween ,

Thanks for the observation; it is at least something for us to have considered.

As you note, the OVECTOR is enormous. It seems incredibly unlikely that anyone would compose a regex pattern with 300 capture groups (keeping in mind that repeating capture groups like '(sub-expression)*' only count as one).

It is true that the functionality implication here is that if this use case occurs, it will be treated the same as 'no match'. Note, that this is also the case for the standard @rx operator in v3.x -- both before and after the changes made this past summer, for example.

We may well want to make some adjustments for this edge case going forward. Whether we want to change main behaviour in this case might be debatable, but we should at least produce some kind of debug message. However, I think it would make more sense to adjust all relevant occurrences of this behaviour at the same time (including the much more important 'rx' operator), rather than adjusting this pull request in isolation.

For those reasons, my inclination here is to leave the current pull request as is.

Thanks again.

zimmerle · 2020-10-26T11:59:32Z

src/utils/regex.cc

+    const char *subject = s.c_str();
+
+    bool prev_match_zero_length = false;
+    int pcre_options = 0;


int pcre_options scope could be reduce. Having it inside the while loop -

https://github.com/SpiderLabs/ModSecurity/blob/2672db103e511cbf0866efc62dc7a3b4f43a8657/src/utils/regex.cc#L139-L147

zimmerle · 2020-10-26T19:57:42Z

Merged thanks!

Add support for new operator rxGlobal

290d003

zimmerle self-requested a review September 10, 2020 12:18

zimmerle self-assigned this Sep 10, 2020

zimmerle added the 3.x Related to ModSecurity version 3.x label Sep 10, 2020

zimmerle added this to In progress in v3.1.0 via automation Sep 10, 2020

zimmerle mentioned this pull request Sep 11, 2020

Add support for new operator rxGlobal (secrule tests) owasp-modsecurity/secrules-language-tests#9

Closed

zimmerle moved this from In progress to Needs review in v3.1.0 Oct 14, 2020

This was referenced Oct 21, 2020

Add Regex::searchAllMatches method, remove searchAll #2394

Closed

Fix Regex::searchAll behaviour wrt empty capturing groups #2393

Closed

airween reviewed Oct 24, 2020

View reviewed changes

zimmerle reviewed Oct 26, 2020

View reviewed changes

zimmerle closed this Oct 26, 2020

v3.1.0 automation moved this from Needs review to Done Oct 26, 2020

airween mentioned this pull request Feb 8, 2021

How can we avoid ReDoS without trust on PCRE limits #2072

Closed

liudongmiao mentioned this pull request Dec 22, 2021

"Make error" on centos 6 #2528

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for new operator rxGlobal #2396

Add support for new operator rxGlobal #2396

martinhsv commented Sep 9, 2020

dune73 commented Sep 9, 2020

martinhsv commented Sep 9, 2020

dune73 commented Sep 9, 2020

zimmerle commented Sep 10, 2020 •

edited

Loading

zimmerle commented Sep 11, 2020 •

edited

Loading

zimmerle commented Sep 11, 2020 •

edited

Loading

mirkodziadzka-avi commented Sep 16, 2020

martinhsv commented Sep 16, 2020

mirkodziadzka-avi commented Sep 16, 2020

zimmerle commented Sep 16, 2020 •

edited

Loading

martinhsv commented Sep 16, 2020

zimmerle commented Oct 21, 2020 •

edited

Loading

zimmerle commented Oct 23, 2020

airween left a comment

airween Oct 24, 2020

martinhsv Oct 26, 2020 •

edited

Loading

zimmerle Oct 26, 2020

zimmerle commented Oct 26, 2020

Add support for new operator rxGlobal #2396

Add support for new operator rxGlobal #2396

Conversation

martinhsv commented Sep 9, 2020

dune73 commented Sep 9, 2020

martinhsv commented Sep 9, 2020

dune73 commented Sep 9, 2020

zimmerle commented Sep 10, 2020 • edited Loading

zimmerle commented Sep 11, 2020 • edited Loading

zimmerle commented Sep 11, 2020 • edited Loading

mirkodziadzka-avi commented Sep 16, 2020

martinhsv commented Sep 16, 2020

mirkodziadzka-avi commented Sep 16, 2020

zimmerle commented Sep 16, 2020 • edited Loading

martinhsv commented Sep 16, 2020

zimmerle commented Oct 21, 2020 • edited Loading

zimmerle commented Oct 23, 2020

airween left a comment

Choose a reason for hiding this comment

airween Oct 24, 2020

Choose a reason for hiding this comment

martinhsv Oct 26, 2020 • edited Loading

Choose a reason for hiding this comment

zimmerle Oct 26, 2020

Choose a reason for hiding this comment

zimmerle commented Oct 26, 2020

zimmerle commented Sep 10, 2020 •

edited

Loading

zimmerle commented Sep 11, 2020 •

edited

Loading

zimmerle commented Sep 11, 2020 •

edited

Loading

zimmerle commented Sep 16, 2020 •

edited

Loading

zimmerle commented Oct 21, 2020 •

edited

Loading

martinhsv Oct 26, 2020 •

edited

Loading