Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for new operator rxGlobal #2396

Closed
wants to merge 1 commit into from
Closed

Add support for new operator rxGlobal #2396

wants to merge 1 commit into from

Conversation

martinhsv
Copy link
Contributor

No description provided.

@dune73
Copy link
Member

dune73 commented Sep 9, 2020

Hey @martinhsv, you did not provide a description. But is this the operator that is going to do global matching after the standard rx operator returns to the previous behavior with the limit to the matches?

The name pretty much speaks for itself, but I just want to confirm what I am deducing.

@martinhsv
Copy link
Contributor Author

Yes, that is correct. In June the rx operator was adjusted to do only a single full match, then exit.

This new rxGlobal operator is meant to support the use case where one wishes to do multiple full captures from some content. Although such use cases are expected to be rare, it was felt that it would be beneficial for rule writers to have a tool available to enable emulation of pcre's '/g' (global) behaviour.

@dune73
Copy link
Member

dune73 commented Sep 9, 2020

Thank you for the confirmation. I certainly like more options with rule writing.

@zimmerle zimmerle self-requested a review September 10, 2020 12:18
@zimmerle zimmerle self-assigned this Sep 10, 2020
@zimmerle zimmerle added the 3.x Related to ModSecurity version 3.x label Sep 10, 2020
@zimmerle zimmerle added this to In progress in v3.1.0 via automation Sep 10, 2020
@zimmerle
Copy link
Contributor

zimmerle commented Sep 10, 2020

Related issues: #2394, #2336, #2393, #2348, and #2012.

@zimmerle
Copy link
Contributor

zimmerle commented Sep 11, 2020

The SecRule language tests cases were update to support this new operator.

owasp-modsecurity/secrules-language-tests#9

@zimmerle
Copy link
Contributor

zimmerle commented Sep 11, 2020

Little modifications on the pull request:

I have it running on our QA as: v3/dev/pull_2396
Waiting for the QA results to continue the merging.

Merging this may imply in more work on: #2012. The other two patches #2394, and #2336 needs to be visited as well. They all somewhat tackle regex support.

@mirkodziadzka-avi
Copy link
Contributor

Some questions / proposal:

  • I think it would make sense to limit the number of repeated searches here. Maybe to 10 as default and changeable by a control-action - if the user wants more. An unlimited search will create the same situation as in ModSecurity, Regular Expressions and Disputed CVE-2020-15598 #2401 again.
  • Can you please share an example for a use case for this new operator?

@martinhsv
Copy link
Contributor Author

Hi @mirkodziadzka-avi ,

Having global matching within the standard rx operator had several disadvantages, including:

  • the rx operator has a great many applications for which single-match-then-exit is sufficient (and global match confers no additional benefit)
  • the cases where global matching is beneficial are expected to be considerably more rare
  • there was no alternative whereby one could specify that one wanted single-match-then-exit (albeit one could craft rules to specifically consume the remainder of the string beyond the intended match)
  • for many rule writers, the global matching functionality would be unexpected, and hence they might not take that implementation detail into account when writing rules

Part of the thinking here is that we will now have a separate operator that does not impact the simpler, and far more common use cases that can be handled with rx. The guidance will be that rule writers should use rx rather than rxGlobal, unless they actually need the more extensive capture functionality provided by the latter operator. In other words, a more narrowly used operator, where rule writers have a greater awareness of the consequences that go along with greater power, can make any remaining risk reasonable.

All that said, your suggestion about having a configurable limit on the maximum number of repeated matches done by a 'global' match is still a reasonable thing to think about. However, since ModSecurity already has quite a few knobs and switches that seem to be little used, some points that I would consider to make sure that a new configuration wouldn't merely add to clutter:

  • the rxGlobal operator's global matching functionality is particularly useful when the number of matches is non-small. If one is trying to do an unknown (but small) number of matches within a string, then one could use the ordinary rx operator but with pattern nesting for the possible 2nd and 3rd matches (if up to 3 matches are expected).
  • in many cases where one has a compelling use case for rxGlobal, it's probable that the rule writer will have a better idea of what is a reasonable maximum size for the variable in question, in which case an accompanying rule can be written to limit length

Again, that's not to say that your suggestion shouldn't be considered -- merely that there are some reasons to give pause. Input from the community on questions like this is invaluable

In terms of the use cases where rxGlobal might be useful, we recently saw a use case where a particular POST argument had parseable content (parseable by a regex but not by ModSecurity's builtin parsers), where individual parsed items could be detected as SQL injection via detectSQLi -- but the POST argument as a whole could not be so detected.

More broadly, any time you have substrings that you can capture in a regex, but where it's useful for the substrings themselves to be examined by a non-regex operator (for example detectSQLi, detectXSS, ... perhaps pmFromFile, ...), then the rxGlobal operator could be used to capture substrings into TX variables, and then examine those variables against the non-regex operator.

But, as emphasized earlier, these cases are expected to be rare.

@mirkodziadzka-avi
Copy link
Contributor

Hi @martinhsv

Thanks for the answer

  • the rxGlobal operator's global matching functionality is particularly useful when the number of matches is non-small. If one is trying to do an unknown (but small) number of matches within a string, then one could use the ordinary rx operator but with pattern nesting for the possible 2nd and 3rd matches (if up to 3 matches are expected).

I agree that this is possible, but it make the regex even more unreadable. Especially if you need more than 3 possible elements.

In terms of the use cases where rxGlobal might be useful, we recently saw a use case where a particular POST argument had parseable content (parseable by a regex but not by ModSecurity's builtin parsers), where individual parsed items could be detected as SQL injection via detectSQLi -- but the POST argument as a whole could not be so detected.

So, the idea is to use a single regex as a body parser and use TX:/\d+/ instead of ARGS in further rules?

I see the use case for your example, so thanks for pointing it out.

Just curious about your opinion: Wouldn't it be more useful (as a future project) to allow writing body parsers in Lua which then would populate ARGS?

@zimmerle
Copy link
Contributor

zimmerle commented Sep 16, 2020

Just curious about your opinion: Wouldn't it be more useful (as a future project) to allow writing body parsers in Lua which then would populate ARGS?

TX and ENV collections can be populated by Lua. The first will keep the results in memory till the end of request, second will share info among the same process and lasts till deleted.

@martinhsv
Copy link
Contributor Author

So, the idea is to use a single regex as a body parser and use TX:/\d+/ instead of ARGS in further rules?

Yes, that's the basic idea.

@zimmerle zimmerle moved this from In progress to Needs review in v3.1.0 Oct 14, 2020
@zimmerle
Copy link
Contributor

zimmerle commented Oct 21, 2020

v3/dev/pull_2396 branch was rebase on v3/master. Having it on QA one more time.

@zimmerle
Copy link
Contributor

Having the test case on the test-cases execution list -
https://github.com/SpiderLabs/ModSecurity/blob/acd158f8f0df1c344e459a8c5ca6b3c95e1125c9/Makefile.am#L173-L175

On QA again.

Copy link
Member

@airween airween left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts on the return value of pcre_exec.

}
prev_match_zero_length = false;
} else {
// normal case; no match on most recent scan (with options=0). We are done.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my 2 cents...

Here you check the value of rc is greater than 0 or not - if not (eg. if it's definitely 0) you treat it that it's not matched.

Referring to the PCRE API, the value what you're looking for is -1.

I'm sure the chances are very-very-very small to get the value 0 (considering the value of the OVECTOR, which is 900), but may be here would be good to make some error handling. "no match on most recent" is only when the value of rc is -1. If it's 0, then it means the ovector out of space. And there many more different error - which also have a very-very slim chance.

Just a few thoughts...

Copy link
Contributor Author

@martinhsv martinhsv Oct 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @airween ,

Thanks for the observation; it is at least something for us to have considered.

As you note, the OVECTOR is enormous. It seems incredibly unlikely that anyone would compose a regex pattern with 300 capture groups (keeping in mind that repeating capture groups like '(sub-expression)*' only count as one).

It is true that the functionality implication here is that if this use case occurs, it will be treated the same as 'no match'. Note, that this is also the case for the standard @rx operator in v3.x -- both before and after the changes made this past summer, for example.

We may well want to make some adjustments for this edge case going forward. Whether we want to change main behaviour in this case might be debatable, but we should at least produce some kind of debug message. However, I think it would make more sense to adjust all relevant occurrences of this behaviour at the same time (including the much more important 'rx' operator), rather than adjusting this pull request in isolation.

For those reasons, my inclination here is to leave the current pull request as is.

Thanks again.

const char *subject = s.c_str();

bool prev_match_zero_length = false;
int pcre_options = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zimmerle
Copy link
Contributor

Merged thanks!

@zimmerle zimmerle closed this Oct 26, 2020
v3.1.0 automation moved this from Needs review to Done Oct 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.x Related to ModSecurity version 3.x
Projects
v3.1.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants