New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix/feat: Macro expansions, error logs redundancy, support msg
/logdata
in inner rules
#792
Conversation
@@ -159,7 +171,7 @@ func (mr MatchedRule) AuditLog(code int) string { | |||
} else { | |||
log.WriteString("Coraza: Warning. ") | |||
} | |||
mr.matchData(matchData, log) | |||
mr.matchData(log, matchData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, swaps parameters order just to keep consistency across mr.matchData()
and mr.writeDetails()
@@ -239,12 +239,81 @@ func TestTagsAreNotPrintedTwice(t *testing.T) { | |||
} | |||
re := regexp.MustCompile(`\[tag "some1"\]`) | |||
for _, l := range logs { | |||
if len(re.FindAllString(l, -1)) < 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully get this test. The title (TestTagsAreNotPrintedTwice
) seems like that wants to avoid the same tag appearing more than once inside the logs, but then we are trying to find the same tag twice. Originally it was enforcing being just one, but it has been changed with https://github.com/corazawaf/coraza/pull/220/files#diff-090979aa58246451224457e41b61bf533048a2e7091724b5db0345c04f5ef561L248-R249.
Currently, the test is trying to match the following:
"[client \"\"] Coraza: Warning. [file \"\"] [line \"2\"] [id \"1\"] [rev \"\"] [msg \"\"] [data \"\"] [severity \"emergency\"] [ver \"\"] [maturity \"0\"] [accuracy \"0\"] [tag \"some1\"] [tag \"some2\"] [hostname \"\"] [uri \"\"] [unique_id \"xDFbZiZkbAmlAAKUNXc\"]\n[client \"\"] Coraza: Warning. [file \"\"] [line \"2\"] [id \"1\"] [rev \"\"] [msg \"\"] [data \"\"] [severity \"emergency\"] [ver \"\"] [maturity \"0\"] [accuracy \"0\"] [tag \"some1\"] [tag \"some2\"] [hostname \"\"] [uri \"\"] [unique_id \"xDFbZiZkbAmlAAKUNXc\"]\n"
And it seems that the ending part is very lengthy and not relevant (Current PR tries to avoid redundancy, therefore it does not print it):
[client \"\"] Coraza: Warning. [file \"\"] [line \"2\"] [id \"1\"] [rev \"\"] [msg \"\"] [data \"\"] [severity \"emergency\"] [ver \"\"] [maturity \"0\"] [accuracy \"0\"] [tag \"some1\"] [tag \"some2\"] [hostname \"\"] [uri \"\"] [unique_id \"xDFbZiZkbAmlAAKUNXc\"]\n"
cc. @piyushroshan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping @jptosso
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## v3/dev #792 +/- ##
==========================================
+ Coverage 81.75% 81.84% +0.09%
==========================================
Files 156 156
Lines 8631 8663 +32
==========================================
+ Hits 7056 7090 +34
+ Misses 1343 1342 -1
+ Partials 232 231 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
@@ -11,3 +11,4 @@ testoverride: | |||
920420-8: 'HEAD request with data. Go/http does not allow it - 400 Bad Request' | |||
920430-5: 'Test has expect_error, Go/http and Envoy return 400' | |||
920430-8: 'Go/http does no allow HTTP/3.0 - 505 HTTP Version Not Supported' | |||
932200-13: 'wip' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a very unfortunate combination of rules triggered, 932200-13
test is now failing. It is intended to check that the correct log has been printed, looking for the wrong one found within MATCHED_VAR
. The latter string is matched not inside the 932200
log, but inside a 942440
that is triggered alongside it. The latter match is legit, but it results to be a bit more verbose compared to Modsec MATCHED_VARS:ARGS:host
vs ARGS:host
, therefore the test is marked as failing. A tested workaround would be just looking for found within MATCHED_VAR:
instead of found within MATCHED_VAR
. I will evolve this discussion CRS side and update the exclusion comment with the related issue/pr.
This is awesome progress, and it shows that we are keeping and eye on details! Excellent work @M4tteoP ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some notes.
@@ -520,3 +520,64 @@ func TestTransformArgNoCacheForTXVariable(t *testing.T) { | |||
t.Errorf("Expected 0 transformations in cache, got %d. It is not expected to cache TX variable transformations", len(transformationCache)) | |||
} | |||
} | |||
|
|||
func TestCaptureNotPropagatedToInnerChainRule(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<3
Added a few commits that:
|
internal/corazarules/rule_match.go
Outdated
if len(data) > maxSizeLogMessage { | ||
data = data[:maxSizeLogMessage] | ||
} | ||
log.WriteString(fmt.Sprintf("[msg_match_%d %q] [data_match_%d %q]", n, msg, n, data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to use printf with strings.Builder, I think you can still avoid the extra allocation with something like
fmt.Fprintf(log, "[msg_match_%d %q] [data_match_%d %q]", n, msg, n, data)
Applies throughout the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it's a bit surprising to be printing the same value n
twice
This PR aims to:
MATCHED_*
variables.capture
action propagation.Example 1: Rule 920170
Current Coraza:
Apache w/ModSec:
This PR:
920170
is a CRS chained rule with a first match ofREQUEST_METHOD "@rx ^(?:GET|HEAD)$"
, followed byREQUEST_HEADERS:Content-Length "!@rx ^0?$"
. TheMATCHED_VAR
is printed bylogdata
. Current Coraza expands the latter right after the parent rule execution, therefore[data "GET"]
is printed. ModSecurity behavior is different, expanding the macro at the of the whole rule. It leads to logging more meaningful data like the size of the body content. If the meaningful data that has to be printed is the matched variable of the parent rule, CRS already address it using some support variables and printing them (E.g. seeTX.932200_MATCHED_VAR_NAME
used by rule 932200.Example 2: Rule 920450 (Reported by Issue #570)
Current Coraza:
Apache w/ModSec:
This PR:
This rule matches with the parent rule multiple request headers names (
REQUEST_HEADERS_NAMES "@rx ^.*$"
). Because of this, currently we are just printing random header names, not the one that matched against the list of restricted headers (inner rule of the chain). Furthermore, it shows that we are iterating overmatched_data
generating a lot of verbosity.Example 3: Rule 920240
Current Coraza:
Apache w/ModSec:
This PR:
Iteration over
matched_data
that leads to verbosity in which I see not much value, is clearer in this example. Rule 920240 is a chained rule made of 3 total rules, therefore we expect 3 matched_data. Iterating over them leads to logging three times theCoraza: Warning
line, effectively printing some data only at the first line. (Just like previous examples, the printed data is also not very useful, not printing the real payload, but just that the request has been done withContent-Type:application/x-www-form-urlencoded
.Example 4: Chained rule with multiple
msg
/logdata
.Even if ModSecurity explicitly states that
metadata actions (id, rev, msg, tag, severity, logdata), skip, and skipAfter actions can be specified only by the chain starter rule.
, the aforementioned iteration over matched_data allowed to print multiplemsg
/logdata
. This "feature" was also escalated by engine/chains.go tests in which every inner rule is printing differentmsg
andlogdata
. This PR proposes to better support this feature by printing extramsg
/logdata
:Example rule:
Current Coraza:
This PR:
Example 5: Rules with
multiMatch
The
data_match_*
andmsg_match_*
affects also rules relying onmultiMatch
. This probably was the initial reason to iterate overmatched_data
.Example rule:
Current Coraza:
Apache w/ModSec:
This PR:
This is the only case where the current Coraza is more aligned with ModSecurity rather than this PR. If there are no strong concerns about this divergence in error logs, I feel that the PR solution:
Looking forward to your feedback!
Closes #570