-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix constructions of sed filter in corner cases #83
Fix constructions of sed filter in corner cases #83
Conversation
Some tests use sed to generate predictable outputs and employ sed files for that. In some circumstances, the generated sed file syntax is incorrect, if the subsitions data contains the sed expresssion seperator. The patch escapes the data if needed. The bug manifested in (local) autopkgtest runs, when the temporary directory contained the separator ("Q").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things look good and tests still work on my Windows.
|
||
return ret; | ||
}; | ||
|
||
for (log4cxx::Filter::PatternList::const_iterator iter = patterns.begin(); | ||
iter != patterns.end(); | ||
iter++) | ||
{ | ||
tmp = "sQ"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own thoughts: This results in something like sQ...Q...Qg
, which is pretty much the same like s/.../.../g
or s!...!...!g
or s#...#...#g
etc. The important point is that at least the delimiter needs to be escaped when available in the input, like was the case for this PR.
OTOH, in theory each and every character would need to be escaped because it might have special reg exp meaning? SED documents things like &
and \1
already. Though, might not be that important in this context, as it's only about tests with the only variable input being some paths in the file system.
Modern versions of SED support the argument --regexp-extended
, which should support \Q...\E
to escape arbitrary input. So something like sQ\Q...\EQ\Q...\EQg
should work in the future as well.
Though, a different delimiter should be chosen than to keep things readable. But on my system I have a version of SED using UnxUtils
for Windows which doesn't support that. Instead, I would need to use SED bundled with GIT, but sadly wasn't able to get that work as expected as well. So am simply keeping things as they are right now.
C:\Program Files (x86)\UnxUtils\usr\local\wbin
vs.
C:\Program Files\Git\usr\bin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this PR is only focusing on the delimiter only -- to fix the test issues I saw when packaging...
Regarding your sed: Did you try the switch -E
instead? --regexp-extend
is not POSIX, but -E
would be, so maybe it understands only that switch (TIL: the feature is indeed POSIX; before, I thought this was not a portable feature)
Of course, if wanted, I can extend sedSanitizer to escape more characters, so that it (Edit: Won't work for the same reason as in the last paragraph)-E
(aka --rexep-extended
is not needed),
or look into boost::regexp to avoid the invocation to sed alltogether. Though this feels for me as overkill just for the tests (having to have boost I mean)
Though, a different delimiter should be chosen than to keep things readable.
I agree :)
ghpr_83_full_escape.diff.txt
Unconditonally wrapping won't work.. sed will treat them as literal, so any regexp in the strings will be NOPs...
So possibly the RegExps in patternlayouttest.cpp would be needed to be tweaked instead...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested and --regexp-extend
is supported by my GIT-SED. Though, it seems I have missed again that regular expression are used and need to be supported, making my thoughts somewhat useless. :-) So let's simply keep things as they are now and change when needed as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, if wanted, I can extend sedSanitizer to escape more characters, so that it
-E
(aka--rexep-extended
is not needed), (Edit: Won't work for the same reason as in the last paragraph) or look into boost::regexp to avoid the invocation to sed alltogether. Though this feels for me as overkill just for the tests (having to have boost I mean)
FWIW since we're now on C++11, we have std::regex
, which is basically the same as boost::regexp
. If your compiler doesn't support C++17, boost is currently required anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gosh, I thought std::regex
was C++17 ... I stand corrected.
Some tests use sed to generate predictable outputs and employ sed files for
that. In some circumstances, the generated sed file syntax is incorrect, if
the subsitions data contains the sed expresssion seperator. The patch escapes
the data if needed. The bug manifested in (local) autopkgtest runs, when the
temporary directory contained the separator ("Q").
https://issues.apache.org/jira/browse/LOGCXX-543