Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix constructions of sed filter in corner cases #83

Merged
merged 4 commits into from
Dec 16, 2021

Conversation

coldtobi
Copy link
Contributor

@coldtobi coldtobi commented Dec 13, 2021

Some tests use sed to generate predictable outputs and employ sed files for
that. In some circumstances, the generated sed file syntax is incorrect, if
the subsitions data contains the sed expresssion seperator. The patch escapes
the data if needed. The bug manifested in (local) autopkgtest runs, when the
temporary directory contained the separator ("Q").

https://issues.apache.org/jira/browse/LOGCXX-543

Some tests use sed to generate predictable outputs and employ sed files for
that.  In some circumstances, the generated sed file syntax is incorrect, if
the subsitions data contains the sed expresssion seperator. The patch escapes
the data if needed.  The bug manifested in (local) autopkgtest runs, when the
temporary directory contained the separator ("Q").
src/test/cpp/util/transformer.cpp Outdated Show resolved Hide resolved
src/test/cpp/util/transformer.cpp Outdated Show resolved Hide resolved
src/test/cpp/util/transformer.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@ams-tschoening ams-tschoening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things look good and tests still work on my Windows.


return ret;
};

for (log4cxx::Filter::PatternList::const_iterator iter = patterns.begin();
iter != patterns.end();
iter++)
{
tmp = "sQ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own thoughts: This results in something like sQ...Q...Qg, which is pretty much the same like s/.../.../g or s!...!...!g or s#...#...#g etc. The important point is that at least the delimiter needs to be escaped when available in the input, like was the case for this PR.

OTOH, in theory each and every character would need to be escaped because it might have special reg exp meaning? SED documents things like & and \1 already. Though, might not be that important in this context, as it's only about tests with the only variable input being some paths in the file system.

Modern versions of SED support the argument --regexp-extended, which should support \Q...\E to escape arbitrary input. So something like sQ\Q...\EQ\Q...\EQg should work in the future as well.

https://www.boost.org/doc/libs/1_50_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html#boost_regex.syntax.basic_extended.quoting_escape

Though, a different delimiter should be chosen than to keep things readable. But on my system I have a version of SED using UnxUtils for Windows which doesn't support that. Instead, I would need to use SED bundled with GIT, but sadly wasn't able to get that work as expected as well. So am simply keeping things as they are right now.

C:\Program Files (x86)\UnxUtils\usr\local\wbin

vs.

C:\Program Files\Git\usr\bin

ghpr_83_full_escape.diff.txt

Copy link
Contributor Author

@coldtobi coldtobi Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this PR is only focusing on the delimiter only -- to fix the test issues I saw when packaging...

Regarding your sed: Did you try the switch -E instead? --regexp-extend is not POSIX, but -E would be, so maybe it understands only that switch (TIL: the feature is indeed POSIX; before, I thought this was not a portable feature)

Of course, if wanted, I can extend sedSanitizer to escape more characters, so that it -E (aka --rexep-extended is not needed), (Edit: Won't work for the same reason as in the last paragraph)
or look into boost::regexp to avoid the invocation to sed alltogether. Though this feels for me as overkill just for the tests (having to have boost I mean)

Though, a different delimiter should be chosen than to keep things readable.

I agree :)

ghpr_83_full_escape.diff.txt

Unconditonally wrapping won't work.. sed will treat them as literal, so any regexp in the strings will be NOPs...
So possibly the RegExps in patternlayouttest.cpp would be needed to be tweaked instead...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested and --regexp-extend is supported by my GIT-SED. Though, it seems I have missed again that regular expression are used and need to be supported, making my thoughts somewhat useless. :-) So let's simply keep things as they are now and change when needed as needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, if wanted, I can extend sedSanitizer to escape more characters, so that it -E (aka --rexep-extended is not needed), (Edit: Won't work for the same reason as in the last paragraph) or look into boost::regexp to avoid the invocation to sed alltogether. Though this feels for me as overkill just for the tests (having to have boost I mean)

FWIW since we're now on C++11, we have std::regex, which is basically the same as boost::regexp. If your compiler doesn't support C++17, boost is currently required anyway.

Copy link
Contributor Author

@coldtobi coldtobi Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gosh, I thought std::regex was C++17 ... I stand corrected.

@ams-tschoening ams-tschoening merged commit 4db6209 into apache:master Dec 16, 2021
@coldtobi coldtobi deleted the LOGCXX-543_SedHappyness branch December 16, 2021 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants