This repository has been archived by the owner on Aug 26, 2020. It is now read-only.

Checking for the absence of \ before a replacement isn't robust #6

Open
DaGenix opened this issue Dec 4, 2019 · 1 comment

DaGenix commented Dec 4, 2019

The character class \w is replaced with [a-zA-Z] to restrict matches to ASCII. The code that does that replacement is:

pattern = re.sub(r"(?<!\\)" + re.escape(esc), repl=replacement, string=pattern)

That code replaces any \w it finds with [a-zA-Z] unless the \w is immediately preceded by another backslash, as in \\w. Skipping that case is correct, because \\w is a literal backslash followed by a literal "w". However, the lookbehind inspects only the single preceding character, so the code also skips \\\w, which is a literal backslash followed by the \w character class and should be translated into \\[a-zA-Z].

Example:

>>> compile(r'\w').search('\u1234')
>>> compile(r'\\\w').search('\\\u1234')
<_sre.SRE_Match object; span=(0, 2), match='\\ሴ'>
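
The failure above can be reproduced outside the library. The following is a minimal sketch, not the project's code: `replace_lookbehind` mirrors the quoted `re.sub` call, and `replace_pairwise` is a hypothetical alternative that consumes backslash escapes two characters at a time, so the parity of the preceding backslashes is respected. The names `ESC`, `REPLACEMENT`, and both functions are assumptions made for illustration.

```python
import re

ESC = r"\w"               # the escape being rewritten
REPLACEMENT = "[a-zA-Z]"  # the ASCII-only class quoted in this issue


def replace_lookbehind(pattern):
    # Same shape as the code quoted above: skip \w when the single
    # preceding character is a backslash.
    return re.sub(r"(?<!\\)" + re.escape(ESC), repl=REPLACEMENT, string=pattern)


def replace_pairwise(pattern):
    # Hypothetical alternative: match every backslash together with the
    # character it escapes, scanning left to right. An escaped backslash
    # (\\) is consumed as one unit, so it can never "hide" a later \w.
    def swap(match):
        return REPLACEMENT if match.group(0) == ESC else match.group(0)

    return re.sub(r"\\.", swap, pattern, flags=re.DOTALL)


# The lookbehind version leaves \\\w untouched, so \w stays Unicode-aware.
unchanged = replace_lookbehind(r"\\\w")

# The pairwise version rewrites only the real \w escape.
fixed = replace_pairwise(r"\\\w")
literal = replace_pairwise(r"\\w")  # escaped backslash + "w": left alone
```

This is still text substitution rather than parsing: it would, for example, rewrite a \w that appears inside a character class such as [\w] into a nested bracket expression, so it addresses only the backslash-counting problem described here.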

Zac-HD commented Dec 4, 2019

Yep... this is similar to #4 in that the only solution is to actually parse the regex and operate on that, rather than the text.

It's definitely possible, but not likely to happen given that the jsonschema standard doesn't appear to really care about regex compatibility - I'd rather build a linter for incompatible syntax than a (much more difficult) translator.
