Add runtime checks for invalid uses of \K in lookaround #812
base: master
src/pcre2_jit_match_inc.h
Outdated
if (rc >= 0 &&
    (ovector[0] < start_offset || ovector[0] > ovector[1]) &&
    (re->extra_options & PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK) == 0)
  {
  rc = PCRE2_ERROR_BAD_BACKSLASH_K;
  }
@zherczeg I have done something bad here I think. The code in this file is a tiny wrapper around the JIT, so it feels wrong to implement this check as a post-processing step here.
True. This should be done before the pattern returns with a valid match (and PCRE2_HASBSK is set, and PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK is not set).
/* Fail if we detect that the start position was moved to be either after
the end position (\K in lookahead) or before the start offset (\K in
lookbehind). */

if (common->has_set_som &&
    (common->re->extra_options & PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK) == 0)
  {
  struct sljit_jump *bad_som;
  struct sljit_jump *bad_eom;

  // XXX emit code equivalent to the following:
  // if (OVECTOR(0) < jit_arguments->str ||
  //     OVECTOR(0) > OVECTOR(1))
  //   {
  //   return PCRE2_ERROR_BAD_BACKSLASH_K;
  //   }

  OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(0));
  OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(1));
  if (HAS_VIRTUAL_REGISTERS)
    {
    OP1(SLJIT_MOV, TMP3, 0, ARGUMENTS, 0);
    OP1(SLJIT_MOV, TMP3, 0, SLJIT_MEM1(TMP3), SLJIT_OFFSETOF(jit_arguments, str));
    }
  else
    {
    OP1(SLJIT_MOV, TMP3, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, str));
    }

  // Compare if OVECTOR(0) < jit_arguments->str
  bad_som = CMP(SLJIT_LESS, TMP1, 0, TMP3, 0);
  // Compare if OVECTOR(0) > OVECTOR(1)
  bad_eom = CMP(SLJIT_GREATER, TMP1, 0, TMP2, 0);

  // If either comparison is true, return error and jump to abort
  OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_BAD_BACKSLASH_K);
  JUMPTO(SLJIT_JUMP, common->abort_label);

  // Patch the jumps to skip the error if the checks pass
  JUMPHERE(bad_som);
  JUMPHERE(bad_eom);
  }
The JIT is still very new to me; I don't really understand how it all works yet.
I've made an attempt at some code here. It crashes, but I'm trying.
OK, I have fixed it so that it doesn't crash.
However, the results of the comparisons don't seem correct.
How should I debug this code? Is there any way at all to set breakpoints, step through...? I can obviously step through the entire massive blob of code in assembler, but I have no idea how I'd find the few instructions that I added here.
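A possible reading of the snippet above, offered as an assumption rather than something confirmed in this thread: `CMP` emits a jump that is taken when its condition holds, and `JUMPHERE` binds that jump to the current position. If that is right, both `bad_som` and `bad_eom` land after the error block when a check fails, while a valid match falls through into the error. The plain-C model below mimics that control flow with `goto`. The names `DEMO_ERROR_BAD_BACKSLASH_K`, `as_written`, and `reordered` are hypothetical, and the error value is arbitrary.

```c
/* Hypothetical stand-in for the real error code; the value is arbitrary. */
#define DEMO_ERROR_BAD_BACKSLASH_K (-1000)

/* Models the control flow of the snippet as posted: each jump is taken
   when the match is BAD, and both jumps are bound after the error block. */
static int as_written(long som, long eom, long str)
{
  if (som < str) goto patched;        /* bad_som = CMP(SLJIT_LESS, ...)    */
  if (som > eom) goto patched;        /* bad_eom = CMP(SLJIT_GREATER, ...) */
  return DEMO_ERROR_BAD_BACKSLASH_K;  /* fall-through: error block         */
patched:                              /* JUMPHERE(bad_som / bad_eom)       */
  return 0;
}

/* One possible reordering: the first jump (taken when bad) is bound at the
   error block, and the second jump (taken when good) skips over it. */
static int reordered(long som, long eom, long str)
{
  if (som < str) goto error;          /* bad: go straight to the error     */
  if (som <= eom) goto ok;            /* good: skip the error block        */
error:
  return DEMO_ERROR_BAD_BACKSLASH_K;
ok:
  return 0;
}
```

In this model, `as_written(5, 10, 0)` reports an error for a well-formed match and `as_written(12, 10, 0)` lets a reversed match through, which would be consistent with the comparisons appearing to give the wrong results. Whether the same holds for the emitted machine code is untested here.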
59ef11f to 1621949
}

/* Fail if we detect that the start position was moved to be either after
the end position (\K in lookahead) or before the start offset (\K in
I am not sure I understand the operation.
Why is the start offset a problem? I think the only issue is when ovector(0) > ovector(1), which confuses some simple implementations. Nobody complained about the start offset before.
Does this happen when \K is executed, or as a post-check?
- Start offset is a problem, because when you are doing "search all matches" you don't want matches to overlap or go "backwards". The canonical list of matches should be ordered, non-overlapping, and without duplicates. I believe that many clients will expect this. For example, pcre2_substitute itself fails if the list of matches is overlapping.
- The checks are done as a post-check, after a match is accepted (much, much later than when \K is encountered). These checks do not cause backtracking: they simply turn an accepted match into an error.
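The two points above can be sketched in plain C as a predicate applied to an already accepted match. This is a minimal sketch that mirrors the condition in the `src/pcre2_jit_match_inc.h` hunk, not the code that was merged. `post_check_bsk` and `DEMO_ERROR_BAD_BACKSLASH_K` are hypothetical names, the error value is arbitrary, and `allow_lookaround_bsk` stands in for testing `PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real error code; the value is arbitrary. */
#define DEMO_ERROR_BAD_BACKSLASH_K (-1000)

/* Post-check on a finished match: rc is the match result, som/eom are the
   reported start and end of the match (ovector[0] and ovector[1]).
   No backtracking happens here; an accepted match is either passed
   through unchanged or replaced by an error. */
static int post_check_bsk(int rc, size_t start_offset,
                          size_t som, size_t eom,
                          int allow_lookaround_bsk)
{
  if (rc >= 0 && !allow_lookaround_bsk &&
      (som < start_offset ||  /* \K in lookbehind moved the start back */
       som > eom))            /* \K in lookahead moved it past the end */
    return DEMO_ERROR_BAD_BACKSLASH_K;
  return rc;
}
```

For a search starting at offset 3, a match reported as [3, 5) passes through, while [2, 5) and [6, 5) are turned into errors unless the option is set. Failed matches (negative `rc`) are returned unchanged.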
Fixes #736