Conversation

NWilson
Member

@NWilson NWilson commented Oct 6, 2025

Fixes #736

Comment on lines 180 to 185
if (rc >= 0 &&
    (ovector[0] < start_offset || ovector[0] > ovector[1]) &&
    (re->extra_options & PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK) == 0)
  {
  rc = PCRE2_ERROR_BAD_BACKSLASH_K;
  }
Member Author

@zherczeg I have done something bad here I think. The code in this file is a tiny wrapper around the JIT, so it feels wrong to implement this check as a post-processing step here.

Collaborator

True. This should be done before the pattern returns with a valid match (and PCRE2_HASBSK is set, and PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK is not set).
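For reference, the post-processing check under discussion can be modelled as a small pure function. This is only a sketch: the `SKETCH_` constants are hypothetical stand-ins for the PCRE2 option bit and error code (the real values come from `pcre2.h`), and `sketch_check_bsk` is an illustrative name, not a function in the PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the option bit and error code in pcre2.h. */
#define SKETCH_ALLOW_LOOKAROUND_BSK  0x40u
#define SKETCH_ERROR_BAD_BACKSLASH_K (-1000)

/* Turn an accepted match into an error when \K has moved the match start
   before the start offset or past the match end. A negative rc (no match
   or an earlier error) is passed through unchanged. */
static int sketch_check_bsk(int rc, const size_t *ovector,
                            size_t start_offset, uint32_t extra_options)
{
  if (rc >= 0 &&
      (ovector[0] < start_offset || ovector[0] > ovector[1]) &&
      (extra_options & SKETCH_ALLOW_LOOKAROUND_BSK) == 0)
    return SKETCH_ERROR_BAD_BACKSLASH_K;
  return rc;
}
```

With a start offset of 2, a match reported as (3, 5) passes through, while (6, 5) and (1, 5) are both rejected unless the allow bit is set.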

Comment on lines 13775 to 13819
/* Fail if we detect that the start position was moved to be either after
the end position (\K in lookahead) or before the start offset (\K in
lookbehind). */

if (common->has_set_som &&
    (common->re->extra_options & PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK) == 0)
  {
  struct sljit_jump *bad_som;
  struct sljit_jump *bad_eom;

  // XXX emit code equivalent to the following:
  // if (OVECTOR(0) < jit_arguments->str ||
  //     OVECTOR(0) > OVECTOR(1))
  //   {
  //   return PCRE2_ERROR_BAD_BACKSLASH_K;
  //   }

  OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(0));
  OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(1));
  if (HAS_VIRTUAL_REGISTERS)
    {
    OP1(SLJIT_MOV, TMP3, 0, ARGUMENTS, 0);
    OP1(SLJIT_MOV, TMP3, 0, SLJIT_MEM1(TMP3), SLJIT_OFFSETOF(jit_arguments, str));
    }
  else
    {
    OP1(SLJIT_MOV, TMP3, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, str));
    }

  // Compare if OVECTOR(0) < jit_arguments->str
  bad_som = CMP(SLJIT_LESS, TMP1, 0, TMP3, 0);
  // Compare if OVECTOR(0) > OVECTOR(1)
  bad_eom = CMP(SLJIT_GREATER, TMP1, 0, TMP2, 0);

  // If either comparison is true, return error and jump to abort
  OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_BAD_BACKSLASH_K);
  JUMPTO(SLJIT_JUMP, common->abort_label);

  // Patch the jumps to skip the error if the checks pass
  JUMPHERE(bad_som);
  JUMPHERE(bad_eom);
  }
Member Author

The JIT is still very new to me; I don't really understand how it all works yet.

I've made an attempt at some code here. It crashes, but I'm trying.

Member Author

OK, I have fixed it so that it doesn't crash.

However, the results of the comparisons don't seem correct.

How should I debug this code? Is there any way at all to set breakpoints, step through...? I can obviously step through the entire massive blob of code in assembler, but I have no idea how I'd find the few instructions that I added here.
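One way to reason about the comparison results without a debugger is to model the emitted control flow in plain C. The sketch below is an assumption, not a confirmed diagnosis: it supposes that `CMP` emits a jump that is taken when the comparison is true, and that `JUMPHERE` binds such a jump to the position where it is called. Under that reading, the two jumps in the snippet above land after the error block, so the error path becomes the fall-through path. All `sketch_` names and the error value are hypothetical.

```c
#include <stdint.h>

#define SKETCH_ERROR_BAD_BACKSLASH_K (-1000)  /* hypothetical stand-in */

/* Control flow as the snippet appears to emit it (assumed semantics):
   a true comparison jumps past the error block. */
static int sketch_as_written(uintptr_t som, uintptr_t eom, uintptr_t str)
{
  if (som < str) goto after_error;      /* bad_som = CMP(SLJIT_LESS, ...) */
  if (som > eom) goto after_error;      /* bad_eom = CMP(SLJIT_GREATER, ...) */
  return SKETCH_ERROR_BAD_BACKSLASH_K;  /* reached only when both checks pass */
after_error:                            /* JUMPHERE(bad_som); JUMPHERE(bad_eom); */
  return 1;
}

/* Control flow matching the pseudocode in the XXX comment:
   a true comparison jumps to the error return. */
static int sketch_intended(uintptr_t som, uintptr_t eom, uintptr_t str)
{
  if (som < str) goto error;
  if (som > eom) goto error;
  return 1;                             /* both checks pass: keep the match */
error:
  return SKETCH_ERROR_BAD_BACKSLASH_K;
}
```

If the assumption holds, a well-formed match such as start 3, end 5 with subject start 2 is rejected by the first function and accepted by the second, which would be consistent with the comparisons appearing inverted.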

@NWilson NWilson force-pushed the user/niwilson/block-bad-bsk branch from 59ef11f to 1621949 Compare October 7, 2025 11:14
}

/* Fail if we detect that the start position was moved to be either after
the end position (\K in lookahead) or before the start offset (\K in
Collaborator

I am not sure I understand the operation.

Why is the start offset a problem? I think the only issue is when ovector(0) > ovector(1), which confuses some simple implementations. Nobody has complained about the start offset before.

Does this happen when \K is executed, or as a post-check?

Member Author

  • The start offset is a problem because, when you are doing "search all matches", you don't want matches to overlap or go "backwards". The canonical list of matches should be ordered, non-overlapping, and without duplicates. I believe that many clients will expect this. For example, pcre2_substitute itself fails if the list of matches is overlapping.
  • The checks are done as a post-check, after a match is accepted (much, much later than when \K is encountered). These checks do not cause backtracking: they simply turn an accepted match into an error.
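The "ordered, non-overlapping, and without duplicates" property can be written down as a predicate over a list of match spans. This is a minimal sketch with hypothetical names (`sketch_span`, `sketch_matches_are_canonical`); it is not part of PCRE2 and only illustrates the invariant a "search all matches" client might rely on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical half-open match span [start, end), as offsets into the subject. */
typedef struct { size_t start; size_t end; } sketch_span;

/* True if the spans are well-formed, ordered, non-overlapping and contain
   no consecutive duplicates. Adjacent spans such as [0,2) and [2,4) are
   allowed. */
static bool sketch_matches_are_canonical(const sketch_span *m, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
    if (m[i].start > m[i].end)
      return false;                     /* start after end */
    if (i > 0 && m[i].start < m[i - 1].end)
      return false;                     /* overlaps or goes backwards */
    if (i > 0 && m[i].start == m[i - 1].start && m[i].end == m[i - 1].end)
      return false;                     /* repeated (empty) match */
    }
  return true;
}
```

A \K inside a lookbehind can produce a span that starts before the previous match ended, and a \K inside a lookahead can produce a span whose start is after its end; both cases fail this predicate.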

Successfully merging this pull request may close these issues.

\K in lookbehind/lookahead should be always invalid