

@zherczeg
Collaborator

No description provided.


/(a)(b+)(*scs:(1)a(*ACCEPT))(\2)/
abbb
0: abb
Contributor


Isn't this the current/expected output in 10.45 and HEAD?

Collaborator Author


Yes. This patch is a bugfix.

Contributor


Apologies for not being clear enough, but the point I was trying to make is that the test case doesn't show the bug, at least on my system:

PCRE2 version 10.45 2025-02-05 (8-bit)
  re> /(a)(b+)(*scs:(1)a(*ACCEPT))(\2)/BI
------------------------------------------------------------------
        Bra
        CBra 1
        a
        Ket
        CBra 2
        b+
        Ket
        Scan substring
      1 Capture ref
        a
        *ASSERT_ACCEPT
        Ket
        CBra 3
        \2
        Ket
        Ket
        End
------------------------------------------------------------------
Capture group count = 3
Max back reference = 2
First code unit = 'a'
Subject length lower bound = 1
data> abbb
 0: abb
 1: a
 2: b
 3: b

Collaborator Author


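True. The check (PCRE2_SIZE)(mb->end_subject - eptr) < length in line 488 evaluates as 18446744073709551613 < length, which is false: eptr is 3 code units past end_subject, so the negative difference wraps around to 2^64 - 3. A better test case is needed.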
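The wraparound can be reproduced in isolation. The sketch below is hypothetical and is not the PCRE2 source: too_short_unsigned only mirrors the shape of the quoted guard, with size_t standing in for PCRE2_SIZE, and too_short_checked is one illustrative defensive variant, not necessarily how this PR fixes the bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper shaped like the quoted guard: "fewer than `length`
   code units remain between ptr and end". size_t stands in for PCRE2_SIZE.
   If ptr has already moved past end, end - ptr is negative and the cast
   wraps to a huge value, so the guard wrongly reports "enough room". */
static int too_short_unsigned(const unsigned char *ptr,
                              const unsigned char *end, size_t length)
{
    assert(ptr != NULL && end != NULL);
    return (size_t)(end - ptr) < length;
}

/* Hypothetical defensive variant: a pointer beyond the end never has room,
   so the subtraction is only done when it cannot go negative. */
static int too_short_checked(const unsigned char *ptr,
                             const unsigned char *end, size_t length)
{
    assert(ptr != NULL && end != NULL);
    if (ptr > end)
        return 1;
    return (size_t)(end - ptr) < length;
}
```

With ptr 3 code units past end, (size_t)(end - ptr) equals SIZE_MAX - 2, which is 18446744073709551613 when size_t is 64 bits wide, so the unsigned guard lets the comparison proceed.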

Collaborator Author


The new test should be better. Thank you for noticing this.

Contributor


I think we are arguing over semantics, but yes, you are correct that this PR alone fixes both bugs, including the subject overreads in match_ref that could cause crashes.

Member


Thank you for confirming! I will include only this fix in the 10.46 security release, in order to make minimal changes in this release.

Collaborator Author


Btw, why not 10.45.1? Bumping the minor version usually means new features.

Contributor


I presume that moving to a three-part version number might be a good idea if we are going to do this regularly, but it definitely would have been a bigger change and taken a lot longer than this did.

I have to admit I am impressed that it went so smoothly, including getting a CVE number assigned.

Member


That's right. Changing the format of the version number to three parts has the potential to be disruptive to downstream consumers. Plus, we have API functions that expect a two-part version number, and the special VERSION pattern syntax...

It simply had to be 10.46.
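To make the compatibility concern concrete: a downstream check that, hypothetically, parses only MAJOR.MINOR from a version string cannot tell a three-part release apart from the one before it. The helpers below are illustrative only and are not PCRE2 APIs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical downstream helper: reads only "MAJOR.MINOR" from a version
   string such as "10.45 2025-02-05". Anything after the minor number,
   including a ".1" patch component, is silently ignored. */
static int parse_two_part(const char *s, int *major, int *minor)
{
    assert(s != NULL && major != NULL && minor != NULL);
    return sscanf(s, "%d.%d", major, minor) == 2;
}

/* Hypothetical comparison built on the two-part parser.
   Returns -1, 0 or 1; strings that fail to parse compare as equal. */
static int two_part_cmp(const char *a, const char *b)
{
    int amaj, amin, bmaj, bmin;
    if (!parse_two_part(a, &amaj, &amin) || !parse_two_part(b, &bmaj, &bmin))
        return 0;
    if (amaj != bmaj)
        return amaj < bmaj ? -1 : 1;
    return (amin > bmin) - (amin < bmin);
}
```

Under that assumption, "10.45.1" and "10.45" compare equal, so such a consumer could not require the fixed release, whereas a bump to 10.46 remains visible to it.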

Anyway, thanks to both of you for fixing this (Zoltan) and reviewing (Carlo).

@carenas
Contributor

carenas commented Aug 26, 2025

Not a regression, as it seems to behave the same with or without this patch, but why is this correct?

  re> /(.+)(*scs:(1)\d{2}|\d{4}(*ACCEPT))a/BI
------------------------------------------------------------------
        Bra
        CBra 1
        Any+
        Ket
        Scan substring
      1 Capture ref
        \d{2}
        Alt
        \d{4}
        *ASSERT_ACCEPT
        Ket
        a
        Ket
        End
------------------------------------------------------------------
Capture group count = 1
Max back reference = 1
Subject length lower bound = 0
data> 12a
 0: 12a
 1: 12
data> 1234
No match
data> 1234a
 0: 1234a
 1: 1234
data> 12345a
 0: 12345a
 1: 12345

I was expecting 1234 to match because of the (*ACCEPT), but if (*ACCEPT) is what makes 1234a match, I can't explain the last one.

@zherczeg
Collaborator Author

The last one matches because of backtracking. (.+) first matches 12345a, but then nothing is left for the final a, so it gives back the last character and matches 12345. The scs then matches \d{2} against the start of the captured 12345, and the a matches at the end.
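The backtracking described above can be mimicked with a small hand-written model of /(.+)(*scs:(1)\d{2}|\d{4}(*ACCEPT))a/. This is a hypothetical sketch, not how PCRE2 executes the pattern. It assumes the subjects contain no newlines (so . matches every character) and that the scan substring assertion sees only the captured text, starting at its first character, without having to consume all of it, which is consistent with the outputs above. It also shows why (*ACCEPT) never matters here: \d{4} is only tried after \d{2} has failed, and then it must fail too.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Model of (*scs:(1)\d{2}|\d{4}(*ACCEPT)) run against the captured text
   cap[0..cap_len). Assumed: matching starts at cap[0] and cannot look past
   the end of the capture. */
static int scs_holds(const char *cap, size_t cap_len)
{
    /* First alternative: \d{2}. */
    if (cap_len >= 2 && isdigit((unsigned char)cap[0]) &&
        isdigit((unsigned char)cap[1]))
        return 1;
    /* Second alternative \d{4}(*ACCEPT) needs the same two leading digits,
       so once \d{2} has failed it fails as well. */
    return 0;
}

/* Model of the whole pattern: try each start offset, let (.+) take as much
   as possible, then give characters back one at a time (backtracking).
   On success, stores the match start and the length of group 1. */
static int model_match(const char *s, size_t *start, size_t *cap_len)
{
    assert(s != NULL && start != NULL && cap_len != NULL);
    size_t n = strlen(s);
    for (size_t st = 0; st < n; st++) {
        for (size_t len = n - st; len >= 1; len--) {
            if (scs_holds(s + st, len) && st + len < n && s[st + len] == 'a') {
                *start = st;
                *cap_len = len;
                return 1;
            }
        }
    }
    return 0;
}
```

For the four subjects in the example, the model gives group 1 = 12 for 12a, no match for 1234, 1234 for 1234a and 12345 for 12345a, the same as the pcre2test output.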

@carenas
Contributor

carenas commented Aug 26, 2025

Got it. Do you have an example where (*ACCEPT) is relevant?

@zherczeg
Collaborator Author

  /(a+)b(*scs:(1)(*ACCEPT))\1/
    aaabaa

Probably something like this. There will be a buffer overread, although if the next character in the buffer is not an a, it will not be visible.
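The size of that overread follows from the offsets. Assuming the length check is bypassed as described earlier in this thread, \1 refers to aaa (3 code units) while the comparison starts at offset 4 of the 6-unit subject aaabaa, so the third unit would be read from offset 6, one past the end. The helper below is a hypothetical bounds-checked backreference comparison for illustration; it is not PCRE2's match_ref.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds-checked backreference: does subject[pos..] start with
   the ref_len code units captured at subject[ref_start..]? The test is
   written as "ref_len > remaining" so the subtraction cannot wrap. */
static int backref_matches(const char *subject, size_t subject_len,
                           size_t pos, size_t ref_start, size_t ref_len)
{
    assert(ref_start <= subject_len && ref_len <= subject_len - ref_start);
    if (pos > subject_len || ref_len > subject_len - pos)
        return 0; /* not enough subject left: no match, no read past the end */
    return memcmp(subject + pos, subject + ref_start, ref_len) == 0;
}

/* Number of code units an unchecked comparison would read past the end. */
static size_t overread_units(size_t subject_len, size_t pos, size_t ref_len)
{
    return pos + ref_len > subject_len ? pos + ref_len - subject_len : 0;
}
```

With the checked comparison, aaabaa simply fails at offset 4, while a subject with a third a after the b, such as aaabaaa, matches legitimately.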

@NWilson
Member

NWilson commented Aug 27, 2025

I have decided I will make a dedicated security release, and notify the mailing list.

For a widely-used library such as PCRE2, I would like to take this seriously.

I should have acted on it sooner. That is my fault.

@NWilson NWilson merged commit 936feaa into PCRE2Project:master Aug 28, 2025
43 of 45 checks passed
@zherczeg zherczeg deleted the restore_fix branch September 16, 2025 01:38