Skip to content

Various disagreements between boost::regex and the JS spec #249

@Alcaro

Description

@Alcaro

I implemented a test suite for JS v3 regex syntax as applied to 8bit strings, applied it to the three std:: implementations, and found a lot of bugs. The same tests can also be run against Boost.

https://godbolt.org/z/Pq76Pe1cz (some of the comments only apply to the regex engine I wrote alongside the tests, feel free to ignore them)

List of behavioral differences
FAIL should compile [a-b-c-d]
FAIL should compile [a-b-c-d]
FAIL should compile []
FAIL should compile [^]
FAIL expected error a[A-]]c
FAIL expected error ]
FAIL expected error \c+
FAIL expected error [a-\w]
FAIL \Z :: Z :: ret=(fail) :: exp=Z
FAIL expected error \x0
FAIL [\W\D] :: a :: ret=(fail) :: exp=a
FAIL expected error \00
FAIL expected error \01
FAIL expected error [\01]
FAIL expected error [\00]
FAIL expected error [\0-\00]
FAIL expected error a++
FAIL expected error a{,5}
FAIL expected error {
FAIL expected error }
FAIL expected error a{
FAIL expected error a{1
FAIL expected error a{1,
FAIL (abc)?\1 ::  :: ret=(fail) :: exp=/(null)
FAIL (abc)?\1 :: abc :: ret=(fail) :: exp=/(null)
FAIL (?:(a)|b)\1c :: bc :: ret=(fail) :: exp=bc/(null)
FAIL (?:a|(b))\1c :: ac :: ret=(fail) :: exp=ac/(null)
FAIL (a?){0,2}\1b :: b :: ret=b/ :: exp=b/(null)
FAIL (a?){0,2}\1b :: ab :: ret=ab/ :: exp=(fail)
FAIL (a?){0,5} :: aa :: ret=aa/ :: exp=aa/a
FAIL (?:(a)|b)\1 :: b :: ret=(fail) :: exp=b/(null)
FAIL (?:(a)b|aa)\1 :: aaa :: ret=(fail) :: exp=aa/(null)
FAIL expected error (a)b\01c
FAIL ((a)|(b))+ :: ab :: ret=ab/b/a/b :: exp=ab/b/(null)/b
FAIL ((a)|(b))+ :: ba :: ret=ba/a/a/b :: exp=ba/a/a/(null)
FAIL ((a)|(b)){2} :: ab :: ret=ab/b/a/b :: exp=ab/b/(null)/b
FAIL ((a)|(b)){2} :: ba :: ret=ba/a/a/b :: exp=ba/a/a/(null)
FAIL ((a)\2|(b)\3){2} :: aabb :: ret=aabb/bb/a/b :: exp=aabb/bb/(null)/b
FAIL ((a)\2|(b)\3){2} :: bbaa :: ret=bbaa/aa/a/b :: exp=bbaa/aa/a/(null)
FAIL (?:(a)|(b)|(c)|(d))+ :: abcd :: ret=abcd/a/b/c/d :: exp=abcd/(null)/(null)/(null)/d
FAIL (?:(a)|(b)|(c)|(d))+c :: abc :: ret=abc/a/b/(null)/(null) :: exp=abc/(null)/b/(null)/(null)
FAIL (a?)* ::  :: ret=/ :: exp=/(null)
FAIL (a?){2,4}b\1c :: aabc :: ret=aabc/ :: exp=(fail)
FAIL ((b)?(be)?){0,5}bbee :: bbebbee :: ret=bbebbee//b/be :: exp=bbebbee/bbe/b/be
FAIL ^(ab|()|a|bc)*$ :: abc :: ret=abc// :: exp=abc/bc/(null)
FAIL (?!(?!(a))) :: a :: ret=/a :: exp=/(null)
FAIL expected error (?=a)+
FAIL (?!(a))\1 :: b :: ret=(fail) :: exp=/(null)
FAIL (?:(a)|\1b)+ :: aabbaab :: ret=aa/a :: exp=aabbaab/(null)
FAIL (?:(a)|\1b)+ :: baabbaab :: ret=(fail) :: exp=baabbaab/(null)
FAIL should compile (?=)a
FAIL should compile (?![])
FAIL should compile (?![])
FAIL should compile (?![^])
FAIL should compile (?![^])

I can't find a clear specification for what regex syntax Boost tries to implement (https://www.boost.org/doc/libs/1_87_0/libs/regex/doc/html/boost_regex/ref/syntax_option_type/syntax_option_type_perl.html says it's supposed to match JS, but I can't find which version, and https://www.boost.org/doc/libs/1_87_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html lists several Perl-only features).

As such, I don't know which of the above divergences are bugs, and which are expected; feel free to consider this a meta-bug, and close it once the above have been filed as separate bugs as appropriate. (For example, ] is explicitly called out as legal in docs, the two [a-b-c-d] are the same bug, and [\W\D] is #241.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions