Error while trying to match a string with a specific unicode against a RegExp that contains a space and a group #48

Closed
beatzzz opened this issue Apr 1, 2022 · 3 comments


beatzzz commented Apr 1, 2022

When trying to match (phrase.MatchString(X)) messages like gg 󠀀 󠀀 (notice that these are not the regular spaces) against a phrase like regexp2.MustCompile("\\bcool (house)\\b", 0), the following error will be thrown:

panic: runtime error: index out of range [917504] with length 128

goroutine 1 [running]:
github.com/dlclark/regexp2/syntax.(*BmPrefix).Scan(0xc000180540, {0xc000b70948, 0x6, 0x0?}, 0x0?, 0x0, 0x6)
        C:/Users/X/go/pkg/mod/github.com/dlclark/regexp2@v1.4.0/syntax/prefix.go:716 +0x3bb
github.com/dlclark/regexp2.(*runner).findFirstChar(0xc000623a00)
        C:/Users/X/go/pkg/mod/github.com/dlclark/regexp2@v1.4.0/runner.go:1305 +0x366
github.com/dlclark/regexp2.(*runner).scan(0xc000623a00, {0xc000b70948?, 0x6, 0xc000b70948?}, 0x6?, 0x1, 0xc00008f8e8?)
        C:/Users/X/go/pkg/mod/github.com/dlclark/regexp2@v1.4.0/runner.go:130 +0x1e5
github.com/dlclark/regexp2.(*Regexp).run(0xc0000f6200, 0xf4?, 0xffffffffffffffff, {0xc000b70948, 0x6, 0x6})
        C:/Users/X/go/pkg/mod/github.com/dlclark/regexp2@v1.4.0/runner.go:91 +0xfa
github.com/dlclark/regexp2.(*Regexp).MatchString(0x10f9c40?, {0x108f0f4?, 0xc00008fb48?})
        C:/Users/X/go/pkg/mod/github.com/dlclark/regexp2@v1.4.0/regexp.go:213 +0x45
main.main()
        C:/Users/X/Desktop/GoRegExTests/test.go:127 +0xbdc

The panic only occurs when both of the following are true:
a. The message contains those Unicode characters
b. The RegExp contains a space and a group like (house)

The RegExp above is just a very basic example to demonstrate this problem.
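
A note on the panic message: the index 917504 is 0xE0000 in hexadecimal, so it looks like a rune outside the ASCII range was used directly as an index into a 128-entry table. The sketch below only illustrates that class of failure and a bounds-checked alternative. It is not regexp2's code; `asciiTable`, `unsafeLookup`, and `safeLookup` are hypothetical names made up for this example.

```go
package main

import "fmt"

// asciiTable stands in for any lookup table sized for the 128 ASCII code
// points. Hypothetical: it is not taken from regexp2.
var asciiTable [128]int

// unsafeLookup indexes the table with the rune directly. Any rune >= 128
// triggers a runtime "index out of range" panic.
func unsafeLookup(r rune) int {
	return asciiTable[r]
}

// safeLookup returns fallback for runes that do not fit in the table.
func safeLookup(r rune, fallback int) int {
	if r < 0 || int(r) >= len(asciiTable) {
		return fallback
	}
	return asciiTable[r]
}

func main() {
	r := rune(0xE0000) // 917504, the index reported in the panic above
	fmt.Println(int(r), safeLookup(r, -1))
}
```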

Owner

dlclark commented Apr 2, 2022

Unfortunately I'm not able to reproduce this via copy-paste. I suspect GitHub isn't preserving the exact unicode characters of your input. Can you give me the exact runes in question via the unicode escape sequences (e.g. \uabcd)?
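
One way to get those escape sequences is to let Go print them. This is a minimal sketch using `strconv.QuoteToASCII` from the standard library, which writes every non-ASCII rune as a `\u` or `\U` escape. The helper name `escapeRunes` and the sample string are placeholders for illustration, not the input from this report.

```go
package main

import (
	"fmt"
	"strconv"
)

// escapeRunes returns s as a double-quoted Go string literal in which every
// non-ASCII or non-printable rune is written as an escape sequence.
func escapeRunes(s string) string {
	return strconv.QuoteToASCII(s)
}

func main() {
	// Placeholder input: replace it with the exact string that triggers the panic.
	msg := "a\u00a0b"
	fmt.Println(escapeRunes(msg))
}
```

The printed literal can be pasted straight into a Go test, which avoids depending on how a web page stores invisible characters.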

Author

beatzzz commented Apr 2, 2022

Yes, you are right. GitHub modified one of the characters.

This is the original one: https://apps.timwhitlock.info/unicode/inspect?s=gg+%F3%A0%80%80+%F3%A0%80%80
(and this is the one copied from GitHub: https://apps.timwhitlock.info/unicode/inspect?s=gg+%F3%A0%80%80+)
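
The same information can be read from the query strings of the two links without opening them. The sketch below decodes each percent-encoded value with `net/url` and lists the runes in U+XXXX form; `codePoints` is a hypothetical helper name. Decoded this way, the original message is `gg`, a space, U+E0000, a space, U+E0000, while the GitHub copy has lost the final U+E0000. That code point equals 917504, the index in the panic.

```go
package main

import (
	"fmt"
	"net/url"
)

// codePoints decodes a percent-encoded query value and returns its runes in
// U+XXXX notation.
func codePoints(query string) ([]string, error) {
	s, err := url.QueryUnescape(query)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out, nil
}

func main() {
	queries := []string{
		"gg+%F3%A0%80%80+%F3%A0%80%80", // original message
		"gg+%F3%A0%80%80+",             // as copied from GitHub
	}
	for _, q := range queries {
		cps, err := codePoints(q)
		fmt.Println(cps, err)
	}
}
```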

Owner

dlclark commented Apr 3, 2022

I've confirmed this is fixed with #36 so you just need to upgrade to v1.5 or v1.6 to get the fix.

dlclark closed this as completed Apr 3, 2022