[BUG] Inconsistent std::regex_replace results on x64 Linux and aarch64 Android #1911
Comments
BTW, this issue is not specific to wchar_t. When I keep the source string as a UTF-8 encoded std::string and modify everything else accordingly, the results are still inconsistent, only in a different way: besides the trailing spaces being removed, some illegal UTF-8 bytes are inserted into the source string. I can provide the source code to reproduce this if needed.
So far I'm not seeing a difference between arm64 and x86_64 behavior. On both a P and an Sv2 emulator, I see this:
That does seem wrong though? I reduced it to:
I see some comments in http://eel.is/c++draft/re about a "zero-length match", so I'm guessing these test cases have defined behavior, and maybe there's a libc++ bug here.
Prichard, thanks for trying this. You actually got consistent but incorrect results. They are incorrect because the regular expression "^\s*|\s*$" matches only leading or trailing whitespace, so the first 3 bytes (e0 b8 81), which are the UTF-8 encoding of the Thai character 'ก', should not be removed.
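For comparison, here is a minimal sketch of what the pattern is expected to do on a UTF-8 byte string, written without std::regex. The helper names `is_ascii_space` and `trim_ascii_spaces` are hypothetical and not part of the original test program, and the sketch assumes that `\s` only needs to cover the six ASCII whitespace characters of the "C" locale.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for the six ASCII whitespace bytes.
// Assumption: this is what \s should match for char input in the "C" locale.
inline bool is_ascii_space(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

// Hypothetical reference for replacing "^\s*|\s*$" with an empty string:
// strips leading and trailing whitespace bytes and leaves everything else,
// including multi-byte UTF-8 sequences, untouched.
inline std::string trim_ascii_spaces(const std::string& s) {
    std::size_t begin = 0;
    std::size_t end = s.size();
    while (begin < end && is_ascii_space(static_cast<unsigned char>(s[begin]))) {
        ++begin;
    }
    while (end > begin && is_ascii_space(static_cast<unsigned char>(s[end - 1]))) {
        --end;
    }
    return s.substr(begin, end - begin);
}
```

Applied to the bytes e0 b8 81 20 20, this reference keeps e0 b8 81 and drops only the two trailing spaces, so any std::regex_replace result that also removes the leading three bytes disagrees with it.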
I reported the issue to LLVM: llvm/llvm-project#64451.
The upstream issue was fixed in llvm/llvm-project#94550. I will try to cherry-pick the fix into the next prebuilt drop for r27.
Description
The code to reproduce the issue:
https://gist.github.com/zheng-yu-yang/a225cc68350ae828cf68b2591730871c
With NDK r25c/r26b1 x64, the output is:
where only the 2 trailing spaces (20 20) are removed from the source content.
With NDK r25c/r26b1 aarch64, the output is:
where the 2 trailing spaces (20 20) and also the first 3 bytes (e0 b8 81) are removed from the source content.
I did not use any build system; I compiled the source code manually into a static ELF binary:
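The byte values quoted above can be checked with a small hex dump helper. This is only a sketch: the function name `to_hex` is hypothetical, and the linked gist may print its output differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: renders each byte of the input as two lowercase
// hex digits, separated by single spaces (for example "e0 b8 81 20 20").
inline std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0f];  // low nibble
    }
    return out;
}
```

Dumping the string before and after the replacement makes it easy to see whether only the trailing 20 20 disappeared or whether the leading e0 b8 81 went missing as well.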
clang++ regex_test.cpp -o regex_test -static -std=c++11
org++ regex_test.cpp -o regex_test -static -std=c++11
I also tried a cross gcc (arm-linux-gnueabihf-g++, 11.4.0) and a native gcc (g++, 11.4.0); both produced the same output, with only the trailing spaces removed.
I ran the test program on a MI MAX 3 (Android 9) and on Ubuntu 22.04 in WSL.
Affected versions
r25, r26
Canary version
No response
Host OS
Linux
Host OS version
Ubuntu 22.04 in WSL
Affected ABIs
arm64-v8a
Build system
Other (specify below)
Other build system
manual build from bash command line
minSdkVersion
30
Device API level
28