utf8proc 2.8.0 does not support new grapheme-break rules in Unicode 15.1.0 #252

Closed
kloczek opened this issue Oct 1, 2023 · 10 comments · Fixed by #253

@kloczek

kloczek commented Oct 1, 2023

Looks like latest 2.8.0 is failing in utf8proc.testgraphemetest unit

+ cd utf8proc-2.8.0
+ /usr/bin/ctest --test-dir x86_64-redhat-linux-gnu --output-on-failure --force-new-ctest-process -j48
Internal ctest changing into directory: /home/tkloczko/rpmbuild/BUILD/utf8proc-2.8.0/x86_64-redhat-linux-gnu
Test project /home/tkloczko/rpmbuild/BUILD/utf8proc-2.8.0/x86_64-redhat-linux-gnu
    Start 1: utf8proc.testcase
    Start 2: utf8proc.testcustom
    Start 3: utf8proc.testiterate
    Start 4: utf8proc.testmisc
    Start 5: utf8proc.testprintproperty
    Start 6: utf8proc.testvalid
    Start 7: utf8proc.testcharwidth
    Start 8: utf8proc.testgraphemetest
    Start 9: utf8proc.testnormtest
1/9 Test #2: utf8proc.testcustom ..............   Passed    0.01 sec
2/9 Test #3: utf8proc.testiterate .............   Passed    0.01 sec
3/9 Test #4: utf8proc.testmisc ................   Passed    0.01 sec
4/9 Test #5: utf8proc.testprintproperty .......   Passed    0.01 sec
5/9 Test #6: utf8proc.testvalid ...............   Passed    0.01 sec
6/9 Test #8: utf8proc.testgraphemetest ........***Failed    0.00 sec
line 1202: grapheme mismatch: "/क्/त" instead of "/क्त"
checking line 100...
checking line 200...
checking line 300...
checking line 400...
checking line 500...
checking line 600...
checking line 700...
checking line 800...
checking line 900...
checking line 1000...
checking line 1100...
checking line 1200...

7/9 Test #1: utf8proc.testcase ................   Passed    0.05 sec
8/9 Test #7: utf8proc.testcharwidth ...........   Passed    0.06 sec
9/9 Test #9: utf8proc.testnormtest ............   Passed    0.16 sec

89% tests passed, 1 tests failed out of 9

Total Test time (real) =   0.17 sec

The following tests FAILED:
          8 - utf8proc.testgraphemetest (Failed)
Errors while running CTest
@stevengj
Member

stevengj commented Oct 2, 2023

I just downloaded utf8proc-2.8.0.tar.gz and

make check

works for me. I also tried the cmake build with:

mkdir build
cmake -S . -B build -DUTF8PROC_ENABLE_TESTING=ON
cmake --build build
ctest --test-dir build -V

and it works too.

@kloczek
Author

kloczek commented Oct 2, 2023

OK, so what can I try to do to diagnose this issue? 🤔

@stevengj
Member

stevengj commented Oct 2, 2023

Try the exact commands that I used above on a fresh tarball to see if you still reproduce the issue.

@kloczek
Author

kloczek commented Oct 2, 2023

May I ask for a little help: what exactly can I try to execute? 🤔

FYI: I'm building an rpm package. During the build, the tarball is automatically downloaded from the URL specified in the rpm spec file. The build environment is created to build only ONE package, inside an LXC zone in which ONLY the packages listed in the rpm BuildRequires are installed.
The issue is 100% reproducible.

@jbicha

jbicha commented Oct 6, 2023

There was a new Unicode 15.1.0 release and utf8proc needs to be updated for it.

http://blog.unicode.org/2023/09/announcing-unicode-standard-version-151.html

@stevengj
Member

stevengj commented Oct 6, 2023

> There was a new Unicode 15.1.0 release and utf8proc needs to be updated for it.

The tests should download the test files for the supported version of Unicode (version 15), however?

@cdluminate

Same issue on Debian Sid. The unit tests fail after updating the code for Unicode 15.1.0. The grapheme tests that fail are newly added and did not exist in the previous version, 15.0.0.

@stevengj
Member

> Same issue on Debian Sid. The unit tests fail after updating the code for Unicode 15.1.0. The grapheme tests that fail are newly added and did not exist in the previous version, 15.0.0.

The current version of utf8proc will not work if you simply update the build scripts to use Unicode 15.1.0, even if you re-generate the data tables. It is only compatible with Unicode 15.

The difficulty is that they updated the grapheme rules in Unicode 15.1.0, adding a new rule GB9c that relies on a new Indic_Conjunct_Break property. Implementing this new rule will require a new field in our data table as well as new code.
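To make the new rule concrete, here is a minimal sketch of what GB9c asks for. It is not utf8proc's implementation: `incb_t`, `incb_of`, and `gb9c_no_break` are hypothetical names, and the lookup is hand-written for a handful of Devanagari code points, whereas a real library would read Indic_Conjunct_Break from generated data tables. The rule says: do not break before an InCB=Consonant if, looking backwards over any run of InCB=Extend and InCB=Linker characters, there is at least one Linker and the run is preceded by another Consonant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the Indic_Conjunct_Break (InCB) property. */
typedef enum { INCB_NONE, INCB_CONSONANT, INCB_LINKER, INCB_EXTEND } incb_t;

/* Illustrative lookup only: covers a few Devanagari code points by hand. */
static incb_t incb_of(uint32_t cp) {
    if (cp >= 0x0915 && cp <= 0x0939) return INCB_CONSONANT; /* KA..HA */
    if (cp == 0x094D) return INCB_LINKER;                    /* VIRAMA */
    if (cp == 0x093C || cp == 0x200D) return INCB_EXTEND;    /* NUKTA, ZWJ */
    return INCB_NONE;
}

/* Returns true if GB9c forbids a break between cps[i-1] and cps[i].
   Pattern: Consonant [Extend Linker]* Linker [Extend Linker]* x Consonant */
static bool gb9c_no_break(const uint32_t *cps, size_t n, size_t i) {
    if (i == 0 || i >= n) return false;
    if (incb_of(cps[i]) != INCB_CONSONANT) return false;

    bool seen_linker = false;
    for (size_t j = i; j > 0; j--) {
        incb_t c = incb_of(cps[j - 1]);
        if (c == INCB_LINKER) {
            seen_linker = true;        /* at least one linker is required */
        } else if (c != INCB_EXTEND) {
            /* The run ended: it must start at a consonant and contain a linker. */
            return c == INCB_CONSONANT && seen_linker;
        }
    }
    return false; /* reached start of text without finding a consonant */
}
```

For the failing test line, the sequence U+0915 U+094D U+0924 ("क्त") has a consonant, a linker, then a consonant, so under this sketch the boundary before U+0924 is suppressed. A Unicode 15.0 implementation has no such rule and breaks there, which is the "/क्/त" mismatch in the report.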

I'm working on an update now that will support the new Unicode 15.1.0 rules, but it is an error to expect utf8proc 2.8 (or any other library written for Unicode 15) to pass the Unicode 15.1 grapheme tests.
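For packagers who swap in newer test data, one way to avoid this mismatch is to compare the Unicode version the library was built for against the version of the test files before running them. A rough sketch follows; `unicode_version_at_least` is a hypothetical helper, and it assumes the version arrives as a dotted string such as "15.0.0" (I believe `utf8proc_unicode_version()` returns a string of that shape, but check your headers).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: true if a dotted version string such as "15.1.0"
   is at least major.minor. Returns false for NULL or unparsable input. */
static bool unicode_version_at_least(const char *version, int major, int minor) {
    int have_major = 0, have_minor = 0;
    if (version == NULL ||
        sscanf(version, "%d.%d", &have_major, &have_minor) < 1) {
        return false;
    }
    return have_major > major ||
           (have_major == major && have_minor >= minor);
}
```

A test harness could then skip the 15.1 grapheme tests when `unicode_version_at_least(version, 15, 1)` is false, rather than reporting a failure.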

@stevengj changed the title from "2.8.0: test suite is failing in utf8proc.testgraphemetest unit" to "utf8proc 2.8.0 does not support new grapheme-break rules in Unicode 15.1.0" on Oct 18, 2023
@cdluminate

cdluminate commented Oct 18, 2023

> I'm working on an update now that will support the new Unicode 15.1.0 rules, but it is an error to expect utf8proc 2.8 (or any other library written for Unicode 15) to pass the Unicode 15.1 grapheme tests.

Thanks! This is a very valuable comment to me. My fellow developers were asking me when I will upload the patch. Now it seems I'd better wait for the fix.

@stevengj
Member

Support for Unicode 15.1 is merged into master, and should be tagged as a new 2.9.0 release shortly.
