-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix a bug regarding non-combining <LV,T> in hangul shaper #10
Conversation
Well, perhaps you can tell us what we were getting before the patch also. And with what font for example. Also, can you contribute a test file with various interesting syllables, one per line? |
Ok, I believe I've fixed this properly now. Please test. And test file is still appreciated! |
Thanks a lot. I works quite well.
Previously the result of input string |
Thanks. But can you please produce a more comprehensive list of syllables? Something around 20 syllables, with combining and non-combining jamo, with precomposed and decomposed combinations, etc? I can probably do that myself too, but I suppose it's much easier for you. Just listing the Unicode values is also fine. You can use hb/test/shaping/hb-unicode-encode to turn that into UTF-8 for testing. |
Thanks for the information about the encoding/decoding utils.
This is a working example of Old Hangul text. First line contains several precomposed syllables and two non-combining jamos; second line contains many decomposed syllables, combining and non-combining. The result is good except cases like this: At first line, a precomposed syllable followed by a tone mark |
Humm, doesn't make sense that we decomposed CE58. I'll check it out. |
Also, what I was asking for test data is not two lines of running text, but carefully constructed list of all different syllable types. I guess I have to build that myself. |
Fixed. Thanks. |
I'll work on tone mark reordering soon. |
Fixes this -fno-sanitize-recover=undefined check, /buffer/positions/empty: hb-buffer.cc:327:11: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:60:62: note: nonnull attribute specified here #0 0x4cf31c in hb_buffer_t::clear_positions() /home/user/code/harfbuzz/src/hb-buffer.cc:327:3 #1 0x4d4dd4 in hb_buffer_get_glyph_positions /home/user/code/harfbuzz/src/hb-buffer.cc:1418:13 #2 0x4cb553 in test_buffer_positions /home/user/code/harfbuzz/test/api/test-buffer.c:305:3 harfbuzz#3 0x7f324187bf49 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72f49) harfbuzz#4 0x7f324187be7a (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72e7a) harfbuzz#5 0x7f324187be7a (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72e7a) harfbuzz#6 0x7f324187c121 in g_test_run_suite (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x73121) harfbuzz#7 0x7f324187c140 in g_test_run (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x73140) harfbuzz#8 0x4c8bd3 in hb_test_run /home/user/code/harfbuzz/test/api/./hb-test.h:88:10 harfbuzz#9 0x4c8bd3 in main /home/user/code/harfbuzz/test/api/test-buffer.c:884:10 harfbuzz#10 0x7f324086db96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 harfbuzz#11 0x41e919 in _start (/home/user/code/harfbuzz/test/api/test-buffer+0x41e919)
Fixes this -fno-sanitize-recover=undefined fail, /set/iter: hb-algs.hh:1016:60: runtime error: index 4294967295 out of bounds for type 'unsigned long long const[8]' #0 0x4d1e09 in hb_vector_size_t<unsigned long long, 64u>::operator[](unsigned int) const /home/user/code/harfbuzz/src/./hb-algs.hh:1016:60 #1 0x4d8b5f in hb_set_t::page_t::previous(unsigned int*) const /home/user/code/harfbuzz/src/./hb-set.hh:139:53 #2 0x4d0ada in hb_set_t::previous(unsigned int*) const /home/user/code/harfbuzz/src/./hb-set.hh:602:36 harfbuzz#3 0x4cd76f in hb_set_previous /home/user/code/harfbuzz/src/hb-set.cc:494:15 harfbuzz#4 0x4ca8af in test_set_iter /home/user/code/harfbuzz/test/api/test-set.c:310:3 harfbuzz#5 0x7f3a4f3e0f49 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72f49) harfbuzz#6 0x7f3a4f3e0e7a (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72e7a) harfbuzz#7 0x7f3a4f3e1121 in g_test_run_suite (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x73121) harfbuzz#8 0x7f3a4f3e1140 in g_test_run (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x73140) harfbuzz#9 0x4c8894 in hb_test_run /home/user/code/harfbuzz/test/api/./hb-test.h:88:10 harfbuzz#10 0x4c8894 in main /home/user/code/harfbuzz/test/api/test-set.c:408:10 harfbuzz#11 0x7f3a4e3d2b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 harfbuzz#12 0x41e7d9 in _start (/home/user/code/harfbuzz/test/api/test-set+0x41e7d9)
Fixes this -fno-sanitize-recover=undefined fail, hb-ot-map.hh:188:1: runtime error: load of value 4294967294, which is not a valid value for type 'hb_ot_map_feature_flags_t' #0 0x7f62bfa9b227 in operator&=(hb_ot_map_feature_flags_t&, hb_ot_map_feature_flags_t) /home/ebrahim/Desktop/harfbuzz/src/./hb-ot-map.hh:188:1 #1 0x7f62bfa9b227 in hb_ot_map_builder_t::compile(hb_ot_map_t&, hb_ot_shape_plan_key_t const&) /home/ebrahim/Desktop/harfbuzz/src/hb-ot-map.cc:194 #2 0x7f62bface650 in hb_ot_shape_planner_t::compile(hb_ot_shape_plan_t&, hb_ot_shape_plan_key_t const&) /home/ebrahim/Desktop/harfbuzz/src/hb-ot-shape.cc:108:7 harfbuzz#3 0x7f62bfacec1e in hb_ot_shape_plan_t::init0(hb_face_t*, hb_shape_plan_key_t const*) /home/ebrahim/Desktop/harfbuzz/src/hb-ot-shape.cc:225:11 harfbuzz#4 0x7f62bfae1318 in hb_shape_plan_create2 /home/ebrahim/Desktop/harfbuzz/src/hb-shape-plan.cc:232:7 harfbuzz#5 0x7f62bfae1d2a in hb_shape_plan_create_cached2 /home/ebrahim/Desktop/harfbuzz/src/hb-shape-plan.cc:489:33 harfbuzz#6 0x7f62bfae2527 in hb_shape_full /home/ebrahim/Desktop/harfbuzz/src/hb-shape.cc:135:33 harfbuzz#7 0x55ed360b6588 in shape_options_t::shape(hb_font_t*, hb_buffer_t*, char const**) /home/ebrahim/Desktop/harfbuzz/util/./options.hh:242:10 harfbuzz#8 0x55ed360b5d9c in shape_consumer_t<output_buffer_t>::consume_line(char const*, unsigned int, char const*, char const*) /home/ebrahim/Desktop/harfbuzz/util/./shape-consumer.hh:67:19 harfbuzz#9 0x55ed360b549f in main_font_text_t<shape_consumer_t<output_buffer_t>, 2147483647, 0>::main(int, char**) /home/ebrahim/Desktop/harfbuzz/util/./main-font-text.hh:81:16 harfbuzz#10 0x55ed360b4e23 in main /home/ebrahim/Desktop/harfbuzz/util/hb-shape.cc:189:17 harfbuzz#11 0x7f62bf104ee2 in __libc_start_main (/usr/lib/libc.so.6+0x26ee2) harfbuzz#12 0x55ed3608f7ad in _start (/home/ebrahim/Desktop/harfbuzz/util/.libs/lt-hb-shape+0xd7ad)
Fixes this -fno-sanitize-recover=undefined check, /buffer/positions/empty: hb-buffer.cc:327:11: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:60:62: note: nonnull attribute specified here #0 0x4cf31c in hb_buffer_t::clear_positions() /home/user/code/harfbuzz/src/hb-buffer.cc:327:3 #1 0x4d4dd4 in hb_buffer_get_glyph_positions /home/user/code/harfbuzz/src/hb-buffer.cc:1418:13 #2 0x4cb553 in test_buffer_positions /home/user/code/harfbuzz/test/api/test-buffer.c:305:3 #3 0x7f324187bf49 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72f49) #4 0x7f324187be7a (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72e7a) #5 0x7f324187be7a (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72e7a) #6 0x7f324187c121 in g_test_run_suite (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x73121) #7 0x7f324187c140 in g_test_run (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x73140) #8 0x4c8bd3 in hb_test_run /home/user/code/harfbuzz/test/api/./hb-test.h:88:10 #9 0x4c8bd3 in main /home/user/code/harfbuzz/test/api/test-buffer.c:884:10 #10 0x7f324086db96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 #11 0x41e919 in _start (/home/user/code/harfbuzz/test/api/test-buffer+0x41e919)
Fixes this -fno-sanitize-recover=undefined fail, /set/iter: hb-algs.hh:1016:60: runtime error: index 4294967295 out of bounds for type 'unsigned long long const[8]' #0 0x4d1e09 in hb_vector_size_t<unsigned long long, 64u>::operator[](unsigned int) const /home/user/code/harfbuzz/src/./hb-algs.hh:1016:60 #1 0x4d8b5f in hb_set_t::page_t::previous(unsigned int*) const /home/user/code/harfbuzz/src/./hb-set.hh:139:53 #2 0x4d0ada in hb_set_t::previous(unsigned int*) const /home/user/code/harfbuzz/src/./hb-set.hh:602:36 #3 0x4cd76f in hb_set_previous /home/user/code/harfbuzz/src/hb-set.cc:494:15 #4 0x4ca8af in test_set_iter /home/user/code/harfbuzz/test/api/test-set.c:310:3 #5 0x7f3a4f3e0f49 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72f49) #6 0x7f3a4f3e0e7a (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72e7a) #7 0x7f3a4f3e1121 in g_test_run_suite (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x73121) #8 0x7f3a4f3e1140 in g_test_run (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x73140) #9 0x4c8894 in hb_test_run /home/user/code/harfbuzz/test/api/./hb-test.h:88:10 #10 0x4c8894 in main /home/user/code/harfbuzz/test/api/test-set.c:408:10 #11 0x7f3a4e3d2b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 #12 0x41e7d9 in _start (/home/user/code/harfbuzz/test/api/test-set+0x41e7d9)
Fixes this -fno-sanitize-recover=undefined fail, hb-ot-map.hh:188:1: runtime error: load of value 4294967294, which is not a valid value for type 'hb_ot_map_feature_flags_t' #0 0x7f62bfa9b227 in operator&=(hb_ot_map_feature_flags_t&, hb_ot_map_feature_flags_t) /home/ebrahim/Desktop/harfbuzz/src/./hb-ot-map.hh:188:1 #1 0x7f62bfa9b227 in hb_ot_map_builder_t::compile(hb_ot_map_t&, hb_ot_shape_plan_key_t const&) /home/ebrahim/Desktop/harfbuzz/src/hb-ot-map.cc:194 #2 0x7f62bface650 in hb_ot_shape_planner_t::compile(hb_ot_shape_plan_t&, hb_ot_shape_plan_key_t const&) /home/ebrahim/Desktop/harfbuzz/src/hb-ot-shape.cc:108:7 #3 0x7f62bfacec1e in hb_ot_shape_plan_t::init0(hb_face_t*, hb_shape_plan_key_t const*) /home/ebrahim/Desktop/harfbuzz/src/hb-ot-shape.cc:225:11 #4 0x7f62bfae1318 in hb_shape_plan_create2 /home/ebrahim/Desktop/harfbuzz/src/hb-shape-plan.cc:232:7 #5 0x7f62bfae1d2a in hb_shape_plan_create_cached2 /home/ebrahim/Desktop/harfbuzz/src/hb-shape-plan.cc:489:33 #6 0x7f62bfae2527 in hb_shape_full /home/ebrahim/Desktop/harfbuzz/src/hb-shape.cc:135:33 #7 0x55ed360b6588 in shape_options_t::shape(hb_font_t*, hb_buffer_t*, char const**) /home/ebrahim/Desktop/harfbuzz/util/./options.hh:242:10 #8 0x55ed360b5d9c in shape_consumer_t<output_buffer_t>::consume_line(char const*, unsigned int, char const*, char const*) /home/ebrahim/Desktop/harfbuzz/util/./shape-consumer.hh:67:19 #9 0x55ed360b549f in main_font_text_t<shape_consumer_t<output_buffer_t>, 2147483647, 0>::main(int, char**) /home/ebrahim/Desktop/harfbuzz/util/./main-font-text.hh:81:16 #10 0x55ed360b4e23 in main /home/ebrahim/Desktop/harfbuzz/util/hb-shape.cc:189:17 #11 0x7f62bf104ee2 in __libc_start_main (/usr/lib/libc.so.6+0x26ee2) #12 0x55ed3608f7ad in _start (/home/ebrahim/Desktop/harfbuzz/util/.libs/lt-hb-shape+0xd7ad)
...as seen with HarfBuzz used by LibreOffice, with `instdir/program/soffice --headless --convert-to pdf` of doc/abi6073-2.doc from the LibreOffice crash- testing corpus when run under UBSan, > hb-graphite2.cc:361:15: runtime error: -1024 is outside the range of representable values of type 'unsigned int' > #0 in _hb_graphite2_shape at workdir/UnpackedTarball/harfbuzz/src/hb-graphite2.cc:361:15 > harfbuzz#1 in _hb_shape_plan_execute_internal(hb_shape_plan_t*, hb_font_t*, hb_buffer_t*, hb_feature_t const*, unsigned int) at workdir/UnpackedTarball/harfbuzz/src/./hb-shaper-list.hh:38:1 > harfbuzz#2 in hb_shape_plan_execute at workdir/UnpackedTarball/harfbuzz/src/hb-shape-plan.cc:453:14 > harfbuzz#3 in hb_shape_full at workdir/UnpackedTarball/harfbuzz/src/hb-shape.cc:139:19 > harfbuzz#4 in GenericSalLayout::LayoutText(ImplLayoutArgs&, SalLayoutGlyphsImpl const*) at vcl/source/gdi/CommonSalLayout.cxx:495:23 > harfbuzz#5 in OutputDevice::getFallbackLayout(LogicalFontInstance*, int, ImplLayoutArgs&, SalLayoutGlyphs const*) const at vcl/source/outdev/font.cxx:1232:21 > harfbuzz#6 in OutputDevice::ImplGlyphFallbackLayout(std::unique_ptr<SalLayout, std::default_delete<SalLayout> >, ImplLayoutArgs&, SalLayoutGlyphs const*) const at vcl/source/outdev/font.cxx:1300:48 > harfbuzz#7 in OutputDevice::ImplLayout(rtl::OUString const&, int, int, Point const&, long, long const*, SalLayoutFlags, vcl::TextLayoutCache const*, SalLayoutGlyphs const*) const at vcl/source/outdev/text.cxx:1332:22 > harfbuzz#8 in lcl_CreateLayout(SwTextGlyphsKey const&, __gnu_debug::_Safe_iterator<std::_Rb_tree_iterator<std::pair<SwTextGlyphsKey const, SwTextGlyphsData> >, std::__debug::map<SwTextGlyphsKey, SwTextGlyphsData, std::less<SwTextGlyphsKey>, std::allocator<std::pair<SwTextGlyphsKey const, SwTextGlyphsData> > >, std::bidirectional_iterator_tag>) at sw/source/core/txtnode/fntcache.cxx:233:33 > harfbuzz#9 in SwFntObj::GetCachedSalLayoutGlyphs(SwTextGlyphsKey const&) at sw/source/core/txtnode/fntcache.cxx:257:12 > harfbuzz#10 in SwFont::GetTextBreak(SwDrawTextInfo const&, long) at sw/source/core/txtnode/fntcache.cxx:2551:58 > harfbuzz#11 in SwTextSizeInfo::GetTextBreak(long, o3tl::strong_int<int, Tag_TextFrameIndex>, unsigned short, vcl::TextLayoutCache const*) const at sw/source/core/text/inftxt.cxx:450:20 > harfbuzz#12 in SwTextGuess::Guess(SwTextPortion const&, SwTextFormatInfo&, unsigned short) at sw/source/core/text/guess.cxx:205:26 > harfbuzz#13 in SwTextPortion::Format_(SwTextFormatInfo&) at sw/source/core/text/portxt.cxx:305:32 > harfbuzz#14 in SwTextPortion::Format(SwTextFormatInfo&) at sw/source/core/text/portxt.cxx:456:12 > harfbuzz#15 in SwLineLayout::Format(SwTextFormatInfo&) at sw/source/core/text/porlay.cxx:260:31 (where in frame harfbuzz#4 GenericSalLayout::LayoutText, pHbBuffer->props.direction is HB_DIRECTION_RTL, in case that is relevant). It is unclear to me whether it is sufficient to only change hb_graphite2_cluster_t::advance from signed to unsigned int, as there are other unsigned int variables (like curradv in _hb_graphite2_shape) whose value depend on hb_graphite2_cluster_t::advance, and which thus might also become negative. But unlike the float -> unsigned int conversion that UBSan warned about here (where gr_slot_origin_X() and xscale are float), those are signed int -> unsigned int conversions that do not cause undefined behavior. At least, with this change, the above --convert-to pdf and a full `make check screenshot` succeeded for me under without further UBSan warnings. (For the version of HarfBuzz optionally built as part of the LibreOffice build, this has been addressed with <https://git.libreoffice.org/core/+/6e53e03f752c2f85283c4d47efaaf0683299783c%5E!/> "external/harfbuzz: hb_graphite2_cluster_t::advance can apparently be negative.")
...as seen with HarfBuzz used by LibreOffice, with `instdir/program/soffice --headless --convert-to pdf` of doc/abi6073-2.doc from the LibreOffice crash- testing corpus when run under UBSan, > hb-graphite2.cc:361:15: runtime error: -1024 is outside the range of representable values of type 'unsigned int' > #0 in _hb_graphite2_shape at workdir/UnpackedTarball/harfbuzz/src/hb-graphite2.cc:361:15 > #1 in _hb_shape_plan_execute_internal(hb_shape_plan_t*, hb_font_t*, hb_buffer_t*, hb_feature_t const*, unsigned int) at workdir/UnpackedTarball/harfbuzz/src/./hb-shaper-list.hh:38:1 > #2 in hb_shape_plan_execute at workdir/UnpackedTarball/harfbuzz/src/hb-shape-plan.cc:453:14 > #3 in hb_shape_full at workdir/UnpackedTarball/harfbuzz/src/hb-shape.cc:139:19 > #4 in GenericSalLayout::LayoutText(ImplLayoutArgs&, SalLayoutGlyphsImpl const*) at vcl/source/gdi/CommonSalLayout.cxx:495:23 > #5 in OutputDevice::getFallbackLayout(LogicalFontInstance*, int, ImplLayoutArgs&, SalLayoutGlyphs const*) const at vcl/source/outdev/font.cxx:1232:21 > #6 in OutputDevice::ImplGlyphFallbackLayout(std::unique_ptr<SalLayout, std::default_delete<SalLayout> >, ImplLayoutArgs&, SalLayoutGlyphs const*) const at vcl/source/outdev/font.cxx:1300:48 > #7 in OutputDevice::ImplLayout(rtl::OUString const&, int, int, Point const&, long, long const*, SalLayoutFlags, vcl::TextLayoutCache const*, SalLayoutGlyphs const*) const at vcl/source/outdev/text.cxx:1332:22 > #8 in lcl_CreateLayout(SwTextGlyphsKey const&, __gnu_debug::_Safe_iterator<std::_Rb_tree_iterator<std::pair<SwTextGlyphsKey const, SwTextGlyphsData> >, std::__debug::map<SwTextGlyphsKey, SwTextGlyphsData, std::less<SwTextGlyphsKey>, std::allocator<std::pair<SwTextGlyphsKey const, SwTextGlyphsData> > >, std::bidirectional_iterator_tag>) at sw/source/core/txtnode/fntcache.cxx:233:33 > #9 in SwFntObj::GetCachedSalLayoutGlyphs(SwTextGlyphsKey const&) at sw/source/core/txtnode/fntcache.cxx:257:12 > #10 in SwFont::GetTextBreak(SwDrawTextInfo const&, long) at sw/source/core/txtnode/fntcache.cxx:2551:58 > #11 in SwTextSizeInfo::GetTextBreak(long, o3tl::strong_int<int, Tag_TextFrameIndex>, unsigned short, vcl::TextLayoutCache const*) const at sw/source/core/text/inftxt.cxx:450:20 > #12 in SwTextGuess::Guess(SwTextPortion const&, SwTextFormatInfo&, unsigned short) at sw/source/core/text/guess.cxx:205:26 > #13 in SwTextPortion::Format_(SwTextFormatInfo&) at sw/source/core/text/portxt.cxx:305:32 > #14 in SwTextPortion::Format(SwTextFormatInfo&) at sw/source/core/text/portxt.cxx:456:12 > #15 in SwLineLayout::Format(SwTextFormatInfo&) at sw/source/core/text/porlay.cxx:260:31 (where in frame #4 GenericSalLayout::LayoutText, pHbBuffer->props.direction is HB_DIRECTION_RTL, in case that is relevant). It is unclear to me whether it is sufficient to only change hb_graphite2_cluster_t::advance from signed to unsigned int, as there are other unsigned int variables (like curradv in _hb_graphite2_shape) whose value depend on hb_graphite2_cluster_t::advance, and which thus might also become negative. But unlike the float -> unsigned int conversion that UBSan warned about here (where gr_slot_origin_X() and xscale are float), those are signed int -> unsigned int conversions that do not cause undefined behavior. At least, with this change, the above --convert-to pdf and a full `make check screenshot` succeeded for me under without further UBSan warnings. (For the version of HarfBuzz optionally built as part of the LibreOffice build, this has been addressed with <https://git.libreoffice.org/core/+/6e53e03f752c2f85283c4d47efaaf0683299783c%5E!/> "external/harfbuzz: hb_graphite2_cluster_t::advance can apparently be negative.")
Update for language changes
<U+AC00,U+11F0>
[uni1100.ljmo05=0+1024|uni1161.vjmo02=0+0|uni11F0.tjmo01=1+0]
, for instance.After this patch is applied, we now have the expected result.