Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuzzing harfbuzz #139

Closed
kcc opened this issue Oct 9, 2015 · 36 comments
Closed

fuzzing harfbuzz #139

kcc opened this issue Oct 9, 2015 · 36 comments

Comments

@kcc
Copy link
Collaborator

kcc commented Oct 9, 2015

This in an umbrella issue for setting up regular fuzzing for harfbuzz and fixing the bugs that we find with fuzzing.

The starting point is the target function below used with libFuzzer.

#include "src/hb.h"
#include "src/hb-ot.h"
#include "Fuzzer/FuzzerInterface.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  const char text[] = "ABCDEXYZ123@_%&)*$!";

  hb_blob_t *blob = hb_blob_create((const char *)data, size,
                                   HB_MEMORY_MODE_READONLY, NULL, NULL);
  hb_face_t *face = hb_face_create(blob, 0);
  hb_font_t *font = hb_font_create(face);
  hb_ot_font_set_funcs(font);
  hb_font_set_scale(font, 12, 12);

  hb_buffer_t *buffer = hb_buffer_create();
  hb_buffer_add_utf8(buffer, text, -1, 0, -1);
  hb_buffer_guess_segment_properties(buffer);

  hb_shape(font, buffer, NULL, 0);

  hb_buffer_destroy(buffer);
  hb_font_destroy(font);
  hb_face_destroy(face);
  hb_blob_destroy(blob);
  return 0;
}

Eventually we'll need to submit this function to harfbuzz repo and extend it to cover more code.

Currently, this is my workflow to build the fuzzer:

  1. Get fresh llvm and build libFuzzer.
(
cd harfbuzz
make distclean
SAN=-fsanitize=address
COV=-fsanitize-coverage=edge,8bit-counters,trace-cmp

CXX="clang++ $SAN $COV" CC="clang $SAN $COV" CCLD="clang++ $SAN $COV" ../harfbuzz/configure --enable-static --disable-shared
make -j
)
clang++ -std=c++11 harfbuzz_fuzzer.cc -fsanitize=address -fsanitize-coverage=edge -I harfbuzz -I. ./harfbuzz/src/.libs/libharfbuzz.a -lglib-2.0 Fuzzer*.o -o harfbuzz_fuzzer
@kcc
Copy link
Collaborator Author

kcc commented Oct 9, 2015

Two recent findings:
ee9b0b6
f396fbb

@kcc
Copy link
Collaborator Author

kcc commented Oct 9, 2015

Next catch.
Reproducer:

echo AAEAAAADgCoAKvwNUEdEKkYAKeAAdAA9AAAAM0dQT1MA1UARAAAAAAoDAAMAAQAEAAgAAgADAAEABAABAA== | base64 --decode > reproducer

==12311==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60600000effd at pc 0x000000619d1b bp 0x7ffccad4b690 sp 0x7ffccad4b688
READ of size 1 at 0x60600000effd thread T0
    #0 0x619d1a in OT::BEInt<unsigned short, 2>::operator unsigned short() const src/./hb-open-type-private.hh:584:13
    #1 0x619d1a in OT::IntType<unsigned short, 2u>::operator unsigned short() const src/./hb-open-type-private.hh:632
    #2 0x619d1a in OT::ValueFormat::get_len() const src/./hb-ot-layout-gpos-table.hh:93
    #3 0x619d1a in OT::PairPosFormat1::sanitize(OT::hb_sanitize_context_t*) const src/./hb-ot-layout-gpos-table.hh:708
    #4 0x617363 in bool OT::hb_sanitize_context_t::dispatch<OT::PairPosFormat1>(OT::PairPosFormat1 const&) src/./hb-open-type-private.hh:205:52
    #5 0x617363 in OT::hb_sanitize_context_t::return_t OT::PairPos::dispatch<OT::hb_sanitize_context_t>(OT::hb_sanitize_context_t*) const src/./hb-ot-layout-gpos-table.hh:848
    #6 0x617363 in OT::hb_sanitize_context_t::return_t OT::PosLookupSubTable::dispatch<OT::hb_sanitize_context_t>(OT::hb_sanitize_context_t*, unsigned int) const src/./hb-ot-layout-gpos-table.hh:1410
    #7 0x6163d5 in OT::hb_sanitize_context_t::return_t OT::Lookup::dispatch<OT::PosLookupSubTable, OT::hb_sanitize_context_t>(OT::hb_sanitize_context_t*) const src/./hb-ot-layout-common-private.hh:614:40
    #8 0x6163d5 in OT::hb_sanitize_context_t::return_t OT::PosLookup::dispatch<OT::hb_sanitize_context_t>(OT::hb_sanitize_context_t*) const src/./hb-ot-layout-gpos-table.hh:1476
    #9 0x6163d5 in OT::PosLookup::sanitize(OT::hb_sanitize_context_t*) const src/./hb-ot-layout-gpos-table.hh:1482
    #10 0x615909 in OT::OffsetTo<OT::PosLookup, OT::IntType<unsigned short, 2u> >::sanitize(OT::hb_sanitize_context_t*, void const*) const src/./hb-open-type-private.hh:792:5
    #11 0x615909 in OT::ArrayOf<OT::OffsetTo<OT::PosLookup, OT::IntType<unsigned short, 2u> >, OT::IntType<unsigned short, 2u> >::sanitize(OT::hb_sanitize_context_t*, void const*) const src/./hb-open-type-private.hh:893
    #12 0x5cd859 in OT::OffsetListOf<OT::PosLookup>::sanitize(OT::hb_sanitize_context_t*) const src/./hb-open-type-private.hh:950:5
    #13 0x5cd859 in OT::OffsetTo<OT::OffsetListOf<OT::PosLookup>, OT::IntType<unsigned short, 2u> >::sanitize(OT::hb_sanitize_context_t*, void const*) const src/./hb-open-type-private.hh:792
    #14 0x5cd859 in OT::GPOS::sanitize(OT::hb_sanitize_context_t*) const src/./hb-ot-layout-gpos-table.hh:1507
    #15 0x5cd859 in OT::Sanitizer<OT::GPOS>::sanitize(hb_blob_t*) src/./hb-open-type-private.hh:337
    #16 0x5a7598 in _hb_ot_layout_create(hb_face_t*) src/hb-ot-layout.cc:60:23
    #17 0x50f4c6 in hb_ot_shaper_face_data_ensure(hb_face_t*) src/./hb-shaper-list.hh:43:1
    #18 0x50f4c6 in hb_shape_plan_plan(hb_shape_plan_t*, hb_feature_t const*, unsigned int, char const* const*) src/./hb-shaper-list.hh:43
    #19 0x50f4c6 in hb_shape_plan_create src/hb-shape-plan.cc:149
    #20 0x512be5 in hb_shape_plan_create_cached src/hb-shape-plan.cc:458:33
    #21 0x50e02e in hb_shape_full src/hb-shape.cc:374:33
    #22 0x50e02e in hb_shape src/hb-shape.cc:405
    #23 0x4d3936 in LLVMFuzzerTestOneInput 

0x60600000effd is located 0 bytes to the right of 61-byte region [0x60600000efc0,0x60600000effd)
allocated by thread T0 here:
    #0 0x4a92db in __interceptor_malloc 
    #1 0x4d786b in _try_writable(hb_blob_t*) src/hb-blob.cc:465:23
    #2 0x5a7598 in _hb_ot_layout_create(hb_face_t*) src/hb-ot-layout.cc:60:23
    #3 0x50f4c6 in hb_ot_shaper_face_data_ensure(hb_face_t*) src/./hb-shaper-list.hh:43:1
    #4 0x50f4c6 in hb_shape_plan_plan(hb_shape_plan_t*, hb_feature_t const*, unsigned int, char const* const*) src/./hb-shaper-list.hh:43
    #5 0x50f4c6 in hb_shape_plan_create src/hb-shape-plan.cc:149

behdad added a commit that referenced this issue Oct 12, 2015
@behdad behdad closed this as completed in f966649 Oct 13, 2015
@behdad
Copy link
Member

behdad commented Oct 13, 2015

Should be fixed. Please reopen with next issue :).

@kcc
Copy link
Collaborator Author

kcc commented Oct 13, 2015

Gooood! This has unblocked further fuzzing.
Next item that is a minor headache for fuzzing, but is also a bug worth fixing:

==1899==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 152 byte(s) in 1 object(s) allocated from:
    #0 0x4b79fc in calloc /home/kcc/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56
    #1 0x4e9bf3 in _ZL16hb_object_createI9hb_blob_tEPT_v src/./hb-object-private.hh:129:24
    #2 0x4e9bf3 in hb_blob_create src/hb-blob.cc:108
    #3 0x4e9bf3 in hb_blob_create_sub_blob src/hb-blob.cc:164
    #4 0x50acf4 in _hb_face_for_data_reference_table(hb_face_t*, unsigned int, void*) src/hb-face.cc:146:21
    #5 0x52f135 in hb_face_t::reference_table(unsigned int) const src/./hb-face-private.hh:72:12
    #6 0x52f135 in hb_ot_face_glyf_accelerator_t::init(hb_face_t*) src/hb-ot-font.cc:121
    #7 0x5293c4 in _hb_ot_font_create(hb_face_t*) src/hb-ot-font.cc:253:3
    #8 0x5293c4 in hb_ot_font_set_funcs src/hb-ot-font.cc:425
    #9 0x4e4ce4 in LLVMFuzzerTestOneInput /home/kcc/harfbuzz/test/fuzzing/hb-fuzzer.cc:12:3
    #10 0x4e5633 in main /home/kcc/harfbuzz/test/fuzzing/hb-fuzzer.cc:44:5
    #11 0x7fc341b06ec4 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)
    #12 0x41a4ce in _start (/home/kcc/harfbuzz_run+0x41a4ce)
Indirect leak of 152 byte(s) in 1 object(s) allocated from:
    #0 0x4b79fc in calloc /home/kcc/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56
    #1 0x4e8bf5 in _ZL16hb_object_createI9hb_blob_tEPT_v src/./hb-object-private.hh:129:24
    #2 0x4e8bf5 in hb_blob_create src/hb-blob.cc:108
    #3 0x4e4cbb in LLVMFuzzerTestOneInput /home/kcc/harfbuzz/test/fuzzing/hb-fuzzer.cc:8:21
    #4 0x4e5633 in main /home/kcc/harfbuzz/test/fuzzing/hb-fuzzer.cc:44:5
    #5 0x7fc341b06ec4 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)
    #6 0x41a4ce in _start (/home/kcc/harfbuzz_run+0x41a4ce)

Reproducer in base64:

AAEAAAAOAIAAAwBgR1NVQuIT4Q4AAAgYAAAARE9TLzJxolP2AAAB0AAAAFZjbWFwAAwwXwAAAigA
AAA0Y3YAAQAAAAMCAgAqRUZRDfzQU0cpA+AAAD0AAAA0R1BPUwAVSAEAAAAAAwoAAQAEAAV0IA/A
EAAAAAWUAAACAGZwZ20BUpwYAAACXAAAALNnbHlmR9rfuAAAAOwAAABYaGVhZNv2AAAABAABAAMA
AgBoxARfNAAAAWwAAAA2aGhlYQejA0EAAAGsAAAAJGhtdHgGyAAAAq0AAAGkAAAACGxvY2EAFgAs
AAABZAAAAAhtYXhwGDUSawAAAUQAAAAgbmFtZQOoDBEAAAeUAAAARHBvc3QdaKP9AAAH2AAAAD1w
cmVwDyU+pQAAAxAAAAKCAAECyAA= 

@behdad behdad reopened this Oct 13, 2015
@behdad
Copy link
Member

behdad commented Oct 13, 2015

You should be able to reopen issues now.

@behdad
Copy link
Member

behdad commented Oct 13, 2015

Leak fixed.

behdad added a commit that referenced this issue Oct 13, 2015
@kcc
Copy link
Collaborator Author

kcc commented Oct 13, 2015

Now, the bane of fuzzing: non-linear algorithms.
I am attaching several inputs that consume multiple seconds each.
In case these are different problems, please check all after fixing one.

For slow-unit-03b06616c9de07c85802e6a9a0652354266af8fd (takes 5 minutes to finish)
the profile looks like this:

 37.81%  harfbuzz_run  harfbuzz_run       [.] OT::apply_lookup(OT::hb_apply_context_t*, unsigned int, unsigned int*, unsigned int, OT::LookupRecord const*, unsigned int)
 21.42%  harfbuzz_run  harfbuzz_run       [.] hb_buffer_t::move_to(unsigned int)
 11.99%  harfbuzz_run  harfbuzz_run       [.] OT::SubstLookup::apply_recurse_func(OT::hb_apply_context_t*, unsigned int)

ARTIFACTS.tgz.pdf

(hm... github does not allow to upload .tgz file. The .pdf file attached is actually a tgz file. Weird)

behdad added a commit that referenced this issue Oct 14, 2015
We probably should implement better system to catch cyclic lookups.
But for now, this speeds up worst case behavior with broken fonts
considerably without compromising legitimate usecases.

#139 (comment)
@behdad
Copy link
Member

behdad commented Oct 14, 2015

I pushed out a change that makes these edge cases significantly faster (over 10x). We still can do better, but this should give you a boost for now.

@kcc
Copy link
Collaborator Author

kcc commented Oct 15, 2015

Did some tricks out of my sleeve...

==304==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61f00000fa98 at pc 0x0000005188a4 bp 0x7fffc8c70cf0 sp 0x7fffc8c70ce8
READ of size 1 at 0x61f00000fa98 thread T0
    #0 0x5188a3 in OT::BEInt<unsigned short, 2>::operator unsigned short() const src/./hb-open-type-private.hh:583:13
    #1 0x5188a3 in OT::IntType<unsigned short, 2u>::operator unsigned short() const src/./hb-open-type-private.hh:632
    #2 0x5188a3 in hb_ot_face_metrics_accelerator_t::get_advance(unsigned int) const src/hb-ot-font.cc:92
    #3 0x5188a3 in hb_ot_get_glyph_h_advance(hb_font_t*, void*, unsigned int, void*) src/hb-ot-font.cc:290
    #4 0x52c9cb in hb_font_t::get_glyph_h_advance(unsigned int) src/./hb-font-private.hh:164:12
    #5 0x52c9cb in hb_font_t::get_glyph_advance_for_direction(unsigned int, hb_direction_t, int*, int*) src/./hb-font-private.hh:257
    #6 0x52c9cb in hb_ot_position_default(hb_ot_shape_context_t*) src/hb-ot-shape.cc:652
    #7 0x52c9cb in hb_ot_position(hb_ot_shape_context_t*) src/hb-ot-shape.cc:752
    #8 0x52c9cb in hb_ot_shape_internal(hb_ot_shape_context_t*) src/hb-ot-shape.cc:796
    #9 0x52c9cb in _hb_ot_shape src/hb-ot-shape.cc:816
    #10 0x51130b in hb_shape_plan_execute src/./hb-shaper-list.hh:43:1
    #11 0x50e325 in hb_shape_full src/hb-shape.cc:375:19
    #12 0x50e325 in hb_shape src/hb-shape.cc:405
    #13 0x4d3946 in LLVMFuzzerTestOneInput

0x61f00000fa98 is located 127 bytes to the right of 1049-byte region [0x61f00000f600,0x61f00000fa19)
allocated by thread T0 here:
    #0 0x4d19eb in operator new(unsigned long)
    #1 0x7fd000411668 in __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) 
    #2 0x7fd000411668 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) 
    #3 0x4d5474 in char* std::string::_S_construct_aux<std::istreambuf_iterator<char, std::char_traits<char> > >(std::istreambuf_iterator<char, std::char_traits<char> >, std::
    #4 0x4d4fe4 in char* std::string::_S_construct<std::istreambuf_iterator<char, std::char_traits<char> > >(std::istreambuf_iterator<char, std::char_traits<char> >, std::istr
    #5 0x4d4b03 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::istreambuf_iterator<char, std::char_traits<char> > >(std::istreamb
    #6 0x4d3c6f in FileToString(std::string const&) 

Note the "is located 127 bytes to the right of 1049-byte region" bit.
you will need ASAN_OPTIONS=redzone=128 to get reliable report.

Repro (base64)

AAEAAAALAIAAAwAwR1NVQrKltSEAACNoAAAAfk9TLzJpynqEAAABOAAAAGBjbWFwAGUDrAAAAogA
AACIZ2x5ZsL9NwYAAAOMAAAZMmhlYWQw2sRrAAAAvAAAADZoaGVhOOY4LQAAAPQAAAAkaG10eKLe
DeEAAAOYAAAA8GxvY2HhQ+fQAAADEAAAAHxtYXhwAF8DbwAAARgAAAAgbmFtZUVYv/YAABzAAAAF
EHBvc3TZ1eHyAAAh0AAAAZcAAQAAAAVMzIO9wBJfDzz1Ah0IAAAAAADMFoLdAAAAANHWrBb/nv4Z
MhwFlgAAAAgAAAAAAAAAAAABAAAHJ/4IAAA6mP+e/1EyHgABAAAAAAAAAAAAAAAAAAAAOwABAAAA
PQBdAAMCzQAVAAEAAABBAAAAAAAAAAAACQADAAMP5gGQAAUAAAUzBZkAAAEeBTMFmQAAA9cAZgIS
AQUCAAUDAAAAAAAAAAAAAQAAAAAAAAAAAAAAAFBmRWQAQAAgAHoHJ/4IAAAHJwH4AAAAAQAAAAAD
bwVEAEAAFAAHBAAAAAIAAAADuABQA7gAtgO4AGoDuABaA7gAOQO4AG8DuABaA7gAaAO4AF4DuABm
A6cASgPxABADbABMBAwAUAOTAEwCegAtBAAAQgRNABwCKwA1Ai3/ngQYAB0CHAAlBlEALQRWAC0E
CABUBCYACAQGAEgC+QAtAx4AYgKHADMEPwAtA/kADAX5AAwD6wAhBB4AGQNkAEwOEABODhAATg4Q
AE4OEABODhAATg4QAE4OEABODhAATg4QAE46mABOOpgATjqYAE46mABOOpgATjqYAE46mABOOpgA
TjqYAE46mAAAOpgAABOIAAAAAAAAAAAABAAAAAMAAAAkAAAACgAAAFQAAwABAAAAJAADAAoAAABU
AAQAMAAAAAgACAACAAAAIAA5AHr//wAAACAAMABh////4f/S/6sAAQAAAAAAAAAAAAwAAAAAADQA
AAAAAAAAAwAAACAAAAAgAAAAAQAAADAAAAA5AAAAOwIAAABhAAAAegAAAAwAAAAAAAAAOwBrALEB
BQFNAYwBxgHzAj4CeALLAxUDRAOfA9YEHASVBPwFOgWBBfQGIwanBwMHMQePB+QILQh5CLQJCglT
CckKQgqbCtkK6Qr5CwkLGQspCzkLSQtZC2kLhQuRC50LxQvRC90MAQwNDBkMNQxVDG0MhQyZAAIA
UP/sA2AE4QARACIAAAEiDgIVEBcxPgI3NhEQJyYDIicmAjUQEjMyFxYRFAIGBgHbJ0VEKdMSKTgz
ECN3Jz16VlRf66CWZItOfIQEmjJ2/LL98AEMI1A8fwEmAX9lI/tSVlMBIKUBIwFkhbj+yar++Q==

behdad added a commit that referenced this issue Oct 15, 2015
@kcc
Copy link
Collaborator Author

kcc commented Oct 15, 2015

one more case of a very slow run (23 seconds)
slow-unit-b1e4f88fa1687e01e20b5708ad2f229631c1e460.pdf Not really a pdf file...

@kcc
Copy link
Collaborator Author

kcc commented Oct 16, 2015

The public bot is up and running (similar to the one for FreeType).
It is very inefficient right now due to slow inputs, some taking over 200 seconds (with 4K size limit).

Steps once the slowness is fixed:

  • give it some time to saturate
  • look at the coverage dump, see what's not covered
  • extend the test corpus with inputs that will improve coverage
  • extend the target function itself (e.g. to deal with unicode text)

@kcc
Copy link
Collaborator Author

kcc commented Oct 17, 2015

The attached input takes over 400 seconds to process, all time is spent in OT::apply_lookup
slow-unit-b599e7814a2a9343b4aa8bedd4f275ca9d2e6fd2.pdf

@behdad
Copy link
Member

behdad commented Oct 17, 2015

The attached input takes over 400 seconds to process, all time is spent in OT::apply_lookup
slow-unit-b599e7814a2a9343b4aa8bedd4f275ca9d2e6fd2.pdf

Thanks. Will take a look. The problem is, the OpenType layout model is very expressive. You can easily get exponential expansion from it. That said, by handling circular recursions we might get most of those handled already...

@kcc
Copy link
Collaborator Author

kcc commented Oct 19, 2015

jsonn pushed a commit to jsonn/pkgsrc that referenced this issue Oct 21, 2015
Overview of changes leading to 1.0.6
Thursday, October 15, 2015
====================================

- Reduce max nesting level in OT lookups from 8 to 6.
  Should not affect any real font as far as I know.
- Fix memory access issue in ot-font.
- Revert default load-flags of fonts created using hb_ft_font_create()
  back to FT_LOAD_DEFAULT|FT_LOAD_NO_HINTING.  This was changed in
  last release (1.0.5), but caused major issues, so revert.
  harfbuzz/harfbuzz#143


Overview of changes leading to 1.0.5
Tuesday, October 13, 2015
====================================

- Fix multiple memory access bugs discovered using libFuzzer.
  harfbuzz/harfbuzz#139
  Everyone should upgrade to this version as soon as possible.
  We now have continuous fuzzing set up, to avoid issues like
  these creeping in again.
- Misc fixes.

- New API:
  * hb_font_set_parent().
  * hb_ft_font_[sg]et_load_flags()
    The default flags for fonts created using hb_ft_font_create()
    has changed to default to FT_LOAD_DEFAULT now.  Previously it
    was defaulting to FT_LOAD_DFEAULT|FT_LOAD_NO_HINTING.

- API changes:
  * Fonts now default to units-per-EM as their scale, instead of 0.
  * hb_font_create_sub_font() does NOT make parent font immutable
    anymore.  hb_font_make_immutable() does.
@behdad
Copy link
Member

behdad commented Oct 28, 2015

The attached input takes over 400 seconds to process, all time is spent in OT::apply_lookup
slow-unit-b599e7814a2a9343b4aa8bedd4f275ca9d2e6fd2.pdf
Thanks. Will take a look. The problem is, the OpenType layout model is very expressive. You can easily get exponential expansion from it. That said, by handling circular recursions we might get most of those handled already...

To better understand this, see my new presentation, read the speaker notes, where I show that OpenType lookups are Turing complete if recursion depth is not limited...

https://goo.gl/yWYUrd

@kcc
Copy link
Collaborator Author

kcc commented Oct 29, 2015

Is any workaround possible? Some way to limit the recursion (under a flag?) and return early?
The fuzzing is completely stalled :(

@behdad
Copy link
Member

behdad commented Oct 29, 2015

We already have a max recursion depth of 6 in place. I haven't had time to look at your worst examples to see what other measures I might be able to put in place.

@kcc
Copy link
Collaborator Author

kcc commented Nov 3, 2015

despite the slow runs, here is one more trophy.
having all reports in this single bug actually looks silly to me now. Will be filing individual bugs separately. New one: #156

@behdad
Copy link
Member

behdad commented Nov 3, 2015

We already have a max recursion depth of 6 in place. I haven't had time to look at your worst examples to see what other measures I might be able to put in place.

I still haven't studied the examples, but in the mean time I added a compile time way to limit the recursion depth even further. Just define HB_MAX_NESTING_LEVEL to 2 or 3. Setting it to 1 might be too limiting.

behdad added a commit that referenced this issue Nov 3, 2015
Exactly the same problem that I fixed in
63ef0b4

I rewrote the table checking yesterday in
67f8821
and introduced the exact same issue again. :(
Good thing we have ongoing fuzzing going now.  Was discovered
immediately by libFuzzer.  Thanks kcc!

#139 (comment)
Fixes #156
@kcc
Copy link
Collaborator Author

kcc commented Nov 6, 2015

In the mean time, any news on adding a fuzzing vector for UTF-8 text?

So far I don't have any nice solution for cross-fuzzing two kinds of inputs.
It's on my radar, but I want to collect more motivating examples first. (and get some time for quality design).

questions:
how long should be the text to touch interesting parts of code?
does it have to be a valid utf-8?

jsonn pushed a commit to jsonn/pkgsrc that referenced this issue Nov 6, 2015
Overview of changes leading to 1.0.6
Thursday, October 15, 2015
====================================

- Reduce max nesting level in OT lookups from 8 to 6.
  Should not affect any real font as far as I know.
- Fix memory access issue in ot-font.
- Revert default load-flags of fonts created using hb_ft_font_create()
  back to FT_LOAD_DEFAULT|FT_LOAD_NO_HINTING.  This was changed in
  last release (1.0.5), but caused major issues, so revert.
  harfbuzz/harfbuzz#143


Overview of changes leading to 1.0.5
Tuesday, October 13, 2015
====================================

- Fix multiple memory access bugs discovered using libFuzzer.
  harfbuzz/harfbuzz#139
  Everyone should upgrade to this version as soon as possible.
  We now have continuous fuzzing set up, to avoid issues like
  these creeping in again.
- Misc fixes.

- New API:
  * hb_font_set_parent().
  * hb_ft_font_[sg]et_load_flags()
    The default flags for fonts created using hb_ft_font_create()
    has changed to default to FT_LOAD_DEFAULT now.  Previously it
    was defaulting to FT_LOAD_DFEAULT|FT_LOAD_NO_HINTING.

- API changes:
  * Fonts now default to units-per-EM as their scale, instead of 0.
  * hb_font_create_sub_font() does NOT make parent font immutable
    anymore.  hb_font_make_immutable() does.
@behdad
Copy link
Member

behdad commented Nov 7, 2015

In the mean time, any news on adding a fuzzing vector for UTF-8 text?

So far I don't have any nice solution for cross-fuzzing two kinds of inputs.
It's on my radar, but I want to collect more motivating examples first. (and get some time for quality design).

questions:
how long should be the text to touch interesting parts of code?

Less than 8 or 16 characters.

does it have to be a valid utf-8?

Doesn't have to be. The useful ones are. But HB should work regardless.

@kcc
Copy link
Collaborator Author

kcc commented Nov 7, 2015

What if we use hb_buffer_add_utf32 instead of utf8?
Will this examine (almost) the same functionality?
If so, we can treat last N*4 bytes of the font file as the buffer in ut32.
This is similar to the strategy I've used for regular expression matchers.

@behdad
Copy link
Member

behdad commented Nov 12, 2015

What if we use hb_buffer_add_utf32 instead of utf8?
Will this examine (almost) the same functionality?

Yes. It wouldn't exercise the UTF-8 decoder, but that's minor.

If so, we can treat last N*4 bytes of the font file as the buffer in ut32.

Yes, that works. And you'd need a new set of seeds I assume?

This is similar to the strategy I've used for regular expression matchers.

@kcc
Copy link
Collaborator Author

kcc commented Nov 12, 2015

Let's do it then (treat last N*4 bytes of font as UTF32 string).
I don't think we'll need more seeds -- UTF32 is trivial enough that the fuzzer should discover interesting strings by itself

@kcc
Copy link
Collaborator Author

kcc commented Dec 1, 2015

We can use something like to fuzz utf32.

  uint32_t text32[16];
  if (size > sizeof(text32)) {
    memcpy(text32, data + size - sizeof(text32), sizeof(text32));
    hb_buffer_t *buffer = hb_buffer_create();
    hb_buffer_add_utf32(buffer, text32, sizeof(text32)/sizeof(text32[0]), 0, -1);
    hb_buffer_guess_segment_properties(buffer);
    hb_shape(font, buffer, NULL, 0);
    hb_buffer_destroy(buffer);
  }

@behdad
Copy link
Member

behdad commented Dec 7, 2015

We can use something like to fuzz utf32.

uint32_t text32[16];
if (size > sizeof(text32)) {
memcpy(text32, data + size - sizeof(text32), sizeof(text32));
hb_buffer_t *buffer = hb_buffer_create();
hb_buffer_add_utf32(buffer, text32, sizeof(text32)/sizeof(text32[0]), 0, -1);
hb_buffer_guess_segment_properties(buffer);
hb_shape(font, buffer, NULL, 0);
hb_buffer_destroy(buffer);
}

Can you send a pull-request for that? Thanks.

@kcc
Copy link
Collaborator Author

kcc commented Dec 10, 2015

Maybe you could just apply the patch below?

--- a/test/fuzzing/hb-fuzzer.cc
+++ b/test/fuzzing/hb-fuzzer.cc
@@ -1,9 +1,9 @@
 #include <stddef.h>
 #include <hb.h>
 #include <hb-ot.h>
+#include <string.h>

 extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
-  const char text[] = "ABCDEXYZ123@_%&)*$!";

   hb_blob_t *blob = hb_blob_create((const char *)data, size,
                                    HB_MEMORY_MODE_READONLY, NULL, NULL);
@@ -12,13 +12,26 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
   hb_ot_font_set_funcs(font);
   hb_font_set_scale(font, 12, 12);

-  hb_buffer_t *buffer = hb_buffer_create();
-  hb_buffer_add_utf8(buffer, text, -1, 0, -1);
-  hb_buffer_guess_segment_properties(buffer);
+  {
+    const char text[] = "ABCDEXYZ123@_%&)*$!";
+    hb_buffer_t *buffer = hb_buffer_create();
+    hb_buffer_add_utf8(buffer, text, -1, 0, -1);
+    hb_buffer_guess_segment_properties(buffer);
+    hb_shape(font, buffer, NULL, 0);
+    hb_buffer_destroy(buffer);
+  }
+
+  uint32_t text32[16];
+  if (size > sizeof(text32)) {
+    memcpy(text32, data + size - sizeof(text32), sizeof(text32));
+    hb_buffer_t *buffer = hb_buffer_create();
+    hb_buffer_add_utf32(buffer, text32, sizeof(text32)/sizeof(text32[0]), 0, -1);
+    hb_buffer_guess_segment_properties(buffer);
+    hb_shape(font, buffer, NULL, 0);
+    hb_buffer_destroy(buffer);
+  }

-  hb_shape(font, buffer, NULL, 0);

-  hb_buffer_destroy(buffer);
   hb_font_destroy(font);
   hb_face_destroy(face);
   hb_blob_destroy(blob);

(Or I'll need some time to figure out how pull requests work...)

@behdad
Copy link
Member

behdad commented Dec 10, 2015

Will do. Thanks.

@behdad
Copy link
Member

behdad commented Dec 10, 2015

I also need to change the fuzzing code to print out the UTF-32 test, such that we can easily reproduce with hb-shape or copy into our test suite.

behdad added a commit that referenced this issue Jan 11, 2016
Very rudimentary right now, but will get kcc's bot going.

From
#139 (comment)
@behdad
Copy link
Member

behdad commented Jan 11, 2016

Maybe you could just apply the patch below?

Done.

@kcc
Copy link
Collaborator Author

kcc commented Oct 6, 2016

Have you seen this?

harfbuzz-1.3.2: hb-buffer.cc:419: bool hb_buffer_t::move_to(unsigned int): Assertion `i <= out_len + (len - idx)' failed.
==29603== ERROR: libFuzzer: deadly signal
    #8 0x4f66e3 in hb_buffer_t::move_to(unsigned int) /usr/local/google/home/kcc/tmp/hb/BUILD/src/hb-buffer.cc:419:3
    #9 0x5ad271 in OT::apply_lookup(OT::hb_apply_context_t*, unsigned int, unsigned int*, unsigned int, OT::LookupRecord const*, unsigned int) hb-ot-layout-gsubgpos-private.hh:1036:11
    #10 0x610684 in OT::chain_context_apply_lookup(OT::hb_apply_context_t*, unsigned int, OT::IntType<unsigned short, 2u> const*, unsigned int, OT::IntType<unsigned short, 2u> const*, unsigned int, OT::IntType<unsigned short, 2u> const*, unsigned int, OT::LookupRecord const*, OT::ChainContextApplyLookupContext&) hb-ot-layout-gsubgpos-private.hh:1649:10
    #11 0x610684 in OT::ChainContextFormat3::apply(OT::hb_apply_context_t*) const hb-ot-layout-gsubgpos-private.hh:2089
    #12 0x604032 in bool OT::hb_apply_context_t::dispatch<OT::ChainContextFormat3>(OT::ChainContextFormat3 const&) hb-ot-layout-gsubgpos-private.hh:446:56
    #13 0x604032 in OT::hb_apply_context_t::return_t OT::ChainContext::dispatch<OT::hb_apply_context_t>(OT::hb_apply_context_t*) const hb-ot-layout-gsubgpos-private.hh:2140
    #14 0x604032 in OT::hb_apply_context_t::return_t OT::SubstLookupSubTable::dispatch<OT::hb_apply_context_t>(OT::hb_apply_context_t*, unsigned int) const hb-ot-layout-gsub-table.hh:1081
    #15 0x602d87 in OT::hb_apply_context_t::return_t OT::Lookup::dispatch<OT::SubstLookupSubTable, OT::hb_apply_context_t>(OT::hb_apply_context_t*) const hb-ot-layout-common-private.hh:628:71
    #16 0x602d87 in OT::hb_apply_context_t::return_t OT::SubstLookup::dispatch<OT::hb_apply_context_t>(OT::hb_apply_context_t*) const hb-ot-layout-gsub-table.hh:1231

I am getting it on 1.3.2, repro attached. (gzip-ed)
crash-e88c339237f52d21e01c55f01b9c1b4cc14a0467.gz

behdad added a commit that referenced this issue Dec 22, 2016
@behdad
Copy link
Member

behdad commented Dec 22, 2016

Have you seen this?

Cannot reproduce using hb-fuzzer. Can you double-check?

iongchun pushed a commit to iongchun/harfbuzz that referenced this issue Jan 12, 2017
behdad added a commit that referenced this issue Feb 25, 2017
This has no security implications whatsoever since we always keep
and extra element at the end of buffer, just in case.

Discovered by oss-fuzz
CC #139
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=660
@behdad behdad closed this as completed Mar 2, 2017
@ebraminio
Copy link
Collaborator

ebraminio commented Jul 4, 2017

Self note for the next time:

  1. Install http://llvm.org/docs/LibFuzzer.html
  2. Install fuzzer lib, git clone https://chromium.googlesource.com/chromium/llvm-project/llvm/lib/Fuzzer
  3. ./autogen.sh && make && cd test/fuzzing && make hb-fuzzer && cd ../..
  4. make clean && CXXFLAGS="-DDEBUG -g" ./configure
  5. make && cd test/fuzzing && ~/third_party/llvm-build/Release+Asserts/bin/clang++ -std=c++11 hb-fuzzer.cc -fsanitize=memory -I harfbuzz -I../../src -lglib-2.0 ../../src/.libs/libharfbuzz.so ~/Fuzzer/libFuzzer.a -o hb_fuzzer -g -w && ./hb_fuzzer TESTCASE; cd ../..
    (or, replace -fsanitize=memory with address or fuzzer,address)

scheib pushed a commit to scheib/chromium that referenced this issue Oct 7, 2017
For now it's not using a corpus since basically any string would work as
a useful input and the input data is interpreted as UTF-32, then
converted to UTF-16. This follow the discussion about raw input in the
HarfBuzz fuzzing bug:
harfbuzz/harfbuzz#139 (comment)

Bug: 771594
Change-Id: Ibf60bb6b67c60131337a712c7b8baf6b5219a52b
Reviewed-on: https://chromium-review.googlesource.com/700375
Commit-Queue: Dominik Röttsches <drott@chromium.org>
Reviewed-by: Max Moroz <mmoroz@chromium.org>
Cr-Commit-Position: refs/heads/master@{#507099}
nareix pushed a commit to nareix/webrtc.third_party that referenced this issue Mar 5, 2018
For now it's not using a corpus since basically any string would work as
a useful input and the input data is interpreted as UTF-32, then
converted to UTF-16. This follow the discussion about raw input in the
HarfBuzz fuzzing bug:
harfbuzz/harfbuzz#139 (comment)

Bug: 771594
Change-Id: Ibf60bb6b67c60131337a712c7b8baf6b5219a52b
Reviewed-on: https://chromium-review.googlesource.com/700375
Commit-Queue: Dominik Röttsches <drott@chromium.org>
Reviewed-by: Max Moroz <mmoroz@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#507099}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: b8cc3ad7241229a774e5182c62c4b1f0fc19ac95
gpgreen pushed a commit to gpgreen/harfbuzz that referenced this issue Jan 10, 2024
Print proper cargo:include values

In order to build a Rust crate with C/C++ that uses the HarfBuzz library referenced by the `harfbuzz` crate, I need to know where the header files are. I believe the standard way of doing that is the `links` key/value environment variables. In particular, I'm expecting `DEP_HARFBUZZ_INCLUDE` to tell me the include path for `hb.h` and other header files. Unfortunately, this doesn't seem to be working correctly now. This PR is an attempt to resolve that.

Summary of changes:
* Prints proper values for `cargo:include` in the build script for both the `pkg-config` and vendored cases.
* Changes a use of `to_str().unwrap()` to `PathBuf::display()`. I believe this is better, but I'm happy to be corrected.

As far as adding a useful feature goes, I think this PR is complete. However, there are a couple of potential improvements that would be nice, but I'm not sure how to go about them.

**`not(feature = "build-native-harfbuzz")`**

I'm not exactly sure what disabling the feature `build-native-harfbuzz` does. It doesn't use `pkg-config` and doesn't build the vendored library? Consequently I'm not sure what to print for `cargo:include` here.

**Testing**

It would be nice to test for the presence and correctness of `DEP_HARFBUZZ_INCLUDE`. I used the following files (in the repository's top-level directory for convenience) both with `HARFBUZZ_SYS_NO_PKG_CONFIG=1` and without:

`Cargo.toml`:

```toml
[project]
name = "foo"
version = "0.5.0"
authors = []
build = "build.rs"

[dependencies]
harfbuzz-sys = { path = "harfbuzz-sys" }

[build-dependencies]
cc = "1.0"
```

`build.rs`:

```rust
use std::env;
fn main() {
    let mut cfg = cc::Build::new();
    for path in env::split_paths(&env::var_os("DEP_HARFBUZZ_INCLUDE").unwrap()) {
        cfg.include(path);
    }
    cfg.file("src/test.c").warnings(false).compile("test");
}
```

`src/test.c`:

```c
#include <hb.h>
int test() {
    hb_font_t *hb_font;
    return 0;
}
```

`src/lib.rs`: empty

I'm not sure how to add such a test without simply writing a script and running it in CI. I did a quick search on GitHub and couldn't find any other projects that actually tested `cargo:include` in this way.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-harfbuzz/139)
<!-- Reviewable:end -->
gpgreen pushed a commit to gpgreen/harfbuzz that referenced this issue Jan 10, 2024
Add harfbuzz-sys-test using ctest

This adds a crate to the workspace for testing `harfbuzz-sys` automatically using `ctest`.

I changed a few types in `harfbuzz-sys/src/lib.rs` to make the testing automatic (i.e. to avoid creating special cases). However, I suppose `lib.rs` is generated? If so, this might be problematic if it is regenerated later. The changes seem quite sensible to me, so maybe there's an alternative to tweak the generation to get something similar?

`harfbuzz-sys-test` currently only works with `HARFBUZZ_SYS_NO_PKG_CONFIG=1` because it doesn't get the `DEP_HARFBUZZ_INCLUDE` environment variable otherwise. The `.travis.yml` change reflects that. If harfbuzz#139 or something similar is merged, this special case can be removed, and the test should be run for both `HARFBUZZ_SYS_NO_PKG_CONFIG=1` and without `HARFBUZZ_SYS_NO_PKG_CONFIG`.

There's a `FIXME` in `harfbuzz-sys-test/build.rs` because I don't understand the errors when these functions are not skipped. I would appreciate it if someone else would take a look at that.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-harfbuzz/140)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants