Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python crashes when HarfBuzz.language_from_string() is used #91

Closed
khaledhosny opened this issue Mar 22, 2015 · 17 comments
Closed

Python crashes when HarfBuzz.language_from_string() is used #91

khaledhosny opened this issue Mar 22, 2015 · 17 comments

Comments

@khaledhosny
Copy link
Collaborator

After #90 I can get most of the *_from_string() function to work from Python, except language_from_string() which cases a “double free or corruption” crash. Running under valgrind shows the following:

==27049== Invalid free() / delete / delete[] / realloc()
==27049==    at 0x4C2B200: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27049==    by 0x770363A: g_boxed_free (in /usr/lib/libgobject-2.0.so.0.4200.2)
==27049==    by 0x7099C5F: boxed_del (pygi-boxed.c:45)
==27049==    by 0x4E808C2: PyObject_Call (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F11BB6: PyEval_CallObjectWithKeywords (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ED6C2B: slot_tp_del (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ED1C89: subtype_dealloc (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ECFED9: tupledealloc (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F15C30: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F18AEF: PyEval_EvalCodeEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F16FE3: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F170E9: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)
==27049==  Address 0x9f18ce0 is 0 bytes inside a block of size 3 free'd
==27049==    at 0x4C2B200: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27049==    by 0x770363A: g_boxed_free (in /usr/lib/libgobject-2.0.so.0.4200.2)
==27049==    by 0x7099C5F: boxed_del (pygi-boxed.c:45)
==27049==    by 0x4E808C2: PyObject_Call (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F11BB6: PyEval_CallObjectWithKeywords (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ED6C2B: slot_tp_del (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ED1C89: subtype_dealloc (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ECFED9: tupledealloc (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F15C30: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F18AEF: PyEval_EvalCodeEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F16FE3: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F170E9: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)

and

==27049== Invalid free() / delete / delete[] / realloc()
==27049==    at 0x4C2B200: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27049==    by 0xA22AADB: finish (hb-common.cc:229)
==27049==    by 0xA22AADB: free_langs() (hb-common.cc:243)
==27049==    by 0x5455DB1: __run_exit_handlers (in /usr/lib/libc-2.21.so)
==27049==    by 0x5455E04: exit (in /usr/lib/libc-2.21.so)
==27049==    by 0x5440806: (below main) (in /usr/lib/libc-2.21.so)
==27049==  Address 0x9f1e7b0 is 0 bytes inside a block of size 3 free'd
==27049==    at 0x4C2B200: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27049==    by 0x770363A: g_boxed_free (in /usr/lib/libgobject-2.0.so.0.4200.2)
==27049==    by 0x7099C5F: boxed_del (pygi-boxed.c:45)
==27049==    by 0x4E808C2: PyObject_Call (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F11BB6: PyEval_CallObjectWithKeywords (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ED6C2B: slot_tp_del (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ED1C89: subtype_dealloc (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4ECFED9: tupledealloc (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F15C30: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F18AEF: PyEval_EvalCodeEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F16FE3: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)
==27049==    by 0x4F170E9: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0)

I can not really tell what is going on, but it looks like a mismatch between the way the string is allocated and the use of free() here, or a double free. May be there is a magic Introspection keyword to fix this but I don’t know what is it (I tried (transfer full) but it made no difference).

@behdad
Copy link
Member

behdad commented Mar 22, 2015

I think this is about Python trying to free(?) the return value of that function perhaps. I'll take a look.

@khaledhosny
Copy link
Collaborator Author

I can avoid this by caching the languages so that I don’t call language_from_string() for the same language more than once. I can now reproduce this reliably with a simple:

from gi.repository import HarfBuzz
HarfBuzz.language_from_string("ar")
HarfBuzz.language_from_string("ar")

@khaledhosny
Copy link
Collaborator Author

It seems that just calling it more than once will cause the crash:

from gi.repository import HarfBuzz
HarfBuzz.language_from_string("ar")
HarfBuzz.language_from_string("en")

@khaledhosny
Copy link
Collaborator Author

I think this is about Python trying to free(?) the return value of that function perhaps.

I finally understood what this comment means, and indeed it is the case. Adding (transfer none) to the return value fixes the crash, but now the shaping results are randomly different, investigating…

@khaledhosny
Copy link
Collaborator Author

Opened #99 for the fixes I found so far. The random shaping issue seems to be a bug in language_reference from 17905c5 which seems to be doing some bad memory access. With that commit valgrind gives me:

==9647== Invalid read of size 8
==9647==    at 0x9E792C8: language_reference(hb_language_impl_t const**) (hb-gobject-structs.cc:116)
==9647==    by 0x722E4EA: g_boxed_copy (in /usr/lib/libgobject-2.0.so.0.4200.2)
==9647==    by 0x6BC4E14: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==9647==    by 0x6BD0531: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==9647==    by 0x6BC7F89: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==9647==    by 0x6BC93D7: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==9647==    by 0x4E98527: PyObject_Call (abstract.c:2040)
==9647==    by 0x4F4A4E9: do_call (ceval.c:4466)
==9647==    by 0x4F4A4E9: call_function (ceval.c:4264)
==9647==    by 0x4F4A4E9: PyEval_EvalFrameEx (ceval.c:2838)
==9647==    by 0x4F4D946: PyEval_EvalCodeEx (ceval.c:3588)
==9647==    by 0x4F4D9EA: PyEval_EvalCode (ceval.c:775)
==9647==    by 0x4F697D3: run_mod (pythonrun.c:2180)
==9647==    by 0x4F6B944: PyRun_FileExFlags (pythonrun.c:2133)
==9647==  Address 0x9066f50 is 0 bytes inside a block of size 3 alloc'd
==9647==    at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9647==    by 0x5566F39: strdup (in /usr/lib/libc-2.21.so)
==9647==    by 0x9304DF8: operator= (hb-common.cc:222)
==9647==    by 0x9304DF8: lang_find_or_insert (hb-common.cc:265)
==9647==    by 0x9304DF8: hb_language_from_string (hb-common.cc:310)
==9647==    by 0x77851EF: ffi_call_unix64 (in /usr/lib/libffi.so.6.0.4)
==9647==    by 0x7784C57: ffi_call (in /usr/lib/libffi.so.6.0.4)
==9647==    by 0x6BC7AD3: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==9647==    by 0x6BC93D7: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==9647==    by 0x4E98527: PyObject_Call (abstract.c:2040)
==9647==    by 0x4F4A4E9: do_call (ceval.c:4466)
==9647==    by 0x4F4A4E9: call_function (ceval.c:4264)
==9647==    by 0x4F4A4E9: PyEval_EvalFrameEx (ceval.c:2838)
==9647==    by 0x4F4D946: PyEval_EvalCodeEx (ceval.c:3588)
==9647==    by 0x4F4D9EA: PyEval_EvalCode (ceval.c:775)
==9647==    by 0x4F697D3: run_mod (pythonrun.c:2180)

With snippet like:

from gi.repository import HarfBuzz
buf = HarfBuzz.buffer_create()
HarfBuzz.buffer_set_language(buf, HarfBuzz.language_from_string(b'ar'))
HarfBuzz.buffer_get_language(buf)
HarfBuzz.buffer_get_language(buf)
HarfBuzz.buffer_set_language(buf, HarfBuzz.language_from_string(b'ar'))

If I revert that commit all is fine (but I get the warning it was working around of course).

@behdad
Copy link
Member

behdad commented Apr 11, 2015

Yeah thinking about it I believe I know what's wrong. Let me try to digest it all and fix it. Thanks!

@behdad
Copy link
Member

behdad commented May 19, 2015

Actually this started working too. I guess my stack was bad. This is working fine with Ubuntu 14.04 now.

@behdad
Copy link
Member

behdad commented May 19, 2015

Hum.. No idea what was broken before, but I reviewed and tested the code, I think it's fine. Can you please confirm?

@behdad behdad closed this as completed May 19, 2015
@khaledhosny
Copy link
Collaborator Author

It stopped crashing here as well, but I still see this in valgrind output:

==29403== Invalid read of size 8
==29403==    at 0x9CB4987: language_reference(hb_language_impl_t const**) (hb-gobject-structs.cc:116)
==29403==    by 0x722E64A: g_boxed_copy (in /usr/lib/libgobject-2.0.so.0.4400.0)
==29403==    by 0x6BC4E64: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==29403==    by 0x6BD06E1: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==29403==    by 0x6BC801B: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==29403==    by 0x6BC9537: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==29403==    by 0x4E98527: PyObject_Call (abstract.c:2040)
==29403==    by 0x4F4A4E9: do_call (ceval.c:4466)
==29403==    by 0x4F4A4E9: call_function (ceval.c:4264)
==29403==    by 0x4F4A4E9: PyEval_EvalFrameEx (ceval.c:2838)
==29403==    by 0x4F4D946: PyEval_EvalCodeEx (ceval.c:3588)
==29403==    by 0x4F4D9EA: PyEval_EvalCode (ceval.c:775)
==29403==    by 0x4F697D3: run_mod (pythonrun.c:2180)
==29403==    by 0x4F6B944: PyRun_FileExFlags (pythonrun.c:2133)
==29403==  Address 0x90b3f90 is 0 bytes inside a block of size 3 alloc'd
==29403==    at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29403==    by 0x5566F39: strdup (in /usr/lib/libc-2.21.so)
==29403==    by 0x9311D89: hb_language_item_t::operator=(char const*) (hb-common.cc:222)
==29403==    by 0x93117A2: lang_find_or_insert(char const*) (hb-common.cc:265)
==29403==    by 0x9311881: hb_language_from_string (hb-common.cc:310)
==29403==    by 0x77851EF: ffi_call_unix64 (in /usr/lib/libffi.so.6.0.4)
==29403==    by 0x7784C57: ffi_call (in /usr/lib/libffi.so.6.0.4)
==29403==    by 0x6BC7B53: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==29403==    by 0x6BC9537: ??? (in /usr/lib/python3.4/site-packages/gi/_gi.cpython-34m.so)
==29403==    by 0x4E98527: PyObject_Call (abstract.c:2040)
==29403==    by 0x4F4A4E9: do_call (ceval.c:4466)
==29403==    by 0x4F4A4E9: call_function (ceval.c:4264)
==29403==    by 0x4F4A4E9: PyEval_EvalFrameEx (ceval.c:2838)
==29403==    by 0x4F4D946: PyEval_EvalCodeEx (ceval.c:3588)

@khaledhosny
Copy link
Collaborator Author

I’m also still randomly seeing that setting the buffer language making no effect if I keep calling language_from_string for the same language, but it works fine I cache the result so that I call it only once for each language.

@behdad
Copy link
Member

behdad commented May 19, 2015

humm. I definitely see the valgrind error and looks bad. debugging. I don't understand why that's happening.

@behdad
Copy link
Member

behdad commented May 19, 2015

I have a guess that g-i changed how it treats types that are typedef'ed to pointers...

behdad added a commit that referenced this issue May 20, 2015
Using latest gobject-introspection, I don't seem to be having this
problem anymore:

  https://bugzilla.gnome.org/show_bug.cgi?id=707656

Removing that kludge makes language_t behave more like the way I expect it
in Python.

Also fixes:
#91
@behdad
Copy link
Member

behdad commented May 20, 2015

I'm sure g-i made changes. I removed the kludge I had added for language_t and it fixed things... Even made it better.

@khaledhosny
Copy link
Collaborator Author

It seems to work fine now, though g-i is now giving the wold warning:

src/hb-common.cc:331: Warning: HarfBuzz: hb_language_get_default: return value: Invalid non-constant return of bare structure or union; register as boxed type or (skip)
src/hb-ot-tag.h:54: Warning: HarfBuzz: hb_ot_tag_to_language: return value: Invalid non-constant return of bare structure or union; register as boxed type or (skip)

@behdad
Copy link
Member

behdad commented Jun 13, 2015

So we are back to why I added the boxed type to begin with.

So, g-i handles hb_buffer_get_language() just fine, but not hb_language_get_default(). Added (transfer none) and those are happy now.

@khaledhosny
Copy link
Collaborator Author

Nice, I think we need a release now :)

@behdad
Copy link
Member

behdad commented Jun 18, 2015

Will be out in a few minutes.

gpgreen pushed a commit to gpgreen/harfbuzz that referenced this issue Jan 10, 2024
[breaking change] Use ::std::os::raw instead of libc.

This is a breaking change.

Fixes issue harfbuzz#88.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-harfbuzz/91)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants