Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault in @3.0.2 #29

Closed
roll opened this issue Apr 22, 2020 · 6 comments
Closed

Segmentation fault in @3.0.2 #29

roll opened this issue Apr 22, 2020 · 6 comments
Labels
bug Something isn't working

Comments

@roll
Copy link

roll commented Apr 22, 2020

Hi,

We've run into https://travis-ci.org/github/frictionlessdata/tabulator-py/jobs/678168760 on

items = ijson.items(self.__bytes, path)
for row_number, item in enumerate(items, start=1):

which happened on @3.0.2 but works with @3.0.1

@bz2
Copy link

bz2 commented Apr 22, 2020

Example backtrace:

0x00007ffff6278500 in ?? ()
#1  0x00007fffe379af53 in parse_basecoro_send_impl (
    self=<_yajl2.parse_basecoro at remote 0x7fffe3bd2870>, 
    event=event@entry='start_array', value=value@entry=None)
    at ijson/backends/yajl2_c/parse_basecoro.c:102
#2  0x00007fffe379899b in add_event_and_value (val=<optimized out>, 
    evt_name='start_array', ctx=0x7fffe3bd2870)
    at ijson/backends/yajl2_c/basic_parse_basecoro.c:25
#3  start_array (ctx=0x7fffe3bd2870)
    at ijson/backends/yajl2_c/basic_parse_basecoro.c:112
#4  0x00007fffe358e317 in yajl_do_parse ()
   from /home/jenkins/.local/lib/python3.7/site-packages/ijson/backends/../../ijson.libs/libyajl-da6a00d5.so.2.1.0
#5  0x00007fffe37990e5 in ijson_yajl_parse (handle=0x1fd5de0, 
    buffer=0x2156e10 "[{\"A\": \"A\", \"B\": \"B\"}, {\"A\": \"1\", \"B\": \"foo\"}]", length=length@entry=46)
    at ijson/backends/yajl2_c/basic_parse_basecoro.c:133
#6  0x00007fffe379b2e1 in reading_generator_next (self=0x7fffe4689900)
    at ijson/backends/yajl2_c/reading_generator.c:88
#7  0x00000000004db1f3 in enum_next (en=0x7fffe4662b40)

So, unsurprisingly, blame on the line added in 80da1dc

@rtobar
Copy link

rtobar commented Apr 23, 2020

Thanks for the report @roll and for the details @bz2. This is indeed caused by the (buggy itself) fix for #28, which in hindsight is obviously wrong (isn't it always?). Also, despite we having 200+ unit tests and 99+% coverage this wasn't caught by our tests, so one can never be too prepared...

The problem should only appear when the top-level JSON structure is an array. I'll add tests to confirm I can duplicate the problem, and then will fix it, which will be easy. Expect a fix and a new release coming up shortly.

@rtobar rtobar added the bug Something isn't working label Apr 23, 2020
rtobar added a commit that referenced this issue Apr 23, 2020
This problem was caused by calling Py_DECREF() more than once on the
static, global PyObject* "item" variable that we use to keep the "item"
string literal in memory. The situation only arouse when the top-level
JSON value was an array, and therefore did not hit all users
necessarily.

More than one of our tests use arrays as their top-level JSON values
that went through parse_basecoro, so in principle we should have hit the
bug, but it still went unnoticed. This might have been just luck, with
the python memory allocator playing in our favor and placing valid
objects back into the memory previously used by the "item" variable
before we decreased its reference count again. In order to reproduce the
error more reliably more tests need to be added to try and hit the
problem with a higher chance.

This problem was originally reported in #29, and was a direct
consequence of the fix introduced to fix #28.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
rtobar added a commit that referenced this issue Apr 23, 2020
These new tests have higher chances of identifying issues like that
exposed in #29, and in fact they helped reproducing the problem and
finding the solution.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
@rtobar
Copy link

rtobar commented Apr 23, 2020

A fix is on the master branch, and tests reproducing the original problem have been added to ensure this particular issue doesn't crop up again. @roll / @bz2 could you confirm this is now fixed? After receiving confirmation I'll be releasing ijson 3.0.3.

@roll
Copy link
Author

roll commented Apr 23, 2020

Hi @rtobar,

Thanks! tabulator passes its testes with master 👍

@rtobar
Copy link

rtobar commented Apr 23, 2020

ijson 3.0.3 is now available in PyPI.

@rtobar rtobar closed this as completed Apr 23, 2020
@roll
Copy link
Author

roll commented Apr 23, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants