
numpy 1.11. Segfault: numpy.random.permutation on list of long strings #7710

Closed
asanakoy opened this issue Jun 6, 2016 · 10 comments · Fixed by #7719

Comments

@asanakoy

asanakoy commented Jun 6, 2016

I'm getting a segmentation fault when running numpy.random.permutation on a list of long strings. With short strings it works fine.

import numpy as np
a = ['a', 'a' * 100]
z = np.random.permutation(np.array(a))

NumPy version: 1.11.0
Python 2.7
OS: Ubuntu 14.04.4 LTS

GDB trace:

(gdb) run
Starting program: /usr/bin/python test.py

Program received signal SIGSEGV, Segmentation fault.
visit_decref.48915 (op=<unknown at remote 0x6161616161616161>, data=0x0) at ../Modules/gcmodule.c:360
360 ../Modules/gcmodule.c: No such file or directory.
(gdb) bt
#0  visit_decref.48915 (op=<unknown at remote 0x6161616161616161>, data=0x0) at ../Modules/gcmodule.c:360
#1  0x000000000057392b in dict_traverse.18526 (
    op={<unknown at remote 0x6161616161616161>: <unknown at remote 0x6161616161616161>, <unknown at remote 0x6161616161616161>: <unknown at remote 0x6161616161616161>, <unknown at remote 0x6161616161616161>: <unknown at remote 0x61616161>, '__builtins__': {'bytearray': <type at remote 0x910680>, 'IndexError': <type at remote 0x913740>, 'all': <built-in function all>, 'help': <_Helper at remote 0x7ffff7e86210>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0x915820>, 'unicode': <type at remote 0x9199c0>, 'UnicodeDecodeError': <type at remote 0x914760>, 'memoryview': <type at remote 0x907e00>, 'isinstance': <built-in function isinstance>, 'copyright': <_Printer(_Printer__data='Copyright (c) 2001-2014 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.', _Printer...(truncated), visit=0x54eee0 <visit_decref.48915>, 
    arg=0x0) at ../Objects/dictobject.c:2113
#2  0x0000000000536476 in subtract_refs (containers=0x9186e0 <generations+96>) at ../Modules/gcmodule.c:385
#3  collect.49008 (generation=) at ../Modules/gcmodule.c:925
#4  0x000000000042749e in PyGC_Collect () at ../Modules/gcmodule.c:1440
#5  0x0000000000437d47 in Py_Finalize () at ../Python/pythonrun.c:449
#6  0x000000000044f993 in Py_Main (argc=, argv=0x7fffffffdc18) at ../Modules/main.c:665
#7  0x00007ffff7818f45 in __libc_start_main (main=0x44f9c2 , argc=2, argv=0x7fffffffdc18, init=,
    fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffdc08) at libc-start.c:287
#8  0x0000000000578c4e in _start ()

Reproducibility: ~ 50%

@asanakoy asanakoy changed the title Segfault: numpy.random.permutation on list of long strings numpy 1.11. Segfault: numpy.random.permutation on list of long strings Jun 6, 2016
@charris
Member

charris commented Jun 6, 2016

Hmm, I don't see this on 64-bit Fedora with 16 GiB of memory.

In [1]: np.__version__
Out[1]: '1.11.0'

In [2]: a = ['a', 'a' * 100]

In [3]: z = np.random.permutation(np.array(a))

In [4]:

@charris
Member

charris commented Jun 6, 2016

What does the 50% reproducibility mean?

@charris
Member

charris commented Jun 6, 2016

And what is the full Python version? Python 2.7.11 here.

@asanakoy
Author

asanakoy commented Jun 6, 2016

I have 64-bit Ubuntu with 32 GB of memory.
Python 2.7.6
50% reproducibility means it does not crash every time; you need to run it at least 10 times.

And if you run it from an interactive shell (IPython, for example), you will see the segfault after you quit the shell.

@asanakoy
Author

asanakoy commented Jun 6, 2016

With Python 2.7.11 it worked for me too. So I don't know if it's an issue with numpy or with Python.

@charris
Member

charris commented Jun 6, 2016

OK, I got a segfault, but very unreliably. Ran it once, it happened. Ran it in a loop 1,000,000 times, nada. Not sure what is going on.

@njsmith
Member

njsmith commented Jun 6, 2016

The segfault is happening at interpreter shutdown... did you start and then shut down Python 1,000,000 times?
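Because the crash tends to surface when the interpreter shuts down, a loop inside one long-lived process may never trigger it; each attempt needs a fresh interpreter. A minimal sketch of such a harness, assuming Python 3.5+ (for `subprocess.run`) and a POSIX system, where a negative return code `-N` means the child was killed by signal `N`. The helper name `count_crashes` and the `REPRO` constant are hypothetical, not part of NumPy or this report.

```python
import signal
import subprocess
import sys


def count_crashes(snippet, runs):
    """Run `snippet` in `runs` fresh interpreters and count SIGSEGV exits.

    Every run starts and shuts down its own Python process, so the
    shutdown-time garbage collection (Py_Finalize) happens each time,
    unlike a loop inside a single long-lived process.
    """
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a return code of -N means the child was killed by signal N.
        if result.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes


# The reproducer from the top of this issue, as a one-line snippet.
REPRO = "import numpy as np; np.random.permutation(np.array(['a', 'a' * 100]))"
```

On an affected setup (NumPy 1.11.0 in this report), `count_crashes(REPRO, 20)` would be expected to report some crashes; on a NumPy that includes the fix it should report none.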

@charris
Member

charris commented Jun 6, 2016

IIRC, I got a segfault before shutting down.

@njsmith
Member

njsmith commented Jun 6, 2016

This is clearly some sort of memory/GC corruption, so some nondeterminism is to be expected. But it also makes sense that interpreter shutdown would be a particularly likely time to hit the corruption: when shutting down, the interpreter tries to tear down and garbage-collect all objects, and the traceback at the top of this thread shows the crash being hit during exactly this process, in Py_Finalize.

@charris charris mentioned this issue Jun 7, 2016
@simongibbons
Contributor

simongibbons commented Jun 9, 2016

OK, so I think I've got this one figured out.

The issue is on the line where shuffle allocates a buffer for swapping elements: the buffer's dtype picks up the wrong string length.

In [2]: a = np.array(['a', 'a' * 100])

In [3]: a.dtype
Out[3]: dtype('<U100')

In [4]: buf = np.empty_like(a[0])

In [5]: buf.dtype
Out[5]: dtype('<U1')

Now, when a longer element is swapped into that buffer, it overflows, almost certainly overwriting something important, which then causes the segfault when garbage collection runs.
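The size of the overflow can be read straight off the two dtypes. A minimal sketch, assuming Python 3 (where `str` maps to the fixed-width unicode dtype shown above, 4 bytes per character) and assuming, as described above, that each swap copies `a.dtype.itemsize` raw bytes into the buffer. Under Python 2 the array would be `S100` instead, so the byte counts would be 100 and 1.

```python
import numpy as np

a = np.array(['a', 'a' * 100])   # Python 3 str -> fixed-width unicode dtype
buf = np.empty_like(a[0])        # dtype inferred from the first element only

needed = a.dtype.itemsize        # bytes per element of the full array
available = buf.dtype.itemsize   # bytes the swap buffer can actually hold
overflow = needed - available    # bytes written past the end of the buffer

print(a.dtype, buf.dtype)
print(needed, available, overflow)  # 400 4 396
```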

A simple fix would be to explicitly set the buffer's dtype to that of the array:

In [6]: buf = np.empty_like(a[0], dtype=a.dtype)

In [7]: buf.dtype
Out[7]: dtype('<U100')

@charris charris added this to the 1.11.1 release milestone Jun 9, 2016
simongibbons added a commit to simongibbons/numpy that referenced this issue Jun 10, 2016
np.random.shuffle will allocate a buffer based on the size of the first
element of an array of strings. If the first element is smaller than
another in the array this buffer will overflow, causing a segfault
when garbage is collected.

Additionally, if the array contains objects, then one would be left
in the buffer and have its refcount erroneously decremented on
function exit, causing that object to be deallocated too early.

To fix this we change the buffer to be an array of int8 of the
size of the array's dtype, which sidesteps both issues.

Fixes numpy#7710
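The idea in this commit message, a raw byte buffer sized from the array's dtype rather than from its first element, can be sketched at the Python level. This is a hypothetical illustration, not the actual patch (which lives in NumPy's compiled random module): `shuffle_bytes_sketch` is an invented name, it uses the modern `np.random.default_rng` Generator API rather than what NumPy 1.11 had, and it deliberately refuses object arrays, since NumPy does not allow byte views of them and reference counting is out of scope here.

```python
import numpy as np


def shuffle_bytes_sketch(arr, rng):
    """Shuffle a 1-D array in place by swapping raw element bytes.

    Hypothetical helper: the swap buffer is an int8 array sized from
    arr.dtype.itemsize, so every element fits regardless of which one
    happens to come first.
    """
    if arr.ndim != 1 or not arr.flags.c_contiguous:
        raise ValueError("sketch handles 1-D C-contiguous arrays only")
    if arr.dtype.hasobject:
        # NumPy refuses byte views of object arrays; real code must also
        # manage reference counts, which this sketch does not attempt.
        raise TypeError("sketch does not handle object arrays")
    n = arr.shape[0]
    itemsize = arr.dtype.itemsize
    # One row of raw bytes per element; writes go straight into arr.
    rows = arr.view(np.int8).reshape(n, itemsize)
    buf = np.empty(itemsize, dtype=np.int8)   # sized from the dtype
    for i in range(n - 1, 0, -1):             # Fisher-Yates, back to front
        j = int(rng.integers(0, i + 1))       # uniform in [0, i]
        buf[...] = rows[i]
        rows[i] = rows[j]
        rows[j] = buf
```

For example, `shuffle_bytes_sketch(np.array(['a', 'a' * 100]), np.random.default_rng(0))` swaps full 400-byte elements through a 400-byte buffer, so nothing is written past its end.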
charris pushed a commit to charris/numpy that referenced this issue Jun 10, 2016
4 participants