Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfaults when iterating a PyDict with > 262141 keys #159

Closed
fdoyon opened this issue May 14, 2018 · 19 comments
Closed

Segfaults when iterating a PyDict with > 262141 keys #159

fdoyon opened this issue May 14, 2018 · 19 comments
Labels

Comments

@fdoyon
Copy link

fdoyon commented May 14, 2018

I think the number of keys is irrelevant but more to do with the object size.

Create a dict with 127 keys and values, rust iteration segfaults by passing a null dict (po) to Pydict_Next on Linux. Will send a repro tomorrow. I am on mobile.

@konstin konstin added the bug label May 14, 2018
@fdoyon
Copy link
Author

fdoyon commented May 15, 2018

I have created a minimum repro : https://github.com/palad1/pyo3/commit/d0ec218ed8bf0a30cfd7f58cac9ed27e3fa2a6e2

I will try to isolate further on https://github.com/palad1/pyo3/tree/master/examples/test_dict - let me know if you'd rather have me create a PR so the repro is in the main pyo3 tree.

Thanks for your help.

@fdoyon
Copy link
Author

fdoyon commented May 15, 2018

Here's the gdb trace (vs python-3.5.2) form my system.


Program received signal SIGSEGV, Segmentation fault.
dict_next (op=0x0, i=127, pvalue=0x7fffffffc730) at Objects/dictobject.c:1373
1373        if (!PyDict_Check(op))
(gdb) bt full
#0  dict_next (op=0x0, i=127, pvalue=0x7fffffffc730) at Objects/dictobject.c:1373
        mask = <optimized out>
        offset = <optimized out>
        mp = 0x0
        value_ptr = <optimized out>
#1  0x00007ffff7a02b0f in PyDict_Next (op=<optimized out>, ppos=<optimized out>, pkey=<optimized out>, pvalue=<optimized out>) at Objects/dictobject.c:1417
        mp = <optimized out>
        i = <optimized out>
#2  0x00007ffff661111f in _$LT$pyo3..objects..dict..PyDictIterator$LT$$u27$a$GT$$u20$as$u20$core..iter..iterator..Iterator$GT$::next::hab5b6f796a59ea1d (self=0x7fffffffc850) at /home/fdoyon/src/pyo3/pyo3/src/objects/dict.rs:143
        value = 0x74c7b0
        key = 0x74c7b0
#3  0x00007ffff66147a2 in test_dict::dict_size::DictSize::iter_dict::hc8fc4e864e3aacf4 (self=0x7ffff6c762b0, _py=..., dict=0x74a790) at src/dict_size.rs:10
        __next = {__0 = 0x74af78, __1 = 0x74c7b0}
        iter = {dict = 0x74a790, pos = 127}
        seen = 127
#4  0x00007ffff660eb14 in test_dict::dict_size::_IMPL_PYO3_METHODS_DictSize::_$LT$impl$u20$pyo3..class..methods..PyMethodsProtocolImpl$u20$for$u20$test_dict..dict_size..DictSize$GT$::py_methods::METHODS::__wrap::_$u7b$$u7b$closure$u7d$$u7d$::h139fba46d588d416 (arg1=0x74a790) at src/dict_size.rs:10
        _slf = 0x7ffff6c762b0
        _py = {__0 = {<No data fields>}}

@fdoyon
Copy link
Author

fdoyon commented May 17, 2018

Only happens on linux - OS X and Window are not affected

@fdoyon
Copy link
Author

fdoyon commented Nov 12, 2018

This seems to be an issue with pointers being moved when the POOL.borrowed vector grows. In my test, the PyDictIterator.dict field is zeroed when exiting py.from_borrowed_prt(key) grows the POOL.borrowed vector. (See the above linked repro).

Apologies for the formatting. I am on a. Mobile

@fdoyon
Copy link
Author

fdoyon commented Nov 12, 2018

I believe there is a soundness issue with the borrowed pool. What is happening in my test case is the following:

  • A python scripts create a dictionary with 128 entries
  • A Python script calls a rust module passing a PyDict* as a parameter
  • The PyDict* pointer is added to the POOL.borrowed pointers vec (size = 1,capacity=256)
  • The rust code iterates the dictionary. Each iteration puts both key/value in the pool (see dict.rs )
  • each iteration adds 2 pointers to the borrowed pool
  • on the 127th iteration, the pool will have to grow from 255 to 257 entries
  • growing the Vec<*mut ffi::PyObject> will trigger a call to libc's void* realloc(void* ptr,size_t size)
  • depending on the environment and memory layout, there is a chance that the return of realloc will be different than ptr, that is if the memory block had to be moved to a new segment with enough free space available*
  • as per man 3 realloc - doing so calls free on the old pointer
  • as the PyDict* value stored in the PyDictIterator.dict field is a reference to this very pointer, it is now pointing to the old location. On my linux systems, the page has been zeroed.
  • The next call to iter() will trigger a segfault when python's _PyDict_Next(PyObj* op...) dereference op / the invalid moved pointer.

You can see the smoking gun in GDB by setting a watchpoint to *self.dict in PyDictIterator::next - realloc clears-out the memory.

TL;DR: We need to stop sending references to pointers contained in the POOL.borrow vector as some OSes will free/zero the old memory block.

Repro test case and forked repo: https://github.com/fdoyon/pyo3/tree/master/examples/test_dict

@konstin, @fafhrd91 - this looks like a big change in the way the pool and pointers are being handed-around...

@kngwyu
Copy link
Member

kngwyu commented Nov 12, 2018

Thanks for tracking this problem!
But this still looks a bit hard to solve...

@konstin
Copy link
Member

konstin commented Nov 12, 2018

I won't get into fixing this myself, but I'm happy to review and help with a pull request tackling this.

@fdoyon
Copy link
Author

fdoyon commented Nov 12, 2018

I agree, this looks like a real hard soundness issue. Fixing this might just entail curtailing the use of pointers in the library to a bare minimum... I will give it a shot now that I have found a way to constantly reproduce the issue.

@kngwyu
Copy link
Member

kngwyu commented Nov 12, 2018

Ah but I think just

pub struct PyDictIterator<'py> {
    dict: PyObject,
    pos: isize,
    __marker: PhantomData<'py ()>,
}

works fine, though I haven't test it yet.

@fdoyon
Copy link
Author

fdoyon commented Nov 12, 2018

Thanks @kngwyu, but this is not working as this is an issue with PyObject.0 being zeroed-out by the C memory allocator, not the iterator being dropped, it's still there and valid.

The root of the problem is that Pyo3 is sharing pointers from a block of memory that is moved by the C allocator.

Smoking gun: (with phantomdata added)
image
This is a screenshot of CLion doing a remote debug on a linux host using gdb-remote

@kngwyu
Copy link
Member

kngwyu commented Nov 12, 2018

I think it works since PyObject.0 is on a python heap and managed by python GC.
Made a PR #268

@kngwyu
Copy link
Member

kngwyu commented Nov 13, 2018

I really can't understand why this bug only happens in extension.
Any idea?

kngwyu added a commit that referenced this issue Nov 13, 2018
Fix PyDictIterator's segfault(for #159)
@yamadapc
Copy link

@kngwyu I believe this is still broken even with these changes. You just need a larger dictionary.

@konstin
Copy link
Member

konstin commented Nov 13, 2018

Yes, I guess this has moved to #271

@kngwyu
Copy link
Member

kngwyu commented Nov 13, 2018

@yamadapc
Thanks for reporting!

262141/1619200 iterations:262140=>262140
262142/1619200 iterations:262141=>262141
fish: 'python' terminated by signal SIGSEGV (Address boundary error)

Hmm it looks 262141 is the limit size in my PC 🤔 (EDITED: with Python 3.7+Arch Linux)

@kngwyu
Copy link
Member

kngwyu commented Nov 13, 2018

I tried to debug with python 3.8 debug build in my PC, but amazingly it works in python 3.8 😭
So I have really no idea now, but it looks a different problem from #271

@kngwyu kngwyu reopened this Nov 13, 2018
@fdoyon
Copy link
Author

fdoyon commented Nov 13, 2018

I disagree, the issue is occurring when the Vec:<* mut ffi::PyObject> borrowed needs to reallocate its internal buffer to a different memory segment during a call to ReleasePool::register_borrowed - internally RawVec::reserve_internal will call realloc which will try to expand the reserved memory segment starting at the array pointer. If there is not enough room in the current memory segment containing ptr, it will locate or create a new memory segment, copy the contents of ptr+old_len to it, free(ptr), then return the newly-copied ptr.

You can see the manpage here:
https://manpage.me/index.cgi?q=realloc&apropos=0&sektion=3&manpath=FreeBSD+12-CURRENT+and+Ports&arch=default&format=html

The realloc() function changes the size of the previously allocated
memory referenced by ptr to size bytes. The contents of the memory are
unchanged up to the lesser of the new and old sizes. If the new size is
larger, the contents of the newly allocated portion of the memory are
undefined. Upon success, the memory referenced by ptr is freed and a
pointer to the newly allocated memory is returned. Note that realloc()
may move the memory allocation, resulting in a different return value
than ptr. If ptr is NULL, the realloc() function behaves identically to
malloc() for the specified size.

If your system doesn't crash with Python 3.8 that is because the memory layout of your process is different and for some reason you have access to a larger segment. Also some oses do not memzero the old ptr after freeing it. I am testing on Centos7.2 and can reproduce the issue.

@kngwyu
Copy link
Member

kngwyu commented Nov 13, 2018

See #268 (comment) for the reason why I think it's not related to ReleasePool.
Now PyDictIter doesn't have a pointer to ReleasePool.

@kngwyu kngwyu changed the title Segfaults when iterating a PyDict with > 127 keys Segfaults when iterating a PyDict with > 262141 keys Nov 14, 2018
@konstin
Copy link
Member

konstin commented Nov 24, 2018

Thanks to the the investigation done here, this is fixed in 0.5.1 🎉

@konstin konstin closed this as completed Nov 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants