Python::allow_threads() causes segmentation fault #649

Closed
Askannz opened this issue Oct 26, 2019 · 11 comments
Askannz commented Oct 26, 2019

🐛 Bug Reports

Accessing some Python objects from within the closure given to Python::allow_threads() causes a segmentation fault. I can reproduce this behavior with PyList, PyTuple, and PyArray from the numpy crate (haven't tested with other objects).

🌍 Environment

  • Your operating system and version: Archlinux
  • Your python version: 3.7.4
  • How did you install python (e.g. apt or pyenv)? Did you use a virtualenv?: Repo package + virtualenv
  • Your rust version (rustc --version): rustc 1.39.0-nightly (ca3766e2e 2019-09-14)
  • Are you using the latest pyo3 version? Have you tried using latest master (replace version = "0.x.y" with git = "https://github.com/PyO3/pyo3")? Yes

💥 Reproducing

Rust code:

use pyo3::prelude::*;
use pyo3::types::PyList;
use pyo3::wrap_pyfunction;

#[pymodule]
fn my_module(_py: Python, m: &PyModule) -> PyResult<()> {

    #[pyfunction]
    fn my_function(py: Python, list: &PyList) {
        py.allow_threads(|| {
            println!("{:?}", list);
        });
    }

    m.add_wrapped(wrap_pyfunction!(my_function))?;

    Ok(())
}

Python shell:

>>> from my_module import my_function
>>> my_function([1, 2, 3])
Segmentation fault (core dumped)
kngwyu (Member) commented Oct 26, 2019

This function is not intended to execute Python functions in parallel.
Python's GIL stops all threads other than the current interpreter thread, and allow_threads releases that lock.
So it is needed to execute Rust functions in parallel in Python extension code.

In your example, you call PyList's Debug implementation, which requires Python's GIL, and that is what looks problematic.
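
For illustration, a minimal sketch (not from this thread) of how the repro could be restructured so that the closure only touches Rust-owned data; it assumes the list holds integers and that extract() is available on &PyList:

use pyo3::prelude::*;
use pyo3::types::PyList;

#[pyfunction]
fn my_function(py: Python, list: &PyList) -> PyResult<()> {
    // Copy the data into plain Rust values while the GIL is still held...
    let values: Vec<i64> = list.extract()?;
    // ...then release the GIL; the closure no longer touches any Python object.
    py.allow_threads(|| {
        println!("{:?}", values);
    });
    Ok(())
}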

But...

Maybe I'm just using it wrong, but in this case, it should be marked unsafe and have some additional documentation.

(from your comment in #640)
Yeah, I agree.
We should do something about that.

kngwyu added the Unsound label Oct 26, 2019
@sebpuetz (Contributor)

@kngwyu Could you provide an example where this can be used?

My example in #640 demonstrated that it's at least not required to get multi-threading going in pure Rust code called from Python.

@birkenfeld (Member)

In your example, you call PyList's Debug implementation, which requires Python's GIL, and that is what looks problematic.

It seems that implementing the format traits for Py structs is a bad idea, since they internally call Python code without ensuring that the GIL is held.

A workaround could be a wrapper that includes the GIL proof, something like

println!("{:?}", list.format(py));
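
As a rough sketch of that idea (not actual PyO3 API; GilBoundDebug is a made-up name, and it assumes PyAny::repr() and PyString::to_string_lossy() are available):

use std::fmt;
use pyo3::prelude::*;
use pyo3::types::PyAny;

// Hypothetical wrapper: it can only be constructed with a Python token,
// so its Debug impl runs with the GIL provably held.
struct GilBoundDebug<'py> {
    obj: &'py PyAny,
    _py: Python<'py>,
}

impl fmt::Debug for GilBoundDebug<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Calling into the interpreter is fine here: holding Python<'py>
        // means the GIL is held for the lifetime 'py.
        match self.obj.repr() {
            Ok(r) => write!(f, "{}", r.to_string_lossy()),
            Err(_) => f.write_str("<repr() failed>"),
        }
    }
}

// Usage would look roughly like:
// println!("{:?}", GilBoundDebug { obj: list, _py: py });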

kngwyu (Member) commented Oct 26, 2019

@sebpuetz
I use it here, but... you mean you want a handy example that demonstrates allow_threads really does allow threads?
That's a bit difficult to make; let me take some time.

@birkenfeld

It seems that implementing the format traits for Py structs is a bad idea, since they internally call Python code without ensuring that the GIL is held.

That's right, but

  1. println!("{:?}", some_object) is super handy and easy for beginners
  2. It may break lots of code

so... I can't decide right now 🤔

@birkenfeld (Member)

Well, simple: if it can segfault or otherwise circumvent Python API requirements from safe code, it has to go. If PyO3 doesn't want to commit to this, it won't be accepted by the Rust community.

Askannz (Author) commented Oct 27, 2019

println!("{:?}", some_object) is super handy and easy for beginners

Yeah, I agree, being able to debug-print Python objects on-the-fly is really useful. It wouldn't be the end of the world if we had to pass the Python marker, though. To ease the pain, maybe it could be done through a custom macro, something like pyprintln!(py, "{:?}", some_object)? That way you wouldn't be able to call it from allow_threads() because the type bounds on the closure would prevent you from capturing py.
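
A minimal sketch of what such a hypothetical macro could look like (pyprintln! does not exist in PyO3; this is only an illustration of the idea):

// The first argument must be a Python<'_> token, which is the compile-time
// proof that the GIL is held at the call site.
macro_rules! pyprintln {
    ($py:expr, $($fmt:tt)*) => {{
        let _gil_token: pyo3::Python<'_> = $py;
        println!($($fmt)*);
    }};
}

// Usage: pyprintln!(py, "{:?}", some_object);
// Inside allow_threads() there is no `py` to pass, so this cannot be used there.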

But if there are other functions that access the Python interpreter internally, then we need a more general solution, because they would also blow up when called from within allow_threads(). Either those functions need to be rewritten to take a py argument explicitly, or allow_threads() needs to be marked as unsafe with some additional documentation.

Askannz (Author) commented Oct 27, 2019

@sebpuetz allow_threads() is useful if you have threads on the Python side that call Rust code, and you want to run them concurrently. By default, they are blocked by the GIL and only run sequentially.

Here's an example:

Rust:

use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

#[pymodule]
fn my_module(_py: Python, m: &PyModule) -> PyResult<()> {

    #[pyfunction]
    fn expensive_function(py: Python, init: usize) -> usize {
        (0..10000000).fold(init, |acc, v| acc + v)
    }

    m.add_wrapped(wrap_pyfunction!(expensive_function))?;

    Ok(())
}

Python:

import time
from concurrent.futures import ThreadPoolExecutor
from my_module import expensive_function

executor = ThreadPoolExecutor(max_workers=2)

t0 = time.time()
future_1 = executor.submit(expensive_function, 100)
future_2 = executor.submit(expensive_function, 200)
result_1 = future_1.result()
result_2 = future_2.result()
t1 = time.time()

print(f"Time: {t1 - t0}s")

In that example, the Python side spawns 2 separate threads to execute the function, but because the GIL stays locked when executing the Rust code, they cannot run in parallel.

Time: 1.030261516571045s

Now if we wrap the execution in allow_threads():

    #[pyfunction]
    fn expensive_function(py: Python, init: usize) -> usize {
        py.allow_threads(|| {
            (0..10000000).fold(init, |acc, v| acc + v)
        })
    }

Suddenly it's twice as fast:

Time: 0.5150125026702881s

This is because allow_threads() releases the GIL and allows the two Rust functions to run concurrently from two separate Python threads.

The documentation at https://github.com/PyO3/pyo3/blob/master/guide/src/parallelism.md is confusing because it suggests that you need to release the GIL to achieve true parallelism within your Rust code (with your own internal threads), which isn't true. AFAIK you only need it if you have multi-threading on the Python side.
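
To illustrate that last point, here is a sketch (not from the thread; the function name and workload are made up) where plain Rust threads run in parallel even though the GIL stays held, because they never touch the interpreter:

use pyo3::prelude::*;

#[pyfunction]
fn expensive_function_threaded(_py: Python, init: usize) -> usize {
    // Two plain Rust threads; neither closure touches any Python object,
    // so the GIL (still held by the calling thread) does not block them.
    let a = std::thread::spawn(|| (0..5_000_000usize).fold(0, |acc, v| acc + v));
    let b = std::thread::spawn(|| (5_000_000..10_000_000usize).fold(0, |acc, v| acc + v));
    init + a.join().unwrap() + b.join().unwrap()
}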

kngwyu (Member) commented Oct 27, 2019

@birkenfeld
Thank you for your advice. Now I'm inclined to remove Debug from Python objects.

@Askannz
It is a really good example, thanks!

it suggests that you need to release the GIL to achieve true parallelism within your Rust code (with your own internal threads), which isn't true. AFAIK you only need it if you have multi-threading on the Python side.

I checked the CPython code.
Our allow_threads calls drop_gil, which calls MUTEX_UNLOCK.
That macro wraps pthread or the Windows threading API.
Since Rust threads are system threads (e.g. pthreads), does this affect Rust's threading? 🤔
Caveat: I'm sorry if I'm saying anything incorrect; I'm not an expert in this area.

kngwyu (Member) commented Oct 27, 2019

Hmm, but if Debug is unsafe, then almost all ObjectProtocol methods are unsafe too 🤔
I think we should instead mark all PyObjects !Send, like PyRef.
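
A simplified sketch of why that helps (not the actual PyO3 source): allow_threads bounds its closure by Send, so anything marked !Send simply cannot be captured, turning misuse like the repro above into a compile error.

use std::marker::PhantomData;

// Stand-in for a GIL-bound object reference; the raw-pointer PhantomData makes it !Send.
struct GilBound<'py>(PhantomData<*mut ()>, PhantomData<&'py ()>);

// Same shape of bound that allow_threads places on its closure.
fn allow_threads_like<T, F>(f: F) -> T
where
    F: Send + FnOnce() -> T,
    T: Send,
{
    f()
}

fn main() {
    let obj = GilBound(PhantomData, PhantomData);
    // The commented-out line does not compile: `*mut ()` is not Send,
    // so `obj` cannot be captured by the Send closure.
    // allow_threads_like(|| { let _ = &obj; });
    let _ = &obj;
}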

kngwyu (Member) commented Oct 27, 2019

I made them unsendable in #655

kngwyu (Member) commented Oct 31, 2019

Closed via #655
