Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Rust initialisation of Python-allocated bytes #1074

Merged
merged 8 commits into from
Aug 3, 2020
Merged

Added Rust initialisation of Python-allocated bytes #1074

merged 8 commits into from
Aug 3, 2020

Conversation

juntyr
Copy link
Contributor

@juntyr juntyr commented Aug 1, 2020

Adds new_with<F: Fn(&mut [u8])>(py: Python<'_>, len: usize, init: F) -> &PyBytes as suggested by @davidhewitt in #617. This allows initialising new PyBytes in Rust like such:

let py_bytes = PyBytes::new_with(py, 10, |b: &mut [u8]| {
    b.copy_from_slice(b"Hello Rust");
});

Currently, it follows the semantics of PyBytes::new in that it panics if a memory error occurs in Python. Maybe my implementation of that could be improved.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 1, 2020

I have also implemented a more specialised variant

new_with_and_truncate<F: Fn(&mut [u8]) -> usize>(
    py: Python<'_>,
    len: usize,
    init_and_truncate: F,
) -> &PyBytes

where the closure returns the truncated length of the final PyBytes. I think this might be too specific of a use case for this PR, but I can add it as well if you would be interested. This change can be viewed at master...MoritzLangenstein:py_bytes_new_with_and_truncate.

@kngwyu
Copy link
Member

kngwyu commented Aug 1, 2020

With this function, we can get uninitialized PyBytes by

fn do_nothing(_bytes: &mut [u8]) {}
let py_bytes = PyBytes::new_with(py, 10, do_nothing);

So I think simply we should expose unsafe fn uninitialized or so

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for this, looks great to me! A few general points:

  • Can you check if any other objects should support the same API? For example maybe PyString::new_with or PyTuple::new_with. I haven't thought about this too hard, so it might not make sense for them. If it does, would be great to support a uniform API across the whole library!
  • Regarding the truncate form, if you have examples of uses which are common enough then I'd be happy to consider it.
  • Could you please also add a CHANGELOG entry? Something like "Add PyBytes::new_with is sufficient.

@@ -26,6 +26,21 @@ impl PyBytes {
unsafe { py.from_owned_ptr(ffi::PyBytes_FromStringAndSize(ptr, len)) }
}

/// Creates a new Python bytestring object.
/// The uninitialised bytestring must be initialised in the `init` closure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe specifically say the entire uninitialised bytestring?

Also, if you're up for it, adding a little example would be very much appreciated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where and in what format should I add an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added examples in 533c105

/// Panics if out of memory.
pub fn new_with<F: Fn(&mut [u8])>(py: Python<'_>, len: usize, init: F) -> &PyBytes {
let length = len as ffi::Py_ssize_t;
let pyptr = unsafe { ffi::PyBytes_FromStringAndSize(std::ptr::null(), length) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this and the next 4 lines all contain unsafe, could perhaps just refactor to be a single unsafe block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 20d27ff

@davidhewitt
Copy link
Member

With this function, we can get uninitialized PyBytes

Ah, this is a very good point. Is it actually UB if we allow that? Might need to consult cpython docs further. What if we make this function unsafe, or alternatively always zero-initialise the bytes object before calling init?

@juntyr
Copy link
Contributor Author

juntyr commented Aug 1, 2020

With this function, we can get uninitialized PyBytes by

fn do_nothing(_bytes: &mut [u8]) {}
let py_bytes = PyBytes::new_with(py, 10, do_nothing);

So I think simply we should expose unsafe fn uninitialized or so

Would it be possible to use MaybeUninit here? I’ve never tried wrapping an existing value in it (and it will unfortunately take until tonight before I get back to my computer), but maybe we could pass the bytes explicitly uninitialised into the closure and return an initialised slice with the same lifetime from the closure? One problem I could see with that, though, would be that you could then return subslices ... On second thought, default-initialising the bytes or making the function unsafe might be easier.

@kngwyu
Copy link
Member

kngwyu commented Aug 1, 2020

Would it be possible to use MaybeUninit here?

It's basically union {(), T} so not applicable here directly.

On second thought, default-initialising the bytes or making the function unsafe might be easier.
What if we make this function unsafe, or alternatively always zero-initialise the bytes object before calling init?

Both of them are OK, but maybe zero-initialization is better?

@juntyr
Copy link
Contributor Author

juntyr commented Aug 1, 2020

Ok, let’s go with zero-initialisation then? In that case, should we explicitly add support for a zero- and / or an unsafe uninitialised constructor (neither of which would require a closure)? Personally, I can’t think of a use-case for that in Rust code right now, but maybe you might have some ideas.

@davidhewitt
Copy link
Member

How about new_with zero-initializes first, and unsafe new_with_uninit which has a # Safety note pointing out two things:

  • the slice should not be read, because it's uninitialised
  • the entire slice must be written, so that code which follows does not read uninitialised memory

@davidhewitt
Copy link
Member

I'm also happy only exposing the safe API if it's simpler. An extra write to the whole array can't be that bad, can it? :)

@juntyr
Copy link
Contributor Author

juntyr commented Aug 1, 2020

  • Regarding the truncate form, if you have examples of uses which are common enough then I'd be happy to consider it.

I initially came across #617 after having implemented pickling for my wrapper class as suggested in https://gist.github.com/ethanhs/fd4123487974c91c7e5960acc9aa2a77. The data structure I am serialising to bytes is large enough that the copying from Vec to PyBytes can lead to OutOfMemory errors. Therefore, I have a small compacting algorithm which reduces the size of the serde + bincode output. For this algorithm I have an allocation upper bound, which allows me to do the compaction in place. Afterwards, I can then usually truncate the buffer. For me, pushing the allocation of the buffer to Python and being able to truncate it after its initialisation would make the entire serialisation zero-copy. This is probably a very specific use-case.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 1, 2020

* Can you check if any other objects should support the same API? For example maybe `PyString::new_with` or `PyTuple::new_with`. I haven't thought about this too hard, so it might not make sense for them. If it does, would be great to support a uniform API across the whole library!

@davidhewitt I have looked through the native types and think that only str, bytes and bytearray should share this API, as only they have the Py..._FromStringAndSize constructor methods (tuples unfortunately do not, see https://docs.python.org/3/c-api/tuple.html).

For the PyString, there still exists PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size) but setting u to NULL
has been deprecated in favour of PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar). However, to use the latter we would have to know the largest char value, i.e. does it fit into 1-4 bytes, up front. How should we go about this - should we just expose maxchar as a required parameter as well?

@davidhewitt
Copy link
Member

Many thanks, will give this all a thorough read through in the morning!

For the example, you can just add an # Example section to the docstring of the method in question. e.g. like this:

/// # Example

@kngwyu
Copy link
Member

kngwyu commented Aug 2, 2020

For the PyString, there still exists PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size) but setting u to NULL
has been deprecated in favour of PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar). However, to use the latter we would have to know the largest char value, i.e. does it fit into 1-4 bytes, up front. How should we go about this - should we just expose maxchar as a required parameter as well?

I don't think we need PyString::new_with

@programmerjake
Copy link
Contributor

Technically, even making a &mut [u8] that points to uninitialized bytes is potentially undefined behavior even if you don't read from it (assuming Rust decides that uninitialized integers are undefined behavior, they still haven't decided but it's better to be safe than sorry).

I think you may want to use either *mut u8 or &mut [MaybeUninit<u8>], since they are specifically designed to handle pointing to uninitialized data.

}
#[cfg(not(feature = "slice_fill"))]
{
slice.iter_mut().for_each(|x| *x = 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is optimized by the compiler and not so slow.
I don't see any clear reason we should use (unstable) fill here. It's actually for loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 533c105

/// follows does not read uninitialised memory.
///
/// Panics if out of memory.
pub unsafe fn new_with_uninit<F: Fn(&mut [u8])>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, the closure should take &mut [MaybeUninit<u8>] as @programmerjake says.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in 533c105

/// Creates a new Python bytestring object.
/// The `init` closure can initialise the bytestring.
///
/// # Safety
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need #Safety document for safe functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 533c105

@kngwyu
Copy link
Member

kngwyu commented Aug 2, 2020

Thank you for the change. Let's make it simpler and add #Example as David says.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 2, 2020

I have attempted to implement all of your suggestions. One everyone is satisfied with the current API, I would then implement it for PyByteArray as well.

I don't think we need PyString::new_with

PyString would be the only other remaining candidate I could think of, but it probably does not usually transport as much data around as PyBytes. I'll follow your recommendations here.

@kngwyu
Copy link
Member

kngwyu commented Aug 2, 2020

Why do we need new_with_uninit?

@juntyr
Copy link
Contributor Author

juntyr commented Aug 2, 2020

On an unrelated note, could you mention cargo clippy --tests and cargo test --doc (and any other checks made by your CI pipeline) in Contributing.md? Personally, I was, unfortunately, not aware that I had to invoke these explicitly in order to catch all the errors your pipeline does.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 2, 2020

Why do we need new_with_uninit?

We could also remove it entirely - I just included it so we could see what an interface using MaybeUninit would look like.

@kngwyu
Copy link
Member

kngwyu commented Aug 2, 2020

On an unrelated note, could you mention cargo clippy --tests and cargo test --doc (and any other checks made by your CI pipeline) in Contributing.md?

Agreed that clippy --tests should be mentioned, but I think cargo test runs all doc tests by default

We could also remove it entirely - I just included it so we could see what an interface using MaybeUninit would look like.

If you don't have any use case, I think it's better to remove it.

@davidhewitt
Copy link
Member

could you mention cargo clippy --tests and cargo test --doc (and any other checks made by your CI pipeline) in Contributing.md?

Good suggestion, see #1075 - I also plan to extend CONTRIBUTING.md later today.

We could also remove it entirely - I just included it so we could see what an interface using MaybeUninit would look like.

Yeah I was thinking when it was essentially new_with but without the zero-intialisation it made more sense. Now, I think most users will choose to use new_with anyway. OTOH, it shows how complicated it is to get this API well-defined, so maybe it still serves some use? I don't have a strong opinion either way.

/// ```
pub fn new_with<F: Fn(&mut [u8])>(py: Python<'_>, len: usize, init: F) -> &PyBytes {
let (slice, pybytes) = unsafe { Self::new_with_uninit_impl(py, len) };
slice.iter_mut().for_each(|x| *x = MaybeUninit::zeroed());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you choose to remove new_with_uninit, you may find std::ptr::write_bytes a nice alternative to making a MaybeUninit slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion - I’ll keep that in mind.

@davidhewitt
Copy link
Member

Also, on assessment I agree PyString::new_with is probably not so useful (and also I'm not sure it's easy to implement safely; the Python representation of unicode data has lots of representations for different platforms / optimisations).

For PyByteArray::new_with - I agree this can work. It's not documented, but it looks like PyByteArray_FromStringAndSize also returns an uninitialized object if the first argument is null: https://github.com/python/cpython/blob/8963a7f1f84a05412178b56629508b660d38861b/Objects/bytearrayobject.c#L145

Many thanks for your continued revisions to this PR - it's looking really really good 👍

@juntyr
Copy link
Contributor Author

juntyr commented Aug 2, 2020

Agreed that clippy --tests should be mentioned, but I think cargo test runs all doc tests by default

Mhm, maybe I was doing something wrong then. But I checked the docs and you are completely right that cargo test should run documentation tests by default.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 4, 2020

I am sorry that I have to follow up on this discussion, but I came accross an interesting problem. (How) Can we support failure inside the init closure? Right now, we are implicitly assuming that nothing inside it can fail, which might not be the case. One way to resolve this might be to accept a Result<R, E> as a return type of init and to return Result<(&PyBytes, R), E> from new_with. What do you think?

@davidhewitt
Copy link
Member

Ah, great question! We could perhaps modify the return type of init to be PyResult<()>?

Note that because F: FnOnce it's possible to mutate references to other data outside the closure, so I don't think that we need to support an arbitrary return type R.

Im also happy to leave the API it as-is, and let users who need error handling use a similar trick as suggested above for other return values. It'd be less ergonomic, so it depends if we expect users of this API to typically be doing fallible operations inside init.

I guess we the answer is yes, because if they already had the data materialised then PyBytes::new would already be sufficient.

@davidhewitt
Copy link
Member

davidhewitt commented Aug 4, 2020

In fact, reflecting on the original truncate case further, perhaps the return type for init should be PyResult<Option<usize>>, where:

  • if the return value is Ok(Some(x)) then the bytes are truncated to length x
  • if the return value is Ok(None), this is equivalent to the current behaviour
  • if the return value is Err(e) then the bytes are deallocated immediately and the error is returned.

Also, if we change this function to be fallible, I think we should also remove the panic for out-of-memory from this API and related functions.

Does the above seem reasonable?

@juntyr
Copy link
Contributor Author

juntyr commented Aug 4, 2020

I also tried out the approach of mutating an outside variable - it just felt very un-Rusty to mutate a result variable inside a closure so it can be read outside. But for just returning some value, I think mutating an outside variable could be fine.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 4, 2020

In fact, reflecting on the original truncate case further, perhaps the return type for init should be PyResult<Option<usize>>, where:

* if the return value is `Ok(Some(x))` then the bytes are truncated to length x

* if the return value is `Ok(None)`, this is equivalent to the current behaviour

* if the return value is `Err(e)` then the bytes are deallocated immediately and the error is returned.

Also, if we change this function to be fallible, I think we should also remove the panic for out-of-memory from this API and related functions.

Does the above seem reasonable?

That sounds like a very reasonable approach. Should we change the general new such that it returns a result as well?

@davidhewitt
Copy link
Member

Should we change the general new such that it returns a result as well?

Yes please :)

I think when the allocation fails a MemoryError should actually already be already be set by the underlying c apis?

@birkenfeld
Copy link
Member

I don't see why PyO3 should go out of its way not panicking for out-of-memory while the standard library does.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 4, 2020

After fiddling around with the required changes to the API, I agree that returning PyResult from all constructors for PyBytes and PyByteArray makes the code more cumbersome and will just lead to a lot of unwrap()s. We could either remain at not catching any OutOfMemory errors or just catching them when we already return a result (though that semantic inconsistency might then become confusing as well). At the same time, I think forcing the user to think about those error cases also has advantages -- especially if they can forward them.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 4, 2020

If the init closure returns an error, do we need to free the allocated bytes in some way?

@davidhewitt
Copy link
Member

If the init closure returns an error, do we need to free the allocated bytes in some way?

I think you would just need to call Py_DECREF on the pointer instead of py.from_owned_ptr.

@davidhewitt
Copy link
Member

I don't see why PyO3 should go out of its way not panicking for out-of-memory while the standard library does.

That's a reasonable opinion, and I'm happy to leave this as-is if you feel strongly about so. My motivation was that the equivalent Python just raises MemoryError exception, and usually for APIs where we have a Python exception possibly being raised, we return PyResult.

>>> bytearray(10**11)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

From a quick check, at the moment it looks like our only constructors which return PyResult are for PySet / PyFrozenSet and the datetime bindings.

@birkenfeld
Copy link
Member

And Py::new, unfortunately.

@davidhewitt
Copy link
Member

And Py::new, unfortunately.

Yeah, that's a tricky one. While MemoryError is likely to be the error there, it's also possible for arbitrary user code to run e.g. from the tp_alloc slot.

Have you got examples where Py::new returning PyResult is not ergonomic? My general perception is that because #[pyfunction] and #[pymethods] can return PyResult, it's easy to handle APIs that return PyResult with ? unwrapping.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 5, 2020

I think if we wanted to change the API such that out-of-memory panics are caught, it would have to be updated everywhere and would be a significant and breaking change. Allowing the initialisation of PyBytes and PyByteArray to fail is a separate issue. Should I add the error handling (and truncation) to new_with? If so, should new_with still panic on out-of-memory errors or already report them as Err? Since new_with would presumably be most useful in memory-constrained scenaries, I think reporting memory errors would be helpful here.

@davidhewitt
Copy link
Member

Agreed, it's a big change, so let's leave existing constructors as-is for now. Perhaps at some point we can review them across the library including the Py::new case.

Regarding new_with - I'm happy if you open a PR to add the error handling and truncation to it, I think it's useful in memory-constrained situations. And I agree that new_with should return PyErr on out-of-memory - this matches the behaviour of our other fallible constructors as well as being potentially helpful, as you say.

@birkenfeld
Copy link
Member

Yeah, that's a tricky one. While MemoryError is likely to be the error there, it's also possible for arbitrary user code to run e.g. from the tp_alloc slot.

Right, although I would have expected that for the types that are allowed there no fancy tp_alloc is possible.

Have you got examples where Py::new returning PyResult is not ergonomic?

I was using some closure heavy code where it can be a pain to thread the Results through to the toplevel. But I don't think it's a big issue.

@davidhewitt
Copy link
Member

Right, although I would have expected that for the types that are allowed there no fancy tp_alloc is possible.

This is true and I agree, though we might need to refactor some internals. I'll open an issue to discuss.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 5, 2020

One issue I have come across is that testing for a memory error is quite difficult cross-platform. On my machine, the OOM kicks in instead (because the virtual address space allows the allocation) and just kills the program instead. Should we just do a best effort error handling (i.e. if we are lucky enough to encounter the error in Rust we will return an Err, but it might just kill your program as well) or just go back to the "it will panic/abort in some way" semantic? Is there a way to test this behaviour?

@juntyr
Copy link
Contributor Author

juntyr commented Aug 5, 2020

In the test for a failing initialisation (i.e. init returns Err), I would like to assert that the execution has not leaked any objects. Is there a way to do so in the Rust testsuite?

@davidhewitt
Copy link
Member

On my machine, the OOM kicks in instead (because the virtual address space allows the allocation) and just kills the program instead. Should we just do a best effort error handling (i.e. if we are lucky enough to encounter the error in Rust we will return an Err, but it might just kill your program as well) or just go back to the "it will panic/abort in some way" semantic? Is there a way to test this behaviour?

Ah, that's a fair point. I think when we're hitting limits of the machine like this it's hard to guarantee particular execution, so I'm happy to just leave it as "we'll try to give you an error if possible". I'm willing to accept that there probably can't be a test written for this case.

In the test for a failing initialisation (i.e. init returns Err), I would like to assert that the execution has not leaked any objects. Is there a way to do so in the Rust testsuite?

Good question. I'm actually not aware of any C API for this. I wonder if you can call into Python's tracemalloc module to get what you need?

If writing such a test is massively complicated, again I'm willing to forgo a test as long as the implementation is appropriately commented (e.g. // XXX: Make sure to deallocate here or bytes will be leaked!)

@juntyr
Copy link
Contributor Author

juntyr commented Aug 5, 2020

I have found two different ways of calling into tracemalloc:

  1. Using https://docs.python.org/3/c-api/memory.html#tracemalloc-c-api (I have not tried that yet as the relevant ffi is incomplete in pyo3)
  2. Using Python::run: this results in the following import error for tracemalloc: .pyenv/versions/3.8.3/lib/python3.8/lib-dynload/_struct.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyFloat_Type

One slightly unrelated note: when I call Py_REFCNT inside init (which is now called before py.from_owned_ptr(pyptr)), the refcount is 0. Is that the expected behaviour?

@davidhewitt
Copy link
Member

Using Python::run: this results in the following import error for tracemalloc: .pyenv/versions/3.8.3/lib/python3.8/lib-dynload/_struct.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyFloat_Type

Looks like you're using pyenv, so this discussion might be the cause of that: #763

One slightly unrelated note: when I call Py_REFCNT inside init (which is now called before py.from_owned_ptr(pyptr)), the refcount is 0. Is that the expected behaviour?

That's surprising to me, I thought it would be 1. Will need to check the cpython source to figure that out.

@juntyr
Copy link
Contributor Author

juntyr commented Aug 6, 2020

Looks like you're using pyenv, so this discussion might be the cause of that: #763

Thank you for the suggestion - in the end setting config.shared to true seemed to fix it. But from the discussion it was not clear to me where the config was so I had to grep for it. Should that information be included in the README as well or is that a developer-only fix?

@juntyr
Copy link
Contributor Author

juntyr commented Aug 6, 2020

One slightly unrelated note: when I call Py_REFCNT inside init (which is now called before py.from_owned_ptr(pyptr)), the refcount is 0. Is that the expected behaviour?

That's surprising to me, I thought it would be 1. Will need to check the cpython source to figure that out.

Maybe I was just measuring it wrong - anyways, I am now able to produce the behaviour where an allocation of 1MB bytestring without ffi::Py_DECREF has a memory difference >1MB whereas an allocation with ffi::Py_DECREF has a memory difference of >10kB. That should suffice to create a leak test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

memoryview, python-side allocation
5 participants