Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes should translate to &[u8] #373

Closed
pganssle opened this issue Feb 26, 2019 · 11 comments · Fixed by #637
Closed

bytes should translate to &[u8] #373

pganssle opened this issue Feb 26, 2019 · 11 comments · Fixed by #637

Comments

@pganssle
Copy link
Member

pganssle commented Feb 26, 2019

Per a discussion in the gitter chat, it seems that currently the proper traits are not implemented to allow for Python's bytes objects to be automatically translated into &[u8] parameters. Right now, in order to write a function that takes bytes, you need to use &PyBytes:

#[pyfunction]
fn print_bytes(_py: Python, x: &PyBytes) {
    println!("{:?}", x.as_bytes())
}

If you instead use &[u8] directly, there's an error at compile time because the pyo3::PyTypeInfo trait is not implemented for [u8]. I think it should be possible to pass both a bytes and bytearray object into a Rust function that takes &[u8].

I am using version 0.6.0-alpha.4, and with this code:

#[pyfunction]
fn print_bytes(_py: Python, x: &[u8]) {
    println!("{:?}", x.as_bytes())
}

I get this error message:

error[E0277]: the size for values of type `[u8]` cannot be known at compilation time
  --> src/lib.rs:36:1
   |
36 | #[pyfunction]
   | ^^^^^^^^^^^^^ doesn't have a size known at compile-time
   |
   = help: the trait `std::marker::Sized` is not implemented for `[u8]`
   = note: to learn more, visit <...>
   = note: required because of the requirements on the impl of
           `pyo3::FromPyObject<'_>` for `&[u8]`

error[E0277]: the trait bound `[u8]: pyo3::PyTypeInfo` is not satisfied
  --> src/lib.rs:36:1
   |
36 | #[pyfunction]
   | ^^^^^^^^^^^^^ the trait `pyo3::PyTypeInfo` is not implemented for `[u8]`
   |
   = note: required because of the requirements on the impl
           of `pyo3::PyTryFrom<'_>` for `[u8]`
   = note: required because of the requirements on the impl
           of `pyo3::FromPyObject<'_>` for `&[u8]`

error: aborting due to 2 previous errors

CC @thedrow

@konstin
Copy link
Member

konstin commented Feb 26, 2019

I think the second example was meant to contain a different snippet

@pganssle
Copy link
Member Author

@konstin Oops, sorry about that, fixed.

@Hywan
Copy link
Contributor

Hywan commented Apr 12, 2019

Would be a nice feature :-). Right now, the workaround I've is the following:

pub fn foo(bytes: &PyAny) -> PyResult<bool> {
    match <PyBytes as PyTryFrom>::try_from(bytes) {
        Ok(bytes) => /* do something */,
        _ => Ok(false)
    }
}

@konstin
Copy link
Member

konstin commented Apr 17, 2019

If someone wants to implement this,PyBytes_AsStringAndSize (docs, bindings) and PyString::as_bytes should be a good starting point:

pyo3/src/types/string.rs

Lines 50 to 61 in fdeef7d

/// Get the Python string as a byte slice.
#[inline]
pub fn as_bytes(&self) -> &[u8] {
unsafe {
let mut size: ffi::Py_ssize_t = mem::uninitialized();
let data = ffi::PyUnicode_AsUTF8AndSize(self.0.as_ptr(), &mut size) as *const u8;
// PyUnicode_AsUTF8AndSize would return null if the pointer did not reference a valid
// unicode object, but because we have a valid PyString, assume success
debug_assert!(!data.is_null());
std::slice::from_raw_parts(data, size as usize)
}
}

@ExpHP
Copy link

ExpHP commented Jul 15, 2019

You can use a Python to modify a bytearray at any time, so &[u8] is incorrect for this type. The best you can do is &[Cell<u8>].

@konstin
Copy link
Member

konstin commented Jul 16, 2019

Indeed, only bytes should be convertible to &[u8].

@pganssle
Copy link
Member Author

You can use a Python to modify a bytearray at any time, so &[u8] is incorrect for this type. The best you can do is &[Cell<u8>].

@konstin @ExpHP I am not sure I agree with this, can you clarify exactly what the objection is? In both cases you get an immutable view into a memory array, so it's the difference between &mut [u8] (bytearray) and &[u8] (bytes). In Rust you are allowed to coerce mutable references to immutable references:

fn some_func(y: &[u8]) {
    println!("{:?}", y);
}

fn main() {
    let mut x : [u8; 4] = [0, 0x55, 0x10, 0xff];
    some_func(&mut x);
}

So it makes sense to allow conversion from bytearray to &[u8]. I agree that you should be able to pass a bytearray to a function that takes &[Cell<u8>], though, but that you should not be able to pass a bytes object to that function.

@birkenfeld
Copy link
Member

@pganssle Rust can and will assume that memory referenced by &[u8] does not change while you hold the reference. Code like this

let x: &[u8] = python_bytearray.as_bytes();
modify(&python_bytearray);
println!("{:?}", x);

is now problematic, for example because the compiler might decide to do the read from x before the call to modify.

@pganssle
Copy link
Member Author

@birkenfeld I suppose it's a fair point, though it depends heavily on how the implementation works. If it's indeed just passing a reference to some memory that Python can modify while Rust holds a reference to it, then it's a soundness problem. If it's copying the memory into some intermediate representation managed by Rust, it's not a problem.

I haven't looked enough into the implementation of #[pyfunction] to know how easy this would be, but presumably it's also possible to have bytearray automatically converted to some Rust-managed array even if bytes just returns a reference.

@ExpHP
Copy link

ExpHP commented Jul 17, 2019

&mut in rust does not mean mutable, it means unique. This is inherently false for anything obtained from python. (I think it's more or less trivial to show that pyo3's ability to define &mut self methods on python classes (likewise it's choice to use &mut self for __iadd__, etc.) is unsound)


Here is proof that even the existing fn data(&PyByteArray) -> &[u8] method invokes undefined behavior:

extern crate pyo3;
use pyo3::prelude::{PyResult, Python};
use pyo3::types::{PyDict, PyByteArray};

fn main() -> Result<(), ()> {
    let gil = Python::acquire_gil();
    show_ub(gil.python()).map_err(|e| {
        eprintln!("error! :{:?}", e);
        e.print_and_set_sys_last_vars(gil.python());
    })
}

// a function that behaves differently in dev and release builds
fn show_ub(py: Python<'_>) -> PyResult<()> {
    let dict = PyDict::new(py);
    let array = PyByteArray::new(py, &[2]);
    dict.set_item("b", array)?;

    let bytes: &[u8] = array.data();
    lol(&bytes[0], &|| py.run("b[0] = 3", None, Some(&dict)).unwrap());

    // we should never make it here, because 2 != 3
    unreachable!("uh oh, how did we get here!?");
}

#[inline(never)]
fn lol(x: &u8, func: &dyn Fn()) {
    let old = *x;
    func();
    let new = *x;
    assert_eq!(old, new);
}

Results on debug:

   Compiling bbb v0.1.0 (/home/lampam/cpp/throwaway/bbb)
    Finished dev [unoptimized + debuginfo] target(s) in 0.43s
     Running `target/debug/bbb`
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `3`', src/main.rs:31:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

Results on release: The assert_eq! gets optimized away:

   Compiling bbb v0.1.0 (/home/lampam/cpp/throwaway/bbb)
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/bbb`
thread 'main' panicked at 'internal error: entered unreachable code: uh oh, how did we get here!?', src/main.rs:23:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

@konstin
Copy link
Member

konstin commented Jul 18, 2019

(I think it's more or less trivial to show that pyo3's ability to define &mut self methods on python classes (likewise it's choice to use &mut self for iadd, etc.) is unsound)

Yep, #342 should fix that. On looking at #342 again I've realized that @athre0z had already pointed that out over there. I'm sorry for missing that, I initially ignored the comments because I thought they were only more evidence for the &mut self situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants