Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ctypes implementation #2364

Open
wants to merge 57 commits into
base: main
Choose a base branch
from
Open

Ctypes implementation #2364

wants to merge 57 commits into from

Conversation

rodrigocam
Copy link
Contributor

@rodrigocam rodrigocam commented Dec 9, 2020

@darleybarreto and I are implementing the ctypes module. This an initial implementation for review. At this stage, we are focusing on Linux platforms to in the future extend to other platforms like ctypes from cpython does.

@coolreader18
Copy link
Member

I just pushed a commit to get stuff compiling. Excited to review this!

@darleybarreto
Copy link

One thing to note is that some things are directly translation from PyPy, others from CPython's. Also there's that difference I said on Gitter, regarding declaring a class like:

class Custom(_SimpleCData):
   pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: class must define a '_type_' attribute

which here will succeed.
Besides this, I've tried to stay as close as CPython's behavior as possible.

Copy link
Member

@coolreader18 coolreader18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I have so far; I think it would be better to use more of libffi's middle API in ctypes::function, just to avoid as much unsafe as possible for something like ctypes 🙃 . Now that the functions are copied from cpython, it should be easier to refactor into something more idiomatic.

vm/src/stdlib/ctypes/basics.rs Outdated Show resolved Hide resolved
vm/src/stdlib/ctypes/basics.rs Outdated Show resolved Hide resolved
#[pyclass(module = "_ctypes", name = "_CData")]
pub struct PyCData {
_objects: AtomicCell<Vec<PyObjectRef>>,
_buffer: PyRwLock<Vec<u8>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the buffer is supposed to be a raw pointer and a length, so that you can mutate the original data it if you want to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

struct RawBuffer{
    inner: *mut u8,
    size: usize
}

?

@coolreader18 coolreader18 mentioned this pull request Dec 14, 2020
Comment on lines 127 to 128
#[cfg(any(unix, windows, target_os = "wasi"))]
modules.insert("_ctypes".to_owned(), Box::new(ctypes::make_module));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of duplicating same cfg, it can be grouped like:

    #[cfg(any(unix, windows, target_os = "wasi"))]
    {
        modules.insert(os::MODULE_NAME.to_owned(), Box::new(os::make_module));
        modules.insert("_ctypes".to_owned(), Box::new(ctypes::make_module));
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I don't know if ctypes would be available on wasi, I don't think it has dll functionality yet

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would libffi work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, wasm doesn't really have any sort of support for loading another wasm module at runtime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe interesting, Blazor does load DLLs at runtime from what I can see (from this blog). How feasible is something similar for Rust?

}
void => {
ptr::null_mut()
arg(&ptr::null::<usize>())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replace this for

ptr::null::<c_void>()

since it seemed more "correct", which I'changed after committing this. So in a next change batch it will be replaced.

@@ -252,11 +242,17 @@ struct PyCDataBuffer {
// This trait will be used by all types
impl Buffer for PyCDataBuffer {
fn obj_bytes(&self) -> BorrowedValue<[u8]> {
PyRwLockReadGuard::map(self.data.borrow_value(), |x| x.as_slice()).into()
PyRwLockReadGuard::map(self.data.borrow_value(), |x| unsafe {
slice::from_raw_parts(x.inner, x.size)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is copying things

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it isn't. slice is a sort of view

@@ -265,14 +261,21 @@ impl Buffer for PyCDataBuffer {
&self.options
}
}
pub struct RawBuffer {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coolreader18 is this what you have in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safe expression of this type in Rust is Box<[u8]> which is exactly equivalent to this struct. Is this intended work to avoid somethiing like ownership check? I think there must be a better way.

}
pointer => {
usize::try_from_object(vm, obj).unwrap() as *mut c_void
arg(&usize::try_from_object(vm, obj).unwrap())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also replace this with

arg(&(usize::try_from_object(vm, obj)? as *mut usize as *mut c_void))

buffer
.obj_bytes()
.chunks(4)
.map(|c| NativeEndian::read_u32(c))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use u32::from_ne_bytes for this; also, there's already a byteorder dependency in vm/Cargo.toml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what function is this supposed to be? CArray doesn't have a value property from what I can see

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use u32::from_ne_bytes for this; also, there's already a byteorder dependency in vm/Cargo.toml

Oh boy, I missed that one, I've found the btand le counterparts. I can simply ignore the whole ByteOrder crate, then.

Also, what function is this supposed to be? CArray doesn't have a value property from what I can see

These properties come from it's metaclass (PyCArrayType_Type), which are CharArray_getsets and WCharArray_getsets

fn obj_bytes(&self) -> BorrowedValue<[u8]> {
// @TODO: This is broken
PyRwLockReadGuard::map(self.data.borrow_value(), |x| unsafe {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This need to be fixed, "casting" the associated type somehow

@Pluriscient
Copy link
Contributor

What are the major blockers for this PR? I think I want to give this a try :)

@darleybarreto
Copy link

What are the major blockers for this PR? I think I want to give this a try :)

Unfortunately there are several problems with the state of things and I haven't been able to continue working on this (and will not be able to do so for a while). The main problem is how it behaves differently from CPython's and PyPy's, a meta class should be added for Struct, CSimpleType, Pointer, ... (all but CFuntion), so that you can do c_int * 10 and return a proper array. In the current impl type(CSimpleType) == type, which does not has a __mul__ as required for a c_int, for instance.

@youknowone youknowone force-pushed the ctypes branch 5 times, most recently from 3334aa9 to 4725a80 Compare August 6, 2021 16:10
@darleybarreto
Copy link

darleybarreto commented Aug 6, 2021

@youknowone I tried to sync with master a while ago, but there was a problem and I got too busy to look at that again, so thanks for that! Are you planning into work on this? If so, I think there are some tests I ported from CPython that I altered just to see if the implementation was working-ish.

To be fully compatible to CPython and PyPy, the types must have a metaclass that's not PyType. I tried to work something there, but I couldn't find a solution. This happens because a base type has to implement some ops at type level, eg, c_int * 8 should call __mul__ from its meta (a yet non-existing class PrimitiveTypeMeta which is a subclass of PyType) and return a c_array.

I am happy to help with discussion and pointers, but I'm afraid I don't have much time to code :/

@youknowone
Copy link
Member

@darleybarreto Hi, thank you for the comment. I am not planning for working on ctypes at the moment, but looking for a way to merge it as it is for future contributors - because you already worked really a lot of it - if it doesn't break things too much. How do you think about it?

@darleybarreto
Copy link

You mean merging as is?

@youknowone
Copy link
Member

yes, if we can manage it not to break CI

@darleybarreto
Copy link

I think that we need to do a few things to help future contributions in this code before merging:

  1. Refactor types structure.
  2. Documentation.
  3. A thorough review to improve code quality.

Undoubtedly 1 is the major blocker here. Since I don't know much of the internals of RustPython, I wasn't able to solve this problem. Perhaps you can give me pointers and I can work it out. PyPy does is basically this way:

class _CDataMeta(type):
    ...
    def __mul__(self, length):
        return create_array_type(self, length)
    ...

class SimpleType(_CDataMeta):
    ...

class SimpleCData(..., metaclass=SimpleType):
    ...

I couldn't find a way to make type(SimpleCData) be SimpleType instead of type.

Point 2 is extremely important since this code is basically too experimental and not that much well written. I could gladly do it based on PyPy, CPython and documentations.
Point 3 is somewhat important because I am not a very experienced Rust programmer, so if anyone could help me writing idiomatic and performant Rust, I would appreciate!

@youknowone
Copy link
Member

Thank you for detailed explanation. The metaclass issue is recently solved. Here is the test:

$ cat t.py 
class _CDataMeta(type):
    def __mul__(self, length):
        return create_array_type(self, length)

class SimpleType(_CDataMeta):
    pass

class SimpleCData(object, metaclass=SimpleType):
    pass


print(type(SimpleCData))
assert type(SimpleCData) == SimpleType
$ cargo run t.py 
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/rustpython t.py`
<class '__main__.SimpleType'>

I will look in the code for 3

@darleybarreto
Copy link

darleybarreto commented Aug 7, 2021

Oh, I'm sorry, I meant to do the same thing in the Rust side. CPython does it in the C side of things by manually filling the type:

#define MOD_ADD_TYPE(TYPE_EXPR, TP_TYPE, TP_BASE) \
    do { \
        PyTypeObject *type = (TYPE_EXPR); \
        Py_SET_TYPE(type, TP_TYPE); \
        type->tp_base = TP_BASE; \
        if (PyModule_AddType(mod, type) < 0) { \
            return -1; \
        } \
    } while (0)

MOD_ADD_TYPE(&Simple_Type, &PyCSimpleType_Type, &PyCData_Type);

Here PyCSimpleType_Type is the meta and PyCData_Type is the base class.

vm/src/stdlib/ctypes/array.rs Outdated Show resolved Hide resolved
}
}

fn slice_adjust_size(length: isize, start: &mut isize, stop: &mut isize, step: isize) -> isize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't compare precisely, but this function might be a duplication of inner_indices in slice.rs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have some resemblance, but are different nonetheless.

vm/src/stdlib/ctypes/array.rs Outdated Show resolved Hide resolved
vm/src/stdlib/ctypes/array.rs Outdated Show resolved Hide resolved
_type_: new_simple_type(Either::A(&outer_type), vm)?.into_ref(vm),
_length_: length,
_buffer: PyRwLock::new(RawBuffer {
inner: Vec::with_capacity(length * itemsize).as_mut_ptr(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this line is safe. The new Vec will be removed right after this line and the given pointer will be a dangling pointer.

// @TODO: Is this copying?

let buffered = if copy {
unsafe { slice::from_raw_parts_mut(buffer.as_mut_ptr(), buffer.len()) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a copy

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here

let buffered = if copy {
    unsafe { slice::from_raw_parts_mut(buffer.as_mut_ptr(), buffer.len()) }
        .as_mut_ptr()
} else {
    buffer.as_mut_ptr()
};

The idea is to copy the bytes from the buffer if copy, or return a view otherwise. Should I use to_contiguous to copy it?

#[pyimpl]
pub trait PyCDataFunctions: PyValue {
#[pymethod]
fn size_of_instances(zelf: PyRef<Self>, vm: &VirtualMachine) -> PyResult<PyObjectRef>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn size_of_instances(zelf: PyRef<Self>, vm: &VirtualMachine) -> PyResult<PyObjectRef>;
fn size_of_instances(zelf: PyRef<Self>, vm: &VirtualMachine) -> PyResult;

PyResult == PyResult<PyObjectRef>

@@ -252,11 +242,17 @@ struct PyCDataBuffer {
// This trait will be used by all types
impl Buffer for PyCDataBuffer {
fn obj_bytes(&self) -> BorrowedValue<[u8]> {
PyRwLockReadGuard::map(self.data.borrow_value(), |x| x.as_slice()).into()
PyRwLockReadGuard::map(self.data.borrow_value(), |x| unsafe {
slice::from_raw_parts(x.inner, x.size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it isn't. slice is a sort of view

PyCDataFunctions::alignment_of_instances(zelf.into_ref(vm), vm)
}
Either::B(obj) if obj.has_class_attr("alignment_of_instances") => {
let size_of = vm.get_attribute(obj, "alignment_of_instances").unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let size_of = vm.get_attribute(obj, "alignment_of_instances").unwrap();
let size_of = vm.get_attribute(obj, "alignment_of_instances")?;

is this unwrap() intended?

Comment on lines 281 to 282
if vm.isinstance(&argtypes, &vm.ctx.types.list_type).is_ok()
|| vm.isinstance(&argtypes, &vm.ctx.types.tuple_type).is_ok()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if vm.isinstance(&argtypes, &vm.ctx.types.list_type).is_ok()
|| vm.isinstance(&argtypes, &vm.ctx.types.tuple_type).is_ok()
if vm.isinstance(&argtypes, &vm.ctx.types.list_type).and_then(|_| vm.isinstance(&argtypes, &vm.ctx.types.tuple_type)).map_err(|e| vm.new_type_error(format!(
"_argtypes_ must be a sequence of types, {} found.",
argtypes.to_string()
)))? {

@youknowone
Copy link
Member

youknowone commented Aug 7, 2021

by watching rustpython_vm::builtins::pytype::new, I feel like that'd be simply done by replacing typ field in PyObject, but not sure it actually will be that easy.
Because I don't understand this patch that much, would you make specific place in this PR that you need those type stuff? Then I will fill the fitting code there.

@darleybarreto
Copy link

would you make specific place in this PR that you need those type stuff?

Let me get the CPython definitions as example

MOD_ADD_TYPE(&Struct_Type, &PyCStructType_Type, &PyCData_Type);
MOD_ADD_TYPE(&Union_Type, &UnionType_Type, &PyCData_Type);
MOD_ADD_TYPE(&PyCPointer_Type, &PyCPointerType_Type, &PyCData_Type);
MOD_ADD_TYPE(&PyCArray_Type, &PyCArrayType_Type, &PyCData_Type);
MOD_ADD_TYPE(&Simple_Type, &PyCSimpleType_Type, &PyCData_Type);
MOD_ADD_TYPE(&PyCFuncPtr_Type, &PyCFuncPtrType_Type, &PyCData_Type);

Here we have MOD_ADD_TYPE(&Class, &MetaClass, &BaseClass) right? Looking at Simple_Type (in RustPython this is PySimpleType in primitive.rs), we see that its metaclass implements the __mul__, so we can have c_int * 8 just fine, where c_int is subclass of Simple_Type. I used a trait PyCDataSequenceMethods in basics.rs to implement the __mul__, but doing impl PyCDataSequenceMethods for PySimpleTypeMeta {} in primitive.rs doesn't work because PySimpleTypeMeta is not the metaclass of PySimpleType.

@darleybarreto
Copy link

I'm not sure why, but I cannot comment on the following reply

The safe expression of this type in Rust is Box<[u8]> which is exactly equivalent to this struct. Is this intended work to avoid somethiing like ownership check? I think there must be a better way.

The idea of RawBuffer is to be a "view" of the external buffer, so when needed, we copy the bytes from that buffer. I think there's one case we need to copy data and another where we need a view.

@youknowone
Copy link
Member

What happens when the buffer is destroyed? Does it deletes only the pointer(as a reference) or also deletes the data it contains (as an owner)? If it does both, then does it need a flag to distinguish them?

@darleybarreto
Copy link

I forgot where these two functions are used, but based on the docs:
from_buffer_copy: C.from_buffer_copy(object, offset=0) -> C instance\ncreate a C instance from a readable buffer
from_buffer: C.from_buffer(object, offset=0) -> C instance\ncreate a C instance from a writeable buffer

@darleybarreto
Copy link

Don't worry. Thanks for helping me on this :)

@darleybarreto
Copy link

@youknowone Could you kindly help me with something else? How can I fetch your changes in this MR?

@youknowone
Copy link
Member

I pushed to your branch. So if you named your repository as origin, like when result of git remote -v is origin https://github.com/rodrigocam/RustPython.git,

git fetch origin && git reset --hard origin/ctypes will work. Note that this command will remove your entire local changes.

@darleybarreto
Copy link

Note that this command will remove your entire local changes.

Yeah, there wasn't anything different anyways. Thanks!

@youknowone
Copy link
Member

I pushed to your repo directly since I thought you are not active for this PR anymore, but now I know you come back. So I will make PRs for suggestion from now.

Here is another refactoring suggestion: rodrigocam#4

@darleybarreto
Copy link

Nice, thanks!

submodule extension for ctypes::dll
@darleybarreto
Copy link

@youknowone So the idea is to do that for all files in ctypes? Using my_filename::extend_module?

@youknowone
Copy link
Member

youknowone commented Aug 7, 2021

Yes, that's possible to adapt to all files. It is up to you for your convenience. dll module looked good to use it. And I am pretty sure every file can use it. But you don't need to use it if you don't feel like to do.

@darleybarreto
Copy link

@youknowone I basically patched all your comments, you wrote:

no, it isn't. slice is a sort of view

For this code

fn obj_bytes(&self) -> BorrowedValue<[u8]> {
    PyRwLockReadGuard::map(self.data.borrow_value(), |x| unsafe {
        slice::from_raw_parts(x.inner, x.size)
    })
    .into()
}

What did you meant there?

@youknowone
Copy link
Member

There was a question asking if it does copy. The comment is an answer about it. Creating an slice object doesn't copy anything.

@darleybarreto
Copy link

darleybarreto commented Aug 8, 2021

I added a C file that should be compiled as a shared lib and bundled together with the interpreter. Its _ctypes_test.c. How would one make this happen? Would it be some sort of build script using cc? And we should be able to do this:

import _ctypes_test

@youknowone
Copy link
Member

build.rs file can contain any build script. Using cc crate will be helpful.

@darleybarreto
Copy link

@youknowone I haven't figured out a nice way to implement the from_buffer. from_buffer_copy is straight forward, you simply create a new instance (default args for both tp_new and init) and copy the bytes from the value to create an instance. For example:

data = b'data'
ubyte = c_ubyte * len(data)
byteslike = ubyte.from_buffer_copy(data) # an instance of ubyte made from data by copying things

from_buffer does a similar thing, but it doesn't copy things, it points to the source. So if you change byteslike, you change data.

My implementation of from_buffer_copy is something in these lines

#[pyclassmethod]
fn from_buffer_copy(
    cls: PyRef<Self>,
    obj: PyObjectRef,
    offset: OptionalArg,
    vm: &VirtualMachine,
) -> PyResult {
    let buffer = try_buffer_from_object(vm, &obj)?;
    let opts = buffer.get_options().clone();

    let (size, offset) = Self::buffer_check(cls, opts, offset, vm); //ignore this for now

    let src_buffer = buffer.obj_bytes();
    let empty_instance = ... //creates a new empty instance
    let dst_buffer = empty_instance.obj_bytes_mut();

    dst_buffer.copy_from_slice(&src_buffer[offset..offset+size]);

    Ok(empty_instance.as_object().clone())
}

So the idea is simply get the bytes from obj (it should implement the buffer protocol) and point to them. How would I point to the data in obj instead of copying it? Should I use some refcounted type to avoid pointing to dangling things? obj should always be a buffer with write access.

@youknowone
Copy link
Member

I don't know well about your requirements, but I guess you can do same way as how PyMemoryView does. Or even just use it. it just holds a PyBufferRef and copy the data only when it needs to be. It also locks PyBufferRef when the lock is required.

@youknowone
Copy link
Member

I think this PR is (practically) not possible to rebase anymore in this state. Do you mind if we squash the commits into single commit for rebase?

@darleybarreto
Copy link

I think this PR is (practically) not possible to rebase anymore in this state. Do you mind if we squash the commits into single commit for rebase?

Not at all :)

I'm afraid I can't contribute in the following months. I can help others eventually, tho.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants