Support exporting Rust enums to Python (full ADTs) #417

Open
thedrow opened this issue Mar 24, 2019 · 58 comments

@thedrow
Contributor

thedrow commented Mar 24, 2019

This issue replaces the closed #131.

We need to provide a way to export an enum to Python.

@programmerjake
Contributor

I haven't checked, but doing the same thing as boost python might work

@birkenfeld
Member

well, C++ enums are a fair bit different from Rust enums.

bors bot added a commit to wasmerio/wasmer-python that referenced this issue Jan 24, 2020
116: feat(module) `Module.exports` uses the new `ExportKind` int enum for the `kind` value r=Hywan a=Hywan

This patch creates a new `ExportKind` int enum defined as:

```python
class ExportKind(IntEnum):
    FUNCTION = 1
    MEMORY = 2
    GLOBAL = 3
    TABLE = 4
```

Since pyo3 doesn't have a mapping from Rust to Python enum (cf PyO3/pyo3#417), we had to find a workaround to create it. I believe the approach taken in this PR isn't too ugly. The enum is created with [the functional API](https://docs.python.org/3/library/enum.html#functional-api), such as:

```python
IntEnum("ExportKind", "FUNCTION MEMORY GLOBAL TABLE")
```

Variants order and variant names are defined by implementations on `ExportKind` in Rust.

Then, this new `ExportKind` enum is used in `Module.exports` as values of the `kind` pair.
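
As a quick check of the workaround described above, the following sketch shows what the functional API call produces in plain Python. With a space-separated string of names, members are numbered from 1 in the order listed, so the result lines up with the explicit class definition. The `values` dict is only there for illustration.

```python
from enum import IntEnum

# Functional API: members are numbered from 1 in the order given.
ExportKind = IntEnum("ExportKind", "FUNCTION MEMORY GLOBAL TABLE")

# Collect name -> value pairs to compare against the class-based definition.
values = {member.name: member.value for member in ExportKind}
print(values)
```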

Co-authored-by: Ivan Enderlin <ivan.enderlin@hoa-project.net>
@gilescope
Contributor

We should split this into two issues. One is supporting old-school enums and the other is supporting ADTs. I'd suggest that, for now, this issue should focus on a pleasant way to define a Python Enum in Rust, and if any variant of the Rust enum carries state, we say that is not currently supported.

@davidhewitt
Member

Agreed with the above

@programmerjake
Contributor

@davidhewitt davidhewitt changed the title Support exporting Rust enums to Python Support exporting Rust enums to Python (full ADTs) Mar 25, 2020
@davidhewitt
Member

I've made this the full ADT issue and created #834 for tracking the simple case.

@gilescope
Contributor

gilescope commented Mar 26, 2020 via email

@gilescope
Contributor

For full ADTs I would initially limit it to one pyclass per variant.

@gilescope
Contributor

This is now released in 0.12 if people want to try it out.

@thedrow
Contributor Author

thedrow commented Oct 12, 2020

In that case, we should close this issue, no?

@davidhewitt
Member

To clarify - in 0.12 we added #[derive(FromPyObject)] which lets you convert a Python type into a Rust enum. Having a full two-way binding analogue equivalent to #[pyclass] would be even more awesome, but it's not clear what the design would be.

@gilescope
Contributor

Well, the initial feedback seems to be that it works well enough (one less big switch statement - yay),
but if we're going to do that shouldn't we also support #[derive(IntoPy)] for enums?

@davidhewitt
Member

davidhewitt commented Oct 13, 2020

I would be very open to having #[derive(IntoPy)] for enums.

I made such a comment on the FromPyObject PR, but we did agree that it feels easier to use .into_py() to handle the to-python case than it does handling the FromPyObject with an if-let chain.

Also the complication with the #[derive(IntoPy)] case is that it's not clear what Python structures everything should map to. E.g.

#[derive(IntoPy)]
enum MyEnum {
    Str(String),
    Foo { bar: Bar, qux: i32 },
}

it seems clear enough that MyEnum::Str would map into a Python str object, but what does MyEnum::Foo become? Maybe a dict with bar and qux keys?

Design opinions are very welcome from anyone with a use case for this.
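
To make the question concrete, here is a pure-Python sketch of what a caller would have to write if `MyEnum::Foo` did map to a dict. The `describe` helper and the sample values are hypothetical, and `bar` is shown as a plain string only for illustration. Nothing here is existing PyO3 behaviour.

```python
def describe(value):
    """Tell the two hypothetical Python shapes of MyEnum apart."""
    if isinstance(value, str):
        # MyEnum::Str(String) -> str
        return f"Str({value!r})"
    if isinstance(value, dict) and value.keys() == {"bar", "qux"}:
        # MyEnum::Foo { bar, qux } -> dict with one key per field
        return f"Foo(bar={value['bar']!r}, qux={value['qux']})"
    raise TypeError(f"not a MyEnum value: {value!r}")


print(describe("hello"))
print(describe({"bar": "some Bar", "qux": 7}))
```

One consequence of this mapping is that the variant name is lost on the Python side: two struct-like variants with the same field names could not be told apart by inspecting the dict.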

@ethanhs
Contributor

ethanhs commented Dec 14, 2020

I quite like the idea of using a dict for the MyEnum::Foo case. I'm not keen on having to create a new type for each enum variant (especially if there are a ton of variants).

@jovenlin0527
Contributor

Python introduced a new pattern matching syntax in 3.10, and I think if we can make Rust enum support that, this will be as close to ADT as we can get.

According to PEP 634, class patterns in Python are actually just a series of isinstance checks. To support that we have to create an invisible class for each variant. Going down this way needs more design work.

  • If each variant has a corresponding PyClass, then we need a way to implicitly convert that class back to the enum. We could do this by implementing FromPyObject for the enum, but FromPyObject is implemented for every PyClass, so the enum cannot be a PyClass. However if it's not a PyClass, then we can't have a Py<Enum>.
  • pyenum will behave a lot like pyclass, like relationships between Rust enum and struct, and we will need to think about how to avoid code duplication.

@davidhewitt
Member

Thanks, I see you have opened #2002 to get us moving towards this!

I think as well as isinstance there's also something to do with __match_args__? We may need to support that. I haven't played around with 3.10's pattern matching at all myself yet.

If each variant has a corresponding PyClass, then we need a way to implicitly convert that class back to the enum. We could do this by implementing FromPyObject for the enum, but FromPyObject is implemented for every PyClass, so the enum cannot be a PyClass. However if it's not a PyClass, then we can't have a Py<Enum>.

I think we definitely need to create a class for the Enum, so we can have Py<Enum>.

I'm not sure (and it may depend on Python's pattern matching logic) whether we want to have a method for each variant on the Enum class, or we want each variant to be a subclass of the original enum.

Thinking about it from a user perspective, I think we want an enum like this in Rust:

enum Point {
    TwoD { x: f32, y: f32 },
    ThreeD { x: f32, y: f32, z: f32 }
}

to use in Python as

point = Point.TwoD(x=1.0, y=3.4)

match point:
    case Point.TwoD(x, y):
        print(f"Got 2D point ({x}, {y})")
    case Point.ThreeD(x, y, z):
        print(f"Got 3D point ({x}, {y}, {z})")

@programmerjake
Contributor

an expanded example with what I'd expect:

pub enum MyEnum {
    A,
    B(),
    C {},
    D(i8, i16),
    E {a: u8, b: String},
}

should be matchable like so:

match value:
    case MyEnum.A: # A is an instance of MyEnum, just like Python enums
        pass
    case MyEnum.B(): # B is a subclass of MyEnum
        pass
    case MyEnum.C(): # C is a subclass of MyEnum
        pass
    case MyEnum.D(x, y): # D is a subclass of MyEnum
        pass
    case MyEnum.E(a=x, b=y): # E is a subclass of MyEnum
        pass

@jovenlin0527
Contributor

If each variant has a corresponding PyClass, then we need a way to implicitly convert that class back to the enum. We could do this by implementing FromPyObject for the enum, but FromPyObject is implemented for every PyClass, so the enum cannot be a PyClass. However if it's not a PyClass, then we can't have a Py<Enum>.

I think we definitely need to create a class for the Enum, so we can have Py<Enum>.

I'm not sure (and it may depend on Python's pattern matching logic) whether we want to have a method for each variant on the Enum class, or we want each variant to be a subclass of the original enum.

When we implement each Variant as a separate class, we want them to be invisible. Specifically, when the user writes a fn in PyO3, they cannot put any Variant in their function signature. That means PyO3 will have to turn Variants into Enums for the user.

Subclassing the Enum handles this very cleanly. (Didn't realize we can do that...) If we don't subclass the Enum, we will have to write a custom FromPyObject for the Enum. This leads to unnecessary complexity, especially when there are so many competing implementations of FromPyObject related to PyClass that we need to be very careful to avoid conflicting implementations.

@programmerjake
Contributor

you can have custom metaclasses in the C API:
just set ob_type in the class definition:
https://docs.python.org/3.10/c-api/typeobj.html#c.PyObject.ob_type

@vultix
Contributor

vultix commented Aug 12, 2022

Got it. I think I see how to do that, I'll play with it and see if I can get it to work

@vultix
Contributor

vultix commented Aug 12, 2022

Alright, I've successfully set the metaclass to a custom class. Now I'm trying to figure out how to get __instancecheck__ of the metaclass to work.

Is there a specific slot I should use for the method? A specific signature?

Edit: fixed typo __isinstance__ to __instancecheck__

@mejrs
Member

mejrs commented Aug 12, 2022

you can have custom metaclasses in the C API: just set ob_type in the class definition: https://docs.python.org/3.10/c-api/typeobj.html#c.PyObject.ob_type

Then you end up writing code like this. Also, wouldn't that make this entire feature unavailable on limited api/PyPy?

@programmerjake
Contributor

Is there a specific slot I should use for the method? A specific signature?

iirc there isn't a specific slot, just add a method __instancecheck__(self: YourMetaclass, instance: Any) -> bool
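
In pure Python the same hook looks like the sketch below. It assumes one particular design, which is only a guess at what is being built here: every value is an instance of the base class and carries a tag, and the variant classes answer `isinstance` by comparing that tag. `VariantMeta`, `_tag`, `A` and `B` are hypothetical names.

```python
class VariantMeta(type):
    """Metaclass whose classes decide isinstance() from the value's tag."""

    def __instancecheck__(cls, instance):
        tag = getattr(instance, "_tag", None)
        if cls.__name__ == "MyEnum":
            # The base class accepts any tagged value.
            return tag is not None
        # A variant class accepts values currently holding that variant.
        return tag == cls.__name__


class MyEnum(metaclass=VariantMeta):
    def __init__(self, tag):
        self._tag = tag


class A(MyEnum):
    pass


class B(MyEnum):
    pass


value = MyEnum("A")
was_a = isinstance(value, A)    # answered by VariantMeta.__instancecheck__
value._tag = "B"                # e.g. after the variant changed in place
is_b_now = isinstance(value, B)
is_a_now = isinstance(value, A)
```

Because the check is dynamic, it keeps giving the right answer after the value switches variant, which a fixed per-object class would not.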

@programmerjake
Contributor

another method for creating a class with a metaclass is just calling the metaclass as could be done in python code...this does make it harder to create methods though...

this is suggested in the docs for PyType_FromMetaclass
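
For readers less familiar with this route, "calling the metaclass" is what the three-argument form of `type` does in Python. A minimal sketch, with `Meta`, `MyEnum`, `Variant` and the `kind` attribute as placeholder names:

```python
class Meta(type):
    """Placeholder metaclass."""


# Roughly equivalent to:
#     class MyEnum(metaclass=Meta):
#         kind = "enum"
MyEnum = Meta("MyEnum", (object,), {"kind": "enum"})

# Subclasses inherit the metaclass from their base.
Variant = Meta("Variant", (MyEnum,), {})
```

Anything the class should contain, methods included, has to be supplied up front in the namespace dict (the third argument) or set as an attribute afterwards.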

@vultix
Contributor

vultix commented Aug 13, 2022

Good news, I now have the __instancecheck__ working! I now have everything I need to put together a proof of concept.

For now I'm simply using PyObject.ob_type to set the metaclass. We can experiment with other ways after I have that

@programmerjake
Contributor

Good news, I now have the __instancecheck__ working! I now have everything I need to put together a proof of concept.

yay!

For now I'm simply using PyObject.ob_type to set the metaclass. We can experiment with other ways after I have that

afaict you should actually use PyTypeObject rather than PyObject, otherwise it'll get the wrong type layout and you'll modify the wrong bytes, because PyTypeObject doesn't actually always match PyObject (iirc it doesn't match if you're on PyPy, or Py_TRACE_REFS is enabled, or ...).

@vultix
Contributor

vultix commented Aug 14, 2022

(Sorry for the duplicate comment, accidentally commented from my wrong account)

Just to make sure I understand correctly, here's essentially what I'm doing now (pseudocode).

// First, I create the metaclass
let metaclass: *mut PyTypeObject = TypeCreator {
    name: "MyEnumMeta",
    base_type: &mut PyType_Type,
    __new__: PyType_Type.tp_new.unwrap() as *mut c_void,
    __instancecheck__: fancy_enum_instancecheck,
}
.create();

// Next, I create the base class for the enum
let enumclass: *mut PyTypeObject = TypeCreator {
    name: "MyEnum",
    basicsize: std::mem::size_of::<PyCell<MyEnum>>(),
    tp_dealloc: tp_dealloc::<MyEnum>,
    is_basetype: true,
    ..enum_variant_properties
}
.create();

// Set the metaclass as the ob_type for the new enumclass
(*(enumclass as *mut PyObject)).ob_type = metaclass;

// Finally, I create a new child class for each enum variant
let enum_variant_a: *mut PyTypeObject = TypeCreator {
    name: "A",
    basicsize: std::mem::size_of::<PyCell<MyEnum>>(),
    tp_dealloc: tp_dealloc::<MyEnum>,
    __new__: variant_a__new__,
}
.create();

// Set the metaclass as the ob_type for each enum variant
(*(enum_variant_a as *mut PyObject)).ob_type = metaclass;

This is all working correctly on the python side. I've correctly mimicked the behavior of the example here. I've confirmed this works even after changing the variant type in an &mut self function.

That said, it seems this method for setting the metaclass could be better:

// Set the metaclass as the ob_type for the new enumclass
(*(enumclass as *mut PyObject)).ob_type = metaclass;

afaict you should actually use PyTypeObject rather than PyObject

By this do you mean set ob_type on the PyTypeObject directly?

// Set the metaclass as the ob_type for the new enumclass
#[cfg(all(PyPy, not(Py_3_9)))]
{
    (*enumclass).ob_type = metaclass;
}
#[cfg(not(all(PyPy, not(Py_3_9))))]
{
    (*enumclass).ob_base.ob_base.ob_type = metaclass;
}

Is this correct? Should I be using PyType_FromMetaclass to construct the enumclass and variants altogether?

@vultix
Contributor

vultix commented Aug 17, 2022

Alright, design question! Say I have this enum:

enum MyEnum {
    A(u8, u16),
    B {
        name: String
    }
}

From the python side, you can access the named field like this:

assert MyEnum.B(name="hello").name == "hello"

How do we access the tuple fields? I'm assuming we'll provide an implementation of __getitem__?

variant = MyEnum.A(1, 2)
assert variant[0] == 1
assert variant[1] == 2

Should we also make the class iterable so you can destructure it as a tuple? If so, should it only be iterable when there is a tuple variant?

variant = MyEnum.A(1, 2)
a, b = variant
assert a == 1 and b == 2

In both cases, do we disallow custom user implementations of __getitem__ and __iter__?
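
A pure-Python sketch of the behaviour being asked about, with a hypothetical `TupleVariant` class standing in for whatever would be generated for `MyEnum.A`:

```python
class TupleVariant:
    """Hypothetical stand-in for a tuple variant such as MyEnum.A(u8, u16)."""

    def __init__(self, *fields):
        self._fields = fields

    def __len__(self):
        return len(self._fields)

    def __getitem__(self, index):
        # Raises IndexError past the last field, like a tuple would.
        return self._fields[index]

    def __iter__(self):
        return iter(self._fields)


variant = TupleVariant(1, 2)
first, second = variant   # unpacking goes through __iter__
```

Note that the two questions are linked in Python: a class that defines only `__getitem__` and raises `IndexError` at the end is already iterable through the legacy sequence protocol, so providing `__getitem__` without making the variant iterable would take extra work.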

@davidhewitt
Member

Is this correct? Should I be using PyType_FromMetaclass to construct the enumclass and variants altogether?

I think the downside of not using PyType_FromMetaclass (and instead modifying the type object internals) is that we will only be able to have full ADT support with not(Py_LIMITED_API). That seems like an acceptable tradeoff that the user can make with their implementation.

How do we access the tuple fields? I'm assuming we'll provide an implementation of __getitem__?

I think this is the only sensible option.

Should we also make the class iterable so you can destructure it as a tuple? If so, should it only be iterable when there is a tuple variant?

If I understand, there's a class per-variant, so we could make it so only the tuple variants are iterable?

In both cases, do we disallow custom user implementations of __getitem__ and __iter__?

For the C-style enums we came up with a technique which creates default implementations of __repr__ etc, which the user could override. Maybe we can do the same here?

@vultix
Contributor

vultix commented Aug 18, 2022

A few more design questions considering this enum:

#[pyclass]
enum MyEnum {
    A(#[pyo3(get, set, name="first")] u8, #[pyo3(get, set)] u16),
    B {
        #[pyo3(get, set)]
        name: String,
        #[pyo3(get, set)]
        number: u8
    }
}
  • Enums are different from structs in that their fields are always public. Does this imply we should expose these fields to python by default?
  • Do we want to allow renaming tuple variant fields as shown above?
  • If the fields are get, set by default, is #[pyo3(skip)] the clearest way to skip exposing fields to python?

@vultix
Contributor

vultix commented Aug 18, 2022

Here's what I'm leaning towards for each of these options:

Enums are different from structs in that their fields are always public. Does this imply we should expose these fields to python by default?

Although this would be convenient, I think we should keep this as explicit as possible and continue requiring #[pyo3(get, set)]. Doing this by default may mean a lot of extra codegen that isn't really used.

Do we want to allow renaming tuple variant fields as shown above?

I'm leaning towards no here. If you want a named value you should probably refactor your enum variant to be named. I'm even considering disallowing accessing unnamed fields from the python side altogether (admittedly mostly because I'm lazy and implementing all of this is taking longer than I'd hoped).

If the fields are get, set by default, is #[pyo3(skip)] the clearest way to skip exposing fields to python?

I don't think this will be necessary, as I don't think generating python accessors by default is a good idea

@davidhewitt
Member

If we go down this route of making fields private, does that have implications for how Python match would behave? It might be strange if you're able to match on only part of the enum in Python.

... though actually, thinking about it, Python match doesn't enforce you match on all attributes of the object in question, so maybe this is fine?

@vultix
Contributor

vultix commented Aug 19, 2022

You would only be able to match on those fields that have been exposed to Python, but would always be able to match on the variants themselves.

I just realized there's another side of this coin as well - how do you construct the enum variant?

For my motivating use case I would want to be able to construct any of the enum variants, by simply passing in all of the fields. After constructing the variant, the field accessors wouldn't be useful at all. With over 20 fields in my enum, generating all of those accessors is a lot of unnecessary codegen.

My proposal would be this:

  • Constructors are automatically generated for each variant, hopefully in a configurable manner
  • Accessors aren't generated by default, except when using #[pyo3(get, set)]
  • You will always be able to match on variants from python, but matching on fields from the python side will require a get accessor

@davidhewitt
Member

davidhewitt commented Aug 20, 2022

This sounds fine to me!

Constructors are automatically generated for each variant, hopefully in a configurable manner

This has a lot of overlap with a possible #[pyclass(dataclass)] option we've discussed in the past (#1375) - implementing the constructor generation here will probably give us the tools to make progress on that idea too.

@flying-sheep

flying-sheep commented Feb 14, 2023

Hi! I have a crate that’s mostly a class hierarchy and a parser creating it. I want to add a pyo3 option, which when enabled adds pyo3::pyclass to all structs and adds a pub fn pymodule() -> PyModule that transparently wraps the crate into a python module.

All works fine, except for an enum Shape { Ellipse(Ellipse), Lines(Lines), ... }, which PyO3 can’t wrap.

How can I manually wrap this single enum in a Python class today to avoid making the abstraction worse? Can I tell Pyo3 to create some of the boilerplate and write the rest manually?

@davidhewitt
Member

Given the enum is using structs inside, you can start by adding #[pyclass] to the Ellipse, Lines, etc. structs.

Then you can create #[pyclass(name="Shape")] struct PyShape(Shape). Consider adding methods to construct all of the enum variants you want.

I think you should be able to generate a mostly equivalent Python API like this without too much manual work.

@messense
Member

Probably not the best approach, but I have an example of wrapping a Rust enum using subclass here: https://github.com/messense/py-promql-parser/blob/0c723319aff9cea931062b29c8d0ef58e0851364/src/expr.rs.

@flying-sheep

flying-sheep commented Feb 15, 2023

I see, so you wrap it, manually add constructors, and if some other Python class has fields containing the struct, you implement a getter instead of using #[get]. Like this:

#[derive(Debug, Clone, PartialEq)]
enum Shape { Ellipse(Ellipse), Lines(Lines), ... }

#[derive(Debug, Clone, PartialEq)]
#[pyclass(name="Shape")]
struct PyShape(Shape);
#[pymethods]
impl PyShape {
    #[new]  // generics aren’t actually possible for #[new] yet …
    fn new(inner: impl Into<Shape>) -> Self {
        PyShape(inner.into())
    }
}

// Struct containing the enum:
#[derive(Debug, Clone, PartialEq)]
#[pyclass]
pub struct ShapeDraw {
    #[pyo3(get)]
    pub pen: Pen,
    pub shape: Shape,
}
#[pymethods]
impl ShapeDraw {
    #[getter]
    fn get_shape(&self) -> PyShape {
        PyShape(self.shape.clone())
    }
}

Thank you for the help, sorry for hijacking this thread. Maybe adding some docs for current limitations and patterns to get around them would be a good idea?

@formlogic-robert

Those examples using tuple structs don't actually demonstrate what these would map to in pydantic.
Can anybody share any example python code showing how to access the enum variants and their fields?

@gilescope
Contributor

Full ADT support is holding back rust libraries from being used from python. E.g. as mentioned here: pemistahl/lingua-rs#177 - would be really nice if this wasn't the case.
