-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trait ergonomics str implementation #4233
base: main
Are you sure you want to change the base?
Conversation
This update brings custom string formatting for PyClass with a #[pyclass(str = "format string")] attribute. It allows users to specify how their PyClass objects are converted to string in Python. The implementation includes additional tests and parsing logic.
This is a draft. The following still needs to be implemented:
The format expansion and optional value passed to the str are both completed. I wanted to get this created before going too far so I could get feedback. 😃 |
…d str is not allowed when automated str is requested
…a future implementation since it will use the same shorthand formatting logic
@davidhewitt I have an implementation question for you. Currently we have the following implementation for
Given that the user can rename variants on an enum, it seems to me that it makes sense for us to respect this only when the user provides a shorthand formatting string to Are there any other renaming gotchas I should consider? Does this implementation seem reasonable? |
Implemented the capacity to handle renamed variants in enum string representation. Now, custom Python names for enum variants will be correctly reflected when calling the __str__() method on an enum instance. Additionally, the related test has been updated to reflect this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I really like this. Nicely done :)
Co-authored-by: Bruno Kolenbrander <59372212+mejrs@users.noreply.github.com>
Co-authored-by: Bruno Kolenbrander <59372212+mejrs@users.noreply.github.com>
…e targeted span for the format string.
…e targeted span for the format string.
… reduce span of invalid member)
…ightly, verified additional test cases on both nightly and stable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I really like the way it works on structs, but I'm a bit sceptical for the enum one. I left my thoughts on that in the review comments and I'm keen to see what others think.
#[pyclass(str = "{:?}")] | ||
#[derive(PartialEq, Debug)] | ||
enum ComplexEnumWithStr { | ||
A(u32), | ||
B { msg: String }, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For complex enums we probably want individual format strings per variant, similar to how thiserror
works on enums, or we only support the delegation to Display
for now. From a format string like this I would have no idea what to expect for the outcome, especially if the variant have a different number of fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this could be helpful. I think it runs into the same magic-like territory of some of the other items you identified. Ultimately, we could implement this and rely on documenting everything and hope that confused users will understand the documentation/find it, or we curtail the application of the shorthand functionality to only apply in simple cases (simple enums, structs, and no renaming used) and require explicit use of Display (and Debug when repr is eventually implemented). I do tend to lean toward the explicit, though I certainly understand the desire for utility.
* `"{x}"` -> `"{}", self.x` | ||
* `"{0}"` -> `"{}", self.0` | ||
* `"{x:?}"` -> `"{:?}", self.x` | ||
* `"{:?}"` -> `"{:?}", self` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure how I feel about this one. Not specifying a ident here feels unintuitive to me. I'm also not sure how useful access to self
here is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main benefit for access to self is on enums. It would be how you format the variant. For instance, in the current simple enum case, we implement repr to be ".<variant_name>". Rust's default for debug would be just to name the variant, not the cls in the output. The current simple enum behavior can be retained by doing something like this:
#[pyclass(eq, str = "MyEnum.{:?}")]
#[derive(Debug, PartialEq)]
pub enum MyEnum {
Variant,
OtherVariant,
}
I am not sure how else we might declare this with a shorthand format without the user needing to implement Display directly. Though maybe we should force them to that for simple enums (so removing the shorthand format string entirely for simple enums). They only downside I see to that, is if they want to have a different formatting applied on the Python side vs the Rust side (though they could implement both str and Display explicitly to achieve it). I suppose we have to strike the right balance for flexibility vs. intuition while retaining decent ergonomics.
tests/test_class_formatting.rs
Outdated
#[pyclass(eq, str = "MyEnum.{}")] | ||
#[derive(Debug, PartialEq)] | ||
pub enum MyEnum3 { | ||
#[pyo3(name = "AwesomeVariant")] | ||
Variant, | ||
OtherVariant, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, similar to complex enum I feel like we should either allow specifying the display string per variant, or only delegate to Display
. This feels a bit to magic for my taste, given that this is meant for simple cases a convenience and more complex cases can still implement __str__
manually.
Also: Shouldn't this require Display for MyEnum3
the way it is currently written?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this case comes off as a bit of magic. It is a nesting of convenience methods. But this is exactly what happens in the current repr case for simple enums. So much flexibility was given that it obscures intent when just reading the code.
The reason this doesn't implement Display directly is also a bit odd. Basically, renaming builds a match case to map the variants to a String representing the name on the Python side. Without the renaming you would write #[pyclass(str="MyEnum.{:?}")
to accomplish the same result as the above referenced code that contains renaming. That is because you are no longer passing a variant to the format macro, you are passing a string. If you retained the debug format string, you would get MyEnum."AwesomeVariant"
. Changing the format string gives the output without the quotations in the output string.
I struggled with the behavior around these renaming cases. On the Python side, the user will see these other names, and if the format string outputs the rust names in Python, this would be un-intuitive. However. supporting the renaming leads to these un-intuitive format string cases. It feels like the high degree of flexiblity leads to an un-intuitive result one way or another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was already wondering what this renaming business was about. (Haven't looked too closely at the implementation yet).
If you retained the debug format string, you would get
MyEnum."AwesomeVariant"
.
Especially something like this is fragile and prone to errors with weird messages if you get it wrong. I don't think this is something that we should support or recommend.
We could allow #[pyo3(str = "...")]
on the variants to do this, but I'm not sure if it is worth to add so much complexity here. Maybe a viable option is to add another attribute (str_variant
?, not sure) that replicates the current behavior and have str
only work on Display
. These options would be exclusive, so at most one of the could be active. And a manual implementation of __str__
would of course also be possible if neither attribute was given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Icxolu I have made a check for the rename cases and made the string shorthand format incompatible with renaming. This strange formatting scenario can no longer arise to surprise users.
# Conflicts: # tests/ui/invalid_pyclass_args.rs # tests/ui/invalid_pyclass_args.stderr
Thanks, this looks looks really great! I have to confess when I proposed the feature I hadn't realised the number of interactions around renaming and really thought too hard about the differences between Rust and Python formatting. I'd prefer move slowly and err on the side of not making this too powerful too quickly, even if that means we only allow simpler cases we're sure about. The cases I think risk causing users pain:
Moving to the design, this makes me wonder a few things:
Maybe a lot of these design problems are more related to I suppose, what are the forms we're sure about? I think it's the following:
Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design? |
I agree here. On structs it seems pretty clear and intuitive what this does. One thing we should probably test here: Does the formatting handle raw identifiers (correctly)? #[pyclass(str = "{r#type}")]
struct Foo {
r#type: Bar
}
It's true that it easier to leak syntax here, but we also don't have any guarantee about the syntax in any |
@Icxolu Thank you for pointing out the raw identifiers case! It now properly handles that in the latest commit. I agree that it is reasonable to expect that the users know what will come from a debug format in Rust for a given type. For me, the biggest concern in this feature is ensuring the user understands the cause and effect of this feature without surprises. I mentioned above that if we have strange cases, we would need to document them thoroughly, but such a thing is brittle (it relies on the user finding and reading those caveats). As for the leaking of rust syntax, I don't see that as a problem. As mentioned, the user should generally know what to expect with debug output in Rust. Also, I actually like some of the debug output in rust and would personally retain it in my own implementations of repr or str. In my Python code I mostly use repr implementations (since they are picked up by str unless you implement str separately) and use them for logging and debugging. As my primary use case is debugging quickly, I prefer output structures that "map the issue" quickly and efficiently for me. While there is guidance for repr and str implementation in Python, there is a "practicality beats purity" implementation within dataclasses themselves (for example enabling/disabling per field - such that repr could not be used to fully recreate the instance). I think that the renaming of variants adds surprises to how the user implements the shorthand format strings and I think requiring every variant to have a str format might be onerous. I am leaning toward the following caveats:
One that I'm not sure about:
If we do this automatically (I really wish Rust would do this by default for Debug derivations :-) ), it would lead to surprising behavior if they have implemented Display directly, as it wouldn't be used. I haven't played around with this, but could we detect the Display implementation in the macro and use that knowledge to switch accordingly? Then I think this would be great. |
# Conflicts: # pyo3-macros-backend/src/pyclass.rs
…ss, field, or variant args when mixed with a str shorthand formatter.
@davidhewitt @Icxolu Sorry for the delay in returning to this PR. I have made the string shorthand format incompatible with any renaming, forcing the user to either implement
For the first case, I don't want to make the assumption that they need the name prepended if Display has been implemented explicitly. I don't believe proc-macros have access to the what traits have been implemented. I looked into how I could bound the problem, but I'm not entirely sure how to do so just yet. There is also the other case where they explicitly prepended the class name using a shorthand string formatter and we don't want to double prepend it. If we make the assumption for the user, it would be surprising. For the second case, it was mentioned
This again runs into the potential double pre-pending mentioned above, or maybe the user simply doesn't want it prepended at all. Which brings us to the following:
It isn't clear to me how the shorthand logic should function for the enum cases without making potentially bad assumptions for the user. For this initial release, should we release just the following?
Thoughts? |
Implementation of str as identified in #4207
This introduces the str argument to pyclass which optionally takes a formatting string using the shorthand notation inspired by thiserror. It expands similarly to this library with the one addition that empty unnamed or indexed format brackets will place
self
in the call to format. (this is particularly useful for enum formatting. Any formatting arguments after:
within the brackets will be respected.If no formatting string is provided an implementation of the
Display
trait is expected.