Trait ergonomics str implementation #4233

Zyell · 2024-06-04T22:17:27Z

Implementation of str as identified in #4207

This introduces the str argument to pyclass, which optionally takes a formatting string using shorthand notation inspired by thiserror. It expands much as thiserror does, with one addition: empty unnamed or indexed format brackets place self in the call to format (this is particularly useful for enum formatting). Any formatting arguments after : within the brackets are respected.

If no formatting string is provided an implementation of the Display trait is expected.

This update brings custom string formatting for PyClass with a #[pyclass(str = "format string")] attribute. It allows users to specify how their PyClass objects are converted to string in Python. The implementation includes additional tests and parsing logic.
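As a rough illustration of the two forms described above, the following standard-library-only sketch shows what the generated `__str__` body might boil down to in each case. The `Point` struct and the helpers `str_from_display` and `str_from_shorthand` are hypothetical names chosen for this example; they are not PyO3's actual generated code.

```rust
use std::fmt;

// Hypothetical struct standing in for a type annotated with #[pyclass].
struct Point {
    x: i32,
    y: i32,
}

// Form 1: `str` with no format string relies on a user-written Display impl.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Point at {}, {}", self.x, self.y)
    }
}

// Assumed shape of the delegation: __str__ returns whatever Display produces.
// The `T: Display` bound turns a missing impl into a compile error.
fn str_from_display<T: fmt::Display>(value: &T) -> String {
    value.to_string()
}

// Form 2: a shorthand such as "({x}, {y:>3})" written out by hand.
// Each named field becomes an argument taken from `self`, and anything
// after `:` inside the brackets is kept as the format specifier.
fn str_from_shorthand(p: &Point) -> String {
    format!("({}, {:>3})", p.x, p.y)
}
```

Calling `str_from_display` uses the `Display` text, while `str_from_shorthand` ignores `Display` entirely, which is why the two forms can produce different strings for the same value.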

Zyell · 2024-06-04T22:24:16Z

This is a draft. The following still needs to be implemented:

newsfragment
doc updates
expand tests
capture renaming of variants for proper output
ensure manually implemented __str__ is not allowed if automatic __str__ is requested.

The format expansion and optional value passed to the str are both completed. I wanted to get this created before going too far so I could get feedback. 😃


Zyell · 2024-06-05T18:22:08Z

@davidhewitt I have an implementation question for you. Currently we have the following implementation for str:

The user passes just str: We use the Display Trait implementation they have implemented for str formatting.
The user passes str("<formatting shorthand>"): We use the shorthand format to map the values of named members, and allow "{}", "{:?}", ..., which pass self into the formatting string and leverage the implemented traits accordingly.

Given that the user can rename variants on an enum, it seems to me that it makes sense for us to respect this only when the user provides a shorthand formatting string to str. This stays consistent with the default behavior, where we use whatever Display implementation is found.

Are there any other renaming gotchas I should consider? Does this implementation seem reasonable?

Implemented the capacity to handle renamed variants in enum string representation. Now, custom Python names for enum variants will be correctly reflected when calling the __str__() method on an enum instance. Additionally, the related test has been updated to reflect this change.

mejrs

Thanks, I really like this. Nicely done :)


Icxolu

Nice work! I really like the way it works on structs, but I'm a bit sceptical for the enum one. I left my thoughts on that in the review comments and I'm keen to see what others think.

Icxolu · 2024-06-07T19:02:21Z

tests/test_class_formatting.rs

+#[pyclass(str = "{:?}")]
+#[derive(PartialEq, Debug)]
+enum ComplexEnumWithStr {
+    A(u32),
+    B { msg: String },
+}


For complex enums we probably want individual format strings per variant, similar to how thiserror works on enums, or we only support the delegation to Display for now. From a format string like this I would have no idea what to expect for the outcome, especially if the variants have a different number of fields.

Yes, this could be helpful. I think it runs into the same magic-like territory of some of the other items you identified. Ultimately, we could implement this and rely on documenting everything and hope that confused users will understand the documentation/find it, or we curtail the application of the shorthand functionality to only apply in simple cases (simple enums, structs, and no renaming used) and require explicit use of Display (and Debug when repr is eventually implemented). I do tend to lean toward the explicit, though I certainly understand the desire for utility.
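To make the concern concrete, here is a small sketch using only the standard library. It contrasts the single top-level `"{:?}"` with a hand-written per-variant `Display`, which is roughly what thiserror-style per-variant format strings would amount to. The helper `top_level_debug` and the per-variant wording are assumptions for illustration, not part of this PR.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum ComplexEnumWithStr {
    A(u32),
    B { msg: String },
}

// What a single top-level "{:?}" produces: derived Debug output,
// whose shape differs between tuple and struct variants.
fn top_level_debug(value: &ComplexEnumWithStr) -> String {
    format!("{:?}", value)
}

// Hand-written equivalent of one format string per variant.
impl fmt::Display for ComplexEnumWithStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ComplexEnumWithStr::A(n) => write!(f, "A with value {}", n),
            ComplexEnumWithStr::B { msg } => write!(f, "B says {}", msg),
        }
    }
}
```

With the top-level form, `A(7)` renders as `A(7)` and `B { msg: "hi" }` renders with Rust struct syntax and a quoted string, which is the kind of outcome that is hard to predict from the format string alone.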

Icxolu · 2024-06-07T19:10:59Z

guide/src/class/object.md

+* `"{x}"` -> `"{}", self.x`
+* `"{0}"` -> `"{}", self.0`
+* `"{x:?}"` -> `"{:?}", self.x`
+* `"{:?}"` -> `"{:?}", self`
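The four rewrites in this list can be written out by hand with plain `format!` calls. The sketch below uses only the standard library; the `Named` and `Pair` types and the function names are hypothetical and exist only to show what each shorthand would expand to.

```rust
#[derive(Debug)]
struct Named {
    x: String,
}

#[derive(Debug)]
struct Pair(u8, u8);

// "{x}" -> "{}", self.x
fn named_display(v: &Named) -> String {
    format!("{}", v.x)
}

// "{x:?}" -> "{:?}", self.x
fn named_debug(v: &Named) -> String {
    format!("{:?}", v.x)
}

// "{0}" -> "{}", self.0
fn pair_first(v: &Pair) -> String {
    format!("{}", v.0)
}

// "{:?}" -> "{:?}", self
fn pair_self(v: &Pair) -> String {
    format!("{:?}", v)
}
```

The last case is the one under discussion: an empty bracket has no field to refer to, so the whole value is formatted with whatever trait the specifier selects.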


Hmm, I'm not sure how I feel about this one. Not specifying a ident here feels unintuitive to me. I'm also not sure how useful access to self here is...

The main benefit of access to self is on enums. It would be how you format the variant. For instance, in the current simple enum case, we implement repr to be "<cls>.<variant_name>". Rust's default for Debug would be just to name the variant, not the cls, in the output. The current simple enum behavior can be retained by doing something like this:

#[pyclass(eq, str = "MyEnum.{:?}")]
#[derive(Debug, PartialEq)]
pub enum MyEnum {
    Variant,
    OtherVariant,
}

I am not sure how else we might declare this with a shorthand format without the user needing to implement Display directly. Though maybe we should force them to do that for simple enums (so removing the shorthand format string entirely for simple enums). The only downside I see to that is if they want to have a different formatting applied on the Python side vs the Rust side (though they could implement both str and Display explicitly to achieve it). I suppose we have to strike the right balance for flexibility vs. intuition while retaining decent ergonomics.
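For reference, the effect of passing self into an empty bracket can be reproduced without PyO3. In this sketch `my_enum_str` is a hypothetical stand-in for the generated method, assuming the shorthand expands to a plain `format!` call on `self`.

```rust
#[derive(Debug, PartialEq)]
pub enum MyEnum {
    Variant,
    OtherVariant,
}

// Hand-written stand-in for str = "MyEnum.{:?}":
// the empty bracket receives the value itself, formatted with Debug.
pub fn my_enum_str(value: &MyEnum) -> String {
    format!("MyEnum.{:?}", value)
}
```

Derived `Debug` on a fieldless variant prints only the variant name, so the class prefix has to come from the literal part of the format string.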

Icxolu · 2024-06-07T19:20:32Z

tests/test_class_formatting.rs

+#[pyclass(eq, str = "MyEnum.{}")]
+#[derive(Debug, PartialEq)]
+pub enum MyEnum3 {
+    #[pyo3(name = "AwesomeVariant")]
+    Variant,
+    OtherVariant,
+}


Hmm, similar to complex enums I feel like we should either allow specifying the display string per variant, or only delegate to Display. This feels a bit too magic for my taste, given that this is meant as a convenience for simple cases and more complex cases can still implement __str__ manually.

Also: Shouldn't this require Display for MyEnum3 the way it is currently written?

I agree that this case comes off as a bit of magic. It is a nesting of convenience methods. But this is exactly what happens in the current repr case for simple enums. So much flexibility was given that it obscures intent when just reading the code.

The reason this doesn't implement Display directly is also a bit odd. Basically, renaming builds a match case to map the variants to a String representing the name on the Python side. Without the renaming you would write #[pyclass(str="MyEnum.{:?}")] to accomplish the same result as the above referenced code that contains renaming. That is because you are no longer passing a variant to the format macro, you are passing a string. If you retained the debug format string, you would get MyEnum."AwesomeVariant". Changing the format string gives the output without the quotation marks in the output string.
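The quoting behavior described here follows from how Rust formats strings with `Debug`. The sketch below uses only the standard library; `python_name` is a hypothetical stand-in for the match that renaming would generate, and it returns `&'static str` rather than `String` for brevity (both are quoted by `{:?}`).

```rust
#[derive(Debug, PartialEq)]
pub enum MyEnum3 {
    Variant,
    OtherVariant,
}

// Hypothetical mapping from Rust variants to their Python-side names,
// as if Variant carried #[pyo3(name = "AwesomeVariant")].
fn python_name(value: &MyEnum3) -> &'static str {
    match value {
        MyEnum3::Variant => "AwesomeVariant",
        MyEnum3::OtherVariant => "OtherVariant",
    }
}

// "MyEnum.{}" applied to the mapped name: Display prints the string as-is.
fn with_display(value: &MyEnum3) -> String {
    format!("MyEnum.{}", python_name(value))
}

// "MyEnum.{:?}" applied to the mapped name: Debug adds quotation marks.
fn with_debug(value: &MyEnum3) -> String {
    format!("MyEnum.{:?}", python_name(value))
}
```

The same `{:?}` that prints a bare variant name for an enum value prints a quoted literal once the argument is a string, which is why the format string has to change when renaming is involved.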

I struggled with the behavior around these renaming cases. On the Python side, the user will see these other names, and if the format string outputs the Rust names in Python, this would be unintuitive. However, supporting the renaming leads to these unintuitive format string cases. It feels like the high degree of flexibility leads to an unintuitive result one way or another.

Ah, I was already wondering what this renaming business was about. (Haven't looked too closely at the implementation yet).

If you retained the debug format string, you would get MyEnum."AwesomeVariant".

Especially something like this is fragile and prone to errors with weird messages if you get it wrong. I don't think this is something that we should support or recommend.

We could allow #[pyo3(str = "...")] on the variants to do this, but I'm not sure if it is worth adding so much complexity here. Maybe a viable option is to add another attribute (str_variant?, not sure) that replicates the current behavior and have str only work on Display. These options would be exclusive, so at most one of them could be active. And a manual implementation of __str__ would of course also be possible if neither attribute was given.

@Icxolu I have made a check for the rename cases and made the string shorthand format incompatible with renaming. This strange formatting scenario can no longer arise to surprise users.


davidhewitt · 2024-06-08T07:39:25Z

Thanks, this looks really great! I have to confess that when I proposed the feature I hadn't realised the number of interactions around renaming, nor really thought too hard about the differences between Rust and Python formatting. I'd prefer to move slowly and err on the side of not making this too powerful too quickly, even if that means we only allow simpler cases we're sure about.

The cases I think risk causing users pain:

#[pyclass(repr)] without a format string - I think I see now that almost always it would be a bad default to use the Rust Debug implementation here, because users would expect a Python syntax.
Interactions with struct / field renaming, which have been discussed at length above.
Enum scoping - I think users probably want MyEnum. prepended on their string most of the time? It would be nice to not need to have to specify this.

Moving to the design, this makes me wonder a few things:

Is allowing the :? debug format specifier ever useful in this shorthand? Probably yes, in repr particularly. It makes me a bit uneasy that it makes it easy for Rust syntax to leak into the formatting though 🤔
For complex enums, should we require all variants to have a str = "format" specifier, and then we'll be responsible for prepending the correct variant name?

Maybe a lot of these design problems are more related to repr - e.g. would str on a complex enum even want to refer to the variant name? I imagine the repr definitely would. I think field names are also more usually a concern for repr.

I suppose, what are the forms we're sure about? I think it's the following:

#[pyclass(str)] using Display seems intuitive on structs. On enums we run into the renaming problem.
#[pyclass(str = "<format>")] also seems clear on structs. On enums we have challenges with renaming, and we have a bit of a question whether to put the format string on the top-level or the variants.

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

Icxolu · 2024-06-08T09:15:47Z

I suppose, what are the forms we're sure about? I think it's the following:

#[pyclass(str)] using Display seems intuitive on structs. On enums we run into the renaming problem.

#[pyclass(str = "<format>")] also seems clear on structs. On enums we have challenges with renaming, and we have a bit of a question whether to put the format string on the top-level or the variants.

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

I agree here. On structs it seems pretty clear and intuitive what this does.

One thing we should probably test here: Does the formatting handle raw identifiers (correctly)?

#[pyclass(str = "{r#type}")]
struct Foo {
    r#type: Bar
}
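A minimal sketch of the raw-identifier case, using only the standard library. It assumes the macro accesses the field with its `r#` prefix and that any user-facing name drops the prefix; `display_name` and `foo_str` are hypothetical helpers, and the field type is `u32` here instead of `Bar` so the example is self-contained.

```rust
struct Foo {
    r#type: u32,
}

// Hypothetical helper: the field name as a user would read it,
// with the raw-identifier prefix removed when present.
fn display_name(ident: &str) -> &str {
    ident.strip_prefix("r#").unwrap_or(ident)
}

// Hand-expanded form of the shorthand: the field access itself
// must keep the `r#` prefix because `type` is a keyword.
fn foo_str(foo: &Foo) -> String {
    format!("{}", foo.r#type)
}
```

Passing the field as an explicit positional argument sidesteps the question of how the format string itself spells the identifier.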

Is allowing the :? debug format specifier ever useful in this shorthand? Probably yes, in repr particularly. It makes me a bit uneasy that it makes it easy for Rust syntax to leak into the formatting though

It's true that it is easier to leak syntax here, but we also don't have any guarantee about the syntax in any Display impl and I think it's usually pretty clear what to expect from a Debug format in Rust. So allowing this should not cause too much confusion and would be pretty useful for something like arrays or Vecs.

Zyell · 2024-06-10T17:49:27Z

@Icxolu Thank you for pointing out the raw identifiers case! It now properly handles that in the latest commit.

I agree that it is reasonable to expect that the users know what will come from a debug format in Rust for a given type. For me, the biggest concern in this feature is ensuring the user understands the cause and effect of this feature without surprises. I mentioned above that if we have strange cases, we would need to document them thoroughly, but such a thing is brittle (it relies on the user finding and reading those caveats). As for the leaking of rust syntax, I don't see that as a problem. As mentioned, the user should generally know what to expect with debug output in Rust. Also, I actually like some of the debug output in rust and would personally retain it in my own implementations of repr or str. In my Python code I mostly use repr implementations (since they are picked up by str unless you implement str separately) and use them for logging and debugging. As my primary use case is debugging quickly, I prefer output structures that "map the issue" quickly and efficiently for me. While there is guidance for repr and str implementation in Python, there is a "practicality beats purity" implementation within dataclasses themselves (for example enabling/disabling per field - such that repr could not be used to fully recreate the instance).

I think that the renaming of variants adds surprises to how the user implements the shorthand format strings and I think requiring every variant to have a str format might be onerous. I am leaning toward the following caveats:

If the user has implemented naming or renaming for the variants or the class, disallow the string format shorthand and require a direct implementation of __str__ or the Display trait. We would raise a compile-time error explaining this.
Allow any format specifiers in the shorthand string at the user's discretion such that it is clear what type is being output (hence forbidding the above case)
Allow specifying variant string formats for ease of use with complex enum cases in particular.

One that I'm not sure about:

* Enum scoping - I think users probably want `MyEnum.` prepended on their string most of the time? It would be nice to not need to have to specify this.

If we do this automatically (I really wish Rust would do this by default for Debug derivations :-) ), it would lead to surprising behavior if they have implemented Display directly, as it wouldn't be used. I haven't played around with this, but could we detect the Display implementation in the macro and use that knowledge to switch accordingly? Then I think this would be great.


Zyell · 2024-06-25T21:08:05Z

@davidhewitt @Icxolu Sorry for the delay in returning to this PR. I have made the string shorthand format incompatible with any renaming, forcing the user to either implement Display or __str__ explicitly to remove any ambiguity in the shorthand format token choices. It looks like the remaining issues surround handling Enums:

Enum scoping affecting both Simple and Complex (prepending the enum class name)
Complex Enum variant formatting

For the first case, I don't want to make the assumption that they need the name prepended if Display has been implemented explicitly. I don't believe proc-macros have access to what traits have been implemented. I looked into how I could bound the problem, but I'm not entirely sure how to do so just yet. There is also the other case where they explicitly prepended the class name using a shorthand string formatter and we don't want to double prepend it. If we make the assumption for the user, it would be surprising.

For the second case, it was mentioned

* For complex enums, should we require all variants to have a `str = "format"` specifier, and then we'll be responsible for prepending the correct variant name?

This again runs into the potential double pre-pending mentioned above, or maybe the user simply doesn't want it prepended at all.
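The double-prepending problem can be shown with a short standard-library sketch. It assumes a hypothetical macro behavior, modeled by `auto_prepended`, that always puts the class name in front of whatever `Display` produces; nothing like this exists in the PR.

```rust
use std::fmt;

enum MyEnum {
    Variant,
}

// A user-written Display impl that already includes the class name.
impl fmt::Display for MyEnum {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MyEnum::Variant => write!(f, "MyEnum.Variant"),
        }
    }
}

// Hypothetical automatic scoping: unconditionally prepend the class name.
// The function cannot know whether the Display output is already scoped.
fn auto_prepended<T: fmt::Display>(class_name: &str, value: &T) -> String {
    format!("{}.{}", class_name, value)
}
```

Because the prefixing code only sees the formatted text, it has no reliable way to tell that the user already wrote the prefix, so the name appears twice.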

Which brings us to the following:

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

It isn't clear to me how the shorthand logic should function for the enum cases without making potentially bad assumptions for the user. For this initial release, should we release just the following?

Support for str which uses a Display trait implementation for structs, simple enums, and complex enums.
Support a string formatter str="<string formatter>" only for structs and disallowed for enums.
Disallow the string formatter for all cases of renaming across all pyclass implementations. (Optionally I can re-enable this for structs if I remove the ability to reference self in the shorthand notation with {}. Then there won't be a chance of accidentally introducing incorrect field names when using the shorthand for structs. This only works under the assumption that the shorthand only applies to structs.)

Thoughts?

Michael Gilbert added 7 commits June 4, 2024 08:26

update: removed debug print statements

a058296

update: added members to ToTokens implementation.

0b4ddbc

update: reverted to display

1f3bb49

update: initial tests

ae28b72

update: made STR public for pyclass default implementations

f3ebabe

update: generalizing str implementation

51d52c1

Michael Gilbert added 10 commits June 4, 2024 15:26

update: remove redundant test

fcd51a0

update: implemented compile test to validate that manually implemente…

804f81e

…d str is not allowed when automated str is requested

update: updated compile time error check

a9b6d5e

update: rename test file and code cleanup

e7fd6bd

update: format cleanup

d6a3ac6

update: added news fragment

02f2202

fix: corrected clippy findings

c075611

update: fixed mixed formatting case and improved test coverage

5297915

update: improved test coverage

c940310

refactor: generalized formatting function to accommodate __repr__ in …

4059225

…a future implementation since it will use the same shorthand formatting logic

Michael Gilbert added 3 commits June 5, 2024 16:28

fix: fixed clippy finding

0c33be0

update: fixed test function names

9b89c40

mejrs reviewed Jun 6, 2024

pyo3-macros-backend/src/pyclass.rs

newsfragments/4233.added.md

tests/ui/invalid_pyclass_args.stderr

Zyell and others added 7 commits June 6, 2024 13:22

Update pyo3-macros-backend/src/pyclass.rs

d8cd29a

Co-authored-by: Bruno Kolenbrander <59372212+mejrs@users.noreply.github.com>

Update newsfragments/4233.added.md

9d8b170

Co-authored-by: Bruno Kolenbrander <59372212+mejrs@users.noreply.github.com>

update: implemented hygienic calls and added hygiene tests.

23b0b8f

update: cargo fmt

c653bef

update: retained LitStr usage in the quote in order to preserve a mor…

f8d1be3

…e targeted span for the format string.

update: retained LitStr usage in the quote in order to preserve a mor…

6e77f9d

…e targeted span for the format string.

update: added compile time error check for invalid fields (looking to…

9f51d47

… reduce span of invalid member)

Michael Gilbert added 4 commits June 6, 2024 19:08

update: implemented a subspan to improve errors in format string on n…

12eb032

…ightly, verified additional test cases on both nightly and stable

update: updated test output

4d0e738

update: updated with clippy findings

f4a223f

update: added doc entries.

cc1317c

Icxolu reviewed Jun 7, 2024


Michael Gilbert added 2 commits June 7, 2024 13:23

Merge branch 'refs/heads/main' into trait_ergonomics_str

8d73c33

# Conflicts: # tests/ui/invalid_pyclass_args.rs # tests/ui/invalid_pyclass_args.stderr

update: corrected error output for compile errors after updating from…

678208e

… main.

update: added support for raw identifiers used in field names

1f84dd3

Michael Gilbert added 3 commits June 25, 2024 10:16

Merge branch 'refs/heads/main' into trait_ergonomics_str

d0e7287

# Conflicts: # pyo3-macros-backend/src/pyclass.rs

update: aligning branch with main

e9e149d

update: added compile time error when mixing rename_all or name pycla…

0137083

…ss, field, or variant args when mixed with a str shorthand formatter.
