Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trait ergonomics str implementation #4233

Open
wants to merge 37 commits into
base: main
Choose a base branch
from

Conversation

Zyell
Copy link
Contributor

@Zyell Zyell commented Jun 4, 2024

Implementation of str as identified in #4207

This introduces the str argument to pyclass which optionally takes a formatting string using the shorthand notation inspired by thiserror. It expands similarly to this library with the one addition that empty unnamed or indexed format brackets will place self in the call to format. (this is particularly useful for enum formatting. Any formatting arguments after : within the brackets will be respected.

If no formatting string is provided an implementation of the Display trait is expected.

Michael Gilbert added 7 commits June 4, 2024 08:26
This update brings custom string formatting for PyClass with a #[pyclass(str = "format string")] attribute. It allows users to specify how their PyClass objects are converted to string in Python. The implementation includes additional tests and parsing logic.
@Zyell
Copy link
Contributor Author

Zyell commented Jun 4, 2024

This is a draft. The following still needs to be implemented:

  • newsfragment
  • doc updates
  • expand tests
  • capture renaming of variants for proper output
  • ensure manually implemented __str__ is not allowed if automatic __str__ is requested.

The format expansion and optional value passed to the str are both completed. I wanted to get this created before going too far so I could get feedback. 😃

@Zyell
Copy link
Contributor Author

Zyell commented Jun 5, 2024

@davidhewitt I have an implementation question for you. Currently we have the following implementation for str:

  • The user passes just str: We use the Display Trait implementation they have implemented for str formatting.
  • The user passes str("<formatting shorthand>"): We use the shorthand format to map named members value and allow for "{}", "{:?}", ... which allow for self to be passed into the formatting string and leverages the implemented traits accordingly.

Given that the user can rename variants on an enum, it seems to me that it makes sense for us to respect this only when the user provides a shorthand formatting string to str. This will remain consistent to the by default we use whatever Display implementation is found.

Are there any other renaming gotchas I should consider? Does this implementation seem reasonable?

Michael Gilbert added 3 commits June 5, 2024 16:28
Implemented the capacity to handle renamed variants in enum string representation. Now, custom Python names for enum variants will be correctly reflected when calling the __str__() method on an enum instance. Additionally, the related test has been updated to reflect this change.
Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I really like this. Nicely done :)

pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
newsfragments/4233.added.md Outdated Show resolved Hide resolved
tests/ui/invalid_pyclass_args.stderr Outdated Show resolved Hide resolved
Zyell and others added 7 commits June 6, 2024 13:22
Co-authored-by: Bruno Kolenbrander <59372212+mejrs@users.noreply.github.com>
Co-authored-by: Bruno Kolenbrander <59372212+mejrs@users.noreply.github.com>
Copy link
Contributor

@Icxolu Icxolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I really like the way it works on structs, but I'm a bit sceptical for the enum one. I left my thoughts on that in the review comments and I'm keen to see what others think.

Comment on lines +90 to +95
#[pyclass(str = "{:?}")]
#[derive(PartialEq, Debug)]
enum ComplexEnumWithStr {
A(u32),
B { msg: String },
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For complex enums we probably want individual format strings per variant, similar to how thiserror works on enums, or we only support the delegation to Display for now. From a format string like this I would have no idea what to expect for the outcome, especially if the variant have a different number of fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could be helpful. I think it runs into the same magic-like territory of some of the other items you identified. Ultimately, we could implement this and rely on documenting everything and hope that confused users will understand the documentation/find it, or we curtail the application of the shorthand functionality to only apply in simple cases (simple enums, structs, and no renaming used) and require explicit use of Display (and Debug when repr is eventually implemented). I do tend to lean toward the explicit, though I certainly understand the desire for utility.

* `"{x}"` -> `"{}", self.x`
* `"{0}"` -> `"{}", self.0`
* `"{x:?}"` -> `"{:?}", self.x`
* `"{:?}"` -> `"{:?}", self`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure how I feel about this one. Not specifying a ident here feels unintuitive to me. I'm also not sure how useful access to self here is...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main benefit for access to self is on enums. It would be how you format the variant. For instance, in the current simple enum case, we implement repr to be ".<variant_name>". Rust's default for debug would be just to name the variant, not the cls in the output. The current simple enum behavior can be retained by doing something like this:

#[pyclass(eq, str = "MyEnum.{:?}")]
#[derive(Debug, PartialEq)]
pub enum MyEnum {
    Variant,
    OtherVariant,
}

I am not sure how else we might declare this with a shorthand format without the user needing to implement Display directly. Though maybe we should force them to that for simple enums (so removing the shorthand format string entirely for simple enums). They only downside I see to that, is if they want to have a different formatting applied on the Python side vs the Rust side (though they could implement both str and Display explicitly to achieve it). I suppose we have to strike the right balance for flexibility vs. intuition while retaining decent ergonomics.

Comment on lines 30 to 36
#[pyclass(eq, str = "MyEnum.{}")]
#[derive(Debug, PartialEq)]
pub enum MyEnum3 {
#[pyo3(name = "AwesomeVariant")]
Variant,
OtherVariant,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, similar to complex enum I feel like we should either allow specifying the display string per variant, or only delegate to Display. This feels a bit to magic for my taste, given that this is meant for simple cases a convenience and more complex cases can still implement __str__ manually.

Also: Shouldn't this require Display for MyEnum3 the way it is currently written?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this case comes off as a bit of magic. It is a nesting of convenience methods. But this is exactly what happens in the current repr case for simple enums. So much flexibility was given that it obscures intent when just reading the code.

The reason this doesn't implement Display directly is also a bit odd. Basically, renaming builds a match case to map the variants to a String representing the name on the Python side. Without the renaming you would write #[pyclass(str="MyEnum.{:?}") to accomplish the same result as the above referenced code that contains renaming. That is because you are no longer passing a variant to the format macro, you are passing a string. If you retained the debug format string, you would get MyEnum."AwesomeVariant". Changing the format string gives the output without the quotations in the output string.

I struggled with the behavior around these renaming cases. On the Python side, the user will see these other names, and if the format string outputs the rust names in Python, this would be un-intuitive. However. supporting the renaming leads to these un-intuitive format string cases. It feels like the high degree of flexiblity leads to an un-intuitive result one way or another.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was already wondering what this renaming business was about. (Haven't looked too closely at the implementation yet).

If you retained the debug format string, you would get MyEnum."AwesomeVariant".

Especially something like this is fragile and prone to errors with weird messages if you get it wrong. I don't think this is something that we should support or recommend.

We could allow #[pyo3(str = "...")] on the variants to do this, but I'm not sure if it is worth to add so much complexity here. Maybe a viable option is to add another attribute (str_variant?, not sure) that replicates the current behavior and have str only work on Display. These options would be exclusive, so at most one of the could be active. And a manual implementation of __str__ would of course also be possible if neither attribute was given.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Icxolu I have made a check for the rename cases and made the string shorthand format incompatible with renaming. This strange formatting scenario can no longer arise to surprise users.

Michael Gilbert added 2 commits June 7, 2024 13:23
# Conflicts:
#	tests/ui/invalid_pyclass_args.rs
#	tests/ui/invalid_pyclass_args.stderr
@davidhewitt
Copy link
Member

Thanks, this looks looks really great! I have to confess when I proposed the feature I hadn't realised the number of interactions around renaming and really thought too hard about the differences between Rust and Python formatting. I'd prefer move slowly and err on the side of not making this too powerful too quickly, even if that means we only allow simpler cases we're sure about.

The cases I think risk causing users pain:

  • #[pyclass(repr)] without a format string - I think I see now that almost always it would be a bad default to use the Rust Debug implementation here, because users would expect a Python syntax.
  • Interactions with struct / field renaming, which have been discussed at length above.
  • Enum scoping - I think users probably want MyEnum. prepended on their string most of the time? It would be nice to not need to have to specify this.

Moving to the design, this makes me wonder a few things:

  • Is allowing :? debug format specifier ever useful in this shorthand? Probably yes, in repr particularly. It makes me a bit easy that it makes it easy for Rust syntax to leak into the formatting though 🤔
  • For complex enums, should we require all variants to have a str = "format" specifier, and then we'll be responsible for prepending the correct variant name?

Maybe a lot of these design problems are more related to repr - e.g. would str on a complex enum even want to refer to the variant name? I imagine the repr definitely would. I think field names are also more usually a concern for repr.

I suppose, what are the forms we're sure about? I think it's the following:

  • #[pyclass(str)] using Display seems intuitive on structs. On enums we run into the renaming problem.
  • #[pyclass(str = "<format>")] also seems clear on structs. On enums we have challenges with renaming, and we have a bit of a question whether to put the format string on the top-level or the variants.

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

@Icxolu
Copy link
Contributor

Icxolu commented Jun 8, 2024

I suppose, what are the forms we're sure about? I think it's the following:

  • #[pyclass(str)] using Display seems intuitive on structs. On enums we run into the renaming problem.
  • #[pyclass(str = "<format>")] also seems clear on structs. On enums we have challenges with renaming, and we have a bit of a question whether to put the format string on the top-level or the variants.

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

I agree here. On structs it seems pretty clear and intuitive what this does.

One thing we should probably test here: Does the formatting handle raw identifiers (correctly)?

#[pyclass(str = "{r#type}")]
struct Foo {
    r#type: Bar
}
  • Is allowing :? debug format specifier ever useful in this shorthand? Probably yes, in repr particularly. It makes me a bit easy that it makes it easy for Rust syntax to leak into the formatting though

It's true that it easier to leak syntax here, but we also don't have any guarantee about the syntax in any Display impl and I think it's pretty clear what to expect from a Debug format in Rust usually. So allowing this should not cause too much confusion and would be pretty useful for something like arrays or Vecs

@Zyell
Copy link
Contributor Author

Zyell commented Jun 10, 2024

@Icxolu Thank you for pointing out the raw identifiers case! It now properly handles that in the latest commit.

I agree that it is reasonable to expect that the users know what will come from a debug format in Rust for a given type. For me, the biggest concern in this feature is ensuring the user understands the cause and effect of this feature without surprises. I mentioned above that if we have strange cases, we would need to document them thoroughly, but such a thing is brittle (it relies on the user finding and reading those caveats). As for the leaking of rust syntax, I don't see that as a problem. As mentioned, the user should generally know what to expect with debug output in Rust. Also, I actually like some of the debug output in rust and would personally retain it in my own implementations of repr or str. In my Python code I mostly use repr implementations (since they are picked up by str unless you implement str separately) and use them for logging and debugging. As my primary use case is debugging quickly, I prefer output structures that "map the issue" quickly and efficiently for me. While there is guidance for repr and str implementation in Python, there is a "practicality beats purity" implementation within dataclasses themselves (for example enabling/disabling per field - such that repr could not be used to fully recreate the instance).

I think that the renaming of variants adds surprises to how the user implements the shorthand format strings and I think requiring every variant to have a str format might be onerous. I am leaning toward the following caveats:

  • If the user has implemented naming or renaming for the variants or the class, disallow the string format shorthand and direct implementation of str or the Display trait. We would raise a compile time error explaining this.
  • Allow any format specifiers in the shorthand string at the user's discretion such that it is clear what type is being output (hence forbidding the above case)
  • Allow specifying variant string formats for ease of use with complex enum cases in particular.

One that I'm not sure about:

* Enum scoping - I think users probably want `MyEnum.` prepended on their string most of the time? It would be nice to not need to have to specify this.

If we do this automatically (I really wish Rust would do this by default for Debug derivations :-) ), it would lead to surprising behavior if they have implemented Display directly, as it wouldn't be used. I haven't played around with this, but could we detect the Display implementation in the macro and use that knowledge to switch accordingly? Then I think this would be great.

Michael Gilbert added 3 commits June 25, 2024 10:16
# Conflicts:
#	pyo3-macros-backend/src/pyclass.rs
…ss, field, or variant args when mixed with a str shorthand formatter.
@Zyell
Copy link
Contributor Author

Zyell commented Jun 25, 2024

@davidhewitt @Icxolu Sorry for the delay in returning to this PR. I have made the string shorthand format incompatible with any renaming, forcing the user to either implement Display or __str__ explicitly to remove any ambiguity in the shorthand format token choices. It looks like the remaining issues surround handling Enums:

  • Enum scoping affecting both Simple and Complex (prepending the enum class name)
  • Complex Enum variant formatting

For the first case, I don't want to make the assumption that they need the name prepended if Display has been implemented explicitly. I don't believe proc-macros have access to the what traits have been implemented. I looked into how I could bound the problem, but I'm not entirely sure how to do so just yet. There is also the other case where they explicitly prepended the class name using a shorthand string formatter and we don't want to double prepend it. If we make the assumption for the user, it would be surprising.

For the second case, it was mentioned

* For complex enums, should we require all variants to have a `str = "format"` specifier, and then we'll be responsible for prepending the correct variant name?  

This again runs into the potential double pre-pending mentioned above, or maybe the user simply doesn't want it prepended at all.

Which brings us to the following:

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

It isn't clear to me how the shorthand logic should function for the enum cases without making potentially bad assumptions for the user. For this initial release, should we release just the following?

  • Support for str which uses a Display trait implementation for structs, simple enums, and complex enums.
  • Support a string formatter str="<string formatter>" only for structs and disallowed for enums.
  • Disallow the the string formatter for all cases of renaming across all pyclass implementations. (Optionally I can re-enable this for structs if I remove the ability to reference self in the shorthand notation with {}. Then there won't be a chance of accidentally introducing incorrect field names when using the shorthand for structs. This only works under the assumption that the shorthand only applies for structs.)

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants