pep440: some minor refactoring, mostly around error types #780

I'll likely expand this comment a little bit in subsequent commits, but I felt like this was useful to put in the beginning of this patch series to "set the stage." That is, this data will motivate some of the optimizations we'll do for version parsing, version representation and comparisons.

This moves LocalSegment and PreRelease after Version, and also keeps their impl blocks grouped together. No code is changed here.

It turns out that the current parser permits non-ASCII whitespace in places. We can exploit this to make the existing implementation produce incorrect offsets. The core issue is that `unicode-width` is used to compute codepoint offsets, but its actual purpose is to compute the *visual width* of a codepoint that has been rendered. Some codepoints use more than 1 unit of visual width. This establishes a mismatch between the implementation and the documented behavior of `Pep440Error`. We address it by switching to byte offsets instead of codepoint offsets. (Codepoint offsets are almost never what you want.)
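To make the distinction concrete, here is a minimal sketch using only the standard library. The helper names and the `is_valid` predicate are hypothetical and are not taken from the crate's parser. It locates the first rejected character in an input that begins with U+3000 (ideographic space), which is one codepoint, three UTF-8 bytes, and typically two columns wide when rendered.

```rust
/// Byte offset of the first character rejected by `is_valid`.
/// Hypothetical helper for illustration only.
pub fn first_invalid_byte_offset(
    input: &str,
    is_valid: impl Fn(char) -> bool,
) -> Option<usize> {
    // `char_indices` yields (byte offset, char) pairs.
    input
        .char_indices()
        .find(|&(_, c)| !is_valid(c))
        .map(|(offset, _)| offset)
}

/// Codepoint offset of the first character rejected by `is_valid`.
/// Shown only for contrast with the byte offset above.
pub fn first_invalid_codepoint_offset(
    input: &str,
    is_valid: impl Fn(char) -> bool,
) -> Option<usize> {
    // `position` counts how many chars were consumed before the match.
    input.chars().position(|c| !is_valid(c))
}

/// Assumed validity rule for this sketch: whitespace, ASCII digits and dots.
pub fn is_version_char(c: char) -> bool {
    c.is_whitespace() || c.is_ascii_digit() || c == '.'
}
```

For the input `"\u{3000}1.0!"`, the byte offset of `!` is 6 while its codepoint offset is 4. Only the byte offset can be used directly to slice the string (`&input[6..]`), which is one reason byte offsets are the more practical thing to store in an error.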

We still use unicode-width, but only when rendering the error message. We also add the error message itself to the Display impl.

This error type was only used when one attempted to parse a string as a `VersionSpecifiers`. We'll want to introduce more structured error types for parsing `Version` and `VersionSpecifier` as well, so renaming the error type helps make room for that. We also make the error type opaque. Nothing (in puffin at least) seemed to need its internals. We can always add accessor methods in the future if something else needs it. It's overall rare to need to expose the entire internal representation of an error type.
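As a rough illustration of what "opaque" means here, the sketch below keeps the error's representation private and exposes it only through `Display` and `std::error::Error`. The names `SpecifiersParseError`, `ErrorKind` and `split_operator`, as well as the message wording, are assumptions made for this example and are not the names or messages used in the patch.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical opaque error type: the `kind` field is private, so
/// callers cannot match on or construct its internals.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SpecifiersParseError {
    kind: ErrorKind,
}

/// Private structured representation; free to change without breaking callers.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ErrorKind {
    MissingOperator,
    MissingVersion { byte_offset: usize },
}

impl fmt::Display for SpecifiersParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.kind {
            ErrorKind::MissingOperator => write!(f, "expected a comparison operator"),
            ErrorKind::MissingVersion { byte_offset } => {
                write!(f, "expected a version at byte offset {byte_offset}")
            }
        }
    }
}

impl Error for SpecifiersParseError {}

/// Toy parser step: split a leading operator from the rest of the input.
pub fn split_operator(spec: &str) -> Result<(&str, &str), SpecifiersParseError> {
    // Byte index of the first character that is not part of an operator.
    let op_len = spec
        .find(|c: char| !matches!(c, '<' | '>' | '=' | '!' | '~'))
        .unwrap_or(spec.len());
    if op_len == 0 {
        return Err(SpecifiersParseError { kind: ErrorKind::MissingOperator });
    }
    let (op, version) = spec.split_at(op_len);
    if version.is_empty() {
        let kind = ErrorKind::MissingVersion { byte_offset: op_len };
        return Err(SpecifiersParseError { kind });
    }
    Ok((op, version))
}
```

Because callers only see the `Display` output and the `Error` impl, accessor methods can be added later without committing to the current internal layout.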

Since it's specific to version specifier parsing, it makes sense to have it live with the corresponding type. Putting it in lib.rs, I think, makes it seem like an error type that applies to other things as well.

This smooths out the error handling by using something a bit more structured. I did this because I intend to do the same for Version (eventually), and it seems good to be consistent. It also lets us nest errors a bit more easily and scrutinize the different error classes at a glance.

This makes the top-level error type small, just as the others we added in the previous commit are. This was specifically flagged by Clippy.
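One common way to keep an error type small is to box its payload, so that the error itself is a single pointer. The commit message does not say which Clippy lint fired (presumably something along the lines of `clippy::result_large_err`), and the types below are hypothetical stand-ins for illustration, not the crate's actual error types.

```rust
use std::mem::size_of;

/// Hypothetical error payload with several owned fields.
#[derive(Debug)]
pub enum ErrorKind {
    InvalidVersion { input: String, message: String },
    InvalidSpecifier { input: String, message: String, start: usize, end: usize },
}

/// Stores the payload inline, so every `Result<T, UnboxedError>` has to
/// reserve space for the largest variant.
#[derive(Debug)]
pub struct UnboxedError {
    pub kind: ErrorKind,
}

/// Stores the payload behind a `Box`, so the error is one pointer wide.
#[derive(Debug)]
pub struct BoxedError {
    pub kind: Box<ErrorKind>,
}

/// Returns `(unboxed size, boxed size)` in bytes.
pub fn error_sizes() -> (usize, usize) {
    (size_of::<UnboxedError>(), size_of::<BoxedError>())
}
```

The trade-off is one heap allocation on the error path in exchange for a smaller `Result` on the success path, which is usually the path worth optimizing.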

Prior to the refactoring in previous commits, tests were generally written against the rendered error message instead of the structured message. This was in part because many of the error types themselves were just a `String`, but also to explicitly test what an end user actually sees. That's valuable. While we keep most of the tests as rewritten to target the new structured error representation, we add some new tests that capture the value of testing the messages that humans will actually see.


Commits on Jan 4, 2024