-
-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Issue 16642 - byCodeUnit doesn't work properly with alias this. #4879
Conversation
|
Params needs to be updated |
|
There seem to be bugs in the codecov coverage marking. Pinging @wilzbach |
|
Docs updated. |
Hmm - it seems like CodeCov shifts the resulting coverage for a couple of lines. Compare an excerpt from their coverage report: with running the coverage result locally (e.g. Unfortunately I have no idea what could cause this shift (I tried to restart CircleCi). |
|
@jmdavis you would need to remove the https://github.com/dlang-bots/dlang-bot#automated-references |
It could be due to the fact that we merge the commit into the |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! I would add a couple of static asserts to check for the correct front type and maybe use characters that are actually within UTF-16 and UTF-32. Just "t" is a bit boring/redundant.
std/utf.d
Outdated
| auto byCodeUnit(R)(R r) | ||
| if (isAutodecodableString!R || | ||
| isInputRange!R && isSomeChar!(ElementEncodingType!R) || | ||
| (is(R == enum) && is(R : const dchar[]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you mentioned on #5137 this a good example of what's typically considered as "stringLike"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is and it isn't. If anything, it's a great example of why we need to be very careful in defining what on earth "string-like" means.
std/utf.d
Outdated
| { | ||
| return r[$ - 1]; | ||
| } | ||
| pure nothrow @nogc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this @safe as well (at least if R is a struct)?
std/utf.d
Outdated
|
|
||
| auto fn = WStringish("test.d"w); | ||
| auto x = fn.byCodeUnit(); | ||
| assert(x.front == 't'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using characters that wouldn't fit into a char or adding a static assert(typeof(x.front) == wchar)?
|
Okay, I updated the tests to use more characters that won't fit in a |
std/utf.d
Outdated
| return r; | ||
| return ByCodeUnitImpl(r); | ||
| } | ||
| else static if (is(R == enum) || is(R : const dchar[])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line and the line in the template constraint aren't the same, was this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this line different from the one in the template constraint? What's different is the previous static if, because the parts of the template constraint not covered by this line are covered by either the previous static if or the else. So, the rest of the condition from the template constraint was split for the static if-else chain, but this branch is straight from the constraint. It covers the types that are implicitly convertible to an array of dchar but aren't actually an array of dchar and thus need to be cast to force the conversion - and they don't need to be wrapped, because arrays of dchar aren't auto-decoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reading it as a diff is making it hard to see what's actually there? In this case, I'd suggest reading the resulting code rather than trying to parse the diff. It should be much easier to understand that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmdavis is(R == enum) && is(R : const dchar[])) vs (is(R == enum) || is(R : const dchar[]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL. Ah, I see now. Yeah, very close, and I missed the difference upon review. And now you're making me think. :)
Well, the && is definitely required in the template constraint, otherwise any old enum would pass, which would obviously be bad. We only want enums that are strings to pass. isAutodecodableString does that already for enums of arrays of char and wchar. The is(R == enum) && is(R : const dchar[]) bit is supposed to then cover enums of arrays of dchar. However, isAutodecodableString also lets user-defined types that implicitly convert to narrow strings through (but not static arrays). So, it would appear that I made an oversight with regards to types that implicitly convert to arrays of dchar. What that really should be is is(R : const dchar[]) && !isStaticArray!R, because that would be the dchar equivalent of what isAutodecodableString does. The isInputRange!R && isSomeChar!(ElementEncodingType!R) lets any ranges of any character type through, and that should be fine.
As for the static if-else chain, the first branch needs to be for narrow strings and anything that implicitly converts to narrow strings - excepting the bizarre corner case where a user-defined type is already a range but also implicitly converts to a narrow string. For this branch, the implicit conversion is enforced (which is a no-op for actual narrow strings), and then we get the wrapper range that avoids auto-decoding. I see nothing wrong with that static if condition. Either it's a narrow string, or it's an enum or user-defined type which implicitly converts to a narrow string and is not itself a range.
That leaves two branches. One of them must cover anything that implicitly converts to an array of dchar - but is not itself a range - and the other must cover any ranges of characters. Actual arrays of dchar that aren't enums could actually go in either branch. As it stands, the fact that || is used makes arrays of dchar take the first branch. It also makes it so that ranges that implicitly convert to an array of dchar get converted (which we don't want). And the enum actually ends up being pointless. If it were && like in the template constraint, then only enums would get through, and actually, since only enums that implicitly converted to an array of dchar could get to this condition (since any other enum type would not be a range and thus would have failed the template constraint, and all of the enums that implicitly converted to narrow strings went with the first branch), simply checking is(R == enum) would be the same as changing it to &&. But if we changed it to &&, then user-defined types that were not ranges in and of themselves but implicitly converted to an array of dchar would simply be returned rather than converted, which would be bad. So, neither || nor && is correct. I'll have to adjust it so that it does the conversion for user-defined types only if they're not already ranges while still doing the conversion for those that aren't already ranges.
So, if I've thought this through correctly, ultimately, both were conditions involving is(R : const dchar[]) were wrong. Man, I hate this maze of traits involving strings, and implicit conversions are hell for generice code. :|
I'll fix this up later today. Thanks for catching that.
If a template is going to allow implicit conversions, it really needs to force the conversion in order to avoid subtle bugs. This fixes byCodeUnit so that it forces the conversion to a string type for types that convert implicitly. It also fixes it so that types which implicitly convert to a string type but are also ranges of characters without the conversion will not be converted by byCodeUnit.
Also renames the member variable in ByCodeUnitImpl so that it doesn't have the same name as the function parameter as well as making it more clear that it's a string rather than a more general range type.
|
Okay. I made the fix to the template constraint and the 2nd static if condition as well as beefing up the tests so that they actually catch what I missed before. |
Yup, the |
|
Auto-merge toggled on |
| } | ||
|
|
||
| /// | ||
| @safe unittest | ||
| { | ||
| import std.range.primitives; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? This is a documented unittest, meaning it should be self-contained.
|
@jmdavis @JackStouffer Looks like this broke CircleCI tests on master. To avoid this in the future you can use the "auto-merge" label to use @dlang-bot instead of enabling auto-merge on Brad's CI, which does not take into account other CI services. |
| static assert(!is(typeof(bcu) == Stringish)); | ||
| static assert(is(typeof(bcu) == typeof(bcu.byCodeUnit()))); | ||
| static assert(is(ElementType!(typeof(bcu)) == immutable char)); | ||
| assert(bcu.front == cast(char)244); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after cast(x), here and 4 other places. It is another thing that was pointed out by the CircleCI failures, though it did not get that far because of the unittest import problem.
| } | ||
| } | ||
| auto ref opIndex(size_t index) inout { return str[index]; } | ||
| auto opSlice(size_t lower, size_t upper) { return ByCodeUnitImpl(str[lower..upper]); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space around ..


Basically,
byCodeUnitworks properly with strings, with structs that are ranges of characters (but do not implicitly convert to any string type), and with structs that usealias thiswith a member variable to convert to a string type. It does not work properly with enums (which pass the template constraint but fail to compile, because they're not actually ranges), it does not work with structs that use a member function withalias this, and it does not work with structs that are already ranges of characters but also usealias thiswith a string type. This PR fixes all that.The main thing is that it now forces the conversion to the string type when the type needs an implicit conversion to be a range of characters. In general, it's bad practice to have a template test for an implicit conversion and then not force the conversion, because it can result in some fun, subtle problems. A prime example being that enums are not actually ranges even if their base type is a string type.
Given that enums aren't really ranges and that they didn't compile properly with
byCodeUnitbefore, it might make more sense to just disallow them just like static arrays are disallowed, but since at heart, this is basically a function for taking a string and converting it to a non-autodecoding range, and functions that take strings work with enums with a base type of string, it makes sense to me thatbyCodeUnitwould work with enums.Probably the ugliest bit with this is making it so that a struct that's already a range of characters but also implicitly converts to a string doesn't do the implicit conversion in
byCodeUnit. But since normally, implicit conversions do not occur unless the type needs to be converted to work, it doesn't make sense to do the implicit conversion on a type that's already a range of characters on its own.The second commit is for some formatting clean-up for
byCodeUnit, so you may want to look at the first commit separately.