Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 16642 - byCodeUnit doesn't work properly with alias this. #4879

Merged
merged 3 commits into from
Mar 10, 2017

Conversation

jmdavis
Copy link
Member

@jmdavis jmdavis commented Oct 27, 2016

Basically, byCodeUnit works properly with strings, with structs that are ranges of characters (but do not implicitly convert to any string type), and with structs that use alias this with a member variable to convert to a string type. It does not work properly with enums (which pass the template constraint but fail to compile, because they're not actually ranges), it does not work with structs that use a member function with alias this, and it does not work with structs that are already ranges of characters but also use alias this with a string type. This PR fixes all that.

The main thing is that it now forces the conversion to the string type when the type needs an implicit conversion to be a range of characters. In general, it's bad practice to have a template test for an implicit conversion and then not force the conversion, because it can result in some fun, subtle problems. A prime example being that enums are not actually ranges even if their base type is a string type.

Given that enums aren't really ranges and that they didn't compile properly with byCodeUnit before, it might make more sense to just disallow them just like static arrays are disallowed, but since at heart, this is basically a function for taking a string and converting it to a non-autodecoding range, and functions that take strings work with enums with a base type of string, it makes sense to me that byCodeUnit would work with enums.

Probably the ugliest bit with this is making it so that a struct that's already a range of characters but also implicitly converts to a string doesn't do the implicit conversion in byCodeUnit. But since normally, implicit conversions do not occur unless the type needs to be converted to work, it doesn't make sense to do the implicit conversion on a type that's already a range of characters on its own.

The second commit is for some formatting clean-up for byCodeUnit, so you may want to look at the first commit separately.

@JackStouffer
Copy link
Member

Params needs to be updated

@JackStouffer
Copy link
Member

There seem to be bugs in the codecov coverage marking. Pinging @wilzbach

@jmdavis
Copy link
Member Author

jmdavis commented Oct 28, 2016

Docs updated.

@wilzbach
Copy link
Member

wilzbach commented Jan 4, 2017

There seem to be bugs in the codecov coverage marking. Pinging @wilzbach

Hmm - it seems like CodeCov shifts the resulting coverage for a couple of lines. Compare an excerpt from their coverage report:

image

with running the coverage result locally (e.g. make -f posix.mak std/utf.test):

image

Unfortunately I have no idea what could cause this shift (I tried to restart CircleCi).

@wilzbach
Copy link
Member

@jmdavis you would need to remove the # from the git message in order for the bot (and changelog gen) to detect. For more details:

https://github.com/dlang-bots/dlang-bot#automated-references

@wilzbach
Copy link
Member

Unfortunately I have no idea what could cause this shift (I tried to restart CircleCi).

It could be due to the fact that we merge the commit into the master branch before running the coverage tests and thus it's possible that CodeCov receives a different .lst file from CircleCi as the underlying sources are different.

@dlang-bot
Copy link
Contributor

Fix Bugzilla Description
16642 byCodeUnit doesn't work AutodecodableStrings unless they're actually strings or alias a variable that's a string

@wilzbach wilzbach added this to the 2.074.0 milestone Feb 17, 2017
Copy link
Member

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work! I would add a couple of static asserts to check for the correct front type and maybe use characters that are actually within UTF-16 and UTF-32. Just "t" is a bit boring/redundant.

std/utf.d Outdated
auto byCodeUnit(R)(R r)
if (isAutodecodableString!R ||
isInputRange!R && isSomeChar!(ElementEncodingType!R) ||
(is(R == enum) && is(R : const dchar[])))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mentioned on #5137 this a good example of what's typically considered as "stringLike"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is and it isn't. If anything, it's a great example of why we need to be very careful in defining what on earth "string-like" means.

std/utf.d Outdated
{
return r[$ - 1];
}
pure nothrow @nogc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this @safe as well (at least if R is a struct)?

std/utf.d Outdated

auto fn = WStringish("test.d"w);
auto x = fn.byCodeUnit();
assert(x.front == 't');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using characters that wouldn't fit into a char or adding a static assert(typeof(x.front) == wchar)?

@jmdavis
Copy link
Member Author

jmdavis commented Mar 6, 2017

Okay, I updated the tests to use more characters that won't fit in a char or wchar, and I redid the changes on top of master so that they're mergeable again.

std/utf.d Outdated
return r;
return ByCodeUnitImpl(r);
}
else static if (is(R == enum) || is(R : const dchar[]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line and the line in the template constraint aren't the same, was this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this line different from the one in the template constraint? What's different is the previous static if, because the parts of the template constraint not covered by this line are covered by either the previous static if or the else. So, the rest of the condition from the template constraint was split for the static if-else chain, but this branch is straight from the constraint. It covers the types that are implicitly convertible to an array of dchar but aren't actually an array of dchar and thus need to be cast to force the conversion - and they don't need to be wrapped, because arrays of dchar aren't auto-decoded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reading it as a diff is making it hard to see what's actually there? In this case, I'd suggest reading the resulting code rather than trying to parse the diff. It should be much easier to understand that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmdavis is(R == enum) && is(R : const dchar[])) vs (is(R == enum) || is(R : const dchar[]))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL. Ah, I see now. Yeah, very close, and I missed the difference upon review. And now you're making me think. :)

Well, the && is definitely required in the template constraint, otherwise any old enum would pass, which would obviously be bad. We only want enums that are strings to pass. isAutodecodableString does that already for enums of arrays of char and wchar. The is(R == enum) && is(R : const dchar[]) bit is supposed to then cover enums of arrays of dchar. However, isAutodecodableString also lets user-defined types that implicitly convert to narrow strings through (but not static arrays). So, it would appear that I made an oversight with regards to types that implicitly convert to arrays of dchar. What that really should be is is(R : const dchar[]) && !isStaticArray!R, because that would be the dchar equivalent of what isAutodecodableString does. The isInputRange!R && isSomeChar!(ElementEncodingType!R) lets any ranges of any character type through, and that should be fine.

As for the static if-else chain, the first branch needs to be for narrow strings and anything that implicitly converts to narrow strings - excepting the bizarre corner case where a user-defined type is already a range but also implicitly converts to a narrow string. For this branch, the implicit conversion is enforced (which is a no-op for actual narrow strings), and then we get the wrapper range that avoids auto-decoding. I see nothing wrong with that static if condition. Either it's a narrow string, or it's an enum or user-defined type which implicitly converts to a narrow string and is not itself a range.

That leaves two branches. One of them must cover anything that implicitly converts to an array of dchar - but is not itself a range - and the other must cover any ranges of characters. Actual arrays of dchar that aren't enums could actually go in either branch. As it stands, the fact that || is used makes arrays of dchar take the first branch. It also makes it so that ranges that implicitly convert to an array of dchar get converted (which we don't want). And the enum actually ends up being pointless. If it were && like in the template constraint, then only enums would get through, and actually, since only enums that implicitly converted to an array of dchar could get to this condition (since any other enum type would not be a range and thus would have failed the template constraint, and all of the enums that implicitly converted to narrow strings went with the first branch), simply checking is(R == enum) would be the same as changing it to &&. But if we changed it to &&, then user-defined types that were not ranges in and of themselves but implicitly converted to an array of dchar would simply be returned rather than converted, which would be bad. So, neither || nor && is correct. I'll have to adjust it so that it does the conversion for user-defined types only if they're not already ranges while still doing the conversion for those that aren't already ranges.

So, if I've thought this through correctly, ultimately, both were conditions involving is(R : const dchar[]) were wrong. Man, I hate this maze of traits involving strings, and implicit conversions are hell for generice code. :|

I'll fix this up later today. Thanks for catching that.

jmdavis added 3 commits March 8, 2017 12:54
If a template is going to allow implicit conversions, it really needs to
force the conversion in order to avoid subtle bugs. This fixes
byCodeUnit so that it forces the conversion to a string type for types
that convert implicitly. It also fixes it so that types which implicitly
convert to a string type but are also ranges of characters without the
conversion will not be converted by byCodeUnit.
Also renames the member variable in ByCodeUnitImpl so that it doesn't
have the same name as the function parameter as well as making it more
clear that it's a string rather than a more general range type.
@jmdavis
Copy link
Member Author

jmdavis commented Mar 8, 2017

Okay. I made the fix to the template constraint and the 2nd static if condition as well as beefing up the tests so that they actually catch what I missed before.

@JackStouffer
Copy link
Member

Man, I hate this maze of traits involving strings, and implicit conversions are hell for generice code. :|

Yup, the alias this and enums make templates very hard to get 100% correct. I bet there are tons of hidden bugs in most D code in the wild dealing with strings/string-like-ranges

@JackStouffer
Copy link
Member

Auto-merge toggled on

@JackStouffer JackStouffer merged commit 13b3da6 into dlang:master Mar 10, 2017
}

///
@safe unittest
{
import std.range.primitives;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is a documented unittest, meaning it should be self-contained.

@CyberShadow
Copy link
Member

@jmdavis @JackStouffer Looks like this broke CircleCI tests on master. To avoid this in the future you can use the "auto-merge" label to use @dlang-bot instead of enabling auto-merge on Brad's CI, which does not take into account other CI services.

static assert(!is(typeof(bcu) == Stringish));
static assert(is(typeof(bcu) == typeof(bcu.byCodeUnit())));
static assert(is(ElementType!(typeof(bcu)) == immutable char));
assert(bcu.front == cast(char)244);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after cast(x), here and 4 other places. It is another thing that was pointed out by the CircleCI failures, though it did not get that far because of the unittest import problem.

}
}
auto ref opIndex(size_t index) inout { return str[index]; }
auto opSlice(size_t lower, size_t upper) { return ByCodeUnitImpl(str[lower..upper]); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space around ..

@CyberShadow
Copy link
Member

#5266

@jmdavis jmdavis deleted the issue_16642 branch June 27, 2018 03:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants