Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

char[] and wchar[] should be output ranges #6471

Merged
merged 2 commits into from
Jun 29, 2018

Conversation

schveiguy
Copy link
Member

There's no reason for this not to work. put jumps through hoops to helpfully transcode between character types just so it can work with any kind of output ranges that accept characters, but it had some specific arbitrary limitations for narrow strings that prevented them from working. This removes the restrictions, so putting to a char[] or wchar[] is now allowed. All widths work.

I'm not 100% sure of the paths that are taken, maybe they could use some improvement. But at least it will now work.

This enables nice optimizations such as easily allocating buffers on the stack to use with formattedWrite

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @schveiguy!

Bugzilla references

Auto-close Bugzilla Severity Description
18790 enhancement can't put a const(char)[] into a char[]

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#6471"

@@ -391,7 +391,7 @@ template Base64Impl(char Map62th, char Map63th, char Padding = '=')
* The number of times the output range's `put` method was invoked.
*/
size_t encode(E, R)(scope const(E)[] source, auto ref R range)
if (is(E : ubyte) && isOutputRange!(R, char))
if (is(E : ubyte) && isOutputRange!(R, char) && !is(R == char[]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other overload had this check, so I'm just thinking it was forgotten but the compiler didn't complain because R was never an output range of char.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the compiler didn't complain because R was never an output range of char

I specifically asked it to be removed IIRC in a different PR for this reason

immutable len = e.length;
r[0 .. len] = e;
r = r[len .. $];
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This made it work, but I'm open to moving stuff around. I'm not quite sure what the point of doPut is, but I think this follows the rules above (it doesn't slice or modify e at all).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 379, if you change the static if, you should be able to consolidate this code with the existing code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that is what I did originally (I just removed the !isNarrowString). But the problem is that if the char widths don't match, it defers to putChar, but putChar never looks back at put again, it only goes to doPut. So I had to copy that code here.

Or maybe you mean something different?

I'm not sure the reason for this whole put trifecta, it's really hard to tell the purpose for all of these. So I figured this was the least disruptive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just add the condition on line 281 as an OR to the condition on line 379

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see the issue. Yeah you'll have to add an is(typeof(r[0]) == typeof(e[0]))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then it won't work for putting one type of character into another type of buffer. For example, putting a wchar into a char[]. Right now, that does work.

enum dsCond = !isDynamicArray!R && is(typeof(doPut(r, dstringInit())));
enum csCond = is(typeof(doPut(r, cstringInit())));
enum wsCond = is(typeof(doPut(r, wstringInit())));
enum dsCond = is(typeof(doPut(r, dstringInit())));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to tell, but I think that the isDynamicArray!R restrictions were specifically for narrow strings. If anyone wants to shed some light as to why they were there, I'd appreciate it.

else
{
put(sink, result[]);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole part was super-messy, and avoided any nice optimizations that could be had with putting an entire array. Plus that cast to just "make it work" is so bad, but I'd rather leave that for another day.

Unfortunately, I had to rearrange things, because in cases where we are transcoding, put(sink, result[]) is not nothrow, and code that uses it is sometimes nothrow.

@thewilsonator
Copy link
Contributor

thewilsonator commented Apr 22, 2018

Nice!

You should add tests for formatted write since that is the motivation for this PR.
Please also check if this also fixes 18472, I suspect it does (making this a bug fix as opposed to an enhancement. I'll probably cherry pick this for LDC anyway but still).

@schveiguy
Copy link
Member Author

I can put in a test for formattedWrite (I did verify locally that this works), but I don't see how I can check 18472, as that seems to have a TypeInfo error on top of the put error. How would I test it?

@thewilsonator
Copy link
Contributor

thewilsonator commented Apr 22, 2018

The typeinfo was fixed in dlang/dmd#8173.

@schveiguy
Copy link
Member Author

still happens on my branch, what PR fixed the TypeInfo thing?

@thewilsonator
Copy link
Contributor

thewilsonator commented Apr 22, 2018

dlang/dmd#8173

@schveiguy
Copy link
Member Author

Ah, that's a DMD PR I updated your comments to reflect that. Will update my local DMD and see if it helps.

@thewilsonator
Copy link
Contributor

Ah whoops, i forgot this is phobos.

@schveiguy
Copy link
Member Author

Nope still fails: Error: TypeInfo cannot be used with -betterC. I think @JinShil says in the bug report that the PR helped in one case, but not the formatting case.

@schveiguy
Copy link
Member Author

schveiguy commented Apr 22, 2018

Wait, I didn't realize that went into stable. Doesn't seem to be in master yet. Will re-check.

Update: still fails.

@schveiguy
Copy link
Member Author

Added test case in std.format for formattedWrite.

@JackStouffer
Copy link
Member

I’ll be able to review this tomorrow. I’m very interested in this change of it works.

@JackStouffer JackStouffer self-assigned this Apr 23, 2018
Copy link
Member

@JackStouffer JackStouffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The byCodeUnit example on put should be modified

@schveiguy
Copy link
Member Author

@JackStouffer ok, I didn't notice those.

@schveiguy
Copy link
Member Author

@JackStouffer fixed the example.

Copy link
Member

@JackStouffer JackStouffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examining the original code, I think the original intention of the function doPut was to handle the case where the type had a put or opCall method that accepted E.

Is it possible to move the new slicing code to a static if after the call to putChar? Then we can modify putChar to handle the put(char[], char) case.

Copy link
Member

@n8sh n8sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

@schveiguy
Copy link
Member Author

@JackStouffer didn't see your message from before sorry. I'll take a look and see if there's a way to shuffle this around.

@schveiguy
Copy link
Member Author

@JackStouffer I looked again at this. I'd much rather have put handle the slice assign, since it's already handling the slice assign for a normal array, but the problem is that putChar can try transforming a single character (code unit) into an array, and it only calls doPut. Your suggestion is what I tried first, and it didn't work because of this limitation. If putChar would call put again, then we could handle it there, but it simply doesn't go that way (maybe it should?).

Not understanding the reason behind the put/doPut split, and in light of the fact that there is so much code that depends on the static introspection which I'm not sure is all tested in unittests here, I didn't want to mess too much with how this works. This was the least bad option.

So let me know if there is anything else stopping this from being merged, or if you can think of a better way to rearrange this.

@schveiguy
Copy link
Member Author

@JackStouffer ping, any comment here, or can we dismiss the stale review?

@schveiguy schveiguy dismissed JackStouffer’s stale review June 22, 2018 00:51

Fixed the example in question

@schveiguy
Copy link
Member Author

Anyone with a concern here? I think this is ready to be merged.

@n8sh
Copy link
Member

n8sh commented Jun 22, 2018

Anyone with a concern here?

Not a concern, but I'd like to know why it was disabled to begin with.

@schveiguy
Copy link
Member Author

I think it was preserving the original behavior. Here is a really OLD version of std.range that I think even predates github, that shows how the original put function looked:

void put(R, E)(ref R r, E e)

From what I can tell, at that point put would do nothing for parameters char[] and char[].

At this point, it was fixed to be an error, and then char[] became not an output range of char[]:

cdc5468

So I would say I don't think it was a PLAN to make it not work, it just happened to work that way and was formalized.

@n8sh n8sh added Merge:72h no objection -> merge The PR will be merged if there are no objections raised. Merge:auto-merge labels Jun 23, 2018
@dlang-bot dlang-bot merged commit 0e8722a into dlang:master Jun 29, 2018
@schveiguy schveiguy deleted the putstrings branch July 4, 2018 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merge:auto-merge Merge:72h no objection -> merge The PR will be merged if there are no objections raised. Severity:Enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants