use ScopeBuffer in std.file #2014

WalterBright · 2014-03-17T09:13:03Z

This is an initial run at eliminating the GC-allocated garbage generated in std.file, by replacing calls to toStringz and toUTF16z with uses of ScopeBuffer.

It does not change the user-facing API at all.
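For readers unfamiliar with ScopeBuffer, the underlying idea can be sketched as follows. This is a minimal, hypothetical stand-in (`SmallBuf` is not a Phobos name, and this is not ScopeBuffer's actual implementation). Output goes into a caller-provided array, usually on the stack, and moves to malloc'ed memory only if that array overflows, so no GC garbage is produced. There is no destructor, so the owner must call `free()` explicitly. It can be run with `dmd -unittest -main -run file.d`.

```d
import core.stdc.stdlib : malloc, realloc, cfree = free;
import core.stdc.string : memcpy;

/// Hypothetical, simplified stand-in for ScopeBuffer: writes go to a
/// caller-provided (typically stack) array until it fills up, then the
/// contents move to malloc'ed memory. No destructor: call free() yourself.
struct SmallBuf(T)
{
    private T[] buf;      // caller's array, or the malloc'ed block once grown
    private size_t used;  // number of elements written so far
    private bool onHeap;  // true once buf points at malloc'ed memory

    this(T[] initial) { buf = initial; }

    void put(T c)
    {
        if (used == buf.length)
            grow();
        buf[used++] = c;
    }

    private void grow()
    {
        immutable size_t newLen = buf.length ? buf.length * 2 : 16;
        T* p;
        if (onHeap)
            p = cast(T*) realloc(buf.ptr, newLen * T.sizeof);
        else
        {
            // First overflow: copy what was written into the caller's array.
            p = cast(T*) malloc(newLen * T.sizeof);
            if (p !is null && used)
                memcpy(p, buf.ptr, used * T.sizeof);
        }
        if (p is null)
            assert(0, "out of memory");
        buf = p[0 .. newLen];
        onHeap = true;
    }

    T[] opSlice() { return buf[0 .. used]; }
    size_t length() const { return used; }

    /// Releases heap memory, if any was allocated; the caller's array is untouched.
    void free()
    {
        if (onHeap)
            cfree(buf.ptr);
        buf = null;
        used = 0;
        onHeap = false;
    }
}

unittest
{
    wchar[4] tmpbuf = void;          // deliberately tiny to force the malloc path
    auto sz = SmallBuf!wchar(tmpbuf);
    scope(exit) sz.free();
    foreach (wchar w; "hello"w)
        sz.put(w);
    sz.put(0);                       // zero terminator for the C API
    assert(sz.length == 6);
    assert(sz[] == "hello\0"w);
}
```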

ghost · 2014-03-17T09:43:47Z

std/file.d

-        auto h = CreateFileW(std.utf.toUTF16z(name), defaults);
+
+        wchar[64] tmpbuf = void;
+        auto sz = ScopeBuffer!wchar(tmpbuf);


Why does ScopeBuffer need an explicit template parameter? This looks to me like we could use a helper function with IFTI, e.g.:

auto sz = tmpbuf.scopeBuffer();

Actually if it can't even be a function (due to enregistering?) we could also use pass-by-alias:

auto sz = ScopeBuffer!tmpbuf;

As discussed in the ScopeBuffer PR thread, putting the buffer as an alias parameter would cause a new ScopeBuffer to be instantiated for every use, rather than sharing a single instance.

This is kinda funny. IFTI works fine. Just use the scopeBuffer factory function that is already present. This should have worked:
auto sz = tmpbuf.scopeBuffer();

If anything, I'd mark the explicit constructor as private, as it simply adds more boilerplate to user code compared with a one-line helper function.
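The IFTI point can be illustrated with a hypothetical `Buf(T)` type and `makeBuf` factory (illustrative stand-ins, not the Phobos names). The factory deduces `T` from the array, so `tmpbuf.makeBuf()` yields the same type as `Buf!wchar(tmpbuf)`. The `ByAlias` template illustrates the concern about alias parameters: two different buffer symbols produce two distinct instantiations.

```d
/// Hypothetical buffer type whose element type is a template parameter,
/// standing in for ScopeBuffer!T.
struct Buf(T)
{
    T[] store;
    this(T[] s) { store = s; }
}

/// Hypothetical factory: T is deduced from the argument (IFTI),
/// so callers never spell out the element type.
auto makeBuf(T)(T[] s)
{
    return Buf!T(s);
}

/// Alias-parameter variant: each distinct symbol passed in creates a
/// separate instantiation, i.e. a separate type with its own code.
struct ByAlias(alias arr)
{
    size_t capacity() { return arr.length; }
}

wchar[64] bufA;
wchar[64] bufB;
static assert(!is(ByAlias!bufA == ByAlias!bufB)); // one type per buffer symbol

unittest
{
    wchar[64] tmpbuf = void;
    auto a = Buf!wchar(tmpbuf);   // explicit template argument
    auto b = tmpbuf.makeBuf();    // UFCS + IFTI: T is deduced as wchar
    static assert(is(typeof(a) == typeof(b)));
    assert(b.store.length == 64);
}
```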

monarchdodra · 2014-03-17T09:44:50Z

The code is a bit more verbose than I'd like, but I think it's fine in a low-level IO lib, so no worries there.

The function "toUTFWz" is not nothrow, though. In the spots where you used it, you should place the sz.free() calls in a scope(exit).

Or, if you judge that the performance hit does not justify it, please at least leave a comment about it in the code for future reviewers (e.g. "Potential acceptable leak in case of UTF exception.").
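A minimal sketch of the leak and of the `scope(exit)` fix, assuming a hypothetical `Resource` with an explicit `free()` and a hypothetical converter that, like toUTFWz, throws on invalid UTF-8 (here via `std.utf.validate`). When `free()` is a plain statement after the call, an exception skips it; a `scope(exit)` registered before the throwing call also runs during unwinding.

```d
import std.exception : assertThrown;
import std.utf : validate;

/// Hypothetical stand-in for a buffer that must be released with free().
struct Resource
{
    int* frees;                 // counter so the test can observe releases
    void free() { ++*frees; }
}

/// Hypothetical converter that can throw on malformed input.
void convertInto(string s, ref Resource r)
{
    validate(s);                // throws std.utf.UTFException on invalid UTF-8
}

/// Leaky pattern: free() is skipped if convertInto throws.
void leaky(string s, ref int freed)
{
    auto r = Resource(&freed);
    convertInto(s, r);
    r.free();
}

/// Exception-safe pattern: scope(exit) runs on normal exit and on unwinding.
void guarded(string s, ref int freed)
{
    auto r = Resource(&freed);
    scope(exit) r.free();
    convertInto(s, r);
}

unittest
{
    string bad = "abc\xFF";     // 0xFF never appears in valid UTF-8
    int leakyFrees, guardedFrees;
    assertThrown(leaky(bad, leakyFrees));
    assertThrown(guarded(bad, guardedFrees));
    assert(leakyFrees == 0);    // never released
    assert(guardedFrees == 1);  // released by scope(exit) during unwinding
}
```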

Also, wchar[64]? Wouldn't a bigger number, specifically 260 (Windows' MAX_PATH), make more sense? That's a question btw, not a suggestion. AFAIK, it would reduce allocations to near zero.

ghost · 2014-03-17T09:48:51Z

std/file.d

+    /* *******
+     * Reads from range t and writes as wchar's to s. Appends a 0.
+     */
+    private wchar* toUTFWz(T, S)(ref T t, ref S s)


Seems like a good opportunity to enhance our existing toUTFz from std.utf to take an optional buffer (ok we can do this later).

I looked into doing it, and it'll take some careful work. It can be deferred for the moment.

ghost · 2014-03-17T09:53:52Z

I don't like this, the number of lines is exploding 5x over.

WalterBright · 2014-03-17T10:08:23Z

> I don't like this, the number of lines is exploding 5x over.

Explicit memory management takes more lines of code than automatic.

WalterBright · 2014-03-17T19:21:29Z

@monarchdodra I had overlooked that toUTFWz can throw, which would screw things up. I tried to make it nothrow, then found that the foreach can throw. Looks like I'll need to revise the code.

In doing this, I also discovered that stat() and friends were incorrectly considered throwable. Here's a PR to fix that: dlang/druntime#742

WalterBright · 2014-03-17T19:28:47Z

> Also, wchar[64] ?

Of course, the 64 is completely arbitrary. Considerations are:

- be large enough to cover the most common cases, so no malloc is necessary
- be conservative in its use of the stack, so fibers and threads won't overflow it
- be small enough that it is at least sometimes too small, so the overflow malloc path does get executed now and then
- the array's total size should be a multiple of 8, otherwise stack alignment will waste bytes
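The arithmetic behind the last rule can be checked at compile time. A sketch with a hypothetical `stackBytes` helper, for the current 64 and for the 260 asked about above (260 is the classic Windows MAX_PATH); it only checks byte counts, not actual stack layout.

```d
/// Hypothetical helper: bytes of stack used by a buffer of n elements of T.
size_t stackBytes(T)(size_t n) { return n * T.sizeof; }

// Current choice: 64 wchars.
static assert(stackBytes!wchar(64) == 128);
static assert(stackBytes!wchar(64) % 8 == 0);   // multiple of 8

// Suggested alternative: 260 wchars.
static assert(stackBytes!wchar(260) == 520);
static assert(stackBytes!wchar(260) % 8 == 0);  // also a multiple of 8

// A length that breaks the rule: 65 wchars.
static assert(stackBytes!wchar(65) == 130);
static assert(stackBytes!wchar(65) % 8 != 0);
```

Both candidates satisfy the alignment rule; the trade-off is the 392 extra bytes of stack per call site against how often the malloc path would still be hit.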

andralex · 2014-03-17T23:20:54Z

std/file.d

+        from.toUTFWz(sz);
+        auto fromlen = sz.length;
+        auto p = to.toUTFWz(sz);
+        scope(exit) sz.free();


yah, missed that one, and the other one below.

monarchdodra · 2014-03-18T15:11:03Z

std/file.d

+        foreach (wchar w; t)
+            s.put(w);
+        s.put(0);
+        return s[].ptr;


Is there a rationale for this return? It's not generic at all, and it also makes very little sense in the context of this function: why return a pointer to the beginning of the buffer?

toUTFWz should have no knowledge of how the output range S is implemented (or if it even has [] at all).

Why not return just void?

Ah. I see how you are using it now, though I can't say I completely agree with it. Why not return s by reference? Then you could use:

SetFileAttributesW(name.toUTFWz(sz)[].ptr, attributes);
// or, with .ptr as a member property:
SetFileAttributesW(name.toUTFWz(sz).ptr, attributes);

Isn't it obvious that all improvements you propose end up worse? toUTFz converts to a zero-terminated UTF string, and that's what it returns.

> toUTFz converts to a zero-terminated UTF string, and that's what it returns.

Actually, no, that's not what it returns. That's the point I'm making.

> Isn't it obvious that all improvements you propose end up worse?

:/

> toUTFz converts to a zero-terminated UTF string, and that's what it returns.
> Actually, no, that's not what it returns. That's the point I'm making.

Wait, then I'm missing something. Doesn't it return a pointer to the beginning of the converted string?

> Isn't it obvious that all improvements you propose end up worse?
> :/

What I see is that you take the expression in the return statement and sprinkle it in all caller code.

> Wait, then I'm missing something. Doesn't it return a pointer to the beginning of the converted string?

It returns a pointer to the beginning of the buffer into which the string is converted, not to the beginning of the converted string:

from.toUTFWz(sz);
auto fromlen = sz.length;
auto p = to.toUTFWz(sz); // points to 0 terminated from[]

Why would to.toUTFWz(sz) point to from[] :s ???

Also, it makes the unwritten assumption that the output range S is a ScopeBuffer. It wouldn't compile with Appender, for example.

As I said when I finished reviewing, this is "nit" material, so we don't have to blow it out of proportion here. I'm OK if we keep it that way. I'm just saying that if we want a public toUTFWz that takes an output range, that design won't fly.

Got it, thanks for the explanation. I'm also fine with it either way.
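A sketch of the generic alternative discussed above, under the hypothetical name `toUTF16zInto`. It uses only `put` and returns `void`, so it accepts any `wchar` output range, including `std.array.Appender`. The test also reproduces the buffer-sharing trick: after converting `from` and then `to` into the same range, the buffer start points at the zero-terminated `from`, and `to` begins at offset `fromlen`.

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

/// Hypothetical generic variant of toUTFWz: transcodes t to UTF-16,
/// appends a terminating 0 and returns nothing. Only put() is used,
/// so no assumption is made about how S stores its data.
void toUTF16zInto(S)(const(char)[] t, ref S s)
    if (isOutputRange!(S, wchar))
{
    foreach (wchar w; t)    // foreach transcodes UTF-8 to UTF-16 (throws on bad input)
        put(s, w);
    put(s, cast(wchar) 0);
}

unittest
{
    auto app = appender!(wchar[])();
    toUTF16zInto("from", app);
    immutable fromlen = app.data.length;  // "from" plus its terminator
    toUTF16zInto("to", app);

    // Pointers are taken after both writes, since Appender may reallocate.
    // The buffer start is the start of "from"; "to" starts at fromlen.
    const(wchar)* p = app.data.ptr;
    const(wchar)* q = app.data.ptr + fromlen;
    assert(fromlen == 5);
    assert(p[0 .. 5] == "from\0"w);
    assert(q[0 .. 3] == "to\0"w);
}
```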

monarchdodra · 2014-03-18T15:28:59Z

This looks good to go to me. I saw nothing that actually needed addressing. Will leave open 24h, and then toggle the auto-merge if no issues are raised by then.

andralex · 2014-03-18T15:33:01Z

This makes it painfully obvious that we need a ScopeBuffer with a destructor. The ScopeBuffer idiom is highly repetitive. Just looking at the code in this diff and contemplating the prospect of plopping it in a lot of places makes it clear that shortening the idiom is a big deal.

We should, I think, rename ScopeBuffer to ScopeBufferImpl and define this:

struct ScopeBuffer
{
    private ScopeBufferImpl impl;
    alias impl this;
    ~this() { impl.free(); }
}

and use it wherever we need automatic destruction.
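A self-contained sketch of the proposed wrapper, with a hypothetical `BufImpl` standing in for `ScopeBufferImpl` (it only counts calls to `free()`). The `@disable this(this)` line goes beyond the snippet above and is an assumption: without it, copying the wrapper would free the same memory twice.

```d
/// Hypothetical stand-in for ScopeBufferImpl: records each free().
struct BufImpl
{
    int* frees;
    void free() { if (frees) ++*frees; }
}

/// Destructor wrapper in the spirit of the proposal: alias this forwards
/// the buffer API, ~this() releases automatically.
struct AutoBuf
{
    private BufImpl impl;
    alias impl this;
    @disable this(this);        // copying would release the same memory twice
    this(int* counter) { impl = BufImpl(counter); }
    ~this() { impl.free(); }
}

unittest
{
    int frees;
    {
        auto b = AutoBuf(&frees);
        // ... use b through alias this, no explicit free() ...
    }                           // b is destroyed here
    assert(frees == 1);
}
```

The destructor also runs during exception unwinding, which removes the need for the `scope(exit) sz.free();` lines at each call site.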

DmitryOlshansky · 2014-03-18T21:13:29Z

@andralex Long live the destructor! ;)

dnadlinger · 2014-05-06T21:11:23Z

@andralex, resurrecting this discussion: IIRC Walter's reasoning behind not giving ScopeBuffer a destructor is that it then would no longer be POD (in the ABI sense), forcing it to be passed on the stack. This seems to have been a performance bottleneck in one of Walter's applications. I agree that the manual lifetime management is a big downside of the design, though, avoidable or not.
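The POD point can be checked with the `__traits(isPOD, T)` compiler trait: adding a destructor, as the wrapper proposed above does, makes a struct non-POD. The structs below are illustrative; how a non-POD struct is then passed to functions is decided by the platform ABI, as the comment above describes.

```d
/// Illustrative plain struct: no destructor, postblit or assignment overload.
struct Plain { int* p; size_t len; }

/// Same data, wrapped with a destructor as in the proposed ScopeBuffer wrapper.
struct WithDtor
{
    Plain impl;
    alias impl this;
    ~this() {}
}

static assert(__traits(isPOD, Plain));
static assert(!__traits(isPOD, WithDtor)); // the destructor alone removes POD-ness
```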

DmitryOlshansky · 2014-05-31T10:50:46Z

Would be nice to go ahead with @andralex suggestion and simplify repetitive code. Nothing is ever passed to functions here.

denis-sh · 2014-07-11T12:05:05Z

So I'll be the one who has to say it: it's unacceptably ugly and dangerous.

Current approach:

res = c_func(dStr.toStringz());

It's not just slow, it is also invalid, as there is no guarantee that c_func keeps its argument in a GC-visible location during execution (Issue 12417), not to mention a moving GC.

Proposed approach:

// code before
c_func_res res;
{
    char[64] tmpbuf = void;
    auto sz = ScopeBuffer!char(tmpbuf);
    scope(exit) sz.free();
    res = c_func(dStr.toUTFCz(sz));
}
// code after

Without the enclosing extra scope, tmpbuf and sz would live on after the c_func call for no reason.
If one forgets the free call, we get a memory leak.

For a long time I've been proposing this approach:

res = c_func(dStr.tempCString());

But I never got any response from you guys on whether or not it is usable in Phobos. Can anybody explain why this short and almost safe proposal (safe except for genuinely incorrect usage, see the docs) is being ignored?
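A rough, hypothetical sketch of the tempCString idea (`TempCStr` and `tempCStr` are illustrative names; this is not the code from #2332). The function returns a temporary that owns a zero-terminated copy, and D destroys temporaries at the end of the full expression, so `c_func(s.tempCStr().ptr)` keeps the string alive for the call and frees it right after. `ptr` is recomputed on access rather than stored, because D may move structs, and a stored pointer into the inline buffer would then dangle.

```d
import core.stdc.stdlib : malloc, cfree = free;

/// Hypothetical temporary holding a zero-terminated copy of a string.
struct TempCStr
{
    private char[64] buf = void;  // short strings are stored inline
    private char* heap;           // set only when the string does not fit
    @disable this(this);          // copying would lead to a double free

    /// Recomputed on each call, so it stays correct after a move.
    const(char)* ptr() const { return heap ? heap : buf.ptr; }

    ~this() { if (heap) cfree(heap); }
}

/// Hypothetical factory, used as c_func(s.tempCStr().ptr).
TempCStr tempCStr(const(char)[] s)
{
    TempCStr r;
    char* dst;
    if (s.length < r.buf.length)
        dst = r.buf.ptr;          // fits, including the terminator
    else
    {
        r.heap = cast(char*) malloc(s.length + 1);
        if (r.heap is null)
            assert(0, "out of memory");
        dst = r.heap;
    }
    dst[0 .. s.length] = s[];
    dst[s.length] = '\0';
    return r;
}

unittest
{
    import core.stdc.string : strlen;
    // The temporary lives until the end of this full expression.
    assert(strlen("hello".tempCStr().ptr) == 5);
}
```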

denis-sh · 2014-07-13T16:59:06Z

Formalized my proposal in pull #2332.

mihails-strasuns · 2014-07-29T14:53:10Z

I agree with destructor part being a blocker.

burner · 2014-08-26T11:36:25Z

As far as I can see, this can be closed because of #2332.

mihails-strasuns · 2014-08-28T01:28:35Z

Does #2332 include similar update to std.file? I don't think so.

DmitryOlshansky · 2014-08-28T05:37:46Z

@Dicebot It seems to do, grep the diff for 'std/file' - 140+ LOCs

denis-sh · 2014-08-28T09:11:30Z

> Does #2332 include similar update to std.file? I don't think so.

Yes. That pull replaced all toStringz/toUTF16z/toUTFz except couple of non-trivial cases.

mihails-strasuns · 2014-08-28T15:39:22Z

Ah, ok, sorry, missed that.
Safe to close.

ghost reviewed Mar 17, 2014

andralex reviewed Mar 17, 2014
use ScopeBuffer in std.file

4be307b

monarchdodra reviewed Mar 18, 2014

denis-sh mentioned this pull request Jul 13, 2014

Add temp c string allocation #2332

Merged

mihails-strasuns added the needs work label Jul 29, 2014

mihails-strasuns closed this Aug 28, 2014