Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.

Conversation

WalterBright
Copy link
Member

This is an alternative to dlang/phobos#1911 , adding it to druntime instead.

@monarchdodra
Copy link
Contributor

I'll only leave this single message, since I've said plenty in the other thread.

I think this is a mistake. It's basically sweeping something under the rug before it even appears, because the design is dangerous.

I really think this would make a brilliant addition to phobos in range as ScopedAppender: It does everything walter is asking for, but could still be safelly usable for all types in any algorithm.

I feel by now that if I haven't been heard (or have been heard, but failed to convince), then there's nothing more I can add. I'll just be quiet now and let the others have their say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pity that we have to compromise safety for performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but D has @System for good reason.

@DmitryOlshansky
Copy link
Member

I concur with @monarchdodra, I've repeatedly asked for numbers that would back the hackish decisions to no avail. There is a chance for good generic primitive that does all of presented for any type and doesn't cut fingers of those who use it. It is then usable far beyond internals of Phobos.

As this pull stands it doesn't have the chance of being a publicly usable primitive, IMO.
Basically I'm done reviewing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this import seems to be incorrect

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@andralex
Copy link
Member

I'll only leave this single message, since I've said plenty in the other thread.

I think this is a mistake. It's basically sweeping something under the rug before it even appears, because the design is dangerous.

I really think this would make a brilliant addition to phobos in range as ScopedAppender: It does everything walter is asking for, but could still be safelly usable for all types in any algorithm.

I feel by now that if I haven't been heard (or have been heard, but failed to convince), then there's nothing more I can add. I'll just be quiet now and let the others have their say.

I'd love you to elaborate a bit more on this because I think placing the module in core/internal obviates a bunch of the counterarguments.

For my money, I made a thorough pass through the module just now. Walter has nicely improved it following the meaningful reviews gotten. I can see how it can be immediately used as is, as well as backend for other work. It's a simple, primitive, "no room left below" artifact that can be used in a variety of nicer abstractions.

It did look out of place in the middle of phobos like a dirty greasy rivet among a neat Lego set, but here it fits quite snugly. What is the core objection to this?

@JakobOvrum
Copy link

Are there any plans to actually use this anywhere specific in druntime? Otherwise I don't see why it should go in core.internal.

@CyberShadow
Copy link
Member

Question: Why is it important that ScopeBuffer fits into two 64-bit registers, if it is non-copyable? It will be passed by address to writing functions (such as formattedWrite) anyway, right?

Same with why it has to be a POD type and not have a destructor.

@monarchdodra
Copy link
Contributor

I'd love you to elaborate a bit more on this because I think placing the module in core/internal obviates a bunch of the counterarguments.

Please understand that my main gripe is not what it is, but what it could be. Yes, Walter created something that gives mind shattering speeds. I do think this is awesome, and could probably help a ton of projects where even the tiniest of speed increases is of the essence.

The thing is that because of this, it is barely usable for anything other than concatenating built-in types (chars, bytes, ints). Even using it with pointers/slices is tricky business (do they point the GC?). Walter basically hailed the thing as "the savior of Phobos", when in reality, all it'll do is maybe help the path building functions (which is still good yes) :/

What I think though is that with the tiniest of consessions, his ScopedBuffer could be so much more. The speeds might go down from "mind blowing" to "mere eye blinding". On the flip side, it'd be usable in for any type, and in any context where append can be used. Where talking any algorithm that accumulates anything here.


At this point, If Walter really thinks this mind blowing speed justifies these concessions, then fine. He's probably right anyways.

Just don't expect wide-spread use in Phobos. On the other-hand, I don't think it will prevent the introduction of a similar, say ScopedAppender, in Phobos, but safer and more robust: Appender, while still faster than simple concatenation, still isn't quite as fast as it could be due to its use of GC, and non-ownership of its buffer.

And that's what I want. An eye-blindingly fast replacement for Appender, which ScopedBuffer is not. Which is very disappointing for me.

@WalterBright
Copy link
Member Author

At Andrei's suggestion, I also added a realloc template parameter so the user can choose how to allocate storage.

@WalterBright
Copy link
Member Author

@CyberShadow it needs to be POD because otherwise it will not be enregistered. Since it will be used in tight loops, getting it in registers is everything. Squeezing it into two registers leaves the max number of registers available for other variables in those tight loops.

Although doing this must be done with care, I've used it by passing/returning ScopeBuffers from functions, so I still want them copyable. Also, when it fits in 2 registers, the call/return sequence is done in registers rather than using hidden pointer arguments. Essentially, it's designed to fit in with how the 64 bit ABI is specified.

@CyberShadow
Copy link
Member

Since it will be used in tight loops, getting it in registers is everything.

OK, that makes sense when the ScopeBuffer is used in the same function declaring it.

I've used it by passing/returning ScopeBuffers from functions, so I still want them copyable.

But it's currently marked as non-copyable (this(this) is marked as @disable). What am I missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert(lower <= upper && upper <= bufLen);

I know it's assertive code, but if this is meant to be used in a tight loop, we might as well not gratuitously slow it down in non-release.

@WalterBright
Copy link
Member Author

But it's currently marked as non-copyable ( this(this) is marked as @disable ). What am I missing?

Eh, I should remove that.

@monarchdodra
Copy link
Contributor

But it's currently marked as non-copyable ( this(this) is marked as @disable ). What am I missing?
Eh, I should remove that.

Please remember that you marked it as such because it has both value and reference semantics. If you pass it around by value, and then the passed value gets modified, the original object will spontaneously corrupt (it may point to data that was removed). If this(this) is not disabled, then it needs to be implemented, which means ScopedBuffer will not be POD.

Or at the very least, simply document that no more than 1 ScopedBuffer should be "live" at any 1 time. This seems the best compromise? It would allow returning one by value, should it point to a global heap allocated buffer, for example?

@WalterBright
Copy link
Member Author

@monarchdodra the idea is that one can pass/return it to a function, temporarily transferring ownership, as in:

ScopeBuffer s;
s = doSomething(s, args);

The call/return by register will be faster than by pointer.

@CyberShadow
Copy link
Member

So like C++11's move? Does D have that?

@JakobOvrum
Copy link

So like C++11's move ? Does D have that?

Yes, D has move semantics, and you can force it with std.algorithm.move in the rare case that it cannot be automatic.

@WalterBright
Copy link
Member Author

And that's what I want. An eye-blindingly fast replacement for Appender, which ScopedBuffer is not.

Why not put your ideas into practice and submit a PR for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

@WalterBright
Copy link
Member Author

awwight, who broke the autotester?

@WalterBright
Copy link
Member Author

Seeing your code has given me inspiration to what I think could be both efficient, and publicly acceptable in phobos.

I'll look forward to seeing it. Keep in mind that various features and the way they are used interact in strange ways and can have unexpected consequences in the code generated. It pays off to constantly look at the generated code while developing performance oriented code. And when I say "pays off", factors of 2 are not uncommon.

@rainers
Copy link
Member

rainers commented Mar 16, 2014

I wonder why everyone is so obsessed with stdlib malloc/free. IMO the problem to solve is not the very unlikely collection that might be triggered during an allocation of a temporary buffer, but deterministic removal of that data because you know it didn't escape. GC.free offers just that, too.

I've made a few measurements comparing the stdlib version with the GC using the example provided by blackwhale, with an argument of 250, which causes one allocation and one reallocation per iteration.

On Win32, the GC version takes only 75% of the time of the stdlib version!
On Win64, the GC version needs about 9% longer than the stdlib version.

There are some easy optimizations to the GC version which could make it par on Win64 aswell (removing the GC proxy, making the GC implementation nothrow, disabling setting memory to zero in malloc. etc).

Maybe someone can make a similar measurement for linux...

@dnadlinger
Copy link
Contributor

@WalterBright: I haven't been following this discussion so far, but are the performance results that drove the engineering decisions behind this available somewhere? It would be a shame for people wanting to compare a "safe" implementation to this to have to re-do the benchmarks all over again, especially as the detailed profile of your use case is not known.

@WalterBright
Copy link
Member Author

@rainers there are some C library functions that people have obsessed over making fast for 30 years. malloc/free is one, memcpy is another. Why not take advantage of that? I long ago gave up trying to compete with the C memcpy implementations.

You're right that the malloc speed is less of an issue - ScopeBuffer is designed to try and minimize its use. The larger issue is the loop performance of adding items to the array.

Besides, with the current iteration, the user can decide which allocator he wants to use.

And, of course, there are the users for whom 'GC' is a trigger word for "I won't use it". By using malloc/free instead where the lifetime is known caters to that.

@WalterBright
Copy link
Member Author

@klickverbot as is normal for working with clients, the work is under NDA.

I would have thought that it's non-controversial that getting variables into registers means they run faster. It doesn't really matter how much faster it is, just that it is faster. Individual results will also vary considerably based on which D compiler you use, which CPU, the data set, the surrounding code, etc.

As for why making Phobos implementations run as fast as possible: users of libraries do not care in the slightest what is under the hood - they only care about what they see, which is the API, the size, and the speed. Smaller size and faster speed is always better. Getting the runtime library to be fast has high leverage in that it doesn't just speed up an app, it speeds up every app. It's second in leverage only to better compiler optimizations.

I'd bet that hardly 1 programmer in 100,000 has ever even looked at the source code for their system's malloc/free/memcpy, and fewer than that would make much sense out of it. ScopeBuffer is a peach compared to those gears & wires :-)

For another example, Andrei and I have worried about the performance implications of using ranges and algorithms, versus hand coding. If R&A turned out to be slower than hand coding, then R&A becomes a hard sell. D needs R&A to be zero cost abstractions. To my great pleasure, I have discovered that they really are zero cost abstractions, if (and only if) the person designing the range or algorithm takes care in their implementation.

@andralex
Copy link
Member

I'll pull this now. It won't appear in the documentation and will be strictly for internal use. After a few iterations on use, we can decide its future direction.

@andralex
Copy link
Member

Auto-merge toggled on

@monarchdodra
Copy link
Contributor

Auto-merge toggled off

@monarchdodra
Copy link
Contributor

Auto-merge toggled off

Sorry, I agree with the pull, but let's not jump the gun. There still very useful review about the code coming in. Is this so urgent it can't even wait a day or two's worth of review? I mean, I made a comment where this doesn't even compile:

   ScopeBuffer!(int*) b;
   int*[] s;
   b.put(s); //Error: cannot implicitly convert expression (s[]) of type const(int*)[] to int*[]

Let's at least give Walter the time to react to what was said in the actual review...

@monarchdodra
Copy link
Contributor

this doesn't even compile

Corrections. It doesn't even instanciate!. Just this: ScopeBuffer!(int*) b;

@andralex
Copy link
Member

nice - @WalterBright could you please fix those.

@WalterBright
Copy link
Member Author

Let's at least give Walter the time to react to what was said in the actual review...

But I have posted innumerable replies here and in the Phobos thread on it! ScopeBuffer has been reviewed for a couple months now.

I'll address the instantiation issue you discovered shortly.

@dnadlinger
Copy link
Contributor

Also, where in druntime will this be used?

@WalterBright
Copy link
Member Author

Also, where in druntime will this be used?

I haven't looked. There are many places in Phobos where it can be fruitfully used, such as std.file.

@WalterBright
Copy link
Member Author

@monarchdodra fixed.

@WalterBright
Copy link
Member Author

As for why time is of the essence, there are a lot of much needed changes to Phobos that cannot be done without ScopeBuffer. I'd like to get these into the next release, so we can show major progress towards satisfying users who want to use Phobos with much less reliance on the GC.

@dnadlinger
Copy link
Contributor

Why should it be in druntime if there are no use-cases for it there? (I don't doubt that there might be some, but as druntime is supposed to be somewhat minimal…)

There is no reason why it couldn't go into an internal Phobos module, as far as I can see.

@JakobOvrum
Copy link

As this is now for the internal package and apparently we're in a hurry, I no longer have objections to this even with the unsafe interface. We can always change things later more or less freely.

As for right now, I still question whether it should be in druntime's or Phobos' internal package - Phobos use-cases have been mentioned, but druntime use-cases haven't. Since it can be moved anyway, maybe we should conservatively place it in Phobos until druntime use-cases are proven.

@WalterBright
Copy link
Member Author

Why should it be in druntime if there are no use-cases for it there?

This was discussed in dlang/phobos#1911, but the summary is people thought it was too dirty/hackish/unsafe for Phobos.

I don't know if there are use cases in druntime or not, since I haven't looked.

@monarchdodra
Copy link
Contributor

But I have posted innumerable replies here and in the Phobos thread on it!

I'm really talking about the specific comments made in the code review. I said I was OK with the pull. Just that even then, it's standard procedure for something new/complex to be left open for a bit more than 2 days, to give anyone a chance to catch something.

ScopeBuffer has been reviewed for a couple months now.

I missed that, and I appologize. But I still think it's fair to give it an opportunity for a last review.

BTW: @MartinNowak and I would be curious to have your reply regarding a variadic put.

@dnadlinger
Copy link
Contributor

@WalterBright: As far as I can see, people objected to it becoming public API there, i.e. std.buffer.scopebuffer. We have lots of internal unsafe helpers in Phobos.

Also note that Phobos accessing a druntime-internal module would be a bit sketchy. We have been working hard to make Phobos "just another library", after all.

@WalterBright
Copy link
Member Author

I would be curious to have your reply regarding a variadic put .

Sorry, I had missed that. My thoughts are that although I used ScopeBuffer extensively, there was no use case for that. I tend to eschew adding things until there is a noticeable need for it, because it's much harder to take away unneeded features than it is to add them.

I don't think it matters which internal package scopebuffer is in, after all, it is an internal package and users don't care about internal implementations. Phobos is dependent on druntime, and nothing about scopebuffer being in druntime changes that.

@andralex
Copy link
Member

This underwent significant scrutiny already, a whole lot more than a bunch of more significant work. It's gotten to the point it's unproductive to continue the stream of nits.

I'm seeing potential use in druntime in core.sys.windows.generateSearchPath (far as I can tell it's incorrect at the moment), core.runtime.loadLibrary, and probably a few others, but neither has high impact.

It is true there is no precedent of phobos using druntime internal code. So then let's move this to std/internal "for the last time" and start making use of it. It will be open to future change without breaking client code. Thanks.

@andralex andralex closed this Mar 16, 2014
@MartinNowak
Copy link
Member

I'm seeing potential use in druntime in core.sys.windows.generateSearchPath (far as I can tell it's incorrect at the moment), core.runtime.loadLibrary, and probably a few others, but neither has high impact.

It is true there is no precedent of phobos using druntime internal code. So then let's move this to std/internal "for the last time" and start making use of it. It will be open to future change without breaking client code. Thanks.

Yep, right decision. I wouldn't know of any performance sensitive code in druntime that would benefit from this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants