
Conversation

WalterBright
Member

This adds a package std.buffer, and the first entry in that package, std.buffer.scopebuffer.

ScopeBuffer is an OutputRange that sits on the stack and overflows to malloc/free. It's designed to help eliminate GC allocation by lower-level functions that return buffers, such as std.path.buildPath(). With some judicious user tuning of the initial stack size, it can virtually eliminate storage allocation.

Using it is @system, but the user of ScopeBuffer can be @trusted.

An output range like this is a precursor to eliminating the excessive GC use by functions such as buildPath().
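The intended usage pattern can be sketched roughly as follows. This is illustrative only: the module path follows the package name proposed in this PR, and the member names (`put`, `free`, slicing, `length`) are assumed from the description above, not taken from the final code.

```d
import std.buffer.scopebuffer; // package name as proposed in this PR

@trusted const(char)[] joinPath(const(char)[] dir, const(char)[] name)
{
    char[128] store = void;              // stack storage; size is user-tunable
    auto buf = ScopeBuffer!char(store);  // spills to malloc only on overflow
    scope(exit) buf.free();              // deterministic release, no GC

    buf.put(dir);                        // OutputRange primitive
    buf.put('/');
    buf.put(name);
    return buf[0 .. buf.length].idup;    // copy out before free() runs
}
```

The caller picks the stack size (128 here) to cover the common case, so the malloc path is only taken for unusually long inputs.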

Member

IMO make it a unittest example.

Member Author

Added static assert for it.

Contributor

I see no reason why the example below should not be a documented unit test.

Member Author

done

@WalterBright
Member Author

newsgroup discussion: http://forum.dlang.org/post/ld2586$17f6$1@digitalmars.com

Contributor

Why is this block indented?

Contributor

In general, do we want to document contracts?

Member Author

Why is this block indented?

Because I thought it looked nicer.

In general, do we want to document contracts?

I don't find trivia to be helpful - in this case, any additional documentation would be trivia.

Contributor

Because I thought it looked nicer.

I have not seen that anywhere else. It's not consistent with the rest of Phobos.

Member Author

Indenting it is also consistent with how constraints are used in templates, and is analogous in that they apply to the function parameters.

Contributor

Indenting it is also consistent with how constraints are used in templates, and is analogous in that they apply to the function parameters.

It's not consistent with any other in-block. I would say that template constraints are put on the same line as the function declaration if the length of the line allows that. Otherwise they're put on a new line and indented, following the standard style guide for splitting up a single line.

@WalterBright
Member Author

can you replace uint with size_t?

The idea here was to get the entire struct into two registers in 64-bit code, which significantly improves performance. size_t is not necessary because it's hard to conceive of a stack-allocated temporary buffer larger than 4 GB. So there is a very good reason why it is uint.
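The size argument can be made concrete with a sketch of the layout (field names here are illustrative, not the actual ones):

```d
// With uint fields the struct is 16 bytes on 64-bit targets, so it can be
// passed and returned in two registers; size_t fields would make it 24 bytes.
struct Layout(T)
{
    T* buf;      // 8 bytes on 64-bit
    uint used;   // 4 bytes: current length
    uint buflen; // 4 bytes: capacity (4 GB is plenty for a stack temporary)
}
static assert(Layout!char.sizeof == 16);
```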

Collaborator

Please add

/// ditto
size_t opDollar()
{
    return i;
}

here. "/// ditto" is optional

Contributor

Any reason why you didn't suggest alias opDollar = length; here, as is usually done?

Collaborator

Because most ranges are not containers, and only have size_t length(). Containers, on the other hand, have void length(size_t desired). I think alias opDollar = length; would work correctly in this context, but it could have some weird edge cases... On the other hand, I am 100% confident that re-writing opDollar is correct 100% of the time.
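The two options under discussion, side by side on a toy range (illustrative code, not the actual ScopeBuffer):

```d
struct Toy
{
    int[] data;
    size_t i;

    size_t length() { return i; }

    // Option A: a hand-written forwarder, correct no matter what
    // overloads `length` later grows:
    //     size_t opDollar() { return i; }

    // Option B: the alias, fine as long as `length` is a plain getter:
    alias opDollar = length;

    int[] opSlice(size_t a, size_t b) { return data[a .. b]; }
}

unittest
{
    auto t = Toy([1, 2, 3, 4], 4);
    assert(t[1 .. $] == [2, 3, 4]); // $ lowers to t.opDollar()
}
```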

Contributor

I didn't spot the below overload, but opDollar being an overload set still works, so I think it would only affect code that messes with .opDollar explicitly.

Member Author

added the alias. If that turns out to not work, will fix.

@monarchdodra
Collaborator

I like this: it can work as a nice deterministic but high-performance alternative to Appender or Array. It has less "functionality" than both, so it isn't bogged down by "non-features". The fact that it owns its buffer at all times should give it a real edge performance-wise. Such an object was requested before as a "Real Appender" by @denis-sh:
http://d.puremagic.com/issues/show_bug.cgi?id=11138

That said, I am concerned by the thing's complete lack of support for types with postblit, elaborate assignment, or constructors. This is sad because ScopeBuffer completely owns its elements, so it is ideally placed to (correctly) elide postblits and correctly manage life cycles.
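To make the concern concrete, here is a sketch of the kind of element type that a raw memcpy-style buffer mishandles (a hypothetical type, not from the PR):

```d
struct Counted
{
    int* refs;
    this(this) { if (refs) ++*refs; }  // postblit: must run on every copy in
    ~this()    { if (refs) --*refs; }  // must run (or be elided) on cleanup
}
// A buffer that grows by memcpy'ing elements into a new malloc block, and
// frees its storage without destroying the elements, silently corrupts
// Counted's reference count.
```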

Contributor

If this had been a documented unit test, you'd find this doesn't compile.

Member Author

yeah, my bad.

@JakobOvrum
Contributor

This burden of proof is ridiculous. Enregistering is known to help a lot in tight loops when performance is an issue. Those may or may not be common depending on the domain etc., but it's bizarre to now require showing the bottom-line effect of achieving a common and useful optimization.

It's important in this case because it radically changes the interface. It goes from being safe and encapsulated to a leaky abstraction.

@andralex
Member

@JakobOvrum: it is important but the proof has been made.

@DmitryOlshansky
Member

This burden of proof is ridiculous. Enregistering is known to help a lot in tight loops when performance is an issue. Those may or may not be common depending on the domain etc., but it's bizarre to now require showing the bottom-line effect of achieving a common and useful optimization.

Seems to me that I'm in the minority that has to prove things, while others don't. If it's such a well-known fact, the proof must be trivial, isn't it? When PERFORMANCE in any form comes as an argument for DETRACTING from USABILITY, numbers and test cases are a MUST HAVE.

I feel like you're asking others to do your homework and check whether it's correct, because, of course, you know it all and doing it yourself has no point.

Anyway, here is the simplest I've come up with:
https://gist.github.com/blackwhale/9569368

If anything, I'm seeing that ~this is consistently faster. You may point out any failures in this snippet.
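The two cleanup styles being compared can be sketched like this (my own reconstruction under assumptions; the actual benchmark is in the gist linked above):

```d
import core.stdc.stdlib : free, malloc;

struct DtorVersion
{
    char* p;
    ~this() { free(p); }    // cleanup in a destructor: the struct is no
}                           // longer a POD, which can block enregistering

void manualVersion()
{
    auto p = cast(char*) malloc(64);
    scope(exit) free(p);    // same cleanup, struct stays dtor-free, but the
    // ... work with p ...  // burden of calling it shifts to every caller
}
```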

Some sample runs (compiled both files in one go with dmd -O -inline -release):

D:\D>dtor 25
22000000 time: 624us

D:\D>dtor 25
22000000 time: 621us

D:\D>dtor 25
22000000 time: 618us

D:\D>dtor 25
22000000 time: 618us

D:\D>dtor 45
39600000 time: 1137us

D:\D>dtor 45
39600000 time: 1132us

D:\D>dtor 45
39600000 time: 1135us

D:\D>manual 25
22000000 time: 646us

D:\D>manual 25
22000000 time: 647us

D:\D>manual 25
22000000 time: 654us

D:\D>manual 45
39600000 time: 1153us

D:\D>manual 45
39600000 time: 1161us

D:\D>manual 45
39600000 time: 1233us

@andralex
Member

Wait, https://gist.github.com/blackwhale/9569368 uses scope(exit), not a destructor. Big difference.

@DmitryOlshansky
Member

@andralex I change it back and forth.

@DmitryOlshansky
Member

@andralex

@blackwhale oic. At any rate, the need for proof can stop safely at enregistering.

Then of course, File should also avoid having a destructor, right? We'd be able to enregister that tiny struct! Less code and faster ;) But you'd agree that it makes no sense, precisely because the gains (if any) are minuscule compared to the risk of a resource leak.

I believe that anything that aims to be faster should have quantifiable benefits, be it a very specially crafted benchmark or some metric derived with a profiler or other tool.

@andralex
Member

I agree we should add regular benchmarking as a matter of procedure to the toolchain. That said, this particular part of the discussion about proving performance improvements has run its course. I worked with Walter on said project. Yes, there are performance gains. Yes, in some cases they are spectacular. It's a memory buffer, which is very core. It doesn't take much expertise to figure that once enregistering happens, a lot of good stuff comes with it.

I understand there are material objections to this module, so let's better focus on those. We can make it private to Phobos, move it to druntime, undocument it and build a safer abstraction on top of it, etc.

My own objection is that a very non-Phobosy module claims front and center stage position in std.buffer.scopebuffer. It should hang out in some internal/private/bits module.

@DmitryOlshansky
Member

I agree we should add regular benchmarking as a matter of procedure to the toolchain.

Great.

Yes, there are performance gains. Yes, in some cases they are spectacular. It's a memory buffer, which is very core.

I see that I failed to deliver the message. Said spectacular gains should be easily testable: just compile a version with the destructor vs scope(exit) and tell us the difference in run time; if anything, I'm curious.

Is this request that hard to accommodate? What's wrong with you guys? Why the constant appeal to authority and "doesn't take much expertise" instead of simple facts?

(Especially as these versions must be just different commits on the same source tree)

That said, this particular part of the discussion about proving performance improvements has run its course

Nice. Well, uh-oh.

My own objection is that a very non-Phobosy module claims front and center stage position in std.buffer.scopebuffer. It should hang out in some internal/private/bits module.

Something I can agree with.

@MartinNowak
Member

Before drifting away into detailed implementation discussions please help to clarify a few things. @Dicebot summarized my concern very nicely (#1911 (comment)).
While I agree that this is a nice tool for certain tasks, this pull is a prime example of another uncoordinated and incomplete piece.

  • What problem are you trying to solve?
  • How does this relate to OutBuffer and Appender?
  • How will this help to avoid allocations in phobos functions?

It's just that I don't know how I could ever explain this mess to someone when advertising D.
I think a std.buffer package for optimized output ranges is a good start.

@andralex
Member

Is this request that hard to accommodate? What's wrong with you guys? Why the constant appeal to authority and "doesn't take much expertise" instead of simple facts?

No need to get agitated. The fact of the matter is that most everybody in our group, probably including yourself, has made changes to code that they argued improved performance, and there was seldom a request for (or providing of) hard proof.

The thing is, asking for hard proof without a systematic benchmarking framework is a tall order. One would need to build a synthetic benchmark that exercises code generation in ways similar to the real application, without pulling in a significant fraction of it. Once that's done, the few people on this review would be like, "mmkay, fine", and all that work goes to nothing.

All of that would change if we did have a systematic benchmarking framework. I'd say it's very productive to champion one and use this discussion as part of the motivating example. Asking for exhaustive proof here is, in my opinion, a bit much.

I think concerns along the lines of @MartinNowak are the ones we need to address here.

@DmitryOlshansky
Member

I think I should probably refrain from commenting on this, so these are my final remarks on the subject of low-level optimization at all costs.

The fact of the matter is that most everybody in our group, probably including yourself, has made changes to code that they argued improved performance, and there was seldom a request for (or providing of) hard proof.

We did, and I always ask for it. How informal the reported numbers are varies wildly from pull to pull, but none I recall have come through solely on being theoretically solid.

One would need to build a synthetic benchmark that exercises code generation in ways similar to the real application, without pulling in a significant fraction of it.

Since you reported spectacular gains, obviously they are easy to observe; else how would you confirm them in the first place? Since the application in question obviously was benchmarked on some real work, just use it as an indicator; that is all I asked. No exhaustive proof required, just tell us what gains you got (privately, on your own project).

It's just that I suspect (no proof, but test runs with my tiny snippet suggest) that the gains of having it w/o destructor are immeasurable.

Once that's done, the few people on this review would be like, "mmkay, fine", and all that work goes to nothing.

Which indeed happens, and no amount of benchmarking harness frees us from identifying the common use case we optimize for; that is what a benchmark is, after all.

@andralex
Member

Since you reported spectacular gains, obviously they are easy to observe; else how would you confirm them in the first place? Since the application in question obviously was benchmarked on some real work, just use it as an indicator; that is all I asked. No exhaustive proof required, just tell us what gains you got (privately, on your own project).

It's just that I suspect (no proof, but test runs with my tiny snippet suggest) that the gains of having it w/o destructor are immeasurable.

I don't understand this. Is it that Walter is lying if he doesn't tell you numbers? You refuse to take his word? He needs to sit down now, change code, and collect numbers to show you things? What's this putting the hounds on someone all of a sudden, whereas in all other cases it's been like "yeah, fine, merge"?

@andralex
Member

FWIW the project will be open sourced soon and available for scrutiny. I still think this obstinate asking for evidence is not a proportionate response.

@ghost

ghost commented Mar 15, 2014

OT: What project are you and Walter working on? Was it some kind of collaboration? Very interesting!

@DmitryOlshansky
Member

I don't understand this. Is it that Walter is lying if he doesn't tell you numbers? You refuse to take his word?

I trust the machinery was done right and the code generated must be looking awesome. I don't easily trust that sacrificing usability was justified.

Putting things into perspective: I started this rant because explicitly removing the destructor from a publicly advertised primitive "that saves Phobos from GC leaks" is, ehm, in need of a good reason.

Right now D has a large problem with Phobos leaking memory like a ship made out of a cheese grater. This problem is definitely putting people off from using D (rightly or wrongly). We must address it. ScopeBuffer is the answer to a lot of that, while delivering the best performance we can offer as well.

To put it simply, I don't know yet how much was gained in practice by trading away the destructor, and I failed to observe the gains myself. I do understand the significant loss in usability, however. I thought ScopeBuffer would be a different primitive with wider scope ;) and simpler usage for generic code, but I understand I can't affect that.

Now that it is core.internal.scopebuffer, much of my original motivation for getting the justification has evaporated. I really don't care about the interface or usability of it any more.

He needs to sit down now and change code and collect numbers that show you things?

So you don't trust that I naturally believed he did something like that before disabling the destructor? Well, alternatively he could have solely observed ASM listings and focused on the generated code. Telling us just that would help me understand things.

@WalterBright
Member Author

What problem are you trying to solve?

Using the stack for temporary buffers rather than storage allocation. Avoid generating garbage. Highest possible speed at doing things like string processing.

How does this relate to OutBuffer and Appender?

They're too slow.

How will this help to avoid allocations in phobos functions?

Many Phobos functions internally use GC allocations for temporary buffers, and then rely on a GC sweep to clean them up. These need to be replaced with ScopeBuffers as much as possible. std.file is a prime example.
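The replacement pattern described here, sketched on a hypothetical Phobos internal (the function, names, and sizes are illustrative; the module path reflects where ScopeBuffer ended up):

```d
import std.internal.scopebuffer;

// before: char[] tmp; foreach (p; pieces) tmp ~= p;  // GC garbage per call
const(char)[] buildTemp(const(char)[][] pieces)
{
    char[256] store = void;             // covers the common case on the stack
    auto tmp = ScopeBuffer!char(store);
    scope(exit) tmp.free();
    foreach (p; pieces)
        tmp.put(p);
    return tmp[0 .. tmp.length].idup;   // only this final copy touches the GC
}
```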

@andralex
Member

added to druntime

@andralex andralex closed this Mar 16, 2014
@WalterBright
Member Author

Now moved to std.internal.scopebuffer

@WalterBright WalterBright reopened this Mar 16, 2014
win64.mak Outdated
Member

please remove this and make sure you test

@andralex
Member

Auto-merge toggled on

andralex added a commit that referenced this pull request Mar 17, 2014
@andralex andralex merged commit 60e3c54 into dlang:master Mar 17, 2014
@monarchdodra
Collaborator

This is missing the void put(const(T)[] s)/ScopeBuffer!(int*) fix from druntime. Could someone write the patch?
