
Conversation

@burner (Member) commented Feb 16, 2015

Basically, Haskell QuickCheck with built-in benchmarking for performance regression testing.

IMO phobos is missing something like Quickfix (randomized test data generation). Additionally, it would be nice to use the tests as a way to measure the performance of the functions and the code generation over time.

The benchmark results are written to a file as CSV. The program benchmarkplotter.d generates .dat and .gp files for all uniquely named benchmarks. gnuplot then generates a plot of the benchmark values over time. This could then be displayed on dlang.org to show our progress and be transparent about our performance.
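For illustration, here is a minimal sketch of the kind of record such a pipeline could append per run (this is not the PR's actual CSV layout, and appendResult is a hypothetical helper):

    // Hypothetical helper, not from this PR: append one timestamped
    // benchmark result per run so a plotter can graph it over time.
    import std.datetime.systime : Clock;
    import std.stdio : File;

    void appendResult(string csvPath, string benchmarkName, long medianNsecs)
    {
        auto f = File(csvPath, "a");
        f.writefln("%s,%s,%s", Clock.currTime.toISOExtString(),
            benchmarkName, medianNsecs);
    }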

make BUILD=benchmark enables all benchmarks. For more info, see the example in std/experimental/randomized_unittest_benchmark.d

[Figure: gnuplot chart of benchmark timings tracked over time, generated by benchmarkplotter.d]

/// The following examples show an overview of the given functionalities.
unittest
{
    void theFunctionToTest(int a, float b, string c)
    {
        // super expensive operation
        auto rslt = (a + b) * c.length;

        /* Pass the result to doNotOptimizeAway so the compiler
        cannot remove the expensive operation and thereby falsify the
        benchmark.
        */
        doNotOptimizeAway(rslt);

        debug
        {
            /* As the parameters to the function assume random values,
            $(D benchmark) allows quickly testing the function with various
            input values. As verifying the computed value or state adds to
            the runtime of the function being benchmarked, it makes sense to
            execute these verifications only in debug mode.
            */
            assert(c.length ? true : true);
        }
    }

    /* $(D benchmark) will run the function $(D theFunctionToTest) as often as
    possible in 1 second. The function will be called with randomly selected
    values for its parameters.
    */
    benchmark!theFunctionToTest();
}

dub: https://github.com/burner/std.benchmark

@burner changed the title from "Parameter unittest and benchmark" to "Parameterized unittests and benchmarks" on Feb 16, 2015
@nordlow (Contributor) commented Feb 16, 2015

I've thought about implementing this. Good thing you've started it. Things I've been dreaming about:

  • Benchmarker keeps track of total time spent and stops iteration (progression of test data sizes) after a caller-specified total clock time. To be exact, this will require predicting the execution time of the next data iteration (see the next point).
  • Guess time complexity by fitting samples to a list of models, say O(n), O(n^^2), ..., O(n*log(n)). This can be used in the previous point to predict execution time for a given data size (a sketch of this idea follows after this list).
  • Multi-Dimensional Size Iterations: Multi-dimensional data structures (containers, ranges) in, for instance, linear algebra packages would need some way to tell the benchmarker about their dimensionality and how to instantiate them given a sample of these size dimensions. It would be really cool to have a benchmarker that randomly samples this space in a clever way and visualizes the result in some clever (spatial) way. I believe that would require extending the existing hierarchy of ranges in Phobos to multi-dimensional variants. That might become complicated, though. Nevertheless, https://github.com/kyllingstad/scid/tree/master/source/scid might give some inspiration. Do you see any applications of your own in this regard?
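To make the complexity-guessing idea concrete, here is a minimal sketch (the samples and names are made up, and this is not part of the PR): fit the timing samples to t ≈ c·f(n) for a few candidate models by least squares and pick the model with the smallest residual.

    // Hypothetical sketch of the complexity-guessing idea; not part of this PR.
    import std.math : log;
    import std.stdio : writefln;

    // Least-squares fit of t ≈ c * f(n); returns the sum of squared residuals.
    double residual(double function(double) f, const double[] ns, const double[] ts)
    {
        double num = 0, den = 0;
        foreach (i, n; ns) { num += ts[i] * f(n); den += f(n) * f(n); }
        immutable c = num / den;
        double r = 0;
        foreach (i, n; ns) { immutable d = ts[i] - c * f(n); r += d * d; }
        return r;
    }

    void main()
    {
        // made-up timing samples: (input size, seconds)
        double[] ns = [100, 200, 400, 800];
        double[] ts = [0.010, 0.021, 0.039, 0.082];

        static double fLin(double n)   { return n; }
        static double fNLogN(double n) { return n * log(n); }
        static double fQuad(double n)  { return n * n; }

        string[] names = ["O(n)", "O(n log n)", "O(n^^2)"];
        double function(double)[] fs = [&fLin, &fNLogN, &fQuad];

        size_t best = 0;
        foreach (i, f; fs)
            if (residual(f, ns, ts) < residual(fs[best], ns, ts))
                best = i;
        writefln("best fit: %s", names[best]);
    }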

@burner (Member, Author) commented Feb 16, 2015

  • I'm not sure about the max run time stop part (but I accept PRs :-) )
  • The time complexity stuff would be nice if we had a way to feed it back into the source (i.e., auto-generated docs).
  • What do you mean by multi-dimensional size iterators?
  • I'm planning to have another unittest target that runs all the benchmark unittests and converts the results into a gnuplot graph that we can show on the webpage.
  • Next, I will create a Unicode string generator.

@nordlow (Contributor) commented Feb 16, 2015

I updated my previous comment. I hope that makes my idea more clear.

@burner force-pushed the parameter_unittest_and_benchmark branch from 7be5538 to 6176b16 on March 16, 2015
@burner force-pushed the parameter_unittest_and_benchmark branch from b73d2b2 to 4b92fd1 on March 23, 2015
@burner force-pushed the parameter_unittest_and_benchmark branch from 216d56d to 44c435f on March 31, 2015
@burner (Member, Author) commented Apr 8, 2015

anyone?

@burner force-pushed the parameter_unittest_and_benchmark branch from 44c435f to 391a3d1 on April 9, 2015
@MartinNowak (Member) commented

I have a very complete library for this kind of random testing, built after QuickCheck.
http://code.dlang.org/packages/qcheck
Andrei recently asked me to move the arbitrary part of it to std.random or so. Now you're adding something similar and duplicating existing work (std.benchmark).
While I think this is a good addition to Phobos, we should probably go the long route and integrate the existing stuff with your code and Phobos.
I could really use some help getting arbitrary ready for Phobos.

@burner (Member, Author) commented Apr 10, 2015

IMO the long road is wrong. qcheck, as far as I can see, was not built for benchmarking, and std.benchmark was not built for testing. Neither was ever meant to work with the other, let alone in a consistent way. Therefore, I believe combining both will not fly and will be a total mess. Making the combination work would probably yield this PR. Just because these two projects exist does not mean they have to be used.

On the other hand, this PR shows a clear vision. It presents a clear track for integration all the way from Phobos to marketing D and Phobos on dlang.org. It works seamlessly with TypeTuple foreach loops. The implementation and usage are trivial and were built to fit together. Extending the random value generation to new types is easy and flexible. benchmarkplotter.d is a nice showcase for D and especially Phobos, IMO. It is probably trivial to integrate the benchmarking and record-keeping into the autotester (the BUILD target already exists). I could go on about how this PR will make all the bikeshedding about benchmarking superfluous, and how it puts pressure on PRs to not worsen performance, which leads to good marketing, and so forth.

But most importantly, this PR is ready now. Not in a week or a year. Now!

P.S. Sorry for the rant; I'm sometimes just really annoyed by the scared, half-cocked decision making that seems to govern D development, instead of bold, risk-taking, vision-driven decision making.

P.P.S. Even if this toolchain is a bust, there is no risk. It is internal and will not be exposed to the public, apart from the gnuplot graphs.

template RefType(T)
{
    alias RefType = ref T;
}
Review comment (Member):

Adding a public symbol without documentation is a no-go.

Reply (Member, Author):

Moved that, as ref is strange anywhere.

@MartinNowak (Member) commented

qcheck, as far as I can see, was not built for benchmarking, and std.benchmark was not built for testing

That's a pretty made-up argument: benchmark was made for benchmarking and qcheck was made for generating random values. Now you want to benchmark with random values, well...
Anyhow, I was just trying to take the opportunity to push an important Phobos addition (std.benchmark), and also wanted to take up the existing idea of moving arbitrary to Phobos.

@MartinNowak (Member) commented

I think you also mean QuickCheck right?

{
    immutable theChar = cast(dchar) charsToSearch[toSearchFor];
    auto idx = toSearchIn.indexOf(theChar);
    ben.stop();
Review comment (Member):

What's that ben.stop() doing here? Who is reactivating the stopwatch?

Reply (Member, Author):

The generator is reactivating the stopwatch.

@MartinNowak (Member) commented

That's a lot of ugly boilerplate code for a single benchmark.

    auto ben = Benchmark(format("string.lastIndexOf(%s,%s)", 
        S.stringof, R.stringof));

    auto generator = RndValueGen!(
        GenUnicodeString!(S, 10, 500), 
        GenUnicodeString!(R, 1, 10))
            (rnd, ben, rounds);

    foreach(S toSearchIn, R toSearchFor; generator)
    {
        auto idx = toSearchIn.lastIndexOf(toSearchFor);
        ben.stop();

        if (idx != -1) {
            assert(equal(
                toSearchIn[idx .. idx + to!S(toSearchFor).length],
                toSearchFor));
        }
    }

I'd suggest redesigning this around functions.

void testee(S, R)(S toSearchIn, R toSearchFor)
{
    auto idx = toSearchIn.lastIndexOf(toSearchFor);
    if (idx != -1) {
        assert(equal(
            toSearchIn[idx .. idx + to!S(toSearchFor).length],
            toSearchFor));
    }
}

runBench!(testee, GenUnicodeString!(S, 10, 500), GenUnicodeString!(R, 1, 10))();

You could even infer the arguments from the function parameters (runBench!(testee!(S, R))) and go with a generic config for things like low and high, though runBench should still take generators.
https://github.com/MartinNowak/qcheck/blob/master/src/qcheck/config.d

@MartinNowak (Member) commented

If you're benchmarking functions like indexOf I'd be worried about the overhead of clock_gettime called by StopWatch.stop/start().
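One common way to amortise that timer-call overhead (a sketch only, not code from this PR or Phobos; perCallNsecs is a hypothetical name, and the current std.datetime.stopwatch module is assumed) is to read the clock once per batch of calls and divide:

    // Sketch only: amortise the clock-read cost over a batch of calls.
    import std.datetime.stopwatch : AutoStart, StopWatch;

    long perCallNsecs(alias fun)(long batch = 1000)
    {
        auto sw = StopWatch(AutoStart.yes);
        foreach (_; 0 .. batch)
            fun();
        sw.stop();
        return sw.peek().total!"nsecs" / batch;
    }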

the $(D theFunctionToTest). The member function $(D stop) of
$(D Benchmark) stops the stopwatch. The stopwatch is automatically
resumed when the loop is continued. */
ben.stop();
Review comment (Member):

If you're benchmarking, then that's a release build with assertions disabled.

@MartinNowak (Member) commented

At least you need to do something to handle noisy benchmark results; the simplest effective approach is to take the best of 10 runs or so.
http://forum.dlang.org/post/jlsi14$1pi5$1@digitalmars.com
Please have a look at Andrei's benchmark module. It contains some very good concepts: https://github.com/andralex/phobos/blob/benchmark/std/benchmark.d
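For reference, a minimal best-of-N sketch along those lines (an illustration only, not part of this PR; bestOf is a hypothetical name, and the current std.datetime.stopwatch module is assumed):

    // Sketch only: run the callable several times and keep the fastest
    // measurement, which filters out most scheduler and cache noise.
    import core.time : Duration;
    import std.datetime.stopwatch : AutoStart, StopWatch;

    Duration bestOf(alias fun)(size_t runs = 10)
    {
        auto best = Duration.max;
        foreach (_; 0 .. runs)
        {
            auto sw = StopWatch(AutoStart.yes);
            fun();
            sw.stop();
            if (sw.peek() < best)
                best = sw.peek();
        }
        return best;
    }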

@burner (Member, Author) commented Apr 11, 2015

I'd be worried about the overhead of clock_gettime

I do worry about this. I have not found a good solution yet. Maybe generate data for multiple runs and then just serve it.

the forum link didn't lead me anywhere

@burner (Member, Author) commented Apr 11, 2015

runBench!(testee, GenUnicodeString!(S, 10, 500), GenUnicodeString!(R, 1, 10))();

I'm not too sure that will reduce the code size in the long run. All of that, plus testee, needs to be included in a unittest; it is missing the name of the benchmark, etc.; and you can't stop the stopwatch for the assertion.

I could collect some info with CTFE, but I rather like it obvious. I will think about this some more.

@burner (Member, Author) commented Apr 11, 2015

thank you for reviewing

@MartinNowak (Member) commented

you can't stop the stopwatch for the assertion.

No need to do that; assertions are disabled for release benchmarks anyhow.
That also solves part of the clock_gettime cost problem, if generating test data is cheap enough (or is pregenerated).

@MartinNowak (Member) commented

missing the name of the benchmark

Just take the name of the function then.
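(For reference, a sketch of how that could look, assuming the benchmark entry point takes an alias; benchmarkNamed is a hypothetical name, not part of this PR:)

    // Sketch: derive the benchmark name from the benchmarked symbol itself.
    import std.traits : fullyQualifiedName;

    void benchmarkNamed(alias fun)()
    {
        enum name = fullyQualifiedName!fun; // e.g. "std.string.lastIndexOf"
        // ... construct Benchmark(name) and run `fun` as before ...
    }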

@@ -0,0 +1,29 @@
#include <stdio.h>
Review comment (Contributor):

Is there no way we can do this in D?
Currently there is a huge movement to get rid of etc/c ...

@wilzbach (Contributor) commented

@wilzbach awesome. I will rebase to make it merge again

I am happy to push people!
Feel free to make a newsgroup announcement - I will collect the feedback then ;-)
Looking at the last announcement, there are probably a few things to bear in mind:

  • title should have [phobos-experimental] tag
  • short & concise description
  • people want to preview an addition with dub

@burner force-pushed the parameter_unittest_and_benchmark branch from 2792fa9 to cb91641 on June 18, 2016
@burner (Member, Author) commented Jun 18, 2016

@wilzbach still feel like being review manager?

@burner force-pushed the parameter_unittest_and_benchmark branch from cb91641 to 7ea67d8 on June 18, 2016
@wilzbach (Contributor) commented

@wilzbach still feel like being review manager?

Sure - as said I am more than happy to ensure that feedback will be collected ;-)
Should I make the NG announcement or do you want to describe your project yourself?

@burner (Member, Author) commented Jun 18, 2016

If you could please start the NG thread. That would be awesome.

@wilzbach (Contributor) commented

If you could please start the NG thread. That would be awesome.

I hope this announcement describes your PR correctly.

@JackStouffer (Contributor) commented

@nomad-software (Contributor) commented Jun 19, 2016

IMHO the namespace needs a little work. std.experimental.randomized_unittest_benchmark is not very scalable and ruins the namespace for future work. I would suggest std.experimental.unittest.benchmark.randomised or something close to that. Then other work could be added around those specific modules.

@JackStouffer (Contributor) commented

You should remove all global selective imports. While DMD now warns about them, LDC users who import the library manually will still experience issue 314 with no warning.
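To illustrate (a sketch with made-up names, not code from this PR): a module-scope selective import leaks the imported symbol through the importing module (issue 314), whereas scoping the import to the function that needs it avoids the problem.

    // Sketch: prefer function-local imports over module-scope selective imports.

    // problematic at module scope -- leaks `format` via this module (issue 314):
    // import std.format : format;

    string benchmarkLabel(int run) // hypothetical helper
    {
        import std.format : format; // local import, nothing leaks
        return format("run-%s", run);
    }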

else static if (is(T : GenASCIIString!(S), S...))
    alias ParameterToGen = T;
else
    static assert(false);
Review comment (Contributor):

Need an error message here

Reply (Member, Author):

I will fix that

@burner force-pushed the parameter_unittest_and_benchmark branch from 7ea67d8 to b937c27 on June 30, 2016
@andralex (Member) commented

Plopping this here to make sure: http://forum.dlang.org/post/npk8rb$13pk$1@digitalmars.com

@burner (Member, Author) commented Aug 24, 2016

@andralex IMO the current approach is superior to your suggestion.

  1. You can access the surrounding non-global scope to reach additional non-random data.
  2. This approach also lets me create a patch that makes it possible to seamlessly pass additional data as function arguments.
  3. My approach does not require a different test runner. Your approach requires inspecting all functions in all modules to see whether they are benchmarks. Not hard, but unnecessarily complex, especially when we already have the unittest runner.
  4. Aggregate.median already exists. The additional tool I created even allows you to track the performance over time (see the figure at the top).
  5. If you can have functions, you can have template functions. That allows you to use all their magic. Sure, you can do that with your approach as well, but that would require creating another function that calls the benchmarked function. And I would argue that using unittests as that function is more consistent.
  6. The binding between a parameter's value range and the parameter is clearer in my approach. (Is Aggregate.median the value of float b?)
  7. IMO your example is overusing UDAs. Sure, UDAs are awesome, but that does not mean we have to invent yet another DSL.

@andralex (Member) commented

@andralex IMO the current approach is superior to your suggestion.

A productive thing to do would be to look at other benchmark frameworks and see how yours stacks up against them. But in the end, I agree it may come down to you and me having different design sensibilities.

  1. You can access the surrounding non-global scope to reach additional non-random data.

What I wrote was but a sketch and shouldn't be taken as a complete design. Clearly more features may be added (such as fixtures, before/after code and state, etc.).

  2. This approach also lets me create a patch that makes it possible to seamlessly pass additional data as function arguments.

The same can be trivially achieved with the benchmark; you just fix Param!3(3.14). What am I missing?

  3. My approach does not require a different test runner. Your approach requires inspecting all functions in all modules to see whether they are benchmarks. Not hard, but unnecessarily complex, especially when we already have the unittest runner.

I agree that there's more work involved in introspecting the attributes. But that work will be reused across all benchmarks, making it easy to write many individual benchmarks. It's scalability nicely applied.

  4. Aggregate.median already exists. The additional tool I created even allows you to track the performance over time (see the figure at the top).

Sure, but that's not a competitive trait; I assume such a feature would be present in any framework along with a few others (min, max, average, p90 come to mind).

  5. If you can have functions, you can have template functions. That allows you to use all their magic. Sure, you can do that with your approach as well, but that would require creating another function that calls the benchmarked function. And I would argue that using unittests as that function is more consistent.

I'm not sure how you mean that. Surely attributes can be applied to template functions?

  6. The binding between a parameter's value range and the parameter is clearer in my approach. (Is Aggregate.median the value of float b?)

It is clearer because the function is a benchmark, not a function being benchmarked.

  7. IMO your example is overusing UDAs. Sure, UDAs are awesome, but that does not mean we have to invent yet another DSL.

Now this is just a fallacy: https://en.wikipedia.org/wiki/Begging_the_question

@andralex (Member) commented Aug 24, 2016

The main thing here is that an attribute-based framework benchmarks existing functions that are not necessarily written to implement a benchmark, whereas this PR defines a framework in which the user must define benchmarks. It follows that an attribute-based framework needs less code from the user.

  • This PR: "You write the benchmarks and I call them"
  • Attribute-based: "You write your usual application code and mention what values it should be benchmarked with, and I'll take care of the rest".

There is no contest here.

@burner (Member, Author) commented Aug 24, 2016

The main thing here is that an attribute-based framework benchmarks existing functions that are not
necessarily written to implement a benchmark, whereas this PR defines a framework in which the user
must define benchmarks. It follows that an attribute-based framework needs less code from the user.

I think I do not understand what you mean. I mean

int someFunctionYouWantToBenchmark(int a, int b);

unittest {
    void fun(int a, int b) {
        int r = someFunctionYouWantToBenchmark(a, b);
        // ...
    }

    benchmark!fun();

    // update
    benchmark!someFunctionYouWantToBenchmark();
    // also works ;)
}

I looked at the list on Stack Overflow, and more closely at those that looked good. This PR has all their features (some are given implicitly by D itself). Additionally, I couldn't find any library that allows user-defined types; this PR allows that.

Param!3(3.14) What am I missing?

You have two places to keep in sync, which is worse than one place. Being DRY and all. And 3 is not really clear IMO ;-)

I agree that there's more work involved with introspecting the attributes.

That is not really a problem; if it were, no library would ever be created. IMO UDAs are harder to use in the long run (I have no data, just my opinion on UDAs from what I have seen and used in D libraries).

It is clearer because the function is a benchmark, not a function being benchmarked.

I do not understand what you mean.

There is no contest here.

Well, sort of: not between us, but between our ideas. I hope you would agree that providing both approaches is not a good idea; that means we either have to choose one or invent something better.

Commits:

  • forgot a file
  • some more
  • made it compile again
  • some more
  • rebase
  • something makes some trouble
  • translate also works
  • some tutorial
  • makefile update for windows
  • started to create the program that will create the graphs
  • the .dat gets generated
  • some plotting
  • no more release and RefType docu
  • some more
  • another update
  • updates
  • just some tickering
  • rebase
  • RefType broken
  • more tinkering
  • more tinkering
  • works again
  • some work on the plotter
  • some plotter work
  • candlesticks
  • another update
  • whitespace fix
  • whitespace part 2
  • move and donotoptimizeaway
  • a restart
  • made it compile again
  • win attempt
  • andrei round 1
  • whitespace
  • win32 try
  • updates
  • whitespace
  • another makefile try
  • some update
  • small fix
  • more fixing
  • another build bug
  • a litte hack
  • dmc do not optimize away
  • fix
  • started moving string benchmark to seperate file (to remove cyclic includes)
  • some reworking
  • makefile fix
  • whitespace
  • makefile fix
  • veelo corrections
  • veelo strikes again
  • improved donotoptimize away
  • another getpid test
  • sort of builds again
  • some formatting nitpicks
  • more nipicks
  • whitespace
  • whitespace
  • redoing do not optimize away
  • doNotOptimizeAway
  • whitespace
  • format

@burner force-pushed the parameter_unittest_and_benchmark branch from b937c27 to 8841a53 on August 24, 2016
@dlang-bot (Contributor) commented

@burner, thanks for your PR! By analyzing the annotation information on this pull request, we identified @andralex, @JesseKPhillips and @9rnsr to be potential reviewers. @andralex: The PR was automatically assigned to you, please reassign it if you were identified mistakenly.

(The DLang Bot is under development. If you experience any issues, please open an issue at its repo.)

@codecov-io commented Aug 24, 2016

Codecov Report

Merging #2995 into master will increase coverage by 0.02%.
The diff coverage is 97.66%.


@@            Coverage Diff             @@
##           master    #2995      +/-   ##
==========================================
+ Coverage   88.78%   88.81%   +0.02%     
==========================================
  Files         121      122       +1     
  Lines       74161    74375     +214     
==========================================
+ Hits        65847    66057     +210     
- Misses       8314     8318       +4
Impacted Files Coverage Δ
std/string.d 99.71% <ø> (ø) ⬆️
std/experimental/randomized_unittest_benchmark.d 97.66% <97.66%> (ø)
std/concurrency.d 83.42% <0%> (+0.17%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update beaba6e...8841a53.

@burner (Member, Author) commented May 17, 2017

Superseded by https://github.com/burner/benchmark
