Parameterized unittests and benchmarks #2995
Conversation
I've thought about implementing this. Good thing you've started it. Things I've been dreaming about:
I updated my previous comment. I hope that makes my idea clearer.
anyone?
I have a very complete library for this kind of random testing, built after QuickCheck.
IMO the long road is wrong. qcheck, as far as I can see, was not built for benchmarking, and std.benchmark was not built for testing. Both were never meant to work together, let alone in a consistent way. Therefore, I believe combining both will not fly and will be a total mess. Making the combination work would probably yield this PR. Just because these two projects exist does not mean they have to be used.

On the other hand, this PR shows a clear vision. It presents a clear track for how to integrate all the way from Phobos to marketing D and Phobos on dlang.org. It works seamlessly with TypeTuple foreach loops. The implementation and usage are trivial and were built to fit together. Extending the random value generation to new types is easy and flexible. benchmarkplotter.d is a nice showcase for D and especially Phobos, IMO. It is probably trivial to integrate the benchmarking and record-keeping in the autotester (the BUILD target already exists). I can go on about how this PR will make all the bikeshedding about benchmarking superfluous, and how it puts pressure on PRs to not worsen performance, which leads to good marketing, and so forth. But most importantly, this PR is ready now. Not in a week or a year. Now!

p.s. Sorry for the rant; I'm sometimes just really annoyed by the scared, half-cocked decision making that seems to be governing D development, instead of bold, risk-taking, vision-driven decision making.

p.p.s. Even if this toolchain is a bust, there is no risk. It is internal and will not be exposed to the public, apart from the gnuplot graphs.
    template RefType(T)
    {
        alias RefType = ref T;
    }
Adding a public symbol without documentation is a no-go.
Moved that. ref is strange anywhere.
That's a pretty made-up argument: benchmark was made for benchmarking and qcheck was made for generating random values. Now you want to benchmark with random values, well...
I think you also mean QuickCheck, right?
    {
        immutable theChar = cast(dchar)charsToSearch[toSearchFor];
        auto idx = toSearchIn.indexOf(theChar);
        ben.stop();
What's that ben.stop() doing here? Who is reactivating the stopwatch?
The generator is reactivating the stopwatch.
That's a lot of ugly boilerplate code for a single benchmark.

    auto ben = Benchmark(format("string.lastIndexOf(%s,%s)",
        S.stringof, R.stringof));
    auto generator = RndValueGen!(
        GenUnicodeString!(S, 10, 500),
        GenUnicodeString!(R, 1, 10))
        (rnd, ben, rounds);
    foreach (S toSearchIn, R toSearchFor; generator)
    {
        auto idx = toSearchIn.lastIndexOf(toSearchFor);
        ben.stop();
        if (idx != -1) {
            assert(equal(
                toSearchIn[idx .. idx + to!S(toSearchFor).length],
                toSearchFor));
        }
    }

I'd suggest redesigning this around functions.

    void testee(S, R)(S toSearchIn, R toSearchFor)
    {
        auto idx = toSearchIn.lastIndexOf(toSearchFor);
        if (idx != -1) {
            assert(equal(
                toSearchIn[idx .. idx + to!S(toSearchFor).length],
                toSearchFor));
        }
    }

    runBench!(testee, GenUnicodeString!(S, 10, 500), GenUnicodeString!(R, 1, 10))();

You could even infer the arguments from the function parameters (
If you're benchmarking functions like
    the $(D theFunctionToTest). The member function $(D stop) of
    $(D Benchmark) stops the stopwatch. The stopwatch is automatically
    resumed when the loop is continued. */
        ben.stop();
If you're benchmarking, then that's a release build with assertions disabled.
At least you need to do something to handle noisy benchmark results; the simplest effective approach is to take the best out of 10 runs or so.
I do worry about this. I have not found a good solution yet. Maybe generate data for multiple runs and then just serve it. The forum link didn't lead me anywhere.
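The best-of-N idea from the comment above is easy to sketch. This is a minimal version assuming today's std.datetime.stopwatch API; bestOfN is an illustrative name, not part of this PR:

```d
import core.time : Duration;
import std.algorithm.comparison : min;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Time `workload` `runs` times and keep only the fastest wall-clock sample.
// The minimum is far less noisy than a single run, since scheduler
// interruptions and cold caches only ever make a sample slower.
Duration bestOfN(alias workload)(size_t runs = 10)
{
    auto best = Duration.max;
    foreach (i; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        workload();
        sw.stop();
        best = min(best, sw.peek());
    }
    return best;
}
```

Usage would be along the lines of `auto fastest = bestOfN!(() => someWork())(10);`, with the caveat that the workload itself must not be optimized away in a release build.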
    runBench!(testee, GenUnicodeString!(S, 10, 500), GenUnicodeString!(R, 1, 10))();

I'm not too sure if that will reduce the code size in the long run. All that and testee need to be included into a unittest, it is missing the name of the benchmark, etc., and you can't stop the stopwatch for the assertion. I could collect some info with CTFE, but I rather like it obvious. I will think about this some more.
Thank you for reviewing.
No need to do that; assertions are disabled for release benchmarks anyhow.
Just take the name of the function then.
@@ -0,0 +1,29 @@

    #include <stdio.h>
Is there no way we can do this in D?
Currently there is a huge movement to get rid of etc/c
...
I am happy to push people!
@wilzbach still feel like being review manager?
Sure - as said, I am more than happy to ensure that feedback will be collected ;-)
If you could please start the NG thread, that would be awesome.
I hope this announcement describes your PR correctly.
There's something wrong with your docs, see the third example: http://dtest.thecybershadow.net/artifact/website-73c6475e4be442350a727d80e562bb2328f494b1-af1863fd56f736d1f27cee0862282c44/web/library-prerelease/std/experimental/randomized_unittest_benchmark.html
IMHO the namespace needs a little work.
You should remove all global selective imports. While DMD now warns about them, LDC users who import the library manually will still experience issue 314 with no warning.
    else static if (is(T : GenASCIIString!(S), S...))
        alias ParameterToGen = T;
    else
        static assert(false);
Need an error message here
I will fix that
Plopping this here to make sure: http://forum.dlang.org/post/npk8rb$13pk$1@digitalmars.com
@andralex IMO the current approach is superior to your suggestions.
A productive thing to do would be to look at other benchmark frameworks and see how yours stacks against them. But at the end I agree it may end up as you and I having different design sensibilities.
What I wrote was but a sketch, shouldn't be taken as a complete design. Clearly more features may be added (such as fixtures, before/after code and state, etc).
The same can be trivially achieved with the benchmark you just fix
I agree that there's more work involved with introspecting the attributes. But that work will be reused across all benchmarks, making it easy to write many individual benchmarks. It's scalability nicely applied.
Sure, but that's not a competitive trait; I assume such a feature would be present in any framework along with a few others (min, max, average, p90 come to mind).
I'm not sure how you mean that. Surely attributes can be applied to template functions?
It is clearer because the function is a benchmark, not a function being benchmarked.
Now this is just a fallacy: https://en.wikipedia.org/wiki/Begging_the_question
The main thing here is that an attribute-based framework benchmarks existing functions that are not written necessarily to implement benchmark, whereas this PR defines a framework in which the user must define benchmarks. It follows that an attribute-based framework needs less code from the user.
There is no contest here.
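For contrast, the attribute-based style described above might look roughly like this. This is a hedged sketch: `benchmark`, `Benches`, and `runAllBenchmarks` are hypothetical names, not an API from this PR or Phobos.

```d
// Hypothetical UDA type; attaching @benchmark tags a function for discovery.
enum benchmark;

// Example container: one tagged benchmark and one untagged helper.
struct Benches
{
    static int calls;

    @benchmark static void searchBench() { ++calls; }

    static void helper() { }
}

// Find every @benchmark-annotated function in `Container` and call it.
void runAllBenchmarks(alias Container)()
{
    foreach (name; __traits(allMembers, Container))
    {
        static if (is(typeof(__traits(getMember, Container, name)) == function))
        {
            foreach (uda; __traits(getAttributes, __traits(getMember, Container, name)))
            {
                static if (is(uda == benchmark))
                    __traits(getMember, Container, name)();
            }
        }
    }
}
```

After `runAllBenchmarks!Benches();`, only `searchBench` has run. Applied at module scope (e.g. via `mixin(__MODULE__)`), the same loop would scan a whole module, which is the "existing functions become benchmarks" property being argued for.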
I think I do not understand what you mean. I mean:

    int someFunctionYouWantToBenchmark(int a, int b);

    unittest {
        void fun(int a, int b) {
            int r = someFunctionYouWantToBenchmark(a, b);
            ....
        }
        benchmark!fun();
        // update
        benchmark!someFunctionYouWantToBenchmark();
        // also works ;)
    }

I looked at the list on stackoverflow and more closely at those that looked good. This PR has all their features (some are implicitly given by D itself); additionally, I couldn't find any library that allows for user-defined types. This PR allows that.
You have two places to keep in sync, which is worse than one place. Being DRY and all. And 3 is not really clear IMO ;-)
That is not really a problem; if it were, no library would ever be created. IMO UDAs are harder to use in the long run (I have no data, just my opinion on UDAs from what I have seen and used in D libraries).
I do not understand what you mean.
Well, sort of; not between us, but between our ideas. I hope you would agree that providing both approaches is not a good idea; that means we either have to choose one or invent something better.
@burner, thanks for your PR! By analyzing the annotation information on this pull request, we identified @andralex, @JesseKPhillips and @9rnsr to be potential reviewers. @andralex: The PR was automatically assigned to you, please reassign it if you were identified mistakenly.
Codecov Report

    @@            Coverage Diff             @@
    ##           master    #2995      +/-   ##
    ==========================================
    + Coverage   88.78%   88.81%   +0.02%
    ==========================================
      Files         121      122       +1
      Lines       74161    74375     +214
    ==========================================
    + Hits        65847    66057     +210
    - Misses       8314     8318       +4

Continue to review full report at Codecov.
Superseded by https://github.com/burner/benchmark
Basically, Haskell QuickCheck with built-in benchmarking for performance regression testing.

IMO Phobos is missing something like QuickCheck (randomized test data generation). Additionally, it would be nice to use the tests as a way to measure the performance of the functions and the code generation over time.

The benchmark results are written to a file as CSV. The program benchmarkplotter.d generates .dat and .gp files for all uniquely named benchmarks. gnuplot then generates a plot of the benchmark values over time. This could then be displayed on dlang.org to show our progress and be transparent about our performance.
make BUILD=benchmark enables all benchmarks. For more info see the example in std/experimental/randomized_unittest_benchmark.d
dub: https://github.com/burner/std.benchmark