
Discussion: benchmark package from NPM #3

Closed
jgonggrijp opened this issue Feb 12, 2021 · 4 comments

Comments

@jgonggrijp
Collaborator

The benchmark package does (at least) two questionable things:

  1. In order to decide whether the performance difference is significant, it uses the Mann-Whitney U test, which is meant for ordinal-scale data, instead of Student's t test, which is more appropriate for ratio-scale data such as the performance measurements we're making here (a rough sketch of that statistic follows below this list).
  2. It recompiles JavaScript code from a string on every test run in order to prevent engine optimizations. Usually we're interested in performance with engine optimizations, since this is how code tends to run in the real world. Besides that, since most of the benchmark code is probably out of reach for this compilation trick, optimization is only partly disabled, producing inconsistent optimization characteristics. Maybe this behavior can be disabled; this is worth investigating.
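
For concreteness, this is roughly what the Student's t statistic in point 1 boils down to. A sketch of my own, not code from benchmark.js, assuming two arrays of raw timing samples:

```js
// Pooled two-sample Student's t statistic for two arrays of timing samples.
// The resulting t value would be compared against a critical value for
// n1 + n2 - 2 degrees of freedom to decide significance.
function mean(xs) {
    return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function variance(xs) {
    const m = mean(xs);
    return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

function studentT(sample1, sample2) {
    const n1 = sample1.length, n2 = sample2.length;
    const pooledVariance =
        ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) /
        (n1 + n2 - 2);
    return (mean(sample1) - mean(sample2)) /
        Math.sqrt(pooledVariance * (1 / n1 + 1 / n2));
}
```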

I can think of a couple of options (from least to most effort):

  • Accept the quirks and do nothing.
  • Switch to an alternative benchmark framework that doesn't have these quirks. I'm not (yet) aware of an alternative.
  • Forego a convenient benchmark framework. Instead, repeat the benchmark code a fixed number of times (say, 10) for each version of Underscore and report each individual result, so that people can compute their own statistics (see the sketch after this list).
  • Fork the benchmark package, fix the issues, submit a PR. Use our own version regardless of whether the PR is accepted. If not accepted, publish as a separate package on NPM.
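
To illustrate the framework-less option: it could be as simple as the following sketch of my own, where runBenchmark is a placeholder for the actual Underscore workload.

```js
// Run the benchmark body a fixed number of times and print every raw
// result, leaving the statistics to whoever reads them.
const REPEATS = 10;

function runBenchmark() {
    // placeholder for the actual Underscore workload
}

for (let i = 0; i < REPEATS; i++) {
    const start = process.hrtime.bigint();
    runBenchmark();
    const elapsedNs = process.hrtime.bigint() - start;
    console.log(`run ${i + 1}: ${Number(elapsedNs) / 1e6} ms`);
}
```
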
@m90
Member

m90 commented Feb 12, 2021

I'm not the most knowledgeable person when it comes to benchmarks, so I would trust a well-used package much more than the one-off code I would come up with. Your concerns (especially the forceful de-optimization) do sound valid to me, although they don't seem to be total blockers.

I think I would opt for either your first or your last option. Since I won't be able to do it myself, this probably depends on how much time you would like to spend working on the benchmark package. If there is a fork, using it here instead of the original version should be pretty simple.

I couldn't find any other alternative package when setting this up either. Having testers do their own statistics seems complicated and error-prone to me.

@jgonggrijp
Collaborator Author

jgonggrijp commented Mar 3, 2021

I decided to look for alternative benchmark libraries first. Searching for "benchmark" on NPM turned up too many results, most being simple one-offs that haven't seen much use, wrappers of benchmark.js (which would suffer from the same problems), or both. However, I was able to identify a few promising options:

  • tachometer by the Polymer project is the most elaborate one. It automatically repeats samples and computes confidence intervals. The main drawback is that it specifically targets browsers.
  • pretty-hrtime is a minimal wrapper around the Node.js builtin process.hrtime. It doesn't have any features except for measuring the running time and displaying it nicely. It also only works in Node.js, but it could possibly fill the void that tachometer leaves there.
  • @thi.ng/bench provides a very simple benchmarking interface, but nevertheless takes care of warmup, repetition and some statistics. It also works both in the browser and in Node.js, although the time measurement is low-res in the browser. This benchmark probably doesn't need a very high resolution, though. Confidence intervals are missing, but can be added with a few lines of code based on the statistics that the package does provide (i.e., mean and standard deviation).

None of these options run a Student's t test, but to be honest, I would rather have no test than an incorrect one. I checked the source code of these packages for silly things like unwarranted de-optimizations, and found nothing of the kind.

@m90 how would you feel about converting the benchmark to @thi.ng/bench? I could provide the code for the confidence intervals.
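
Something along these lines, for the confidence intervals. A sketch of my own that only assumes the mean and standard deviation that @thi.ng/bench reports, plus the configured number of samples:

```js
// 95% confidence interval around the mean, given the standard deviation
// and the number of samples. The critical value 1.96 assumes a normal
// approximation; for small sample counts, a Student's t critical value
// for n - 1 degrees of freedom would be more accurate.
function confidenceInterval95(mean, sd, n) {
    const halfWidth = 1.96 * sd / Math.sqrt(n);
    return [mean - halfWidth, mean + halfWidth];
}
```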

Update: uubench, though minimal, might also be an option. Source code here.

@m90
Member

m90 commented Mar 4, 2021

how would you feel about converting the benchmark to @thi.ng/bench?

I just had a cursory glance at the docs and it occurred to me that this might not support async benchmark functions (at least they are never mentioned anywhere), which the benchmark this repo runs would require. One could remove that requirement from the original code, but then it would deviate from its real-world use. This might be a blocker, but maybe I am missing something.
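
For reference, this is roughly the deferred pattern benchmark.js provides for async cases, which an alternative would have to replicate. A sketch only; fetchSomething stands in for the real async work:

```js
const Benchmark = require('benchmark');

// placeholder for the real async work
const fetchSomething = () => Promise.resolve();

new Benchmark.Suite()
    .add('async case', {
        // defer: true hands the benchmarked function a Deferred object,
        // so the timer stops only when deferred.resolve() is called.
        defer: true,
        fn: function (deferred) {
            fetchSomething().then(() => deferred.resolve());
        }
    })
    .on('cycle', event => console.log(String(event.target)))
    .run({ async: true });
```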

About the other options, I don't know. Browser-only seems like a blocker to me.


As we talked about before, I don't have too strong an opinion on this topic, so I would leave choosing and implementing a different benchmark framework or approach up to you. This seems more like a question of personal preference (which is perfectly fine) than one that really affects the result of the benchmark, and I simply don't have a preference as I don't know enough about the topic.

You are the one who wants others to run this benchmark, so you need to be the one who is content with what is being used.

@jgonggrijp
Collaborator Author

Oops, totally forgot to check about async. I agree that's necessary. I decided I don't want to invest any time in fixing benchmark.js (at least not at this time), so I guess the option that remains is accepting the quirks.

I'll announce the benchmark in the pull request. Again, thanks for your awesome work!!
