
Discussion: benchmark package from NPM #3

Closed
jgonggrijp opened this issue Feb 12, 2021 · 4 comments

Comments

@jgonggrijp
Collaborator

The benchmark package does (at least) two questionable things:

  1. In order to decide whether the performance difference is significant, it uses the Mann-Whitney U test, which is meant for ordinal-scale data, instead of Student's t test, which is more appropriate for ratio-scale data such as the performance measurements we're making here (a rough sketch of that statistic follows below this list).
  2. It recompiles JavaScript code from a string on every test run in order to prevent engine optimizations. Usually we're interested in performance with engine optimizations, since this is how code tends to run in the real world. Besides that, since most of the benchmark code is probably out of reach for this compilation trick, optimization is only partly disabled, producing inconsistent optimization characteristics. Maybe this behavior can be disabled; this is worth investigating.
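
For concreteness, this is roughly what the Student's t statistic in point 1 boils down to. A sketch of my own, not code from benchmark.js, assuming two arrays of raw timing samples:

```js
// Pooled two-sample Student's t statistic for two arrays of timing samples.
// The resulting t value would be compared against a critical value for
// n1 + n2 - 2 degrees of freedom to decide significance.
function mean(xs) {
    return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function variance(xs) {
    const m = mean(xs);
    return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

function studentT(sample1, sample2) {
    const n1 = sample1.length, n2 = sample2.length;
    const pooledVariance =
        ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) /
        (n1 + n2 - 2);
    return (mean(sample1) - mean(sample2)) /
        Math.sqrt(pooledVariance * (1 / n1 + 1 / n2));
}
```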

I can think of a couple of options (from least to most effort):

  • Accept the quirks and do nothing.
  • Switch to an alternative benchmark framework that doesn't have these quirks. I'm not (yet) aware of an alternative.
  • Forego a convenient benchmark framework. Instead, repeat the benchmark code a fixed number of times (say, 10) for each version of Underscore and report each individual result, so that people can compute their own statistics (see the sketch after this list).
  • Fork the benchmark package, fix the issues, submit a PR. Use our own version regardless of whether the PR is accepted. If not accepted, publish as a separate package on NPM.
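
To illustrate the framework-less option: it could be as simple as the following sketch of my own, where runBenchmark is a placeholder for the actual Underscore workload.

```js
// Run the benchmark body a fixed number of times and print every raw
// result, leaving the statistics to whoever reads them.
const REPEATS = 10;

function runBenchmark() {
    // placeholder for the actual Underscore workload
}

for (let i = 0; i < REPEATS; i++) {
    const start = process.hrtime.bigint();
    runBenchmark();
    const elapsedNs = process.hrtime.bigint() - start;
    console.log(`run ${i + 1}: ${Number(elapsedNs) / 1e6} ms`);
}
```
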
@m90
Member

m90 commented Feb 12, 2021

I'm not the most knowledgeable person when it comes to benchmarks, so I would trust a well-used package much more than the one-off code I would come up with. Your concerns (especially the forceful de-optimization) do sound valid to me, although they don't seem to be total blockers.

I think I would opt for either your first or your last option. Since I won't be able to do it myself, this probably depends on how much time you would like to spend working on the benchmark package. If there is a fork, using it here instead of the original version should be pretty simple.

I couldn't find any other alternative package when setting this up either. Having testers do their own statistics seems complicated and error-prone to me.

@jgonggrijp
Collaborator Author

jgonggrijp commented Mar 3, 2021

I decided to look for alternative benchmark libraries first. Searching for "benchmark" on NPM turned up too many results, most being simple one-offs that haven't seen much use, wrappers of benchmark.js (which would suffer from the same problems), or both. However, I was able to identify a few promising options:

  • tachometer by the Polymer project is the most elaborate one. It automatically repeats samples and computes confidence intervals. The main drawback is that it specifically targets browsers.
  • pretty-hrtime is a minimal wrapper around the Node.js builtin process.hrtime. It doesn't have any features except for measuring the running time and displaying it nicely. It also only works in Node.js, but it could possibly fill the void that tachometer leaves there.
  • @thi.ng/bench provides a very simple benchmarking interface, but nevertheless takes care of warmup, repetition and some statistics. It also works both in the browser and in Node.js, although the time measurement is low-res in the browser. This benchmark probably doesn't need a very high resolution, though. Confidence intervals are missing, but can be added with a few lines of code based on the statistics that the package does provide (i.e., mean and standard deviation).

None of these options run a Student's t test, but to be honest, I would rather have no test than an incorrect one. I checked the source code of these packages for silly things like unwarranted de-optimizations, and found nothing of the kind.

@m90 how would you feel about converting the benchmark to @thi.ng/bench? I could provide the code for the confidence intervals.
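
Something along these lines, for the confidence intervals. A sketch of my own that only assumes the mean and standard deviation that @thi.ng/bench reports, plus the configured number of samples:

```js
// 95% confidence interval around the mean, given the standard deviation
// and the number of samples. The critical value 1.96 assumes a normal
// approximation; for small sample counts, a Student's t critical value
// for n - 1 degrees of freedom would be more accurate.
function confidenceInterval95(mean, sd, n) {
    const halfWidth = 1.96 * sd / Math.sqrt(n);
    return [mean - halfWidth, mean + halfWidth];
}
```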

Update: uubench, though minimal, might also be an option. Source code here.

@m90
Member

m90 commented Mar 4, 2021

how would you feel about converting the benchmark to @thi.ng/bench?

I just had a cursory glance at the docs and it occurred to me that this might not support async benchmark functions (at least they are never mentioned anywhere), which the benchmark this repo runs would require. One could remove that requirement from the original code, but then it would deviate from its real-world use. This might be a blocker, but maybe I am missing something.
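
For reference, this is roughly the deferred pattern benchmark.js provides for async cases, which an alternative would have to replicate. A sketch only; fetchSomething stands in for the real async work:

```js
const Benchmark = require('benchmark');

// placeholder for the real async work
const fetchSomething = () => Promise.resolve();

new Benchmark.Suite()
    .add('async case', {
        // defer: true hands the benchmarked function a Deferred object,
        // so the timer stops only when deferred.resolve() is called.
        defer: true,
        fn: function (deferred) {
            fetchSomething().then(() => deferred.resolve());
        }
    })
    .on('cycle', event => console.log(String(event.target)))
    .run({ async: true });
```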

About the other options, I don't know. Browser-only seems like a blocker to me.


As we talked about before, I don't have too strong an opinion on this topic, so I would leave choosing and implementing a different benchmark framework or approach up to you. This seems more like a question of personal preference (which is perfectly fine) than one that really affects the result of the benchmark, and I simply don't have a preference as I don't know enough about the topic.

You are the one who wants others to run this benchmark, so you need to be the one who is content with what is being used.

@jgonggrijp
Collaborator Author

Oops, totally forgot to check about async. I agree that's necessary. I decided I don't want to invest any time in fixing benchmark.js (at least not at this time), so I guess the option that remains is accepting the quirks.

I'll announce the benchmark in the pull request. Again, thanks for your awesome work!!
