
Organize performance benchmarks #95

Conversation

@countvajhula (Collaborator) commented on Mar 8, 2023

Summary of Changes

The SDK has been a bit of a "ball of mud" in that scripts were just added as they became necessary. This PR introduces some order by organizing the modules into:

  1. local benchmarks that exercise individual forms
  2. nonlocal benchmarks that exercise many components at once
  3. module loading

In addition, we're often interested in checking any of these for regressions against some baseline, and also in comparing them against another language like Racket. Both are now implemented using the same regression logic: "regression" is just the case where we compare the language's performance against itself (for local, nonlocal, or module benchmarks). Similarly, "competitive" benchmarks are nonlocal benchmarks (only nonlocal, since "competitive" isn't well-defined for local and module benchmarks) run independently for each language and then compared for "regression."
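Conceptually, the shared logic boils down to something like the following hypothetical sketch (compare-runs and the benchmark struct are illustrative names only, not the actual contents of regression.rkt):

(struct benchmark (name time) #:transparent)

;; Hypothetical sketch, not the actual regression.rkt code: both a
;; regression check and a competitive comparison pair up two sets of
;; results and compute a time ratio per benchmark. For regression,
;; `reference` and `current` are two runs of Qi itself; for a
;; competitive comparison, they are runs of different languages on
;; the same nonlocal benchmarks.
(define (compare-runs reference current)
  (for/list ([ref (in-list reference)]
             [cur (in-list current)])
    (list (benchmark-name ref)
          (exact->inexact (/ (benchmark-time cur)
                             (benchmark-time ref))))))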

The new layout of the SDK is:

qi-sdk
├── info.rkt
└── profile
    ├── loading
    │   ├── loadlib.rkt
    │   └── report.rkt
    ├── local
    │   ├── base.rkt
    │   ├── benchmarks.rkt
    │   └── report.rkt
    ├── nonlocal
    │   ├── intrinsic.rkt
    │   ├── qi
    │   │   └── main.rkt
    │   ├── racket
    │   │   └── main.rkt
    │   ├── report-competitive.rkt
    │   ├── report-intrinsic.rkt
    │   └── spec.rkt
    ├── regression.rkt
    ├── report.rkt
    └── util.rkt

Each category (folder) contains a report.rkt file, which is the CLI entry point for that benchmarking category. This does result in some duplication, but I think that could be minimized in the future via countvajhula/cli#3.

Some standard features are available in all of the benchmarking CLIs, including the ability to:

  • configure the output format (e.g. CSV or JSON -- ./report.rkt -f csv)
  • check regression (./report.rkt -f json > before.json to save the reference data, and then ./report.rkt -r before.json to compare against it -- see the example after this list)
  • configure which benchmarks are run (-s <benchmark-name> ...)
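For example, a typical regression-checking workflow using the commands above (before.json is just an example filename):

# capture baseline numbers before making changes
./report.rkt -f json > before.json
# ... make changes ...
# compare current performance against the saved baseline
./report.rkt -r before.json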

The format of the output is just the one used by the github-action-benchmark tool that we use here, so we should now be able to add all the benchmarks there (i.e. nonlocal too).
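If I'm remembering the tool's custom benchmark format correctly, that output is a JSON array of entries along these lines (the names and numbers here are made up for illustration):

[
  { "name": "conditionals", "unit": "ms", "value": 23.4 },
  { "name": "composition", "unit": "ms", "value": 11.2 }
]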

This is in support of #78. As we consider compiler optimizations (e.g. in #76), we can define and add nonlocal benchmarks in the appropriate paths for each candidate optimization, to validate that the optimization does what we're expecting.

Public Domain Dedication

  • In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

@countvajhula (Author) commented:

This is ready for review if anyone has time to give it a look. If not, no worries as there will be time for final review on the integration branch 🙂

;; uninstalled.

(define-runtime-path lexical-module-path ".")
(current-load-relative-directory lexical-module-path)
Collaborator (commenting on the excerpt above):
If this file isn't meant to be run as a script, it might be worth using parameterize around the eval calls rather than setting this globally for the current thread (personal pet peeve when modules have unnecessary side effects :) )
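Something along these lines, say -- where eval-expr and ns below are placeholders for whatever the module actually evaluates, not names from this PR:

(require racket/runtime-path)

;; Scope the parameter to the dynamic extent of the eval, rather
;; than mutating it at module level for the whole thread.
;; (eval-expr and ns are placeholders.)
(define-runtime-path lexical-module-path ".")
(parameterize ([current-load-relative-directory lexical-module-path])
  (eval eval-expr ns))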

@countvajhula (Author) replied:

Yeah good call, I made the change.

  [require-data (if (member? (report-type) (list "all" "loading"))
                    (list (profile-load "qi"))
                    null)]
- [output (append local-data require-data)])
+ [output (~ local-data nonlocal-data require-data)])
Collaborator:

I'm not familiar with this ~; is it a synonym for append or output formatting somewhere?

@countvajhula (Author) replied:

It's the operator described in this blog post, and provided by the relation library 🙂
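Roughly speaking, it's a generic concatenation operator, so for lists it behaves like append. A quick sketch, assuming the relation package is installed:

(require relation)

;; ~ concatenates sequences generically
(~ (list 1 2) (list 3 4))  ; => '(1 2 3 4)
(~ "hello" " " "there")    ; => "hello there"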

@countvajhula merged commit 0d62895 into drym-org:lets-write-a-qi-compiler on Jul 21, 2023 (1 of 6 checks passed).
@countvajhula deleted the nonlocal-benchmarks-starter branch on July 21, 2023.