AHA regression changes: pr_aha = pr_aha[123]

steveri edited this page May 10, 2024 · 2 revisions

Changing how regression suites are specified.

The CI tests that run on each aha-repo push or pull request recently exceeded 8 hours per build, e.g. https://buildkite.com/stanford-aha/aha-flow/builds/9929 . These tests use our "aha regress" command on a set of roughly 26 tests comprising a single test suite called "pr_aha" (previously misnamed the "daily" suite).

To reduce this time, we split the suite into three parts that can run in parallel, i.e. instead of a single command

    aha regress pr_aha  # This takes 7-8 hours

there is now the option of running three commands in parallel

    aha regress pr_aha1 &  # This takes 3-4 hours
    aha regress pr_aha2 &  # This takes 2-3 hours
    aha regress pr_aha3 &  # This takes 2-3 hours

(Note that the original 8-hour option "aha regress pr_aha" is still available for whoever might prefer that.)
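
The three parallel commands can also be launched and awaited from a script. The following is a minimal sketch and not part of the aha repo: the helper name `run_suites_in_parallel` and its `command_prefix` parameter are hypothetical, and the sketch assumes that "aha regress" is on the PATH and exits with a nonzero code when a suite fails.

```python
import subprocess


def run_suites_in_parallel(suites, command_prefix=("aha", "regress")):
    """Start one process per suite, wait for all of them, and
    return a dict mapping each suite name to its exit code.

    Hypothetical helper; assumes `aha regress <suite>` signals
    failure through a nonzero exit code.
    """
    # Launch every suite first so that they all run concurrently
    procs = {
        suite: subprocess.Popen([*command_prefix, suite])
        for suite in suites
    }
    # Then block until each one finishes and collect its exit code
    return {suite: proc.wait() for suite, proc in procs.items()}


# Example usage (commented out because it starts the real regressions):
# codes = run_suites_in_parallel(["pr_aha1", "pr_aha2", "pr_aha3"])
# failed = [suite for suite, code in codes.items() if code != 0]
```

With this shape, total wall-clock time is bounded by the slowest suite (pr_aha1 in the timings above) rather than by the sum of all three.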

The faster 3-in-parallel option is now the default for pull requests, and an example can be seen here: https://buildkite.com/stanford-aha/aha-flow/builds/10295 . Note that pr_aha2 and pr_aha3 run only variants of conv2, because this app takes about an hour to run on its own. pr_aha1 runs the remaining apps in the test suite, which combined take about twice as long as a single conv2 run.

To facilitate this change, we altered the way test suites are defined for the "aha regress" command.

Previously, test suites were specified in the file regress.py ( https://github.com/StanfordAHA/aha/blob/master/aha/util/regress.py ). For example, to change the suite of tests invoked by "aha regress fast", you would modify this code block in regress.py:

    if args.config == "fast":
        width, height = 4, 4
        sparse_tests = [
            "vec_identity"
        ]
        glb_tests = [
            "apps/pointwise"
        ]
        glb_tests_fp = [
            "tests/fp_pointwise",
        ]
        resnet_tests = []
        resnet_tests_fp = []
        hardcoded_dense_tests = []

    elif args.config == "daily":
        width, height = 28, 16
        sparse_tests = [
            "vec_elemadd",
        ...

Now, this code has been moved into a "Tests" class defined in a separate file, regress_tests/tests.py ( https://github.com/StanfordAHA/aha/blob/master/aha/util/regress_tests/tests.py ). The code looks much the same as before; it is just in a different place. That file is now where you would look to add new tests or create new test suites.
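
To give a feel for the new layout, here is a simplified sketch of what such a class could look like. It is not the actual contents of tests.py: the attribute names and the "fast" values come from the old regress.py block shown above, while the constructor argument name, the "my_suite" entry, its 8x8 array size, and the error raised for unknown names are assumptions made for illustration.

```python
class Tests:
    """Illustrative stand-in for the Tests class in regress_tests/tests.py."""

    def __init__(self, testname="fast"):
        # Give every attribute a default so callers can always unpack them
        self.width, self.height = 4, 4
        self.sparse_tests = []
        self.glb_tests = []
        self.glb_tests_fp = []
        self.resnet_tests = []
        self.resnet_tests_fp = []
        self.hardcoded_dense_tests = []

        if testname == "fast":
            # Same values as the old 'args.config == "fast"' block
            self.width, self.height = 4, 4
            self.sparse_tests = ["vec_identity"]
            self.glb_tests = ["apps/pointwise"]
            self.glb_tests_fp = ["tests/fp_pointwise"]

        elif testname == "my_suite":
            # Hypothetical new suite: adding one is just another branch
            self.width, self.height = 8, 8  # assumed size, for illustration
            self.sparse_tests = ["vec_identity", "vec_elemadd"]
            self.glb_tests = ["apps/pointwise"]

        else:
            # Assumption: this sketch rejects unknown suite names
            raise ValueError(f"Unknown test suite: {testname!r}")


suite = Tests("fast")
print(suite.width, suite.height, suite.sparse_tests)
```

Keeping the suite definitions in one class means regress.py only needs to read attributes, so adding or splitting a suite does not touch the regression driver itself.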

As a result, 150 lines of code that used to be in regress.py are replaced by the following:

    # Fetch test-suite from Tests class as 'imported_tests'
    imported_tests = Tests(args.config)

    # Unpack imported_tests into convenient handles
    width, height = imported_tests.width, imported_tests.height
    sparse_tests = imported_tests.sparse_tests
    glb_tests = imported_tests.glb_tests
    glb_tests_fp = imported_tests.glb_tests_fp
    resnet_tests = imported_tests.resnet_tests
    resnet_tests_fp = imported_tests.resnet_tests_fp
    hardcoded_dense_tests = imported_tests.hardcoded_dense_tests

Also, while the original test-suite names still work, the "official" names of some suites have been updated to be more descriptive. The old names were misleading: the "pr" tests ran only in response to PRs from aha submodules, while aha-repo pull requests used the "daily" test suite. So the new "official" names "pr_aha" and "pr_submod" are aliases for the old names "daily" and "pr" respectively.
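
One simple way to support both sets of names is a lookup table that maps each old name to its new official name. The sketch below is only an illustration of that idea; the names `SUITE_ALIASES` and `official_suite_name` are hypothetical, and the actual aliasing in the aha repo may be implemented differently.

```python
# Old suite name -> new "official" name (pairs taken from the text above)
SUITE_ALIASES = {
    "daily": "pr_aha",
    "pr": "pr_submod",
}


def official_suite_name(name):
    """Return the official name for a suite, leaving other names unchanged."""
    return SUITE_ALIASES.get(name, name)


for requested in ["daily", "pr", "pr_aha", "fast"]:
    print(requested, "->", official_suite_name(requested))
```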

To be clear, at the user level, nothing changes. "aha regress fast" still runs the "fast" test suite as before, "aha regress daily" still runs the daily/pr_aha suite, and so on.
