Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing AVA snapshots v3 #2685

Merged
merged 88 commits into from Mar 13, 2021
Merged

Conversation

ninevra
Copy link
Contributor

@ninevra ninevra commented Feb 14, 2021

Implements a new snapshot file format containing all information necessary to generate the snapshot report file.

Allows the use of test.skip(), test.only(), --match, line number selection, and t.snapshot.skip() during --update-snapshots runs.

Closes #2634 by removing the "Could not update snapshots" case entirely.
Closes #2635 by allowing existing test & assertion selection mechanisms to be used with --update-snapshots.
Hopefully will assist with #1768 and #2099.

Current proposed snapshot file format:

  • Readable prefix: AVA Snapshot v3\n
  • Snapshot version (UInt16LE): 3
  • sha256 checksum of the following
  • gzip compressed data:
    • CBOR encoded data:
      {
        blocks: [
          {
            title: "A test title",
            snapshots: [
              {
                label: "A snapshot label",
                buffer: /* Serialized concordance descriptor */
              },
              ...
            ]
          },
          ...
        ]
      }
      

Notes:

  • The label field is absent unless the user supplied a label. This allows changing the default snapshot label if desired in the future.
  • The buffer field may be absent, if the snapshot has only ever been skipped. Such snapshots are currently rendered as <No Data>.
  • Currently, some data still exists only in the report file, namely the filenames of the test file and the snapshot file. This information can be derived from the name of the snapshot file, but theoretically that could change in the future.

Pros of the proposed format:

  • The snapshot report can now be regenerated from the snapshot file.
    • Enables the use of test selection mechanisms during --update-snapshots
    • Allows snapshot reports to be kept always in order, rather than the previous approach of appending changes at the end.
    • Prerequisite for any feature that edits snapshots without overwriting them entirely.
  • AVA now maintains less byte-manipulation code
  • The snapshot format is easier (though not trivial, because of the gzipping and the header) for third-parties to read
  • Adding new properties either to snapshots, to blocks, or to the file overall should be easy and remain readable by older AVAs (though older AVAs may or may not preserve such unrecognized data when updating the .snap file)

Cons of the proposed format:

  • AVA can't totally guarantee that the snapshot file format is deterministic; some leeway is given to the CBOR encoder. .snap files might show spurious changes e.g. between minor versions of the CBOR encoder.
  • AVA's attack surface is changed. Attackers remain able to submit a malicious .snap file in a pull request, in hope of exploting potential bugs in AVA to exfiltrate repository secrets from CI runs or gain control of developers' machines. The total amount of code exposed to such an attack is significantly increased, and unless we carefully filter the result of CBOR decoding (not currently done), such an attacker can produce unexpected javascript types and values. On the other hand, the binary-parsing code now has rather more eyes on it.

Other changes:

Frontend:

  • Rather than append new tests to the old report, adding new tests without --update-snaphots now maintains declaration-order.
  • When the snapshot report is changed, tests not touched in that run (e.g. removed tests, tests which no longer use snapshots) are moved to the end of the report, maintaining their previous order among themselves.

Edge cases:

  • If --update-snapshots is passed and the only use of snapshots is in discarded t.try() attempts, then the snapshot file is removed. There's no snapshot data to record in this case, so I don't see any reason to keep the file.
  • If t.snapshot.skip(); t.snapshot(); is added at the end of a test, RangeError is no longer thrown; instead a "blank space" is recorded for the skipped snapshot. This is currently rendered in the snapshot report as <No Data>.
  • snapshot-manager now assumes Array.prototype.sort is stable. This is the true in all supported Node versions afaik, and is required by the standard as of ecmascript 2019, but notably is false in node v10. If Array.prototype.sort is not stable, the snapshot report order may not be deterministic.

Backend:

  • The snapshot file is eagerly loaded in the Runner constructor, rather than lazily loaded.
    • Runner always emits a dependency on its .snap file, even in --update-snapshots mode and even if no snapshot assertions are performed.
    • In some very narrow edge cases where the snapshot file has a valid version and checksum but unreadable data, this could result in throwing an unhelpful error. AVA@3 only threw such errors if snapshots were actually used.
  • Introduces a dependency on cbor.
  • Removes a dependency on md5-hex.

TODO:

  • Implement a placeholder snapshot format containing the necessary information
  • Test that snapshot reports can be regenerated from .snap files
  • Allow running --update-snapshots with test.skip(), t.snapshot.skip(), test.only(), --match, line number selection
    • Finish testing this
    • Document this
  • Refactor & clean up code
    • Ordering data should be collected as-it-happens, like skipBlock() and skipSnapshot(), not once at load time
    • Now that the snapshot manager has all the information on skipping tests and snapshots, cleanSnapshots() fits more naturally as part of Manager#save(). But,
    • The above requires that the snapshot manager exist when not used, which conflicts with Runner's lazy-loading behavior, which in turn is relied on by existing Runner unit-test suites.
    • Remove code for handling cannotSave events
  • Debug the performance issues somehow. Are they caused by this PR or preexisting? Are they specific to CI, specific to temp dirs, or neither?
    • Tests were just spawning too many processes in CI. Fixed.
  • Finalize the snapshot file format

This index was used to establish declaration order of beforeEach and 
afterEach hooks. Since these hooks no longer support snapshots, 
associatedTaskIndex is unnecessary.
This index was used to establish declaration order of snapshots with 
ids. Since snapshots with ids are no longer supported, this index is 
unnecessary.
This format contains all the necessary information for regenerating
snapshot reports (test titles, snapshot labels, snapshot ordering) but
isn't designed for robustness of any sort, only ease of implementation.
Once a proof of concept is ready, the format should be revisited.
This requires deleting all the .snap files, as well as modifying 
contains-only.js and then putting it back after.
Still prohibit removing snapshots when any block was skipped.
This enables unit tests to continue using Runners without snapshot 
support, while allowing Runner to use the snapshot manager interface as 
though it was pre-loaded.
Including test.only(), --match, and line number selection.
Better conveys the purpose of the test suite.

Clarifies the PR diff, since the old snapshot-updates is being entirely
removed and replaced.
@ninevra
Copy link
Contributor Author

ninevra commented Feb 16, 2021

Looks like the testing approach I've been using here and in snapshot-removal is very prone to causing timeouts in CI. Tests are taking upwards of half a minute to run in CI vs 3-4s locally. Further, since each suite contains a large number of concurrent tests in a single file, they're all kicked off at once, increasing the chance that no test finishes within AVA's global timeout period.

Most of these tests are creating a temporary directory, copying a fixture, running AVA once to set up some initial state, running AVA again to modify that state, checking that the change matches the expectation, and then cleaning up the temporary directory. (The windows runs are somehow timing out within the first fixture execution, which is very strange.) I'll have to rethink this & hopefully eliminate some of those steps and/or reorganize the tests to run in a timeout-friendly pattern.

@novemberborn
Copy link
Member

  • md5 checksum of the following

Shall we update this to SHA-256? Just to deal with projects that rule out insecure hashes throughout the codebase.

@novemberborn
Copy link
Member

  • Adding new properties either to snapshots, to blocks, or to the file overall should be easy and remain readable by older AVAs (though older AVAs may or may not preserve such unrecognized data when updating the .snap file)

Our only concern should be being able to read snapshots written by previous minor versions (within the same major release).

A test using a new API won't pass when used with an older version either.

@novemberborn
Copy link
Member

  • AVA can't totally guarantee that the snapshot file format is deterministic; some leeway is given to the CBOR encoder. .snap files might show spurious changes e.g. between minor versions of the CBOR encoder.

The CBOR spec itself makes a big deal about determinism, so hopefully this will be rare.

@novemberborn
Copy link
Member

The snapshot file is eagerly loaded in the Runner constructor, rather than lazily loaded.

Why does it need to be loaded eagerly? Could we load it synchronously when needed?

@novemberborn
Copy link
Member

The snapshot file is eagerly loaded in the Runner constructor, rather than lazily loaded.

Why does it need to be loaded eagerly? Could we load it synchronously when needed?

Although, of course, if there is no snapshot file this is a cheap operation, and if there is, it's likely needed. So this is fine, never mind 😄

Copy link
Member

@novemberborn novemberborn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • When the snapshot report is changed, tests not touched in that run (e.g. removed tests, tests which no longer use snapshots) are moved to the end of the report, maintaining their previous order among themselves.

What happens when tests are skipped? Are those sections moved to the end as well? We should have access to the declaration order.

docs/04-snapshot-testing.md Outdated Show resolved Hide resolved
Copy link
Member

@novemberborn novemberborn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to need some more time to review this. Impressive work @ninevra.

@ninevra
Copy link
Contributor Author

ninevra commented Mar 2, 2021

  • When the snapshot report is changed, tests not touched in that run (e.g. removed tests, tests which no longer use snapshots) are moved to the end of the report, maintaining their previous order among themselves.

What happens when tests are skipped? Are those sections moved to the end as well? We should have access to the declaration order.

Tests that are skipped should remain ordered properly. (edit: this is demonstrated by the snapshot for snapshot-workflow/selection.js, "With --update snapshots and test.skip(), other tests' snapshots are updated")

Currently, ordering info is collected for tests that are skipped (or "not selected" by other means) or that use t.snapshot() or t.snapshot.skip() (if used only in a discarded t.try(), I think things get a little inconsistent; I need to look at that case in more detail). Ordering info is not currently collected for tests that are run but don't use snapshots, nor for .todo() tests, nor for hooks.

It's possible to collect ordering info for all tests that are declared, whether they use snapshots or not; I'd add a Manager.prototype.touch(title, taskIndex) method and call it every time a test is declared. It's of course not possible to collect ordering info for tests that have been removed entirely.

The snapshot file is eagerly loaded in the Runner constructor, rather than lazily loaded.

Why does it need to be loaded eagerly? Could we load it synchronously when needed?

Although, of course, if there is no snapshot file this is a cheap operation, and if there is, it's likely needed. So this is fine, never mind smile

Manager now has a substantially larger interface to Runner (compare and save are joined by skipSnapshot and skipBlock), some of which needs to be called before the first snapshot comparison. I initially tried to keep lazyloading, which is certainly possible, but it requires basically shimming the entire Manager interface and replaying calls to skipSnapshot and skipBlock when the real Manager is finally loaded. The duplicated interface was really annoying to iterate on and debug, so I scrapped it.

Plus, in theory eager loading should make the performance characteristics of snapshot-based tests a little more consistent; before, the first t.snapshot() to run would perform sync file io and then parse a potentially-large file, while later assertions would be fast.

  • AVA can't totally guarantee that the snapshot file format is deterministic; some leeway is given to the CBOR encoder. .snap files might show spurious changes e.g. between minor versions of the CBOR encoder.

The CBOR spec itself makes a big deal about determinism, so hopefully this will be rare.

I expect it should be rare, yes. I'm invoking cbor.encode with {omitUndefinedProperties: true, canonical: true}, which should help. Unfortunately the cbor package doesn't yet appear to support "Deterministic CBOR", but if i recall correctly the main difference between that and "Canonical CBOR" is the choice of sort order for map keys, so it shouldn't really make a difference.

  • md5 checksum of the following

Shall we update this to SHA-256? Just to deal with projects that rule out insecure hashes throughout the codebase.

Sounds good to me. I'll hopefully have time to make changes in a day or two.

  • Adding new properties either to snapshots, to blocks, or to the file overall should be easy and remain readable by older AVAs (though older AVAs may or may not preserve such unrecognized data when updating the .snap file)

Our only concern should be being able to read snapshots written by previous minor versions (within the same major release).

A test using a new API won't pass when used with an older version either.

Makes sense.

Should we consider including a minor and/or patch version in the .snap file? Could add it to the readable prefix without issue, keep the existing version field as-is so that AVA <= 3 keep reporting the right error, and add a field version: {major: 3, minor: 0, patch: 0} (or version: [3, 0, 0] or version: "3.0.0") to the top level of the data structure. Then AVA@4.0 could issue a warning (probably not a hard error??) if it encountered a minor version > 0.

The ideal here would be to hook into AVA's global maximum concurrency, so that the total number of tests running across all test files was always at the processor count.

You could do that with shared workers actually.

A shared worker could only control parallelism across the test files where it was loaded, though, right? That is, if it was loaded in snapshot-removal and snapshot-workflow, it could limit their total concurrency to os.cpus().length, but it wouldn't know if AVA was running tests from other suites at the same time? (Shared workers are really cool, but I'm not familiar with the interface yet, sorry.)

Keep docs accurate for AVA 3.

Co-authored-by: Mark Wubben <mark@novemberborn.net>
@novemberborn
Copy link
Member

It's possible to collect ordering info for all tests that are declared, whether they use snapshots or not; I'd add a Manager.prototype.touch(title, taskIndex) method and call it every time a test is declared.

What scenario would this cover? Adding a new snapshot assertion to an existing test which is run selectively? Though that would still order correctly right?

Should we consider including a minor and/or patch version in the .snap file? Could add it to the readable prefix without issue, keep the existing version field as-is so that AVA <= 3 keep reporting the right error, and add a field version: {major: 3, minor: 0, patch: 0} (or version: [3, 0, 0] or version: "3.0.0") to the top level of the data structure. Then AVA@4.0 could issue a warning (probably not a hard error??) if it encountered a minor version > 0.

I don't think the warnings are worthwhile. Either it's a hard error, and then a single major version suffices, or handling in newer versions deals with missing fields. Which is needed anyway to ensure CI passes when a newer (minor / patch) AVA version is installed and snapshots haven't been updated.

A shared worker could only control parallelism across the test files where it was loaded, though, right? That is, if it was loaded in snapshot-removal and snapshot-workflow, it could limit their total concurrency to os.cpus().length, but it wouldn't know if AVA was running tests from other suites at the same time? (Shared workers are really cool, but I'm not familiar with the interface yet, sorry.)

You're right, yes. Controlling overall execution could be an interesting concept, but possibly something that AVA should provide a framework for but not provide out of the box.

The blocksByTitle format is an implementation detail of Manager, whereas
the CBOR data structure is a characteristic of the snapshot file
protocol.

This change reduces codepath duplication between generateReport() and
encodeSnapshots(), and avoids passing blockIndices around everywhere.

If decodeSnapshots() and encodeSnapshots() are later exposed as a
library, this may be closer to the desired interface.
@ninevra
Copy link
Contributor Author

ninevra commented Mar 5, 2021

Switched to sha256 hashes and implemented the touch() method (the latter can be reverted easily if you think it's a detriment).

What scenario would this cover? Adding a new snapshot assertion to an existing test which is run selectively? Though that would still order correctly right?

It covers some narrow edge cases. For instance, if a test had its snapshot assertions removed, it's snapshot data would be retained but, since it wasn't skipped and no longer used snapshots, so the Manager would never learn its order. This woudn't matter until a new snapshot was added to the test file, at which point the report would get regenerated and the first test shunted to the end.

Collecting ordering info for every test also just seems a little easier to reason about.

@novemberborn
Copy link
Member

@ninevra I've pushed some small tweaks, let me know what you think. In the snapshot format I renamed buffer to data, but I think everything else is the same.

Your tests seem very thorough. There's a lot of it though, so it's hard to review! Any tips? Or else let's just ship it 😄

Hopefully will make the snapshot-worker snapshots easier to read.
@ninevra
Copy link
Contributor Author

ninevra commented Mar 12, 2021

@novemberborn all the tweaks look great to me! Especially the formatEntry() code, it looks a lot nicer now.

Your tests seem very thorough. There's a lot of it though, so it's hard to review! Any tips? Or else let's just ship it 😄

I mean, I don't think we should ship code you're not confident in, and I certainly don't intend to pressure you to do that. I am sorry that the diff ended up so massive.

What can I do to make the code and/or tests easier to review/maintain?

Ideas:

  • I've pushed a change to the snapshot-workflow tests that snapshots a before/after diff of the report, rather than just the final report; hopefully that'll make it easier to see what the tests are doing. Feel free to revert it if it's just making things worse.
  • I could consolidate all the snapshot-workflow tests in one file to make them less tedious to look through.
  • I could rewrite some of the test fixtures to use descriptive messages and titles instead of just 'foo' et al placeholders.

@ninevra
Copy link
Contributor Author

ninevra commented Mar 12, 2021

I have a crude-but-working implementation of the shared worker concurrency limiter we discussed; it's on ninevra/ava/snapshot-regenerate-shared-worker-semaphore. I think landing that here would require updating avajs/test to expose plugin.js, though. Without that the plugin can't require('@ava/test/plugin').

Would you be interested in a PR to add semaphores to @ava/cooperate, or a separate plugin? The interface I wrote could use some more work, but eventually it'd be nice to have.

@novemberborn
Copy link
Member

Would you be interested in a PR to add semaphores to @ava/cooperate, or a separate plugin? The interface I wrote could use some more work, but eventually it'd be nice to have.

That'd be perfect for @ava/cooperate!

I think landing that here would require updating avajs/test to expose plugin.js, though. Without that the plugin can't require('@ava/test/plugin').

That'd be fine, except that shared workers are still experimental and subject to breaking changes even in patch releases of AVA. I'm not sure we should be relying on that in our own test suite shipping AVA would break its tests.

@novemberborn
Copy link
Member

  • I've pushed a change to the snapshot-workflow tests that snapshots a before/after diff of the report, rather than just the final report; hopefully that'll make it easier to see what the tests are doing.

It's very meta, but also very helpful. Thanks!

I think I got side tracked by all the fixtures. Once I skipped those it got a lot easier. Great work.

@novemberborn novemberborn changed the title (Re)generate snapshot reports from .snap files Introducing AVA snapshots v3 Mar 13, 2021
@novemberborn novemberborn merged commit 14ec281 into avajs:main Mar 13, 2021
@novemberborn
Copy link
Member

🎊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants