Use cram-style integration tests #250
Conversation
This finds the correct directory even when the file is sourced
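Presumably via the usual bash idiom for this (a sketch, not necessarily the exact line from the commit):

```sh
# BASH_SOURCE points at this file even when it is sourced rather than
# executed, whereas $0 would point at the sourcing shell's script.
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
```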
The script manages paths, runs some sanity checks, and runs a diff on the expected output to verify success or failure.
Also:
- Adds an apalache-jar target that builds the package without tests
- Adds a phony promote target that promotes corrected integration test output back into the source .md files
- Adds a step to remove the target/ directory
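The new targets presumably look something like this (a hypothetical sketch -- the build command and file paths are assumptions, not the actual recipes from the repo):

```make
# Hypothetical sketch; recipe lines must be indented with tabs.

# Build the Apalache package without running the unit tests
apalache-jar:
	mvn package -DskipTests

# Promote mdx's corrected output back into the source test file;
# mdx writes a sibling *.corrected file when the actual output
# diverges from the expected output recorded in the markdown.
.PHONY: promote
promote:
	cp test/tla/cli-integration-tests.md.corrected test/tla/cli-integration-tests.md
```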
I will run this in parallel with the old integration tests until confirming that it provides comparable and consistent coverage.
This isn't supported on the diff command that ships with macOS, apparently.
Checking that both integration suites fail as expected.
This reverts commit ada26b1. This change didn't actually cause the expected failure.
This makes it clearer what parts of the process are eating up time.
In 5d5fdbd I triggered a failure, to confirm that the new integration tests catch the failure in CI in parity with the existing integration tests. See https://github.com/informalsystems/apalache/runs/1188178566?check_suite_focus=true#step:9:11 for the logs reporting the failure, which show how failures appear in CI (as a diff). So ada26b1 bears witness to both integration test harnesses passing under the same conditions, and 5d5fdbd bears witness to both failing under the same conditions.
This reverts commit 5d5fdbd.
This lets us shadow the apalache-mc binary with an invocation of the dockerized version.
Used to shadow apalache-mc in integration tests
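A shadowing wrapper of this sort might look like the following (a sketch; the image name and mount point are illustrative assumptions):

```sh
#!/bin/sh
# Placed on PATH ahead of the real apalache-mc so the integration tests
# exercise the dockerized build instead of the locally built binary.
exec docker run --rm -v "$(pwd)":/var/apalache apalache/mc "$@"
```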
Just ensures that the proper environment is set up before invoking the mdx-tests.
A substantive improvement to the integration tests would be an explanation in each section describing what is being tested, but I don't have the context to provide that yet, and this PR is already longer than would be ideal.
The cram tests are much cleaner than the handwritten ones! My only worry is the opam dependency (see the comment).
```yaml
      with:
        path: ~/.opam
        key: ${{ runner.os }}-opam-4.11.0
    - name: Set up opam
```
I am a bit worried that we add `opam` as a dependency. I had problems with the stability of opam packages a few years ago, though this may have been improved since then. Are there alternatives, in case we experience a problem with `mdx`?
> I had problems with the stability of opam packages a few years ago

Did you encounter problems with opam when pinning to a specific version of a package and a specific version of the compiler? Or was it with updating dependencies?

> though this may have been improved since then

I suspect so! Note that Opam 2 was released in 2018, and was a major overhaul.

> Are there alternatives, in case we experience a problem with mdx?

I don't know of any alternative software that supports these features of `mdx` (see the sketch after this list):
- Extract tests from markdown
- Run single tests in particular sections
- Multiline wildcards
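For example, running just the tests under one markdown section looks roughly like this (a sketch; the exact flag spelling may vary across `mdx` versions, and the section name is illustrative):

```sh
# Run only the tests whose enclosing markdown header matches the regexp
ocaml-mdx test --section 'checking an invariant' test/tla/cli-integration-tests.md
```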
Also, I'm a contributor to `mdx` (just some small patches), and I feel confident I could resolve any issues that might arise.
That said, we could try using cram instead. That adds a Python dependency (which I find more worrisome, but you may find less so ;) ), and it won't support extracting tests from a markdown file. But if we broke all the tests into their own files, we could still trigger them in isolation (well, at least each set that lives in its own file).
My main hesitation with `cram` is that it appears to be abandoned: see aiiie/cram#32, and it has not had a new release since 2016.
I see. Then let's stay with `mdx` and merge!
Feel free to merge.
Fix logging for integration tests. Also adds a flag to enable debugging output to stdout. This is a followup to #250 -- I had misconfigured the logger, and so nothing was being logged to the files or stdout.
Closes #192
This PR uses `mdx` for writing and running cram-style CLI integration tests. The tests are embedded in code blocks in a markdown file. Here's an example of what some tests look like:
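Something along these lines (a hedged sketch -- the spec file, flag, and output shown here are illustrative rather than copied from the suite; `...` is mdx's multiline wildcard, matching any run of output lines):

```sh
$ apalache-mc check --inv=Inv Counter.tla
...
The outcome is: NoError
...
```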
See https://github.com/informalsystems/apalache/blob/4a01124f3bdf692fbc6ba356e1077676c52531dc/test/tla/cli-integration-tests.md for the full suite of cram-style integration tests, and for the documentation describing how to run and write the tests.
Note on impact on CI build times.
tl;dr: Building the `mdx` dependency will add about 7 minutes to the integration tests in CI the first time they run on `unstable`. But it should only take a few dozen seconds to fetch the cache after that, and this saving should be inherited by all branches based on `unstable`.

This is all according to GitHub's documentation on caching in actions. Once the dependency is cached on the default branch -- `unstable` -- the cached artifact should be used by all branches based on the default. This holds for any branch based off of one that has already saved the cache. I tested this in #252, which confirmed that the dependency was fetched from the cache stored from this branch.

If it becomes an issue, we can always dockerize the dependency, prebuild a standalone artifact for our test environments, or use a different tool that achieves the same purpose (though I am unaware of an actively maintained replacement).
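Concretely, the caching step quoted in the review discussion above amounts to something like this (a sketch; the action version and step name are assumptions):

```yaml
# Cache the opam directory so mdx is only built from scratch once
- name: Cache opam packages
  uses: actions/cache@v2
  with:
    path: ~/.opam
    key: ${{ runner.os }}-opam-4.11.0
```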