
New watch mode: only re-run failing tests until they pass #1033

Closed

ngryman opened this issue Sep 6, 2016 · 10 comments

Comments

@ngryman

ngryman commented Sep 6, 2016

Hi,

I would like to open a discussion around a new feature that might be interesting to use during development.

Usually, when you add a new feature or fix a bug on a project, you add a new test, watch it fail, and then code until it finally passes. This is TDD, nothing new. During this process, you are likely to be interested only in this particular test, for several reasons:

  • Your whole test suite takes time to run.
  • It's not relevant to run all tests while coding/saving.
  • You want to debug/trace only this test and shut out the noise from other tests.

So you end up adding/removing only modifiers manually. It can be quite frustrating at times, and it would be awesome if AVA was somehow smart about it.

Watch mode is already half smart, as it only re-runs the tests contained in the test file you modify, which is a huge improvement to the day-to-day development workflow 👍 Still, I think we can push things further and make it a bit smarter by adding a more aggressive test isolation algorithm that would only re-run previously failing tests until they pass. I don't know whether it should be part of the default watch behavior or live behind another CLI flag (e.g. --smart-watch 👓), but I know I would love this feature 😍.


Here is a rough sketch of the smart-watch algorithm in pseudo-JavaScript; it is not tied to your actual implementation, and the helpers (collectAllTests, isTestFile, run) are placeholders:

// All collected tests; each entry tracks its source file and whether it
// failed on the previous run.
const allTests = collectAllTests()

watcher.on('change', file => {
  // Prefer the tests that failed on the previous run.
  let tests = allTests.filter(test => test.failing)

  // Nothing is failing: fall back to the whole suite.
  if (tests.length === 0) {
    tests = allTests
  }

  // A test file changed: narrow down to the tests it contains.
  if (isTestFile(file)) {
    tests = tests.filter(test => test.file === file)
  }

  run(tests)
})

Basically it only re-runs the tests that are failing, falling back to running all tests. It does not otherwise change the current watch behavior.
This would involve keeping track of the failures from run t-1, so that at run t we know which tests have failed. I'm not familiar with AVA's internals, but I guess it can be done quite easily.
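For instance, a minimal way to keep that bookkeeping would be a set of failing test keys that gets updated after every run (purely illustrative; the results shape below is made up, not AVA's reporter API):

// Remember which tests failed on the previous run.
// `results` is assumed to be an array of {file, title, passed} objects.
const failing = new Set()

function recordResults(results) {
  for (const result of results) {
    const key = `${result.file} › ${result.title}`
    if (result.passed) {
      failing.delete(key)
    } else {
      failing.add(key)
    }
  }
}

function previouslyFailed(test) {
  return failing.has(`${test.file} › ${test.title}`)
}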

What do you think guys?

@jfmengels
Contributor

jfmengels commented Sep 7, 2016

Hi @ngryman!

Thanks for the proposal!

I'm not sure that this is a good idea though, as you would lose the information/feedback about whether your other tests (in the same file, or in the files affected by the modified file) still pass.

If you only re-run a previous failing test, then you might break 10 other tests without knowing about it.

@ngryman
Author

ngryman commented Sep 7, 2016

Hey @jfmengels,

In the algorithm I proposed, all failing tests are re-run, so if you break things you'll know. Let's call this set of tests a scope. If you break tests outside of the scope, you'll find out once you fix all the tests in the current scope: they end up being run, as the algorithm runs all tests when nothing in the current scope is failing.

Not all tests have a direct relationship to each other. Directly related tests are often located in the same file and mostly fail fast if you break something. So chances are that the first batch of failing tests will be the only one.

I know this proposal seems weird to read, and my english does not help, but when you think about it, it's exactly what you do manually. You rarely want your whole test suite to run on every file save. You isolate the tests you want to focus on, and when they pass, you check that the whole test suite passes and that you didn't break anything at a larger scale.
That's some kind of local CI, I don't know the term :bowtie:

Another way to describe my proposal would be to implement an evolving test scope: test the current file first, test failing tests first.

Let's illustrate with a typical scenario:

Modification   Test 1   Test 2   Test 3   Test 4   New test
Add test       √        √        √        √        ✗
Fix 1          ∅        ∅        ∅        ∅        ✗
Fix 2          √        ✗        ✗        √        √
Fix 3          ∅        ✗        √        ∅        ∅
Fix 4          √        √        √        √        √

√: Pass
✗: Fail
∅: Skip

@jfmengels
Contributor

and my english does not help

You should not worry about that, it's good enough ;)

Okay, I think I may have read the proposal a bit too fast and skipped over the fact that tests are (almost) all re-run when the tests pass.

You still won't get the feedback that you broke other tests as long as you don't fix all the previously failing tests. If you start with a lot of failing tests, or a very tricky one, you might break a lot of things before you notice, and end up not knowing which changes broke the other tests.

I'm not sure that's a very nice DX. I think that after a few minutes I'd get stressed out and re-run AVA just to make sure nothing else broke, but that might just be me.

There's also an issue with the algorithm. You only re-run the tests that failed, even if all previously failed tests pass this time. That means that you'd have to re-run AVA a second time to know which previously passing tests will now break. So in essence, this would be the workflow you'd get (Nice way of presenting it btw):

Modification                       Test 1  Test 2  Test 3  Test 4  Test 5  Comments
Add test                           √       √       √       √       ✗
While test 5 fails                 ∅       ∅       ∅       ∅       ✗
When test 5 gets fixed             ∅       ∅       ∅       ∅       √       Other tests are not re-run, because we don't know, when running the tests, that test 5 will be fixed
First save after test 5 is fixed   √       ✗       ✗       √       √
While test 2 & 3 fail              ∅       ✗       ✗       ∅       ∅
When test 3 gets fixed             ∅       ✗       √       ∅       ∅
While test 2 fails                 ∅       ✗       ∅       ∅       ∅
When test 2 gets fixed             ∅       √       ∅       ∅       ∅
Subsequent runs                    √       √       √       √       √

√: Pass
✗: Fail
∅: Skip

That will be the scenario, unless AVA re-runs tests based on the results of the previous run, but then this becomes a pretty big change in how AVA works. If AVA re-ran everything as soon as the failing tests are fixed, then you'd get the scenario you described.

(Just FYI, I mostly write tests that don't deal with I/O or HTTP, so my tests all run very fast and I don't end up using .only that much; really only when I console.log a lot of things and it gets hard to find the output related to my failing test.)

@ngryman
Author

ngryman commented Sep 7, 2016

Thanks for your time and feedback man.

There's also an issue with the algorithm. You only re-run the tests that failed, even if all previously failed tests pass this time.

First, I should have pointed out that while writing my second comment I slightly changed the algorithm: if all of the tests in your scope pass, then all the other tests are also run in the same batch. That avoids having to manually get AVA to re-run all tests.
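A sketch of that revised loop, reusing the same placeholder helpers as before (and assuming the placeholder run resolves with per-test results):

watcher.on('change', async file => {
  let scope = allTests.filter(test => test.failing)

  if (scope.length === 0) {
    scope = allTests
  }

  if (isTestFile(file)) {
    scope = scope.filter(test => test.file === file)
  }

  const results = await run(scope)

  // The whole scope just went green: widen to the remaining tests in the
  // same batch instead of waiting for the next save.
  if (scope !== allTests && results.every(result => result.passed)) {
    await run(allTests.filter(test => !scope.includes(test)))
  }
})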

So your table becomes again:

Modification                                       Test 1  Test 2  Test 3  Test 4  New test
Add test, all run, test 5 fails                    √       √       √       √       ✗
While test 5 fails, run test 5                     ∅       ∅       ∅       ∅       ✗
When test 5 gets fixed, run all, test 2 & 3 fail   √       ✗       ✗       √       √
When test 3 gets fixed, run 2 & 3, test 2 fails    ∅       ✗       √       ∅       ∅
When test 2 gets fixed, all run, all pass          √       √       √       √       √

You still won't get the feedback that you broke other tests as long as you don't fix all the previous failing tests.

Yes, and for me that's the whole point 😃! That's already the case with the current watch implementation anyway; I'm just reducing the scope.


If I may, let me explain a bit more why I think this feature would be great, instead of going into the implementation itself.

I think we should separate test-driven development from integration testing. When you add a new feature or fix a bug on your machine, you use watch mode and you are in TDD mode. You write your test and focus only on making it pass. You make a lot of code changes, you want debugging, traces, ... That's the first step. Those are Uncle Bob's three laws of TDD.

Then, when you're satisfied with your implementation and your unit test passes, comes the integration testing phase, where you sanity check that your modifications to the codebase did not introduce regressions. Most of the time it should be OK.
Back in the old days, or for huge projects, the integration testing phase was often deferred to a pre-commit hook, or even only run remotely on a CI server, because of compilation time or test suite time. That's the second step.

The mode/option I'm ✊ for is for TDD. It's made precisely to only test what is directly related to your feature/bug and to purposely ignore the rest while that particular test doesn't pass.

@novemberborn
Member

Only skimmed the discussion so far, so my apologies if my summary is wrong.

Sounds like the suggestion is to prioritize rerunning the known failing tests, as well as any new test files. Track other tests that ordinarily would have rerun (due to source changes). Run those once there are no more failing tests.

Right now that would require a new test run, though ideally we'd add to an existing test run. So there are some architectural limitations there.
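To illustrate the bookkeeping (my own sketch; none of these structures exist in AVA's watcher), the known-failing test files could run immediately while source-triggered reruns are parked in a deferred set that gets flushed once nothing is failing any more:

// Hypothetical scheduling state.
const failingFiles = new Set() // test files with known failing tests
const deferred = new Set()     // files that would ordinarily rerun due to source changes

function schedule(changedTestFiles, affectedBySources) {
  // Park source-triggered reruns until the failing set is green.
  for (const file of affectedBySources) {
    deferred.add(file)
  }

  // Known-failing files and changed test files get priority.
  const runNow = new Set([...failingFiles, ...changedTestFiles])

  // Nothing known to be failing any more: flush the parked reruns too.
  if (failingFiles.size === 0) {
    for (const file of deferred) {
      runNow.add(file)
    }
    deferred.clear()
  }

  return [...runNow]
}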

@ngryman
Author

ngryman commented Sep 24, 2016

@novemberborn Yes, my proposal was to only re-run already-failing tests until they pass. If there are technical limitations around this, I can understand.

I had 3 requirements to satisfy:

  1. Be able to focus on the current test or set of tests without extra noise.
  2. Optimize the time to first failing test (TTFFT).
  3. Be able to debug easily without having to play with the only modifier manually.

I guess 3. is optional and quite complex to achieve in the sense that it would require some sort of intelligence. That's probably totally out of scope of a test runner.

But I think 1. and 2. may be achieved with minimal modifications. We could prioritize failing tests so they run first, then the others. That would drastically reduce the TTFFT (2.) and, used in fail-fast mode, achieve 1.

A bonus would be to prioritize failing tests by number of failures: if a test has been failing for 10 runs, it's less important than a test that just broke.
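As a purely illustrative sketch (none of this reflects AVA's internal data structures), that ordering could be driven by a per-test counter of consecutive failures:

// Hypothetical per-test stats: 0 means the test passed on the last run.
const consecutiveFailures = new Map() // key: `${file} › ${title}` → count

function priority(test) {
  const failures = consecutiveFailures.get(`${test.file} › ${test.title}`) || 0
  // Passing tests go last.
  if (failures === 0) {
    return Number.MAX_SAFE_INTEGER
  }
  // Among failing tests, the freshly broken ones (fewest consecutive
  // failures) run before the long-failing ones.
  return failures
}

function orderTests(tests) {
  return [...tests].sort((a, b) => priority(a) - priority(b))
}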

@novemberborn
Member

@ngryman I like it!

The watcher is already quite complex; to be able to do what you're suggesting I think we need to do a fair bit of refactoring.

My thinking is that we should be able to start a test run and add more files to it before it finishes. That's a problem because currently (when not using the --concurrent option) no tests are run before all workers have spun up, but that's something we need to move away from regardless.

That way the runner can prioritize files with failing tests, while maintaining a second set of non-failing test files that should be rerun.

If we could reliably identify individual tests (e.g. if they have a unique title) we could prioritize those over other tests in the same file.
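For instance (again just a sketch with made-up structures), with unique titles as stable identifiers, a file's tests could be split into a priority group and the rest:

// Hypothetical: run a file's previously failing titles first.
function orderFileTests(tests, failingTitles) {
  const priority = tests.filter(test => failingTitles.has(test.title))
  const rest = tests.filter(test => !failingTitles.has(test.title))
  return [...priority, ...rest]
}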

@sindresorhus
Member

(e.g. if they have a unique title)

This should not be a problem as we now enforce tests to have unique titles.

@novemberborn
Member

When writing integration tests, it can be tempting to run them serially, with an assumed order. This means you can't just run one of the tests. The proposed behavior would break watch mode for those users, but I think that's fine.

@novemberborn
Member

AVA now runs failing tests first, and we do dependency analysis. I think that's good enough.

novemberborn closed this as not planned on Jul 3, 2023