
Add support for batch compiling all test entry points #2138

Open
mraleph opened this issue Nov 14, 2023 · 13 comments

@mraleph
Member

mraleph commented Nov 14, 2023

Currently when doing dart test or flutter test we seem to load *_test.dart files one by one.

Would it be possible to change the implementation of test loading to support batch loading, so that all entry points are loaded together?

The idea here is to reduce compilation overhead: i.e. compile all *_test.dart files together into Kernel using a single CFE process and then use this Kernel blob to run individual test entry points.

/cc @jakemac53 @natebosch
/cc @jensjoha @derekxu16

@natebosch
Member

We currently handle test suites independently as soon as we read them. Batching up all tests for a platform before taking any action on an individual suite will require some restructuring of the runner. I don't have any estimate for the work involved.

We do use a FrontendServerClient and I think in principle we can reuse state between compiles, but we currently reset the client between compiles. Does that cause each suite to do the full work of the compile?

@jakemac53 do you recall if we looked at keeping state between suite compiles?

@jakemac53
Contributor

Afaik we do provide the previous compilation result when we compile each test, which isn't perfect but generally works pretty well assuming most tests share roughly the same transitive deps.

Grouping the tests into a single compile also means that no tests can run if a single test fails to compile, which could be very annoying during development of new features (it may be intentional that some of your tests can't be run yet).

@natebosch
Member

Grouping the tests into a single compile also means that no tests can run if a single test fails to compile, which could be very annoying during development of new features (it may be intentional that some of your tests can't be run yet).

This might be most useful as a flag you enable for CI, or maybe for some interactive cases where you expect everything to compile. When the code is being edited, it's more likely that running a single test suite will be more useful than running all test suites with a single compile.

@mraleph
Member Author

mraleph commented Nov 15, 2023

Grouping the tests into a single compile also means that no tests can run if a single test fails to compile, which could be very annoying during development of new features (it may be intentional that some of your tests can't be run yet).

This did come up in our discussions with @jensjoha, and I don't think compile-time errors preclude batch compiles, because you can just prune the libraries which contain compilation errors from the dill and get a runnable dill. This way you get to have your cake and eat it too: you report all compilation errors and you run all tests without compilation errors.
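Conceptually, the pruning step could look something like the sketch below, which uses package:kernel (an SDK-internal package; the function and field names here are assumptions based on its public surface, not an existing CFE feature):

```dart
// Hedged sketch, not an actual CFE/frontend-server implementation:
// load a dill, drop every library reported as having compile-time
// errors, and keep the rest of the component runnable.
import 'package:kernel/kernel.dart';

Component pruneFailedLibraries(String dillPath, Set<Uri> failedLibraries) {
  final component = loadComponentFromBinary(dillPath);
  // Remove libraries whose importUri was reported with compilation errors.
  component.libraries
      .removeWhere((lib) => failedLibraries.contains(lib.importUri));
  return component;
}
```

In practice this logic would live behind the frontend server, which is also the component that knows exactly which libraries failed.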

@jakemac53
Contributor

We don't actually read the kernel file at all and there is no public API for doing so, so I am not sure how we could do that. Possibly the frontend server could have a special entry point and handle this logic (as well as reporting exactly which files failed)?

Either way though, I am not currently convinced that the strategy we have right now is insufficient. When I run VM tests locally in real packages I never see any loading messages after the very first one; test compilation is not getting in my way or dominating the overall time to run tests in the real-world packages that we maintain.

I could see this dominating more if a package has a single tiny unit test in each file, but I don't know that we need to optimize for that scenario (and it should still perform decently anyway).

@mraleph
Member Author

mraleph commented Nov 15, 2023

I just did some measurements for @dnfield, so I am going to include them here. If we take the test/rendering folder of the Flutter package and run all tests from a clean state (flutter clean) on an M1:

$ time flutter test -v test/rendering 2>&1 | tee /tmp/testing.log
... 

________________________________________________________
Executed in   20.94 secs    fish           external
   usr time   46.18 secs    0.10 millis   46.18 secs
   sys time    9.84 secs    2.05 millis    9.83 secs
$  cat /tmp/testing.log | grep "Compiling.*took.*" | awk '{t+=substr($6, 1, length($6)-2)} END{print NR, t}'
71 9956

So there are 71 test files and ~10 seconds spent in compilation, out of ~21 seconds of total runtime.

Now if I manually collapse all tests together by renaming the _test.dart files to _t.dart files and creating a single mega_test.dart which imports all of the *_t.dart files and invokes each of their main functions, I get the following:

$ flutter test -v test/rendering 2>&1 | tee /tmp/testing2.log
$ cat /tmp/testing2.log | grep "Compiling.*took.*" | awk '{t+=substr($6, 1, length($6)-2)} END{print NR, t}'
1 3736

So this shaves ~6.2 seconds, which is approximately 1/3 of the total running time.
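The manual collapse described above amounts to a hand-written aggregate entry point. A minimal sketch (the imported file names are placeholders for the renamed *_t.dart files, not actual files from test/rendering):

```dart
// mega_test.dart: hand-rolled aggregate entry point. Each *_t.dart is a
// renamed *_test.dart; running every suite's main in one process means
// the CFE compiles the whole transitive closure exactly once.
import 'box_t.dart' as box_t;
import 'layer_t.dart' as layer_t;
import 'viewport_t.dart' as viewport_t;

void main() {
  box_t.main();
  layer_t.main();
  viewport_t.main();
}
```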

This is on an M1 - on GH Actions this translates into something more like 30s of savings out of ~100s total testing time, because I have observed GH to be 5-6x slower than my M1.

That being said: savings on compilation are just part of the story here. Having a mega dill allows using an isolate group for running tests (spawn the test process with the mega dill loaded as the "root" isolate, then run individual test entry points by spawning isolates from the root isolate). This allows sharing all kinds of VM metadata and JITed code, which would lead to faster tests.
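The isolate-group idea could be sketched roughly as follows. This is an assumption-laden illustration, not package:test's implementation: runInGroup is a hypothetical helper, and the entry-point list stands in for the test libraries of the batch-compiled dill. Since Dart 2.15, Isolate.spawn places the child isolate in the caller's isolate group, so compiled code and program structure are shared.

```dart
// Hedged sketch: run each test entry point in a child isolate spawned
// from the root isolate, sharing the isolate group's JITed code.
import 'dart:isolate';

Future<void> runInGroup(void Function(Object?) testMain) async {
  final exitPort = ReceivePort();
  await Isolate.spawn(testMain, null, onExit: exitPort.sendPort);
  await exitPort.first; // an onExit message arrives when the isolate ends
  exitPort.close();
}

Future<void> main() async {
  // Hypothetical entry points; in practice these would be the main
  // functions of the individual test libraries in the mega dill.
  for (final testMain in <void Function(Object?)>[
    (_) => print('running suite 1'),
    (_) => print('running suite 2'),
  ]) {
    await runInGroup(testMain);
  }
}
```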

We don't actually read the kernel file at all and there is no public API for doing so, so I am not sure how we could do that. Possibly the frontend server could have a special entry point and handle this logic (as well as reporting exactly which files failed)?

Yep, I would think this is going to be an extension to CFE/frontend server logic.

@jakemac53
Contributor

I can't speak to flutter tests, it is a different platform that likely has very different behavior, and also different constraints.

That platform is developed in the flutter repo so we might want to move this issue there?

@mraleph
Member Author

mraleph commented Nov 15, 2023

If you think that no changes will be necessary in package:test or package:test_core to fix it on Flutter side then I guess we should just close this one.

I can't easily reproduce any bad behaviour by doing dart test with synthetic inputs (e.g. a bunch of tests which all import the analyzer package). So I guess it is only flutter test that is somehow suffering from these overheads.

@jakemac53
Contributor

If you think that no changes will be necessary in package:test or package:test_core to fix it on Flutter side then I guess we should just close this one.

Well, now that you mention it, we probably would need to do something in order to enable batch compilation at all, because of the way platforms work today: they are only told to "load" (which currently involves compiling) individual tests.

So maybe it would need to be a special kind of platform which instead is given all the tests at once to "load".

But I would start by trying to improve upon the existing model (which has other benefits) first; since it does work well for VM tests, it should also be able to work well for flutter tests?

@natebosch
Member

Well, now that you mention it we probably would need to do something in order to enable batch compilation at all, because today the way platforms work they are only told to "load" (which today involves compiling) individual tests.

Yes, this is what I was referring to in the first paragraph of #2138 (comment)

Adding support for batch compilation could involve some significant changes to the test runner.

@jakemac53
Contributor

jakemac53 commented Nov 15, 2023

I brought this up in another context, but will put it here just to surface it again: one reason we don't compile all tests together is that they don't necessarily support the same platforms (namely, dart: libraries).

So it might be intentional that some tests don't, for instance, compile for the VM, because they are specifically targeting the web (and have dart:html etc. imports).

In general I think flutter handles this a lot differently than the normal test runner, so I am not sure if this is relevant or not to flutter, but it does matter for general Dart tests.

@natebosch
Member

natebosch commented Nov 15, 2023

one reason we don't compile all tests together is that they don't necessarily support the same platforms (namely, dart: libraries).

Yes, and this will mainly limit where and how we can add batching. We filter which suites get passed to a given platform, so it should be feasible to have a filtered batch to compile.

Edit: Also, as long as the compilation errors are only surfaced when the platform would want to run the test, the filtering shouldn't even need to happen before compile (outside of performance concerns).

@dnfield
Contributor

dnfield commented Nov 15, 2023

I wrote a custom plugin that overrides load so that it uses something already compiled. It did not require any changes to package:test.
