Deterministic timing for DeferrableTestCase #146
Sublime's worker thread is a task queue. If you put tasks on it, they will run *in order*. We can use this to make the `yield` timing deterministic.

If we just `yield` (without an argument), we enqueue a task to be run ASAP. That means it will wait for the tasks already in the queue and then yield back to the test code.

E.g. from the test we invoke the unit under test, which in turn puts tasks (A) and (B) on the worker. We then `yield`, which puts itself as (C) on the queue and waits for it. Sublime will now run (A), (B), and finally (C), in that order. The test runner then yields control back to the test.

If (A) in turn enqueued a task (D) on the queue, it will not have run yet, and in fact will not run at all if we do not `yield` again. See the test included here.

Using this, I could bring down the execution time of the test in timbrel/GitSavvy#1056 from ~7.7s to 2.7s. I could replace all `yield <condition>` expressions with single or double `yield`s, making the test more reasonable. Because I know that a command e.g. 'hops' two times, I know exactly that I have to `yield` two times.

This comes with the big caveat that coverage doesn't work on the worker thread. This is already true right now, see timbrel/GitSavvy#1056 (comment), and it is a big annoyance for writing functional tests. E.g. as soon as we issue a command which itself is implemented to run 'async' (i.e. on the worker thread) and then just `yield <condition>` to wait for it, the code under test will not be marked as 'covered'.
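The ordering argument above can be illustrated with a toy model. This is only a sketch in plain Python: `FifoQueue` and `run_test` are hypothetical stand-ins for Sublime's worker queue and the UnitTesting runner, not their real implementations. Unlike the real runner, the toy queue simply drains itself at the end.

```python
from collections import deque


class FifoQueue:
    """Toy stand-in for a worker queue: tasks run strictly in FIFO order."""

    def __init__(self):
        self._tasks = deque()

    def put(self, task):
        self._tasks.append(task)

    def run_until_empty(self):
        while self._tasks:
            self._tasks.popleft()()


def run_test(queue, test_gen):
    """Drive a generator-based test; each bare `yield` re-enqueues the test."""

    def step():
        try:
            next(test_gen)
        except StopIteration:
            return
        queue.put(step)  # the continuation goes to the END of the queue

    queue.put(step)
    queue.run_until_empty()


log = []
queue = FifoQueue()


def task_a():
    log.append("A")
    queue.put(lambda: log.append("D"))  # (A) 'hops' once more


def unit_under_test():
    queue.put(task_a)
    queue.put(lambda: log.append("B"))


def test():
    unit_under_test()
    yield  # continuation is (C): it runs after (A) and (B)
    log.append("after first yield")
    yield  # a second yield is needed before (D) has run
    log.append("after second yield")


run_test(queue, test())
print(log)
# ['A', 'B', 'after first yield', 'D', 'after second yield']
```

After the first `yield`, (A) and (B) have run but (D) has not; only the second `yield` places the test behind (D) in the queue.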
|
Does this resolve the coverage issue in GitSavvy? |
|
Um, didn’t read your post carefully at the first glance. Coverage indeed doesn’t work with the sublime worker thread, I don’t have any solution except using |
In the first series of commits we tried to run all `yield` blocks, and by that a lot of the code under test, on the worker thread. This gives us more predictable timing because a developer can count how many times their code 'hops' and then yield that many times.

Generally, the FIFO principle is also true for the main thread. However, choosing the main thread is a bad option. (a) Usually the code under test defers some work to the worker thread, and we really want to await that work. (b) The main thread runs faster. We chose the slower thread because when the worker yields a task, the main thread is usually done.

There is one major caveat when we run on the worker thread: coverage doesn't work anymore.

This commit introduces a switch which will *globally*, for the duration of the tests, swap `set_timeout_async` with `set_timeout`. By doing that (a) everything runs super fast, (b) every scheduled task runs on one thread, the main thread, so the FIFO principle holds and ensures deterministic timing, (c) all code under test gets coverage.

This comes with one known caveat: Sublime puts '_async' events on the worker thread on its own, i.e. it doesn't use the patched API. For example, you cannot await such an event by just `yield`ing. You have to use `yield <delay>` or `yield <condition>` as before.
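A rough sketch of what such a swap could look like, using `unittest.mock.patch.object`. The `fake_sublime` namespace is a stand-in invented for this example, since the real `sublime` module only exists inside Sublime Text; the actual switch in the commit may be implemented differently.

```python
from types import SimpleNamespace
from unittest import mock

main_queue, worker_queue = [], []

# Stand-in for the real `sublime` module (assumption for this sketch):
# each scheduler just records the callback on "its" thread's queue.
fake_sublime = SimpleNamespace(
    set_timeout=lambda fn, delay=0: main_queue.append(fn),
    set_timeout_async=lambda fn, delay=0: worker_queue.append(fn),
)


def code_under_test():
    # The code under test asks for the worker thread.
    fake_sublime.set_timeout_async(lambda: None)


# While the patch is active, 'async' work lands on the main queue.
with mock.patch.object(fake_sublime, "set_timeout_async", fake_sublime.set_timeout):
    code_under_test()
inside = (len(main_queue), len(worker_queue))

# After the patch is undone, the worker queue is used again.
code_under_test()
outside = (len(main_queue), len(worker_queue))

print(inside, outside)  # (1, 0) (1, 1)
```

Because the patch is scoped to the `with` block, the original function is restored when the tests finish, even if one of them raises.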
|
I pushed this PR. There is a long explainer in 63b08cc Goals:
I compare this PR and current master. The units under test are (a) basically GitSavvy/#1056 and (b) the SublimeLinter test suite. Summary: this PR runs against the GitSavvy and the SublimeLinter test suite EDIT wrong numbers for (B) and (C); (B) runs in 4 not 11; (C) runs in 11 not 16 |
|
I want to make another comment. Between (E) and (G). The tests in the GitSavvy PR (E) mostly use With just
With |
|
Wow, thank you for working on this matter. I currently don't have enough time to review this. Does @Thom1729 want to help? |
|
I can probably take a look this weekend, or possibly tomorrow. I am concerned about patching the |
|
If the goal is to choose

    def defer(self, delay, callback, *args, **kwargs):
        if self.controlled_timing:
            schedule = sublime.set_timeout_async
        else:
            schedule = sublime.set_timeout
        schedule(partial(callback, *args, **kwargs), delay)
|
No, that's not really the goal. There are three goals, see above #146 (comment) To get all three is tricky. Personally, from a tester's standpoint I would have chosen,
E.g. you place a cursor somewhere, you know this will trigger Now, okay, the tests run fast, but you will have such For GitSavvy, I think I saw such outdated events, and in some sense they make the code better because after the test I close the view, and so the events run with a detached view (e.g. Still, if we want turn-based tasks, there is no concern to mock/stub out the main en-queuing function (here |
|
So is the purpose of patching Notwithstanding the above, patching the I recognize the coverage issue. I would suggest that breaking the code into smaller pieces might make it easier to unit-test those pieces without worrying about async issues. Alternatively, you could use dependency injection — pass a |
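As a minimal sketch of the dependency-injection idea mentioned above: the unit receives its scheduling function as an argument, so a test can pass a synchronous one. `StatusUpdater` and `run_immediately` are hypothetical names for illustration; in production code one would presumably pass `sublime.set_timeout_async`, which takes a callback and a delay.

```python
from typing import Callable, List

# A scheduler takes a zero-argument callback and a delay in milliseconds.
Scheduler = Callable[[Callable[[], None], int], None]


class StatusUpdater:
    """Hypothetical unit that defers its work through an injected scheduler."""

    def __init__(self, schedule: Scheduler) -> None:
        self._schedule = schedule
        self.messages: List[str] = []

    def refresh(self) -> None:
        # The unit no longer decides which thread the work runs on.
        self._schedule(lambda: self.messages.append("refreshed"), 0)


def run_immediately(callback: Callable[[], None], delay: int) -> None:
    """Test double: ignore the delay and run on the calling thread."""
    callback()


updater = StatusUpdater(run_immediately)
updater.refresh()
print(updater.messages)  # ['refreshed']
```

With the synchronous double the deferred work runs on the test's own thread, so it is both deterministic and visible to coverage, at the cost of threading the scheduler through the code under test.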
|
We should explore the design space we have here. Your 'concerns' are very broad and general opinions. We have at least two problems/issues to solve: a) The tests are too slow. (E.g. the GitSavvy tests: 9 tests in 7 seconds doesn't scale, and you basically can't practice TDD in a meaningful way.) b) The coverage is completely off. (The workaround would be to write tests for the coverage, not around the features you actually want to ensure and protect.) As a plus, there is a third issue. This is a reasonable issue because we already have a DeferrableTestCase with c) Plain For this PR:
|
|
As you say, it's tricky to try to solve all of those problems at once. So let's look at them one by one:
Why are the GitSavvy tests slow? Would a less drastic solution make them faster? (For instance, using set_timeout in the runner instead of set_timeout_async, but not patching
This is a general problem with However, you could achieve the same result by having an explicit switch inside GitSavvy that chooses which function to use. This would be much safer than patching the shared
This sounds less like a problem and more like a place that a feature could be added. I'm not sure that this is really related to the other questions. I don't mean to be dismissive, but I don't think that we can "create some confidence over time" for monkeypatching the |
|
What do you actually think 'patching sublime' does here? It is during a test. You also mock getters like The SublimeLinter tests are just as slow, and these are more-or-less narrow unit tests for the backend. They don't involve drawing at least. How many programs do you have at hand that actually observe on what thread or process they run? If a function offloads some tasks A and B, the function usually falls off and terminates after that and actually doesn't care if A and B run now or later or at all. And you basically use that in unit tests all the time. The functional test delegates/'offloads' to tasks, but a different unittest can simply invoke such a task on the main thread. E.g. usually you can just call In a unit test, you usually forbid and catch (aka patch) Important note: the runner currently uses |
Mocking is one thing; patching the real objects is something else.
I'm not familiar with the SublimeLinter tests. I can say that the 171 sublime_lib tests just ran in 14.237 seconds. This includes a lot of tests that open windows, write files, and perform other relatively slow operations. Looking at the diff, I'm not sure that the performance gains you saw were actually due to patching |
|
Yes, of course, the performance gains come from changing the timings in the Changing the unit tests in GitSavvy or SublimeLinter has only some limited effect. I used in the example above SublimeLinter's For comparison since you mentioned sublime_lib. sublime_lib maybe has a handful |
|
Right now, Running all of the Based on these results, it seems like a very simple and unintrusive change will produce the performance gains you're looking for. I don't see why this change should break any tests unless those tests relied on race conditions. Can you reproduce these results? |
|
To be honest, I didn't read the whole thread but I am excited to see contributions to UnitTesting. @kaste feel free to merge any PRs to UnitTesting at your will. I don't have enough time to take care of all the projects. |
|
@randy3k Thanks for the kind words, but you actually didn't give me push rights and I don't want to push something without some consensus. I already rolled back the more controversial changes. For now, I only want to make sure a user can make such drastic changes as patching So the current state would be:
@Thom1729 Optimizing the timing produces flaky and then failing tests in GitSavvy. Now, that's because GitSavvy's code is shitty, but most Sublime plugins are shitty. And these would be super annoying failures bc maybe a project gets a PR and suddenly some tests fail, and we can't even produce good fail messages for it. So the first draft of this PR was not failing for the usual test but of course as a drastic change would have been opt-in anyway. Okay, since this is breaking, I thought I add another option |
|
What is the purpose of To clarify, do the GitSavvy tests fail if you I think that we should start with the smallest possible change rather than trying to do it all in one PR. In this case, I think that the smallest possible change would be changing the 10ms delay to zero. We can make this an option if necessary. (If it causes tests to fail, that probably means that the behavior was not correct and that the tests should fail; however, this could also be interpreted as a breaking change that should wait for a major release.) I'm not sure that changing the condition timeout is necessary, or changing it quite that much. This part, at least, is already user-customizable. I can push changes when they're ready, if @randy3k approves in principle. |
|
@randy3k I didn't receive any invitation to join that community. I only care about unittesting anyway, I think :-) @Thom1729 The title of the PR is 'Deterministic timing'; just changing numbers and making it faster is not my goal, it's a side effect of it. I want some meaningful timing here. I initially wanted single-threaded execution of the unit tests; now this PR is about allowing single-threaded execution, and I have a test in this PR that ensures some properties when a user opts in to single-threaded testing. Changing the timing is a breaking change. It breaks more the faster you get. (Unless you constrain yourself to a single thread. That's what I did but you're against it.) You start flaky, and then if you lower the timing further you get consistent failures again. Why is that? You want async test execution. If you do that you get synchronization problems, and the system under test only gets eventually consistent. That's why we Now in fact it seems For GitSavvy, we don't have 'pure' tests but feature tests which actually do require waiting time, because Sublime must draw a view or panel etc. We start here with 7s, and already know that we can do the same in 2.4s because I proved that above. (But the 2.4s have constraints.) If the math is roughly correct unittesting master has an implicit Is changing a timing from 20 to 10 [ms] the same as changing it from 10 to 0? Usually not, 0 should usually put the task as immediate in the queue. I.e. if you put it with 0, you actually add it to the end of the current turn, and the system doesn't go idle (or yield back to the scheduler) for a short period. (Which very likely means async events do not get handled.) Generally, if you have less waiting time it is more likely that events or tasks waiting in the worker queue leak into the next test. |