
Conversation

@kaste
Contributor

@kaste kaste commented Dec 10, 2018

Closes #146
Closes #147
Closes #148
Closes #149

Because #146 and parts of #147 are ultimately breaking changes, we need to put them behind a feature flag. I used fast_timings here; the default is, for obvious reasons, 'False'. I recommend deprecating the old DeferringTextTestRunner because it has no advantage over the new one.

I back-ported as many features as possible to the old one though.

All users get:

Users opting 'fast_timings' get:

  • Super-fast tests 😁
  • A yield condition will not silently continue on timeout but err instead
  • For functional tests, allow running single-threaded. This makes coverage happy and allows ticking the time using a plain yield. I highly recommend this if the app/plugin you're building allows it.

Notable fix

  • add/doCleanups works

Why is the system now so fast?

We do not add implicit waiting time anymore.

What to do if the tests fail after opting into 'fast_timings'?

Essentially, add explicit waiting time using one of the yield forms. E.g., for the failing GitSavvy tests (discussed in #146) it is enough to yield AWAIT_WORKER in tearDown. This is reasonable, and it doesn't make the tests as slow as before; the runtime of the GitSavvy tests is still cut in half. (Note that AWAIT_WORKER is introduced by this PR; before, you usually resorted to number guessing like yield 300 and the like.)
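To make the yield protocol concrete, here is a minimal, hypothetical sketch (plain Python, outside Sublime) of how a runner could drive a generator-based test and interpret yield AWAIT_WORKER as "drain the worker queue before continuing". The names toy_runner and example_test are invented for illustration and are not UnitTesting's actual implementation:

```python
# Hypothetical stand-alone sketch: not UnitTesting's actual runner.
AWAIT_WORKER = 'AWAIT_WORKER'


def example_test(log):
    log.append('arrange')
    yield AWAIT_WORKER  # explicit waiting time: let the worker catch up
    log.append('assert')


def toy_runner(test, worker_queue, log):
    # Drive the generator; on AWAIT_WORKER, run all queued worker tasks
    # before resuming the test.
    gen = test(log)
    try:
        while True:
            instruction = next(gen)
            if instruction == AWAIT_WORKER:
                while worker_queue:
                    worker_queue.pop(0)()
    except StopIteration:
        pass


log = []
worker_queue = [lambda: log.append('worker task done')]
toy_runner(example_test, worker_queue, log)
# log == ['arrange', 'worker task done', 'assert']
```

The point of the sketch: the worker task lands between "arrange" and "assert" only because the test explicitly yielded for it.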

kaste added 13 commits December 7, 2018 00:25
If a user switches from test file to implementation file, it
is useful that 'Test Current File' remembers and executes the
actual last test file run.

Changes:

- We actually match the filename against `settings['pattern']`.
  Not all files with a '.py' suffix are test files.

- We store the last used test file in the `window.settings()`
  which is cheap to implement and probably the right thing.
# Conflicts:
#	unittesting/core/st3/runner.py
@kaste
Contributor Author

kaste commented Dec 10, 2018

We could add this to the DeferrableTestCase as a convenience fixture:

from functools import partial

import sublime


def run_single_threaded(self):
    # Route 'async' callbacks through the main thread for this test's duration
    original = sublime.set_timeout_async
    sublime.set_timeout_async = sublime.set_timeout
    self.addCleanup(partial(setattr, sublime, 'set_timeout_async', original))
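The fixture relies on a generic swap-and-restore pattern: replace an attribute, then register a partial(setattr, ...) cleanup to put it back. A runnable sketch against a stand-in host namespace (invented here, since the sublime module is unavailable outside the editor):

```python
from functools import partial
from types import SimpleNamespace
import unittest

# Stand-in for the sublime module (assumption: the real fixture patches sublime).
host = SimpleNamespace(
    set_timeout=lambda fn: fn(),
    set_timeout_async=lambda fn: fn(),  # pretend this hops to the worker
)
original_async = host.set_timeout_async


class ExampleTest(unittest.TestCase):
    def run_single_threaded(self):
        original = host.set_timeout_async
        host.set_timeout_async = host.set_timeout
        self.addCleanup(partial(setattr, host, 'set_timeout_async', original))

    def test_runs_on_one_thread(self):
        self.run_single_threaded()
        self.assertIs(host.set_timeout_async, host.set_timeout)


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest).run(result)
assert result.wasSuccessful()
assert host.set_timeout_async is original_async  # cleanup restored the patch
```

addCleanup guarantees the restore runs even if the test body fails, which is exactly why it is used instead of tearDown here.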

@randy3k
Member

randy3k commented Dec 11, 2018

Thanks. This looks more promising.

@kaste
Contributor Author

kaste commented Dec 11, 2018

We now have three options: 'deferred', 'async', and 'fast_timings'. I think we can deprecate all of them. 'deferred=False' is slower than the current 'deferred=True' and has really limited functionality. (Since these are global switches, it is unlikely that a Sublime plugin never wants to wait for Sublime doing something.)

'async=True' is not the way forward either. I can imagine running the framework on a separate thread, but again the feature set of the deferred TestCase is superior. And running the code under test on a 'third' thread which then delegates to Sublime's threads ... I don't think that makes a lot of sense. As a tester you usually don't want more async, you want less.

run_on_worker = sublime.set_timeout_async


class FastDeferringTextTestRunner(TextTestRunner):
Member

How does it differ from DeferringTextTestRunner other than DEFAULT_CONDITION_POLL_TIME?
They seem almost identical.

@kaste
Contributor Author

kaste commented Dec 12, 2018

You need to diff them, really:

diff --git a/unittesting/core/st3/runner.py b/unittesting/core/st3/fast_runner.py
index dff1fbf..3785aa8 100644
--- a/unittesting/core/st3/runner.py
+++ b/unittesting/core/st3/fast_runner.py
@@ -10,13 +10,15 @@ def defer(delay, callback, *args, **kwargs):
     sublime.set_timeout(partial(callback, *args, **kwargs), delay)


+DEFAULT_CONDITION_POLL_TIME = 17
+DEFAULT_CONDITION_TIMEOUT = 4000
 AWAIT_WORKER = 'AWAIT_WORKER'
 # Extract `set_timeout_async`, t.i. *avoid* late binding, in case a user
 # patches it
 run_on_worker = sublime.set_timeout_async


-class DeferringTextTestRunner(TextTestRunner):
+class FastDeferringTextTestRunner(TextTestRunner):
     """This test runner runs tests in deferred slices."""

     def run(self, test):
@@ -48,7 +50,7 @@ class DeferringTextTestRunner(TextTestRunner):
                     startTestRun()
                 try:
                     deferred = test(result)
-                    defer(10, _continue_testing, deferred)
+                    _continue_testing(deferred)

                 except Exception as e:
                     _handle_error(e)
@@ -61,10 +63,10 @@ class DeferringTextTestRunner(TextTestRunner):
                     condition = deferred.send(send_value)

                 if callable(condition):
-                    defer(100, _wait_condition, deferred, condition)
+                    defer(0, _wait_condition, deferred, condition)
                 elif isinstance(condition, dict) and "condition" in condition and \
                         callable(condition["condition"]):
-                    period = condition.get("period", 100)
+                    period = condition.get("period", DEFAULT_CONDITION_POLL_TIME)
                     defer(period, _wait_condition, deferred, **condition)
                 elif isinstance(condition, int):
                     defer(condition, _continue_testing, deferred)
@@ -73,7 +75,7 @@ class DeferringTextTestRunner(TextTestRunner):
                         partial(defer, 0, _continue_testing, deferred)
                     )
                 else:
-                    defer(10, _continue_testing, deferred)
+                    defer(0, _continue_testing, deferred)

             except StopIteration:
                 _stop_testing()
@@ -82,21 +84,29 @@ class DeferringTextTestRunner(TextTestRunner):
             except Exception as e:
                 _handle_error(e)

-        def _wait_condition(deferred, condition, period=100, timeout=10000, start_time=None):
+        def _wait_condition(
+            deferred, condition,
+            period=DEFAULT_CONDITION_POLL_TIME,
+            timeout=DEFAULT_CONDITION_TIMEOUT,
+            start_time=None
+        ):
             if start_time is None:
                 start_time = time.time()

             try:
                 send_value = condition()
             except Exception as e:
-                defer(10, _continue_testing, deferred, throw_value=e)
+                _continue_testing(deferred, throw_value=e)
                 return

             if send_value:
-                defer(10, _continue_testing, deferred, send_value=send_value)
+                _continue_testing(deferred, send_value=send_value)
             elif (time.time() - start_time) * 1000 >= timeout:
-                self.stream.writeln("Condition timeout, continue anyway.")
-                defer(10, _continue_testing, deferred)
+                error = TimeoutError(
+                    'Condition not fulfilled within {:.2f} seconds'
+                    .format(timeout / 1000)
+                )
+                _continue_testing(deferred, throw_value=error)
             else:
                 defer(period, _wait_condition, deferred, condition, period, timeout, start_time)

@@ -154,4 +164,4 @@ class DeferringTextTestRunner(TextTestRunner):
             else:
                 self.stream.write("\n")

-        sublime.set_timeout(_start_testing, 10)
+        _start_testing()

@kaste
Contributor Author

kaste commented Dec 12, 2018

We have quite a few places where we just continue synchronously. And we often use '0', which has different semantics because it will also continue immediately, but only after other immediate tasks have run. ('0' will probably not yield back to Sublime's task scheduler but just append this task to the end of the current loop.)

kaste added a commit to SublimeLinter/SublimeLinter that referenced this pull request Dec 13, 2018
@randy3k
Member

randy3k commented Dec 13, 2018

You are copying FastDeferringTextTestRunner instead of subclassing it from DeferringTextTestRunner. Is it because you want to deprecate DeferringTextTestRunner eventually?

This whole PR looks good to me in general except that I am really reluctant to have FastDeferringTextTestRunner an almost identical clone of DeferringTextTestRunner.

defer(condition, _continue_testing, deferred)
elif condition == AWAIT_WORKER:
run_on_worker(
partial(defer, 0, _continue_testing, deferred)
Member

Once a TestCase yields AWAIT_WORKER, the whole test case will be executed on the worker thread.
Should we send the coroutine back to the main thread like this?

partial(defer, 0, lambda: sublime.set_timeout(_continue_testing), deferred)

Contributor Author

I already do the double-hop here to get it back on the main thread.

This line evaluates to

timeout_async(lambda: timeout(lambda: continue))
^^^ await worker      ^^^ hop back    ^^^actually continue

but this quickly gets confusing, so we may need a test here
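Under stated assumptions (two plain lists standing in for Sublime's main and worker task queues; timeout and timeout_async are invented stand-ins for sublime.set_timeout and sublime.set_timeout_async), the double hop can be modeled and checked like this:

```python
# Invented model: two FIFO queues instead of Sublime's two threads.
main_q, worker_q = [], []
trace = []


def timeout(fn):
    main_q.append(fn)


def timeout_async(fn):
    worker_q.append(fn)


def _continue_testing():
    trace.append('continue on main')


# The double hop: await the worker, then hop back to the main thread.
timeout_async(lambda: timeout(_continue_testing))

# Drain the worker queue first, then the main queue (roughly how the
# two threads interleave for this single task).
while worker_q:
    trace.append('on worker')
    worker_q.pop(0)()
while main_q:
    main_q.pop(0)()

# trace == ['on worker', 'continue on main']
```

The trace shows the continuation runs on the main queue, after the worker has been visited, which is the behavior under discussion.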

Member

Thanks. I missed the point that defer is really set_timeout.

@kaste
Contributor Author

kaste commented Dec 13, 2018

Subclassing is the wrong approach here. We don't want to maintain two or three versions of it. We actually ship the old version because Package Control doesn't handle semantic versioning correctly.

As I see it, usually you would tag it 2.0, and people would just stick with their 1.x until they're ready. But we don't have that in Sublime land. So we basically ship two versions. (That's the same as when IE10 (Internet Explorer) actually ships versions of IE8 and IE9.)

If we subclass, we can never guarantee that we do NOT break the old version.

@kaste
Contributor Author

kaste commented Dec 13, 2018

To be honest, backporting some features to the old Runner was maybe a complete waste of my time.

@randy3k
Member

randy3k commented Dec 13, 2018

Ok. I argue we should only keep one version of the runners. It could be done by

  1. adding settings to specify DEFAULT_CONDITION_POLL_TIME, DEFAULT_CONDITION_TIMEOUT. Say condition_poll_time and condition_timeout.

  2. adding a setting to control how long the coroutine is deferred, yield_timeout (default 10). If yield_timeout is 0, it implies the use of

_continue_testing(deferred, throw_value=e)

instead of

defer(10, _continue_testing, deferred, send_value=send_value)

If you agree and don't want to get your hands dirty, I might be able to find some time this weekend to implement the above idea.

@kaste
Contributor Author

kaste commented Dec 13, 2018

A plain yield does a timeout(fn, 0); this is not the same as calling fn() directly. It appends fn to the end of the current run and awaits the task queue.

In the PR I sometimes change defer(10, fn) -> defer(0, fn), but sometimes I do defer(10, fn) -> fn(). Note that DeferrableTestSuite often uses a plain yield, which I do not tackle right now. (I hope we can remove them completely in a future version.)

Also, I think this explosion of settings confuses users. There is actually no continuum of 0..10 ms yield time; 0 has different semantics. For the poll time, you want it low, but you probably don't want it faster than 60 Hz, so I took 17 ms.

I don't think users should experiment with different settings here. They should switch one setting and get what we have right now plus what we might have later in DeferrableTestSuite.

Now, of course, I tried different things. But adding 10 if statements in one function doesn't cut it; you need different implementations via subclassing or functional voodoo. I experimented a bit here with both subclassing and functional composition, and in the end, to be honest, it felt like operetta.

The code here is already difficult to reason about (and for less experienced readers even more so); sharing reusable bits makes it more complicated because you get more delegation and late binding. We usually oscillate between DRY and KISS, and with the code at hand, it is not simple enough to make it DRY.

I hope you have a really good idea.

try:
send_value = condition()
except Exception as e:
_continue_testing(deferred, throw_value=e)
Member

Please help me understand. Why can't we do defer(0, _continue_testing, deferred, throw_value=e)?

Member

@randy3k randy3k Dec 14, 2018

Oh, was it because of the coverage issue? I believe coverage would be okay with sublime.set_timeout.

Or is it because defer(0, ...) is slow?

Contributor Author

Okay, you're trying to modify the behavior to make the generalization/abstraction easier. That's cheating.

Now, sure, both runners look similar, but the execution flow is completely different. For sync programs, we can look at conditional branching (e.g. ifs) or method delegation to get the 'flow'. But this is async code; it only matters where we run and when. The code we're looking at is an 'executor' for a (test) program.

Here I want fn() behavior, which is sync and blocking. defer(0, fn) is not sync but cooperative (in the sense that other tasks in the queue can run before fn), yet still blocking (in the sense that all the tasks marked to-be-run-immediately run in a chain without gaps). defer(10, fn) is not sync and not blocking.
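A toy single-threaded task queue (an invented model, not Sublime's actual scheduler) makes the three orderings observable:

```python
import heapq
import itertools


class ToyLoop:
    # Invented model: tasks ordered by delay, FIFO among equal delays.
    def __init__(self):
        self._heap = []
        self._tick = itertools.count()

    def defer(self, delay, fn):
        heapq.heappush(self._heap, (delay, next(self._tick), fn))

    def run(self):
        while self._heap:
            _, _, fn = heapq.heappop(self._heap)
            fn()


loop = ToyLoop()
order = []
loop.defer(0, lambda: order.append('already queued'))

order.append('direct call')                       # fn(): sync, before anything queued
loop.defer(0, lambda: order.append('defer 0'))    # cooperative: after queued immediates
loop.defer(10, lambda: order.append('defer 10'))  # after all immediate tasks

loop.run()
# order == ['direct call', 'already queued', 'defer 0', 'defer 10']
```

The direct call runs first, the 0-delay task runs cooperatively after tasks already queued, and the 10 ms task runs last, which mirrors the three behaviors described above.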

Please look at the DeferrableTestSuite in suite.py. We should remove all plain yields in run(). What is your preferred way of writing this?

Member

It might come down to our different expectations about synchronization of the test cases.
The major difference between the original runner and the new runner is whether we run the next test function/test case in a deferred manner. It is also related to the plain yields in DeferrableTestSuite.

@@ -48,7 +50,7 @@ class DeferringTextTestRunner(TextTestRunner):
                     startTestRun()
                 try:
                     deferred = test(result)
-                    defer(10, _continue_testing, deferred)
+                    _continue_testing(deferred)

When I first wrote it, I thought it would just be easier to defer everywhere. I have to admit that it may be neither very efficient nor strictly necessary. The idea is to make sure the UI has time to respond and the tests won't block the UI. The name of the runner also suggests that it runs tests in a deferring manner; it was never guaranteed that the code would run synchronously.

I understand your point of keeping the code clean and simple. But if we look at the diff of the original runner and the fast runner, there are actually not many lines that differ. It is hard to see why we should not refactor the runners to make them more customizable. The real issue is whether we should defer between tests. The setting fast_timings does not explicitly explain what it is that is faster. Perhaps we need a more specific setting to control whether the coroutine is executed asynchronously and by how much. I am still leaning a bit towards defer_timeout, with the understanding that defer_timeout = 0 means fn() rather than timeout(fn, 0) in some of the lines.

The actual operation is hidden from the users. For the users, that number just means the number of milliseconds to wait; 0 just means "do not wait".

If you are worried about breakage, you have already protected yourself by writing the corresponding tests.

'Condition not fulfilled within {:.2f} seconds'
.format(timeout / 1000)
)
_continue_testing(deferred, throw_value=error)
Member

same here.

else:
self.stream.write("\n")

_start_testing()
Member

Can we do defer(0, _start_testing)?

else:
self.stream.write("\n")

_start_testing()
Member

@randy3k randy3k Dec 14, 2018

If we could replace the function calls above by defer(...), that would be a strong reason to use the setting yield_timeout.

@kaste
Contributor Author

kaste commented Dec 14, 2018

I'm not sure if we're discussing the features or the implementation here.

Now, I think this PR introduces a feature set that is superior and preferable to the old version. I expect all users to switch to the new version, so the implementation strategy is driven by that. It guarantees the old implementation never breaks, as long as we never touch it. The new 'runner' is basically as messily or cleanly written as the old one, and it is the only one we will further improve.

That said, we can rewrite the new runner in a clean way without the risk of breaking the old one in probably subtle ways. We can also change our mind and do the abstraction later, because applying an abstraction is easier than reverting one.

It would be completely different if you think users make a reasonable (not lazy) choice for the old implementation because it is superior for UI testing and, for example, leads to fewer flaky tests; then we can surely implement both as 'strategies'.

So that's the first point: I think the new implementation will replace the old one. The copy approach protects breaking changes.

Now, why this feature set? Which features?

I must again remind everyone that I started the PR with the title 'deterministic testing'. The speed comes for free.

Choosing the runner is actually an implementation detail but the option a user can switch applies globally to the whole test suite or package.

Current symptom: if you opt in for 'deferred', even 'unittest.TestCase's run with wait time. Now, a solid foundation for a test suite is lots of fine-grained 'pure' tests: tests that either don't interact with the UI at all, or where these side effects are actually mocked away.

That's the case for 'test_user_args.py' from SublimeLinter. Here, as a user, I can expect maximum throughput. But I also want blocking behavior to ensure other plugins and Sublime don't see and interact with my mocks. This could be done by optimizing for the usage of 'unittest.TestCase', but by writing these tests I found that I still want the option to do some 'yield's (e.g. in setUp) because a test might at least need a window or view. (Creating a view is fixture setup, not UI interaction testing. The test code uses getters on the view, not setters.)

That's the second point: For the probably vast majority of 'pure' tests you want sync and blocking.

Do we need waiting time between tests? I do not tackle this here because that code is in suite.py. Depending on the tests, we probably need waiting time between them. But the suite uses plain 'yield's or double 'yield's, which correspond to ~20+ ms on master. But 10 or 20 ms are meaningless here. You (usually) want either no waiting time, 0 to process the main-thread tasks, or 'AWAIT_WORKER' to await all the async events (basically, to run the task queues empty). (Side note: Sublime does NOT generally paint within 10 or 20 ms!)

Can we substitute yield 0 (or defer(0, fn)) with direct continuation (fn())? Here I don't know if I understand you correctly. In 'fast_runner' we have some limited usage of defer(0, ...), and only in _continue_testing; none of them can be transformed into direct calls. The common usage here is a plain 'yield'. If we were to transform that into direct invocation,

a()
yield
b()

would be the same as

a()
b()

which of course doesn't make sense.

When and why do I use direct invocation?

Most importantly, when the 'condition' resolves, either by throwing, by timeout, or successfully.

a()              A
yield condition  B
b()              C

Going from A -> B we add waiting time, which is what the user intends anyway. Going from B -> C we flow directly. Between A and B the user expects that the world changes, but going from B to C we must do our best that the state of the world stays consistent. That's the idea here: what holds true in condition() must still be true in C. If we add waiting time, we surely cannot guarantee that.
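The B -> C hand-off can be sketched synchronously (the helper names below are invented; the real runner does this via deferred polling rather than a blocking loop):

```python
import time


def wait_condition(condition, continue_fn, period=0.017, timeout=4.0):
    # Poll with waiting time (A -> B); once the condition holds, continue
    # *directly* (B -> C) so the state the condition checked is still valid.
    start = time.time()
    while True:
        send_value = condition()
        if send_value:
            return continue_fn(send_value)  # direct call, no gap between B and C
        if time.time() - start >= timeout:
            raise TimeoutError(
                'Condition not fulfilled within {:.2f} seconds'.format(timeout)
            )
        time.sleep(period)


state = {'ready': False, 'seen': None}
state['ready'] = True  # pretend the world changed during the A -> B wait

wait_condition(lambda: state['ready'],
               lambda value: state.update(seen=value))
# state['seen'] == True
```

Because continue_fn runs in the same step that observed the condition, nothing can mutate the world in between, which is the consistency property argued for above.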

Other than that, I do it for _start_testing() and the initial _continue_testing(deferred) in '_start_testing()'. Both are not really relevant. Doing both with defer(10) semantics as in master doesn't make sense; the lines are only executed at the beginning of the test run, they are used once. They basically run between the actual test code and the preparation code in test_package.py. An initial defer, I would say, could be useful so that the calling code in test_package falls through to the cleanup poller before we actually hit user test code. It can also ensure that we land on the main thread as soon as possible, because in test_current we hop onto the worker thread, which is surprising and unfortunate.

(That makes it difficult. I think: 1. both calls should be direct invocations, as in my PR. 2. We shouldn't hop to the worker thread; instead, package_reloader, which is right now a function that can only run on threads, should accept a continuation function and do the thread hopping by itself. It is a normal function, so it shouldn't care in what environment the user calls it; if it needs that environment, it's the responsibility of the function to prepare it. 3. We shouldn't poll (on the main thread) for testRunner.finished; instead, we pass a callback into testRunner.run(), and the runner will call us when it's done.)

@randy3k
Member

randy3k commented Dec 14, 2018

TL;DR, I am OK with the idea of replacing the old runner with your new runner. Even a little bit of breakage is okay. TBH, not many people are using the deferred option. If necessary, I could do a quick search on GitHub and remind every repo which uses UnitTesting.

@kaste
Contributor Author

kaste commented Dec 15, 2018

Okay, just let it sink in; maybe I can come up with a DRYer version.

@randy3k
Member

randy3k commented Dec 15, 2018

How about making the new runner the default, renaming the original runner LegacyDeferringTestRunner, and adding a setting legacy_runner with default false? I don't really mind some breakage here; in fact, I think most users will not be affected by this change.

Basically, the old runner is now called LegacyDeferringTextTestRunner,
and the former FastDeferringTextTestRunner is the new DeferrableTestCase.
@kaste
Contributor Author

kaste commented Dec 19, 2018

Okay, enough testing. There is something in master that makes 'circleci: build-linux' flaky. (Or the combination of this PR plus master.)

@randy3k randy3k merged commit 9cbc3f1 into SublimeText:master Dec 20, 2018
@randy3k
Member

randy3k commented Dec 20, 2018

It all looks good to me now. Merged.

At some point, we might want to set the default value of legacy_runner to false.

@randy3k
Member

randy3k commented Dec 20, 2018

I will make a new release in a few days if everything runs smoothly

@kaste
Contributor Author

kaste commented Dec 20, 2018

Yeah, that's the plan. Write a good update message. Essentially, we remove implicit waiting time, so users have to make their needed waiting time explicit; 'AWAIT_WORKER' can be used here. Also, users should almost always prefer yield <condition> over yield <delay>. Maybe deprecate until, say, April 1st; then flip the default, and on April 15th remove the code.

Is it that we're lucky 'circleci' is green on master, or did you change some setting here?

@randy3k
Member

randy3k commented Dec 21, 2018

I didn't change any settings. Maybe you were just unlucky and caught some random glitches.

@kaste kaste deleted the next branch March 3, 2024 22:16