
Switch to fully batched scheduling & add blocking task warning #127

Merged: 3 commits into dabeaz:master, Dec 1, 2016

Conversation

njsmith (Contributor) commented Nov 23, 2016:

This PR makes curio's scheduling proceed in strict batches: each batch
starts with polling for I/O, and if a task becomes runnable now, it won't
actually run until the next batch. Basically this gives us the invariant
that no single task can be scheduled twice without checking for I/O in
between.

This prevents various obscure pathologies (see gh-112), and brings us in
line with how most (all?) other event loops work; see for example:
http://docs.libuv.org/en/v1.x/design.html#the-i-o-loop

It also adds a warning if any individual task blocks the event loop
for more than 50 ms. This value is pretty arbitrary, but at least it's
a start. Combined with the above change, this should at least be
sufficient to detect any gratuitous starvation issues.
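
For illustration, here is a minimal sketch (not curio's actual code) of the two mechanisms together; kernel, poll_for_io, ready, run_task, and live_tasks are hypothetical stand-ins for the kernel internals:

    import logging
    import time
    from collections import deque

    log = logging.getLogger(__name__)
    MAX_TASK_TIME = 0.05   # the (arbitrary) 50 ms warning threshold

    def kernel_loop(kernel):
        # kernel.poll_for_io() appends newly-runnable tasks to
        # kernel.ready; kernel.run_task() steps one task and may also
        # append newly-runnable tasks to kernel.ready.
        while kernel.ready or kernel.live_tasks:
            kernel.poll_for_io()            # every batch starts with an I/O poll
            batch, kernel.ready = kernel.ready, deque()
            while batch:
                task = batch.popleft()      # tasks woken mid-batch land in
                start = time.monotonic()    # kernel.ready and wait for the
                kernel.run_task(task)       # *next* batch
                elapsed = time.monotonic() - start
                if elapsed > MAX_TASK_TIME:
                    log.warning("task %r blocked the event loop for %.0f ms",
                                task, 1000 * elapsed)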

dabeaz (Owner) commented Nov 23, 2016:

Any idea how this impacts performance?

dabeaz (Owner) commented Nov 23, 2016:

On the performance comment: one of my concerns is about surrounding the execution of each task with timer calls. In my experience, timer calls aren't free, and I've been burned by this sort of thing in the past (accidentally creating code that becomes "timer-bound", if you will). The libuv document referenced above makes a note about avoiding too many timer-related system calls as well.

With that in mind, I might be inclined to make the timer-related warning optional.

ZhukovAlexander (Contributor) commented:

Maybe just relax the timer to only measure the total time of the loop cycle instead of per-task? Sure, we would lose some information, but it's less overhead IMHO.
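
A sketch of that variant (reusing the hypothetical kernel, log, and MAX_TASK_TIME names from the sketch in the PR description): one pair of timer calls per batch instead of two per task:

    def run_batch(kernel, batch):
        # Time the whole batch rather than each individual task.
        start = time.monotonic()
        while batch:
            kernel.run_task(batch.popleft())
        elapsed = time.monotonic() - start
        if elapsed > MAX_TASK_TIME:
            log.warning("event loop cycle blocked for %.0f ms",
                        1000 * elapsed)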

dabeaz (Owner) commented Nov 23, 2016:

I did some looking at the gevent source to see how it handles this. So far as I can tell, it is NOT batching tasks around the event loop quite like this. If I'm reading the source correctly (the _run_callbacks() method in https://github.com/gevent/gevent/blob/master/src/gevent/libev/corecffi.py), it runs all available callbacks (including new ones added by run_callback()), but with a counter: after 1000 callbacks, it goes back to the event loop.
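
Roughly, something like this sketch (illustrative names, not gevent's actual internals):

    CALLBACK_LIMIT = 1000   # the cap described above

    def run_callbacks(queue):
        # Drain the callback queue, including callbacks enqueued while
        # draining, but hand control back to the event loop once
        # CALLBACK_LIMIT callbacks have run.
        count = 0
        while queue and count < CALLBACK_LIMIT:
            callback, args = queue.popleft()
            callback(*args)
            count += 1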

dabeaz (Owner) commented Nov 23, 2016:

Looking at asyncio, it appears to be doing some kind of batching similar to what's seen in this PR. The implementation is a bit simpler. It simply computes the length of the ready queue when it starts scheduling and only schedules that many tasks--regardless of whether or not more tasks get added to the queue in the process. I think I like that better (you don't need to manage two separate queue objects).
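
The trick, sketched here with hypothetical ready/run_task names (asyncio's real version lives in BaseEventLoop._run_once):

    # Snapshot the ready queue's length, then run exactly that many
    # tasks; anything scheduled mid-batch sits behind the snapshot and
    # waits for the next I/O poll.
    ntodo = len(ready)
    for _ in range(ntodo):
        task = ready.popleft()
        run_task(task)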

asyncio also includes some code to check for slow callbacks using timers. However, that code is only enabled if it's running in debug mode.
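
For reference, that asyncio check is driven by two knobs on the event loop:

    import asyncio

    loop = asyncio.get_event_loop()
    loop.set_debug(True)                 # enables the slow-callback warning
    loop.slow_callback_duration = 0.05   # seconds; asyncio's default is 0.1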

ZhukovAlexander (Contributor) commented Nov 23, 2016:

I think we're making the Kernel.run method pretty fat with all these changes to how we run a single task. Maybe it's time to refactor it into something like Kernel._run_single_task?

ZhukovAlexander (Contributor) commented:

I've created a wiki page where we can document curio's event loop internals. I think we need some general overview to keep the direction in which the event loop develops from fluctuating chaotically.

dabeaz (Owner) commented Nov 23, 2016:

Although the Kernel.run() method is a bit fat, I also consider it to be fairly performance critical. All things equal, I'd rather it be fast than overly abstracted.

ZhukovAlexander (Contributor) commented:

@dabeaz, OK, if we stick with keeping all functions in a local namespace, we can at least introduce something like _run_task, similar to the existing _cleanup_task, so that this code lives in one place. Otherwise we might eventually end up with barely maintainable code. Let's keep it fat but structured in some way. I deeply believe that maintainability is crucial for curio.

njsmith added 3 commits. Commit messages:

This commit makes curio's scheduling proceed in strict batches: each
batch starts with polling for I/O, and if a task becomes runnable now,
it won't actually run until the next batch. Basically this gives us
the invariant that no single task can be scheduled twice without
checking for I/O in between.

This prevents various obscure pathologies (see gh-112), and brings us
in line with how most (all?) other event loops work; see for example:
   http://docs.libuv.org/en/v1.x/design.html#the-i-o-loop
There's a lot more we could do than this in terms of metrics and
debugging, but it's a start :-).

50 ms was chosen pretty much completely arbitrarily. I suspect it's
too high, but we can always fix it later.

Starting with some simple scheduler microbenchmarks.

For information about asv, see:
  https://asv.readthedocs.io/
  https://github.com/spacetelescope/asv

Or to get started:
  pip install asv
  cd benchmarks
  asv run
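
A pure-scheduler microbenchmark for asv might look something like this sketch (asv times methods named time_*; the Kernel/sleep calls are assumptions about curio's API at the time):

    # benchmarks/benchmarks.py
    from curio import Kernel, sleep

    async def spin(n):
        # sleep(0) just reschedules the task, so this measures the
        # scheduler and nothing else.
        for _ in range(n):
            await sleep(0)

    class SchedulerSuite:
        def time_sleep_zero(self):
            kernel = Kernel()
            kernel.run(spin(1000))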
njsmith (Contributor, Author) commented Nov 24, 2016:

> The implementation is a bit simpler. It simply computes the length of the ready queue when it starts scheduling and only schedules that many tasks--regardless of whether or not more tasks get added to the queue in the process.

Oo, that is clever. Rewrote this to use that trick.

> Any idea how this impacts performance?

I just added a new commit to this PR with the beginnings of a benchmark suite using asv. Seems like a useful thing to have in general. And I added two tiny pure-scheduler microbenchmarks. (Sorry this PR is getting so unfocused -- I can split it up if you want.)

I then tried using these microbenchmarks to compare the version with the time_monotonic calls against the version without, and wasn't able to get any reliable effect -- the impact seems to be smaller than the trial-to-trial noise. I also tried using perf, which works harder to tease out small differences, and it told me that on average the version with the time_monotonic calls was faster. So I can't put a precise number on it, but it seems that even on a microbenchmark that does nothing but scheduling, the slowdown is pretty small (less than, say, 10%?), or I'd have been able to detect it :-).

ZhukovAlexander (Contributor) commented Nov 24, 2016:

> I just added a new commit to this PR with the beginnings of a benchmark suite using asv

This actually looks really promising. Any idea how we can incorporate the result graphs into curio's documentation page, or somewhere else publicly available? It would be really helpful to know the impact each PR will have on performance before merging it (without having to run the benchmarks against the target branch locally to check).

njsmith (Contributor, Author) commented Nov 24, 2016 via email

dabeaz (Owner) commented Dec 1, 2016:

I'm going to merge this so that it doesn't languish on GitHub. I want to modify it so that the time check is optional though--only enabled in a debugging mode or if set to a non-zero value.

dabeaz merged commit 720019b into dabeaz:master on Dec 1, 2016