performance isolates #51603

Open
gmb119943 opened this issue Mar 2, 2023 · 3 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. library-isolate type-question A question about expected behavior or functionality

Comments

@gmb119943

From a code performance point of view, is it better to use isolate pools to send unrelated tasks to isolates for execution? Or is it possible to create a new isolate for each task without loss of code performance?

@lrhn lrhn added area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. library-isolate type-question A question about expected behavior or functionality labels Mar 2, 2023
@a-siva
Contributor

a-siva commented Mar 2, 2023

This is going to depend on a number of factors

@gmb119943
Author

The scenario is roughly as follows. There is a set of unrelated tasks. For each task, a new isolate is created, and it uses Isolate.exit to pass the result back without copying. Would it be better to keep a ready pool of isolates in this case, or is the cost of creating an isolate always minimal, provided the number of isolates stays below the maximum (I cannot create more than 16 isolates on my PC)?

@lrhn
Member

lrhn commented Mar 3, 2023

As @a-siva says, "that depends".

Which operation dominates the computation? And is it speed or memory which is more important?

If you use Isolate.run, you spend time creating a new isolate and sending the initial message. Then you do the computation. Then you send the result back for free, because it is transferred rather than copied. And the isolate goes away when it's done and takes no more memory.
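
A minimal sketch of the one-isolate-per-task approach, assuming a Dart SDK that has Isolate.run (2.19 or later). The sumOfSquares function is a hypothetical CPU-bound task invented for this illustration; it is not from the discussion.

```dart
import 'dart:isolate';

// Hypothetical CPU-bound task, used only for illustration.
int sumOfSquares(int n) {
  var total = 0;
  for (var i = 1; i <= n; i++) {
    total += i * i;
  }
  return total;
}

Future<void> main() async {
  // Spawns a fresh isolate, runs the closure there, and completes with the
  // result once that isolate has exited. The isolate is not reused.
  final result = await Isolate.run(() => sumOfSquares(1000));
  print(result);
}
```

Every call pays the spawn cost again, so this shape fits best when tasks are coarse or their results are large.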

If you use an isolate pool, you spend no time creating an isolate, send the initial message, do the computation, then spend time copying back the result. And the isolate stays alive, taking up memory, whether you use it again or not.
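
For comparison, here is a rough sketch of a single long-lived worker isolate, the building block of a pool. The names workerMain and sumOfSquares, and the message shape (a two-element list, with null as a shutdown signal), are assumptions made for this example only.

```dart
import 'dart:isolate';

// Hypothetical CPU-bound task, used only for illustration.
int sumOfSquares(int n) {
  var total = 0;
  for (var i = 1; i <= n; i++) {
    total += i * i;
  }
  return total;
}

// Entry point of the long-lived worker. It first sends back its own
// SendPort, then answers every request it receives on that port.
void workerMain(SendPort toMain) {
  final inbox = ReceivePort();
  toMain.send(inbox.sendPort);
  inbox.listen((message) {
    if (message == null) {
      inbox.close(); // Shutdown signal: no open ports left, isolate exits.
      return;
    }
    final request = message as List;
    final n = request[0] as int;
    final replyTo = request[1] as SendPort;
    replyTo.send(sumOfSquares(n));
  });
}

Future<void> main() async {
  final handshake = ReceivePort();
  await Isolate.spawn(workerMain, handshake.sendPort);
  final toWorker = await handshake.first as SendPort;

  // The same isolate serves several requests; only the first one pays for
  // spawning. Each reply is an ordinary message, so it is copied.
  for (final n in [10, 100, 1000]) {
    final reply = ReceivePort();
    toWorker.send([n, reply.sendPort]);
    print(await reply.first);
  }

  toWorker.send(null); // Ask the worker to shut down.
}
```

Error handling is left out: if the task throws inside the worker, the caller in this sketch would wait forever.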

The sending of the initial message and doing the computation are fixed costs.

For small return values, not creating a new isolate is definitely faster.
For large return values, creating a new isolate, but getting free return shipping, is definitely faster.

To find the cut-off point, you will have to measure your program. The start-up time of an isolate will most likely depend, at least a little, on the size of the program it's being spawned from. Even with fast isolate spawning and sharing of immutable data, there will be some setup to make space for global mutable variables, which exist per-isolate.
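
One possible way to start measuring, assuming Isolate.run is available. The helper names makeTask and averageRunMicros, the result sizes, and the run count are arbitrary choices for this sketch; real numbers depend on your program and machine.

```dart
import 'dart:isolate';

// Builds the task in its own function so that the closure captures nothing
// but `size` when it is sent to the new isolate.
List<int> Function() makeTask(int size) => () => List<int>.filled(size, 0);

// Average wall-clock time, in microseconds, of one Isolate.run call that
// returns a list with `resultSize` elements.
Future<double> averageRunMicros(int resultSize, {int runs = 20}) async {
  final watch = Stopwatch()..start();
  for (var i = 0; i < runs; i++) {
    await Isolate.run(makeTask(resultSize));
  }
  watch.stop();
  return watch.elapsedMicroseconds / runs;
}

Future<void> main() async {
  for (final size in [1, 1000, 1000000]) {
    final micros = await averageRunMicros(size);
    print('result size $size: ${micros.toStringAsFixed(0)} us per run');
  }
}
```

Timing the same tasks against a long-lived worker, and comparing the two curves, would show roughly where the cut-off lies for a given workload.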

The one further risk of an isolate pool is that you may get less parallelization. If you have 10 isolates in the pool, and you run 20 tasks, it will at most run 10 of those at a time. With 20 isolates, it can hypothetically run twice as fast. If the user has 20 CPU cores to run on, they're not doing anything else, and the stars are just right.
(And if you can spawn 20 isolates. If there is a limit on how many isolates one can create, then a pool can help avoid that, but going too close to the limit might break other libraries which try to create their own isolates.)

But you can also use a growing isolate pool which creates new isolates so every concurrent request has its own isolate, then it reuses those isolates only when the computation is done.

Then there is the memory cost of keeping isolates alive when they aren't needed any more.
(And that's when one starts considering garbage-collecting isolates if usage drops for a while, or keeping a hard maximum number of isolates, and all the other considerations you'd have for resource pools in general.)

There will be some "pool maintenance" cost, but that's likely to be negligible compared to the actual computations.

And there needs to be a pool strategy, which is at least:

  • How many isolates initially?
  • How many isolates max?
  • Are isolates GC'ed if usage drops?
  • Can more than one computation run on the same isolate at the same time? (Only works if the operations are async.)
  • If so, is there a limit on how many? (If yes, you'd queue further operations locally until an isolate becomes free. That adds latency.)
  • (If the pool is really overloaded, should you run the occasional asynchronous computation in the current isolate? You'd assume that some code in the local isolate is waiting for results for all the pending computations, so running a computation locally could actually make it progress faster.)
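
To make a few of these choices concrete, here is a hedged sketch of a growing pool: it starts with zero isolates, spawns one per concurrent request up to a hard maximum, runs one computation per isolate at a time, and queues further requests locally. GrowingPool and all its members are hypothetical names for this example. Idle-isolate collection and error handling are deliberately left out, and tasks are assumed to be top-level or static functions.

```dart
import 'dart:async';
import 'dart:collection';
import 'dart:isolate';

// Hypothetical CPU-bound task, used only for illustration.
int sumOfSquares(int n) {
  var total = 0;
  for (var i = 1; i <= n; i++) {
    total += i * i;
  }
  return total;
}

class _Worker {
  _Worker(this.isolate, this.toWorker);
  final Isolate isolate;
  final SendPort toWorker;
}

// Worker entry point: receives [task, argument, replyPort] messages.
void _workerMain(SendPort toMain) {
  final inbox = ReceivePort();
  toMain.send(inbox.sendPort);
  inbox.listen((message) {
    final request = message as List;
    final task = request[0] as int Function(int);
    final replyTo = request[2] as SendPort;
    replyTo.send(task(request[1] as int));
  });
}

class GrowingPool {
  GrowingPool({required this.maxIsolates});

  final int maxIsolates;
  final List<_Worker> _idle = [];
  final Queue<Completer<_Worker>> _waiting = Queue();
  int _spawned = 0;

  Future<int> run(int Function(int) task, int arg) async {
    final worker = await _acquire();
    try {
      final reply = ReceivePort();
      worker.toWorker.send([task, arg, reply.sendPort]);
      return await reply.first as int;
    } finally {
      _release(worker);
    }
  }

  Future<_Worker> _acquire() async {
    if (_idle.isNotEmpty) return _idle.removeLast();
    if (_spawned < maxIsolates) {
      _spawned++; // Reserve the slot before the first asynchronous gap.
      final handshake = ReceivePort();
      final isolate = await Isolate.spawn(_workerMain, handshake.sendPort);
      return _Worker(isolate, await handshake.first as SendPort);
    }
    // At the limit: wait locally until a worker is released.
    final waiter = Completer<_Worker>();
    _waiting.add(waiter);
    return waiter.future;
  }

  void _release(_Worker worker) {
    if (_waiting.isNotEmpty) {
      _waiting.removeFirst().complete(worker);
    } else {
      _idle.add(worker);
    }
  }

  // Kills all idle workers. Call only when no tasks are in flight.
  void close() {
    for (final worker in _idle) {
      worker.isolate.kill();
    }
    _idle.clear();
  }
}

Future<void> main() async {
  final pool = GrowingPool(maxIsolates: 2);
  final results = await Future.wait(
    [for (final n in [1, 2, 3, 4]) pool.run(sumOfSquares, n)],
  );
  print(results); // prints [1, 5, 14, 30]
  pool.close();
}
```

With maxIsolates set to 2 and four requests, the last two wait in the local queue, which is the added latency mentioned in the list above.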

All these decisions factor into how efficient the pool will be.
So try, and measure. There is no one answer which fits all programs.

(One example of a load-balancing pool is LoadBalancer, from the no-longer-maintained package:isolate. Whether it fits your goals depends on what those goals are.)
