Scheduler Improvements 507 #1166
|
@akarnokd you can use the atomic field updaters :) http://normanmaurer.me/blog/2013/10/28/Lesser-known-concurrent-classes-Part-1/ |
|
Thanks, I barely remembered them. I'd go for them, but I have doubts about their performance due to the many security checks (and about how Android would behave). I found this thread which says even Java 7 moved away. |
|
Oh ok. They use it in netty, which also works on Android IIRC. Might be worth giving it a shot and comparing GC. Also, memory in your case will go down? |
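For readers unfamiliar with the atomic field updaters mentioned here, the following is a minimal sketch of the idea: the state lives in a plain volatile field and a single static updater supplies the CAS, so no separate AtomicInteger object is allocated per instance. FlagSubscription and its members are hypothetical names for illustration only; this is not RxJava's implementation and makes no claim about the performance concerns raised above.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class FieldUpdaterSketch {
    /** Hypothetical subscription-like class (an assumption, not RxJava code). */
    static final class FlagSubscription {
        // State is a plain volatile field instead of a separate AtomicInteger object.
        private volatile int unsubscribed;

        // One updater shared by all instances; the field is looked up reflectively once.
        private static final AtomicIntegerFieldUpdater<FlagSubscription> UNSUBSCRIBED =
                AtomicIntegerFieldUpdater.newUpdater(FlagSubscription.class, "unsubscribed");

        boolean isUnsubscribed() {
            return unsubscribed != 0;
        }

        /** Returns true only for the caller that wins the 0 -> 1 transition. */
        boolean unsubscribe() {
            return UNSUBSCRIBED.compareAndSet(this, 0, 1);
        }
    }

    public static void main(String[] args) {
        FlagSubscription s = new FlagSubscription();
        System.out.println("before: " + s.isUnsubscribed());
        System.out.println("first unsubscribe wins: " + s.unsubscribe());
        System.out.println("second unsubscribe wins: " + s.unsubscribe());
        System.out.println("after: " + s.isUnsubscribed());
    }
}
```

The updater validates the field (type, volatility, accessibility) when it is created, which is why it is kept in a static final field rather than created per instance.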
|
I'll continue experimenting with the memory footprint tomorrow. |
I believe this also covers |
|
I thought about this and found a very common case which still makes this slow and/or a memory hog: let's schedule a task T1 with a long delay, then start scheduling immediate tasks. Because T1 is not removed from the head of the queue, the dequeueing of each subsequent task ends up doing an O(n) lookup. In addition, since the head pointer is pinned, once the tail wraps around, the queue will grow. I have a few ideas to resolve this:
Unfortunately, |
|
RxJava-pull-requests #1086 SUCCESS |
|
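Did some changes and experiments. Switching to The single call case got a tiny bit more expensive because the emission of a single item unsubscribes the worker with two subscription containers. The memory consumption was measured via a heap dump taken when this program pauses:
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import rx.Scheduler.Worker;
import rx.functions.Actions;
import rx.schedulers.Schedulers;

public class CompositeSubscriptionMemoryOverhead {
    public static void main(String[] args) throws IOException {
        // Schedule ~1M no-op tasks far in the future so they all stay queued.
        int n = 1024 * 1024;
        Worker w = Schedulers.computation().createWorker();
        for (int i = 0; i < n; i++) {
            // Print progress every 128K tasks.
            if (i % (128 * 1024) == 0) {
                System.out.println(i);
            }
            w.schedule(Actions.empty(), 1, TimeUnit.DAYS);
        }
        // Pause here so a heap dump can be taken while the tasks are still pending.
        System.out.print("Press ENTER to quit");
        System.in.read();
    }
}
|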
|
@akarnokd did you also check with your proposed Unsafe call? Would be interesting how that changes in comparison to the atomic field updater :) |
Don't know if it makes much difference, but if you switch these if-statements, there's one less method call if unsubscribed.
If I switch them, then if delayTime <= 0 there will be two isUnsubscribed calls, which is likely more expensive.
|
I started out with Unsafe on CompositeSubscription and it gave 17.2 Mops/s for add/remove, but the build started to complain about using it. I guessed @benjchristensen wouldn't want that, and the performance difference is so small that it was not worth it. |
|
I'll post a new PR to catch up with master.
NewThread/EventLoop scheduler improvements proposal.
NewThreadScheduler
There is no need to have an innerSubscription there, as the underlying Executor knows what tasks are in its queue and a shutdown will cancel them anyway. In the original version, I made a small mistake by leaving out an innerSubscription.add(s), which was one of the main contributors to the speed improvements (it did not affect correctness).
EventLoopScheduler
An EventLoopScheduler needs to track its tasks so it can selectively cancel them in the NewThreadScheduler. Since an ELS worker is single threaded, the addition and removal of the completed tasks are more like queue operations; adding a last item and removing a first item in CompositeSubscription can be expensive even if the size is small due to the copying and state machine overhead.
SubscriptionQueue
Therefore, I've built a special array-based (ring buffer) queue called SubscriptionQueue, which can resize itself as needed, similar to ArrayDeque, but behaves like a composite: queued items can be unsubscribed at once, and it provides the usual cancellation policy. When benchmarked with a simple loop of add/remove pairs, it gives ~532 Mops/s whereas CompositeSubscription gives 16 Mops/s. It uses synchronized, as generally one needs to synchronize the producer(s) of tasks with the completion of the tasks on the worker thread, where the producer count is likely small. The initial queue capacity is 8, which favors fast tasks, and the array fits nicely into a typical 64-byte cache line. One unique property is that it dequeues based on object identity and not the head of the queue. The reason for this is that when there are multiple producers, queueing their subscriptions might happen in a different order than their tasks are scheduled (i.e., the head of the queue points to s1 while t2 gets scheduled first).
Perf tests
I did some perf testing with ComputationSchedulerPerf, but I ran into some trouble: the initial 512M memory for the benchmark doesn't seem to be enough, especially on a 4/8 core machine. Both master and this PR go really slow or fail with GC errors because the internal queues of the Executors get flooded with tasks. Each task is about 650 bytes, and having tens of thousands queued up consumes a lot of memory. I ran the perf tests with 1300M, which was enough, although still pounding on the GC. (Btw, I don't understand the perf code: does the subscribeOn test do 1M one-time subscriptions, or does it subscribe to a stream of 1M elements? If the latter, what are the tasks that hammer the executor?)
Master (https://gist.github.com/akarnokd/1fe0fb74f896c48c61a8)
This proposal (https://gist.github.com/akarnokd/6d9ba66761c5bdb8ecd7)
The ObserveOn test benefits minimally from the changes. For the 1 onSubscribe, it drops to 840ns. For the other sizes, I suspect hectic GC overhead so I can't declare a winner.
(Generally, since we use a lot of AtomicXYZ classes, they add 24 bytes to the memory footprint every time a Subscription is present. In order to get rid of them, one would need to replace them with volatile fields and Unsafe calls to get the CAS functionality back.)
Benchmarked on an i7 920 @ 2.66GHz, 4/8 hyperthreaded cores, 6GB total RAM, Windows 7 x64, Java 7u55 x64.
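To make the SubscriptionQueue description above more concrete, here is a rough sketch of a synchronized, resizable ring buffer with an initial capacity of 8 that removes entries by object identity and cancels everything at once. It is not the code from this PR: the Cancellable interface and the IdentityRingQueue class are hypothetical stand-ins (the real queue holds Subscriptions), and the choice to compact entries toward the head on removal is an assumption made for this illustration.

```java
public class SubscriptionQueueSketch {
    /** Hypothetical stand-in for a Subscription, to keep the sketch dependency-free. */
    interface Cancellable {
        void cancel();
    }

    static final class IdentityRingQueue {
        private Cancellable[] items = new Cancellable[8]; // capacity stays a power of two
        private int head;
        private int size;
        private boolean cancelled;

        /** Enqueues the item, or cancels it immediately if the queue was already cancelled. */
        void add(Cancellable c) {
            synchronized (this) {
                if (!cancelled) {
                    if (size == items.length) {
                        grow();
                    }
                    items[(head + size) & (items.length - 1)] = c;
                    size++;
                    return;
                }
            }
            c.cancel(); // run outside the lock
        }

        /** Removes the item by identity, scanning from the head; O(n) in the worst case. */
        synchronized boolean remove(Cancellable c) {
            int mask = items.length - 1;
            for (int i = 0; i < size; i++) {
                if (items[(head + i) & mask] == c) {
                    // Shift the entries in front of the match one slot back, then drop the head slot.
                    for (int j = i; j > 0; j--) {
                        items[(head + j) & mask] = items[(head + j - 1) & mask];
                    }
                    items[head] = null;
                    head = (head + 1) & mask;
                    size--;
                    return true;
                }
            }
            return false;
        }

        /** Cancels all queued items once; later add() calls cancel their argument right away. */
        void cancelAll() {
            Cancellable[] snapshot;
            int h;
            int n;
            synchronized (this) {
                if (cancelled) {
                    return;
                }
                cancelled = true;
                snapshot = items;
                h = head;
                n = size;
                items = new Cancellable[8];
                head = 0;
                size = 0;
            }
            int mask = snapshot.length - 1;
            for (int i = 0; i < n; i++) {
                snapshot[(h + i) & mask].cancel();
            }
        }

        synchronized int size() {
            return size;
        }

        synchronized int capacity() {
            return items.length;
        }

        private void grow() {
            Cancellable[] bigger = new Cancellable[items.length * 2];
            int mask = items.length - 1;
            for (int i = 0; i < size; i++) {
                bigger[i] = items[(head + i) & mask];
            }
            items = bigger;
            head = 0;
        }
    }

    public static void main(String[] args) {
        IdentityRingQueue q = new IdentityRingQueue();
        final int[] cancelCount = {0};
        q.add(() -> cancelCount[0]++);      // a long-lived entry that stays queued
        for (int i = 0; i < 100; i++) {     // short-lived entries added and removed in pairs
            Cancellable t = () -> { };
            q.add(t);
            q.remove(t);
        }
        System.out.println("size=" + q.size() + " capacity=" + q.capacity());
        q.cancelAll();
        System.out.println("cancelled entries=" + cancelCount[0]);
    }
}
```

Removal here costs a scan plus a shift proportional to the entry's distance from the head, which stays cheap while the queue is short; a long-lived entry at the front adds one step to every later removal.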