Question about CounterActor benchmark #4
I played with the CounterActor benchmark a bit. I am getting about 10x better throughput on the one using a native lock vs the Hopac lock. I guess this is as expected: the Hopac lock's advantage is that for long critical sections it does not tie up a whole thread, right? There is also perhaps some extra overhead in how the parallel-for is done, to account for this. A bit more surprising is that my first attempt at implementing the lock API in F# performs a bit better (1.5x). A bit strange; perhaps I'm not implementing it correctly, or the benchmark is not representative, or there's some other disadvantage.

If you have a spare moment, could you give it a look?

I'm still not quite getting the DoJob/Worker/Cont protocol. Is my attempt basically how you use it? Is there some more stuff one can do with the Worker struct?

About backoff / Thread.SpinWait: I think I got the idea from Aaron Turon's work, like Reagents and his thesis. I can't confirm any benefit, but it does not seem to hurt either. I have a 6-core machine; it might make a difference for scaling to more cores. I'm not sure what the mechanism is exactly that makes it work, but something bad happens when many cores are reading the same memory address, so "backing off" to a spin-wait reportedly helps.

Is there anywhere else to back off? Through the worker? Like, switch to a different task if contention on the current one is too high?
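For concreteness, here is a minimal sketch of the kind of spin-wait backoff described above, as it might look in a CAS retry loop. This is illustrative only: the function name, the doubling schedule, and the cap are assumptions, not code from the benchmark.

```fsharp
open System.Threading

// A CAS increment with exponential spin-wait backoff (illustrative).
let incrementWithBackoff (cell: byref<int>) =
  let mutable spins = 4
  let mutable finished = false
  while not finished do
    let seen = cell
    if Interlocked.CompareExchange(&cell, seen + 1, seen) = seen then
      finished <- true
    else
      // Contention: pause so fewer cores hammer the same cache line
      // at once, then retry with a longer pause next time.
      Thread.SpinWait spins
      spins <- min (spins * 2) 1024
```

The point of Thread.SpinWait here is that a waiting core stops issuing interlocked operations against the contested cache line for a while, which reduces coherence traffic for the thread that is actually making progress.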
Before getting your message I actually removed the previous versions of the …

Regarding the performance of the native vs Hopac versions, your analysis is …

The particular "CounterActor" benchmark is rather problematic in the sense …

I'll reply to the other points shortly.
Hmm... It would seem that the AltLock implementation is not correct. If I'm …
Yes, I think my mistake is that I don't understand how to handle Cont properly; once fixed, the performance will likely degrade and all will be well :) I got … Must be some other API in there to …
BTW, for understanding spinlocks, I would recommend reading chapter 7 of …

I'm not at all opposed to tweaking the spinlock implementation used in …

Actually, probably the most viable alternative for the current TTAS …
I think I'll write a document with some notes on the internal …
https://github.com/VesaKarvonen/Hopac/blob/master/Docs/Internals.md

Will write more shortly.
Thanks, this helps. The …

So it looks like the C# code is basically inlining everything, from locking to data structures like work queues, and minimizing the number of objects by cramming things together to reduce memory/GC use. OK, this makes sense; not surprising it's hard to read, then. I just thought this is like an advertisement for MLton. I never used it much, but the presentations claim it gives free or almost free abstractions, something Hopac has to achieve by hand in C#/F# here for performance.

Aha, so you are using TTAS "inlined". And that is generally a better default than .NET lock/Monitor? Very curious. I guess because of lower resource use, and especially for the uncontended case. OK.
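For readers unfamiliar with the term: a test-and-test-and-set (TTAS) lock spins on an ordinary read and only attempts the expensive interlocked exchange once the lock looks free. A minimal standalone sketch, not Hopac's inlined version:

```fsharp
open System.Threading

// Minimal TTAS spinlock sketch: 0 = free, 1 = held.
type TtasLock() =
  let mutable state = 0
  member __.Enter() =
    let mutable acquired = false
    while not acquired do
      // "Test": spin on plain reads first, so waiters do not keep
      // bouncing the cache line around with interlocked operations.
      while Volatile.Read(&state) = 1 do
        Thread.SpinWait 20
      // "Test-and-set": the lock looked free, so try to take it.
      if Interlocked.Exchange(&state, 1) = 0 then
        acquired <- true
  member __.Exit() =
    Volatile.Write(&state, 0)
```

This matches the guess above: a very cheap uncontended path and low per-lock resource use, at the cost of burning CPU when waits get long.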
So I attempted to fix my F#. Now I only run the continuation when the lock is … Does this look correct now? It seems OK to me, but the performance results on the old CounterActor benchmark are baffling: this is now 5-6x faster than the original Hopac Lock.

```fsharp
namespace Hopac

module AltLock =
  open System
  open System.Threading
  open Hopac
  open Hopac.Core

  [<AbstractClass>]
  type Section() =
    abstract Run : unit -> unit

  [<Sealed>]
  type Section<'T>(f: unit -> 'T, cont: Cont<'T>) =
    inherit Section()
    override self.Run() =
      try
        let r = f ()
        Scheduler.Push(JobWork(Job.result r, cont))
      with e ->
        Scheduler.Push(FailWork(e, cont))

  type private LockState =
    | Free
    | Locked of list<Section>

  let private ( == ) (a: obj) (b: obj) =
    Object.ReferenceEquals(a, b)

  // CAS loop: atomically replace `state` with `update state` and
  // return the value that was replaced.
  let inline private update (state: byref<'T>) (update: 'T -> 'T) =
    let mutable loop = true
    let mutable old = Unchecked.defaultof<_>
    while loop do
      old <- state
      if old == Interlocked.CompareExchange(&state, update old, old) then
        loop <- false
    old

  [<Sealed>]
  type AltLock internal () =
    [<VolatileField>]
    let mutable state = Free

    // Returns true if the lock is acquired; otherwise returns false
    // and puts the critical section corresponding to `f` and `cont`
    // into the wait list.
    let enterLock f cont =
      let st =
        update &state (function
          | Free -> Locked []
          | Locked xs ->
            let work = Section<_>(f, cont) :> Section
            Locked (work :: xs))
      match st with
      | Free -> true
      | _ -> false

    // Releases the lock, but also runs the critical sections of all
    // waiters and schedules their continuations to execute.
    let exitLock () =
      let mutable loop = true
      while loop do
        let st =
          update &state (function
            | Locked [] -> Free
            | Locked xs -> Locked []
            | _ -> failwith "Impossible")
        match st with
        | Locked [] -> loop <- false
        | Locked works ->
          for w in works do
            w.Run()
        | _ -> failwith "Impossible"

    member internal self.Protect<'R>(f: unit -> 'R) =
      { new Job<'R>() with
          member __.DoJob(worker, cont) =
            if enterLock f cont then
              let mutable result = Unchecked.defaultof<_>
              let mutable error = Unchecked.defaultof<_>
              let mutable hasError = false
              try
                result <- f ()
              with e ->
                hasError <- true
                error <- e
              exitLock ()
              if hasError then
                cont.DoHandle(&worker, error)
              else
                cont.DoCont(&worker, result) }

  let create () = AltLock()

  let protect (lock: AltLock) f = lock.Protect(f)
```
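For reference, a usage sketch (hypothetical driver code, not from the thread; actually starting the job on Hopac's scheduler is elided):

```fsharp
// Hypothetical use of the AltLock module above.
let l = AltLock.create ()
let counter = ref 0

// AltLock.protect returns a Job<'R> that composes and runs like any
// other Hopac job; here the critical section is a plain increment
// of shared state.
let incrementJob : Job<unit> =
  AltLock.protect l (fun () -> counter := !counter + 1)
```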
Interesting. Have you tried the AltLock implementation with the Chameneos …

Assuming the alternative lock is correct, then whether you push into the …

The current design of the "scheduler", which comprises both the worker …

There is an opportunity for optimization on the line `Scheduler.Push(JobWork(Job.result r, cont))`. Here you could just use the Value field of the cont: `cont.Value <- r`. That eliminates two allocations.
Heh... Improvement suggestions are more than welcome. :)

BTW, it just popped into my mind regarding AltLock that the order in which …

On MLton... Yeah... It is something that I think is not widely appreciated …
Darn it, I totally missed that, indeed! `val inline duringJob: Lock -> Job<'a> -> Job<'a>` Yes, this would be harder. Long live MLton!

I was a bit unsure about pushing work into the same worker. With AltLock, when multiple lightweight threads come in at the same time, they end up funneling into one thread, which executes the critical sections. So I thought it's important to spawn the continuations back out, to get N threads running again; otherwise the lock might artificially reduce parallelism. But I have not looked at the scheduler much yet; if there is work stealing in the scheduler, this is nothing to worry about. Sounds like …
In the current implementation, the Worker.Push method pushes the given work …
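To make the local-vs-global distinction concrete, here is a deliberately simplified model. It is not Hopac's actual scheduler; the type MiniWorker, its members, and the plain concurrent queues are all illustrative. The point is only that "push to worker" keeps work near the thread that produced it, while "push to scheduler" makes it immediately visible to every worker; with work stealing, locally pushed work can still migrate, which is what the reduced-parallelism worry above hinges on:

```fsharp
open System.Collections.Concurrent

type Work = unit -> unit

// Simplified model: each worker owns a local queue and all workers
// share one global queue.
type MiniWorker(globalQ: ConcurrentQueue<Work>) =
  let localQ = ConcurrentQueue<Work>()
  // Analogue of Worker.Push: stay on the current worker.
  member __.PushLocal(w: Work) = localQ.Enqueue w
  // Analogue of Scheduler.Push: visible to all workers.
  member __.PushGlobal(w: Work) = globalQ.Enqueue w
  member __.TryTake() =
    // Prefer local work; fall back to the global queue, which here
    // stands in for stealing from other workers.
    match localQ.TryDequeue() with
    | true, w -> Some w
    | _ ->
      match globalQ.TryDequeue() with
      | true, w -> Some w
      | _ -> None
```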
Replaced this:

```fsharp
override self.Run() =
  try
    let r = f ()
    Scheduler.Push(JobWork(Job.result r, cont))
  with e ->
    Scheduler.Push(FailWork(e, cont))
```

with this:

```fsharp
override self.Run(worker) =
  try
    cont.Value <- f ()
    Worker.Push(&worker, cont)
  with e ->
    Scheduler.Push(FailWork(e, cont))
```

Gets weird. What I observe: no progress, all cores spinning, on "warmup n=10000". Deadlock on the TTAS spin-locks? No, it looks like … A bug in my lock is also possible. Let me think.
I would assume a change elsewhere (some changes needed to pass the worker?) …
Oh, one thing. You are now messing with the very internals of Hopac, so …
Thanks for breaking my mental infinite loop. That's it, I think. Maybe there should be some kind of asserts throughout the code, to allow compiling in Debug mode with assert checking?
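For what it's worth, F#'s built-in `assert` already behaves that way: it compiles to `System.Diagnostics.Debug.Assert`, so the check runs only when the DEBUG symbol is defined and disappears in Release builds. A small sketch with a made-up invariant:

```fsharp
// `assert` is conditional on the DEBUG compilation symbol; Release
// builds skip the check entirely.
let pop (xs: 'T list) =
  assert (not (List.isEmpty xs))  // hypothetical invariant
  List.head xs, List.tail xs
```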
Did you make further progress with the lock?

Just to clear up a possible confusion: Hopac doesn't use Hopac.Lock internally. It is an abstraction provided for clients of the library. Sometimes it is more natural to implement an algorithm in terms of locks than higher-level message-passing primitives. You could compare it to something like a Monitor or Mutex in .NET/Windows: a type of synchronization object that potentially blocks the user job. Internally, Hopac uses lower-level mechanisms.
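Going by the `duringJob` signature quoted earlier, client code would look roughly like the sketch below. `Lock.Now.create` is assumed as the constructor name, and error handling is ignored:

```fsharp
open Hopac

// Sketch of client-side use of Hopac.Lock (user-level locking).
let l = Lock.Now.create ()
let counter = ref 0

// Only one job at a time runs the protected job; other jobs are
// suspended as lightweight threads instead of blocking an OS thread.
let guardedIncrement : Job<unit> =
  Lock.duringJob l (Job.delay (fun () ->
    counter := !counter + 1
    Job.result ()))
```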
Hey, I'm planning to get back to this. A lot going on, like Russia invading my home country. Yes, it's clear that we're talking about user-level locks here. --A