Race condition and unnecessary heap alloc in actor locking #2306

youngbupark · 2020-10-22T19:50:27Z

In what area(s)?

/area runtime

What version of Dapr?

master

Problem

When multiple invocation requests comes, we use lock to meet turn-based concurrency per actor and set the busy flag and create busy channel. these busy flag and busy channel are used to check the busy state of actor and signal to drain actors.

Drain actor logic

dapr/pkg/actors/actors.go

Lines 656 to 667 in b8e8eec

    
           if a.config.DrainRebalancedActors { 
        
           	// wait until actor isn't busy or timeout hits 
        
           	if actor.isBusy() { 
        
           		select { 
        
           		case <-time.After(a.config.DrainOngoingCallTimeout): 
        
           			break 
        
           		case <-actor.channel(): 
        
           			// if a call comes in from the actor for state changes, that's still allowed 
        
           			break 
        
           		} 
        
           	} 
        
           }

actor lock / unlock

dapr/pkg/actors/actor.go

Lines 47 to 66 in b8e8eec

    
           func (a *actor) lock() { 
        
           	atomic.AddInt32(&a.pendingLockCount, 1) 
        
           	diag.DefaultMonitoring.ReportCurrentPendingLocks(a.actorType, a.pendingLockCount) 
        
           	a.concurrencyLock.Lock() 
        
           	a.busy = true 
        
           	a.busyCh = make(chan bool, 1) 
        
           	a.lastUsedTime = time.Now().UTC() 
        
           } 
        
           func (a *actor) unLock() { 
        
           	if a.busy { 
        
           		a.busy = false 
        
           		close(a.busyCh) 
        
           	} 
        
           	a.concurrencyLock.Unlock() 
        
           	atomic.AddInt32(&a.pendingLockCount, -1) 
        
           	diag.DefaultMonitoring.ReportCurrentPendingLocks(a.actorType, a.pendingLockCount) 
        
           }

Let's assume that we have two requests, A and B, which invoke ActorID0 simultaneously. A holds lock and then B holds the same lock.
Once A releases the lock, A will mark busy flag to false and close the busy channel. B does not get lock yet

If ActorID0 is being drained at this time, draining logic can deactivate ActorID0 because busy channel is closed. but B will try to set the busy flag to true and set new channel. B can call application deactivation http url. So in this case, runtime actor table doesn't include ActorID0 because it is deleted, but B can call application method and SDK will try to activate it again.

In this case, ActorID0 status between runtime and application is not consistent.

Release Note

RELEASE NOTE: FIX race condition of busy flag and channel and unnecessary heap alloc in actor locking

youngbupark · 2020-10-22T19:52:37Z

@yaron2 @artursouza Please review this issue. If you think I misunderstand, please point it out. Otherwise, we should fix this in this milestone. I found this problem when I profile memory of dapr runtime. it busy channel creation allocates lots of memory. we need to improve this part as well.

…ion (#2332) * Fix race condition - #2306 * Do not process drained actor * Avoid unnecessary channel allocation * Rename member name to be explicit and add more comments * Add more tests

…ion (dapr#2332) * Fix race condition - dapr#2306 * Do not process drained actor * Avoid unnecessary channel allocation * Rename member name to be explicit and add more comments * Add more tests

youngbupark added the kind/bug Something isn't working label Oct 22, 2020

youngbupark added this to To do in 1.0.0 Milestone 1 via automation Oct 22, 2020

youngbupark added the area/runtime/actor label Oct 22, 2020

youngbupark removed this from To do in 1.0.0 Milestone 1 Oct 22, 2020

artursouza added this to To do in 1.0.0 Milestone 1 via automation Oct 22, 2020

artursouza added this to To do in 1.0.0 Milestone via automation Oct 22, 2020

artursouza added the P1 label Oct 22, 2020

youngbupark self-assigned this Oct 23, 2020

youngbupark moved this from To do to In progress in 1.0.0 Milestone 1 Oct 26, 2020

youngbupark moved this from To do to In progress in 1.0.0 Milestone Oct 26, 2020

youngbupark mentioned this issue Oct 27, 2020

Fix race condition while disposing actor and avoid unecessary allocation #2332

Merged

7 tasks

youngbupark closed this as completed in #2332 Oct 28, 2020

1.0.0 Milestone 1 automation moved this from In progress to Review Oct 28, 2020

1.0.0 Milestone automation moved this from In progress to Done Oct 28, 2020

artursouza moved this from Review to Done in 1.0.0 Milestone 1 Oct 29, 2020

artursouza moved this from Done to Release in 1.0.0 Milestone 1 Nov 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition and unnecessary heap alloc in actor locking #2306

Race condition and unnecessary heap alloc in actor locking #2306

youngbupark commented Oct 22, 2020

youngbupark commented Oct 22, 2020

Race condition and unnecessary heap alloc in actor locking #2306

Race condition and unnecessary heap alloc in actor locking #2306

Comments

youngbupark commented Oct 22, 2020

In what area(s)?

What version of Dapr?

Problem

Release Note

youngbupark commented Oct 22, 2020