Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition and unnecessary heap alloc in actor locking #2306

Closed
youngbupark opened this issue Oct 22, 2020 · 1 comment · Fixed by #2332
Closed

Race condition and unnecessary heap alloc in actor locking #2306

youngbupark opened this issue Oct 22, 2020 · 1 comment · Fixed by #2332

Comments

@youngbupark
Copy link

In what area(s)?

/area runtime

What version of Dapr?

master

Problem

When multiple invocation requests comes, we use lock to meet turn-based concurrency per actor and set the busy flag and create busy channel. these busy flag and busy channel are used to check the busy state of actor and signal to drain actors.

Drain actor logic

dapr/pkg/actors/actors.go

Lines 656 to 667 in b8e8eec

if a.config.DrainRebalancedActors {
// wait until actor isn't busy or timeout hits
if actor.isBusy() {
select {
case <-time.After(a.config.DrainOngoingCallTimeout):
break
case <-actor.channel():
// if a call comes in from the actor for state changes, that's still allowed
break
}
}
}

actor lock / unlock

dapr/pkg/actors/actor.go

Lines 47 to 66 in b8e8eec

func (a *actor) lock() {
atomic.AddInt32(&a.pendingLockCount, 1)
diag.DefaultMonitoring.ReportCurrentPendingLocks(a.actorType, a.pendingLockCount)
a.concurrencyLock.Lock()
a.busy = true
a.busyCh = make(chan bool, 1)
a.lastUsedTime = time.Now().UTC()
}
func (a *actor) unLock() {
if a.busy {
a.busy = false
close(a.busyCh)
}
a.concurrencyLock.Unlock()
atomic.AddInt32(&a.pendingLockCount, -1)
diag.DefaultMonitoring.ReportCurrentPendingLocks(a.actorType, a.pendingLockCount)
}

Let's assume that we have two requests, A and B, which invoke ActorID0 simultaneously. A holds lock and then B holds the same lock.
Once A releases the lock, A will mark busy flag to false and close the busy channel. B does not get lock yet

If ActorID0 is being drained at this time, draining logic can deactivate ActorID0 because busy channel is closed. but B will try to set the busy flag to true and set new channel. B can call application deactivation http url. So in this case, runtime actor table doesn't include ActorID0 because it is deleted, but B can call application method and SDK will try to activate it again.

In this case, ActorID0 status between runtime and application is not consistent.

Release Note

RELEASE NOTE: FIX race condition of busy flag and channel and unnecessary heap alloc in actor locking

@youngbupark youngbupark added the kind/bug Something isn't working label Oct 22, 2020
@youngbupark youngbupark added this to To do in 1.0.0 Milestone 1 via automation Oct 22, 2020
@youngbupark
Copy link
Author

@yaron2 @artursouza Please review this issue. If you think I misunderstand, please point it out. Otherwise, we should fix this in this milestone. I found this problem when I profile memory of dapr runtime. it busy channel creation allocates lots of memory. we need to improve this part as well.

@youngbupark youngbupark removed this from To do in 1.0.0 Milestone 1 Oct 22, 2020
@artursouza artursouza added this to To do in 1.0.0 Milestone 1 via automation Oct 22, 2020
@artursouza artursouza added this to To do in 1.0.0 Milestone via automation Oct 22, 2020
@artursouza artursouza added the P1 label Oct 22, 2020
@youngbupark youngbupark self-assigned this Oct 23, 2020
@youngbupark youngbupark moved this from To do to In progress in 1.0.0 Milestone 1 Oct 26, 2020
@youngbupark youngbupark moved this from To do to In progress in 1.0.0 Milestone Oct 26, 2020
1.0.0 Milestone 1 automation moved this from In progress to Review Oct 28, 2020
1.0.0 Milestone automation moved this from In progress to Done Oct 28, 2020
youngbupark pushed a commit that referenced this issue Oct 28, 2020
…ion (#2332)

* Fix race condition - #2306
* Do not process drained actor
* Avoid unnecessary channel allocation
* Rename member name to be explicit and add more comments
* Add more tests
@artursouza artursouza moved this from Review to Done in 1.0.0 Milestone 1 Oct 29, 2020
vinayada1 pushed a commit to vinayada1/dapr that referenced this issue Nov 2, 2020
…ion (dapr#2332)

* Fix race condition - dapr#2306
* Do not process drained actor
* Avoid unnecessary channel allocation
* Rename member name to be explicit and add more comments
* Add more tests
@artursouza artursouza moved this from Done to Release in 1.0.0 Milestone 1 Nov 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

2 participants