
[YUNIKORN-2539] Core: Add deadlock tracking feature #835

Closed
wants to merge 1 commit into from

@craigcondit (Contributor) commented Apr 4, 2024

What is this PR for?

Replaces sync.{RW}Mutex with internal locking.{RW}Mutex implementations. The new implementation wraps the go-deadlock library with logic that conditionally enables deadlock detection based on the following environment variables:

To enable the feature:

  • DEADLOCK_DETECTION_ENABLED=true

To customize the timeout before potential deadlocks are logged (default is 60 seconds):

  • DEADLOCK_TIMEOUT_SECONDS=60

See https://github.com/sasha-s/go-deadlock for more details.

What type of PR is it?

  • Bug Fix
  • Improvement
  • Feature
  • Documentation
  • Hot Fix
  • Refactoring

Todos

  • Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2539

How should this be tested?

Added basic unit tests and benchmarks.

Impact

Performance impact is negligible when the feature is disabled (about 3.5 ns of extra overhead per lock). When it is enabled, locks are considerably slower because of the tracking required (about 1.3 µs per lock); however, even on a 5-year-old laptop I still see roughly 760,000 locks per second. This should still be usable for diagnosing even relatively busy clusters over short periods.

$ go test -v -run '^Benchmark' -bench . ./pkg/locking/...
goos: darwin
goarch: amd64
pkg: github.com/apache/yunikorn-core/pkg/locking
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
# Golang - sync.Mutex / sync.RWMutex

BenchmarkSyncMutex-12                 	100000000	        10.83 ns/op
BenchmarkSyncRWMutexRead-12           	100000000	        10.79 ns/op
BenchmarkSyncRWMutexWrite-12          	55220977	        21.92 ns/op
# Internal wrapper (tracking disabled)

BenchmarkUntrackedMutex-12            	83157864	        14.32 ns/op
BenchmarkUntrackedRWMutexRead-12      	87364526	        14.31 ns/op
BenchmarkUntrackedRWMutexWrite-12     	47517205	        25.22 ns/op
# Go-Deadlock (tracking enabled)

BenchmarkGoDeadlockMutex-12           	  908439	      1321 ns/op
BenchmarkGoDeadlockRWMutexRead-12     	  852348	      1301 ns/op
BenchmarkGoDeadlockRWMutexWrite-12    	  903925	      1312 ns/op
# Internal wrapper (tracking enabled)

BenchmarkTrackedMutex-12              	  886326	      1324 ns/op
BenchmarkTrackedRWMutexRead-12        	  886848	      1306 ns/op
BenchmarkTrackedRWMutexWrite-12       	  909826	      1311 ns/op
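The overhead and throughput figures quoted above follow directly from the ns/op column. The sketch below redoes that arithmetic with the values copied from this single run on an i7-9750H (other hardware will differ); the `row` type and helper names are illustrative only.

```go
package main

import "fmt"

// row holds ns/op figures copied from the benchmark output above:
// plain sync primitive, internal wrapper untracked, internal wrapper tracked.
type row struct {
	name                     string
	sync, untracked, tracked float64
}

var rows = []row{
	{"Mutex", 10.83, 14.32, 1324},
	{"RWMutex read", 10.79, 14.31, 1306},
	{"RWMutex write", 21.92, 25.22, 1311},
}

// wrapperOverhead is the extra cost per lock of the untracked wrapper
// relative to the plain sync primitive, in ns.
func wrapperOverhead(r row) float64 { return r.untracked - r.sync }

// locksPerSecond converts a per-operation cost in ns into throughput.
func locksPerSecond(ns float64) float64 { return 1e9 / ns }

func main() {
	for _, r := range rows {
		fmt.Printf("%-14s overhead %.2f ns, tracked %.0f locks/s\n",
			r.name, wrapperOverhead(r), locksPerSecond(r.tracked))
	}
}
```

For these rows the untracked wrapper adds about 3.3 to 3.5 ns per lock, and tracked locking runs at roughly 755,000 to 766,000 locks per second.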

Questions:

  • The license files need updating.
  • There are breaking changes for older versions.
  • It needs documentation.

@codecov-commenter commented Apr 4, 2024

Codecov Report

Attention: Patch coverage is 44.89796%, with 27 lines in your changes missing coverage. Please review.

Project coverage is 79.22%. Comparing base (5716f46) to head (9b2e15a).

File                     Patch %   Lines
pkg/locking/locking.go   35.71%    26 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #835      +/-   ##
==========================================
- Coverage   79.40%   79.22%   -0.18%     
==========================================
  Files          82       83       +1     
  Lines       11317    11361      +44     
==========================================
+ Hits         8986     9001      +15     
- Misses       2009     2036      +27     
- Partials      322      324       +2     


@pbacsko (Contributor) left a comment


+1

Looks like a straightforward change.

craigcondit added a commit that referenced this pull request Apr 5, 2024

Closes: #835
@craigcondit craigcondit deleted the YUNIKORN-2539 branch May 6, 2024 21:50
3 participants