
goroutine leak in charon-perf-1 #2439

Closed
dB2510 opened this issue Jul 17, 2023 · 3 comments
Labels
bug Something isn't working protocol Protocol Team tickets

Comments

@dB2510
Contributor

dB2510 commented Jul 17, 2023

🐞 Bug Report

Description

We have observed a goroutine leak in the charon-perf-1 cluster for peers faithful-sports and cute-cycle.

🔬 Minimal Reproduction

Run a 3-of-4 cluster at commit 79116f9 with 2 nodes down.

🔥 Error

Here's the pprof goroutine profile of faithful-sports:
charon-perf-1-faithful-sports-goroutine.gz

Check the above profile with the following command:

go tool pprof -http=: charon-perf-1-faithful-sports-goroutine.gz

What version of Charon are you running? (Which release)

79116f9
@github-actions github-actions bot added the protocol Protocol Team tickets label Jul 17, 2023
@dB2510 dB2510 added the bug Something isn't working label Jul 17, 2023
@corverroos
Contributor

This seems to be caused by goroutines getting stuck in consensus at "waiting for running consensus to exit". This is probably because the consensus instance has already exited and the Propose method is being called again.

goroutine profile: total 758
454 @ 0x454156 0x41f11d 0x41ec18 0x10ea5f7 0x10ea272 0x1034f43 0x102fd54 0x1023e78 0x103652a 0x4886e1
#	0x10ea5f6	github.com/obolnetwork/charon/core/consensus.(*Component).propose+0x296	/app/charon/core/consensus/component.go:318
#	0x10ea271	github.com/obolnetwork/charon/core/consensus.(*Component).Propose+0x71	/app/charon/core/consensus/component.go:271
#	0x1034f42	github.com/obolnetwork/charon/core.WithTracing.func1.3+0x102		/app/charon/core/tracing.go:72
#	0x102fd53	github.com/obolnetwork/charon/core.WithTracking.func1.2+0x53		/app/charon/core/tracking.go:24
#	0x1023e77	github.com/obolnetwork/charon/core.WithAsyncRetry.func1.3.1+0x37	/app/charon/core/retry.go:32
#	0x1036529	github.com/obolnetwork/charon/app/retry.(*Retryer[...]).DoAsync+0x8c9	/app/charon/app/retry/retry.go:118

obol-bulldozer bot pushed a commit that referenced this issue Jul 18, 2023
Error when either `Participate` or `Propose` is called multiple times for the same duty since this results in goroutines hanging forever. 

Also removes debug logs.

category: refactor
ticket: #2439
@dB2510
Contributor Author

dB2510 commented Jul 19, 2023

Here's the complete analysis and fix of the bug: https://docs.google.com/document/d/16Y2TSx1IsAMJMxVwNQKCIMb0MIuuSWMWDvParJZ1I6g/edit

obol-bulldozer bot pushed a commit that referenced this issue Jul 19, 2023
If consensus has been started because enough messages have been received, don't also start it if exchangeTimeout is reached.

category: bug
ticket: #2439
obol-bulldozer bot pushed a commit that referenced this issue Jul 20, 2023
Adds a `ShouldRun` method to `instanceIO` to check if the instance was actually "running" rather than returning boolean from `getInstanceIO`.

category: bug
ticket: #2439
@dB2510
Contributor Author

dB2510 commented Jul 24, 2023

closed by #2452

@dB2510 dB2510 closed this as completed Jul 24, 2023