Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Goroutine leak in plugin/storage/es/ #5083

Open
yurishkuro opened this issue Jan 6, 2024 · 6 comments
Open

[Bug]: Goroutine leak in plugin/storage/es/ #5083

yurishkuro opened this issue Jan 6, 2024 · 6 comments
Labels
bug help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@yurishkuro
Copy link
Member

yurishkuro commented Jan 6, 2024

What happened?

Unrelated PR failed with goleak:

Runtime error
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 106 in state select, with github.com/olivere/elastic.(*Client).healthchecker on top of the stack:
github.com/olivere/elastic.(*Client).healthchecker(0xc0002561a0)
	/home/runner/go/pkg/mod/github.com/olivere/elastic@v6.2.37+incompatible/client.go:1060 +0x229
created by github.com/olivere/elastic.DialContext in goroutine 102
	/home/runner/go/pkg/mod/github.com/olivere/elastic@v6.2.37+incompatible/client.go:404 +0xcfa
 Goroutine 107 in state select, with github.com/olivere/elastic.(*bulkWorker).work on top of the stack:
github.com/olivere/elastic.(*bulkWorker).work(0xc000476180, {0x1de9fb8, 0x2906f40})
	/home/runner/go/pkg/mod/github.com/olivere/elastic@v6.2.37+incompatible/bulk_processor.go:480 +0x1c5
created by github.com/olivere/elastic.(*BulkProcessor).Start in goroutine 102
	/home/runner/go/pkg/mod/github.com/olivere/elastic@v6.2.37+incompatible/bulk_processor.go:336 +0x5dc
]

Steps to reproduce

Unknown, this appears to be a spurious condition when ES clients are not being stopped properly (they kick off a bg healthcheck go routine). It's possible that there is a race condition between starting and stopping the client, since we already spent time fixing these go leaks.

Expected behavior

No go leaks in the package.

@akagami-harsh
Copy link
Member

I'll look into it

@akagami-harsh
Copy link
Member

i think, this error could be because of Elasticsearch client is not properly closed at some parts of code in plugin/storage/es/factory_test.go

The Close() method of the Factory struct wasn't invoked at the end of some tests. While calling f.Close() might resolve the issue, I'm unable to confirm this without a reliable method of reproducing the error.

@yurishkuro yurishkuro added the help wanted Features that maintainers are willing to accept but do not have cycles to implement label Jan 7, 2024
@yurishkuro
Copy link
Member Author

yurishkuro commented Jan 21, 2024

@akagami-harsh where do you see specific places where Close is not called?

@yurishkuro
Copy link
Member Author

One possible approach is to switch the tests to use this constructor from olivere lib:

func NewSimpleClient(options ...ClientOptionFunc) (*Client, error) {

It does not start any bg routines so would be more suitable for tests, but it will also introduce an extra complexity (e.g. in pkg/es) and divergence between prod and test behavior. Ideally we should track down if we're leaking a client when passwords are being rotated (the failures started after that code was merged).

@akagami-harsh
Copy link
Member

@akagami-harsh where do you see specific places where Close is not called?

in TestArchiveEnabled

but the test passes without calling f.Close()

@yurishkuro
Copy link
Member Author

Yes, there should be f.Close there, but since it's using mock client it shouldn't affect anything (still worth adding close)

yurishkuro added a commit that referenced this issue Jan 22, 2024
## Which problem is this PR solving?
- Part of #5083

## Description of the changes
- Add goleak checks to individual tests suspected of causing the leaks,
this might help confirming which ones are at fault, since the main
VerifyGoLeak check cannot distinguish between tests in the package.

## How was this change tested?
```
$ go test -count 100 ./plugin/storage/es
ok  	github.com/jaegertracing/jaeger/plugin/storage/es	8.246s
```

---------

Signed-off-by: Yuri Shkuro <github@ysh.us>
yurishkuro added a commit that referenced this issue May 21, 2024
## Which problem is this PR solving?
- Part of #5083

## Description of the changes
- Added f.close() in TestArhiveEnabled based on
#5083 (comment)

## How was this change tested?
-  CI workflow

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [ ] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Vamshi Maskuri <gwcchintu@gmail.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Signed-off-by: Vamshi Maskuri <117595548+varshith257@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <github@ysh.us>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help wanted Features that maintainers are willing to accept but do not have cycles to implement
Projects
None yet
Development

No branches or pull requests

2 participants