Bug: tsdbrelay socket: too many open files #2462

Closed
langerma opened this issue Mar 4, 2020 · 6 comments

Comments

@langerma

langerma commented Mar 4, 2020

Expected behaviour

tsdbrelay should forward all connections to opentsdb instantly

Current behaviour

we get a huge bunch of messages like this from tsdbrelay:
2020/03/04 11:28:51 http: Accept error: accept tcp [::]:14242: accept4: too many open files; retrying in 5ms
2020/03/04 11:28:51 http: proxy error: dial tcp 127.0.0.1:4242: socket: too many open files
2020/03/04 11:28:51 http: proxy error: dial tcp 127.0.0.1:4242: socket: too many open files
2020/03/04 11:28:51 http: Accept error: accept tcp [::]:14242: accept4: too many open files; retrying in 5ms
2020/03/04 11:28:51 http: Accept error: accept tcp [::]:14242: accept4: too many open files; retrying in 10ms
2020/03/04 11:28:51 http: Accept error: accept tcp [::]:14242: accept4: too many open files; retrying in 20ms
2020/03/04 11:28:51 http: Accept error: accept tcp [::]:14242: accept4: too many open files; retrying in 5ms
2020/03/04 11:28:51 http: proxy error: dial tcp 127.0.0.1:4242: socket: too many open files
2020/03/04 11:28:51 http: proxy error: dial tcp 127.0.0.1:4242: socket: too many open files
2020/03/04 11:28:51 http: Accept error: accept tcp [::]:14242: accept4: too many open files; retrying in 5ms
2020/03/04 11:28:51 http: proxy error: dial tcp 127.0.0.1:4242: socket: too many open files
2020/03/04 11:28:51 http: Accept error: accept tcp [::]:14242: accept4: too many open files; retrying in 5ms
2020/03/04 11:28:51 http: proxy error: dial tcp 127.0.0.1:4242: socket: too many open files

Steps to reproduce

we are forwarding around 80k datapoints per second through tsdbrelay, and the ingestion rate was very irregular.
since we took tsdbrelay out of the path, the ingestion rate has stabilized.

Context

we have 6 tsdbs on our hadoop/hbase data nodes.
each of them has 96 gb of ram, of which the tsdb gets 8, hadoop (hdfs) gets 4 and hbase gets 32.
all of the nodes are equipped with ssds.
on each tsdb, async network io is set to false.

tsdbrelay is configured like this:
/app/tsdbrelay/tsdbrelay -b bosun:8070 -t 127.0.0.1:4242 -l :14242 -redis redis:6379 -db 0

Logs

see above

@langerma langerma added the bug label Mar 4, 2020
@muffix
Member

muffix commented Mar 4, 2020

Thanks for reporting. Can you please add a bit more detail about your setup? What system are you running Bosun on? If it's Linux, can you please post the contents of /proc/sys/fs/file-max and the output of ulimit -n?

@langerma
Author

langerma commented Mar 5, 2020

good morning,

ulimit -n:
65535
cat /proc/sys/fs/file-max:
8143410

we are running on rhel 7 and opentsdb 2.4

kind regards
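
One caveat with the numbers above: ulimit -n reports the limit of the shell it is run in, which can differ from the limit that applies to the running tsdbrelay process (for example when it is started by a service manager). On Linux the per-process values are visible in /proc/<pid>/limits. As a minimal sketch, assuming Linux, a Go process could also report its own limit and current descriptor count like this; fdUsage is a hypothetical helper name and is not part of tsdbrelay:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fdUsage (hypothetical helper) returns the number of file descriptors the
// current process has open, plus its soft and hard RLIMIT_NOFILE values.
// It relies on /proc/self/fd and therefore only works on Linux.
func fdUsage() (open int, soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, 0, err
	}
	// Each entry in /proc/self/fd is one open descriptor. The count includes
	// the descriptor used to read the directory itself, so it can be one
	// higher than the steady-state number.
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, 0, 0, err
	}
	return len(entries), lim.Cur, lim.Max, nil
}

func main() {
	open, soft, hard, err := fdUsage()
	if err != nil {
		fmt.Fprintln(os.Stderr, "fdUsage:", err)
		os.Exit(1)
	}
	fmt.Printf("open fds: %d, soft limit: %d, hard limit: %d\n", open, soft, hard)
}
```

Logging these values periodically from the relay itself would show whether the descriptor count climbs steadily towards the limit or only spikes under load.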

@langerma
Author

any news on that?

@muffix
Member

muffix commented May 10, 2020

I suspect it might have to do with the HTTP client and the maximum number of idle connections per host that it allows. 🤔 The default seems to be 2:

const DefaultMaxIdleConnsPerHost = 2
https://golang.org/pkg/net/http/#pkg-constants

We could try increasing MaxIdleConnsPerHost, i.e. something like this:
http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = higher_value, but I'll need to look at the implications and whether that would actually fix your problem.
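
To make the idea above concrete, here is a minimal sketch of what such a change could look like. It assumes the relay forwards requests through a net/http/httputil reverse proxy (the "http: proxy error" lines in the log are consistent with that, but this is not verified against the tsdbrelay source). The names newRelayTransport and newRelayProxy and the value 256 are hypothetical, and whether this actually resolves the reported problem is untested:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

// newRelayTransport (hypothetical) clones the default transport and raises
// the per-host idle connection limit above the default of 2
// (http.DefaultMaxIdleConnsPerHost), so more upstream connections are reused
// instead of being closed and redialed.
func newRelayTransport(maxIdlePerHost int) *http.Transport {
	// Clone keeps the default proxy, dial and TLS settings (Go 1.13+).
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.MaxIdleConnsPerHost = maxIdlePerHost
	// MaxIdleConns caps the idle pool across all hosts (0 means no limit),
	// so it must be at least as large as the per-host value to take effect.
	if t.MaxIdleConns != 0 && t.MaxIdleConns < maxIdlePerHost {
		t.MaxIdleConns = maxIdlePerHost
	}
	t.IdleConnTimeout = 90 * time.Second
	return t
}

// newRelayProxy (hypothetical) builds a single-host reverse proxy that uses
// the tuned transport for its upstream connections.
func newRelayProxy(target string, maxIdlePerHost int) (*httputil.ReverseProxy, error) {
	u, err := url.Parse(target)
	if err != nil {
		return nil, err
	}
	p := httputil.NewSingleHostReverseProxy(u)
	p.Transport = newRelayTransport(maxIdlePerHost)
	return p, nil
}

func main() {
	// 256 is an arbitrary illustrative value, not a recommendation.
	t := newRelayTransport(256)
	fmt.Printf("MaxIdleConnsPerHost=%d MaxIdleConns=%d\n", t.MaxIdleConnsPerHost, t.MaxIdleConns)
}
```

Using a cloned transport rather than mutating http.DefaultTransport keeps the change local to the proxy, so other HTTP clients in the same process are unaffected. The resulting proxy would be passed as the handler to http.ListenAndServe on the relay's listen address.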

@langerma
Copy link
Author

okay....so if you have anything to test i would push it to my infra :-)

@stale

stale bot commented May 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label May 6, 2021
@stale stale bot closed this as completed Jun 5, 2021