Pushgateway Read timed out #60

Closed
Drewster727 opened this issue Apr 13, 2020 · 2 comments
Comments

Drewster727 commented Apr 13, 2020

I'm experiencing an odd issue where my Spark workers randomly begin reporting that they cannot connect to my Pushgateway.

2020-04-13 11:41:55 ERROR ScheduledReporter:184 - Exception thrown from Reporter#report. Exception was suppressed.
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
	at io.prometheus.client.exporter.PushGateway.doRequest(PushGateway.java:315)
	at io.prometheus.client.exporter.PushGateway.pushAdd(PushGateway.java:182)
	at com.banzaicloud.spark.metrics.sink.PrometheusSink$Reporter.report(PrometheusSink.scala:98)
	at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:242)
	at com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:182)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I have verified that the Pushgateway is up and running, and I can connect to it without issue.
However, I did notice that the Pushgateway's memory usage keeps growing. The only things sending metrics to it are my Spark workers, via this library.

I thought it might be a Pushgateway issue, but then I found this issue on the Pushgateway repo:
prometheus/pushgateway#340

That seems to indicate that something pushing metrics into the gateway (this library) is not disposing of its connections properly?

Any assistance would be greatly appreciated. The error is not blocking my workers, but it is very annoying: it spams the logs and makes the Pushgateway unstable.

Jars and versions:

collector-0.12.0.jar
metrics-core-4.1.2.jar
simpleclient-0.8.1.jar
simpleclient_common-0.8.1.jar
simpleclient_dropwizard-0.8.1.jar
simpleclient_pushgateway-0.8.1.jar
snakeyaml-1.16.jar
spark-metrics_2.11-2.3-3.0.1.jar

Thanks!

stoader (Member) commented Apr 14, 2020

spark-metrics pushes metrics to the Pushgateway through the Pushgateway client library. PrometheusSink (https://github.com/banzaicloud/spark-metrics/blob/2.3-3.0.1/src/main/scala/com/banzaicloud/spark/metrics/sink/PrometheusSink.scala#L98) calls PushGateway.pushAdd (https://github.com/prometheus/client_java/blob/parent-0.8.1/simpleclient_pushgateway/src/main/java/io/prometheus/client/exporter/PushGateway.java#L181).

If there is a connection leak, it must be in the Pushgateway client library. However, looking at the source code, the client library always disconnects before it returns: https://github.com/prometheus/client_java/blob/parent-0.8.1/simpleclient_pushgateway/src/main/java/io/prometheus/client/exporter/PushGateway.java#L328
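
For readers who want to see what "always disconnects" and a read timeout look like with plain `HttpURLConnection`, here is a minimal sketch. It is not the client library's code. The class name `PushTimeoutSketch`, the helpers `post` and `readTimesOut`, the 500 ms timeout, and the `/metrics/job/demo` path are all assumptions made up for illustration. The demo opens a local socket that never answers, so the request should fail in `getResponseCode()` in the same way as the stack trace above, and the `finally` block still releases the connection.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not taken from client_java or spark-metrics.
class PushTimeoutSketch {

    /** POSTs body to url and returns the HTTP status; always disconnects. */
    static int post(String url, String body, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            // Waits for the response headers; a slow or stuck server
            // surfaces here as a SocketTimeoutException.
            return conn.getResponseCode();
        } finally {
            // Runs on success and on failure, so the socket is released.
            conn.disconnect();
        }
    }

    /** Pushes to a local socket that never replies; true if the read timed out. */
    static boolean readTimesOut() {
        // The server socket listens but never accepts or answers.
        try (ServerSocket silent =
                     new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/metrics/job/demo";
            post(url, "demo_metric 1\n", 500);
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut());
    }
}
```

The point of the sketch is that a read timeout on the client side says the server was slow to answer; on its own it does not show that the client leaked a connection.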

The increased memory usage of your Pushgateway instance might be caused by one of the pitfalls described in https://www.robustperception.io/common-pitfalls-when-using-the-pushgateway. It can be avoided by using custom group keys (#46) that do not include the instance field.
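
To make the group key point concrete: the Pushgateway stores one metric group per distinct combination of job and grouping labels, addressed as `/metrics/job/<job>/<label>/<value>...`, and keeps each group until it is deleted. The sketch below only builds those paths to count how many groups a set of workers would create. The class `GroupingKeySketch`, its methods, and the label names and values (`role`, `executor`, the worker host names) are hypothetical, and the code assumes the values need no URL escaping.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of how grouping labels map to Pushgateway groups.
class GroupingKeySketch {

    /** Builds /metrics/job/<job>{/<label>/<value>}; assumes URL-safe values. */
    static String groupPath(String job, Map<String, String> groupingKey) {
        StringBuilder path = new StringBuilder("/metrics/job/").append(job);
        for (Map.Entry<String, String> e : groupingKey.entrySet()) {
            path.append('/').append(e.getKey()).append('/').append(e.getValue());
        }
        return path.toString();
    }

    /** Counts the distinct groups the given instances would push to. */
    static int distinctGroups(String job, String role, String[] instances,
                              boolean includeInstance) {
        Set<String> paths = new HashSet<>();
        for (String instance : instances) {
            Map<String, String> key = new LinkedHashMap<>();
            key.put("role", role);
            if (includeInstance) {
                // One extra group per worker; it stays until deleted.
                key.put("instance", instance);
            }
            paths.add(groupPath(job, key));
        }
        return paths.size();
    }

    public static void main(String[] args) {
        String[] workers = {"worker-1", "worker-2", "worker-3"};
        System.out.println(distinctGroups("spark", "executor", workers, true));
        System.out.println(distinctGroups("spark", "executor", workers, false));
    }
}
```

With the instance label in the key, every worker gets its own group; without it, all workers share one. Whether a shared group suits a given deployment depends on how the pushed metrics are labeled, so treat this as an illustration of the counting only.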

cc @sancyx @baluchicken

Drewster727 (Author) commented

Not sure what was causing this, but disabling consistency checks per prometheus/pushgateway#340 resolved my issue...
