KAFKA-4134: log ConnectException at WARN #1829

cotedm · 2016-09-06T19:18:25Z

Simply log the connection refused instance. If we're worried about spamming users, I can add a flag to make sure we only log this exception once, but the initial change is to simply log what we're given. @ijuma looks like you were last to touch this code, would you mind having a look?

ewencp · 2016-11-29T17:52:28Z

@cotedm This is still potentially very spammy, right? Default reconnect backoff is 50ms which means you'll get 20 of these (with stacktrace) per second that you have a connectivity issue.

We'd probably benefit from some helper that can rate limit certain log calls without permanently turning them off after the first time one is logged. Otherwise you can be missing information at a much later time due to a very short transient network issue at some completely unrelated time.

cotedm · 2016-11-30T20:36:24Z

@ewencp I should think ConnectException would only happen if the remote process stopped listening, but you'd still be spamming users in that case. Let me add a general wrapper with a cooldown for log messages and use it here. Then if we find other spammy things, we can reuse it hopefully.

asfbot · 2016-12-12T15:56:57Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/75/
Test FAILed (JDK 7 and Scala 2.10).

asfbot · 2016-12-12T15:57:31Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/76/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2016-12-12T15:57:33Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/77/
Test FAILed (JDK 8 and Scala 2.11).

cotedm · 2016-12-12T16:02:19Z

@ewencp I've added a LogRateLimiter utility with the method I would need as a wrapper. I think it's probably useful to have it for other scenarios, so I kept it in common/utils, but if that doesn't make sense please let me know. I also hardcoded 1000 messages as the number we'll skip, since I don't see a reason to make it configurable. It might make sense to alert the user that the log message has been suppressed, though. What do you think?

ewencp · 2016-12-20T05:16:04Z

clients/src/main/java/org/apache/kafka/common/utils/LogRateLimiter.java

+     * @param max maximum number of times we try to log the message before suggesting a count reset
+     * @return boolean indicating if the count should be reset or not
+     */
+    public static boolean warn(Logger logger, String format, Object arg1, Object arg2,


You might want to swap the order of the logger args and the count/max args. There are multiple overloads in slf4j and this is only one of them. There's also (String), (String, Object), (String, Object...), and (String, Throwable). The (String, Object...) version is actually the one you want to use as the most general version, the others exist purely as an optimization.
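A rough illustration of the suggested argument order, not the code from this PR: with the count and max ahead of the format string, the trailing parameters mirror the general (String, Object...) shape, and Java requires the varargs parameter to come last anyway. `WarnSink` and `VarargsWarnSketch` are hypothetical names, and `WarnSink` is only a stand-in so the sketch runs without slf4j on the classpath. The return value here means "was forwarded", which is an assumption and differs from the javadoc in the diff.

```java
import java.util.ArrayList;
import java.util.List;

final class VarargsWarnSketch {
    /** Hypothetical stand-in for a logger; it models only a (String, Object...) method. */
    interface WarnSink {
        void warn(String format, Object... args);
    }

    /**
     * Hypothetical wrapper: count and max precede the format, so the varargs
     * parameter can sit last, which Java requires.
     * Returns true if the message was forwarded, false if it was suppressed.
     */
    static boolean warn(WarnSink logger, int count, int max, String format, Object... args) {
        if (count >= max) {
            return false;
        }
        logger.warn(format, args);
        return true;
    }

    public static void main(String[] unused) {
        List<String> seen = new ArrayList<>();
        WarnSink sink = (format, args) -> seen.add(format + " with " + args.length + " args");
        // One overload covers any number of arguments.
        warn(sink, 0, 1, "Cannot connect to {}", "node-1");
        warn(sink, 0, 1, "Cannot connect to {} after {} tries", "node-1", 3);
        // count has reached max, so this call is suppressed.
        warn(sink, 1, 1, "Cannot connect to {}", "node-1");
        System.out.println(seen);
    }
}
```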

ewencp · 2016-12-20T05:24:06Z

clients/src/main/java/org/apache/kafka/common/network/Selector.java

+                if (e instanceof ConnectException) {
+                    if (LogRateLimiter.warn(log, "Cannot connect to {}", desc, e,
+                            loggerCount, loggerCountMax))
+                        loggerCount++;


Is this doing what you expect it to? You're returning true from the LogRateLimiter when the count > max. But that won't be true early on, which means you'll keep resetting the loggerCount to 0. Won't this just keep the loggerCount at 0 permanently?

Also, not sure if we can make this easy, but it'd really be ideal if we could keep the logging line as a single line of code without requiring additional logic by the caller. (Maybe even if this requires allocating a special object to do that.)

Also see https://github.com/Swrve/rate-limited-logger
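One possible shape for the "special object" idea, as a minimal sketch rather than the PR's implementation: the limiter owns its counter, so the caller needs no extra bookkeeping around the log line. `CappedWarner`, its `reset()` method, and the `Consumer<String>` sink (standing in for a real logger) are all hypothetical, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: owns the counter so the call site stays a single line. Not thread-safe. */
final class CappedWarner {
    private final Consumer<String> sink; // stand-in for a real logger's warn method
    private final int max;               // how many messages may be emitted before suppression
    private int count;

    CappedWarner(Consumer<String> sink, int max) {
        this.sink = sink;
        this.max = max;
    }

    /** Emits the message while fewer than max have been emitted; returns whether it was emitted. */
    boolean warn(String message) {
        if (count >= max) {
            return false;
        }
        count++;
        sink.accept(message);
        return true;
    }

    /** Re-arms the limiter; when to call this is left to the owner (an assumption of this sketch). */
    void reset() {
        count = 0;
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        CappedWarner connectWarner = new CappedWarner(out::add, 2);
        for (int i = 0; i < 5; i++) {
            connectWarner.warn("Cannot connect to node " + i); // single-line call site
        }
        System.out.println(out.size()); // prints 2
    }
}
```

A plain cap like this still goes silent after `max` messages, so it would need a periodic pass-through or a reset to avoid masking later failures.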

ewencp · 2016-12-20T05:26:37Z

clients/src/main/java/org/apache/kafka/common/utils/LogRateLimiter.java

+ */
+public class LogRateLimiter {
+
+    /**


Would it be worth having the rate limiter have 2 levels: the first level logs every error and the second logs every nth error?

My concern is that simply cutting off logs entirely once we hit a certain # of messages can mask later messages. You don't want to mask those entirely, you just want to cut them down. I think any approach using timestamps is probably going to get too complicated. But I think still printing every 1000th message would be useful so you eventually see the problem.
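The two-level idea can be sketched as a pure decision function over an occurrence count, with no timestamps involved. This is an illustration under assumptions: `TwoLevelPolicy` and `shouldLog` are hypothetical names, and the first-level threshold of 10 in the demo is made up; only the "every 1000th" figure comes from the comment above.

```java
final class TwoLevelPolicy {
    private TwoLevelPolicy() {
    }

    /**
     * @param occurrence 1-based count of how many times the event has happened
     * @param logAllUpTo first level: every occurrence up to this number is logged
     * @param everyNth   second level: afterwards only multiples of this number are logged
     * @return true if this occurrence should be logged
     */
    static boolean shouldLog(long occurrence, long logAllUpTo, long everyNth) {
        if (everyNth <= 0) {
            throw new IllegalArgumentException("everyNth must be positive");
        }
        if (occurrence <= logAllUpTo) {
            return true;
        }
        return occurrence % everyNth == 0;
    }

    public static void main(String[] args) {
        int logged = 0;
        for (long n = 1; n <= 5000; n++) {
            if (shouldLog(n, 10, 1000)) {
                logged++;
            }
        }
        // Occurrences 1..10, then 1000, 2000, 3000, 4000 and 5000.
        System.out.println(logged); // prints 15
    }
}
```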

log ConnectException at WARN

4ad6990

adding a log rate limiter utility

2f87d84

ewencp reviewed Dec 20, 2016

ewencp added the connect label Feb 28, 2018

cotedm closed this Dec 20, 2019
