Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging of slow cluster state tasks #10907

Merged
merged 1 commit into from May 1, 2015

Conversation

imotov
Copy link
Contributor

@imotov imotov commented Apr 30, 2015

Closes #10874

Logs how long each cluster state update task took on the debug level. If a task took longer than 30sec (dynamically configurable using cluster.service.slow_task_logging_threshold) this task is logged on the WARN level.

@kimchy
Copy link
Member

kimchy commented Apr 30, 2015

@imotov can we log the times not as ms but as TimeValue? Thats how we do it in other places

@@ -523,6 +538,12 @@ public void run() {
}
}

private void warnAboutSlowTaskIfNeeded(long executionTime, String source) {
if (executionTime >= slowTaskLoggingThreshold.getMillis()) {
logger.warn("cluster state update task [{}] took too long: {}ms", source, executionTime);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we log what the limit here? took X above the warn threshold of Y? Also, I wonder if this should be warn compared to info, don't have strong feelings, but it might be ok that it takes more than 10 seconds sometimes, depending on the cluster state task (like the initial gateway reach to discover which shards are around)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimchy it's 30 seconds by default and can be changed dynamically. So we can either 1) increase the default value to something that we feel comfortable with or 2) make it logged on info level. Not sure if the second option makes a lot of sense since we are already logging timing on the debug level. So, I thought the whole idea to log long running tasks was to "warn" users about a potential issue.

@imotov
Copy link
Contributor Author

imotov commented Apr 30, 2015

@kimchy I pushed a new version that logs time as TimeValue, changes the wording in the warning and fixes a race condition in tests that I discovered while working on these changes.

logger.debug("processing [{}]: no change in cluster_state", source);
TimeValue executionTime = TimeValue.timeValueMillis(Math.max(0, System.currentTimeMillis() - startTime));
logger.debug("processing [{}]: took {} no change in cluster_state", source, executionTime);
warnAboutSlowTaskIfNeeded(executionTime, source);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the other block, we count the time after calling onAllNodesAck and clusterStateProcessed, maybe we should do the same here?

@kimchy
Copy link
Member

kimchy commented Apr 30, 2015

lets another small comment, otherwise LGTM

@imotov imotov force-pushed the issue-10874-log-slow-cluster-state branch from 3fdcba3 to c165afb Compare May 1, 2015 00:15
@imotov imotov merged commit c165afb into elastic:master May 1, 2015
@kevinkluge kevinkluge removed the review label May 1, 2015
@clintongormley
Copy link

@imotov could you add a description of the changes to this PR please?

@imotov
Copy link
Contributor Author

imotov commented May 4, 2015

@clintongormley updated the description in the first comment.

@clintongormley clintongormley changed the title Logging: Add logging of slow cluster state tasks Add logging of slow cluster state tasks May 30, 2015
@imotov imotov deleted the issue-10874-log-slow-cluster-state branch May 1, 2020 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Log slow cluster state tasks
4 participants