New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add logging around gateway shard allocation #9562
Conversation
@@ -79,7 +81,7 @@ public LocalGatewayAllocator(Settings settings, | |||
this.listGatewayStartedShards = listGatewayStartedShards; | |||
this.listShardStoreMetaData = listShardStoreMetaData; | |||
|
|||
this.listTimeout = componentSettings.getAsTime("list_timeout", TimeValue.timeValueSeconds(30)); | |||
this.listTimeout = componentSettings.getAsTime("list_timeout", TimeValue.timeValueSeconds(10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grr. fixing.
This commit adds more logs around the gateway shard allocation. Any errors while reaching out to nodes to list the local shards are logged in `WARN`. Shard info loading time is logged under DEBUG. Also, we log a `WARN` message if an exception forces a full checksum check during reading the store metadata
a5c6771
to
4673de9
Compare
@@ -203,7 +203,7 @@ private void onFailure(int idx, String nodeId, Throwable t) { | |||
logger.debug("failed to execute on node [{}]", t, nodeId); | |||
} | |||
if (accumulateExceptions()) { | |||
responses.set(idx, new FailedNodeException(nodeId, "Failed node [" + nodeId + "]", t)); | |||
responses.set(idx, new FailedNodeException(nodeId, "Failed node [" + nodeId + "][" + t.getMessage() + "]", t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use ExceptionsHelper#detailedMessage
to now miss when this get wrapped in inner exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal here was to keep it brief. We already get a full trace in the log message above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I Am concerned that we will miss the actual message because its wrapped, my vote is the detailed message one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case we change the log message on logListActionFailures to log the entire exception. It's the difference between detailedMessage and just a message. But putting the detail in the message the choice is gone. Do agree with solving this on the logging side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, in that case, let's not add the addition message, and solve it on the logging side m
@kimchy pushed another commit |
@@ -421,6 +413,25 @@ public boolean apply(DiscoveryNode node) { | |||
return shardStates; | |||
} | |||
|
|||
private void logListActionFailures(MutableShardRouting shard, String actionType, FailedNodeException[] failures) { | |||
if (failures == null || failures.length == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we require this to be non-null and just iterate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll double check. Tried to be defensive here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's OK. removed the check.
left some minor things otherwise looks good to me |
} | ||
if (totalWarnFailures > 0) { | ||
logger.warn("{}: [{}] failures when trying to list shard {} on nodes.", shard.shardId(), totalWarnFailures, actionType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this total message is not needed, since the previous ones are in WARN, so we have an indication already, we just repeat it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
left more comments |
@s1monw @kimchy pushed another commit. @kimchy I updated the code based on my suggestion. An error now looks as follows. Let me know if it doesn't work/need to be extended:
|
@s1monw the last commit I pushed add special handling in the store for IndexNotFoundException (https://github.com/elasticsearch/elasticsearch/pull/9562/files#diff-cdc68fb92150d348c209021260e9c50dR626 ) . It happens when we start a replica on an empty folder (the exception is caught and ignored higher up). Can you double check me on that? |
The |
@s1monw I don't catch it, but rethrow it without doing |
you are right! sorry LGTM |
This commit adds more logs around the gateway shard allocation. Any errors while reaching out to nodes to list the local shards are logged in `WARN`. Shard info loading time is logged under DEBUG. Also, we log a `WARN` message if an exception forces a full checksum check during reading the store metadata Closes #9562
This commit adds more logs around the gateway shard allocation. Any errors while reaching out to nodes to list the local shards are logged in `WARN`. Shard info loading time is logged under DEBUG. Also, we log a `WARN` message if an exception forces a full checksum check during reading the store metadata Closes #9562
} finally { | ||
TimeValue took = new TimeValue(System.currentTimeMillis() - startTime); | ||
if (exists) { | ||
logger.debug("loaded store meta data for {} (took [{}])", shardId, took); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this log message, and the next one, we are not consistent with our logging messages, when we log something in the context of a shard, its the first thing in the log line, can we please fix it and do: logger.debug("{} loaded store meta data for, took [{}]", shardId, took);
@bleskes sorry for the late comment, but it would be great if we can be consistent with the way we log |
@kimchy no need to be sorry. It's a valid point. |
return new StoreFilesMetaData(true, shardId, store.getMetadataOrEmpty().asMap()); | ||
} finally { | ||
store.decRef(); | ||
logger.trace("listing store meta data for {}", shardId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here as well :)
This commit adds more logs around the gateway shard allocation. Any errors while reaching out to nodes to list the local shards are logged in `WARN`. Shard info loading time is logged under DEBUG. Also, we log a `WARN` message if an exception forces a full checksum check during reading the store metadata Closes #9562
This commit adds more logs around the gateway shard allocation. Any errors while reaching out to nodes to list the local shards are logged in `WARN`. Shard info loading time is logged under DEBUG. Also, we log a `WARN` message if an exception forces a full checksum check during reading the store metadata Closes elastic#9562
This commit adds more logs around the gateway shard allocation. Any errors while reaching out to nodes to list the local shards are logged in `WARN`. Shard info loading time is logged under DEBUG. Also, we log a `WARN` message if an exception forces a full checksum check during reading the store metadata Closes elastic#9562
This commit adds more logs around the gateway shard allocation. Any errors while reaching out to nodes to list the local shards are logged in
WARN
. Shard info loading time is logged under DEBUG. Also, we log aWARN
message if an exception forces a full checksum check during reading the store metadata.Examples: