Email find first error in logs #979

Merged
merged 12 commits into from Apr 6, 2016

Conversation

Projects
None yet
3 participants
@Calvinp
Contributor

Calvinp commented Apr 1, 2016

This improves upon the previously existing logic to show the bottom few lines of a file when sending the task complete email.
Now, instead, the first error will be searched out. What is shown is the number of bytes we are allowed to show starting at where the error is found.
It is not necessarily easy to define what an error is, so this choice is left up to the user using a tiered system of regexes:

  • If the user is lazy and doesn't specify anything anywhere, a default regex of "ERROR" is used. I'm open to discussion on exactly what this default regex should be.
  • The user may override this default by providing a regex named taskLogErrorRegex in the smtp config.
  • The user may override the smtp config in an individual request (very useful if that request will be running in a different language than most, with different logging conventions - but also useful for any number of other reasons that I can't even guess at)

If no error is found in the logs (which would happen if the task is killed by singularity, for example, or if it completed successfully) then the bottom of the log file is shown instead.

The motivation for choosing "ERROR" as the default regex is that in many common Java logging schemes, logging at the error level will produce a line starting with "ERROR". It is probably best to keep the regex as simple as possible to avoid situations where the word error or exception is logged for some expected reason (eg. "Set up exception handling"), then found instead of an actual error. Therefore using "ERROR" will catch as many errors as possible without catching too many things that aren't errors.

Calvinp added some commits Mar 17, 2016

Emails tail from the bottom of the file, unless there's an error in t…
…he file. In that case they tail from the first error.

However, the regex that is matched to determine what is and isn't an error is constant; forthcoming is the ability to specify it
Implement the ability to override default task log error regex with o…
…ne in the smtp config

Still forthcoming: Ability to override the default task log error and/or the one in smtp config with one in a request
Add the ability to specify a request-specific regex to search for. If…
… specified, this will override the deploy configuration and the global default regex
+ long offset = 0;
+ String regex = getLogErrorRegex(task);
+ long length = logLength + regex.length(); // Get extra so that we can be sure to find the error
+ Pattern pattern = Pattern.compile(regex);

This comment has been minimized.

@ssalinas

ssalinas Apr 1, 2016

Member

What happens if an invalid pattern is supplied? We should probably catch this and default to returning the tail of the log file (like old behavior)

@ssalinas

ssalinas Apr 1, 2016

Member

What happens if an invalid pattern is supplied? We should probably catch this and default to returning the tail of the log file (like old behavior)

@@ -34,6 +36,7 @@
private static final String REQUEST_LINK_FORMAT = "%s/request/%s";
private static final String LOG_LINK_FORMAT = "%s/task/%s/tail/%s";
private static final String DEFAULT_TIMESTAMP_FORMAT = "MMM dd h:mm:ss a zzz";
+ private static final String DEFAULT_TASK_LOG_ERROR_REGEX = "ERROR";

This comment has been minimized.

@ssalinas

ssalinas Apr 1, 2016

Member

Going back and forth on if the default should be to search for errors in the logs, or if that's something the operator should need to actively configure. @tpetr any thoughts there on a sensible default behavior?

@ssalinas

ssalinas Apr 1, 2016

Member

Going back and forth on if the default should be to search for errors in the logs, or if that's something the operator should need to actively configure. @tpetr any thoughts there on a sensible default behavior?

- return Optional.absent();
+ Optional<MesosFileChunkObject> previous = Optional.absent();
+
+ while (true) { //Continue until we reach the end of the file, at which point we return from within the while loop

This comment has been minimized.

@ssalinas

ssalinas Apr 1, 2016

Member

Two things for this loop:

  • In the case that the file is massive (for example, a wrongly configured service running for a few hours can spew GBs upon GBs of logs into a file), should we limit the amount of log file we will search?
  • while true always makes me a bit uneasy that we could be stuck here forever. Maybe we should check the ending offset of the log before starting the loop, so that even if something goes wrong here, we can still have a defined end and no chance of ever getting stuck here?
@ssalinas

ssalinas Apr 1, 2016

Member

Two things for this loop:

  • In the case that the file is massive (for example, a wrongly configured service running for a few hours can spew GBs upon GBs of logs into a file), should we limit the amount of log file we will search?
  • while true always makes me a bit uneasy that we could be stuck here forever. Maybe we should check the ending offset of the log before starting the loop, so that even if something goes wrong here, we can still have a defined end and no chance of ever getting stuck here?
+ long end = previous.get().getOffset() + previous.get().getData().length();
+ return Optional.of(end - logLength);
+ }
+ return Optional.absent();

This comment has been minimized.

@ssalinas

ssalinas Apr 1, 2016

Member

returning an absent here would revert us to the previous behavior of returning the beginning portion of the file (absent offset). If we can't find anything, wouldn't we still want to see the tail like we implemented in the previous PR?

@ssalinas

ssalinas Apr 1, 2016

Member

returning an absent here would revert us to the previous behavior of returning the beginning portion of the file (absent offset). If we can't find anything, wouldn't we still want to see the tail like we implemented in the previous PR?

This comment has been minimized.

@Calvinp

Calvinp Apr 4, 2016

Contributor

Unless I missed something, this absent will only be returned if the log file is empty. Otherwise previous will be present and it'll return end-logLength, which will be a pointer to the end of the file.

@Calvinp

Calvinp Apr 4, 2016

Contributor

Unless I missed something, this absent will only be returned if the log file is empty. Otherwise previous will be present and it'll return end-logLength, which will be a pointer to the end of the file.

+ private Optional<Long> getMaybeOffset(final String slaveHostname, final String fullPath, final Long logLength, Optional<SingularityTask> task) {
+ long offset = 0;
+ String regex = getLogErrorRegex(task);
+ long length = logLength + regex.length(); // Get extra so that we can be sure to find the error

This comment has been minimized.

@ssalinas

ssalinas Apr 1, 2016

Member

if we are looping through the file why do we need this extra length?

@ssalinas

ssalinas Apr 1, 2016

Member

if we are looping through the file why do we need this extra length?

This comment has been minimized.

@Calvinp

Calvinp Apr 4, 2016

Contributor

If we fetch just the amount we're supposed to, then it's possible that the match will be half in one fetched chunk and half in the next, in which case we wouldn't match it.
For example, we're looking for 'ERROR'. If we fetch 10 bytes, and the log file is 'xxxxxxxxERRORxxxxxxx', we would fetch 'xxxxxxxxER', not match, then fetch 'RORxxxxxxx', and still not match. Whereas with the extra length we'd fetch 'xxxxxxxxERRORxx' and correctly find the match.

@Calvinp

Calvinp Apr 4, 2016

Contributor

If we fetch just the amount we're supposed to, then it's possible that the match will be half in one fetched chunk and half in the next, in which case we wouldn't match it.
For example, we're looking for 'ERROR'. If we fetch 10 bytes, and the log file is 'xxxxxxxxERRORxxxxxxx', we would fetch 'xxxxxxxxER', not match, then fetch 'RORxxxxxxx', and still not match. Whereas with the extra length we'd fetch 'xxxxxxxxERRORxx' and correctly find the match.

Calvinp added some commits Apr 4, 2016

@@ -34,6 +37,8 @@
private static final String REQUEST_LINK_FORMAT = "%s/request/%s";
private static final String LOG_LINK_FORMAT = "%s/task/%s/tail/%s";
private static final String DEFAULT_TIMESTAMP_FORMAT = "MMM dd h:mm:ss a zzz";
+ private static final Optional<String> DEFAULT_TASK_LOG_ERROR_REGEX = Optional.of("ERROR");
+ private static final Long MAX_LOG_SEARCH_OFFSET = 1000000L;

This comment has been minimized.

@Calvinp

Calvinp Apr 4, 2016

Contributor

I am very open to ideas on the best value for this constant (the maximum number of bytes to search through for an error before giving up and showing the bottom of the file) as well. 1,000,000 bytes was completely arbitrary and isn't necessarily the best value. We can put this into the smtp config too if we would like, but we'll still need to choose a default.

@Calvinp

Calvinp Apr 4, 2016

Contributor

I am very open to ideas on the best value for this constant (the maximum number of bytes to search through for an error before giving up and showing the bottom of the file) as well. 1,000,000 bytes was completely arbitrary and isn't necessarily the best value. We can put this into the smtp config too if we would like, but we'll still need to choose a default.

This comment has been minimized.

@tpetr

tpetr Apr 4, 2016

Member

Yep, this should be configured in SMTPConfiguration. 100kb is a good default.

@tpetr

tpetr Apr 4, 2016

Member

Yep, this should be configured in SMTPConfiguration. 100kb is a good default.

@@ -70,6 +70,10 @@
@JsonProperty
private List<String> taskEmailTailFiles = Arrays.asList("stdout", "stderr");
+ @NotNull
+ @JsonProperty
+ private String taskLogErrorRegex = "";

This comment has been minimized.

@tpetr

tpetr Apr 4, 2016

Member

we should make this an Optional<String> (i.e. make this functionality opt-in)

@tpetr

tpetr Apr 4, 2016

Member

we should make this an Optional<String> (i.e. make this functionality opt-in)

@@ -34,6 +37,8 @@
private static final String REQUEST_LINK_FORMAT = "%s/request/%s";
private static final String LOG_LINK_FORMAT = "%s/task/%s/tail/%s";
private static final String DEFAULT_TIMESTAMP_FORMAT = "MMM dd h:mm:ss a zzz";
+ private static final Optional<String> DEFAULT_TASK_LOG_ERROR_REGEX = Optional.of("ERROR");

This comment has been minimized.

@tpetr

tpetr Apr 4, 2016

Member

Any defaults for configuration items should live in the configuration class (i.e. https://github.com/HubSpot/Singularity/pull/979/files#diff-c39ca42e7478ecb6adcf0bf05ac23373R75)

@tpetr

tpetr Apr 4, 2016

Member

Any defaults for configuration items should live in the configuration class (i.e. https://github.com/HubSpot/Singularity/pull/979/files#diff-c39ca42e7478ecb6adcf0bf05ac23373R75)

@@ -142,16 +160,69 @@ public MailTemplateHelpers(SandboxManager sandboxManager, SingularityConfigurati
}
}
- private Optional<Long> getMaybeOffset(String slaveHostname, String fullPath, String filename, Optional<String> directory) {
+ // Searches through the file, looking for first error
+ private Optional<Long> getMaybeOffset(final String slaveHostname, final String fullPath, final Long logLength, Optional<SingularityTask> task) {

This comment has been minimized.

@tpetr

tpetr Apr 4, 2016

Member

can you give this method a clearer name? what exactly does it do?

@tpetr

tpetr Apr 4, 2016

Member

can you give this method a clearer name? what exactly does it do?

Better method names, max offset to search through is now configurable…
…, defaults for new config items are now in SMTPConfiguration.java

@ssalinas ssalinas modified the milestone: 0.6.0 Apr 5, 2016

@Calvinp Calvinp added hs_staging and removed hs_staging labels Apr 5, 2016

Calvinp added some commits Apr 6, 2016

Add ability to decide whether task log error regex should be case sen…
…sitive or not. Defaults to not case sensitive

@Calvinp Calvinp merged commit 4c92142 into email-tail-logs-from-bottom Apr 6, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@tpetr tpetr removed the hs_staging label Apr 6, 2016

@ssalinas ssalinas deleted the email-find-first-error-in-logs branch Sep 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment