-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deal correctly with errors when archiving logfiles #27
Conversation
Before logfiles are rotated (i.e., removed) a series of processing tasks are performed onto them. Among these, the first one is to archive the logfile into the NGAS server itself, if the server has been configured for this. During normal operations this is not a problem, but exactly on the first try, when the janitor process has just been created and the HTTP server might not be bound yet, it might result on an ECONNREFUSED error. This produced big error messages on the logs, while in reality this is a transient error that should disappear on the next try. A second, more general problem, was found while inspecting this code: logfiles were not renamed back to have their original ".unsaved" extensions when errors happened. This meant that when errors in general were found (and in particular when ECONNREFUSED was raised) logfiles were not picked up by successive janitor cycles. This commit acknowledges these problems, improving the handling of the ECONNREFUSED error in particular, and of errors in general. On the on hand, when the ECONNREFUSED error is encountered we simply issue a warning log statement instead of letting the exception to propagate up through the stack. On the other hand, if *any* error happens during archiving we rename the file back to its original *.unsaved name so it gets picked up again in the next janitor cycle. To make code and error handling a bit simpler I took the chance of moving the archiving of files into a separate try_archiving() function, whose invocation is then surrounded by the error handling block. Additionally I also added a sorted() call to process unsaved logfiles in time order, which until now wasn't guaranteed (and is a nice property to have). This commit addresses #26. Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
mvFile(unsaved, fname) | ||
|
||
# Connect to the server and send a pull ARCHIVE request | ||
if cfg.getArchiveRotatedLogfiles(): | ||
file_uri = "file://" + fname | ||
host, port = srvObj.get_self_endpoint() | ||
proto = srvObj.get_server_access_proto() | ||
ngamsPClient.ngamsPClient(host, port, proto=proto).archive( | ||
file_uri, 'ngas/nglog') | ||
try: | ||
mvFile(unsaved, fname) | ||
try_archiving(cfg, srvObj, fname) | ||
except Exception as e: | ||
mvFile(fname, unsaved) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our conversation, is it necessary to first mvFile
then on a possible exception restore? Is it more worthwhile to mvFile
only after the successful archive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I absolutely agree that the first move is not required in principle, and would make exception handling simpler. However, the ngamsPClient.archive
method doesn't allow to pass down arbitrary filenames, and will always use the basename of the file on disk that is being archived.
As discussed, I'll just go ahead then with these changes, but will create an issue to remember implementing this improvement in the future.
Before logfiles are rotated (i.e., removed) a series of processing tasks are performed onto them. Among these, the first one is to archive the logfile into the NGAS server itself, if the server has been configured for this. During normal operations this is not a problem, but exactly on the first try, when the janitor process has just been created and the HTTP server might not be bound yet, it might result on an ECONNREFUSED error. This produced big error messages on the logs, while in reality this is a transient error that should disappear on the next try.
A second, more general problem, was found while inspecting this code: logfiles were not renamed back to have their original ".unsaved" extensions when errors happened. This meant that when errors in general were found (and in particular when ECONNREFUSED was raised) logfiles were not picked up by successive janitor cycles.
This commit acknowledges these problems, improving the handling of the ECONNREFUSED error in particular, and of errors in general. On the on hand, when the ECONNREFUSED error is encountered we simply issue a warning log statement instead of letting the exception to propagate up through the stack. On the other hand, if any error happens during archiving we rename the file back to its original *.unsaved name so it gets picked up again in the next janitor cycle.
To make code and error handling a bit simpler I took the chance of moving the archiving of files into a separate try_archiving() function, whose invocation is then surrounded by the error handling block. Additionally I also added a sorted() call to process unsaved logfiles in time order, which until now wasn't guaranteed (and is a nice property to have).
This addresses #26.