Automatic compression of backup log files #619
Conversation
Sorry for the extra commits but I don't have all Python versions on my dev env.
import os
import tarfile

def handle_compression(sfn, dfn):
    # Pack the rotated source log file (sfn) into a gzipped tar archive (dfn).
    with tarfile.open(dfn, "w:gz") as archive:
        archive.add(sfn, arcname=os.path.basename(sfn))
This single operation can potentially block supervisord for many seconds, depending on the size of the log file being compressed, the system it's running on, etc. supervisord is an asynchronous program. We can't block for long periods like this in supervisord because it will become unresponsive, and may destabilize programs running under it since it can't read their pipes if it's blocked.
I don't see an easy solution to this given the way logging is currently implemented. I think it may require compressing the log in pieces while returning control to the event loop, or coming up with a way to run the compressor in an eventlistener subprocess.
Didn't consider that, as I'm not 100% familiar with Supervisor's architecture.
Sure, a subprocess could easily handle the compression without blocking the daemon. I can provide such an implementation if needed.
It may be possible to do this inside supervisord in a non-blocking way, but I'm not sure how at the moment. Doing it in an eventlistener subprocess may be more straightforward and has no risk of blocking supervisord during compression. If you haven't written an eventlistener before, there are some examples in the Superlance package.
An eventlistener can get info about all processes using the getAllProcessInfo() method of the XML-RPC interface, and this includes the path to the current log files of each process.
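As a sketch, an eventlistener could collect those log paths like this. The helper name and the localhost:9001 URL are assumptions (the URL depends on your inet_http_server configuration); getAllProcessInfo() and the stdout_logfile/stderr_logfile result fields are part of supervisor's XML-RPC API.

```python
from xmlrpc.client import ServerProxy

def current_logfiles(proxy):
    """Map process name -> (stdout_logfile, stderr_logfile).

    ``proxy`` is typically ServerProxy("http://localhost:9001/RPC2"),
    which assumes supervisord's inet_http_server is enabled there.
    """
    return {
        info["name"]: (info["stdout_logfile"], info["stderr_logfile"])
        for info in proxy.supervisor.getAllProcessInfo()
    }

# Typical use, against a running supervisord:
# logs = current_logfiles(ServerProxy("http://localhost:9001/RPC2"))
```

Taking the proxy as a parameter keeps the function testable without a live daemon.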
The eventlistener wouldn't have a way to know when log rotation occurred. This could be addressed by adding a new event to supervisord to indicate when a rotation has occurred. It may not be necessary to do this; the eventlistener could just glob every so often on a TICK_ event. If a logfile.2 exists but no logfile.1.gz, it's time to gzip.
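That heuristic can be sketched directly. The function name is illustrative, and real bookkeeping (renumbering compressed backups, coexisting with supervisor's own backup pruning) is deliberately left out:

```python
import gzip
import os
import shutil

def maybe_gzip_backup(logfile):
    """Heuristic from the discussion, run on each TICK event.

    If ``logfile.2`` exists but ``logfile.1.gz`` does not, rotation has
    happened and the ``.1`` backup is complete, so gzip it and remove
    the original. Returns True if a file was compressed.
    """
    backup = logfile + ".1"
    if os.path.exists(logfile + ".2") and not os.path.exists(backup + ".gz"):
        with open(backup, "rb") as fin, gzip.open(backup + ".gz", "wb") as fout:
            shutil.copyfileobj(fin, fout)
        os.remove(backup)
        return True
    return False
```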
I have been thinking about the proposed alternative with an eventlistener subprocess, and it seems a bit overengineered to me. If we don't want to block supervisor with CPU-bound jobs, the easiest way is a separate Python process and a Queue. The daemon simply has to take care of correctly terminating the process once stopped. Once we rotate, it is enough to put the file path onto the queue and the dedicated process will take care of the compression. To detect the rare case in which so much log output is produced that the compression process cannot cope, we simply have to poll the Queue's size. If it is greater than, say, 10 * logfile_backups, we can print warnings in the daemon log, allowing the owner to spot the issue.
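A minimal sketch of that proposal, assuming a single worker process fed by a multiprocessing.Queue; all names are illustrative and none of this is supervisor API:

```python
import gzip
import multiprocessing
import os
import shutil

def _compress_worker(queue):
    # Worker process: gzip each path received; None is the shutdown sentinel.
    for path in iter(queue.get, None):
        with open(path, "rb") as fin, gzip.open(path + ".gz", "wb") as fout:
            shutil.copyfileobj(fin, fout)
        os.remove(path)

class CompressionQueue:
    """Offload compression of rotated logs to one dedicated process."""

    def __init__(self, logfile_backups=10):
        self.warn_threshold = 10 * logfile_backups  # backlog warning level
        self.queue = multiprocessing.Queue()
        self.worker = multiprocessing.Process(
            target=_compress_worker, args=(self.queue,))
        self.worker.start()

    def submit(self, path):
        try:
            backlog = self.queue.qsize()  # approximate; not available on macOS
        except NotImplementedError:
            backlog = 0
        if backlog > self.warn_threshold:
            # A real daemon would write to its own log instead of stdout.
            print("warning: log compression is falling behind")
        self.queue.put(path)

    def close(self):
        # The sentinel lets the worker drain the queue, then exit cleanly.
        self.queue.put(None)
        self.worker.join()
```

As the follow-up comment notes, this still leaves worker crashes, failed starts, and restart policy unhandled, which is exactly what the eventlistener machinery already provides.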
There are many things that can go wrong with running a subprocess. Making sure it terminates is one of them. It also might not start, or might die unexpectedly while it's running, and all sorts of other things. The eventlistener may seem "overengineered" but it allows us to handle all these conditions. If we spawn subprocesses in another way, then we have a parallel system where we have to deal with all these issues again. Supervisor users are accustomed to running eventlisteners to add features since many are running things like memmon already.
This is indeed true, I'll proceed as agreed then.
Closing for the reasons explained above.
I started developing the event-based plugin and I'm halfway through, but some questions came to mind. ATM I'm implementing it under superlance as a plugin, but there are a few things I don't like. Supervisord is in charge of controlling the rotation. It would announce it with a "LOG_ROTATION" event carrying the path of the old logfile as an extra parameter. The superlance plugin would react to it and compress the logfile. Now the problem comes when it's time to delete the old logfiles. The Supervisor logger module is not aware (and it shouldn't be) of any plugin. Yet the file has been renamed and its deletion would fail. The question is which system would be responsible for rotating, compressing and deleting the log files. Possible solutions would be:
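As background for the flow just described, the standard supervisor eventlistener loop (the READY/RESULT protocol over stdin/stdout) might look as follows. The LOG_ROTATION event and its logfile payload field are the *proposal* under discussion, not something supervisor emits today, and compress() is a placeholder:

```python
import sys

def parse_tokens(line):
    # Eventlistener headers and payloads are space-separated key:value tokens.
    return dict(token.split(":", 1) for token in line.split())

def compress(path):
    pass  # placeholder: gzip ``path`` here

def listen(stdin=sys.stdin, stdout=sys.stdout):
    """Minimal eventlistener loop reacting to the proposed LOG_ROTATION event."""
    while True:
        stdout.write("READY\n")
        stdout.flush()
        header = parse_tokens(stdin.readline())
        payload = stdin.read(int(header["len"]))
        if header["eventname"] == "LOG_ROTATION":
            compress(parse_tokens(payload)["logfile"])
        stdout.write("RESULT 2\nOK")
        stdout.flush()
```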
Not all plugins need to be included in Superlance (most aren't). We maintain a list of plugins and there are other logging-related plugins in that list. New plugins are added to the list by making a pull request against the list source.
The problem is not whether to put the plugin in Superlance or in an independent project. The problem is that, as the code is within an external code-base, Supervisor cannot see that the logs it's rotating are changing name and format. Therefore deleting them will fail. And, in my opinion, Supervisor should not behave differently according to which plugins are in use. That's one of the advantages of event-driven architectures: they allow the modules to be decoupled.
I think the way people are handling this currently is to configure Supervisor with the regular non-rotating logger. A program like logrotate is used and sends SIGUSR2 to Supervisor to signal it to reopen the logs. |
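For completeness, a sketch of that setup (paths, rotation frequency, and the pidfile location are illustrative; supervisor's built-in rotation is disabled by setting the program's stdout_logfile_maxbytes option to 0 so that only logrotate renames files):

```
# /etc/logrotate.d/myapp -- illustrative example
/var/log/myapp/*.log {
    daily
    rotate 10
    compress
    delaycompress
    postrotate
        # SIGUSR2 makes supervisord close and reopen its own log
        # and all child log files after the rename.
        kill -USR2 $(cat /var/run/supervisord.pid)
    endscript
}
```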
This attempts to fix issue #322.
I included tests and documentation. Please let me know if anything else is required.