
Automatic compression of backup log files #619

Closed · wants to merge 5 commits

Conversation

noxdafox

This attempts to fix issue #322.

I included tests and documentation. Please let me know if anything else is required.

@noxdafox
Author

Sorry for the extra commits but I don't have all Python versions on my dev env.


def handle_compression(sfn, dfn):
    with tarfile.open(dfn, "w:gz") as archive:
        archive.add(sfn, arcname=os.path.basename(sfn))
Member

This single operation can potentially block supervisord for many seconds, depending on the size of the log file being compressed, the system it's running on, etc. supervisord is an asynchronous program. We can't block for long periods like this in supervisord because it will become unresponsive, and may destabilize programs running under it since it can't read their pipes if it's blocked.

I don't see an easy solution to this given the way logging is currently implemented. I think it may require compressing the log in pieces while returning control to the event loop, or coming up with a way to run the compressor in an eventlistener subprocess.
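As a rough illustration of the first option (not code from this PR; the function name and chunk size are made up), compressing in pieces could look something like a generator that yields control between chunks:

import gzip
import os

def compress_in_pieces(sfn, dfn, chunk_size=64 * 1024):
    # Hypothetical sketch: copy the rotated log into a gzip file one chunk
    # at a time, yielding between chunks so the caller (e.g. supervisord's
    # select loop) could regain control. How the generator would actually be
    # driven by the event loop is left open.
    with open(sfn, "rb") as src, gzip.open(dfn, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            yield
    os.remove(sfn)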

Author

Didn't consider that as I'm not 100% familiar with Supervisor's architecture.

Sure, a subprocess could easily handle the compression without blocking the daemon. I can provide such an implementation if needed.

Member

It may be possible to do this inside supervisord in a non-blocking way, but I'm not sure how at the moment. Doing it in an eventlistener subprocess may be more straightforward and has no risk of blocking supervisord during compression. If you haven't written an eventlistener before, there are some examples in the Superlance package.

An eventlistener can get info about all processes using the getAllProcessInfo() method of the XML-RPC interface, and this includes the path to the current log files of each process.
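For example, from inside an eventlistener (where SUPERVISOR_SERVER_URL is present in the environment) the lookup might look roughly like this:

import os
from supervisor import childutils

# Obtain the XML-RPC interface from the environment and ask for every
# process's current log paths.
rpc = childutils.getRPCInterface(os.environ)
for info in rpc.supervisor.getAllProcessInfo():
    print("%s %s %s" % (info['name'], info['stdout_logfile'], info['stderr_logfile']))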

The eventlistener wouldn't have a way to know when log rotation occurred. This could be addressed by adding a new event to supervisord to indicate when a rotation has occurred. It may not be necessary to do this; the eventlistener could just glob every so often on a TICK_ event. If a logfile.2 exists but no logfile.1.gz, it's time to gzip.
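A minimal sketch of that idea, assuming the listener is registered in supervisord.conf with an [eventlistener:x] section and events=TICK_60; the log directory and the file-name pattern are placeholders rather than anything agreed here:

import glob
import gzip
import os
import shutil
import sys

LOG_DIR = "/var/log/supervisor"   # assumption; could come from getAllProcessInfo()

def compress_backups(log_dir):
    # If a rotated backup (e.g. foo.log.1) exists without a matching .gz, gzip it.
    # Whether to remove the original afterwards is an open question.
    for backup in glob.glob(os.path.join(log_dir, "*.log.[0-9]*")):
        if backup.endswith(".gz") or os.path.exists(backup + ".gz"):
            continue
        with open(backup, "rb") as src, gzip.open(backup + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)

def main():
    while True:
        # Standard eventlistener protocol: announce readiness, read one event
        # header plus payload, do the work, then acknowledge with OK.
        sys.stdout.write("READY\n")
        sys.stdout.flush()
        headers = dict(x.split(":", 1) for x in sys.stdin.readline().split())
        sys.stdin.read(int(headers["len"]))   # TICK payload is not needed here
        compress_backups(LOG_DIR)
        sys.stdout.write("RESULT 2\nOK")
        sys.stdout.flush()

if __name__ == "__main__":
    main()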

@noxdafox
Author

noxdafox commented Jun 8, 2015

I have been thinking about the proposed alternative with eventlistener subprocess and it seems a bit overengineered to me.

If we don't want to block supervisor with CPU-bound jobs, the easiest way to do so is with a separate Python process and a Queue. The daemon would simply need to take care of correctly terminating the process once stopped. Once we rotate, it's enough to put the file path onto the queue and the dedicated process will take care of the compression.

To detect the rare cases in which so much log output is produced that the compression process cannot keep up, we simply have to poll the Queue size. If it grows beyond, say, 10 * logfile_backups, we can print warnings to the daemon log so the owner can spot the issue.
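For concreteness, a minimal sketch of the proposed worker-and-queue approach (this is not code from this PR; the function name, the None shutdown sentinel and the warning threshold shown in the comments are assumptions):

import multiprocessing
import os
import tarfile

def compression_worker(queue):
    # Dedicated process: compress whatever paths the daemon puts on the queue.
    while True:
        sfn = queue.get()
        if sfn is None:            # sentinel pushed by the daemon on shutdown
            break
        with tarfile.open(sfn + ".tar.gz", "w:gz") as archive:
            archive.add(sfn, arcname=os.path.basename(sfn))
        os.remove(sfn)

queue = multiprocessing.Queue()
worker = multiprocessing.Process(target=compression_worker, args=(queue,))
worker.daemon = True
worker.start()

# At rotation time the daemon would only need: queue.put(rotated_logfile_path)
# and, periodically: if queue.qsize() > 10 * logfile_backups, log a warning.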

@mnaberez
Member

mnaberez commented Jun 8, 2015

If we don't want to block supervisor with CPU-bound jobs, the easiest way to do so is with a separate Python process and a Queue. The daemon would simply need to take care of correctly terminating the process once stopped.

There are many things that can go wrong with running a subprocess. Making sure it terminates is one of them. It also might not start, or might die unexpectedly while it's running, and all sorts of other things. The eventlistener may seem "overengineered" but it allows us to handle all these conditions. If we spawn subprocesses in another way, then we have a parallel system where we have to deal with all these issues again. Supervisor users are accustomed to running eventlisteners to add features since many are running things like memmon already.

@noxdafox
Author

noxdafox commented Jun 9, 2015

This is indeed true; I'll proceed as agreed then.

@mnaberez
Member

mnaberez commented Aug 4, 2015

Closing for the reasons explained above.

mnaberez closed this Aug 4, 2015
@noxdafox
Author

I started developing the event-based plugin and I'm halfway through, but some questions came to mind.

ATM I'm implementing it under superlance as a plugin, but there are a few things which I don't like.

Supervisord is in charge of controlling the rotation. It would announce each rotation with a "LOG_ROTATION" event carrying the path of the old logfile as an extra parameter. The superlance plugin would react to it and compress the logfile.

Now the problem comes when it's time to delete the old logfiles. The Supervisor logger module is not aware (and shouldn't be) of any plugin. Yet the file has been renamed, and its deletion would fail.

The question is which system would be responsible for rotating, compressing and deleting the log files.

Possible solutions would be:

  • do not rename the compressed files. I don't like it as it would be misleading for the end users
  • notify back supervisor with the new filename. I don't like this solution either as it would be quite messy from an architectural point of view (a logger doing event-based communication?).
  • integrate the plugin within supervisor itself so the logger would be in control. In such case I need more information on how to correctly do that.

@mnaberez
Member

ATM I'm implementing it under superlance as a plugin, but there are a few things which I don't like.

Not all plugins need to be included in Superlance (most aren't). We maintain a list of plugins, and there are other logging-related plugins in that list. New plugins are added to the list by making a pull request against the list source.

@noxdafox
Author

The problem is not whether to put the plugin in Superlance or in an independent project.

The problem is that, since the code lives in an external code base, Supervisor cannot see that the logs it's rotating are changing name and format. Therefore, deleting them will fail.

And, in my opinion, Supervisor should not behave differently according to which plugins are in use. That's one of the advantages of event-driven architectures: it allows the modules to be decoupled.

@mnaberez
Member

I think the way people are handling this currently is to configure Supervisor with the regular non-rotating logger. A program like logrotate is used and sends SIGUSR2 to Supervisor to signal it to reopen the logs.
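For reference, such a setup is often a small logrotate rule whose postrotate step signals supervisord; the paths and rotation options below are illustrative assumptions, and on the Supervisor side the relevant *_logfile_maxbytes options would be set to 0 so Supervisor never rotates on its own:

# /etc/logrotate.d/supervisor  (illustrative; paths are assumptions)
/var/log/supervisor/*.log {
    daily
    rotate 10
    compress
    delaycompress
    missingok
    postrotate
        kill -USR2 $(cat /var/run/supervisord.pid)
    endscript
}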
