A command line log parsing/reporting tool for Apache combined format logs.

===================
Overview
===================

Loghetti is a command-line program, written in Python, that can help you pull data from your Apache logs without having to write complex regular expressions. It takes your log lines, puts them through a strainer, and leaves you with the bits you actually want -- kinda like spaghetti, but for your log files :-)

For example:

./loghetti.py --code=404 --file=access.log

will return only those lines whose HTTP response code is 404.
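
Under the hood, a query like `--code=404` amounts to splitting each line into fields and keeping the lines whose field matches. Here is a minimal Python 3 sketch of that idea. It is not loghetti's actual code, and the regex and the names `COMBINED_RE`, `parse_line`, and `filter_by_code` are hypothetical:

```python
import re

# Hypothetical regex for the Apache "combined" format:
# host ident authuser [time] "request" status bytes "referrer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one combined-format line, or None if it doesn't match."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None


def filter_by_code(lines, code):
    """Yield only the lines whose HTTP status code equals `code`."""
    for line in lines:
        fields = parse_line(line)
        if fields is not None and fields["status"] == str(code):
            yield line
```

Because `filter_by_code` is a generator, it can stream a large access.log one line at a time instead of loading the whole file into memory.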

While there are still many features to be added, and my aspirations for it are much greater, loghetti already supports querying logs by query string parameters, specific dates and times, client IP addresses, HTTP methods, referrer, and most other data contained in Apache's combined format logs. Here's a more complex example:

./loghetti.py --ip=192.168.1.8 --code=200 --month=1 --day=31 --hour=13 --urldata=user:foo --file=access.log

This pulls all lines generated by user "foo" (from a URL parameter like "http://yourdomain.com/index.php?user=foo") at IP address 192.168.1.8, with a response code of 200, on January 31 between 1:00 and 1:59 PM.
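
Combining criteria is then a matter of checking each one in turn. The following rough Python 3 sketch assumes each line has already been parsed into a dict with hypothetical `host`, `status`, `time`, and `request` keys. The `matches` function and its parameters mirror the command-line options but are not loghetti's actual API. It uses `datetime.strptime` for Apache's timestamp and `urllib.parse.parse_qs` for the query string:

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Apache's %t timestamp, e.g. "31/Jan/2009:13:05:12 -0500" (%b assumes an English/C locale).
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def matches(fields, ip=None, code=None, month=None, day=None, hour=None, urldata=None):
    """Return True if a parsed entry satisfies every criterion that is not None.

    `fields` is a hypothetical dict with 'host', 'status', 'time' and 'request' keys.
    """
    if ip is not None and fields["host"] != ip:
        return False
    if code is not None and fields["status"] != str(code):
        return False

    stamp = datetime.strptime(fields["time"], APACHE_TIME_FORMAT)
    if month is not None and stamp.month != month:
        return False
    if day is not None and stamp.day != day:
        return False
    # hour=13 keeps everything from 13:00:00 through 13:59:59.
    if hour is not None and stamp.hour != hour:
        return False

    if urldata is not None:
        # "user:foo" means the query string parameter "user" must have the value "foo".
        key, _, value = urldata.partition(":")
        request_parts = fields["request"].split()  # e.g. ["GET", "/index.php?user=foo", "HTTP/1.1"]
        url = request_parts[1] if len(request_parts) > 1 else ""
        if value not in parse_qs(urlsplit(url).query).get(key, []):
            return False

    return True
```

Criteria left as `None` are simply skipped, so the same function covers both the one-option and the many-option examples above.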

===================
Status and Requirements
===================
Currently, loghetti is a beta product. It requires argparse, which is not in the standard library of the Python versions listed below (argparse joined the standard library in Python 2.7 and 3.2), and I do plan to port loghetti to Python 3. It comes with, and requires, a heavily modified version of the apachelogs.py module originally written by Kevin Scott; that module can also be used on its own for whatever other Apache log parsing needs you may have.
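
argparse is what handles the `--option=value` syntax used in the examples above. As a rough Python 3 sketch, those options might be declared as shown below. These are not loghetti's actual option definitions, and `build_parser` is a hypothetical name:

```python
import argparse


def build_parser():
    """Declare the query options shown in this README (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Filter Apache combined-format log lines.")
    parser.add_argument("--file", required=True, help="log file to read")
    parser.add_argument("--code", type=int, help="HTTP response code, e.g. 404")
    parser.add_argument("--ip", help="client IP address")
    # choices= makes argparse reject out-of-range dates and hours with a usage error.
    parser.add_argument("--month", type=int, choices=range(1, 13))
    parser.add_argument("--day", type=int, choices=range(1, 32))
    parser.add_argument("--hour", type=int, choices=range(0, 24))
    parser.add_argument("--urldata", help="query string parameter as key:value")
    return parser
```

Calling `build_parser().parse_args()` then returns a namespace with `args.code`, `args.file`, and so on, with `None` for any option that was not given.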

It has been tested with Python 2.4-2.6 on Linux and Mac OS X. It should work on other platforms with Python 2.4-2.6, and may even work with older Python versions. For the moment, it requires that your logs be in Apache's 'combined' format :-)

===================
TODO
===================

There are tons of features I'd like to support. A couple of them are already sort of stubbed out in the code, but most are not. If you'd like to help, join the project and lend a hand!

I'd like to see support for queries using date ranges, negated queries, queries using comparison operators other than "=" (e.g., "--month>10"), and a summary reporting option that can, for example, group 404 responses by the requested URL or by the client IP.
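
None of these exist in loghetti yet. One possible approach is shown in the following Python 3 sketch, which uses hypothetical names and assumes each log entry has already been parsed into a dict of string fields. Comparison and negated queries map operator symbols to functions from Python's `operator` module, and the summary report is a `collections.Counter`. On a typical shell, an argument like `--month>10` would need quoting, since `>` is the redirection operator.

```python
import operator
import re
from collections import Counter

# Hypothetical table mapping comparison symbols to functions from the operator module.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,  # a negated query
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

# Field name, then an operator (two-character forms tried first), then a value.
QUERY_RE = re.compile(r"^(\w+)(>=|<=|!=|>|<|=)(.+)$")


def parse_query(text):
    """Split a query such as 'month>10' into (field, comparison function, value)."""
    match = QUERY_RE.match(text)
    if match is None:
        raise ValueError("unrecognized query: %r" % text)
    field, symbol, value = match.groups()
    return field, OPERATORS[symbol], value


def entry_matches(entry, query):
    """Apply one query to a parsed entry (a dict of string fields)."""
    field, compare, value = parse_query(query)
    actual = entry[field]
    # Compare as integers when both sides are digits, so that 9 < 10 holds.
    if actual.isdigit() and value.isdigit():
        return compare(int(actual), int(value))
    return compare(actual, value)


def summarize(entries, code, group_by):
    """Count entries with a given status code, grouped by another field such as 'url' or 'host'."""
    return Counter(e[group_by] for e in entries if e["status"] == str(code))
```

The numeric comparison keeps `month>10` from treating "9" as greater than "10", which a plain string comparison would do. `summarize(entries, 404, "url").most_common(10)` would then list the ten most frequently requested missing URLs.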

It might also be nice to either accept a format string of some sort, to handle logs that aren't exactly in "combined" format, or accept a format name on the command line and look up that format in the httpd.conf file. This stuff will probably be added after we get the feature set right for one format.
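
Apache defines log formats with the `LogFormat` directive in httpd.conf, using directives such as `%h` (remote host), `%t` (time), `%r` (request line), and `%>s` (final status). One way to accept a format string would be to translate those directives into a regular expression. This Python 3 sketch uses hypothetical names and covers only the directives the combined format uses. Any other `%` directive is left as literal text and would simply fail to match.

```python
import re

# Regex fragments for a handful of Apache LogFormat directives (a partial,
# hypothetical mapping; Apache defines many more directives than these).
SIMPLE_DIRECTIVES = {
    "%h": r"(?P<host>\S+)",
    "%l": r"(?P<ident>\S+)",
    "%u": r"(?P<user>\S+)",
    "%t": r"\[(?P<time>[^\]]+)\]",
    "%r": r'(?P<request>[^"]*)',
    "%>s": r"(?P<status>\d{3})",
    "%b": r"(?P<size>\S+)",
}

# Matches a request-header directive such as %{Referer}i, or one of the simple ones above.
TOKEN_RE = re.compile(r"%\{[A-Za-z0-9-]+\}i|%>s|%[hlutrb]")


def directive_to_regex(directive):
    """Return the regex fragment for one recognized LogFormat directive."""
    if directive.startswith("%{"):
        # %{User-agent}i -> named group "user_agent", matching up to the next quote.
        header = directive[2:-2]
        return r'(?P<%s>[^"]*)' % header.lower().replace("-", "_")
    return SIMPLE_DIRECTIVES[directive]


def format_to_regex(log_format):
    """Compile a LogFormat string into a regex; text between directives is escaped literally."""
    parts = []
    position = 0
    for token in TOKEN_RE.finditer(log_format):
        parts.append(re.escape(log_format[position:token.start()]))
        parts.append(directive_to_regex(token.group(0)))
        position = token.end()
    parts.append(re.escape(log_format[position:]))
    return re.compile("".join(parts) + "$")


# The "combined" format as commonly defined in httpd.conf, with its escaped quotes unescaped.
COMBINED = '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"'
```

Building the regex from the format string means a second format needs only a new string, not new parsing code.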

Of course, others may have other ideas, and I'd like to entertain those ideas as well!

I'd love for this to become something like a log reporting framework: a tool I can use to generate statistical reports, split up log files, or do something else I can't think of right now, very quickly and with maybe a few lines of code (or the right options to the existing code). Loghetti isn't a "100% no-brainer" yet, but it is very useful as-is; I just have much greater aspirations for it. Give it a go, and let me know your thoughts!

brian.