A simple syslog-to-elasticsearch bridge with pluggable processing.
Calf expects log lines in the following format:

```
DATE(RFC3339) logsource program[pid]: message
```
In rsyslog, this can be achieved with an appropriate output template.
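For illustration only (a hedged sketch, not taken from calf's documentation), an rsyslog template along these lines produces that shape:

```
template(name="CalfFormat" type="string"
         string="%timereported:::date-rfc3339% %hostname% %syslogtag%%msg%\n")
action(type="omfile" file="/var/log/calf.log" template="CalfFormat")
```

Here `%syslogtag%` expands to `program[pid]:`, matching the expected prefix.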
Calf requires Python 3.4 or greater.
```
pip install calf
```
- `--cluster`: path to the Elasticsearch cluster. Default:
- `--path`: path to the syslog file to tail. Default:
- `--processors`: list of processors for building a log processing pipeline. These can be built-in processors or custom processors available on your Python path.
- `json`: tries to decode the message as JSON.
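Conceptually, the `json` processor behaves something like the following sketch (a hypothetical illustration, not calf's actual implementation):

```python
import json

def json_processor(event_dict, log_line):
    """Merge JSON-decoded message fields into the event.

    If the message body is not valid JSON, the event passes
    through unchanged.
    """
    try:
        decoded = json.loads(event_dict['message'])
    except (ValueError, KeyError):
        return event_dict
    if isinstance(decoded, dict):
        event_dict.update(decoded)
    return event_dict
```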
Writing custom processors
Each processor is a Python function that takes two arguments: an event dictionary and the raw log message. The processor must return the event dict or None. If None is returned by a processor, the event is dropped.
```python
def my_processor(event_dict, log_line):
    return event_dict
```
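For example, a processor can use the drop behavior to filter out noise. This is a sketch with an assumed message convention (the `DEBUG` prefix is illustrative):

```python
def drop_debug(event_dict, log_line):
    # Returning None removes the event from the pipeline, so it
    # never reaches Elasticsearch.
    if event_dict.get('message', '').startswith('DEBUG'):
        return None
    return event_dict
```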
Once your processors are written, pass them to calf using the dotted path to your functions. E.g., assuming the processor above was written to a `custom_processors.py` module:

```
calf --processors custom_processors.my_processor
```
Registering all processors at once
It is also possible to point to a list of processors, which avoids the tedious process of passing each processor on the command line. Simply construct a list at the end of your custom processor definitions:

```python
all_processors = [
    my_processor,
    my_other_processor,
]
```
And pass this list to calf:

```
calf --processors custom_processors.all_processors
```
Base event data
If you plan to do some heavier processing for specific messages (e.g. HTTP logs), it can be useful to look at the base event data to avoid expensive processing on irrelevant logs. The following attributes are available:
- `source_host`: the host on which calf is running.
- `program`: the originating program.
- `message`: the actual message, with syslog prefixes stripped.
- `_type`: `relp` by default. You can set it to something else if that eases processing.
- `logsource`: the host from which the log message was issued.
- `logsource2` (not always present): sometimes an extended version of `logsource`.
- `pid` (not always present): the program PID, when available.
For instance, to parse HTTP logs:
```python
def http_parser(event_dict, _):
    if event_dict['program'] != 'nginx':
        return event_dict
    # parse log & return a richer event_dict
    event_dict['_type'] = 'nginx'
    return event_dict
```
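A fuller sketch of such a parser, assuming nginx writes its default combined log format (the regex and the extracted field names are illustrative assumptions, not mandated by calf):

```python
import re

# Combined log format, e.g.:
# 1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
NGINX_RE = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<body_bytes>\d+)'
)

def http_parser(event_dict, _):
    if event_dict.get('program') != 'nginx':
        return event_dict
    match = NGINX_RE.match(event_dict['message'])
    if match is not None:
        # Enrich the event with the parsed request fields and tag
        # the event class with a distinct _type.
        event_dict.update(match.groupdict())
        event_dict['status'] = int(event_dict['status'])
        event_dict['_type'] = 'nginx'
    return event_dict
```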
If you add more data to a class of events (e.g. by program), it is recommended to alter the event's `_type` to something unique for that class of events. Elasticsearch infers its mappings for each type, and a distinct `_type` avoids mapping conflicts (e.g. a field named `transaction_id` being a `long` for one event type and a `string` for another).