Log File Rotation Error #1

Closed
scyclops opened this Issue May 24, 2013 · 7 comments


I've been using logcabin for a few weeks and it's great, but every time it tries to rotate a log file it hits an error outputting the fileroll event on line 75 of outputs/file.py, because self.output is None. Other places in the code check whether self.output is set before using it, but there is no such check in file.py.
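The guard I'd expect is the same one the other outputs use. Rough sketch only (fileroll_event is a stand-in name; self.output and the put call are what I see in outputs/file.py):

# hypothetical shape of the fix in outputs/file.py, mirroring the
# guard the other outputs already have:
if self.output is not None:
    self.output.put(fileroll_event)  # line 75 crashes here when self.output is None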

I am using the File class via a custom subclass, so I'm not sure whether that's affecting the behavior. In case it's the culprit, I'm including my CustomFile class code below (it just modifies the File class to run a custom filter function on each event before it's written out).

import os

# assumed import path, based on the outputs/file.py referenced above
from logcabin.outputs.file import File


class CustomFile(File):
    def __init__(self, filter=None, *args, **kwargs):
        super(CustomFile, self).__init__(*args, **kwargs)
        # filter turns an event into the string written to the log file;
        # default to the event's JSON representation
        self.filter = filter or (lambda ev: ev.to_json())

    def process(self, event):
        filename = event.format(self.filename)
        # rotate first if the file has grown past max_size
        if self.max_size and os.path.exists(filename) and os.path.getsize(filename) > self.max_size:
            self._rotate(filename, event)

        dirname = os.path.dirname(filename)
        if dirname and not os.path.exists(dirname):
            os.makedirs(dirname)

        # files are created with the default python umask (rw-r--r--)
        with open(filename, 'a') as fout:
            print >>fout, self.filter(event)
Contributor

barnybug commented May 24, 2013

Hi, thanks for the report. Could I check you're using the latest release? (0.1b12)

Also, could you produce a unit test case demonstrating the issue with the vanilla File output?
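Something along these lines would do (untested skeleton; the Event import path and constructor are guesses, adjust to the real API):

from logcabin.outputs.file import File
from logcabin.event import Event  # assumed module path


def test_fileroll_with_no_output():
    logfile = '/tmp/logcabin_roll_test.log'
    # grow the file past max_size so the next process() triggers a rotation
    with open(logfile, 'w') as f:
        f.write('x' * 16)
    out = File(filename=logfile, max_size=8)
    # out.output is None because nothing is wired downstream, so emitting
    # the fileroll event should raise AttributeError
    out.process(Event(message='hello'))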

thanks

scyclops commented May 24, 2013

I am using the latest release. I've fixed this in my version by setting self.output to an object with a put method that does nothing, so the fileroll event output call in file.py becomes a no-op.
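Roughly like this (sketch only; _NullOutput is just my stand-in name):

class _NullOutput(object):
    def put(self, event):
        # swallow the fileroll event so self.output.put() in file.py is a no-op
        pass

# set after constructing the output, before any rotation happens:
custom_file.output = _NullOutput()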


Contributor

barnybug commented May 24, 2013

Perhaps it's related to your configuration; could you provide it?

scyclops commented

Sure, it's below. I'm using mostly custom classes to achieve the behavior I'm looking for.

from flow import Fanout, Switch

from input_subzeromq import SubZeromq

from filter_msgpack import MsgPack

from output_customfile import CustomFile
from output_loggly import Loggly
from output_splunk import Splunk


LOG_FILE_SIZE = 20 * 1024 * 1024


def msg_filter(ev):
    """ Remove _logfn field and handle string only logs. """

    if '_msg' in ev:
        return ev._msg

    if '_logfn' in ev:
        del ev._logfn

    return ev.to_json()


# input creates events with a data field containing the data received
# using a SUB socket so that messages are dropped instead of blocking
# when in exceptional states (e.g. high-water mark reached)
SubZeromq(address='tcp://127.0.0.1:2271',
          sync_address='tcp://127.0.0.1:5562')

# parses msgpack string from the default event data field that the
# Zeromq input sets - removes the data field and adds all the fields
# from the msgpack encoded dict to the event
MsgPack(field='data', consume=True)

# output events in parallel
with Fanout():
    with Switch() as case:
        with case('_logfn != None and _logfn'):
            # logs to logfile specified in the event
            CustomFile(filename='/var/log/logcabin/{_logfn}.log',
                       max_size=LOG_FILE_SIZE,
                       compress='gz',
                       filter=msg_filter)

        with case.default:
            CustomFile(filename='/var/log/logcabin/default.log',
                       max_size=LOG_FILE_SIZE,
                       compress='gz',
                       filter=msg_filter)

    Loggly(endpoint="https://logs.loggly.com/inputs/xyz")

    Splunk(project_id='123',
           access_token='abc')
Contributor

barnybug commented Jun 1, 2013

I can only guess it's related to some of the custom code you're using. If you can produce an example that reproduces the issue with the built-ins, I'll look into it further.

Contributor

barnybug commented Jun 1, 2013

Also, if you feel keen, it'd be great to get a patch for the Loggly/Splunk functionality to incorporate into master.

scyclops commented Jun 3, 2013

I've done some monkey-patching in my custom code to handle the error I was running into (by setting self.output to a fake object so it's not None).

I don't have time to clean up my output classes and make them suitable for general consumption, but below is the existing code if anyone else wants to incorporate it. There are a couple of things specific to my usage of logcabin (the special _logfn field, for example).

output_splunk.py

from logcabin.outputs.output import Output

import gevent
import gevent.monkey
gevent.monkey.patch_socket()

import requests
from ujson import dumps, loads
import platform

from requests.packages.urllib3 import connectionpool


# prevent urllib3 logs from polluting the log output
class FakeLog(object):
    def info(self, *args, **kwargs):
        pass
    warning = info
    warn = info
    debug = info
    error = info
    exception = info

connectionpool.log = FakeLog()


# TODO: check if splunk supports gzipped posts and submit
#       logs compressed for speed if it does

class Splunk(Output):
    RETRIES = 10

    def __init__(self, project_id, access_token):
        super(Splunk, self).__init__()

        self.url = 'https://api.splunkstorm.com/1/inputs/http'
        self.project_id = project_id
        self.access_token = access_token

        self.host = platform.node()
        self.sourcetype = 'logcabin'

    def process(self, event):
        data = dumps(event)

        params = {'project': self.project_id,
                  'sourcetype': self.sourcetype}

        if self.host:
            params['host'] = self.host

        if '_logfn' in event:
            # not popping _logfn from the event because I'm not sure
            # whether mutating it at this stage is allowed
            params['source'] = event['_logfn']

        delay = 1.0
        for retry in xrange(self.RETRIES):
            try:
                r = requests.post(self.url, params=params, data=data,
                                  auth=('x', self.access_token))
                r.raise_for_status()
                d = loads(r.text)

                # splunk sometimes reports a byte count different from
                # len(data), so just check that something was accepted
                if d['bytes'] > 0:
                    break

                self.logger.warn('unexpected splunk return: %s' % repr(d))

            except Exception as ex:
                delay *= 2.0
                self.logger.warn('Splunk log error: %s, retrying in %ds' %
                                 (ex, delay))
                gevent.sleep(delay)

output_loggly.py

from logcabin.outputs.output import Output

import gevent
import gevent.monkey
gevent.monkey.patch_socket()

import urllib2
from ujson import dumps, loads


# TODO: check if loggly supports gzipped posts and submit
#       logs compressed for speed if it does

class Loggly(Output):
    RETRIES = 10

    def __init__(self, endpoint):
        super(Loggly, self).__init__()
        self.endpoint = endpoint

    def process(self, event):
        data = dumps(event)

        delay = 1.0
        for retry in xrange(self.RETRIES):
            try:
                res = urllib2.urlopen(self.endpoint, data=data)
                d = loads(res.read())

                if d['response'] == 'ok':
                    break

                self.logger.warn('unknown loggly return: %s' % repr(d))

            except Exception as ex:
                delay *= 2.0
                self.logger.warn('Loggly log error: %s, retrying in %ds' %
                                 (ex, delay))
                gevent.sleep(delay)

scyclops closed this Jun 3, 2013
