-
Notifications
You must be signed in to change notification settings - Fork 0
VBaratham/general_log_plus
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
general_log_plus is a framework for processing/expanding MySQL's general_log. With general_log_plus, one can easily add columns or process/clean/reduce or otherwise massage the data in the general_log. SELECTORS Selectors are strings which are placed in the WHERE clause of the query that pulls data from the general_log. PREFILTERS Prefilters are functions that take a row of the raw general_log and return True or False. If a prefilter returns False on a row, that row will be skipped. Prefilters can always be converted into Processors that raise SkipRowExceptions, but skipping rows earlier can be more efficient. There are several functions for generating certain prefilter functions in Prefilters.py, or you can defined them yourself with 'def' or 'lambda' (the normal ways for defining callables in Python). PROCESSORS The heart of general_log_plus, Processors are objects that have 2 attributes: 1.) inputs - A list of input column names (taken from either the raw general_log, or the output of a previous Processor) 2.) outputs - A list of output column names, which will be added onto each row (columns are not deleted from the record until the final step), unless they are overwritten. and one function process(), which takes the values of the input columns for a row and returns the computed values of the output columns. process() may raise a SkipRowException if the row being processed should be skipped (removed from the log). Processors that need to process the data in a certain order may specify this order in one of two ways: 1.) With the sql_sort_by attribute 2.) With the sort_column and sort_reverse attributes, which are passed to Python's sorted() builtin function Only one sorted Processor in a given job can use 1.), and it must be used before any other sorted Processor (sorting is performed between steps). Sorted Processors which keep track of state can be used to to achieve the following interesting things: - Naive estimation of query performance (by checking time elapsed since the previous query in the session) - Annotating log entries with the database used, by looking at 'Connect' log entries and 'USE <database>' queries Some Processor classes can be found in Processors.py JOB A Job defines the selectors, Prefilters, Processors, and final output columns (a subset of the union of output columns of each step, and the raw general_log columns). To run a job, construct a Job object and invoke its run() method. NOTE: for now, we assume tables get reduced all at once. UPDATING When more queries are run on your system (and the general_log increases in size), running the Job again will select only those queries newer than the newest query in the processed log. In order for the state of all Processors to be retained, they must be Pickled. The file extension for a general_log_plus job is .glpjob TODO: If the sort order is not by event_time, this will be a problem^ TODO: Decide on serialization paradigm
About
A framework for processing/expanding MySQL's general_log
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published