Re-arranging and added a bunch of pipes
Jonathan Moss committed Jul 29, 2012
1 parent eca767a commit fd92ca2
Showing 21 changed files with 489 additions and 4 deletions.
4 changes: 0 additions & 4 deletions README.md

This file was deleted.

5 changes: 5 additions & 0 deletions README.rst
@@ -0,0 +1,5 @@
=====
Pious
=====

A python package for dealing with basic input/output processes
Binary file added docs/Pious.pdf
Binary file not shown.
29 changes: 29 additions & 0 deletions docs/Pious.rst
@@ -0,0 +1,29 @@
=====
PIOUS
=====

.. contents::

.. section-numbering::

.. page::

Introduction
============

A Python package for dealing with basic input/output processes. Pious came from
the observation that processing data almost always boils down to a series of
discrete transforms. Splitting these transforms out into reusable modules is
not only a useful way to build data processing pipelines but also provides a
convenient way to design and describe the process to others.

Pious aims to provide the tools needed to build complex data transformation
pipelines in a simple and easily understood manner. It also tries to provide
the tools necessary to inspect data as it passes through the pipeline, to make
debugging as easy as possible.

.. page::

.. include:: pious_sources.rst
.. include:: pious_pipes.rst
.. include:: pious_consumers.rst
4 changes: 4 additions & 0 deletions docs/build.sh
@@ -0,0 +1,4 @@
#!/bin/sh
rst2pdf Pious.rst -e preprocess
rm Pious.rst.build_temp
rm *~
22 changes: 22 additions & 0 deletions docs/pious_consumers.rst
@@ -0,0 +1,22 @@
==============
Pious Consumer
==============

Consumers are the termination points of pipelines. They consume the final
output of the pipeline. For example, they may write to a file in various
formats, or insert into or replace rows in one or more database tables.

Existing Consumers
==================

Consumer
    The base class for all Pious consumers. This is the class from which your
    own custom consumers should derive.

DbTable
    Inserts/replaces values into a table. Also has options such as truncating
    the table prior to the insert.

CsvFile
    Writes the data to a CSV file with optional headers and configurable
    separators and escape characters.
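
As an illustration of the consumer idea described above, here is a minimal sketch of a CSV-writing consumer. Note that ``consumers.py`` is empty in this commit, so the class name ``CsvConsumer``, its constructor arguments and the ``consume`` method are all assumptions made for the example, not the Pious API. It only relies on the standard library ``csv`` module.

```python
import csv
import io


class CsvConsumer:
    """Hypothetical consumer: drains an iterable of dicts into CSV text."""

    def __init__(self, stream, fieldnames, write_header=True, delimiter=","):
        self.stream = stream
        self.fieldnames = fieldnames
        self.write_header = write_header
        self.delimiter = delimiter

    def consume(self, iterator):
        # DictWriter maps each dict onto the configured column order
        writer = csv.DictWriter(
            self.stream,
            fieldnames=self.fieldnames,
            delimiter=self.delimiter,
            lineterminator="\n",
        )
        if self.write_header:
            writer.writeheader()
        count = 0
        for row in iterator:
            writer.writerow(row)
            count += 1
        # Report how many items reached the end of the pipeline
        return count


out = io.StringIO()
rows = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
written = CsvConsumer(out, ["id", "name"]).consume(iter(rows))
```

Because the consumer only asks for something iterable, it could be handed the last pipe of a pipeline just as easily as the plain list used here.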
31 changes: 31 additions & 0 deletions docs/pious_pipes.rst
@@ -0,0 +1,31 @@
===========
Pious Pipes
===========

Pipes are similar to data sources in that they act as iterators. The difference
is that they take a data source (or another pipe) as an input. Pipes are
therefore the building blocks of your transformation pipeline.

Pious provides a goodly collection of pipes with which to build your pipelines,
including filters and field transformers, along with more complex pipes such as
Fork, which takes a pipeline and splits it into two or more parallel pipes.

Existing Pipes
==============

Pipe
    The base pipe class that all others extend. This should be the basis of
    your own pipe implementations.

Ensure
    Ensures that the dict passing through has certain keys, setting a default
    value for any that are not present.

Filter
    Skips items that match certain criteria.

Fork
    Splits the pipe into 2 parallel pipes, i.e. the output of the bound
    iterator is fed into the input of 2 pipelines.
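
Fork is described here but is not among the pipes added to ``pipes.py`` in this commit. One possible way to get the behaviour described, sketched under that caveat, is ``itertools.tee`` from the standard library. The helper name ``fork`` and its ``branches`` parameter are hypothetical.

```python
import itertools


def fork(iterator, branches=2):
    """Hypothetical helper: split one iterator into independent copies."""
    # tee buffers items so each branch can be consumed at its own pace
    return itertools.tee(iterator, branches)


source = iter([{"n": 1}, {"n": 2}, {"n": 3}])
left, right = fork(source)

# Each branch sees every item from the original source
doubled = [item["n"] * 2 for item in left]
evens = [item["n"] for item in right if item["n"] % 2 == 0]
```

One design point to keep in mind: ``tee`` hands the same dict objects to every branch, so a pipe that mutates items in place on one branch would affect the other. A real Fork may need to copy each item per branch.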


32 changes: 32 additions & 0 deletions docs/pious_sources.rst
@@ -0,0 +1,32 @@
==================
Pious Data Sources
==================

At the base of all Pious transform pipelines are data sources, which, as the
name implies, are sources of the raw data you wish to transform.

Pious provides a number of data source primitives, along with the provisions to
build upon those primitives to fit your own requirements.

Data Source Primitives
======================

Source
    The base class for all data sources. This is the class from which you
    should extend your own custom data sources.

CSVFile
    A character-separated values file. You can specify the separator, escape
    character etc.

XMLFile
    The building blocks of a SAX-based XML file parser.

FlatXMLFile
    This extends the XMLFile primitive to provide a simple flat XML importer.
    By flat XML I mean one that basically imitates a CSV: it doesn't use
    attributes, just flat un-nested elements.

DbIterator
    Iterates over the results of an SQL query.
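
To make the data source idea concrete, here is a minimal sketch of a CSV source that yields one dict per row. ``sources.py`` is empty in this commit, so the class name ``CsvSource`` and its parameters are assumptions for illustration only. The parsing itself is delegated to the standard library ``csv.DictReader``.

```python
import csv
import io


class CsvSource:
    """Hypothetical data source: yields one dict per CSV row."""

    def __init__(self, stream, delimiter=",", escapechar=None):
        self.stream = stream
        self.delimiter = delimiter
        self.escapechar = escapechar

    def __iter__(self):
        # The first row of the stream is used as the field names
        reader = csv.DictReader(
            self.stream,
            delimiter=self.delimiter,
            escapechar=self.escapechar,
        )
        return iter(reader)


text = io.StringIO("id;name\n1;ada\n2;grace\n")
rows = list(CsvSource(text, delimiter=";"))
```

Since the source provides ``__iter__``, it is the kind of object that ``Pipe.bind`` in this commit accepts.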


Empty file added pious/__init__.py
Empty file.
Empty file added pious/models.py
Empty file.
Empty file added pious/process/__init__.py
Empty file.
Empty file added pious/transform/__init__.py
Empty file.
Empty file added pious/transform/consumers.py
Empty file.
3 changes: 3 additions & 0 deletions pious/transform/errors.py
@@ -0,0 +1,3 @@

class InvalidIterator(Exception):
    pass
17 changes: 17 additions & 0 deletions pious/transform/log.py
@@ -0,0 +1,17 @@
class Logger(object):

    template = "Watch item encountered:\n\tBefore: %s\n\tAfter: %s"

    def log_transform(self, data_in, data_out):
        self._emit(self._get_message(data_in, data_out))

    def _get_message(self, data_in, data_out):
        return self.template % (data_in, data_out)

    def _emit(self, msg):
        # No-op hook: subclasses decide where the message goes
        pass


class ConsoleLogger(Logger):

    def _emit(self, msg):
        print(msg)
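
The ``_emit`` hook is the only method a new logger has to override. As a sketch of that extension point, the block below repeats a trimmed copy of ``Logger`` so it runs on its own, and adds a hypothetical ``ListLogger`` (not part of this commit) that keeps messages in memory, which can be handy in tests.

```python
class Logger(object):
    # Trimmed copy of the Logger base class from log.py in this commit
    template = "Watch item encountered:\n\tBefore: %s\n\tAfter: %s"

    def log_transform(self, data_in, data_out):
        self._emit(self.template % (data_in, data_out))

    def _emit(self, msg):
        pass


class ListLogger(Logger):
    """Hypothetical logger that collects messages instead of printing."""

    def __init__(self):
        self.messages = []

    def _emit(self, msg):
        self.messages.append(msg)


logger = ListLogger()
logger.log_transform({"a": "b"}, {"a": "c"})
```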
136 changes: 136 additions & 0 deletions pious/transform/pipes.py
@@ -0,0 +1,136 @@
from pious.transform.errors import InvalidIterator


class Pipe(object):
    """
    The base object that should be extended by your own
    pipes
    """

    def __init__(self):
        self.watches = []
        self.logger = None

    def bind(self, iterator):
        """
        Binds an iterator (anything that provides __iter__)
        to this pipe
        """
        # hasattr rather than getattr: getattr without a default raises
        # AttributeError instead of returning a falsy value
        if not hasattr(iterator, '__iter__'):
            raise InvalidIterator()
        self.iterator = iter(iterator)

    def set_logger(self, logger):
        """
        Sets the logger used to report watched items
        """
        # Same attribute that __init__ creates and _log_watched reads
        self.logger = logger

    def add_watch(self, watch):
        self.watches.append(watch)

    def _apply(self, data):
        """
        This is the method called to perform the
        translation and should therefore be overridden
        in derived classes
        """
        return data

    def _is_watched(self, data):
        for watch in self.watches:
            if watch(data):
                return True
        return False

    def _log_watched(self, data_in, data_out):
        # A watch may be added before any logger has been set
        if self.logger is not None:
            self.logger.log_transform(data_in, data_out)

    def __iter__(self):
        return self

    def next(self):
        data_in = next(self.iterator)
        data_out = self._apply(data_in)
        if self._is_watched(data_in) or self._is_watched(data_out):
            self._log_watched(data_in, data_out)
        return data_out

    def __next__(self):
        # Python 3 iterator protocol; delegates so overrides of next() apply
        return self.next()


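class Ensure(Pipe):
    """
    Ensure that the dict passing through has certain keys and that if they
    are not present then we set a default value
    """
    def __init__(self, map):
        super(Ensure, self).__init__()
        self.map = map

    def _apply(self, data):
        # Start from the defaults, then let the incoming values win.
        # (Concatenating .items() only works where items() returns lists.)
        merged = dict(self.map)
        merged.update(data)
        return merged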


class Filter(Pipe):
    """
    Skips items that the passed in matcher tells it to.
    matcher should be a function that returns True if
    the item should be filtered out
    """
    def __init__(self, matcher):
        self.matcher = matcher
        super(Filter, self).__init__()

    def next(self):
        data_out = super(Filter, self).next()
        # Keep pulling until an item is not matched by the filter
        while self.matcher(data_out):
            data_out = super(Filter, self).next()
        return data_out

class Rename(Pipe):
    """
    Renames keys based on the passed in key_map
    The key_map should be in the form:
    {'old key name': 'new key name',}
    """
    def __init__(self, key_map):
        self.key_map = key_map
        super(Rename, self).__init__()

    def _apply(self, data):
        for key in self.key_map:
            if key in data:
                data[self.key_map[key]] = data[key]
                del data[key]
        return data


class Winnow(Pipe):
    """
    Removes unwanted keys from the data passing through
    """
    def __init__(self, keys):
        self.keys = keys
        super(Winnow, self).__init__()

    def _apply(self, data):
        for key in self.keys:
            if key in data:
                del data[key]
        return data


class AutoIncrement(Pipe):
    """
    Adds a unique (for this pipe) auto-incrementing
    number to the specified key
    """
    def __init__(self, key, start_value=0, interval=1):
        super(AutoIncrement, self).__init__()
        self.key = key
        self.counter_value = start_value
        self.interval = interval

    def _apply(self, data):
        # Incremented before use, so the first value is start_value + interval
        self.counter_value += self.interval
        data[self.key] = self.counter_value
        return data
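
To show how these pipes are meant to chain, here is a self-contained Python 3 sketch. It uses trimmed stand-ins for ``Pipe``, ``Ensure`` and ``Rename`` rather than importing the package, and it makes a few assumptions that are not in this commit: ``bind`` returns ``self`` to allow chaining, watched items are collected in a ``logged`` list instead of going to a logger, and the input is snapshotted before ``_apply`` runs.

```python
class Pipe:
    """Trimmed Python 3 stand-in for the Pipe base class in this commit."""

    def __init__(self):
        self.watches = []
        self.logged = []  # assumption: stands in for a real logger

    def bind(self, iterator):
        self.iterator = iter(iterator)
        return self  # assumption: returned only to allow chaining

    def add_watch(self, watch):
        self.watches.append(watch)

    def _apply(self, data):
        return data

    def __iter__(self):
        return self

    def __next__(self):
        data_in = next(self.iterator)
        before = dict(data_in)  # snapshot, since pipes may mutate in place
        data_out = self._apply(data_in)
        if any(w(before) or w(data_out) for w in self.watches):
            self.logged.append((before, dict(data_out)))
        return data_out


class Ensure(Pipe):
    def __init__(self, defaults):
        super().__init__()
        self.defaults = defaults

    def _apply(self, data):
        merged = dict(self.defaults)
        merged.update(data)
        return merged


class Rename(Pipe):
    def __init__(self, key_map):
        super().__init__()
        self.key_map = key_map

    def _apply(self, data):
        for old, new in self.key_map.items():
            if old in data:
                data[new] = data.pop(old)
        return data


source = [{"name": "ada"}, {"name": "grace", "lang": "cobol"}]

# source -> Ensure -> Rename, with a watch on the last pipe
ensure = Ensure({"lang": "unknown"}).bind(source)
rename = Rename({"name": "author"})
rename.add_watch(lambda item: item.get("lang") == "cobol")
rename.bind(ensure)

result = list(rename)
```

The snapshot is a deliberate departure from the code in the diff: ``Rename``, ``Winnow`` and ``AutoIncrement`` modify the dict they receive, so without a copy the "before" and "after" values passed to the logger would be the same object.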
Empty file added pious/transform/sources.py
Empty file.
30 changes: 30 additions & 0 deletions setup.py
@@ -0,0 +1,30 @@
#!/usr/bin/env python
"""
Installation script:
To release a new version to PyPI:
 - Run: python setup.py sdist upload
"""

from setuptools import setup, find_packages

setup(name='pious',
      version='0.0.1',
      url='https://github.com/a-musing-moose/pious',
      author="Jonathan Moss",
      author_email="jonathan.moss@tangentsnowball.com.au",
      description="A python package for dealing with basic input/output processes",
      long_description=open('README.rst').read(),
      keywords="Data, Processing, Pipelines",
      license='BSD',
      platforms=['linux'],
      packages=find_packages(exclude=["*.tests"]),
      include_package_data=True,
      install_requires=[],
      # See http://pypi.python.org/pypi?%3Aaction=list_classifiers
      classifiers=['Environment :: Console',
                   'Intended Audience :: Developers',
                   'License :: OSI Approved :: BSD License',
                   'Operating System :: Unix',
                   'Programming Language :: Python']
      )
Empty file added tests/__init__.py
Empty file.
17 changes: 17 additions & 0 deletions tests/log_tests.py
@@ -0,0 +1,17 @@
from unittest import TestCase

from pious.transform.log import ConsoleLogger


class ConsoleLoggerTest(TestCase):

    def test_has_log_transform(self):
        l = ConsoleLogger()
        try:
            getattr(l, 'log_transform')
        except AttributeError:
            self.fail("Console logger is missing a log_transform method")

        i = {'a': 'b'}
        o = {'a': 'c'}
        l.log_transform(i, o)
