Skip to content

theraphim/pyflowtools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python extension module for working with flow-tools' data
====================================================

Home: http://code.google.com/p/pyflowtools/
  
This extension module gives you a simple python interface to NetFlow
data as stored by flow-tools package (http://flow-tools.googlecode.com/).

It contains 2 public classes - FlowSet and FlowPDU.

FlowSet can read and write data from/to a given file (or from standard
input/output).  FlowPDU parses "raw" netflow PDU (UDP packet).

In reader mode, FlowSet provides an iterator interface
to access the individual flow records as instances of a second
class called Flow. A Flow provides access to its data through
attribute references.

In writer mode, FlowSet allows us to write FlowPDU instances to a stream by
calling FlowSetInstance.write(FlowPDUinstance).

This release candidate doesn't allow to write streams to stdout, yet.

FlowPDU object is constructed by passing the exporter IP and the UDP packet
payload to the FlowPDU() constructor. You can then iterate over the FlowPDU
instance to get individual flows (like with FlowSet),

It is possible to query header fields of parsed PDU. It is also possible to
compare 2 PDUs.  The module will try to do its best to figure out which PDU was
issued earlier in time (if they aren't equal).  You can call
FlowPDUInstance.is_next(FlowPDUInstance2) to figure out if Instance2 is
immediate successor of Instance.

There are plans to write proper documentation for this module, apparently,
nobody has done it, yet.

Example of its use:

    ---------------------------------------------------
    import flowtools

    set = flowtools.FlowSet( "-" ) # Read from stdin

    for flow in set:
        print "%s %s" % ( flow.srcaddr, flow.dstaddr ) 
    ---------------------------------------------------

Given a Flow, you can access all fields contained in the NetFlow
data (see beginning of flowtools.c for a list of valid attribute
names).

Notes:

    - All flow attributes containing an IP address return their
    values as strings as default. To get an IP as a long integer,
    append "_raw" to the attribute's name (e.g. "srcaddr_raw"). 

    - The attributes "first" and "last" return times as standard
    Unix timestamps (i.e. seconds since 1970-01-01 00:00:00). To get
    the real values as found in the NetFlow data, use "first_raw"
    and "last_raw", respectivly (these values are based on the
    router's SysUptime).
    
    - There's an additional method "Flow.getID( bidir = 0 )" which 
    returns a string identifying a flow. It's constructed from
    source address/port/interface, destination
    address/port/interface and IP protocol. If bidir==1, the tuple
    is sorted such that two flows which only differ by direction get
    the same ID (this assumes symmetric routing).

    - There is an example script called "flowprint-full" which
    prints all flow fields.

Installing:

    If you want to use stable version consider using pyflowtools package from
    your distribution.

Building:

    Requirements:
        - Python >= 2.4 
        - Headers and libraries from the flow-tools package
           
    Tested on:
        - Fedora 7 (primary development platform)

After unpacking the tar file and changing into the contained directory do:

    - python setup.py build_ext -I </dir/of/ftlib.h>  -L </dir/of/libft.a>
    - python setup.py install

About

Automatically exported from code.google.com/p/pyflowtools

Resources

License

Stars

Watchers

Forks

Packages

No packages published