A C++ framework for data analytics pipelines

PiCo

PiCo stands for Pipeline Composition.

The main entity in PiCo is the Pipeline, a graph composition of processing elements.

This model is intended to give the user a unique interface for both stream and batch processing. It hides data management completely, so the user focuses only on operations, which are represented by Pipeline stages.

The DSL we propose is implemented entirely in C++11 and uses the FastFlow library as its runtime.
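To make the idea of composing stages concrete, here is a minimal sketch in plain C++11. It does not use the PiCo API: `compose`, `tokenize`, `count_words`, and `word_count_pipeline` are hypothetical names chosen for illustration, and the sketch runs sequentially, whereas PiCo delegates execution to FastFlow.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Lines = std::vector<std::string>;
using Words = std::vector<std::string>;
using Counts = std::map<std::string, int>;

// Hypothetical helper (not part of PiCo): chain two stages into one,
// feeding the output of `first` into `second`.
template <typename A, typename B, typename C>
std::function<C(A)> compose(std::function<B(A)> first,
                            std::function<C(B)> second) {
    return [first, second](A input) {
        return second(first(std::move(input)));
    };
}

// Stage 1: split every line into whitespace-separated words.
Words tokenize(Lines lines) {
    Words words;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string word;
        while (in >> word) words.push_back(word);
    }
    return words;
}

// Stage 2: count the occurrences of each word.
Counts count_words(Words words) {
    Counts counts;
    for (const std::string& word : words) ++counts[word];
    return counts;
}

// The composed pipeline: lines -> words -> counts.
std::function<Counts(Lines)> word_count_pipeline() {
    return compose<Lines, Words, Counts>(tokenize, count_words);
}
```

The caller only sees the composed function and the operations it is built from; how data moves between stages stays inside `compose`, which is the separation the Pipeline model aims for.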

How to compile

Building any PiCo application requires the FastFlow library. Download FastFlow from its GitHub repository, then add a symbolic link to the FastFlow library in the PiCo root directory:

$ ln -s /path/to/FastFlow/ff /PiCo/root/dir

Examples

The examples directory contains three benchmarks: an optimized word count, a stock-market benchmark, and a page rank benchmark. The word count example can be built and run as follows:

$ cd word-count

$ make

$ ./pico_wc <input file> <output file>

The input data can be generated with the application in the word-count/testdata directory, which builds text from an English dictionary, dictionary.txt. An input text can be generated as follows:

$ cd testdata

$ make

$ ./generate_text <dictionary file> <n. of lines> >> <output.file>

The parallel degree can be set manually with the PARDEG environment variable, for example by prefixing the run command with PARDEG=4.
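How PiCo reads PARDEG internally is not described here. The following is a minimal sketch of how an application might interpret such a variable, assuming it holds a positive decimal integer and that the number of hardware threads is a sensible default. `parse_pardeg` and `parallel_degree` are hypothetical names.

```cpp
#include <cerrno>
#include <cstdlib>
#include <limits>
#include <thread>

// Hypothetical helper: parse a PARDEG-style value. Returns `fallback` when
// the value is unset, not a positive decimal integer, or out of range.
unsigned parse_pardeg(const char* value, unsigned fallback) {
    if (value == nullptr) return fallback;
    char* end = nullptr;
    errno = 0;
    const long parsed = std::strtol(value, &end, 10);
    if (end == value || *end != '\0') return fallback;  // not a clean number
    if (errno == ERANGE || parsed <= 0) return fallback;
    if (static_cast<unsigned long long>(parsed) >
        std::numeric_limits<unsigned>::max()) {
        return fallback;
    }
    return static_cast<unsigned>(parsed);
}

// Assumed default: one worker per hardware thread, at least one.
unsigned parallel_degree() {
    unsigned hardware = std::thread::hardware_concurrency();
    if (hardware == 0) hardware = 1;  // the standard allows 0 for "unknown"
    return parse_pardeg(std::getenv("PARDEG"), hardware);
}
```

Keeping the parsing separate from the `std::getenv` call makes the fallback rules easy to check without touching the process environment.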

Graph Visualization

Each example produces a .dot file that describes the application pipeline in the Graphviz DOT language. To render one of these graphs as a PNG image, use the command:

$ dot -Tpng filename.dot -o outfile.png
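For readers unfamiliar with the DOT format, the sketch below shows what a description of a simple linear pipeline could look like and how it might be produced. It is not PiCo's own exporter: `pipeline_to_dot`, the node names `s0`, `s1`, ..., and the left-to-right layout are assumptions, and stage labels are assumed to contain no double quotes.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: render a linear pipeline as a Graphviz digraph,
// one node per stage and one edge between consecutive stages.
std::string pipeline_to_dot(const std::vector<std::string>& stages) {
    std::ostringstream out;
    out << "digraph pipeline {\n";
    out << "  rankdir=LR;\n";  // draw the stages left to right
    for (std::size_t i = 0; i < stages.size(); ++i) {
        out << "  s" << i << " [label=\"" << stages[i] << "\"];\n";
    }
    for (std::size_t i = 0; i + 1 < stages.size(); ++i) {
        out << "  s" << i << " -> s" << (i + 1) << ";\n";
    }
    out << "}\n";
    return out.str();
}
```

Writing the returned string to a file and passing that file to the dot command above yields an image with one box per stage.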