
Suite of Benchmark Applications for Stream Processing Systems

License

LGPL-3.0 (LICENSE.LGPL) and MIT (LICENSE.MIT)

ParaGroup/StreamBenchmarks



StreamBenchmarks

This repository contains a set of stream processing applications taken from the literature and from existing repositories (e.g., here), which have been properly cleaned up. The applications can be run in a uniform way, and each execution collects throughput and latency statistics in different ways.
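As a rough illustration of what throughput and latency statistics can mean for a run, the sketch below summarizes a list of per-tuple timestamps. It is not the repository's code: the `summarize_run` function, the `RunStats` record, and the `(emit_time, sink_time)` event format are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunStats:
    tuples: int          # number of tuples that reached the sink
    throughput: float    # tuples per second over the whole run
    avg_latency: float   # mean end-to-end latency, in seconds
    p95_latency: float   # 95th percentile latency (nearest rank), in seconds


def summarize_run(events):
    """Summarize a run from (emit_time, sink_time) pairs, in seconds.

    The event format is hypothetical; each framework in the repository
    may record its measurements differently.
    """
    if not events:
        raise ValueError("no events recorded")

    # End-to-end latency of each tuple, sorted for the percentile lookup.
    latencies = sorted(sink - emit for emit, sink in events)

    # Elapsed time runs from the first emission to the last arrival.
    start = min(emit for emit, _ in events)
    end = max(sink for _, sink in events)
    elapsed = end - start
    throughput = len(events) / elapsed if elapsed > 0 else float("inf")

    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    rank = max(0, math.ceil(0.95 * len(latencies)) - 1)

    return RunStats(
        tuples=len(events),
        throughput=throughput,
        avg_latency=mean(latencies),
        p95_latency=latencies[rank],
    )
```

For example, `summarize_run([(0.0, 0.5), (1.0, 1.5), (2.0, 3.0), (3.0, 4.0)])` covers four tuples over four seconds, so it reports a throughput of 1.0 tuples per second and an average latency of 0.75 seconds.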

Below we list the applications and their availability in different stream processing engines and libraries. We consider Apache Storm, Apache Flink, and WindFlow (link):

| Application | Acronym | Apache Storm | Apache Flink | WindFlow |
|---|---|---|---|---|
| FraudDetection | FD | Yes | Yes | Yes |
| SpikeDetection | SD | Yes | Yes | Yes |
| TrafficMonitoring | TM | Yes | Yes | Yes |
| WordCount | WC | Yes | Yes | Yes |
| Yahoo! Streaming Benchmark | YSB | Yes | Yes | Yes |
| LinearRoad | LR | Yes | Yes | Yes |
| VoipStream | VS | Yes | Yes | Yes |
| SentimentAnalysis | SA | No | No | Yes |
| LogProcessing | LP | No | No | Yes |
| MachineOutlier | MO | No | No | Yes |
| ReinforcementLearner | RL | No | No | Yes |

This repository also contains small datasets used to run the applications, except for LinearRoad and VoipStream. For these two applications, datasets can be generated as described in 1 and 2. Once generated, copy the dataset files into the Datasets/LR and Datasets/VS folders, respectively. The datasets are shared by all versions of the same application across all supported frameworks. The Yahoo! Streaming Benchmark (YSB) and ReinforcementLearner (RL) require no dataset in the present implementation, since synthetic data are continuously generated by the Sources.
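The copy step for the LinearRoad and VoipStream datasets could be scripted along the following lines. This is only a sketch: the `install_dataset` helper is hypothetical and not part of the repository, and the paths of the generated files depend entirely on how the datasets were produced. Only the `Datasets/LR` and `Datasets/VS` target folders come from the description above.

```python
import shutil
from pathlib import Path


def install_dataset(generated_files, repo_root, app_acronym):
    """Copy generated dataset files into Datasets/<app_acronym> under repo_root.

    Hypothetical helper: only LR (LinearRoad) and VS (VoipStream) need
    externally generated datasets, so other acronyms are rejected.
    """
    if app_acronym not in ("LR", "VS"):
        raise ValueError(f"no generated dataset expected for {app_acronym!r}")

    target = Path(repo_root) / "Datasets" / app_acronym
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in generated_files:
        src = Path(name)
        dest = target / src.name
        shutil.copy2(src, dest)  # copies the file contents and its metadata
        copied.append(dest)
    return copied
```

A call such as `install_dataset(["/path/to/generated/file"], "StreamBenchmarks", "LR")` would place the file in `StreamBenchmarks/Datasets/LR`; the source path here is a placeholder.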

This repository has not been fully cleaned up, and some code is duplicated. The reason is that each application, for each framework, is designed as a separate standalone project. Refer to the README file within each subfolder (application/framework) for further information about how to run each application and about the required dependencies.

How to Cite

This repository uses the applications that we have recently added to a larger benchmark suite of streaming applications called DSPBench, available on GitHub at the following link. If our applications prove useful for your research, we kindly ask you to credit our effort by citing the following paper:

@article{DSPBench,
 author={Bordin, Maycon Viana and Griebler, Dalvan and Mencagli, Gabriele and Geyer, Cláudio F. R. and Fernandes, Luiz Gustavo L.},
 journal={IEEE Access},
 title={DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems},
 year={2020},
 volume={8},
 number={},
 pages={222900-222917},
 doi={10.1109/ACCESS.2020.3043948}
}

Contributors

The main developer and maintainer of this repository is Gabriele Mencagli. Other authors of the source code are Alessandra Fais, Andrea Cardaci and Cosimo Agati.