Streaming Benchmark

Streaming Benchmark is designed to measure the performance of stream processing systems such as Flink and Spark. Three use cases are simulated: User Visit Session Analysis, Real-Time Advertising Evaluation, and Shopping Record Analysis. Raw data is generated and stored in Kafka; the streams are mapped to streaming tables, and the benchmark queries run against these tables.
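Once the generator is running, the raw events can be sanity-checked straight from Kafka with the stock console consumer. The broker address and topic name below are placeholders, since the actual topics are created by the data generator:

    # Peek at a few raw events (broker address and topic name are hypothetical).
    $KAFKA_HOME/bin/kafka-console-consumer.sh \
      --bootstrap-server ip1:9092 \
      --topic user_visit_session \
      --from-beginning \
      --max-messages 5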

Building

mvn clean package

Prerequisites

Apache Kafka, Apache ZooKeeper, Apache Spark, and Blink (Alibaba's open-source fork of Apache Flink) must be installed on your cluster.

Setup

  1. Clone the project onto your master node.
  2. Update conf/benchmarkConf.yaml (the Kafka, ZooKeeper, and benchmark properties):

     streambench.zkHost                       ip1:2181,ip2:2181,ip3:2181...
     streambench.kafka.brokerList             ip1:port1,ip1:port2...
     streambench.kafka.consumerGroup          benchmark (default)

  3. Update flink/conf/benchmarkConf.yaml (the Flink properties):

     streambench.flink.checkpointDuration     5000
     streambench.flink.timeType               EventTime (EventTime or ProcessTime)

  4. Update conf/dataGenHosts (the hosts on which data will be generated; generating data on the Kafka nodes is recommended):

     ip1
     ip2
     ...

  5. Update conf/queriesToRun (the queries to run):

     q1.sql
     q2.sql
     q3.sql
     ...

  6. Update conf/env:

     export DATAGEN_TIME=100       # running time for each query
     export THREAD_PER_NODE=10     # data-generator threads per node
     export FLINK_HOME={FLINK_HOME}
     export SPARK_HOME={SPARK_HOME}

  7. Copy the project to every node that will generate data (the same hosts listed in conf/dataGenHosts) and ensure the master node can log in to these hosts without a password; a setup sketch follows this list.
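The passwordless login and project copy in step 7 can be done as follows. This is a minimal sketch, assuming OpenSSH, rsync on all hosts, and one hostname per line in conf/dataGenHosts:

    # Generate a key pair on the master once (skip if ~/.ssh/id_rsa already exists).
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

    # For each data-generator host: push the public key, then copy the project
    # to the same path it occupies on the master.
    while read -r host; do
      ssh-copy-id "$host"
      rsync -a "$PWD"/ "$host:$PWD"/
    done < conf/dataGenHosts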

Run Benchmark

Start ZooKeeper, Kafka, Spark, and Blink first, then launch the benchmark:

    sh bin/runFlinkBenchmark.sh    # Flink benchmark only
    sh bin/runSparkBenchmark.sh    # Spark benchmark only
    sh bin/runAll.sh               # both Flink and Spark benchmarks

Result

The results are saved to flink/result/result.log and spark/result/result.log, in a format like the following:

Finished time: 2019-10-30 19:07:26; q1.sql  Runtime: 58s TPS:10709265
Finished time: 2019-10-30 19:08:37; q2.sql  Runtime: 57s TPS:8061793
Finished time: 2019-10-30 19:09:51; q5.sql  Runtime: 57s TPS:4979921
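To compare queries at a glance, a result log can be summarized with a one-liner. This sketch assumes the exact whitespace-separated field layout shown above:

    # Print query name, runtime, and TPS for each line of a result log.
    awk '{ gsub("TPS:", "", $8); printf "%-8s %-5s %s\n", $5, $7, $8 }' flink/result/result.log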
