A very simple example of handling streaming data with Kafka, Spark Streaming, MongoDB & Bokeh
We produce simulated streaming data and push it into Kafka. Spark Streaming consumes the stream and inserts the data into MongoDB. Bokeh then displays the streaming data dynamically.
- Kafka: Kafka is used for building real-time data pipelines and streaming apps.
- Spark Streaming: Spark Streaming processes the streaming data.
- MongoDB: MongoDB stores the data.
- Bokeh: Bokeh displays the data.
In your shell:
-
Clone this repo:
git clone git@github.com:cnlinxi/kafka_spark_streaming.git
-
Produce data into Kafka:
python producer.py
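
For reference, here is a minimal sketch of what producer.py might look like, using kafka-python; the broker address (localhost:9092), topic name (test), and message schema are assumptions, not taken from the repo:

```python
# minimal sketch of a simulated-data producer (assumed broker/topic names)
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'))

while True:
    # simulated data point: a timestamp and a random value
    message = {'timestamp': time.time(), 'value': random.random()}
    producer.send('test', message)
    time.sleep(1)
```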
-
Receive data with Spark Streaming and insert it into MongoDB:
spark-submit --packages <spark-streaming-kafka java package> receiver.py
In my case (Spark 2.1.1), this is:
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1 receiver.py
Or you can follow this guide to debug Spark Streaming in PyCharm.
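
For reference, a minimal sketch of the receiving side, assuming the 0-8 connector loaded by the --packages flag above, ZooKeeper on localhost:2181, a topic named test, and a local MongoDB; the actual receiver.py may differ:

```python
# minimal sketch: consume Kafka messages and insert them into MongoDB
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pymongo import MongoClient

sc = SparkContext(appName='kafka_spark_streaming')
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

# topics dict maps topic name -> number of consumer threads
stream = KafkaUtils.createStream(ssc, 'localhost:2181', 'demo-group', {'test': 1})

def save_partition(records):
    # open one MongoClient per partition, not per record
    client = MongoClient('localhost', 27017)
    collection = client['demo']['messages']
    for record in records:
        collection.insert_one(record)
    client.close()

# Kafka messages arrive as (key, value) pairs; parse the JSON value
stream.map(lambda kv: json.loads(kv[1])) \
      .foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))

ssc.start()
ssc.awaitTermination()
```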
-
Display data with Bokeh:
bokeh serve data_display.py
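
For reference, a minimal sketch of a Bokeh server app in the spirit of data_display.py; it assumes the receiver writes documents with timestamp and value fields into a local demo.messages collection, which may not match the repo:

```python
# minimal sketch: poll MongoDB and stream new points into a live plot
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from pymongo import MongoClient

collection = MongoClient('localhost', 27017)['demo']['messages']

source = ColumnDataSource(data={'x': [], 'y': []})
fig = figure(title='streaming data', x_axis_label='timestamp',
             y_axis_label='value')
fig.line(x='x', y='y', source=source)

def update():
    # fetch only documents newer than what is already plotted
    xs = source.data['x']
    last = xs[-1] if len(xs) > 0 else 0
    for doc in collection.find({'timestamp': {'$gt': last}}).sort('timestamp'):
        source.stream({'x': [doc['timestamp']], 'y': [doc['value']]},
                      rollover=200)  # keep at most 200 points on screen

curdoc().add_root(fig)
curdoc().add_periodic_callback(update, 1000)  # refresh every second
```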
- data_display.py: displays data with Bokeh.
- global_vals.py: global variables.
- mongo_utils.py: MongoDB helper functions (a minimal sketch follows this list).
- producer.py: produces data into Kafka.
- producer_without_kafka.py: simulates producing data without Kafka, for debugging.
- receiver.py: receives data from Kafka (producer.py) and inserts it into MongoDB.
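
For illustration, MongoDB helpers in the spirit of mongo_utils.py might look like the sketch below; the function names, database, and collection are assumptions:

```python
# minimal sketch of MongoDB helpers (assumed names throughout)
from pymongo import MongoClient

def get_collection(db_name='demo', coll_name='messages',
                   host='localhost', port=27017):
    """Return a handle to the given MongoDB collection."""
    return MongoClient(host, port)[db_name][coll_name]

def insert_docs(collection, docs):
    """Insert a batch of documents."""
    if docs:
        collection.insert_many(docs)

def latest_docs(collection, since_ts):
    """Fetch documents newer than the given timestamp, oldest first."""
    return collection.find({'timestamp': {'$gt': since_ts}}).sort('timestamp')
```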
blog: WinterColor blog
Enjoy it~