kafka_spark_streaming

A very simple example of working with streaming data using Kafka, Spark Streaming, MongoDB, and Bokeh.

We produce simulated streaming data and push it into Kafka. Spark Streaming consumes the stream and inserts the data into MongoDB. Then we use Bokeh to display the streaming data dynamically.
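As a sketch of the producer side, the snippet below generates simulated readings and JSON-encodes them the way they would be sent to Kafka. The field names (`sensor_id`, `value`, `timestamp`) and the topic name are assumptions for illustration; the repo's actual `producer.py` may differ, and the real send would go through a Kafka client such as kafka-python's `KafkaProducer`.

```python
import json
import random
import time

def make_record(sensor_id: int) -> str:
    """Build one simulated reading, JSON-encoded as it would be sent to Kafka."""
    record = {
        "sensor_id": sensor_id,  # hypothetical field names for illustration
        "value": round(random.uniform(0.0, 100.0), 2),
        "timestamp": time.time(),
    }
    return json.dumps(record)

if __name__ == "__main__":
    # With kafka-python, this loop would instead call something like
    #   producer.send("sensor_topic", make_record(i).encode("utf-8"))
    # Here we just print a few records to show the payload shape.
    for i in range(3):
        print(make_record(i))
        time.sleep(0.1)
```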

Get Started

main dependencies

  • Kafka: builds the real-time data pipeline between the producer and Spark.
  • Spark Streaming: processes the streaming data.
  • MongoDB: stores the processed data.
  • Bokeh: displays the data dynamically.

run it

In your shell,

  • clone this repo.

    git clone git@github.com:cnlinxi/kafka_spark_streaming.git
  • produce data into kafka.

    python producer.py
  • receive data with Spark Streaming and insert it into MongoDB.

    spark-submit --packages <spark-streaming-kafka java package> receiver.py
    

    in my case (Spark 2.1.1), it is:

    spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1 receiver.py
    

    or you can follow this guide to debug Spark Streaming with PyCharm.

  • display data with Bokeh.

    bokeh serve data_display.py
    

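Setting aside the Spark and MongoDB plumbing, the receiver's core work is turning each Kafka message into a document for insertion. A minimal sketch of that transform follows; the field names are hypothetical and `receiver.py` may structure its documents differently.

```python
import json
from datetime import datetime, timezone

def message_to_document(message: str) -> dict:
    """Parse one Kafka message (a JSON string) into a MongoDB-style document."""
    payload = json.loads(message)
    return {
        "sensor_id": payload["sensor_id"],  # hypothetical field names
        "value": float(payload["value"]),
        # Convert the epoch timestamp into a datetime, which pymongo
        # would store as a native BSON date.
        "received_at": datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc),
    }

# In receiver.py, a DStream created via the Kafka integration would map a
# function like this over each batch and insert the results with pymongo,
# e.g. collection.insert_many(...) inside foreachRDD.
```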
File Structure

Connect

cnmengnan@gmail.com

blog: WinterColor blog

Enjoy it~