
kafka_spark_streaming

A very simple example of processing streaming data with Kafka, Spark Streaming, MongoDB & bokeh

We produce some simulated streaming data and put it into Kafka. Spark Streaming consumes the streaming data and inserts it into MongoDB. Then we use bokeh to display the streaming data dynamically.
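The producing step above can be sketched as follows. This is a minimal sketch, not the repo's actual producer.py: the topic name `sensor_data`, the record fields, and the one-second cadence are assumptions, and it needs the kafka-python package plus a broker on localhost:9092.

```python
import json
import random
import time
from datetime import datetime


def make_record():
    # Simulated sensor-style record (hypothetical schema; the real
    # producer.py may use different fields).
    return {
        "timestamp": datetime.utcnow().isoformat(),
        "value": random.uniform(0.0, 100.0),
    }


def run_producer(topic="sensor_data", bootstrap_servers="localhost:9092"):
    # Requires the kafka-python package and a running Kafka broker.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        producer.send(topic, make_record())
        time.sleep(1)  # emit one simulated record per second

# usage (needs a broker): run_producer("sensor_data")
```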

Get Started

main dependencies

  • Kafka: Kafka is used for building real-time data pipelines and streaming apps.
  • Spark streaming: Spark Streaming processes the streaming data.
  • MongoDB: MongoDB is used for storing data.
  • bokeh: bokeh displays the data~

run it

In your shell,

  • clone this repo.

    git clone git@github.com:cnlinxi/kafka_spark_streaming.git
  • produce data into kafka.

    python producer.py
  • receive data by spark streaming and put data into mongodb.

    spark-submit --packages <spark-streaming-kafka java package> receiver.py
    

    in my case (Spark 2.1.1), it is:

    spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1 receiver.py
    

    or you can follow this guide to debug Spark Streaming in PyCharm.

  • display data by bokeh.

    bokeh serve data_display.py
    
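The receiving step above can be sketched as follows, using the spark-streaming-kafka-0-8 receiver API that the spark-submit package implies. This is a hedged sketch, not the repo's actual receiver.py: the ZooKeeper address, group id, topic, and database/collection names are assumptions, and it must be launched via spark-submit with pyspark and pymongo available.

```python
import json


def parse_message(pair):
    # The Kafka stream yields (key, value) pairs; the value holds the JSON payload.
    _, value = pair
    return json.loads(value)


def start_receiver(zk_quorum="localhost:2181", topic="sensor_data",
                   mongo_uri="mongodb://localhost:27017"):
    # Requires pyspark; run via spark-submit with the
    # spark-streaming-kafka-0-8 package, as shown above.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka_spark_streaming")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    stream = KafkaUtils.createStream(ssc, zk_quorum, "receiver-group", {topic: 1})
    records = stream.map(parse_message)

    def save_partition(partition):
        # Open one MongoDB connection per partition, not per record.
        import pymongo
        client = pymongo.MongoClient(mongo_uri)
        coll = client["streaming"]["records"]
        docs = list(partition)
        if docs:
            coll.insert_many(docs)
        client.close()

    records.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))
    ssc.start()
    ssc.awaitTermination()
```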
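The display step can be sketched as a bokeh server app that polls MongoDB and streams new points into the plot. Again a sketch under assumptions, not the repo's data_display.py: the database/collection names and the one-second poll interval are made up, and it requires bokeh and pymongo plus records written by the receiver.

```python
def docs_to_columns(docs, start=0):
    # Reshape MongoDB documents into the column dict that
    # ColumnDataSource.stream() expects; x is a running index.
    return {
        "x": list(range(start, start + len(docs))),
        "y": [d["value"] for d in docs],
    }


def make_document(doc):
    # Requires bokeh and pymongo, and records written by the receiver above.
    import pymongo
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure

    client = pymongo.MongoClient("mongodb://localhost:27017")
    coll = client["streaming"]["records"]

    source = ColumnDataSource(data=dict(x=[], y=[]))
    fig = figure(title="streaming data")
    fig.line(x="x", y="y", source=source)

    seen = {"count": 0}

    def update():
        # Poll for documents inserted since the last refresh.
        new_docs = list(coll.find().skip(seen["count"]))
        if new_docs:
            source.stream(docs_to_columns(new_docs, start=seen["count"]))
            seen["count"] += len(new_docs)

    doc.add_root(fig)
    doc.add_periodic_callback(update, 1000)  # refresh every second

# in data_display.py you would call: make_document(curdoc())
```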

File Structure

Connect

cnmengnan@gmail.com

blog: WinterColor blog

enjoy it~
