Demonstrating Spark Structured Streaming using Twitter API, Apache Spark and Apache Kafka.

Wolvarun9295/SparkStructuredStreaming--TwitterAPI-PySpark-Kafka

Steps to run Spark Structured Streaming

  • Generate your API access keys on the Twitter Developer site.
  • Add the credentials to the tweetConfig.py file.
  • Start the ZooKeeper server in a terminal, then start the Kafka server.
  • Run tweetListener.py.
  • Run tweetAnalyzer.py.
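
The contents of tweetConfig.py are not shown here. As a minimal sketch, assuming it holds the four usual Twitter API credentials, the helper below reads them from environment variables and reports any that are missing before tweetListener.py is started. The variable names and both function names are hypothetical, not taken from the repository.

```python
import os

# Hypothetical variable names; the actual names in tweetConfig.py may differ.
CREDENTIAL_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return a dict of credential name -> value read from the environment."""
    env = os.environ if env is None else env
    return {name: env.get(name, "") for name in CREDENTIAL_NAMES}


def missing_credentials(credentials):
    """Return the names of credentials that are empty or absent."""
    return [name for name in CREDENTIAL_NAMES if not credentials.get(name)]


if __name__ == "__main__":
    missing = missing_credentials(load_credentials())
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All Twitter credentials are set.")
```

Reading the keys from the environment keeps them out of version control; hard-coding them in the config file works too, but the file should then not be committed.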

Output

  • Running tweetListener.py fetches tweets in real time.

  • Running tweetAnalyzer.py periodically prints its results in batches.
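
To illustrate where batched output of this kind comes from, here is a rough sketch of a Structured Streaming query that reads a Kafka topic and prints each micro-batch to the console. It is not the repository's tweetAnalyzer.py: the topic name "tweets", the broker address localhost:9092, and both function names are assumptions made for the example.

```python
def kafka_source_options(bootstrap_servers="localhost:9092", topic="tweets"):
    """Options for Spark's Kafka source; broker and topic are placeholders."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_console_query(spark, options):
    """Read the Kafka topic as a stream and print each micro-batch.

    `spark` is an existing SparkSession. Kafka delivers `value` as bytes,
    so it is cast to a string before being written out.
    """
    tweets = (
        spark.readStream.format("kafka")
        .options(**options)
        .load()
        .selectExpr("CAST(value AS STRING) AS tweet")
    )
    return (
        tweets.writeStream.outputMode("append")
        .format("console")
        .option("truncate", "false")
        .start()
    )
```

With a running SparkSession, calling start_console_query(spark, kafka_source_options()).awaitTermination() would keep the query alive and print a new batch whenever tweets arrive on the topic.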

Error Solving

  • During execution you may hit errors, most of which are compatibility issues. To avoid them, use a Python version below 3.8 and install PySpark==2.4.6 and py4j==0.10.7.
  • If you get an error saying "Cannot load data source: Kafka", or one that refers you to the Structured Streaming + Kafka Integration Guide, download the required Kafka connector jars and extract them into $SPARK_HOME/jars/.
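
The two notes above can be checked with a small helper. As an alternative to copying jars by hand, spark-submit can also download the Kafka connector itself when given its Maven coordinate through the --packages option. The sketch below builds that coordinate; the function names are hypothetical, and Scala 2.11 is an assumption that must match the Scala version of your Spark build.

```python
def is_supported_python(version_info):
    """True for interpreters older than 3.8, as recommended above."""
    return tuple(version_info[:2]) < (3, 8)


def kafka_package(spark_version="2.4.6", scala_version="2.11"):
    """Maven coordinate of the Structured Streaming Kafka connector.

    Scala 2.11 is an assumption; use the Scala version your Spark
    distribution was compiled against.
    """
    return "org.apache.spark:spark-sql-kafka-0-10_{}:{}".format(
        scala_version, spark_version
    )


if __name__ == "__main__":
    import sys

    print("Python version supported:", is_supported_python(sys.version_info))
    print("spark-submit --packages", kafka_package(), "tweetAnalyzer.py")
```

Running the script prints whether the current interpreter is below 3.8, followed by a spark-submit command line that lets Spark resolve the connector at launch.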
