Steps to run Spark Structured Streaming

  • Generate your access keys on the Twitter developer site.
  • Add the credentials to the tweetConfig.py file.
  • Start the ZooKeeper server in a terminal, then start the Kafka server.
  • Run tweetListener.py.
  • Run tweetAnalyzer.py.
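
For orientation, the last step can be pictured as a small Structured Streaming job that reads the tweets from Kafka and prints each micro-batch to the console. The sketch below is not the code in tweetAnalyzer.py. The broker address, the topic name "tweets", and the names kafka_source_options and run are assumptions made for illustration, and it needs the Kafka jars mentioned under the error notes to be available to Spark.

```python
"""Hypothetical sketch of a Kafka-to-console Structured Streaming job."""

# Assumed values: this README does not name the broker address or the topic.
KAFKA_BOOTSTRAP_SERVERS = "localhost:9092"  # Kafka's default local listener
KAFKA_TOPIC = "tweets"                      # hypothetical topic name


def kafka_source_options(servers, topic):
    """Build the options passed to Spark's Kafka source."""
    return {
        "kafka.bootstrap.servers": servers,
        "subscribe": topic,
        "startingOffsets": "latest",  # only read tweets that arrive from now on
    }


def run():
    """Start the stream and block until it is stopped."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("TweetAnalyzerSketch").getOrCreate()

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(KAFKA_BOOTSTRAP_SERVERS, KAFKA_TOPIC))
        .load()
    )

    # Kafka delivers each message value as bytes; cast it to a string column.
    tweets = raw.selectExpr("CAST(value AS STRING) AS tweet")

    # Print every micro-batch to the console as it arrives.
    query = (
        tweets.writeStream.outputMode("append")
        .format("console")
        .option("truncate", "false")
        .start()
    )
    query.awaitTermination()
```

Calling run() starts the stream and blocks until it is interrupted. Any real analysis (for example, counting words or hashtags) would be applied to the tweets DataFrame before writeStream.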

Output

  • Running tweetListener.py will fetch the tweets in real time:

  • Running tweetAnalyzer.py will periodically generate the following output, in batches:

Troubleshooting

  • During execution, you might get errors, most of which are compatibility issues. To resolve them, use a Python version lower than 3.8 and install PySpark==2.4.6 and py4j==0.10.7.
  • If you get an error saying Cannot load data source: Kafka, or one that refers you to the Structured Streaming + Kafka Integration Guide, download the jars given here and extract them into $SPARK_HOME/jars/.
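
Both checks above can be run before starting the scripts. The following is a minimal sketch, assuming the Kafka jars can be recognized by "kafka" appearing in their file names (the exact jar names are not given here); the function names python_is_supported, find_kafka_jars, and report are hypothetical.

```python
import glob
import os
import sys


def python_is_supported(version_info=None):
    """Return True for interpreters older than Python 3.8."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < (3, 8)


def find_kafka_jars(spark_home):
    """List jar files under <spark_home>/jars whose name contains 'kafka'.

    The '*kafka*.jar' pattern is an assumption about how the jars are named.
    """
    pattern = os.path.join(spark_home, "jars", "*kafka*.jar")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def report(spark_home=None):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if not python_is_supported():
        problems.append("Python %d.%d is 3.8 or newer" % sys.version_info[:2])
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not find_kafka_jars(spark_home):
        jars_dir = os.path.join(spark_home, "jars")
        problems.append("no Kafka jars found in %s" % jars_dir)
    return problems
```

Printing the result of report() before launching tweetAnalyzer.py gives a quick hint about which of the two issues, if either, applies to the current environment.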