SparkStructuredStreaming--TwitterAPI-PySpark-Kafka

Steps to run Spark Structured Streaming

Generate your access keys on the Twitter Developer site.
Add credentials in the tweetConfig.py file.
Start the ZooKeeper server in a terminal, then start the Kafka server.
Run tweetListener.py file.
Run tweetAnalyzer.py file.
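
For the credentials step, one option is to keep the keys out of the repository and have tweetConfig.py read them from environment variables. This is only a sketch: the variable names and the `load_credentials` helper below are hypothetical, so match them to whatever tweetListener.py actually imports.

```python
import os

# Hypothetical names for the four Twitter API credentials;
# rename them to match what tweetListener.py expects.
CREDENTIAL_NAMES = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the credentials as a dict, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_NAMES}
```

Failing at import time with the names of the missing keys tends to be easier to debug than an authentication error raised later by the Twitter client.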

Output

Running tweetListener.py fetches tweets in real time.
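
As a rough illustration of what a listener like this does with each tweet, the sketch below serializes a tweet to JSON and hands it to a Kafka `send` function. It is not the code in tweetListener.py: the topic name `tweets`, the selected fields, and both function names are assumptions made for the example.

```python
import json


def tweet_to_message(tweet):
    """Serialize a few fields of a tweet dict into UTF-8 encoded JSON bytes."""
    payload = {
        "id": tweet.get("id"),
        "text": tweet.get("text", ""),
        "created_at": tweet.get("created_at"),
    }
    return json.dumps(payload).encode("utf-8")


def forward_tweets(tweets, send, topic="tweets"):
    """Call send(topic, value) for every tweet and return how many were sent."""
    count = 0
    for tweet in tweets:
        send(topic, tweet_to_message(tweet))
        count += 1
    return count
```

If the kafka-python client is installed, `send` could be the bound method `KafkaProducer(bootstrap_servers="localhost:9092").send`; passing the sender in as an argument keeps the serialization logic testable without a running broker.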

Running tweetAnalyzer.py prints its output periodically, in batches.
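
The batch-by-batch output comes from how Structured Streaming processes a Kafka topic as a series of micro-batches. A minimal sketch of that pattern follows, assuming a broker at `localhost:9092` and a topic named `tweets`. The function name is hypothetical, and the real tweetAnalyzer.py may apply further transformations before writing.

```python
def start_console_query(spark, servers="localhost:9092", topic="tweets"):
    """Read `topic` from Kafka as a stream and print each micro-batch."""
    tweets = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .load()
        # Kafka delivers the payload as bytes in the `value` column.
        .selectExpr("CAST(value AS STRING) AS tweet")
    )
    return (
        tweets.writeStream
        .outputMode("append")
        .format("console")
        .option("truncate", "false")  # show full tweet text in the console
        .start()
    )
```

To try it, create a session with `SparkSession.builder.appName("tweets").getOrCreate()`, pass it to `start_console_query`, and call `awaitTermination()` on the returned query so the process keeps running between batches.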

Error Solving

During execution you might get errors, most of which are compatibility issues. To resolve them, use a Python version older than 3.8 and install PySpark==2.4.6 and py4j==0.10.7 (the py4j release that PySpark 2.4.6 depends on).
If you get an error saying Cannot load data source: Kafka, or one that refers you to the Structured Streaming + Kafka Integration Guide, download the jars given here and extract them into $SPARK_HOME/jars/.
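
Both fixes can be checked before launching the analyzer. The helpers below are a sketch, not part of this repository: `version_problems` compares the installed versions against Python < 3.8, PySpark 2.4.6 and py4j 0.10.7, and `kafka_jars` lists any jar under `$SPARK_HOME/jars/` whose file name contains `kafka`. Matching on the file name is an assumption, since the exact jar names depend on the download.

```python
from pathlib import Path


def version_problems(python_version, pyspark_version, py4j_version):
    """Return mismatches against Python < 3.8, PySpark 2.4.6, py4j 0.10.7."""
    problems = []
    if tuple(python_version[:2]) >= (3, 8):
        problems.append("Python must be older than 3.8")
    if pyspark_version != "2.4.6":
        problems.append("PySpark should be 2.4.6, found " + pyspark_version)
    if py4j_version != "0.10.7":
        problems.append("py4j should be 0.10.7, found " + py4j_version)
    return problems


def kafka_jars(spark_home):
    """List jar files in <spark_home>/jars whose name mentions kafka."""
    jars_dir = Path(spark_home) / "jars"
    return sorted(p.name for p in jars_dir.glob("*kafka*.jar"))
```

The inputs can be taken from `sys.version_info`, `pyspark.__version__`, `py4j.version.__version__` and `os.environ["SPARK_HOME"]`. An empty list from `kafka_jars` suggests the jars were not extracted into the right directory.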