GitHub - Wolvarun9295/SparkStructuredStreaming--TwitterAPI-PySpark-Kafka: Demonstrating Spark Structured Streaming using Twitter API, Apache Spark and Apache Kafka.

Steps to run Spark Structured Streaming

Running tweetAnalyzer.py produces output periodically, in batches.

During execution, you might hit errors that are mainly compatibility issues. To avoid them, use a Python version lower than 3.8 and install PySpark==2.4.6 and py4j==0.10.7.
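
A quick way to confirm that the interpreter meets the Python constraint before installing anything is a small check like the one below. This is only a minimal sketch: the helper names `python_is_supported` and `pip_install_command` are hypothetical, and the 3.8 cutoff simply mirrors the recommendation above.

```python
import sys

# PySpark release recommended for this project; pip installs the py4j
# release that this PySpark version depends on alongside it.
PYSPARK_PIN = "pyspark==2.4.6"


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is older than Python 3.8."""
    return tuple(version_info[:2]) < (3, 8)


def pip_install_command(pin=PYSPARK_PIN):
    """Build (but do not run) the pip command for the pinned PySpark."""
    return [sys.executable, "-m", "pip", "install", pin]


print("Interpreter older than 3.8:", python_is_supported())
print("Install with:", " ".join(pip_install_command()))
```

If the first line prints False, creating a virtual environment with an older interpreter before installing is probably the simplest fix.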
If you get an error saying "Cannot load data source: Kafka", or one that refers you to the Structured Streaming + Kafka Integration Guide, download the jars given here and extract them into $SPARK_HOME/jars/.
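
As an alternative to copying jars by hand, Spark can fetch the Kafka connector at launch through the `--packages` option of `spark-submit`. The sketch below only builds that command and does not run it. It assumes the Scala 2.11 build that PySpark 2.4.6 from PyPI ships with, and the helper names are hypothetical.

```python
def kafka_connector_coordinate(spark_version="2.4.6", scala_version="2.11"):
    """Maven coordinate of the Structured Streaming Kafka connector.

    The defaults are assumptions: adjust them to the Spark and Scala
    versions of the local installation.
    """
    return "org.apache.spark:spark-sql-kafka-0-10_{}:{}".format(
        scala_version, spark_version
    )


def spark_submit_command(script="tweetAnalyzer.py",
                         spark_version="2.4.6",
                         scala_version="2.11"):
    """Build the spark-submit argument list that pulls in the connector."""
    coordinate = kafka_connector_coordinate(spark_version, scala_version)
    return ["spark-submit", "--packages", coordinate, script]


# Prints:
# spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6 tweetAnalyzer.py
print(" ".join(spark_submit_command()))
```

With this approach Spark resolves the connector and its dependencies the first time the job starts, so nothing has to be extracted into $SPARK_HOME/jars/. It does need network access to a Maven repository.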
