- Generate your access keys on the Twitter Developer site.
- Add your credentials to the tweetConfig.py file.
- Start the ZooKeeper server in a terminal, then start the Kafka server.
- Run the tweetListener.py file.
- Run the tweetAnalyzer.py file.
- Running tweetListener.py fetches tweets in real time.
- Running tweetAnalyzer.py prints its output periodically, in batches.
- During execution you might hit errors, most of which are compatibility issues. To avoid them, use a Python version below 3.8 and install PySpark==2.4.6 and py4j==0.10.7 (the py4j release that PySpark 2.4.6 depends on).
- If you get an error saying "Cannot load data source: Kafka", or one that refers you to the Structured Streaming + Kafka Integration Guide, download the jars given here and extract them into $SPARK_HOME/jars/.
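
The two troubleshooting notes above can be checked before starting the listener and analyzer. The following is a minimal sketch, not part of this repository: the function names (`python_ok`, `installed_version`, `kafka_jars`, `preflight`) are hypothetical, the expected versions are PySpark 2.4.6 and py4j 0.10.7 (the py4j release PySpark 2.4.6 depends on), and it assumes the Kafka integration jars have "kafka" somewhere in their file names.

```python
import os
import sys
from pathlib import Path

# Versions taken from the troubleshooting notes; adjust if your setup differs.
MAX_PYTHON_EXCLUSIVE = (3, 8)
EXPECTED = {"pyspark": "2.4.6", "py4j": "0.10.7"}


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is older than Python 3.8."""
    return tuple(version_info[:2]) < MAX_PYTHON_EXCLUSIVE


def installed_version(package):
    """Return the installed version string of pyspark or py4j, or None."""
    try:
        if package == "pyspark":
            import pyspark
            return pyspark.__version__
        if package == "py4j":
            from py4j.version import __version__
            return __version__
    except Exception:
        # Broad on purpose: a missing or incompatible install may fail
        # with something other than ImportError.
        return None
    return None


def kafka_jars(spark_home):
    """List jar files in <spark_home>/jars whose name contains 'kafka'.

    Assumption: the Kafka integration jars carry 'kafka' in their file name.
    """
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        return []
    return sorted(p.name for p in jars_dir.glob("*.jar") if "kafka" in p.name.lower())


def preflight():
    """Collect warnings; an empty list means no problem was detected."""
    warnings = []
    if not python_ok():
        warnings.append("Python %d.%d found; expected a version below 3.8" % sys.version_info[:2])
    for package, wanted in EXPECTED.items():
        found = installed_version(package)
        if found != wanted:
            warnings.append("%s: expected %s, found %s" % (package, wanted, found))
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        warnings.append("SPARK_HOME is not set")
    elif not kafka_jars(spark_home):
        warnings.append("no Kafka jars found in %s" % os.path.join(spark_home, "jars"))
    return warnings


if __name__ == "__main__":
    for line in preflight() or ["environment looks compatible"]:
        print(line)
```

Save it under any name and run it with the same interpreter you use for tweetListener.py and tweetAnalyzer.py. It only reports mismatches; it does not install or download anything.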