Twitter-Sentiment-Analysis-in-Python

Update

I wrote a blog post with some additional information on the project: https://traintestsplit.com/twitter-sentiment-analysis-in-python-with-code/

Data pipeline architecture for streaming Twitter data into an Apache Kafka cluster, performing simple sentiment analysis with the TextBlob module, and subsequently storing the data in MongoDB. Finally, the results are presented in a live-updating dashboard built with Plotly and Dash. The high-level architecture is shown in the following picture.

Twitter Sentiment Analysis Pipeline in Python

Pipeline Explanation

The pipeline uses the Twitter Streaming API to stream tweets into Python in real time, filtered by specified keywords. The aforementioned Twitter producer pushes the tweets into a topic in the Kafka cluster. The streaming Twitter data are in JSON format.
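
As a rough illustration, a producer along these lines could be written with tweepy (3.x API) and kafka-python. The credentials, topic name "tweets", and keyword list below are placeholders, and the repository's actual script may differ; for instance, it also selects fields and scores sentiment before sending, as described next.

```python
import json

import tweepy
from kafka import KafkaProducer

# Placeholder credentials and names -- replace with your own.
CONSUMER_KEY, CONSUMER_SECRET = "...", "..."
ACCESS_TOKEN, ACCESS_SECRET = "...", "..."
KAFKA_TOPIC = "tweets"
KEYWORDS = ["python", "kafka"]

# Serialize each tweet dict to JSON bytes before sending it to Kafka.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

class KafkaStreamListener(tweepy.StreamListener):
    def on_data(self, raw_data):
        # Each Streaming API message is a JSON document; forward it to Kafka.
        producer.send(KAFKA_TOPIC, json.loads(raw_data))
        return True

    def on_error(self, status_code):
        # Disconnect on rate limiting (HTTP 420) instead of retrying.
        return status_code != 420

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
tweepy.Stream(auth, KafkaStreamListener()).filter(track=KEYWORDS)
```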

A selection of the relevant fields, as well as simple sentiment analysis, is performed during the streaming process. The code is written so that the text field of tweets and retweets is not truncated.
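
To make this concrete, helpers like the following could extract the full text and score it with TextBlob. The field names follow Twitter's v1.1 Streaming API JSON; select_fields and the exact set of kept fields are hypothetical, not necessarily the repository's code.

```python
from textblob import TextBlob

def extract_full_text(tweet):
    """Return the untruncated text of a tweet or retweet (v1.1 JSON)."""
    # Retweets carry the original tweet under 'retweeted_status'.
    if "retweeted_status" in tweet:
        tweet = tweet["retweeted_status"]
    # Tweets over 140 characters carry the full text under 'extended_tweet'.
    if "extended_tweet" in tweet:
        return tweet["extended_tweet"]["full_text"]
    return tweet["text"]

def select_fields(tweet):
    """Keep only the fields the pipeline needs, plus a sentiment score."""
    text = extract_full_text(tweet)
    return {
        "created_at": tweet["created_at"],
        "user": tweet["user"]["screen_name"],
        "text": text,
        # TextBlob polarity ranges from -1 (negative) to +1 (positive).
        "polarity": TextBlob(text).sentiment.polarity,
    }
```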

On the other end, a MongoDB consumer consumes the streaming data and stores it in MongoDB for future use. MongoDB was chosen because the data is already in JSON format.
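
A minimal sketch of such a consumer, assuming kafka-python and pymongo against a local MongoDB instance; the database and collection names are placeholders:

```python
import json

from kafka import KafkaConsumer
from pymongo import MongoClient

KAFKA_TOPIC = "tweets"  # placeholder: the topic created earlier

# Placeholder database/collection on a local MongoDB instance.
collection = MongoClient("mongodb://localhost:27017")["twitter"]["tweets"]

consumer = KafkaConsumer(
    KAFKA_TOPIC,
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Each Kafka message value is a JSON document, so it maps
    # directly onto a MongoDB document.
    collection.insert_one(message.value)
```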

Finally, the text of the tweets, as well as simple statistics, is presented in an interactive Dash dashboard with live updates.
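
A bare-bones sketch of such a dashboard, assuming Dash 2.x and documents that carry a polarity field as in the selection sketch above; the refresh interval and query are illustrative choices:

```python
import plotly.graph_objs as go
from dash import Dash, dcc, html, Input, Output
from pymongo import MongoClient

# Same placeholder database/collection as in the consumer sketch.
collection = MongoClient("mongodb://localhost:27017")["twitter"]["tweets"]

app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(id="polarity-graph"),
    # Poll MongoDB every 5 seconds so the page updates live.
    dcc.Interval(id="refresh", interval=5_000, n_intervals=0),
])

@app.callback(Output("polarity-graph", "figure"), Input("refresh", "n_intervals"))
def update_graph(_):
    # Plot the polarity of the 100 most recently stored tweets.
    tweets = list(collection.find().sort("_id", -1).limit(100))
    return go.Figure(
        data=[go.Scatter(y=[t["polarity"] for t in tweets], mode="lines+markers")],
        layout=go.Layout(title="Sentiment polarity of the latest tweets"),
    )

if __name__ == "__main__":
    app.run_server(debug=True)
```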

Code Explanation

The kafkaTwitterStreaming.py script implements the Twitter producer. The kafkaMongoConsumer.py script implements the MongoDB consumer. Finally, the DashboardFinal.py script connects to the MongoDB database and presents live results on a simple web dashboard.

Instructions on Running the Code

Follow these steps to run the code:

Machine Preparation

  1. Install and configure Apache Kafka.
  2. Install and configure MongoDB.
  3. Install and configure Python 3.

Run the Code

  1. Start ZooKeeper in one terminal.
  2. Start Kafka in another.
  3. Create a Kafka topic with the desired number of partitions and other configurations (a Python alternative to the usual CLI command is sketched after this list).
  4. Start MongoDB in a third terminal.
  5. Run the MongoDB consumer in one console, after specifying in the script the name of the topic you created.
  6. Run the Twitter stream producer in a second console, after specifying in the script the keyword list to fetch tweets by. At this point, tweets should be streaming from Twitter into the Kafka topic and subsequently being stored in MongoDB.
  7. Run the dashboard script. The results should appear on an interactive webpage.
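
Step 3 is typically done with Kafka's kafka-topics CLI. For completeness, here is a minimal Python sketch of the same step using kafka-python's admin client; the topic name "tweets", 3 partitions, and replication factor 1 are placeholder choices, not values from this repository.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Create the topic that the producer and the consumer will share.
# Name, partition count, and replication factor are placeholders.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="tweets", num_partitions=3, replication_factor=1)])
admin.close()
```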
