A Real-Time Twitter Sentiment Analysis and Visualization Framework
Production-ready end-to-end system combining Apache Kafka streaming, multi-model sentiment classification (Emoticon, Polarity, and SentiWordNet classifiers, combined in the EPS pipeline), and interactive Django dashboard visualizations.
GitHub Repository: github.com/JamunaSMurthy/TwitSenti
Research Paper: TwitSenti: a real-time Twitter sentiment analysis and visualization framework - Journal of Information & Knowledge Management, Vol. 18, No. 2, 2019
TwitSenti implements the complete architecture from the research paper, centered on three specialized sentiment classifiers:
- Data Pipeline: Apache Kafka (pub/sub) + PySpark (stream processing)
- Sentiment Classifiers:
  - SentiWordNetClassifier - Lexicon-based (SentiWordNet)
  - PolarityClassifier - ML-based (Naive Bayes, SVM, Logistic Regression, MLP)
  - EmoticonClassifier - Hybrid emoji lexicon + SVM approach
- Web API: Flask Micro-Server (REST endpoints) ✅ IMPLEMENTED
- Cache & Storage: Redis (NoSQL cache) ✅ IMPLEMENTED
- Web Dashboard: Django (visualizations)
- Streaming: Twitter API / Kafka / PySpark
✅ Real-time sentiment classification
✅ Multi-classifier ensemble approach
✅ Emoji/emoticon-aware analysis
✅ Geographic heat maps & trend visualizations
✅ Word clouds & semantic analysis
✅ Scalable streaming architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ TwitSenti System Architecture │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ DATA COLLECTION LAYER │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Twitter Feed │ │Twitter API │ │ Related │ │
│ │ │ │ Streaming │ │ Tweets │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼──────────────────┼──────────────────┼───────────┘
│ │ │
└──────────────────┼──────────────────┘
│
┌──────────▼──────────┐
│ Apache Kafka │
│ (Distributed │
│ Messaging) │
│ Pub-Sub Topics │
└──────────┬──────────┘
│
┌─────────────────────────────▼─────────────────────────────┐
│ DATA PRE-PROCESSING LAYER │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ • Detection & Analysis of Slangs/Abbreviations │ │
│ │ • Lemmatization & Correction │ │
│ │ • Stop Words Removal │ │
│ │ • Emoji/Emoticon Extraction & Preservation │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────┬──────────────────────────────┘
│
┌─────────────────────────────▼──────────────────────────────┐
│ SENTIMENT ANALYSIS LAYER (EPS Pipeline) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ 📊 Emoticon Classifier │ │
│ │ (Emoji Sentiment Lexicon + SVM) │ │
│ │ Accuracy: ~88-92% (emoji-rich tweets) │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ 📈 Polarity Classifier │ │
│ │ (NB, SVM, LR, MLP + TF-IDF/BoW/Word2Vec) │ │
│ │ Best: SVM + TF-IDF (~89% accuracy) │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ 📝 SentiWordNet Classifier │ │
│ │ (Lexicon-based SentiWordNet scoring) │ │
│ │ Fast, no training required │ │
│ └──────────────────────────────────────────────────────┘ │
│ ▼ │
│ Consensus Vote: P/N/Neutral │
└─────────────────────────────┬──────────────────────────────┘
│
┌─────────────────────────────▼──────────────────────────────┐
│ STREAMING & STORAGE LAYER │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Apache Spark/PySpark Real-Time Processing │ │
│ │ • Real-Time Tweets Stream │ │
│ │ • User Timeline Tweets Stream │ │
│ │ • Text Query Tweets Stream │ │
│ │ (4 distributed worker nodes) │ │
│ └────────────────────────────────────────────────────┘ │
│ ▼ │
│ Redis Cache │
│ (NoSQL Data Store) │
└─────────────────────────────┬──────────────────────────────┘
│
┌─────────────────────────────▼──────────────────────────────┐
│ WEB APPLICATION & VISUALIZATION LAYER │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Django Dashboard (8 Interactive Visualizations): │ │
│ │ │ │
│ │ 🗺️ Heat Map - Geographic sentiment distribution │ │
│ │ 🌍 Regional Map - Country-level polarity │ │
│ │ 📄 Raw Tweets - Live tweet feed display │ │
│ │ ☁️ Word Cloud - Most frequent terms │ │
│ │ 🥧 Comparison - Sentiment distribution (pie) │ │
│ │ 💬 Trending Now - Trending topics & bubble chart │ │
│ │ 🌐 The World Now - Global sentiment snapshot │ │
│ │ 📈 Time Line - Sentiment trends over time │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ Flask Micro-Server with Redis Integration │
│ • REST API (8 endpoints for all visualizations) │
│ • Real-time feeds (positive/negative/neutral) │
│ • Trending hashtags & word frequencies │
│ • Geographic sentiment distribution │
│ • Sentiment timeline (hourly aggregation) │
│ Port: 5000 | Docs: /api/dashboard/data │
└─────────────────────────────┬──────────────────────────────┘
│
┌────────────────────────────────┼────────────────────────────────┐
│ │ │
┌────▼─────────┐ ┌──────────▼────────────┐ ┌────────────▼──┐
│ Django │ │ User Web Interface │ │ External Apps│
│ Dashboard │ │ (Browser/Dashboard) │ │ (API clients)│
│127.0.0.1:8000 │ 127.0.0.1:8000 │ │ Mobile/etc │
└──────────────┘ └───────────────────────┘ └──────────────┘
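The EPS stage in the diagram ends with a "Consensus Vote: P/N/Neutral" step. As a hedged illustration of what such a vote can look like, here is a minimal majority-vote sketch. The function name and the tie-breaking rule (fall back to Neutral when all three classifiers disagree) are assumptions, not taken from the TwitSenti code.

```python
from collections import Counter

LABELS = ("Positive", "Negative", "Neutral")


def consensus_vote(emoticon_label: str, polarity_label: str, sentiwordnet_label: str) -> str:
    """Majority vote over the three EPS classifier outputs.

    Assumption: if no label gets at least two votes, return "Neutral".
    """
    votes = Counter([emoticon_label, polarity_label, sentiwordnet_label])
    for label in votes:
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
    label, count = votes.most_common(1)[0]
    return label if count >= 2 else "Neutral"


print(consensus_vote("Positive", "Positive", "Negative"))  # Positive
print(consensus_vote("Positive", "Negative", "Neutral"))   # Neutral
```

With only three voters, a strict majority is simply "two or more agree", so the tie rule only matters when every classifier returns a different label.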
EmoticonClassifier: Text + Emojis → Emoji Extraction → Sentiment Lexicon → SVM Classification
- 150+ emojis with predefined sentiment scores (-1 to +1)
- Hybrid approach: emoji lexicon filtering + SVM for ambiguous cases
- 26.7% accuracy improvement over text-only models
- Best for: Emoji-rich social media content
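The emoji-first, SVM-fallback flow above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy lexicon values, the meaning of the 0.5 threshold (the mean emoji score must reach ±0.5 to decide on its own), and the `text_classifier` callable standing in for the trained SVM are all hypothetical, not the actual `HybridEmoticonClassifier` internals.

```python
from typing import Callable, Dict, List

# Toy emoji lexicon (scores in [-1, +1]) for illustration only; TwitSenti's
# emoji_lexicon.py is described as holding ~150 entries with its own scores.
EMOJI_SCORES: Dict[str, float] = {
    "😍": 0.9, "❤️": 0.8, "😂": 0.6, "🙂": 0.4,
    "😐": 0.0, "😢": -0.6, "😡": -0.9, "👎": -0.7,
}


def emoji_hits(text: str) -> List[float]:
    """Collect the lexicon score of every known emoji occurrence in the text."""
    hits: List[float] = []
    for emoji, score in EMOJI_SCORES.items():
        # Substring counting also catches multi-code-point emojis such as "❤️".
        hits.extend([score] * text.count(emoji))
    return hits


def classify_hybrid(text: str, text_classifier: Callable[[str], str],
                    threshold: float = 0.5) -> str:
    """Use the emoji lexicon when its signal is strong, else defer to a text model."""
    hits = emoji_hits(text)
    if hits:
        mean = sum(hits) / len(hits)
        if mean >= threshold:
            return "Positive"
        if mean <= -threshold:
            return "Negative"
    # No known emojis, or the emoji signal is ambiguous: fall back to the text
    # classifier (an SVM in TwitSenti; any callable works in this sketch).
    return text_classifier(text)


def always_neutral(text: str) -> str:
    # Hypothetical stand-in for a trained SVM text classifier.
    return "Neutral"


print(classify_hybrid("Love this! 😍❤️", always_neutral))  # Positive
print(classify_hybrid("meh 🙂😢", always_neutral))          # Neutral (ambiguous, falls back)
```

Keeping the text model as a pluggable callable makes the lexicon step easy to test in isolation.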
PolarityClassifier: Text → Preprocessing → Feature Extraction → ML Classifier → Polarity
Feature Methods: Bag-of-Words, TF-IDF, Word2Vec
Classifiers: Naive Bayes, SVM, Logistic Regression, Multi-Layer Perceptron
Best Combo: SVM + TF-IDF (~89% accuracy)
- Best for: Diverse text domains, tunable precision/recall
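To make the TF-IDF feature method concrete, the sketch below computes L2-normalized TF-IDF vectors in plain Python using the smoothed IDF that scikit-learn's `TfidfVectorizer` applies by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1. It illustrates the technique only and is not the contents of `feature_extractor.py`; in a real pipeline these vectors (or `TfidfVectorizer` output) would be fed to the SVM.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one L2-normalized {term: weight} vector per tokenized document.

    TF is the raw term count; IDF is the smoothed form
    idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


docs = [["great", "product"], ["terrible", "product"], ["great", "great", "service"]]
for tokens, vec in zip(docs, tfidf(docs)):
    print(tokens, {t: round(w, 3) for t, w in vec.items()})
```

Rare terms such as "terrible" receive a higher weight than terms shared across tweets such as "product", which is what lets a linear SVM key on sentiment-bearing words.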
SentiWordNetClassifier: Text → Tokenization → POS Tagging → SentiWordNet Lookup → Sentiment Score
- No training required, immediate deployment
- Supplies the SentiWordNet component of the combined Emoticon + Polarity + SentiWordNet (EPS) score
- Fast inference, interpretable results
- Best for: Quick baseline, real-time constraints
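The lookup-and-score step can be sketched with a toy lexicon in SentiWordNet's (positive, negative) score format. Assumptions: tokens arrive already POS-tagged with WordNet-style tags (`a`, `n`, `v`, `r`) and with stop words removed, the scores in the hypothetical `TOY_SWN` table are made up, and averaging `pos - neg` with a small neutral band is one simple aggregation rule, not necessarily the one `SentiWordNetClassifier` uses. With NLTK, real scores are available through `nltk.corpus.sentiwordnet` after downloading the `sentiwordnet` and `wordnet` corpora.

```python
from typing import Dict, List, Tuple

# Toy (lemma, WordNet POS) -> (positive, negative) scores in SentiWordNet's
# format. The values are invented for illustration only.
TOY_SWN: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("fantastic", "a"): (0.75, 0.0),
    ("good", "a"): (0.625, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("boring", "a"): (0.0, 0.5),
    ("love", "v"): (0.5, 0.0),
    ("movie", "n"): (0.0, 0.0),
}


def swn_classify(tagged_tokens: List[Tuple[str, str]],
                 neutral_band: float = 0.05) -> Tuple[str, float]:
    """Average (pos - neg) over lexicon hits and map the mean to a label."""
    scores = []
    for word, tag in tagged_tokens:
        entry = TOY_SWN.get((word.lower(), tag))
        if entry is not None:
            pos, neg = entry
            scores.append(pos - neg)
    score = sum(scores) / len(scores) if scores else 0.0
    if score > neutral_band:
        return "Positive", score
    if score < -neutral_band:
        return "Negative", score
    return "Neutral", score


print(swn_classify([("movie", "n"), ("is", "v"), ("fantastic", "a")]))  # ('Positive', 0.375)
```

Because no model is trained, the whole classifier is a dictionary lookup plus an average, which is why this approach suits quick baselines and tight latency budgets.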
TwitSenti/
├── README.md (this file)
├── requirements.txt
├── zk-single-kafka-single.yml (Docker Compose for Kafka/Zookeeper)
│
├── Django-Dashboard/ # Main web application
│ ├── manage.py
│ ├── db.sqlite3
│ ├── BigDataProject/ # Django config
│ │ ├── settings.py
│ │ ├── urls.py
│ │ ├── wsgi.py
│ │ └── static/
│ │ ├── css/
│ │ ├── js/
│ │ └── imgs/
│ │
│ ├── dashboard/ # Dashboard app
│ │ ├── views.py (8 chart endpoints)
│ │ ├── models.py
│ │ ├── urls.py
│ │ ├── admin.py
│ │ ├── consumer_user.py (Kafka integration)
│ │ └── templates/dashboard/
│ │ ├── index.html
│ │ ├── classify.html
│ │ └── base.html
│ │
│ └── migrations/
│
├── Kafka-PySpark/ # Streaming pipeline
│ ├── producer-validation-tweets.py # Data ingestion
│ ├── consumer-pyspark.py # Stream processing + sentiment
│ ├── twitter_validation.csv
│ └── twitter_training.csv
│
├── SentiWordNetClassifier/ # Lexicon-based classifier
│ ├── __init__.py
│ ├── classifier.py
│ ├── preprocessor.py
│ ├── example_usage.py
│ └── README.md
│
├── PolarityClassifier/ # ML-based classifier
│ ├── __init__.py
│ ├── polarity_classifier.py
│ ├── classifiers.py (NB, SVM, LR, MLP)
│ ├── feature_extractor.py (BoW, TF-IDF, Word2Vec)
│ ├── preprocessor.py
│ ├── example_usage.py
│ └── README.md
│
├── EmoticonClassifier/ # Emoji + SVM classifier
│ ├── __init__.py
│ ├── emoticon_classifier.py
│ ├── emoji_lexicon.py (~150 emojis)
│ ├── feature_extractor.py
│ ├── preprocessor.py
│ ├── example_usage.py
│ └── README.md
│
├── Flask-Server/ # REST API + Redis Cache
│ ├── app.py (Flask application with 8 API endpoints)
│ ├── redis_cache.py (Redis wrapper & utility methods)
│ ├── config.py (Configuration for Redis/Flask)
│ ├── requirements.txt
│ ├── .env.example
│ └── README.md (Flask setup & API documentation)
│
├── Kafka-PySpark/consumer-pyspark-redis.py # Redis-aware consumer
│
├── ML PySpark Model/ # Model training
│ ├── Big_Data.ipynb
│ ├── twitter_training.csv
│ └── twitter_validation.csv
│
└── imgs/ # Documentation assets
# Install system dependencies
brew install kafka zookeeper python3 # macOS with Homebrew
# OR use Docker Compose for Kafka/Zookeeper
# 1. Clone repository
git clone https://github.com/JamunaSMurthy/TwitSenti.git
cd TwitSenti
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install Python dependencies
pip install -r requirements.txt
pip install -r Flask-Server/requirements.txt # Flask + Redis client
# 4. Start Redis server (if not running)
brew services start redis # macOS
# OR Docker: docker run -d -p 6379:6379 redis:latest
# Verify: redis-cli ping (should return PONG)
# 5. Train all sentiment classifiers (CRITICAL - creates models/)
python train_classifiers.py
# Creates:
# - models/sentiwordnet.pkl
# - models/polarity_model.pkl
# - models/emoticon_model.pkl
# Expected output: "✓ All classifiers trained successfully!"
# 6. Start Kafka + Zookeeper (Docker)
docker compose -f zk-single-kafka-single.yml up -d
# 7. Start Flask API server (Terminal 1)
cd Flask-Server
python app.py
# Flask running on http://localhost:5000
# Test health: curl http://localhost:5000/api/health
# 8. (Optional) Run producer to collect tweets (Terminal 2)
python Kafka-PySpark/producer-validation-tweets.py --tokens YOUR_TWITTER_TOKENS
# 9. Run PySpark consumer → Redis (Terminal 3)
python Kafka-PySpark/consumer-pyspark-redis.py
# This consumes Kafka tweets, classifies with ensemble ML, sends to Flask API → Redis
# 10. Start Django dashboard (Terminal 4)
cd Django-Dashboard
python manage.py migrate
python manage.py runserver
# Django running on http://localhost:8000
# 11. Open browser
# Dashboard: http://127.0.0.1:8000/dashboard/
# Flask API: http://127.0.0.1:5000/api/dashboard/data
# Check Redis
redis-cli ping # Should return: PONG
redis-cli info # Server statistics
# Check Flask API
curl http://localhost:5000/api/health
curl http://localhost:5000/api/sentiment/stats
# Monitor Kafka consumer
kafka-console-consumer.sh --topic tweets --bootstrap-server localhost:9092
The interactive Django dashboard provides:
- 8 real-time visualization charts
- Live tweet stream display
- Sentiment statistics and trending topics
- Geographic heat maps of sentiment distribution
- Word frequency clouds and semantic analysis
- Responsive design with Flask API integration
from SentiWordNetClassifier import SentiWordNetClassifier
clf = SentiWordNetClassifier()
result = clf.analyze_sentiment("This movie is fantastic!")
print(result['sentiment']) # 'Positive'
print(result['score']) # 0.75

from PolarityClassifier import PolarityClassifier
clf = PolarityClassifier(classifier_type='svm', feature_method='tfidf')
# train_texts: list of tweet strings; train_labels: matching sentiment labels (e.g. from twitter_training.csv)
clf.train(train_texts, train_labels)
prediction = clf.predict("Great product!")
probabilities = clf.predict_proba("Great product!")

from EmoticonClassifier import HybridEmoticonClassifier
clf = HybridEmoticonClassifier(emoji_threshold=0.5)
clf.train(texts_with_emojis, labels)
prediction = clf.predict("Love this! 😍❤️") # Uses emoji lexicon + SVM

Based on validation data (twitter_validation.csv):
| Classifier | Feature Method | Accuracy | Best For |
|---|---|---|---|
| Emoticon + SVM | Emoji + TF-IDF | 91-92% | Emoji-rich tweets |
| SVM | TF-IDF | 89-90% | High accuracy |
| Logistic Regression | TF-IDF | 85-87% | Speed/interpretability |
| Naive Bayes | BoW | 82-84% | Sparse features |
| SentiWordNet | Lexicon | 78-82% | No training needed |
| MLP | Word2Vec | 86-88% | Complex patterns |
- Throughput: 2,500+ messages/sec (Kafka + Spark)
- Latency: <500ms per tweet (preprocessing + classification)
- Dashboard refresh: <4 seconds (real-time updates)
- Memory: ~2GB (Redis + PySpark)
- SentiWordNetClassifier: Lexicon-based, no training, instant deployment
- PolarityClassifier: 4 ML algorithms × 3 feature methods (12 combinations)
- EmoticonClassifier: Emoji sentiment lexicon + SVM hybrid approach
- Apache Kafka for data distribution
- PySpark for distributed processing (4 worker nodes)
- Flask REST API for data retrieval
- Redis cache for <100ms response times
- Heat Map - Geographic sentiment by region (/api/locations/sentiment)
- Regional Map - Country-level sentiment distribution (/api/locations/sentiment)
- Raw Tweets - Live tweet stream display (/api/tweets/feed/live)
- Word Cloud - Frequent terms & topics (/api/words/frequency)
- Comparison Chart - Positive/Negative/Neutral distribution (/api/sentiment/stats)
- Trending Now - Bubble chart of trending topics (/api/trending/hashtags)
- The World Now - Global sentiment heatmap (/api/locations/sentiment)
- Time Line - Sentiment trends over time (/api/sentiment/timeline)
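The timeline endpoint is described as serving an hourly aggregation of sentiment. A minimal sketch of that bucketing, assuming each classified tweet is available as a `(timestamp, label)` pair; the function name and the returned dictionary shape are hypothetical and may differ from what `/api/sentiment/timeline` actually returns.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import DefaultDict, Dict, Iterable, Tuple

LABELS = ("Positive", "Negative", "Neutral")


def hourly_timeline(records: Iterable[Tuple[datetime, str]]) -> Dict[str, Dict[str, int]]:
    """Count sentiment labels per hour, keyed by the ISO timestamp of the hour start."""
    buckets: DefaultDict[str, Counter] = defaultdict(Counter)
    for ts, label in records:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        buckets[hour][label] += 1
    # Emit every label (zero-filled) so a chart always receives the same series.
    return {hour: {label: buckets[hour][label] for label in LABELS}
            for hour in sorted(buckets)}


tweets = [
    (datetime(2019, 3, 1, 14, 5), "Positive"),
    (datetime(2019, 3, 1, 14, 50), "Negative"),
    (datetime(2019, 3, 1, 15, 1), "Positive"),
]
for hour, counts in hourly_timeline(tweets).items():
    print(hour, counts)
```

Zero-filling every label per hour keeps the time-line chart's series aligned even when an hour has no tweets of a given polarity.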
- Modular classifier design (3 independent packages)
- Easy model swapping & ensemble voting
- Containerized deployment (Docker)
- Redis cache with 24-hour TTL
- Flask REST API with CORS enabled
- Django dashboard frontend
- Fully async Kafka → Spark → Flask pipeline
Each classifier includes comprehensive examples:
# Test SentiWordNetClassifier
python SentiWordNetClassifier/example_usage.py
# Test PolarityClassifier (all models + features)
python PolarityClassifier/example_usage.py
# Test EmoticonClassifier (emoji lexicon + SVM)
python EmoticonClassifier/example_usage.py
This implementation is based on the research paper:
Murthy, J. S., Siddesh, G. M., & Srinivasa, K. G. (2019). TwitSenti: a real-time Twitter sentiment analysis and visualization framework. Journal of Information & Knowledge Management, 18(02), 1950013.
https://www.worldscientific.com/doi/abs/10.1142/S0219649219500138
✓ EPS Pipeline: Emoticon + Polarity + SentiWordNet ensemble
✓ Architecture: Kafka → Spark → Redis → Django visualization
✓ Data Flow: Collection → Preprocessing → Classification → Storage → Display
✓ Performance: Optimized for low-latency, scalable throughput
- Transformer models (BERT/RoBERTa) for improved accuracy
- Multi-language support (Arabic, Spanish, Chinese)
- Elasticsearch + Kibana dashboard integration
- Kubernetes deployment (EKS/GKE/AKS)
- Advanced NER (Named Entity Recognition)
- Aspect-based sentiment analysis
- Sarcasm & irony detection
MIT License - See LICENSE file for details
This project implements the architecture and concepts from the original research paper:
Citation (BibTeX):
@article{murthy2019twitsenti,
title={TwitSenti: a real-time Twitter sentiment analysis and visualization framework},
author={Murthy, Jamuna S and Siddesh, GM and Srinivasa, KG},
journal={Journal of Information \& Knowledge Management},
volume={18},
number={02},
pages={1950013},
year={2019},
publisher={World Scientific}
}
- Apache Kafka & Spark communities
- Django & Flask frameworks
- SentiWordNet lexicon project
- Twitter Streaming API documentation
- Redis caching engine
Author: Jamuna Srinivasa Murthy
Email: jamunamurthy.s@gmail.com
For issues, feature requests, or questions:
- Open GitHub Issues
- Check existing documentation in each classifier's README
- Review example_usage.py in each module
- Contact: jamunamurthy.s@gmail.com
Happy sentiment analyzing! 🎉

