An AI-powered system that detects offensive language in Urdu and Roman Urdu text in real-time, provides preventive "ReThink" warnings, and continuously improves through user feedback.
- 🤖 AI-Powered: SVM classifier trained on 40,000+ samples
- ⚡ Real-Time: <100ms prediction time
- 🌐 Bilingual: supports Urdu script and Roman Urdu
- 🛡️ ReThink Warnings: preventive intervention before posting
- 🎯 High Accuracy: 88.7% test accuracy
- 🔄 Continuous Learning: feedback loop for improvement
- 💻 Web Interface: modern, user-friendly UI
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| SVM | 88.7% | 88.65% | 85.38% | 86.98% |
| Naïve Bayes | 84.35% | 82.57% | 86.96% | 84.71% |
| Random Forest | 80.29% | 91.07% | 67.02% | 77.22% |
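The table's precision, recall, and F1 are the standard binary-classification metrics with "offensive" as the positive class. A small sketch of how they are computed with scikit-learn (the toy labels below are illustrative only, not project data):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels purely to illustrate the computation; "offensive" is the positive class.
y_true = ["offensive", "offensive", "non-offensive", "non-offensive", "offensive"]
y_pred = ["offensive", "non-offensive", "non-offensive", "offensive", "offensive"]

acc = accuracy_score(y_true, y_pred)                           # 3 of 5 correct -> 0.6
prec = precision_score(y_true, y_pred, pos_label="offensive")  # TP=2, FP=1 -> 2/3
rec = recall_score(y_true, y_pred, pos_label="offensive")      # TP=2, FN=1 -> 2/3
f1 = f1_score(y_true, y_pred, pos_label="offensive")           # harmonic mean -> 2/3
print(f"acc={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```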
- Python 3.8 or higher
- pip package manager

- Clone/download the project and enter its directory:

  ```shell
  cd "d:/Semesters/BSE-6/FYP 2/FYP_Project"
  ```

- Install dependencies:

  ```shell
  pip install pandas numpy scikit-learn nltk flask openpyxl
  ```

- Run the application:

  ```shell
  python app.py
  ```

- Open http://localhost:5000 in your browser.

That's it! 🎉
```
FYP_Project/
├── app.py                     # Flask backend API
├── data_loader.py             # Data preprocessing
├── train_ml_models.py         # Model training
├── feature_extraction.py      # TF-IDF implementation
├── requirements.txt           # Dependencies
├── PROJECT_DOCUMENTATION.md   # Full documentation
├── models/                    # Trained models
│   ├── svm.pkl                # Best model (SVM)
│   ├── tfidf_vectorizer.pkl   # TF-IDF vectorizer
│   └── ...
├── templates/
│   └── index.html             # Frontend UI
└── static/
    ├── style.css              # Styling
    └── script.js              # Frontend logic
```
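`app.py` exposes the prediction API. As a rough illustration only (the real handler loads the pickled TF-IDF vectorizer and SVM from `models/`; here the model step is stubbed out and the warning threshold is an assumption, not a value from the project), a `/predict` handler might look like:

```python
from datetime import datetime

from flask import Flask, request, jsonify

app = Flask(__name__)

# Assumed threshold; the project's actual warning logic may differ.
WARN_THRESHOLD = 0.5

def classify(text: str):
    """Stub for the real pipeline (tfidf_vectorizer.pkl + svm.pkl from models/)."""
    return "offensive", 0.85

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json().get("text", "")
    label, confidence = classify(text)
    return jsonify({
        "text": text,
        "prediction": label,
        "confidence": confidence,
        "should_warn": label == "offensive" and confidence >= WARN_THRESHOLD,
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    })
```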
- Type your message in Urdu or Roman Urdu
- Click "Check Message"
- View results:
  - ✅ Safe: message is non-offensive
  - ⚠️ Warning: a ReThink modal appears for offensive content
- Choose an action:
  - Edit Message
  - Post Anyway
  - Cancel
```python
import requests

response = requests.post('http://localhost:5000/predict',
                         json={'text': 'your message here'})
print(response.json())
```

Response:

```json
{
  "text": "your message here",
  "prediction": "offensive",
  "confidence": 0.85,
  "should_warn": true,
  "timestamp": "2026-01-22T22:00:00"
}
```

Submitting feedback:

```python
requests.post('http://localhost:5000/feedback', json={
    'text': 'the message',
    'predicted_label': 'offensive',
    'actual_label': 'non-offensive',
    'user_action': 'posted'
})
```

For complete documentation, see PROJECT_DOCUMENTATION.md.
Topics covered:
- Introduction & Problem Statement
- Literature Review
- Methodology & System Architecture
- Implementation Details
- Results & Evaluation
- Scope & Limitations
- Future Enhancements
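The `/feedback` endpoint feeds the continuous-learning loop by storing user corrections for later retraining. A hedged sketch of how such a handler might append feedback records to a CSV file (the file name, field names, and CSV format are assumptions, not details from the project):

```python
import csv
from datetime import datetime
from pathlib import Path

from flask import Flask, request, jsonify

app = Flask(__name__)
FEEDBACK_LOG = Path("feedback_log.csv")  # assumed location, not from the source

FIELDS = ["timestamp", "text", "predicted_label", "actual_label", "user_action"]

def log_feedback(record: dict, path: Path = FEEDBACK_LOG) -> None:
    """Append one feedback record; write a header row on first use."""
    new_file = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"timestamp": datetime.now().isoformat(timespec="seconds"),
                         **{k: record.get(k, "") for k in FIELDS[1:]}})

@app.route("/feedback", methods=["POST"])
def feedback():
    log_feedback(request.get_json())
    return jsonify({"status": "ok"})
```

Records accumulated this way can be merged into the training set before the retraining steps below.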
```shell
# 1. Load and preprocess data
python data_loader.py

# 2. Train models
python train_ml_models.py

# 3. Trained models are saved in the models/ directory
```

Dataset:

- Total samples: 47,082 (41,845 after deduplication)
- Offensive: 24,516 (58.6%)
- Non-offensive: 17,329 (41.4%)
- Sources:
  - Hate Speech Roman Urdu (HS-RU-20): 5,000
  - Dataset of Urdu Abusive Language: 12,083
  - Roman Urdu 30K: 29,999
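The retraining steps above can be sketched end to end. This is an illustrative pipeline under stated assumptions, not the project's `train_ml_models.py`: the data is a tiny toy stand-in for the ~41k labelled samples, and the TF-IDF settings (character n-grams, chosen here because they tolerate Roman Urdu spelling variation) may differ from `feature_extraction.py`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the real dataset loaded by data_loader.py.
texts = ["tum bohat ache ho", "bura insaan", "shukriya dost", "nafrat karta hoon"] * 10
labels = ["non-offensive", "offensive", "non-offensive", "offensive"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

# TF-IDF features feeding a linear SVM, mirroring the project's best model.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("svm", LinearSVC()),
])
pipe.fit(X_train, y_train)
print(f"test accuracy: {pipe.score(X_test, y_test):.3f}")
```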
- Backend: Python, Flask
- ML/NLP: scikit-learn, NLTK
- Frontend: HTML5, CSS3, JavaScript
- Data Processing: pandas, numpy
- Deep Learning models (CNN, LSTM, Transformers)
- Mobile application (Android/iOS)
- Browser extension
- Multi-class classification (hate speech, profanity, etc.)
- Explainability (highlight offensive words)
- Multilingual support (Punjabi, Pashto, Sindhi)
This is a Final Year Project developed as part of academic research. For suggestions or improvements, please contact the author.
Final Year Project
- Institution: [Your University]
- Year: 2026
- Supervisor: [Supervisor Name]
For questions or feedback:
- Email: [Your Email]
- GitHub: [Your GitHub]
- Dataset providers
- scikit-learn community
- Flask framework
- ReThink project inspiration
⭐ If you find this project useful, please give it a star!
Developed with ❤️ for promoting civil online discourse in Urdu


