An AI-powered system that detects offensive language in Urdu and Roman Urdu text in real-time, provides preventive "ReThink" warnings, and continuously improves through user feedback.
- 🤖 AI-Powered: SVM classifier trained on 40,000+ samples
- ⚡ Real-Time: <100ms prediction time
- 🌐 Bilingual: supports Urdu script and Roman Urdu
- 🛡️ ReThink Warnings: preventive intervention before posting
- 🎯 High Accuracy: 88.7% test accuracy
- 🔄 Continuous Learning: feedback loop for improvement
- 💻 Web Interface: modern, user-friendly UI
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| SVM | 88.7% | 88.65% | 85.38% | 86.98% |
| Naïve Bayes | 84.35% | 82.57% | 86.96% | 84.71% |
| Random Forest | 80.29% | 91.07% | 67.02% | 77.22% |
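The table's precision, recall, and F1 are the standard binary-classification metrics with "offensive" as the positive class. A small sketch of how they are computed with scikit-learn (the toy labels below are illustrative only, not project data):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels purely to illustrate the computation; "offensive" is the positive class.
y_true = ["offensive", "offensive", "non-offensive", "non-offensive", "offensive"]
y_pred = ["offensive", "non-offensive", "non-offensive", "offensive", "offensive"]

acc = accuracy_score(y_true, y_pred)                           # 3 of 5 correct -> 0.6
prec = precision_score(y_true, y_pred, pos_label="offensive")  # TP=2, FP=1 -> 2/3
rec = recall_score(y_true, y_pred, pos_label="offensive")      # TP=2, FN=1 -> 2/3
f1 = f1_score(y_true, y_pred, pos_label="offensive")           # harmonic mean -> 2/3
print(f"acc={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```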
- Python 3.8 or higher
- pip package manager

- Clone/download the project and enter its directory:

  ```shell
  cd "d:/Semesters/BSE-6/FYP 2/FYP_Project"
  ```

- Install dependencies:

  ```shell
  pip install pandas numpy scikit-learn nltk flask openpyxl
  ```

- Run the application:

  ```shell
  python app.py
  ```

- Open http://localhost:5000 in your browser.

That's it! 🎉
```
FYP_Project/
├── app.py                     # Flask backend API
├── data_loader.py             # Data preprocessing
├── train_ml_models.py         # Model training
├── feature_extraction.py      # TF-IDF implementation
├── requirements.txt           # Dependencies
├── PROJECT_DOCUMENTATION.md   # Full documentation
├── models/                    # Trained models
│   ├── svm.pkl                # Best model (SVM)
│   ├── tfidf_vectorizer.pkl   # TF-IDF vectorizer
│   └── ...
├── templates/
│   └── index.html             # Frontend UI
└── static/
    ├── style.css              # Styling
    └── script.js              # Frontend logic
```
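`app.py` exposes the prediction API. As a rough illustration only (the real handler loads the pickled TF-IDF vectorizer and SVM from `models/`; here the model step is stubbed out and the warning threshold is an assumption, not a value from the project), a `/predict` handler might look like:

```python
from datetime import datetime

from flask import Flask, request, jsonify

app = Flask(__name__)

# Assumed threshold; the project's actual warning logic may differ.
WARN_THRESHOLD = 0.5

def classify(text: str):
    """Stub for the real pipeline (tfidf_vectorizer.pkl + svm.pkl from models/)."""
    return "offensive", 0.85

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json().get("text", "")
    label, confidence = classify(text)
    return jsonify({
        "text": text,
        "prediction": label,
        "confidence": confidence,
        "should_warn": label == "offensive" and confidence >= WARN_THRESHOLD,
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    })
```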
- Type your message in Urdu or Roman Urdu
- Click "Check Message"
- View results:
  - ✅ Safe: message is non-offensive
  - ⚠️ Warning: a ReThink modal appears for offensive content
- Choose an action:
  - Edit Message
  - Post Anyway
  - Cancel
```python
import requests

response = requests.post('http://localhost:5000/predict',
                         json={'text': 'your message here'})
print(response.json())
```

Response:

```json
{
  "text": "your message here",
  "prediction": "offensive",
  "confidence": 0.85,
  "should_warn": true,
  "timestamp": "2026-01-22T22:00:00"
}
```

Submitting feedback:

```python
requests.post('http://localhost:5000/feedback', json={
    'text': 'the message',
    'predicted_label': 'offensive',
    'actual_label': 'non-offensive',
    'user_action': 'posted'
})
```

For complete documentation, see PROJECT_DOCUMENTATION.md.
Topics covered:
- Introduction & Problem Statement
- Literature Review
- Methodology & System Architecture
- Implementation Details
- Results & Evaluation
- Scope & Limitations
- Future Enhancements
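The `/feedback` endpoint feeds the continuous-learning loop by storing user corrections for later retraining. A hedged sketch of how such a handler might append feedback records to a CSV file (the file name, field names, and CSV format are assumptions, not details from the project):

```python
import csv
from datetime import datetime
from pathlib import Path

from flask import Flask, request, jsonify

app = Flask(__name__)
FEEDBACK_LOG = Path("feedback_log.csv")  # assumed location, not from the source

FIELDS = ["timestamp", "text", "predicted_label", "actual_label", "user_action"]

def log_feedback(record: dict, path: Path = FEEDBACK_LOG) -> None:
    """Append one feedback record; write a header row on first use."""
    new_file = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"timestamp": datetime.now().isoformat(timespec="seconds"),
                         **{k: record.get(k, "") for k in FIELDS[1:]}})

@app.route("/feedback", methods=["POST"])
def feedback():
    log_feedback(request.get_json())
    return jsonify({"status": "ok"})
```

Records accumulated this way can be merged into the training set before the retraining steps below.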
```shell
# 1. Load and preprocess data
python data_loader.py

# 2. Train models
python train_ml_models.py

# 3. Trained models are saved in the models/ directory
```

Dataset:

- Total samples: 47,082 (41,845 after deduplication)
- Offensive: 24,516 (58.6%)
- Non-offensive: 17,329 (41.4%)
- Sources:
  - Hate Speech Roman Urdu (HS-RU-20): 5,000
  - Dataset of Urdu Abusive Language: 12,083
  - Roman Urdu 30K: 29,999
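The retraining steps above can be sketched end to end. This is an illustrative pipeline under stated assumptions, not the project's `train_ml_models.py`: the data is a tiny toy stand-in for the ~41k labelled samples, and the TF-IDF settings (character n-grams, chosen here because they tolerate Roman Urdu spelling variation) may differ from `feature_extraction.py`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the real dataset loaded by data_loader.py.
texts = ["tum bohat ache ho", "bura insaan", "shukriya dost", "nafrat karta hoon"] * 10
labels = ["non-offensive", "offensive", "non-offensive", "offensive"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

# TF-IDF features feeding a linear SVM, mirroring the project's best model.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("svm", LinearSVC()),
])
pipe.fit(X_train, y_train)
print(f"test accuracy: {pipe.score(X_test, y_test):.3f}")
```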
- Backend: Python, Flask
- ML/NLP: scikit-learn, NLTK
- Frontend: HTML5, CSS3, JavaScript
- Data Processing: pandas, numpy
- Deep Learning models (CNN, LSTM, Transformers)
- Mobile application (Android/iOS)
- Browser extension
- Multi-class classification (hate speech, profanity, etc.)
- Explainability (highlight offensive words)
- Multilingual support (Punjabi, Pashto, Sindhi)
This is a Final Year Project developed as part of academic research. For suggestions or improvements, please contact the author.
Final Year Project
- Institution: [Your University]
- Year: 2026
- Supervisor: [Supervisor Name]
For questions or feedback:
- Email: [Your Email]
- GitHub: [Your GitHub]
- Dataset providers
- scikit-learn community
- Flask framework
- ReThink project inspiration
⭐ If you find this project useful, please give it a star!
Developed with ❤️ for promoting civil online discourse in Urdu


