Khan-Harry/Offense-Guard

AI-Powered Offensive Language Detection System

Real-Time Detection for Urdu & Roman Urdu

🎯 Overview

An AI-powered system that detects offensive language in Urdu and Roman Urdu text in real time, shows a preventive "ReThink" warning before the message is posted, and improves over time through user feedback.

✨ Key Features

  • 🤖 AI-Powered: SVM classifier built on a dataset of 40,000+ samples (41,845 after deduplication)
  • ⚡ Real-Time: Predictions in under 100 ms
  • 🌐 Bilingual: Supports Urdu script and Roman Urdu
  • 🛡️ ReThink Warnings: Preventive intervention before posting
  • 📊 High Accuracy: 88.7% test accuracy
  • 🔄 Continuous Learning: Feedback loop for improvement
  • 💻 Web Interface: Modern, user-friendly UI
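
The sub-100 ms figure is easiest to trust when it can be re-measured locally. The sketch below times repeated calls to a prediction function with `time.perf_counter`. Note that `measure_latency_ms` and `fake_predict` are hypothetical names introduced here, and `fake_predict` is only a stand-in, not the project's classifier.

```python
import statistics
import time


def measure_latency_ms(predict, text, runs=50):
    """Return the median wall-clock time of predict(text) in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def fake_predict(text):
    # Hypothetical stand-in for the real model call; swap in the actual
    # vectorizer + classifier when measuring the project itself.
    return "offensive" if "badword" in text.lower() else "non-offensive"
```

Calling `measure_latency_ms(fake_predict, "sample text")` returns a median in milliseconds. The median is used rather than the mean so that a single slow first call does not dominate the result.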

📊 Performance

Model          Accuracy  Precision  Recall  F1-Score
SVM            88.7%     88.65%     85.38%  86.98%
Naïve Bayes    84.35%    82.57%     86.96%  84.71%
Random Forest  80.29%    91.07%     67.02%  77.22%
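
The F1-Score column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The short check below does that for each model. The function name `f1_score` is chosen here for illustration and is not taken from the project's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) copied from the table above.
reported = {
    "SVM": (88.65, 85.38),
    "Naive Bayes": (82.57, 86.96),
    "Random Forest": (91.07, 67.02),
}

f1_by_model = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
```

The recomputed values match the table (86.98, 84.71, 77.22). Random Forest shows why F1 is worth reporting alongside accuracy: its precision is the highest of the three, but its low recall pulls the F1 score well below the other models.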

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Installation

  1. Clone or download the project and change into its directory (the path below is the author's local path; substitute your own)
cd "d:/Semesters/BSE-6/FYP 2/FYP_Project"
  2. Install dependencies
pip install pandas numpy scikit-learn nltk flask openpyxl
  3. Run the application
python app.py
  4. Open your browser
http://localhost:5000

That's it! 🎉

📁 Project Structure

FYP_Project/
├── app.py                      # Flask backend API
├── data_loader.py              # Data preprocessing
├── train_ml_models.py          # Model training
├── feature_extraction.py       # TF-IDF implementation
├── requirements.txt            # Dependencies
├── PROJECT_DOCUMENTATION.md    # Full documentation
├── models/                     # Trained models
│   ├── svm.pkl                # Best model (SVM)
│   ├── tfidf_vectorizer.pkl   # TF-IDF vectorizer
│   └── ...
├── templates/
│   └── index.html             # Frontend UI
└── static/
    ├── style.css              # Styling
    └── script.js              # Frontend logic

🎨 Screenshots

Main Interface

[Screenshot: Main Interface]

ReThink Warning

[Screenshot: ReThink Warning]

Results Display

[Screenshot: Results]

🔧 Usage

Web Interface

  1. Type your message in Urdu or Roman Urdu
  2. Click "Check Message"
  3. View results:
    • ✅ Safe: Message is non-offensive
    • ⚠️ Warning: ReThink modal appears for offensive content
  4. Choose action:
    • Edit Message
    • Post Anyway
    • Cancel

API Usage

Predict Endpoint

import requests

# Send the text to the locally running Flask server for classification
response = requests.post(
    'http://localhost:5000/predict',
    json={'text': 'your message here'},
)
print(response.json())

Response:

{
  "text": "your message here",
  "prediction": "offensive",
  "confidence": 0.85,
  "should_warn": true,
  "timestamp": "2026-01-22T22:00:00"
}
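
A client typically uses the `should_warn` field to decide whether to show the ReThink modal. The sketch below parses the example response above and maps it to a next step. The function `next_step` and the action names it returns are hypothetical and are not part of the project's API.

```python
import json
from datetime import datetime

# Example body copied from the response shown above.
raw = """{
  "text": "your message here",
  "prediction": "offensive",
  "confidence": 0.85,
  "should_warn": true,
  "timestamp": "2026-01-22T22:00:00"
}"""


def next_step(payload):
    """Map a /predict response to a UI action (action names are illustrative)."""
    if payload.get("should_warn"):
        return "show_rethink_modal"
    return "post_message"


payload = json.loads(raw)
action = next_step(payload)
# The timestamp in the example is ISO 8601 without a timezone offset.
checked_at = datetime.fromisoformat(payload["timestamp"])
```

Using `payload.get("should_warn")` rather than indexing means a response that lacks the field is treated as safe. Whether that is the right default depends on how strict the client should be.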

Feedback Endpoint

import requests

# Report the model's prediction, the correct label, and what the user did
requests.post('http://localhost:5000/feedback', json={
    'text': 'the message',
    'predicted_label': 'offensive',
    'actual_label': 'non-offensive',
    'user_action': 'posted'
})
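
Because feedback is meant to feed later retraining, it helps to check the payload on the client before sending it. The sketch below is a minimal example of that check. The helpers `build_feedback` and `is_correction` are hypothetical, and the label set is assumed from the two values that appear in this README. Whether the server performs its own validation is not documented here.

```python
# Assumed label set: the two values used in this README's examples.
ALLOWED_LABELS = {"offensive", "non-offensive"}


def build_feedback(text, predicted_label, actual_label, user_action):
    """Build a /feedback payload, rejecting labels outside ALLOWED_LABELS."""
    for label in (predicted_label, actual_label):
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {label!r}")
    return {
        "text": text,
        "predicted_label": predicted_label,
        "actual_label": actual_label,
        "user_action": user_action,
    }


def is_correction(payload):
    """True when the user disagreed with the model's prediction."""
    return payload["predicted_label"] != payload["actual_label"]


payload = build_feedback("the message", "offensive", "non-offensive", "posted")
```

The resulting dictionary can be passed directly as the `json=` argument of `requests.post`. Payloads for which `is_correction` returns true are the ones that carry new information for retraining.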

📚 Documentation

For complete documentation, see PROJECT_DOCUMENTATION.md

Topics covered:

  • Introduction & Problem Statement
  • Literature Review
  • Methodology & System Architecture
  • Implementation Details
  • Results & Evaluation
  • Scope & Limitations
  • Future Enhancements

🧪 Training Your Own Model

# 1. Load and preprocess data
python data_loader.py

# 2. Train models
python train_ml_models.py

# 3. Models will be saved in models/ directory
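
The contents of feature_extraction.py are not shown in this README, so the following is only a sketch of the general TF-IDF technique the training step relies on, written in plain Python. It assumes whitespace tokenization, raw term counts, the smoothed IDF ln((1 + N) / (1 + df)) + 1, and L2 normalisation. The project's own settings may differ, and the toy sentences are invented for illustration.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Uses raw term counts and the smoothed IDF ln((1 + N) / (1 + df)) + 1,
    followed by L2 normalisation of each document vector.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Toy Roman Urdu sentences, invented for illustration only.
docs = ["aap kaise hain", "aap bohat achay hain", "yeh kitab achi hai"]
vectors = tfidf(docs)
```

In the first sentence, "kaise" receives a higher weight than "aap" because it occurs in only one document. This down-weighting of common words is what lets a linear classifier such as an SVM focus on the more distinctive terms.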

📊 Dataset

  • Total Samples: 47,082 (41,845 after deduplication)
  • Offensive: 24,516 (58.6%)
  • Non-Offensive: 17,329 (41.4%)
  • Sources:
    • Hate Speech Roman Urdu (HS-RU-20): 5,000
    • Dataset of Urdu Abusive Language: 12,083
    • Roman Urdu 30K: 29,999
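
The figures above can be cross-checked with a few lines of arithmetic: the three sources should sum to the raw total, and the two classes should sum to the deduplicated total. The variable names below are introduced here for illustration.

```python
# Counts copied from the list above.
sources = {
    "Hate Speech Roman Urdu (HS-RU-20)": 5_000,
    "Dataset of Urdu Abusive Language": 12_083,
    "Roman Urdu 30K": 29_999,
}
offensive, non_offensive = 24_516, 17_329

total_raw = sum(sources.values())            # samples before deduplication
total_dedup = offensive + non_offensive      # samples after deduplication
duplicates_removed = total_raw - total_dedup
offensive_share = round(100 * offensive / total_dedup, 1)
non_offensive_share = round(100 * non_offensive / total_dedup, 1)
```

The numbers are consistent: 47,082 raw samples, 41,845 after removing 5,237 duplicates, and a 58.6% / 41.4% class split. Assuming the test set has a similar balance, a classifier that always predicts "offensive" would score roughly 58.6% accuracy, which is a useful baseline when reading the Performance table.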

🛠️ Technology Stack

  • Backend: Python, Flask
  • ML/NLP: scikit-learn, NLTK
  • Frontend: HTML5, CSS3, JavaScript
  • Data Processing: pandas, numpy

🔮 Future Enhancements

  • Deep Learning models (CNN, LSTM, Transformers)
  • Mobile application (Android/iOS)
  • Browser extension
  • Multi-class classification (hate speech, profanity, etc.)
  • Explainability (highlight offensive words)
  • Multilingual support (Punjabi, Pashto, Sindhi)

🤝 Contributing

This is a Final Year Project. For suggestions or improvements, please contact the author.

📝 License

This project is developed as part of academic research.

👨‍💻 Author

Final Year Project

  • Institution: [Your University]
  • Year: 2026
  • Supervisor: [Supervisor Name]

📧 Contact

For questions or feedback:

  • Email: [Your Email]
  • GitHub: [Your GitHub]

🙏 Acknowledgments

  • Dataset providers
  • scikit-learn community
  • Flask framework
  • ReThink project inspiration

⭐ If you find this project useful, please give it a star!

Developed with ❤️ for promoting civil online discourse in Urdu
