Phishing Detection System

|Model Knowledge Base Architecture|

A machine learning-based system to detect phishing attempts by analyzing URLs, email content, and website features. Includes a REST API backend and a browser extension for real-time protection.

Features

URL Analysis: Detect phishing URLs using ML-based feature extraction
Email Classification: Identify phishing emails based on content analysis
Browser Extension: Real-time protection while browsing
REST API: Easy integration with other applications

Project Structure

phishing_detection/
├── backend/
│   ├── app.py                 # FastAPI server
│   ├── model.pkl              # Trained ML model (generated)
│   ├── feature_extractor.py   # Feature extraction logic
│   ├── train_model.py         # Model training script
│   └── requirements.txt       # Python dependencies
├── frontend/
│   ├── index.html             # Web application UI
│   ├── styles.css             # Styling
│   ├── script.js              # Frontend logic
│   └── README.md              # Frontend documentation
├── extension/
│   ├── manifest.json          # Extension configuration
│   ├── background.js          # Background service worker
│   ├── content_script.js      # Content script for warnings
│   ├── popup.html             # Extension popup UI
│   ├── popup.css              # Popup styling
│   └── popup.js               # Popup logic
├── data/
│   ├── phishing_urls.csv      # Phishing URL dataset
│   ├── legitimate_urls.csv    # Legitimate URL dataset
│   └── emails.csv             # Email dataset
├── README.md                  # This file
├── QUICKSTART.md              # Quick setup guide
└── START_ALL.bat              # Start both backend and frontend

Setup Instructions

Quick Start (All-in-One)

Windows:

# Start both backend and frontend
START_ALL.bat

Then open your browser to: http://localhost:8080

Backend Setup

Install Python dependencies:

cd phishing_detection/backend
pip install -r requirements.txt

Train the ML model:

python train_model.py

This will:

Load training data from ../data/ directory
Extract features from URLs
Train a Random Forest classifier
Save the model as model.pkl
Display accuracy metrics

Start the API server:

python app.py

The API will be available at http://localhost:8000

Frontend Web App Setup

Start the frontend server:

cd phishing_detection/frontend
python -m http.server 8080

Open in browser:
- Navigate to http://localhost:8080
- Or double-click index.html to open directly
Start using:
- Analyze URLs in the URL Analysis tab
- Check emails in the Email Analysis tab
- View history of your analyses
- Learn more in the About page

Features:

Modern, responsive web interface
Real-time URL and email analysis
Analysis history with filtering
No build tools required (pure HTML/CSS/JS)
Mobile-friendly design

Browser Extension Setup

Open Chrome Extensions page:
- Navigate to chrome://extensions/
- Enable "Developer mode" (toggle in top right)
Load the extension:
- Click "Load unpacked"
- Select the phishing_detection/extension folder
The extension is now active:
- Icon will appear in the toolbar
- Automatic URL analysis on navigation
- Click icon to see analysis results

Usage

API Endpoints

Analyze URL

POST http://localhost:8000/analyze/url
Content-Type: application/json

{
  "url": "http://suspicious-site.com/login"
}

Response:

{
  "classification": "phishing",
  "score": 85.5,
  "features": {
    "url_length": 35,
    "has_ip": 0,
    "suspicious_keywords": 1,
    ...
  },
  "message": "⚠️ This URL appears to be a phishing attempt. Do not proceed."
}

Analyze Email

POST http://localhost:8000/analyze/email
Content-Type: application/json

{
  "content": "URGENT: Verify your account now!"
}

Response:

{
  "classification": "phishing",
  "score": 75.0,
  "features": {
    "urgency_count": 2,
    "url_count": 1,
    ...
  },
  "message": "⚠️ This email appears to be a phishing attempt."
}

Browser Extension

Automatic Protection:
- Extension automatically analyzes URLs as you browse
- Displays warning overlay for phishing sites
- Options to go back or proceed anyway
Manual Check:
- Click extension icon to see current page analysis
- View risk score and classification
- Re-analyze if needed

Features Extracted

URL Features

URL length, domain length, path length
Subdomain count
IP address detection
HTTPS usage
Suspicious keywords (login, verify, account, etc.)
URL shortening service detection
Special character count
Digit ratio

Email Features

Email length
Urgency keywords count
Number of URLs in content
Suspicious phrases
Exclamation mark count

Model Performance

The Random Forest classifier is trained on URL features and achieves:

Accuracy: ~90%+ (depends on training data)
False Positive Rate: <5%

Note: With the sample dataset (10 phishing + 10 legitimate URLs), accuracy will be limited. For production use, train with a larger dataset (10,000+ samples recommended).

Extending the Dataset

To improve model accuracy:

Add more URLs to data/phishing_urls.csv and data/legitimate_urls.csv
Ensure balanced dataset (similar number of phishing and legitimate samples)
Retrain the model: python train_model.py

Security Considerations

The extension requires <all_urls> permission to analyze any website
API runs locally by default (localhost:8000)
No data is sent to external servers
Cache stores analysis results locally for 1 hour

Troubleshooting

Model not loading:

Ensure you've run python train_model.py first
Check that model.pkl exists in the backend directory

Extension not working:

Verify the API server is running on port 8000
Check browser console for errors
Ensure extension has necessary permissions

API errors:

Install all dependencies: pip install -r requirements.txt
Check Python version (3.10+ recommended)
Verify port 8000 is not in use

Future Enhancements

Add website content analysis (HTML, forms, scripts)
Implement neural network model for improved accuracy
Add user feedback mechanism to improve model
Support for Firefox and Edge browsers
Database for storing analysis history
User authentication and personalized settings
Real-time threat intelligence integration
Whitelist/blacklist management

License

MIT License - Feel free to use and modify for your projects.

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
backend		backend
data		data
extension		extension
frontend		frontend
.gitignore		.gitignore
FRONTEND_COMPLETE.md		FRONTEND_COMPLETE.md
FRONTEND_GUIDE.md		FRONTEND_GUIDE.md
HOW_TO_RUN.txt		HOW_TO_RUN.txt
PROJECT_COMPLETE.txt		PROJECT_COMPLETE.txt
PROJECT_FINAL_SUMMARY.md		PROJECT_FINAL_SUMMARY.md
QUICKSTART.md		QUICKSTART.md
README.md		README.md
RUN_TESTS.bat		RUN_TESTS.bat
START_ALL.bat		START_ALL.bat
START_BACKEND.bat		START_BACKEND.bat
START_FRONTEND.bat		START_FRONTEND.bat
TESTING_COMPLETE.md		TESTING_COMPLETE.md
TEST_REPORT.md		TEST_REPORT.md
TEST_RESULTS_VISUAL.md		TEST_RESULTS_VISUAL.md
TEST_SUMMARY.md		TEST_SUMMARY.md
test_all.py		test_all.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Phishing Detection System

|Model Knowledge Base Architecture|

Features

Project Structure

Setup Instructions

Quick Start (All-in-One)

Backend Setup

Frontend Web App Setup

Browser Extension Setup

Usage

API Endpoints

Analyze URL

Analyze Email

Browser Extension

Features Extracted

URL Features

Email Features

Model Performance

Extending the Dataset

Security Considerations

Troubleshooting

Future Enhancements

License

Contributing

About

Uh oh!

Releases

Packages

Languages

UtkarshPrime/MKBA-Phishing-Detection

Folders and files

Latest commit

History

Repository files navigation

Phishing Detection System

|Model Knowledge Base Architecture|

Features

Project Structure

Setup Instructions

Quick Start (All-in-One)

Backend Setup

Frontend Web App Setup

Browser Extension Setup

Usage

API Endpoints

Analyze URL

Analyze Email

Browser Extension

Features Extracted

URL Features

Email Features

Model Performance

Extending the Dataset

Security Considerations

Troubleshooting

Future Enhancements

License

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages