A Python heuristic phishing detector analyzing URLs and web content for suspicious features. It calculates a phishing score based on URL length, special characters, HTTPS status, and blacklists. Designed for educational purposes, it flags potentially malicious sites, offering detailed output and JSON reports. Not a professional security tool.

bishesh-droid/Python-Phishing-Website-Detector

Python Phishing Website Detector

This project implements a heuristic-based phishing website detector. It analyzes URLs and fetches web page content to extract various features, calculate a phishing score, and flag potentially malicious websites. This tool is designed for educational purposes to demonstrate how phishing detection mechanisms work.

Disclaimer: This tool is a simplified educational project and is NOT a substitute for professional browser security features or dedicated anti-phishing solutions. It relies on a set of heuristics and a simple blacklist, which may not catch all phishing attempts and can produce false positives. The author is not responsible for any damage or misuse of this software.

Ethical Considerations

Phishing detection is a critical aspect of cybersecurity, but its development and use come with significant ethical responsibilities:

  • Educational Use Only: This detector is for learning purposes. Do not rely on it for absolute protection against phishing attacks.
  • False Positives: Heuristic-based detection is prone to incorrectly flagging legitimate websites as phishing. Always verify suspicious URLs independently.
  • Privacy: When fetching external URLs, be aware that your IP address might be logged by the target server. Avoid scanning sensitive or private URLs without explicit permission.
  • Scope and Limitations: This is a basic detector. It does not employ machine learning, advanced behavioral analysis, or the real-time threat intelligence feeds that commercial solutions use.

Features

  • URL Feature Extraction: Analyzes URL components such as length, presence of suspicious characters (@, // in path), number of subdomains, HTTPS status, and IP address in hostname.
  • HTML Content Feature Extraction: Examines webpage content for indicators like the presence of forms, iframes, scripts, suspicious keywords (e.g., "login", "verify"), and external links.
  • Blacklist Integration: Checks analyzed domains and HTML content against a local blacklist of known malicious domains and keywords.
  • Heuristic-Based Scoring: Assigns a phishing score based on a weighted sum of detected suspicious features.
  • Configurable Threshold: Allows users to adjust the phishing score threshold for flagging websites.
  • Detailed Output: Provides a breakdown of extracted features, the calculated phishing score, and detected indicators.
  • JSON Report Generation: Can generate a detailed JSON report of the analysis for further examination.
  • Command-Line Interface: Easy-to-use interface for analyzing URLs.
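
As a rough illustration of the URL feature extraction described above, the following sketch derives the listed URL features using only the standard library. The function name `extract_url_features` and the dictionary keys are assumptions made for this example; they are not taken from `main.py`, and the subdomain count is a simplification that ignores multi-part suffixes such as `co.uk`.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Hypothetical helper: derive the URL features named in the list above."""
    parsed = urlparse(url)
    hostname = parsed.hostname or ""

    # A hostname that parses as an IPv4/IPv6 address is a raw IP, not a domain.
    try:
        ipaddress.ip_address(hostname)
        ip_in_hostname = True
    except ValueError:
        ip_in_hostname = False

    # Rough subdomain count: every label beyond "name.tld".
    labels = [label for label in hostname.split(".") if label]
    subdomain_count = 0 if ip_in_hostname else max(len(labels) - 2, 0)

    return {
        "url_length": len(url),
        "has_at_symbol": "@" in url,
        "double_slash_in_path": "//" in parsed.path,
        "subdomain_count": subdomain_count,
        "uses_https": parsed.scheme == "https",
        "ip_in_hostname": ip_in_hostname,
    }
```

Calling `extract_url_features("https://a.b.example.com/login")` would, under these assumptions, report two subdomains, HTTPS in use, and no raw IP address.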

Project Structure

.
├── phishing_website_detector/
│   ├── __init__.py
│   ├── main.py
│   └── blacklists.txt
├── tests/
│   ├── __init__.py
│   └── test_detector.py
├── .gitignore
├── conceptual_analysis.txt
├── README.md
└── requirements.txt

Prerequisites

  • Python 3.7+
  • pip for installing dependencies

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/Python-Phishing-Website-Detector.git
    cd Python-Phishing-Website-Detector
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the dependencies:

    pip install -r requirements.txt

Usage

1. Update Blacklists (Optional)

The phishing_website_detector/blacklists.txt file contains default blacklisted domains and keywords. You can edit this file to add more entries, one per line.

  • Domains: Enter full domain names (e.g., malicious-site.com). The detector will convert these to lowercase for matching.
  • Keywords: Enter suspicious words or phrases (e.g., urgent_security_alert). These will be searched for in the HTML content.
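
A minimal sketch of how such a one-entry-per-line file could be read and matched is shown below. The function names are hypothetical, and because this README does not say how the detector tells domains from keywords, the sketch simply tests every entry both as a domain (against the URL's hostname) and as a keyword (against the page HTML). The case-insensitive keyword match is also an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def parse_blacklist(text: str) -> set:
    """Return the non-blank lines of a blacklist, stripped and lowercased."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_blacklist(path: str) -> set:
    """Read a blacklist file (one entry per line) from disk."""
    return parse_blacklist(Path(path).read_text(encoding="utf-8"))


def blacklist_hits(url: str, html: str, entries: set) -> dict:
    """Assumed matching rule: hostname equals or is a subdomain of an entry;
    keywords are found by case-insensitive substring search in the HTML."""
    hostname = (urlparse(url).hostname or "").lower()
    page = html.lower()
    return {
        "domain_hits": sorted(
            e for e in entries if hostname == e or hostname.endswith("." + e)
        ),
        "keyword_hits": sorted(e for e in entries if e in page),
    }
```

With a file containing `malicious-site.com` and `urgent_security_alert`, a page served from `login.malicious-site.com` would register a domain hit under this rule.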

2. Analyze a URL

Provide the URL you wish to analyze as a command-line argument:

python phishing_website_detector/main.py <url_to_analyze>

Example:

python phishing_website_detector/main.py https://www.example-phishing.com/login

Command-Line Options

  • -b, --blacklist <file_path>: Specify a custom path to the blacklist file (default: blacklists.txt).
  • -t, --threshold <score>: Adjust the phishing score threshold for flagging a website (default: 0.5). Scores equal to or above this threshold will flag the site as "Potentially Phishing."
  • -o, --output <file_path>: Output the detailed analysis results to a JSON file.
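
For readers curious how these options might be declared, here is a sketch using the standard `argparse` module. The flags and defaults mirror the list above; the function name `build_parser`, the help strings, and the use of `argparse` itself are assumptions, since the README does not show how `main.py` parses its arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Heuristic phishing website detector (educational)."
    )
    parser.add_argument("url", help="URL to analyze")
    parser.add_argument(
        "-b", "--blacklist", default="blacklists.txt",
        help="path to the blacklist file",
    )
    parser.add_argument(
        "-t", "--threshold", type=float, default=0.5,
        help="score at or above which a site is flagged",
    )
    parser.add_argument(
        "-o", "--output", default=None,
        help="write the detailed analysis to this JSON file",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```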

Examples of Usage

  • Analyze a URL with a custom blacklist and output to JSON:

    python phishing_website_detector/main.py http://suspicious.site/ --blacklist my_custom_blacklist.txt --output analysis_report.json
  • Analyze a URL with a higher phishing threshold:

    python phishing_website_detector/main.py http://another-suspicious.net/ --threshold 0.7

Scoring Heuristics Explained

The detector calculates a phishing score based on various features. Each feature, when detected, adds a certain weight to the total score. A higher score indicates a greater likelihood of being a phishing site.

Some key indicators and their conceptual impact:

  • URL Length: Very long URLs can be a characteristic of phishing sites.
  • @ Symbol in URL: Often used to trick users by displaying a legitimate-looking domain before the @ symbol.
  • // in URL Path: Can be used for confusing redirects.
  • Excessive Subdomains: Many subdomains can obscure the true top-level domain.
  • No HTTPS: The absence of HTTPS, especially on login pages, is a major red flag.
  • IP Address in Hostname: Legitimate sites rarely use raw IP addresses in their URLs.
  • Forms/Iframes/Scripts: While common, their presence, especially with other suspicious indicators, can suggest data harvesting or malicious injections.
  • Suspicious Keywords in HTML: Words like "login" or "verify account", when combined with other factors, can point to phishing intent.
  • Blacklist Match: A direct hit on a blacklisted domain or keyword significantly increases the phishing score.
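
The weighted-sum idea can be sketched as follows. The weights, the indicator names, and the cap at 1.0 are illustrative assumptions only; the README does not document the values the project actually uses. The comparison follows the documented rule that scores equal to or above the threshold are flagged.

```python
import json

# Hypothetical weights chosen for illustration, not the project's real values.
WEIGHTS = {
    "long_url": 0.10,
    "has_at_symbol": 0.20,
    "double_slash_in_path": 0.10,
    "many_subdomains": 0.10,
    "no_https": 0.20,
    "ip_in_hostname": 0.25,
    "suspicious_keywords": 0.10,
    "blacklist_match": 0.50,
}


def phishing_score(indicators: dict) -> float:
    """Sum the weights of every indicator that is truthy, capped at 1.0."""
    total = sum(w for name, w in WEIGHTS.items() if indicators.get(name))
    return round(min(total, 1.0), 4)


def classify(indicators: dict, threshold: float = 0.5) -> dict:
    """Return the score, the triggered indicators, and the flag decision."""
    score = phishing_score(indicators)
    return {
        "score": score,
        "detected": sorted(n for n in WEIGHTS if indicators.get(n)),
        "flagged": score >= threshold,  # equal to or above the threshold
    }


if __name__ == "__main__":
    result = classify({"no_https": True, "ip_in_hostname": True, "has_at_symbol": True})
    print(json.dumps(result, indent=2))
```

A heavy weight on the blacklist match means a single direct hit is enough to reach the default threshold in this sketch, while weaker signals such as forms or long URLs only matter in combination.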

Testing

To run the automated tests, execute the following command from the project's root directory:

python -m unittest discover tests

Contributing

Contributions are welcome! If you have ideas for improvements or have found a bug, please open an issue or submit a pull request.

  1. Fork the repository.
  2. Create a new branch: git checkout -b feature/your-feature-name
  3. Make your changes and commit them: git commit -m 'Add some feature'
  4. Push to the branch: git push origin feature/your-feature-name
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
