Python Phishing Website Detector

This project implements a heuristic-based phishing website detector. It analyzes URLs and fetches web page content to extract various features, calculate a phishing score, and flag potentially malicious websites. This tool is designed for educational purposes to demonstrate how phishing detection mechanisms work.

Disclaimer: This tool is a simplified educational project and is NOT a substitute for professional browser security features or dedicated anti-phishing solutions. It relies on a set of heuristics and a simple blacklist, which may not catch all phishing attempts and can produce false positives. The author is not responsible for any damage or misuse of this software.

Ethical Considerations

Phishing detection is a critical aspect of cybersecurity, but its development and use come with significant ethical responsibilities:

Educational Use Only: This detector is for learning purposes. Do not rely on it for absolute protection against phishing attacks.
False Positives: Heuristic-based detection is prone to incorrectly flagging legitimate websites as phishing. Always verify suspicious URLs independently.
Privacy: When fetching external URLs, be aware that your IP address might be logged by the target server. Avoid scanning sensitive or private URLs without explicit permission.
Scope and Limitations: This is a basic detector. It does not use machine learning, advanced behavioral analysis, or the real-time threat intelligence feeds that commercial solutions rely on.

Features

URL Feature Extraction: Analyzes URL components such as length, presence of suspicious characters (@, // in path), number of subdomains, HTTPS status, and IP address in hostname.
HTML Content Feature Extraction: Examines webpage content for indicators like the presence of forms, iframes, scripts, suspicious keywords (e.g., "login", "verify"), and external links.
Blacklist Integration: Checks analyzed domains and HTML content against a local blacklist of known malicious domains and keywords.
Heuristic-Based Scoring: Assigns a phishing score based on a weighted sum of detected suspicious features.
Configurable Threshold: Allows users to adjust the phishing score threshold for flagging websites.
Detailed Output: Provides a breakdown of extracted features, the calculated phishing score, and detected indicators.
JSON Report Generation: Can generate a detailed JSON report of the analysis for further examination.
Command-Line Interface: Easy-to-use interface for analyzing URLs.
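The URL checks listed above can be approximated with the Python standard library alone. The following is a minimal sketch, not the code in `main.py`; the function name `extract_url_features` and the feature keys are hypothetical. Note that the subdomain count is naive: it treats everything beyond the last two labels as a subdomain, which miscounts multi-part suffixes such as `.co.uk`.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Extract simple lexical URL features (illustrative sketch, hypothetical keys)."""
    parsed = urlparse(url)
    # .hostname drops any "user@" prefix and port, and lowercases the host.
    hostname = (parsed.hostname or "").lower()

    # Detect a raw IPv4/IPv6 address used in place of a domain name.
    try:
        ipaddress.ip_address(hostname)
        has_ip = True
    except ValueError:
        has_ip = False

    # Naive subdomain count: labels beyond the last two (e.g. "example.com").
    labels = [label for label in hostname.split(".") if label]
    num_subdomains = 0 if has_ip else max(len(labels) - 2, 0)

    return {
        "url_length": len(url),
        "has_at_symbol": "@" in url,
        "has_double_slash_in_path": "//" in parsed.path,
        "num_subdomains": num_subdomains,
        "uses_https": parsed.scheme == "https",
        "has_ip_hostname": has_ip,
    }
```

For example, `extract_url_features("http://paypal.com@192.168.0.1/login")` reports both an `@` symbol and an IP hostname, because everything before the `@` is user information and the browser actually connects to `192.168.0.1`.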

Project Structure

```
.
├── phishing_website_detector/
│   ├── __init__.py
│   ├── main.py
│   └── blacklists.txt
├── tests/
│   ├── __init__.py
│   └── test_detector.py
├── .gitignore
├── conceptual_analysis.txt
├── README.md
└── requirements.txt
```

Prerequisites

Python 3.7+
pip for installing dependencies

Installation

Clone the repository:

```
git clone https://github.com/your-username/Python-Phishing-Website-Detector.git
cd Python-Phishing-Website-Detector
```

Create a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install the dependencies:
```
pip install -r requirements.txt
```

Usage

1. Update Blacklists (Optional)

The phishing_website_detector/blacklists.txt file contains default blacklisted domains and keywords. You can edit this file to add more entries, one per line.

Domains: Enter full domain names (e.g., malicious-site.com). The detector converts these to lowercase for matching.
Keywords: Enter suspicious words or phrases (e.g., urgent_security_alert). These will be searched for in the HTML content.
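The README does not say how the detector tells domain entries from keyword entries in the same file. The sketch below assumes a simple rule (an entry containing a dot and no spaces is a domain; anything else is a keyword) and also assumes case-insensitive keyword matching. The function names are hypothetical, and matching parent domains is a design choice of this sketch rather than documented behavior.

```python
from typing import Iterable, Set, Tuple


def parse_blacklist(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split blacklist lines into (domains, keywords).

    Assumption: an entry with a dot and no spaces is a domain; anything else is a keyword.
    """
    domains: Set[str] = set()
    keywords: Set[str] = set()
    for raw in lines:
        entry = raw.strip().lower()
        if not entry:
            continue  # skip blank lines
        if "." in entry and " " not in entry:
            domains.add(entry)
        else:
            keywords.add(entry)
    return domains, keywords


def load_blacklist(path: str) -> Tuple[Set[str], Set[str]]:
    """Read a one-entry-per-line blacklist file."""
    with open(path, encoding="utf-8") as fh:
        return parse_blacklist(fh)


def is_blacklisted_domain(hostname: str, domains: Set[str]) -> bool:
    """Match the hostname or any parent domain (login.evil.com also matches evil.com)."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


def find_blacklisted_keywords(html: str, keywords: Set[str]) -> Set[str]:
    """Return the blacklisted keywords that appear anywhere in the page source."""
    text = html.lower()
    return {kw for kw in keywords if kw in text}
```

Under the dot-based rule, a keyword such as `login.php` would be misclassified as a domain, so a real blacklist format would likely need explicit sections or prefixes.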

2. Analyze a URL

Provide the URL you wish to analyze as a command-line argument:

```
python phishing_website_detector/main.py <url_to_analyze>
```

Example:

```
python phishing_website_detector/main.py https://www.example-phishing.com/login
```

Command-Line Options

-b, --blacklist <file_path>: Specify a custom path to the blacklist file (default: blacklists.txt).
-t, --threshold <score>: Adjust the phishing score threshold for flagging a website (default: 0.5). Scores equal to or above this threshold will flag the site as "Potentially Phishing."
-o, --output <file_path>: Output the detailed analysis results to a JSON file.
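The documented options map naturally onto `argparse`. This is a sketch of how such a parser could be declared, not a copy of `main.py`; the helper names `build_parser` and `verdict`, and the "Likely Legitimate" label for sites that are not flagged, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented CLI: a positional URL plus -b, -t and -o."""
    parser = argparse.ArgumentParser(description="Heuristic phishing website detector (sketch).")
    parser.add_argument("url", help="URL to analyze")
    parser.add_argument("-b", "--blacklist", default="blacklists.txt",
                        help="path to the blacklist file (default: blacklists.txt)")
    parser.add_argument("-t", "--threshold", type=float, default=0.5,
                        help="score at or above which a site is flagged (default: 0.5)")
    parser.add_argument("-o", "--output",
                        help="write the detailed analysis results to this JSON file")
    return parser


def verdict(score: float, threshold: float) -> str:
    """Scores equal to or above the threshold are flagged, as documented for --threshold."""
    return "Potentially Phishing" if score >= threshold else "Likely Legitimate"
```

Because the comparison is `>=`, a score of exactly 0.5 is flagged at the default threshold.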

Examples of Usage

Analyze a URL with a custom blacklist and output to JSON:

```
python phishing_website_detector/main.py http://suspicious.site/ --blacklist my_custom_blacklist.txt --output analysis_report.json
```

Analyze a URL with a higher phishing threshold:

```
python phishing_website_detector/main.py http://another-suspicious.net/ --threshold 0.7
```
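The exact layout of the JSON report produced by `--output` is not documented here. The sketch below shows one plausible shape using only the standard `json` module; every field name (`url`, `features`, `phishing_score`, `threshold`, `flagged`, `indicators`) is a hypothetical choice, not the project's actual schema.

```python
import json
from typing import Any, Dict, List


def build_report(url: str, features: Dict[str, Any], score: float,
                 threshold: float, indicators: List[str]) -> Dict[str, Any]:
    """Assemble an analysis report (hypothetical schema)."""
    return {
        "url": url,
        "features": features,
        "phishing_score": round(score, 3),
        "threshold": threshold,
        "flagged": score >= threshold,  # same rule as --threshold
        "indicators": indicators,
    }


def write_report(path: str, report: Dict[str, Any]) -> None:
    """Write the report as indented, human-readable JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
```

Keeping `build_report` separate from `write_report` lets the same dictionary be printed to the terminal or tested without touching the file system.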

Scoring Heuristics Explained

The detector calculates a phishing score based on various features. Each feature, when detected, adds a certain weight to the total score. A higher score indicates a greater likelihood of being a phishing site.
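As a rough illustration of such a weighted sum, the sketch below adds up the weights of the indicators that fired. The indicator names and weight values are assumptions, not the project's actual configuration; they are chosen to sum to 1.0 so that the score can be compared directly with the default threshold of 0.5.

```python
from typing import Dict

# Hypothetical weights; they sum to 1.0 so the score stays in the range [0, 1].
WEIGHTS: Dict[str, float] = {
    "long_url": 0.05,
    "has_at_symbol": 0.15,
    "has_double_slash_in_path": 0.05,
    "many_subdomains": 0.10,
    "no_https": 0.15,
    "has_ip_hostname": 0.15,
    "forms_or_iframes": 0.05,
    "suspicious_keywords": 0.10,
    "blacklist_match": 0.20,
}


def phishing_score(indicators: Dict[str, bool]) -> float:
    """Weighted sum of the indicators that fired; unknown names are ignored."""
    return sum(weight for name, weight in WEIGHTS.items() if indicators.get(name, False))
```

For example, a page served over plain HTTP from a raw IP address whose domain is also blacklisted scores 0.15 + 0.15 + 0.20 = 0.5 and would be flagged at the default threshold. Normalizing the weights this way keeps the threshold interpretable when indicators are added or removed.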

Some key indicators and their conceptual impact:

URL Length: Very long URLs can be a characteristic of phishing sites.
@ Symbol in URL: Often used to trick users by displaying a legitimate-looking domain before the @ symbol.
// in URL Path: Can be used for confusing redirects.
Excessive Subdomains: Many subdomains can obscure the true registered domain. For example, paypal.com.secure-login.example.net belongs to example.net, not paypal.com.
No HTTPS: The absence of HTTPS, especially on login pages, is a major red flag.
IP Address in Hostname: Legitimate sites rarely use raw IP addresses in their URLs.
Forms/Iframes/Scripts: While common, their presence, especially with other suspicious indicators, can suggest data harvesting or malicious injections.
Suspicious Keywords in HTML: Words like "login," "verify account," etc., when combined with other factors, can point to a phishing intent.
Blacklist Match: A direct hit on a blacklisted domain or keyword significantly increases the phishing score.
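The HTML indicators above (forms, iframes, scripts, keywords, external links) can be counted with the standard library's `html.parser`. This is a minimal sketch assuming the page source has already been fetched as a string; the names `extract_html_features` and `_TagCounter` are hypothetical, and the default keyword list only uses the examples given in this README.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, List
from urllib.parse import urljoin, urlparse


class _TagCounter(HTMLParser):
    """Counts <form>, <iframe> and <script> tags and collects <a href> targets."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Dict[str, int] = {"form": 0, "iframe": 0, "script": 0}
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in self.counts:
            self.counts[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_html_features(html: str, page_url: str,
                          keywords: Iterable[str] = ("login", "verify")) -> Dict[str, object]:
    parser = _TagCounter()
    parser.feed(html)
    parser.close()

    # A link is "external" if, once resolved against the page URL, it points to another host.
    page_host = urlparse(page_url).hostname
    external = 0
    for href in parser.hrefs:
        host = urlparse(urljoin(page_url, href)).hostname
        if host and host != page_host:
            external += 1

    text = html.lower()  # naive: also matches keywords inside tags and attributes
    return {
        "num_forms": parser.counts["form"],
        "num_iframes": parser.counts["iframe"],
        "num_scripts": parser.counts["script"],
        "num_external_links": external,
        "suspicious_keywords": sorted(k for k in keywords if k in text),
    }
```

Relative links such as `/account` resolve to the page's own host and are therefore not counted as external.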

Testing

To run the automated tests, execute the following command from the project's root directory:

```
python -m unittest discover tests
```
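By default, `unittest discover` collects modules matching `test*.py` under the given directory and runs every `unittest.TestCase` subclass they define. The sketch below shows the general shape of such a test; the `uses_https` helper is hypothetical and defined inline so the example is self-contained, whereas the project's `tests/test_detector.py` presumably exercises functions from `main.py`.

```python
import unittest
from urllib.parse import urlparse


def uses_https(url: str) -> bool:
    """Hypothetical helper under test: True if the URL scheme is https."""
    return urlparse(url).scheme == "https"


class TestUsesHttps(unittest.TestCase):
    def test_https_url_is_detected(self):
        self.assertTrue(uses_https("https://example.com/login"))

    def test_plain_http_url_is_flagged(self):
        self.assertFalse(uses_https("http://example.com/login"))
```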

Contributing

Contributions are welcome! If you have ideas for improvements or have found a bug, please open an issue or submit a pull request.

Fork the repository.
Create a new branch: git checkout -b feature/your-feature-name
Make your changes and commit them: git commit -m 'Add some feature'
Push to the branch: git push origin feature/your-feature-name
Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

bishesh-droid/Python-Phishing-Website-Detector

Folders and files

Latest commit

History

Repository files navigation

Python Phishing Website Detector

Ethical Considerations

Features

Project Structure

Prerequisites

Installation

Usage

1. Update Blacklists (Optional)

2. Analyze a URL

Command-Line Options

Examples of Usage

Scoring Heuristics Explained

Testing

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages