A Python-based command-line web crawler that uses Selenium to navigate websites, interact with dynamic elements (forms, buttons, search bars), capture network requests, and extract valid endpoints. The crawler saves endpoint details (URL, method, body parameters, headers) to a file in JSON, plain text, or CSV format.
- Crawls websites starting from a given URL, up to a specified page limit.
- Interacts with dynamic elements:
  - Clicks buttons and submit-like elements.
  - Fills forms (text inputs, dropdowns, checkboxes) without submitting.
  - Enters test data in search bars.
  - Triggers `onchange` and `oninput` events.
- Captures network requests to identify HTTP endpoints.
- Extracts endpoints from JavaScript files.
- Validates URLs, keeping only those on the base domain and excluding static assets (CSS, JS, images).
- Supports custom HTTP headers (e.g., Authorization tokens).
- Saves unique endpoints to a file in JSON, plain text, or CSV format.
- Browser fallback: Uses Chrome, falls back to Firefox if Chrome is unavailable.
- Accurate HTTP method detection (GET, POST, PUT, DELETE) for all endpoints.
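The domain and static-asset filtering described above can be sketched as follows. The function name and the extension list are illustrative, not the crawler's actual implementation:

```python
from urllib.parse import urlparse

# Extensions treated as static assets and skipped (illustrative list)
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico")

def is_valid_endpoint(url, base_domain):
    """Keep only http(s) URLs on the base domain that are not static assets."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc != base_domain:
        return False
    return not parsed.path.lower().endswith(STATIC_EXTENSIONS)
```

For example, `is_valid_endpoint("http://example.com/api/users", "example.com")` passes, while a `style.css` path or an off-domain URL is rejected.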
Caution: This crawler sends real HTTP requests to the target website, interacting with forms, buttons, and search bars. Do not use this tool with real accounts or credentials, as it may trigger security measures, lock accounts, or result in bans. Always obtain permission from the website owner before crawling, and use test accounts or environments to avoid unintended consequences.
- Install Python: Ensure Python 3.6+ is installed.
- Install Dependencies:

  ```
  pip install selenium requests
  ```
- Install a WebDriver:
  - For Chrome: Install ChromeDriver matching your Chrome version.
  - For Firefox: Install GeckoDriver.
  - Ensure the WebDriver executable is in your system PATH.
Run the crawler from the command line using `crawler.py`:

```
python crawler.py -u http://example.com -m 10 -o endpoints.json --headless --header "Authorization: Bearer token"
```

- `-u`/`--url`: Starting URL (required).
- `-m`/`--max-pages`: Maximum pages to crawl (default: 10).
- `-o`/`--output`: Output file (default: `endpoints.json`).
- `-f`/`--format`: Output format (`json`, `txt`, or `csv`; default: `json`, or inferred from the file extension).
- `--headless`: Run the browser in headless mode.
- `--header`: Custom header (e.g., `--header "Authorization: Bearer token"`; can be used multiple times).
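The flags above could be declared with `argparse` roughly as follows. This is a sketch mirroring the documented defaults, not the script's actual source:

```python
import argparse

def build_parser():
    """CLI matching the documented crawler options (sketch)."""
    parser = argparse.ArgumentParser(description="Selenium-based endpoint crawler")
    parser.add_argument("-u", "--url", required=True, help="Starting URL")
    parser.add_argument("-m", "--max-pages", type=int, default=10,
                        help="Maximum pages to crawl")
    parser.add_argument("-o", "--output", default="endpoints.json",
                        help="Output file")
    parser.add_argument("-f", "--format", choices=["json", "txt", "csv"],
                        help="Output format (inferred from file extension if omitted)")
    parser.add_argument("--headless", action="store_true",
                        help="Run the browser in headless mode")
    parser.add_argument("--header", action="append", default=[],
                        help="Custom header; may be given multiple times")
    return parser
```

Repeating `--header` accumulates values because of `action="append"`, which is how multiple custom headers are supported.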
- JSON (default):
  - Array of objects with `url`, `method`, `body_params`, and `extra_headers`.
  - Example (`endpoints.json`):

    ```json
    [
      {
        "url": "http://example.com/api/submit",
        "method": "POST",
        "body_params": {"query": "test"},
        "extra_headers": {"Authorization": "Bearer token"}
      }
    ]
    ```

  - Use case: General-purpose, suitable for scripts or tools that parse JSON.
- Plain Text (`txt`):
  - One URL per line.
  - Example (`endpoints.txt`):

    ```
    http://example.com/api/submit
    ```

  - Use case: Direct input to tools like `nuclei`, `ffuf`, or `sqlmap`.
- CSV:
  - Columns: `URL`, `Method`, `Body Params` (JSON-serialized), `Extra Headers` (JSON-serialized).
  - Example (`endpoints.csv`):

    ```csv
    URL,Method,Body Params,Extra Headers
    http://example.com/api/submit,POST,"{""query"": ""test""}","{""Authorization"": ""Bearer token""}"
    ```

  - Use case: Structured data for analysis or tools that accept CSV.
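For downstream scripting, the JSON and CSV outputs can be read back like this. A sketch against the field layout documented above; helper names are illustrative:

```python
import csv
import json

def load_json_endpoints(path):
    """Read the crawler's JSON output: a list of endpoint dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def load_csv_endpoints(path):
    """Read the CSV output, decoding the JSON-serialized columns."""
    endpoints = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            endpoints.append({
                "url": row["URL"],
                "method": row["Method"],
                "body_params": json.loads(row["Body Params"] or "{}"),
                "extra_headers": json.loads(row["Extra Headers"] or "{}"),
            })
    return endpoints
```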
- Basic Crawl with JSON Output:

  ```
  python crawler.py -u http://example.com -m 10 -o endpoints.json --headless
  ```

- Crawl with Authentication and Text Output:

  ```
  python crawler.py -u http://example.com -m 20 -o endpoints.txt -f txt --headless --header "Authorization: Bearer eyJhbGciOiJIUzUxMiJ9..."
  ```

- Crawl with CSV Output:

  ```
  python crawler.py -u http://example.com -m 15 -o endpoints.csv -f csv --headless --header "User-Agent: Mozilla/5.0"
  ```

- Pipe to Nuclei for Vulnerability Scanning:

  ```
  python crawler.py -u http://example.com -o endpoints.txt -f txt --headless
  cat endpoints.txt | nuclei -t /path/to/templates
  ```

- Pipe to FFUF for Fuzzing:

  ```
  python crawler.py -u http://example.com -o endpoints.txt -f txt --headless
  ffuf -w endpoints.txt -u FUZZ
  ```

- Test SQL Injection with sqlmap:

  ```
  python crawler.py -u http://example.com -o endpoints.txt -f txt --headless
  sqlmap -m endpoints.txt --batch
  ```
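Captured endpoints can also be replayed with the `requests` library (already a dependency). The sketch below only *prepares* the request so it can be inspected first; sending it is left to you, and only against targets you are authorized to test:

```python
import requests

def prepare_replay(endpoint):
    """Build a PreparedRequest from a captured endpoint dict without sending it.

    Call requests.Session().send(prepared) to actually replay it.
    """
    req = requests.Request(
        method=endpoint["method"],
        url=endpoint["url"],
        headers=endpoint.get("extra_headers", {}),
        json=endpoint.get("body_params") or None,  # serialized as a JSON body
    )
    return req.prepare()

prepared = prepare_replay({
    "url": "http://example.com/api/submit",
    "method": "POST",
    "body_params": {"query": "test"},
    "extra_headers": {"Authorization": "Bearer token"},
})
```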
- Browser Support: The crawler tries Chrome first, falling back to Firefox if Chrome is unavailable.
- HTTP Methods: Accurately detects GET, POST, PUT, DELETE methods, including for endpoints extracted from JavaScript.
- Error Handling: Logs errors and warnings for debugging.
- Output: Use the `txt` format for easy integration with most security tools.
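The JavaScript endpoint extraction mentioned in the notes can be approximated with a regular expression over script sources. The pattern below is purely illustrative and only covers simple `fetch()` calls; the crawler's actual heuristics may differ:

```python
import re

# Illustrative pattern: a fetch("...") call, optionally with a method option
FETCH_RE = re.compile(
    r"fetch\(\s*['\"](?P<url>[^'\"]+)['\"]\s*"
    r"(?:,\s*\{[^}]*?method\s*:\s*['\"](?P<method>[A-Z]+)['\"])?"
)

def endpoints_from_js(js_source):
    """Return (url, method) pairs found in JS source; method defaults to GET."""
    return [(m.group("url"), m.group("method") or "GET")
            for m in FETCH_RE.finditer(js_source)]
```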
Contributions are welcome! Please:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/YourFeature`).
- Commit your changes (`git commit -m "Add YourFeature"`).
- Push to the branch (`git push origin feature/YourFeature`).
- Open a pull request.