A Python tool that recursively clones websites using Selenium with `undetected-chromedriver`. It is designed to get past basic bot protections such as Cloudflare challenges, lets you solve CAPTCHAs manually in a real browser, and saves pages and their assets for offline browsing.
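Recursive cloning can be pictured as a breadth-first crawl that only follows links on the starting domain. The sketch below is a minimal illustration under stated assumptions, not the code in `site_cloner.py`. It uses the standard-library `html.parser` instead of `beautifulsoup4`. The names `crawl`, `LinkCollector`, and `fetch_html` are hypothetical, and the HTML fetcher is passed in so that either a browser session or a stub can supply the pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Tag -> attribute holding an asset URL. Treating every <link href> as an asset is a simplification.
ASSET_ATTRS = {"img": "src", "script": "src", "link": "href"}


class LinkCollector(HTMLParser):
    """Collect page links (<a href>) and asset URLs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.page_links = []
        self.asset_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values can be None for bare attributes
        if tag == "a" and attrs.get("href"):
            self.page_links.append(attrs["href"])
        elif tag in ASSET_ATTRS and attrs.get(ASSET_ATTRS[tag]):
            self.asset_links.append(attrs[ASSET_ATTRS[tag]])


def crawl(start_url, fetch_html, max_pages=50):
    """Breadth-first crawl restricted to start_url's domain.

    fetch_html(url) -> str is injected, so a browser session or a stub can supply HTML.
    Returns ({page_url: html}, {asset_url, ...}).
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages, assets = {}, set()
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch_html(url)
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for src in collector.asset_links:
            absolute, _ = urldefrag(urljoin(url, src))
            if urlparse(absolute).scheme in ("http", "https"):
                assets.add(absolute)  # assets are kept even when served from another host
        for href in collector.page_links:
            # Resolve relative links and drop #fragments so /a and /a#top count as one page.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages, assets
```

With Selenium, `fetch_html` could be a small function that calls `driver.get(url)` and returns `driver.page_source`. The `max_pages` cap is an assumed safety limit.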
### Features

- Uses `undetected-chromedriver` to evade bot detection
- Opens a real Chrome browser so you can solve CAPTCHAs manually
- Recursively clones pages within the same domain
- Downloads all assets (images, CSS, JS)
- Rewrites internal links for seamless offline browsing
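Rewriting links for offline use needs two pieces. The first is a rule that maps each URL to a file in the output folder. The second turns an internal link into a path relative to the page that contains it. Below is a minimal sketch of one such scheme. `local_path_for` and `rewrite_link` are hypothetical helpers. The mapping is also an assumption rather than the tool's documented behavior: directory URLs become `index.html`, extensionless pages get `.html`, and query strings are ignored.

```python
import posixpath
from pathlib import PurePosixPath
from urllib.parse import urldefrag, urljoin, urlparse


def local_path_for(url):
    """Map a URL to a relative file path inside the output folder (assumed scheme)."""
    path = urlparse(url).path
    if not path or path.endswith("/"):
        path += "index.html"  # directory-style URL -> index.html
    elif not PurePosixPath(path).suffix:
        path += ".html"  # extensionless page -> .html file
    return path.lstrip("/")


def rewrite_link(href, page_url):
    """Return the href to write into the saved copy of page_url."""
    absolute, fragment = urldefrag(urljoin(page_url, href))
    if urlparse(absolute).netloc != urlparse(page_url).netloc:
        return href  # external, mailto:, etc.: leave untouched
    page_dir = posixpath.dirname(local_path_for(page_url)) or "."
    relative = posixpath.relpath(local_path_for(absolute), page_dir)
    return relative + ("#" + fragment if fragment else "")
```

For example, a link to `/about` inside the saved copy of `https://example.com/blog/` becomes `../about.html`, while external links are left as they are. Relative paths are used so the saved folder can be moved without breaking its internal links.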
### Requirements

- Python 3.8+
- Google Chrome (latest version)
- Python packages:
  - `undetected-chromedriver`
  - `requests`
  - `beautifulsoup4`
  - `webdriver-manager`
### Installation

1. Clone the repository

```bash
git clone https://github.com/chromeheartbeat/Website_Cloner.git
cd Website_Cloner
```
2. (Optional) Create and activate a virtual environment

Linux/macOS:

```bash
python3 -m venv .venv
source .venv/bin/activate
```
3. Install dependencies

Using the requirements file:

```bash
pip install -r requirements.txt
```

Or install them individually:

```bash
pip install undetected-chromedriver requests beautifulsoup4 webdriver-manager
```
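After installing, you can confirm the environment by checking that each package is importable. The import names differ from the pip names; for example, `beautifulsoup4` is imported as `bs4`. This is a hypothetical helper, not part of the repository:

```python
import importlib.util
import sys

# pip package name -> module name used in `import`.
REQUIRED = {
    "undetected-chromedriver": "undetected_chromedriver",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "webdriver-manager": "webdriver_manager",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found (nothing is imported)."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


def check_environment():
    """Print a short report and return True if Python and all packages are OK."""
    version_ok = sys.version_info >= (3, 8)
    if not version_ok:
        print("Python 3.8+ is required, found", sys.version.split()[0])
    missing = missing_packages()
    if missing:
        print("Missing packages:", " ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
    return version_ok and not missing
```

Running `check_environment()` prints any missing packages along with the matching `pip install` command.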
### Usage

```bash
python site_cloner.py
```

1. A Chrome browser will open and load the target website.
2. If a CAPTCHA appears, solve it manually in the browser.
3. Return to the terminal and press Enter to continue.
4. The website will be saved to the `cloned_selenium_site` folder.
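The manual CAPTCHA step works like this: the page loads in a visible browser, the terminal waits until you press Enter, and then the rendered HTML is read. Below is a minimal sketch that assumes hypothetical helper names (`fetch_after_manual_check`, `save_page`, `clone_single_page`) and saves the start page as `index.html`. Only `uc.Chrome()`, `driver.get()`, `driver.page_source`, and `driver.quit()` come from `undetected-chromedriver`/Selenium.

```python
from pathlib import Path


def fetch_after_manual_check(driver, url, wait=input):
    """Open url in a visible browser and block until the user presses Enter."""
    driver.get(url)
    # The window stays open so any CAPTCHA / Cloudflare challenge can be solved by hand.
    wait("Solve any CAPTCHA in the browser window, then press Enter here... ")
    return driver.page_source  # HTML as rendered after the challenge


def save_page(html, relative_path, out_dir="cloned_selenium_site"):
    """Write html to out_dir/relative_path, creating parent folders as needed."""
    target = Path(out_dir) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return target


def clone_single_page(url, out_dir="cloned_selenium_site"):
    """Hypothetical entry point: needs Chrome and undetected-chromedriver installed."""
    import undetected_chromedriver as uc  # imported here so the helpers above work without it

    driver = uc.Chrome()
    try:
        html = fetch_after_manual_check(driver, url)
        return save_page(html, "index.html", out_dir)
    finally:
        driver.quit()
```

Calling `clone_single_page("https://example.com")` would open Chrome, wait for Enter, and write `cloned_selenium_site/index.html`. The driver is passed into `fetch_after_manual_check`, so the waiting logic can be tested without launching a browser.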
### Customization

To change the target URL, edit this line in `site_cloner.py`:

```python
clone_website("https://example.com")
```
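If editing the script for every site becomes tedious, you could read the URL from the command line instead. This is a hypothetical wrapper that assumes `site_cloner.py` keeps a `clone_website(url)` function. `parse_target` is an illustrative name, and it rejects anything that is not an `http(s)` URL:

```python
import argparse
from urllib.parse import urlparse


def parse_target(argv=None):
    """Read the target URL from the command line (argv=None means sys.argv[1:])."""
    parser = argparse.ArgumentParser(description="Clone a website for offline browsing.")
    parser.add_argument("url", help="start page, e.g. https://example.com")
    args = parser.parse_args(argv)
    parsed = urlparse(args.url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # parser.error prints the usage message and exits with status 2
        parser.error(f"not an http(s) URL: {args.url!r}")
    return args.url
```

Inside `site_cloner.py`, the hard-coded call would then become `clone_website(parse_target())`, and the tool would be run as `python site_cloner.py https://example.com`.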
### Disclaimer

- This tool is intended for educational and ethical use only.
- Do not use it to clone or scrape websites without explicit permission.
- The author is not responsible for any misuse or legal issues.
- Always respect websites' Terms of Service and copyright laws.
### License

MIT License © [Solution]
### Contribute

Feel free to submit issues, fork the repo, and send pull requests!
We welcome improvements, bug fixes, and new features.