🌐 Website_Cloner

A powerful Python tool to recursively clone websites using Selenium with undetected-chromedriver. Designed to bypass basic bot protections like Cloudflare challenges, it allows manual CAPTCHA solving and saves pages and assets for offline browsing.


🚀 Features

  • 🕵️‍♂️ Uses undetected-chromedriver to evade bot detection
  • 🧠 Opens a real Chrome browser so you can solve CAPTCHAs manually
  • 🔄 Recursively clones pages within the same domain
  • 🎨 Downloads all assets (images, CSS, JS)
  • 🔗 Rewrites internal links for seamless offline browsing
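
The same-domain check and link rewriting listed above could look roughly like the sketch below. It uses only the standard library and is not taken from site_cloner.py: the helper names `is_internal`, `local_path_for`, and `rewrite_link`, and the `index.html` / `.html` naming scheme, are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def is_internal(link: str, base_url: str) -> bool:
    """Return True if `link` resolves to an http(s) URL on the same host as `base_url`."""
    resolved = urlparse(urljoin(base_url, link))
    same_host = resolved.netloc == urlparse(base_url).netloc
    return resolved.scheme in ("http", "https") and same_host


def local_path_for(url: str) -> str:
    """Map a URL to a relative file path for offline storage (hypothetical scheme)."""
    path = urlparse(url).path
    if not path or path.endswith("/"):
        # Directory-style URLs are stored as an index page.
        path += "index.html"
    elif not posixpath.splitext(path)[1]:
        # Extensionless pages get an .html suffix so browsers open them from disk.
        path += ".html"
    return path.lstrip("/")


def rewrite_link(link: str, page_url: str) -> str:
    """Rewrite an internal link so it points at the saved copy, relative to the saved page."""
    target = local_path_for(urljoin(page_url, link))
    page_dir = posixpath.dirname(local_path_for(page_url)) or "."
    return posixpath.relpath(target, page_dir)
```

In a crawler along these lines, links that fail `is_internal` would be left untouched, while internal ones would be queued for download and replaced with the result of `rewrite_link`.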

📦 Requirements

  • Python 3.8+
  • Google Chrome (latest version)
  • Python Packages:
    • undetected-chromedriver
    • requests
    • beautifulsoup4
    • webdriver-manager

🛠️ Installation

1. Clone the repository

git clone https://github.com/chromeheartbeat/Website_Cloner.git
cd Website_Cloner

2. (Optional) Create and activate a virtual environment
Linux/macOS:

python3 -m venv .venv
source .venv/bin/activate

3. Install dependencies
Using requirements file:

pip install -r requirements.txt

Or install individually:
pip install undetected-chromedriver requests beautifulsoup4 webdriver-manager

🧪 Usage

python site_cloner.py

A Chrome browser will open and load the target website.

If a CAPTCHA appears, solve it manually.

Return to the terminal and press Enter to continue.

The website will be saved to the cloned_selenium_site folder.
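
The open, solve, press Enter, save flow described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the contents of site_cloner.py: the function names `fetch_with_manual_captcha` and `save_html` are hypothetical, and only the start page is handled here (no recursion or asset download).

```python
from pathlib import Path


def save_html(html: str, out_dir: str = "cloned_selenium_site", name: str = "index.html") -> Path:
    """Write page HTML under the output folder and return the path of the saved file."""
    target = Path(out_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return target


def fetch_with_manual_captcha(url: str) -> str:
    """Open Chrome, pause so a CAPTCHA can be solved by hand, then return the page HTML."""
    # Imported lazily so save_html() stays usable on machines without Chrome.
    import undetected_chromedriver as uc

    driver = uc.Chrome()
    try:
        driver.get(url)
        # Blocks until the user confirms in the terminal that the page is ready.
        input("Solve any CAPTCHA in the browser, then press Enter to continue...")
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling `save_html(fetch_with_manual_captcha("https://example.com"))` would then write the rendered start page to cloned_selenium_site/index.html.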

⚙️ Customization
Change target URL:
Edit this line in site_cloner.py:
clone_website("https://example.com")
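
As an alternative to editing the file each time, the target URL could be read from the command line. The sketch below is an assumption about how that might be wired up, not a feature of the current script: `parse_target` is a hypothetical helper, and only the `clone_website(url)` call shown above comes from the source.

```python
import argparse
from typing import List, Optional


def parse_target(argv: Optional[List[str]] = None) -> str:
    """Return the URL to clone, taken from the command line or a default."""
    parser = argparse.ArgumentParser(description="Clone a website for offline browsing.")
    parser.add_argument(
        "url",
        nargs="?",
        default="https://example.com",
        help="start URL to clone (default: %(default)s)",
    )
    # With argv=None, argparse reads the arguments from sys.argv[1:].
    args = parser.parse_args(argv)
    if not args.url.startswith(("http://", "https://")):
        parser.error("URL must start with http:// or https://")
    return args.url
```

With this helper in site_cloner.py, the hard-coded call could become `clone_website(parse_target())`, and the script would be run as `python site_cloner.py https://example.com`.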

⚠️ Disclaimer
This tool is intended for educational and ethical use only.
Do not use it to clone or scrape websites without explicit permission.
The author is not responsible for any misuse or legal issues.
Always respect websites' Terms of Service and copyright laws.

📝 License
MIT License © [Solution]

🙌 Contribute
Feel free to submit issues, fork the repo, and send pull requests!
We welcome improvements, bug fixes, and new features 🚀
