A personal, no-limits site spider written in Python, built out of necessity when every "free tool" out there was either limited, paid, or just didn't work well on macOS.
I needed to spider a site quickly and without restrictions, but:
- Most tools had limits or required paid plans
- Others didn't work smoothly on macOS
- I just needed a simple and fast way to extract links and crawl a site, saving everything in a .csv file

So I built my own, with the help of AI and some Python magic.
- Parses HTML using BeautifulSoup
- Recursively crawls all internal links
- Saves crawled URLs in a results.csv file
- Powered by asyncio for fast, concurrent crawling
- Smart deduplication of URLs (no repeat crawls)
- Designed for single-site deep crawling
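
To show how these pieces can fit together, here is a minimal sketch. It is not the actual contents of spider.py. The function names (`internal_links`, `crawl`, `save_csv`, `main`), the single `url` column in the CSV, and the concurrency limit of 10 are all assumptions made for illustration. It crawls breadth-first with `asyncio.gather`, uses a `set` for deduplication, and keeps only links on the same host as the page they were found on.

```python
import asyncio
import csv
from urllib.parse import urldefrag, urljoin, urlparse


def internal_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep only same-host http(s) links."""
    host = urlparse(page_url).netloc
    found = set()
    for href in hrefs:
        # Make the link absolute and drop any #fragment so that
        # /page and /page#top count as the same URL.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return found


async def crawl(start_url, fetch_hrefs, max_concurrency=10):
    """Visit every internal page reachable from start_url, each URL once.

    fetch_hrefs is an async callable (hypothetical interface) that takes a
    URL and returns the raw href strings found on that page.
    """
    seen = {start_url}            # deduplication: URLs already queued
    frontier = [start_url]        # URLs to fetch in the current round
    limit = asyncio.Semaphore(max_concurrency)

    async def visit(url):
        async with limit:
            try:
                hrefs = await fetch_hrefs(url)
            except Exception:
                return set()      # skip pages that fail to load or parse
        return internal_links(url, hrefs)

    while frontier:
        # Fetch the whole frontier concurrently, then build the next one.
        results = await asyncio.gather(*(visit(url) for url in frontier))
        frontier = []
        for links in results:
            for link in sorted(links):
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return sorted(seen)


def save_csv(urls, path="results.csv"):
    """Write one URL per row under an assumed 'url' header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url"])
        writer.writerows([url] for url in urls)


async def main(start_url):
    # Imported here so the helpers above work without third-party packages.
    import aiohttp
    from bs4 import BeautifulSoup

    async with aiohttp.ClientSession() as session:

        async def fetch_hrefs(url):
            async with session.get(url) as response:
                html = await response.text()
            soup = BeautifulSoup(html, "lxml")
            return [a["href"] for a in soup.find_all("a", href=True)]

        save_csv(await crawl(start_url, fetch_hrefs))
```

Running `asyncio.run(main("https://example.com/"))` would crawl that site and write results.csv to the current directory. Passing the fetcher in as an argument keeps the crawl logic separate from the network code, which makes it easy to try out against a fake in-memory site.
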
- Python 3.7+
- aiohttp
- beautifulsoup4
- lxml
Install with:
pip install aiohttp beautifulsoup4 lxml
Usage:
python spider.py