Mid90sAhsan/PythonSpider

🕷️ Fast Async Python Site Spider

A personal, no-limits site spider written in Python, built out of necessity when every "free tool" out there was either limited, paid, or just didn't work well on macOS.

🚀 Why I Built This

I needed to spider a site quickly and without restrictions, but:

  • Most tools had limits or required paid plans
  • Others didn't work smoothly on macOS
  • I just needed a simple and fast way to extract links and crawl a site, saving everything in a .csv

So I built my own, with the help of AI and some Python magic.


⚙️ Features

  • 🔗 Parses HTML using BeautifulSoup
  • 🔁 Recursively crawls all internal links
  • 💾 Saves crawled URLs in a results.csv file
  • ⚡ Powered by asyncio, so pages are fetched concurrently instead of one at a time 🚀
  • 🧠 Smart deduplication of URLs (no repeat crawls)
  • 🎯 Designed for single-site deep crawling
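
To make the features above concrete, here is a minimal sketch of the crawl loop. It is not the code in spider.py, just an illustration under a few assumptions: the names LinkParser, internal_links, crawl and save_csv are hypothetical, the standard library's html.parser stands in for BeautifulSoup so the sketch runs with no extra installs, and the fetch argument is any coroutine that returns a page's HTML (in the real spider that role would presumably be filled by an aiohttp request).

```python
import asyncio
import csv
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url, html):
    """Return absolute, fragment-free http(s) links on the same host as base_url."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = set()
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" so /b and /b#top count once.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.add(absolute)
    return links


async def crawl(start_url, fetch, max_concurrency=10):
    """Crawl level by level from start_url, visiting each URL at most once."""
    seen = {start_url}          # deduplication: every URL ever queued
    frontier = [start_url]      # URLs to fetch in the current round
    limit = asyncio.Semaphore(max_concurrency)

    async def visit(url):
        async with limit:
            try:
                html = await fetch(url)
            except Exception:
                return set()    # skip pages that fail to load
        return internal_links(url, html)

    while frontier:
        # Fetch the whole frontier concurrently, then queue only unseen links.
        results = await asyncio.gather(*(visit(url) for url in frontier))
        discovered = set().union(*results) - seen
        seen |= discovered
        frontier = sorted(discovered)
    return sorted(seen)


def save_csv(urls, path="results.csv"):
    """Write one URL per row under a 'url' header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url"])
        writer.writerows([url] for url in urls)
```

The semaphore caps how many requests are in flight at once, and the seen set is what prevents repeat crawls. Passing fetch in as a parameter is a design choice for the sketch only: it lets the loop be tried against an in-memory dict of pages before pointing it at a live site.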

🧰 Requirements

  • Python 3.7+
  • aiohttp
  • beautifulsoup4
  • lxml

Install with:

pip install aiohttp beautifulsoup4 lxml

Usage:

python spider.py
