
Spidertrap

Website

https://bitbucket.org/ethanr/spidertrap

Description

Trap web crawlers and spiders in an infinite set of dynamically generated webpages.
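
The idea can be sketched in a few lines of Python 3. This is not Spidertrap's own source, only a minimal illustration under stated assumptions: the names `random_links_page`, `TrapHandler`, and `make_server`, the five links per page, and the 3 to 20 character link names are all hypothetical choices. The point it shows is that the server never looks up the requested path; every GET, whatever the URL, is answered with a freshly generated page of links, so the site has no leaf pages.

```python
import random
import string
from http.server import BaseHTTPRequestHandler, HTTPServer

ALPHABET = string.ascii_letters + string.digits


def random_links_page(count=5):
    """Build an HTML page holding `count` links with random names (assumed sizes)."""
    names = [
        "".join(random.choices(ALPHABET, k=random.randint(3, 20)))
        for _ in range(count)
    ]
    body = "".join(f'<a href="/{name}">{name}</a><br>\n' for name in names)
    return f"<html><body>\n{body}</body></html>\n"


class TrapHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: the request path is ignored on purpose."""

    def do_GET(self):
        # Any path, seen before or not, gets a brand-new page of links.
        payload = random_links_page().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def make_server(port=8000):
    """Create (but do not start) a server bound to the loopback interface."""
    return HTTPServer(("127.0.0.1", port), TrapHandler)
```

Calling `make_server().serve_forever()` would start this sketch on port 8000, the same port Spidertrap reports in the examples below.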

Install Location

/opt/spidertrap/

Usage

/opt/spidertrap$ python2 spidertrap.py --help

    Usage: spidertrap.py [FILE]

    FILE is file containing a list of webpage names to serve, one per line.
    If no file is provided, random links will be generated.

Video Walkthrough

<iframe src="https://onedrive.live.com/embed?cid=8D6C4317A39E3D29&resid=8D6C4317A39E3D29%2155687&authkey=AGAqCxGSjW6oLiw" width="320" height="200" frameborder="0" scrolling="no" allowfullscreen sandbox="allow-scripts allow-pointer-lock allow-forms allow-same-origin"></iframe>

Example 1: Basic Usage

Start Spidertrap by opening a terminal, changing into the Spidertrap directory, and typing the following:

/opt/spidertrap$ python2 spidertrap.py

    Starting server on port 8000...

    Server started. Use <Ctrl-C> to stop.

Then visit http://127.0.0.1:8000 in a web browser. You should see a page of randomly generated links. Clicking any link takes you to another page of randomly generated links.

[Screenshots: Spidertrap Random Links 1, Spidertrap Random Links 2]

Example 2: Providing a List of Links

Start Spidertrap, this time passing it a file of webpage names (one per line) to use when generating links.

/opt/spidertrap$ python2 spidertrap.py big.txt

    Starting server on port 8000...

    Server started. Use <Ctrl-C> to stop.

Then visit http://127.0.0.1:8000 in a web browser. You should see a page of links taken from the file. Clicking any link takes you to another page of links from the file.

[Screenshots: Spidertrap List Links 1, Spidertrap List Links 2]
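
How a names file might be turned into links can be sketched as follows. This is an illustration in Python 3, not Spidertrap's actual code: the helpers `parse_names`, `load_names`, and `links_for_page` are hypothetical, as is the choice of five links per page. The only detail taken from the usage text is the file format, one webpage name per line.

```python
import random
from html import escape
from urllib.parse import quote


def parse_names(lines):
    """Return the non-empty, whitespace-stripped names, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def load_names(path):
    """Read a names file such as big.txt (one webpage name per line)."""
    with open(path, encoding="utf-8") as handle:
        return parse_names(handle)


def links_for_page(names, rng, count=5):
    """Pick `count` names at random (repeats allowed) and render them as links."""
    chosen = [rng.choice(names) for _ in range(count)]
    # quote() keeps the URL valid; escape() keeps the visible text valid HTML.
    return "".join(
        f'<a href="/{quote(name)}">{escape(name)}</a><br>\n' for name in chosen
    )
```

Because the links use realistic page names instead of random strings, a crawler has less to go on when trying to recognise the trap.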

Example 3: Trapping a Wget Spider

Follow the instructions in [Example 1: Basic Usage] or [Example 2: Providing a List of Links] to start Spidertrap. Then open a new terminal and tell wget to mirror the website. Because every page links to more pages, wget will run until either it or Spidertrap is killed. Press Ctrl-C to kill wget.

$ sudo wget -m http://127.0.0.1:8000

    --2013-01-14 12:54:15-- http://127.0.0.1:8000/

    Connecting to 127.0.0.1:8000... connected.

    HTTP request sent, awaiting response... 200 OK

    <<<snip>>>

    HTTP request sent, awaiting response... ^C
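
Why the mirror never finishes, and how a crawler could defend itself, can be sketched with a small simulation in Python 3. Everything here is hypothetical: `fake_fetch` stands in for an HTTP request to the trap and simply returns fresh random paths, and `crawl` is a plain breadth-first crawler with an assumed `max_pages` cap, not a model of wget's internals.

```python
import random
import string
from collections import deque


def fake_fetch(path, rng, links_per_page=5):
    """Stand-in for fetching `path` from the trap: returns new random link paths."""
    return [
        "/" + "".join(rng.choices(string.ascii_lowercase, k=8))
        for _ in range(links_per_page)
    ]


def crawl(fetch, start="/", max_pages=50):
    """Breadth-first crawl that stops after `max_pages` fetches.

    Returns (pages_fetched, pages_still_queued).
    """
    seen = {start}
    queue = deque([start])
    fetched = 0
    while queue and fetched < max_pages:
        path = queue.popleft()
        fetched += 1
        for link in fetch(path):
            if link not in seen:  # skip pages already visited or queued
                seen.add(link)
                queue.append(link)
    return fetched, len(queue)


if __name__ == "__main__":
    rng = random.Random()
    fetched, pending = crawl(lambda path: fake_fetch(path, rng), max_pages=50)
    print(f"fetched {fetched} pages, {pending} still queued")
```

Against the simulated trap the queue grows faster than it drains, so only the page cap ends the crawl. Against an ordinary finite site the same function stops on its own once the queue is empty.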