# Snapcrawl - crawl a website and take screenshots



Snapcrawl is a command line utility for crawling a website and saving screenshots.

## Features

- Crawls a website to any given depth and saves screenshots
- Can capture the full length of the page
- Can use a specific resolution for screenshots
- Skips capturing if the screenshot was already saved recently
- Uses local caching to avoid expensive crawl operations when they are not needed
- Reports broken links

## Prerequisites

Snapcrawl requires PhantomJS and ImageMagick.
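To confirm both tools are available before crawling, a quick check might look like this (the `check` helper is just an illustration, not part of Snapcrawl; ImageMagick is looked up via its `convert` binary):

```shell
# Report whether each required command is on the PATH.
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check phantomjs
check convert   # `convert` is ImageMagick's command-line tool
```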

## Docker Image

You can run Snapcrawl using this Docker image, which contains all the necessary prerequisites:

```shell
$ docker pull dannyben/snapcrawl
```

Then you can use it like this:

```shell
$ docker run --rm -it dannyben/snapcrawl --help
```

For more information refer to the docker-snapcrawl repository.
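Screenshots taken inside the container are lost when it exits, so you would typically bind-mount a local folder over the output directory; a sketch, assuming the container writes to `/app/snaps` (the actual path may differ, so check the docker-snapcrawl repository):

```shell
$ docker run --rm -it -v "$PWD/snaps:/app/snaps" dannyben/snapcrawl go example.com
```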

## Install

```shell
$ gem install snapcrawl
```

## Usage

```shell
$ snapcrawl --help

Snapcrawl

Usage:
  snapcrawl go URL [options]
  snapcrawl -h | --help
  snapcrawl -v | --version

Options:
  -f, --folder PATH
    Where to save screenshots [default: snaps]

  -n, --name TEMPLATE
    Filename template. Include the string '%{url}' anywhere in the name to
    use the captured URL in the filename [default: %{url}]

  -a, --age SECONDS
    Number of seconds to consider screenshots fresh [default: 86400]

  -d, --depth LEVELS
    Number of levels to crawl [default: 1]

  -W, --width PIXELS
    Screen width in pixels [default: 1280]

  -H, --height PIXELS
    Screen height in pixels. Use 0 to capture the full page [default: 0]

  -s, --selector SELECTOR
    CSS selector to capture

  -o, --only REGEX
    Include only URLs that match REGEX

  -h, --help
    Show this screen

  -v, --version
    Show version number

Examples:
  snapcrawl go example.com
  snapcrawl go example.com -d2 -fscreens
  snapcrawl go example.com -d2 > out.txt 2> err.txt &
  snapcrawl go example.com -W360 -H480
  snapcrawl go example.com --selector "#main-content"
  snapcrawl go example.com --only "products|collections"
  snapcrawl go example.com --name "screenshot-%{url}"
  snapcrawl go example.com --name "`date +%Y%m%d`_%{url}"
```
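Building on the last example, a small wrapper script can assemble a dated crawl command before running it (the `snaps/$day` folder layout and the `example.com` site are illustrative assumptions, not Snapcrawl defaults):

```shell
#!/bin/sh
# Build a snapcrawl invocation that files screenshots by date.
day=$(date +%Y%m%d)                         # e.g. 20190614
cmd="snapcrawl go example.com --folder snaps/$day --name %{url}"
echo "$cmd"   # review the command, then run it with: eval "$cmd"
```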
