📷 Automate full website screenshots and PDF generation with multiple viewport support.
JavaScript
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
bin
examples/screenshot-delay
lib
share/icons
test
.editorconfig
.gitattributes
.gitignore
.jscsrc
.jshintrc
.travis.yml
CHANGELOG.md
CODE_OF_CONDUCT.md
CONTRIBUTING.md
LICENSE
README.md docs(ReadMe): Add SimpleCrawler TypeError to the Troubleshooting section Nov 27, 2017
index.js
package-lock.json
package.json
siteshooter.svg
siteshooter.yml

README.md

Siteshooter

Siteshooter

NPM version Build Status dependencies

Automate full website screen shots and PDF generation with multiple view port support

Features

  • Crawls specified host and generates a sitemap.xml on the fly
  • Generates entire website screen shots based on sitemap.xml
  • Define multiple view ports
  • Automated PDF generation
  • Includes crawled meta data in generated PDF
  • Reports on broken website links (404 http response)
  • Supports HTTP basic authentication
  • Supports Microsoft Online 3 step authentication
  • Supports Salesforce Visualforce 3 step authentication
  • Supports site maps with HTTP, HTTPS, and FTP protocol URLs
  • Follows HTTP 301 redirects
  • Custom JavaScript inject file - injects into page prior to screen shooting
  • Trigger page events by passing querystring values to custom inject.js file

Do you need a website and workflow management platform?

Catapult website and workflow management platform Give Catapult a shot


In This Documentation

  1. Getting Started
  2. Siteshooter Configuration File
  3. CLI Options
  4. Tests
  5. Troubleshooting & FAQ

Getting Started

Dependencies

Install the following prerequisite on your development machine:

Notable npm Modules

Quick Start

$ npm install siteshooter --global

If siteshooter is installed, make sure you have the latest version by running:

$ npm update siteshooter --global
  • You may need to run these commands with elevated privileges, e.g. sudo, you will be prompted to do so if needed.
  • Installing with the --global flag affords you the siteshooter command on your machine's command line at any path.
  • Read more about the --global flag here.

Create a Siteshooter Configuration File

$ siteshooter --init

Update Siteshooter Configuration File

View the full siteshooter.yml example

Inside siteshooter.yml, add additional options.

  • All Simple Web Crawler options can be added to sitecrawler_options and will pass through to the crawler process
  • Generated screenshot image files are optimized using imagemin and imagemin-pngquant modules, which reduce the overall size of generated PDFs. To adjust the image quality, update the image_quality option in your siteshooter.yml file.
domain:
  name: https://www.devopsgroup.io
  auth:
    user:
    pwd:

pdf_options:
 excludeMeta: true

screenshot_options:
  delay: 2000
  image_quality: '60-80'

sitecrawler_options:
  exclude:
   - "pdf"
  stripQuerystring: false
  ignoreInvalidSSL: true

viewports:
 - viewport: desktop-large
   width: 1600
   height: 1200
 - viewport: tablet-landscape
   width: 1024
   height: 768
 - viewport: iPhone5
   width: 320
   height: 568
 - viewport: iPhone6
   width: 375
   height: 667

CLI Options

$ siteshooter --help

Usage: siteshooter [options]

OPTIONS
_______________________________________________________________________________________
-c --config            Show configuration
-C --cwd               Set working directory, which will load a siteshooter.yml file in the specified path
-e --debug             Output exceptions
-h --help              Print this help
-i --init              Create siteshooter.yml template file in working directory
-p --pdf               Generate PDFs, by defined view ports, based on screen shots created via Siteshooter
-q --quiet             Only return final output
-s --screenshots       Generate screen shots, by view ports, based on sitemap.xml file
-S --sitemap           Crawl domain name specified in siteshooter.yml file and generate a local sitemap.xml file
-v --version           Print version number
-V --verbose           Verbose output
-w --website           Report on website information based on Siteshooter crawled results

When running a siteshooter command without any options, the following options will run in order by default:

  • --sitemap
  • --screenshots
  • --pdf

Custom JavaScript Inject File

To manipulate the DOM, prior to the screen shot process, add a inject.js file in the same working directory as the siteshooter.yml.

Example: inject.js file

/**
 * @file:            inject.js
 * @description:     used to inject custom JavaScript into a web page prior to a screen shot. 
 */

console.log('JavaScript injected into page.');

if ( typeof(jQuery) !== "undefined" ) {

    jQuery(document).ready(function() {
        console.log('jQuery loaded.');
    });
}

Trigger JavaScript Events

When using the optional inject.js file, events can be triggered based on the following querystring parameter - pevent

 // Add URL with pevent querystring parameter in the generated sitemap.xml
<url>
    <loc>https://www.devopsgroup.io?pevent=open-privacy-overlay</loc>
    <changefreq>weekly</changefreq>
</url>

Example: Event detection & triggering

/**
 * @file:            inject.js
 * @description:     used to inject custom JavaScript into a web page prior to a screen shot. 
 */


function getQueryVariable(variable) {
    var query = window.location.search.substring(1);
    var vars = query.split('&');
    for (var i = 0; i < vars.length; i++) {
        var pair = vars[i].split('=');
        if (decodeURIComponent(pair[0]) == variable) {
            return decodeURIComponent(pair[1]);
        }
    }
}

if ( typeof(jQuery) !== "undefined" ) {

    jQuery(document).ready(function() {
        var pageName = window.location.pathname.replace('/', ''),
            pageEvent = getQueryVariable('pevent');

        console.log('document ready.');
        console.log('userAgent', navigator.userAgent);
        console.log('Page: ', pageName);
        console.log('Event: ', pageEvent);

        switch (pageName) {

            // home
            case '':

                switch (pageEvent) {
                    case 'open-privacy-overlay':

                        jQuery('a[data-target~="#modal-privacy"]').trigger('click');
                        break;
                }

                break;
        }

    });
}

Tests

Tests are written with Mocha and can be run with npm test.

Troubleshooting

If you're having issues with Siteshooter, submit a GitHub Issue.

  • Make sure you have a siteshooter.yml file in your working directory and the yaml file is well formatted
  • Experiencing font-loading issues? Try increasing the delay setting in your siteshooter.yml file
screenshot_options:
  delay: 2000
  • Trying to take a screenshot of a page with a video? Unfortunately, PhantomJS does not support videos. As such, here's one approach to showing a video's poster image.
/**
 * @file:            inject.js
 * @description:     used to display a video's poster image
 */

if( jQuery('video').length >0 ){
    jQuery('video').parent().prepend('<img src="'+jQuery('video').attr('poster')+'"/>');
    jQuery('video').remove();
}
  • SimpleCrawler TypeError: The header content contains invalid characters
    • Try setting the acceptCookies option to false
sitecrawler_options:
  acceptCookies: false

Code of Conduct

Take a moment to read or Code of Conduct

Contributing to the project

We are always looking for quality contributions! Please check the CONTRIBUTING.md for contribution guidelines.