A grunt task that takes html snapshots from websites. Useful to make ajax sites crawlable
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
phantomjs
tasks added options for the page object of phantomJS Mar 4, 2015
.gitignore
LICENSE.md
README.md
package.json

README.md

grunt-html-snapshot

Makes it easy to provide html snapshots for client side applications so that they can be indexed by web crawlers

Getting Started

This plugin requires Grunt ~0.4.0

If you haven't used Grunt before, be sure to check out the Getting Started guide, as it explains how to create a Gruntfile as well as install and use Grunt plugins. Once you're familiar with that process, you may install this plugin with this command:

npm install grunt-html-snapshot --save-dev

Once the plugin has been installed, it may be enabled inside your Gruntfile with this line of JavaScript:

grunt.loadNpmTasks('grunt-html-snapshot');

htmlSnapshot task

Run this task with the grunt htmlSnapshot command.

configuring the htmlSnapshot task

    grunt.initConfig({
        htmlSnapshot: {
            all: {
              options: {
                //that's the path where the snapshots should be placed
                //it's empty by default which means they will go into the directory
                //where your Gruntfile.js is placed
                snapshotPath: 'snapshots/',
                //This should be either the base path to your index.html file
                //or your base URL. Currently the task does not use it's own
                //webserver. So if your site needs a webserver to be fully
                //functional configure it here.
                sitePath: 'http://localhost:8888/my-website/',
                //you can choose a prefix for your snapshots
                //by default it's 'snapshot_'
                fileNamePrefix: 'sp_',
                //by default the task waits 500ms before fetching the html.
                //this is to give the page enough time to to assemble itself.
                //if your page needs more time, tweak here.
                msWaitForPages: 1000,
                //sanitize function to be used for filenames. Converts '#!/' to '_' as default
                //has a filename argument, must have a return that is a sanitized string
                sanitize: function (requestUri) {
                    //returns 'index.html' if the url is '/', otherwise a prefix
                    if (/\/$/.test(requestUri)) {
                      return 'index.html';
                    } else {
                      return requestUri.replace(/\//g, 'prefix-');
                    }
                },
                //if you would rather not keep the script tags in the html snapshots
                //set `removeScripts` to true. It's false by default
                removeScripts: true,
                //set `removeLinkTags` to true. It's false by default
                removeLinkTags: true,
                //set `removeMetaTags` to true. It's false by default
                removeMetaTags: true,
                //Replace arbitrary parts of the html
                replaceStrings:[
                    {'this': 'will get replaced by this'},
                    {'/old/path/': '/new/path'}
                ],
                // allow to add a custom attribute to the body
                bodyAttr: 'data-prerendered',
                //here goes the list of all urls that should be fetched
                urls: [
                  '',
                  '#!/en-gb/showcase'
                ],
                // a list of cookies to be put into the phantomjs cookies jar for the visited page
                cookies: [
                  {"path": "/", "domain": "localhost", "name": "lang", "value": "en-gb"}
                ],
				// options for phantomJs' page object
				// see http://phantomjs.org/api/webpage/ for available options
				pageOptions: {
					viewportSize : {
						width: 1200,
						height: 800
					}
				}
              }
            }
        }
    });

Release History

  • 0.6.1 - trigger warnings with grunt.warn(msg, 6) instead of grunt.log(msg)
  • 0.6.0 - Provide a function hook for the file name sanitization (by @mrgamer)
  • 0.5.0 - Add option to set cookies. Also fixed a bug for scenarios where multiple instances of the tasks are being used in parallel.
  • 0.4.0 - Add more sophisticated replace functionality to transform the html output (thanks to @okcoker)
  • 0.3.0 - Escape tabs & introduced new option bodyAttr to place a custom attribute on the body
  • 0.2.1 - fixed a bug where quotes where missing from the html
  • 0.2.0 - added option to remove script tags from the output
  • 0.1.0 - Initial release