Out of the box assets

avdgaag edited this page Jan 24, 2011 · 1 revision

The template comes with a couple of standard assets out of the box: a sitemap.xml file, a robots.txt file and a 404 error page.

The sitemap.xml file

The sitemap.xml file can be used by spiders to better index your website. Nanoc provides a handy xml_sitemap method for generating your sitemap -- but I do not really like creating a content/sitemap.xml file for my projects. I therefore opted to use the preprocessor to generate the file on the fly when compiling.

You can still use a custom content/sitemap.xml file if you want full control over its contents. Most of the time, you can get by with the defaults: any obviously non-HTML files (e.g. images, stylesheets, scripts) are left out of the sitemap, and xml_sitemap already ignores any item with an is_hidden: true attribute.
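
To make the filtering described above concrete, here is a minimal sketch in plain Ruby. It is not the template's actual preprocessor code: the Page struct, the NON_HTML_EXTENSIONS list and the sitemap_candidates method are hypothetical stand-ins for nanoc items, and the list of extensions is an assumption.

```ruby
# Hypothetical stand-in for a nanoc item; real items keep attributes in a hash.
Page = Struct.new(:path, :extension, :is_hidden)

# Assumed list of "obviously non-HTML" extensions; adjust to your project.
NON_HTML_EXTENSIONS = %w[css js png jpg jpeg gif ico].freeze

# Return only the pages that belong in the sitemap:
# drop hidden pages and anything with a non-HTML extension.
def sitemap_candidates(pages)
  pages.reject do |page|
    page.is_hidden || NON_HTML_EXTENSIONS.include?(page.extension)
  end
end

pages = [
  Page.new('/index.html', 'html', false),
  Page.new('/assets/style.css', 'css', false),
  Page.new('/drafts/secret.html', 'html', true)
]

puts sitemap_candidates(pages).map(&:path)
```

In this sketch only /index.html survives: the stylesheet is dropped because of its extension and the draft because it is hidden.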

The robots.txt file

The robots.txt file can be used to instruct spiders on how to index your site. It consists of statements that allow or disallow crawling of certain parts of the site.

The robots.txt file is generated by the nanoc preprocessor and therefore has no file in the ./content dir. It comes with sensible defaults, but you can customise its output in config.yaml. The output looks like this:

User-agent: *
Disallow: /assets
Allow: /assets/images
Sitemap: /sitemap.xml

In your config.yaml you can list specific paths to allow or disallow:

robots:
  disallow:
    - '/tag'
    - '/newsletter'
  allow:
    - '/tag/foo'
  sitemap: '/site-map.txt'

Note that if the order of the declarations really matters to you, or you want to do something really fancy, you're better off just writing a ./content/robots.txt file manually.
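
As a rough illustration of how the robots section of config.yaml could be turned into the output shown earlier, here is a minimal sketch in plain Ruby. It is not the template's actual preprocessor code: DEFAULT_ROBOTS and build_robots_txt are hypothetical names, and the sketch assumes that each key you set replaces the corresponding default rather than being merged with it.

```ruby
require 'yaml'

# Assumed defaults, chosen to mirror the sample output shown above.
DEFAULT_ROBOTS = {
  'disallow' => ['/assets'],
  'allow'    => ['/assets/images'],
  'sitemap'  => '/sitemap.xml'
}.freeze

# Build the robots.txt body from the `robots` section of the config.
# Keys present in the config replace the matching default (an assumption).
def build_robots_txt(config = {})
  settings = DEFAULT_ROBOTS.merge(config || {})
  lines = ['User-agent: *']
  Array(settings['disallow']).each { |path| lines << "Disallow: #{path}" }
  Array(settings['allow']).each { |path| lines << "Allow: #{path}" }
  lines << "Sitemap: #{settings['sitemap']}" if settings['sitemap']
  lines.join("\n") + "\n"
end

config = YAML.safe_load(<<~CONFIG)
  robots:
    disallow:
      - '/tag'
      - '/newsletter'
    allow:
      - '/tag/foo'
    sitemap: '/site-map.txt'
CONFIG

puts build_robots_txt(config['robots'])
```

Note that the sketch always writes every Disallow line before every Allow line, which is the kind of fixed ordering that a hand-written ./content/robots.txt lets you avoid.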

The 404 page

The template comes with a 404 page by default. It has a sensible error message, which you might want to customize. The .htaccess file instructs the server to use this page.