BigSitemap
BigSitemap is a Sitemap (sitemaps.org) generator suitable for applications with more than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, and batches database queries to minimize memory usage. It can be set up with just a few lines of code and is compatible with just about any framework.
BigSitemap is best run periodically through a Rake/Thor task.
require 'big_sitemap'
sitemap = BigSitemap.new(:url_options => {:host => 'example.com'})
# Add a model
sitemap.add Product
# Add another model with some options
sitemap.add(Post, {
:conditions => {:published => true},
:path => 'articles',
:change_frequency => 'daily',
:priority => 0.5
})
# Generate the files
sitemap.generate
The code above will create a minimum of three files:
- public/sitemaps/sitemap_index.xml.gz
- public/sitemaps/sitemap_products.xml.gz
- public/sitemaps/sitemap_posts.xml.gz
If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the :max_per_sitemap option), the sitemap files will be partitioned into multiple files (sitemap_products_1.xml.gz, sitemap_products_2.xml.gz, …).
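As a rough illustration of the partitioning described above, the sketch below lists the file names one model's sitemap might be split into. The helper name sitemap_file_names is hypothetical and not part of BigSitemap; the naming pattern is taken from the examples in this README, and the assumption that numbering only starts once a second file is needed is a guess, not confirmed library behaviour.

```ruby
# Hypothetical helper, not part of BigSitemap: lists the file names that
# one model's sitemap might be split into for a given number of URLs.
def sitemap_file_names(name, url_count, max_per_sitemap = 50_000)
  parts = (url_count.to_f / max_per_sitemap).ceil

  # Assumption: a single file keeps the plain name, several files get a
  # numeric suffix starting at 1.
  return ["sitemap_#{name}.xml.gz"] if parts <= 1

  (1..parts).map { |n| "sitemap_#{name}_#{n}.xml.gz" }
end

puts sitemap_file_names('products', 120_000)
```

With 120,000 URLs and the default limit, this sketch yields three partitioned files.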
If you’re using Rails then the URLs for each database record are generated with the polymorphic_url helper. That means that the URL for a record will be exactly what you would expect: generated with respect to the routing setup of your app. In other contexts where this helper isn’t available, the URLs are generated in the form:
:base_url/:path/:to_param
If the to_param method does not exist, then id will be used.
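The fallback can be pictured with a small plain-Ruby sketch. The helper record_url and the Item and Story structs are hypothetical names used only for illustration; this is not BigSitemap's own code, just one way the :base_url/:path/:to_param pattern could be assembled.

```ruby
# Hypothetical helper illustrating the :base_url/:path/:to_param pattern.
def record_url(base_url, path, record)
  # Prefer to_param when the record defines it, otherwise fall back to id.
  slug = record.respond_to?(:to_param) ? record.to_param : record.id
  [base_url.chomp('/'), path, slug].join('/')
end

# A record with only an id.
Item = Struct.new(:id)

# A record that defines its own to_param.
Story = Struct.new(:id, :title) do
  def to_param
    "#{id}-#{title.downcase.tr(' ', '-')}"
  end
end

puts record_url('http://example.com', 'products', Item.new(7))
puts record_url('http://example.com/', 'articles', Story.new(3, 'Hello World'))
```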
Install
Via gem:
sudo gem install alexrabarts-big_sitemap -s http://gems.github.com
Advanced
Options
- :url_options — hash with :host, optionally :port and :protocol
- :base_url — string alternative to :url_options, e.g. "example.com:8080/"
- :document_root — string defaults to Rails.root or Merb.root if available
- :path — string defaults to ‘sitemaps’, which places sitemap files under the /sitemaps directory
- :max_per_sitemap — 50000, which is the limit dictated by Google but can be less
- :batch_size — 1001 (not 1000 due to a bug in DataMapper)
- :gzip — true
- :ping_google — true
- :ping_yahoo — false, needs :yahoo_app_id
- :ping_bing — false
- :ping_ask — false
Chaining
You can chain methods together. You could even get away with as little code as:
BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate
Pinging Search Engines
To ping search engines, call ping_search_engines after you generate the sitemap:
sitemap.generate
sitemap.ping_search_engines
Change Frequency, Priority and Last Modified
You can control "changefreq", "priority" and "lastmod" values for each record individually by passing lambdas instead of fixed values:
sitemap.add(Post,
:change_frequency => lambda {|post| ... },
:priority => lambda {|post| ... },
:last_modified => lambda {|post| ... }
)
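To make the elided lambda bodies more concrete, here is a self-contained sketch of what such lambdas might look like and how a value that is either fixed or callable could be resolved per record. The resolve_option helper, the Article struct and its updated_at and featured fields are all hypothetical; how BigSitemap evaluates these options internally is not shown in this README.

```ruby
# Hypothetical resolver: accepts either a fixed value or a lambda and
# returns the value to use for the given record.
def resolve_option(option, record)
  option.respond_to?(:call) ? option.call(record) : option
end

# Stand-in for a model; the field names are assumptions.
Article = Struct.new(:updated_at, :featured)

one_day = 24 * 60 * 60

options = {
  # Recently updated records are advertised as changing more often.
  :change_frequency => lambda { |a| (Time.now - a.updated_at) < one_day ? 'hourly' : 'monthly' },
  :priority         => lambda { |a| a.featured ? 0.9 : 0.5 },
  :last_modified    => lambda { |a| a.updated_at }
}

fresh = Article.new(Time.now - 60, true)
options.each { |key, value| puts "#{key}: #{resolve_option(value, fresh)}" }
```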
Find Methods
Your models must provide either a find_for_sitemap or all class method that returns the instances that are to be included in the sitemap.
Additionally, your models must provide a count_for_sitemap or count class method that returns a count of the instances to be included.
If you’re using ActiveRecord (Rails) or DataMapper then all and count are already provided and you don’t need to do anything unless you want to include a subset of records. If you provide your own find_for_sitemap or all method then it should be able to handle the :offset and :limit options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
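For a model that is not backed by ActiveRecord or DataMapper, the two class methods might look roughly like the sketch below. The Page class and its in-memory RECORDS array are hypothetical stand-ins for a real data source, and passing :offset and :limit as a single options hash is an assumption based on the ActiveRecord and DataMapper convention mentioned above.

```ruby
# Hypothetical non-ORM model backed by an in-memory array.
class Page
  RECORDS = (1..25).map { |n| { :id => n, :published => n.odd? } }.freeze

  # Only the subset that should appear in the sitemap.
  def self.published
    RECORDS.select { |r| r[:published] }
  end

  # Returns one batch of records, honouring :offset and :limit.
  def self.find_for_sitemap(options = {})
    offset = options.fetch(:offset, 0)
    limit  = options.fetch(:limit, published.size)
    # slice returns nil when the offset is past the end, so fall back to [].
    published.slice(offset, limit) || []
  end

  # Must count the same subset that find_for_sitemap returns.
  def self.count_for_sitemap
    published.size
  end
end

puts Page.count_for_sitemap
puts Page.find_for_sitemap(:offset => 10, :limit => 5).map { |r| r[:id] }.inspect
```

Keeping the count and the finder in step matters because the count decides how many batches are requested.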
Cleaning the Sitemaps Directory
Calling the clean method will remove all files from the Sitemaps directory.
Limitations
If your database is likely to shrink while the sitemap is being generated, you might run into problems: the record count is queried once at the very beginning, so the limit for the final batched SQL select is calculated from a stale count and can be larger than the number of rows that remain. Patches welcome!
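The effect can be sketched in plain Ruby. The batch_windows helper below is hypothetical and only approximates the idea of planning every offset and limit from a count taken up front; it is not BigSitemap's actual batching code, and the row numbers are made up for illustration.

```ruby
# Hypothetical sketch: plan all batches from a count taken once, up front.
def batch_windows(count, batch_size = 1001)
  (0...count).step(batch_size).map do |offset|
    # The final window is sized from the original count.
    { :offset => offset, :limit => [batch_size, count - offset].min }
  end
end

windows = batch_windows(2_500)
puts windows.inspect

# Suppose rows were deleted after the count was taken.
remaining_rows = 2_100
last = windows.last
available = [remaining_rows - last[:offset], 0].max
puts "last batch asks for #{last[:limit]} rows, only #{available} exist"
```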
TODO
Tests for Rails components.
Credits
Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library. scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
Thanks to those who have contributed patches:
- Mislav Marohnić
- Jeff Schoolcraft
- Dalibor Nasevic
Copyright
Copyright © 2009 Stateless Systems (statelesssystems.com). See LICENSE for details.







