benbalter/sitemap-parser

Sitemap Parser

Ruby Gem to parse sitemaps.org compliant sitemaps

Usage

Create a new instance of the Parser:

sitemap = SitemapParser.new "http://ben.balter.com/sitemap.xml"

Extract the URLs from the sitemap:

sitemap.urls # => Array of Nokogiri::XML::Node objects
sitemap.to_a # => Array of url strings
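
To make the difference between the two return values concrete, here is a minimal sketch of what a list of URL strings looks like for a sitemaps.org urlset. It uses Ruby's bundled REXML instead of this gem so that it runs without network access; the sample XML, the example.com URLs, and the extract_locs helper are assumptions for illustration and are not part of the gem's API.

```ruby
require "rexml/document"

# Hypothetical sample document following the sitemaps.org urlset layout.
SAMPLE_SITEMAP = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://example.com/</loc></url>
    <url><loc>http://example.com/about/</loc><lastmod>2014-01-01</lastmod></url>
  </urlset>
XML

# Hypothetical helper: returns the text of every <url><loc> as a string.
def extract_locs(xml)
  root = REXML::Document.new(xml).root
  locs = []
  root.elements.each do |url|
    next unless url.name == "url"
    url.elements.each do |child|
      # Each <url> node carries its address in a <loc> child element.
      locs << child.text.strip if child.name == "loc"
    end
  end
  locs
end

puts extract_locs(SAMPLE_SITEMAP)
```

Each <url> element corresponds roughly to one node in sitemap.urls, and each extracted string to one entry in sitemap.to_a.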

Options

Recurse into nested sitemaps:

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true})
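
As an illustration of what recursing into nested sitemaps involves, the sketch below walks a sitemap index and collects the URLs of its child sitemaps. It is not the gem's implementation: the DOCS hash stands in for HTTP fetches, the helper names and example.com URLs are hypothetical, and the choices to follow nested indexes to any depth and to return nothing for an index when recursion is off are assumptions of this sketch.

```ruby
require "rexml/document"

# In-memory stand-in for HTTP fetches (hypothetical URLs and content).
DOCS = {
  "http://example.com/sitemap.xml" => <<~XML,
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap><loc>http://example.com/posts.xml</loc></sitemap>
      <sitemap><loc>http://example.com/pages.xml</loc></sitemap>
    </sitemapindex>
  XML
  "http://example.com/posts.xml" => <<~XML,
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://example.com/posts/1/</loc></url>
    </urlset>
  XML
  "http://example.com/pages.xml" => <<~XML
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://example.com/about/</loc></url>
    </urlset>
  XML
}.freeze

# Returns the <loc> text of each direct child element named child_name.
def child_locs(parent, child_name)
  parent.elements.to_a
        .select { |el| el.name == child_name }
        .map { |el| el.elements.to_a.find { |c| c.name == "loc" } }
        .compact
        .map { |loc| loc.text.strip }
end

# Hypothetical helper: collects page URLs, descending into a
# sitemapindex only when recurse is true.
def collect_urls(url, docs, recurse: false)
  root = REXML::Document.new(docs.fetch(url)).root
  case root.name
  when "urlset"
    child_locs(root, "url")
  when "sitemapindex"
    return [] unless recurse
    child_locs(root, "sitemap").flat_map { |child| collect_urls(child, docs, recurse: true) }
  else
    []
  end
end

puts collect_urls("http://example.com/sitemap.xml", DOCS, recurse: true)
```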

To extract only the sitemap URLs matching a given pattern, provide a regex; it is used to match each page.

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true, url_regex: /sitemapregex/})
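
The effect of a URL pattern can be sketched in plain Ruby. This assumes the regex is tested against each URL string and that a missing regex keeps every URL; the gem's internals may differ. The filter_urls helper and the sample URLs are hypothetical.

```ruby
# Hypothetical sample data, standing in for URLs read from a sitemap.
SAMPLE_URLS = [
  "http://example.com/blog/first-post/",
  "http://example.com/about/",
  "http://example.com/blog/second-post/"
].freeze

# Hypothetical helper, not part of the gem's API: keeps only the URLs
# that match url_regex, or all of them when no regex is given.
def filter_urls(urls, url_regex = nil)
  return urls if url_regex.nil?

  urls.select { |url| url.strip.match?(url_regex) }
end

puts filter_urls(SAMPLE_URLS, %r{/blog/})
```

Anchoring the pattern (for example with \A or \z) narrows the match further, since an unanchored regex matches anywhere in the string.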

Typhoeus Options

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', { userpwd: "username:password" })
