Sitemap xml generator

Sitemap xml generator in posix shell.


This script crawls a web site from a given local starting URL and generates a Sitemap file in the format accepted by Google. It does not follow links to other web sites or to parent directories.
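As an illustration of the output format, here is a minimal sketch that turns a list of URLs into a Sitemap file. It is not taken from the script itself: the function name `make_sitemap` and its two positional arguments are assumptions made for this example, and the `example.com` URLs are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script): read one URL per line on
# stdin and write a Sitemap document to stdout.
#   $1 = change frequency (default "weekly")
#   $2 = priority         (default "0.5")
make_sitemap() {
    freq=${1:-weekly}
    prio=${2:-0.5}
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
    printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    while IFS= read -r url; do
        # Skip empty lines.
        [ -n "$url" ] || continue
        # Escape XML special characters; "&" must be handled first.
        esc=$(printf '%s\n' "$url" | sed \
            -e 's/&/\&amp;/g' \
            -e 's/</\&lt;/g' \
            -e 's/>/\&gt;/g' \
            -e 's/"/\&quot;/g' \
            -e "s/'/\&apos;/g")
        printf '  <url>\n'
        printf '    <loc>%s</loc>\n' "$esc"
        printf '    <changefreq>%s</changefreq>\n' "$freq"
        printf '    <priority>%s</priority>\n' "$prio"
        printf '  </url>\n'
    done
    printf '%s\n' '</urlset>'
}
```

For example, `printf '%s\n' 'https://www.example.com/' | make_sitemap daily 0.8` prints a one-entry Sitemap to standard output.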



 $ [-r|--remote <url>] [-l|--local <url>] [-p|--priority <number>] [-f|--frequency <string>] [-i|--index <string>] [-d|--docroot <path>] [-A|--accept <list>] [-R|--reject <list>] [-o|--output-file <filename>] [-6] [-ssl|--check-ssl] [-h|--help] [-v|--version] [-vv|--verbose] [--debug]


 $ -l https://localhost/foobar/ -r -d /home/html/foobar -p 0.8 -f daily


-r|--remote <url>           Set the remote URL.

-l|--local <url>            Set the local URL (e.g. http://localhost/foobar/ ).
                            Do not include a file name (e.g. not http://localhost/foo/bar.php).

-p|--priority <value>       Set the priority. Valid values range from 0.0 to 1.0.
                            Default is "0.5".

-f|--frequency <value>      Set the frequency. Valid values are:
                            always, hourly, daily, weekly, monthly, yearly, never
                            Default is "weekly".

-i|--index <filename>       Set the name of the index file.
                            The default filename is "index.php".

-d|--docroot <path>         Set the "Doc Root".

-A|--accept <list>          Comma-separated list of accepted extensions.
                            Default is all.

-R|--reject <list>          Comma-separated list of rejected extensions.
                            Default is nothing.

-o|--output-file <filename> Set the name of the generated sitemap file.
                            The default file name is sitemap.xml.

-6                          Pass the inet6-only option to wget.
                            Connect only to IPv6 addresses.

-ssl|--check-ssl            Check if there are duplicate URLs with http and https.

-vv|--verbose               Print details when crawling with wget.

--debug                     Set bash to debug mode (-x)

-v|--version                Print version.

-h|--help                   Print this help and exit.
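The value constraints listed above for `--priority` and `--frequency` could be checked along these lines. This is only a sketch under assumptions: the function names `is_valid_frequency` and `is_valid_priority` are made up for this example, and the script may validate its arguments differently or not at all.

```shell
#!/bin/sh
# Hypothetical validators (names are illustrative, not from the script).

# Succeeds only for the frequency values listed in the help text.
is_valid_frequency() {
    case "$1" in
        always|hourly|daily|weekly|monthly|yearly|never) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds only for a plain decimal number between 0.0 and 1.0.
is_valid_priority() {
    case "$1" in
        # Reject empty input, a lone dot, any character other than
        # digits and dots, and values with more than one dot.
        ''|.|*[!0-9.]*|*.*.*) return 1 ;;
    esac
    # Let awk do the numeric range comparison.
    awk -v p="$1" 'BEGIN { exit !(p + 0 >= 0 && p + 0 <= 1) }'
}
```

A caller could then write, for instance, `is_valid_priority "$priority" || { echo "invalid priority" >&2; exit 1; }` before starting the crawl.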


Simply copy the file into a directory in your $PATH and make it executable:

$ chmod +x


This script requires the following commands: wget, sed, awk, grep, cut and sort. Optional: tee (for verbose mode), id or whoami (for the root user check).
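To see whether the required commands are available before running the script, something like the following can be used. It is a sketch, not part of the script: the function name `missing_commands` is an assumption for this example, and it relies only on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script): print the name of every
# command given as an argument that cannot be found, one per line.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example use (left commented out so that sourcing this file has no side effects):
#   missing=$(missing_commands wget sed awk grep cut sort)
#   [ -z "$missing" ] || echo "Missing commands: $missing" >&2
```

An empty result means every listed command was found.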


THIS IS ONLY A TESTING SCRIPT that generates a sitemap for my own situation.

It was written quickly, so of course it has errors and ugly parts.

It is here because I need a public place to keep it, but if you need a sitemap generator, try one of these:


If you have advice or suggestions, they are welcome.

I'm sorry for my bad English.
