Skip to content

Google Sitemap is a Sitemap standard that is supported by Ask.com, Google, YAHOO and MSN Search. This library can read in such Sitemaps and parse all urls from them.

License

AdSpray/googlesitemapparser

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Google Sitemap-Parser

An easy-to-use library to parse sitemaps compliant with the Google Standard

Install

Install via composer:

{
    "require": {
        "tzfrs/googlesitemapparser": "1.0.*"
    }
}

Run composer install or composer update.

Getting Started

Basic parsing

Parses the data from the sitemap.xml of your server. Supports .xml and text format

<?php
require __DIR__ . '/vendor/autoload.php';

use \tzfrs\GoogleSitemapParser;
use \tzfrs\Exceptions\GoogleSitemapParserException;

try {
    $posts = (new GoogleSitemapParser('http://tzfrs.de/sitemap.xml'))->parse();
    foreach ($posts as $post) {
        print $post . '<br>';
    }
} catch (GoogleSitemapParserException $e) {
    print $e->getMessage();
}

Parsing from robots.txt

Searches for Sitemap entries in the robots.txt and parses those files. Also downloads/extracts gzip compressed sitemaps and searches for them

<?php
require __DIR__ . '/vendor/autoload.php';

use \tzfrs\GoogleSitemapParser;
use \tzfrs\Exceptions\GoogleSitemapParserException;

try {
    $posts = (new GoogleSitemapParser('http://www.sainsburys.co.uk/robots.txt'))->parseFromRobots();
    foreach ($posts as $post) {
        print $post . '<br>';
    }
} catch (GoogleSitemapParserException $e) {
    print $e->getMessage();
}

Including the priority for the sitemap entry in the response

If you also want to get the priority of a sitemap set the 2nd parameter of the constructor to true If the priority can't be found or is not set in the file an empty string will be returned.

<?php
require __DIR__ . '/vendor/autoload.php';

use \tzfrs\GoogleSitemapParser;
use \tzfrs\Exceptions\GoogleSitemapParserException;
try {
    $posts = (new GoogleSitemapParser('http://www.sainsburys.co.uk/robots.txt', true))->parseFromRobots();
    foreach ($posts as $post=>$priority) {
        print 'URL: '. $post . '<br>Priority: '. $priority . '<hr>';
    }
} catch (GoogleSitemapParserException $e) {
    print $e->getMessage();
}

Parsing compressed sitemaps

If you have an URL to a compressed sitemap such as example.com/sitemap.xml.gz then you need to use this method

<?php
require __DIR__ . '/vendor/autoload.php';

use \tzfrs\GoogleSitemapParser;
use \tzfrs\Exceptions\GoogleSitemapParserException;
try {
    $posts = (new GoogleSitemapParser('http://www.sainsburys.co.uk/wcsstore/robots/sitemap_10151_4.xml.gz'))->parseCompressed();
    foreach ($posts as $post=>$priority) {
        print 'URL: '. $post . '<br>Priority: '. $priority . '<hr>';
    }
} catch (GoogleSitemapParserException $e) {
    print $e->getMessage();
}

Methods

parse
parseFromRobots

Contributing is surely allowed! :-)

See the file LICENSE for licensing informations

About

Google Sitemap is a Sitemap standard that is supported by Ask.com, Google, YAHOO and MSN Search. This library can read in such Sitemaps and parse all urls from them.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • PHP 100.0%