- Getting Started
- Requirements
- Installing via composer
- Installing via archive
- Installing chromium executable
- Parsing your first page
- Search methods
- Working with text
- Contribute
- License
To use this library with headless browsing support (included by default in the Packagist version), you also need Headless Chromium PHP and a Chromium executable.
$ composer require shamanhead/phpporser
This should work on Windows, macOS and Linux.
Headless Chromium supports all Chromium-based browsers, such as Chrome, Opera and Chromium. I recommend using Chromium rather than Chrome because, in my experience, it works better. Go to the official Chromium download page and download it.
After this step, unpack the archive and move it to a suitable location.
Then specify the path in your script:
<?php
require_once "vendor/autoload.php";
use HeadlessChromium\Page;
use ShamanHead\PhpPorser\App\Dom as Dom;
$dom = new Dom();
$dom->setHref('file:///home/shamanhead/dev/porser/phpporser-master/test.html'); // page to parse (here a local file)
$dom->setBrowserPath('PATH_TO_CHROME'); // replace with the path to your Chromium executable
?>
If you did everything right, the parser will work. If errors occur during this step, check here for a known solution to your problem. Otherwise, please open a new issue here or on the Headless Chromium PHP page.
Half of the work is done. Now let's parse a simple page, such as the Computer science article on Wikipedia. With its help, I will show all the capabilities of the parser. First of all, let's get the 'Computer science' string at the top of the page:
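A wrong or non-executable browser path is one likely source of errors at this step. The sketch below checks a few candidate locations before one of them is handed to setBrowserPath(). The helper name findBrowser and the candidate paths are hypothetical examples, not part of PhpPorser; adjust them to wherever you unpacked Chromium.

```php
<?php
// Hypothetical helper, not part of PhpPorser: returns the first candidate
// that exists and is executable, or null if none qualifies.
function findBrowser(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

// Example locations only; replace them with your own.
$browserPath = findBrowser([
    '/usr/bin/chromium',
    '/usr/bin/chromium-browser',
    'C:\\chromium\\chrome.exe',
]);

if ($browserPath === null) {
    echo "No Chromium executable found, check the path.", PHP_EOL;
} else {
    echo "Using browser at: ", $browserPath, PHP_EOL;
    // With the library installed: $dom->setBrowserPath($browserPath);
}
```

Failing early with a clear message is usually easier to debug than an error raised later from inside the browser launcher.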
<?php
require_once "vendor/autoload.php";
use ShamanHead\PhpPorser\App\Dom as Dom;
$dom = new Dom();
$dom->setHref('https://en.wikipedia.org/wiki/Computer_science');
print_r($dom->tag('h1')->class('firstHeading')->text()->merge());
?>
It works! But how? Let me explain:
- The parser gets all tags named 'h1'
- Then the parser gets all tags with class 'firstHeading' within the range of those h1 tags (including their nested elements)
- It gets the text from them
- It converts the resulting array to a string
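The four steps above can be sketched with PHP's built-in DOM extension. This is only an illustration of the idea, not how PhpPorser is implemented; the function name headingText and the sample HTML are assumptions made for the example.

```php
<?php
// Illustration only: uses PHP's DOM extension, not PhpPorser internals.
function headingText(string $html, string $class, string $separator = "\n"): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    // Step 1: collect every <h1> tag.
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        // Step 2: look at the <h1> itself and at everything nested inside it,
        // keeping the elements whose class list contains $class.
        $candidates = array_merge([$h1], iterator_to_array($h1->getElementsByTagName('*')));
        foreach ($candidates as $element) {
            $classes = preg_split('/\s+/', trim($element->getAttribute('class')));
            if (in_array($class, $classes, true)) {
                // Step 3: take the text of the match.
                $texts[] = trim($element->textContent);
            }
        }
    }
    // Step 4: convert the resulting array to a string.
    return implode($separator, $texts);
}

$html = '<html><body><h1 class="firstHeading">Computer science</h1><h1>Other</h1></body></html>';
echo headingText($html, 'firstHeading'), PHP_EOL; // prints "Computer science"
```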
<?php
require_once "vendor/autoload.php";
use ShamanHead\PhpPorser\App\Dom as Dom;
$dom = new Dom();
$dom->setHref('href to file');
print_r($dom->tag('h1')->array()); // finds elements by tag name 'h1'
print_r($dom->id('firstHeading')->array()); // finds elements by id 'firstHeading'
print_r($dom->class('wrapper__main')->array()); // finds elements by class name 'wrapper__main'
print_r($dom->custom(['name', 'button'])->array()); // finds elements whose 'name' attribute has the value 'button'
?>
You can combine search methods with each other to find elements more precisely:
<?php
require_once "vendor/autoload.php";
use ShamanHead\PhpPorser\App\Dom as Dom;
$dom = new Dom();
$dom->setHref('href to file');
print_r($dom->class('main')->id('firstHeading')->tag('h1')->array());
?>
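To show how a chain like this can narrow a result set step by step, here is a small sketch on top of PHP's DOM extension. The Selection class and its methods (fromHtml, withClass, withId, withTag, texts) are hypothetical names invented for this example and are not PhpPorser's API. The sketch assumes each filter searches the current matches and their nested elements, and that the values passed in are plain names without quotes.

```php
<?php
// Hypothetical sketch, not PhpPorser: every filter returns a new Selection,
// which is what makes the calls chainable.
final class Selection
{
    private $xpath;
    private $nodes;

    public function __construct(DOMXPath $xpath, array $nodes)
    {
        $this->xpath = $xpath;
        $this->nodes = $nodes;
    }

    public static function fromHtml(string $html): self
    {
        $doc = new DOMDocument();
        libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        return new self(new DOMXPath($doc), [$doc->documentElement]);
    }

    // Search each current node and its descendants for elements
    // that satisfy the given XPath predicate.
    private function narrow(string $predicate): self
    {
        $found = [];
        foreach ($this->nodes as $node) {
            foreach ($this->xpath->query("descendant-or-self::*[$predicate]", $node) as $match) {
                // The node path is unique per node, so duplicates collapse.
                $found[$match->getNodePath()] = $match;
            }
        }
        return new self($this->xpath, array_values($found));
    }

    public function withTag(string $name): self
    {
        return $this->narrow("self::$name");
    }

    public function withId(string $id): self
    {
        return $this->narrow("@id='$id'");
    }

    public function withClass(string $class): self
    {
        return $this->narrow("contains(concat(' ', normalize-space(@class), ' '), ' $class ')");
    }

    // Text content of every node in the current selection.
    public function texts(): array
    {
        return array_map(function (DOMNode $node) {
            return trim($node->textContent);
        }, $this->nodes);
    }
}

$html = '<div class="main"><h1 id="firstHeading">Title</h1><h1>Other</h1></div><h1 id="x">Outside</h1>';
print_r(Selection::fromHtml($html)->withClass('main')->withId('firstHeading')->withTag('h1')->texts());
```

With this sample HTML, the chain keeps only the heading with the text 'Title': the class filter selects the div, the id filter selects the first h1 inside it, and the tag filter confirms that the match is an h1.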
<?php
require_once "vendor/autoload.php";
use ShamanHead\PhpPorser\App\Dom as Dom;
$dom = new Dom();
$dom->setHref('href to file');
$divText = $dom->tag('div')->id('someDiv')->text();
$divText->contents(); // Returns all text as an array.
$divText->merge('symbol'); // Returns all text as a single string, joined with the 'symbol' separator
// ("\n" by default).
$divText->first(); // Returns the first text found.
$divText->last(); // Returns the last text found.
?>
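To make the difference between these four calls concrete, here is a sketch on a plain PHP array standing in for the result of contents(). The helper functions mergeText, firstText and lastText are hypothetical and only mirror the behaviour described in the comments above; they are not the library's implementation. The sketch assumes PHP 7.3 or newer for array_key_first() and array_key_last().

```php
<?php
// Stand-in for the array that contents() is described as returning.
$contents = ['First paragraph', 'Second paragraph', 'Third paragraph'];

// Joins all pieces of text into one string ("\n" separator by default).
function mergeText(array $contents, string $separator = "\n"): string
{
    return implode($separator, $contents);
}

// Returns the first piece of text, or null for an empty result.
function firstText(array $contents): ?string
{
    return $contents === [] ? null : $contents[array_key_first($contents)];
}

// Returns the last piece of text, or null for an empty result.
function lastText(array $contents): ?string
{
    return $contents === [] ? null : $contents[array_key_last($contents)];
}

echo mergeText($contents, ' | '), PHP_EOL; // prints "First paragraph | Second paragraph | Third paragraph"
echo firstText($contents), PHP_EOL;        // prints "First paragraph"
echo lastText($contents), PHP_EOL;         // prints "Third paragraph"
```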