alex-michaud/html-dom

html-dom

A fast, easy-to-use HTML DOM parser written in PHP. It is built on top of PHP's DOMDocument.

Requires PHP 5.3+.

Usage

Include the class file:

require_once('Html_dom.php');

Then load a DOM document from a file:

$html_dom = file_get_html('index.html');

You can also load an HTML string directly:

$html_dom = str_get_html('<ul><li>item 1</li><li>item 2</li><li>item 3</li></ul>');

Once you have the document loaded you can parse it, modify it and output the modified version.

Output

You can output the document using the save() method:

echo $html_dom->save();

You can also write the output directly to a file by passing a file path:

$html_dom->save('/path/to/file.html');
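
Since the library is built on DOMDocument, the same load-and-save round trip can be sketched with DOMDocument alone. This is only an illustration of the underlying extension, not the Html_dom implementation; the temporary file path is an assumption made for the example.

```php
<?php
// Plain DOMDocument sketch (not the Html_dom API): load an HTML string,
// then serialize it to a string and to a file.
$html = '<ul><li>item 1</li><li>item 2</li><li>item 3</li></ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

// Serialize the whole document to a string (roughly what save() without a path returns).
$output = $doc->saveHTML();

// Write the document to a file; the temporary path is only for illustration.
$path = tempnam(sys_get_temp_dir(), 'dom');
$bytes = $doc->saveHTMLFile($path);
```

Note that DOMDocument wraps a fragment in html and body elements when it parses it, so the serialized output is a full document.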

Parse document

A document can be queried with different methods. The fastest one, if you know the element id, is getElementById(). The second fastest is probably getElementsByTagName(). Finally, find() is the general-purpose method and accepts any supported CSS selector.

Html_dom_node getElementById(string $elementId)

$contentElement = $html_dom->getElementById('content');

Html_dom_node_collection getElementsByTagName(string $tagName)

Html_dom_node getElementsByTagName(string $tagName, int $index)

$liElementCollection = $html_dom->getElementsByTagName('li');
$secondLiElement = $html_dom->getElementsByTagName('li', 1);

Html_dom_node_collection find(string $cssSelector)

Html_dom_node find(string $cssSelector, int $index)

$pElementCollection = $html_dom->find('p'); // collection of all the "<p>" elements
$pElement = $html_dom->find('p', 0); // first "<p>" element
$pElement = $html_dom->find('p', 1); // second "<p>" element
$elementCollection = $html_dom->find('div.promo'); // collection of "<div>" elements with attribute class="promo"
$element = $html_dom->find('#login', 0); // DOM element with attribute id="login"
$element = $html_dom->find('meta[name="description"]', 0); // DOM meta element with attribute name="description"
$element = $html_dom->find('ul', 0)->first_child(); // first child element under "<ul>" (should be the first "<li>" element)
$element = $html_dom->find('ul', 0)->last_child(); // last child element under "<ul>" (should be the last "<li>" element)
$liElementCollection = $html_dom->find('ul li'); // collection of DOM elements
$element = $html_dom->find('ul li')->offsetGet(2); // third element in the collection
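
To give a feel for what such selectors mean at the DOMDocument level, here is a minimal sketch that expresses three of them as XPath queries with DOMXPath. The XPath translations are hand-written assumptions for this example; the README does not say how find() resolves selectors internally.

```php
<?php
// Plain DOMDocument/DOMXPath sketch (not the Html_dom API).
$html = '<div class="promo big">A</div><div class="other">B</div>'
      . '<ul><li>one</li><li>two</li><li>three</li></ul>'
      . '<p id="login">form</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// 'div.promo': match "promo" as a whole word inside the class attribute.
$promo = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " promo ")]');

// 'ul li': every <li> that is a descendant of a <ul>.
$items = $xpath->query('//ul//li');

// '#login': the element whose id attribute equals "login".
$login = $xpath->query('//*[@id="login"]')->item(0);

// Equivalent of asking for index 2: the third match.
$third = $items->item(2);
```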

Retrieve data

Once we have an Html_dom_node or an Html_dom_node_collection, we can retrieve data from it.

$ul_content = $html_dom->find('ul', 0)->innertext; // content of first "<ul>" element
$li_content = $html_dom->find('ul li', 1)->innertext; // content of second "<li>" element
$attrValue = $html_dom->find('a', 0)->href; // value of "href" attribute
$attrValue = $html_dom->find('a', 0)->my_custom_attribute; // value of "my_custom_attribute" attribute (will work for any attribute)
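
For comparison, the sketch below reads the same kind of data with DOMDocument alone. The helper inner_html() is a hypothetical name introduced for this example; it approximates what the innertext property presumably returns, and the exact whitespace of serialized markup can vary between libxml versions.

```php
<?php
// Plain DOMDocument sketch (not the Html_dom API).
$html = '<ul><li>item 1</li><li>item 2</li></ul>'
      . '<a href="/home" my_custom_attribute="42">Home</a>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Hypothetical helper: serialize the children of a node, i.e. its inner HTML.
function inner_html(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$ul       = $doc->getElementsByTagName('ul')->item(0);
$secondLi = $doc->getElementsByTagName('li')->item(1);
$link     = $doc->getElementsByTagName('a')->item(0);

$ulContent = inner_html($ul);        // markup of the <li> children
$liContent = inner_html($secondLi);  // text of the second <li>
$href      = $link->getAttribute('href');
$custom    = $link->getAttribute('my_custom_attribute'); // any attribute name works
```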

Modify document

You can modify the content of an Html_dom_node or modify its attributes.

$html_dom->find('h1', 0)->innertext = 'New H1 title'; // replace H1 title
$html_dom->find('h1', 0)->innertext .= '!!!'; // append exclamation marks to the H1 title
$html_dom->find('.menu_item')->addClass('class_test'); // find all the elements with class "menu_item" and add the class "class_test"
$html_dom->find('.menu_item')->class = 'class_test'; // find all the elements with class "menu_item" and replace the class by "class_test"
$html_dom->find('ul li')->removeClass('menu_item'); // find all the "<li>" elements under "<ul>" and remove the class "menu_item"
$html_dom->find('ul li', 0)->hasClass('menu_item'); // find the first "<li>" element under "<ul>" and check whether it has the class "menu_item" (returns true or false)

// once you have made your modifications, don't forget to output the result
echo $html_dom->save();
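
The class helpers can be pictured as edits to the class attribute of a DOMElement. The sketch below is one possible way to do that with DOMDocument; class_list(), has_class(), add_class() and remove_class() are hypothetical functions written for this example and are not taken from the library's source.

```php
<?php
// Plain DOMDocument sketch (not the Html_dom API).
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<h1>Old title</h1><ul><li class="menu_item">a</li><li class="menu_item active">b</li></ul>');
libxml_clear_errors();

// Split the class attribute into a list of class names.
function class_list(DOMElement $el)
{
    return preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
}

function has_class(DOMElement $el, $name)
{
    return in_array($name, class_list($el), true);
}

function add_class(DOMElement $el, $name)
{
    if (!has_class($el, $name)) {
        $el->setAttribute('class', implode(' ', array_merge(class_list($el), array($name))));
    }
}

function remove_class(DOMElement $el, $name)
{
    $el->setAttribute('class', implode(' ', array_diff(class_list($el), array($name))));
}

// Replace the text of the first <h1>.
$h1 = $doc->getElementsByTagName('h1')->item(0);
$h1->nodeValue = 'New H1 title';

// Apply the class changes to every <li>.
$items = $doc->getElementsByTagName('li');
foreach ($items as $li) {
    add_class($li, 'class_test');
    remove_class($li, 'menu_item');
}
```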

API

Class Html_dom

Methods
loadHTML(string $str)
loadHTMLFile(string $file_path)
setBasicAuth(string $username, string $password)

Example:
$html_dom = new Html_dom();
$html_dom->setBasicAuth('username', 'secret_password');
$html_dom->loadHTMLFile('/path/to/file.html');
getElementById(string $elementId)
getElementsByTagName(string $tagName[, int $index])
save([string $file_path])
find(string $selector[, int $index])

Class Html_dom_node

The examples below assume the following setup:

$html_dom = file_get_html('index.html');
$html_dom_node = $html_dom->getElementById('content');

string getTag()

$html_dom_node->getTag();
OR
$html_dom_node->tag;

string getInnerText()

$html_dom_node->getInnerText();
OR
$html_dom_node->innertext;

string getOuterText()

$html_dom_node->getOuterText();
OR
$html_dom_node->outertext;

string getAttr(string $attributeName)

$html_dom_node->getAttr($attributeName);
OR
$html_dom_node->attribute_name;

Examples:
$html_dom_node->class;
$html_dom_node->id;
$html_dom_node->href;
$html_dom_node->title;
$html_dom_node->my_custom_attribute;

void setInnerText(string $value)

$html_dom_node->setInnerText($value);
OR
$html_dom_node->innertext = $value;

void setOuterText(string $value)

$html_dom_node->setOuterText($value);
OR
$html_dom_node->outertext = $value;

void append(string $value)

$html_dom_node->append($value);

void prepend(string $value)

$html_dom_node->prepend($value);

void addClass(string $class_name)

$html_dom_node->addClass($class_name);

void removeClass(string $class_name)

$html_dom_node->removeClass($class_name);

bool hasClass(string $class_name)

$html_dom_node->hasClass($class_name);

void setAttr(string $attributeName, string $value)

$html_dom_node->setAttr($attributeName, $value);
OR
$html_dom_node->attribute_name = $value;

Examples:
$html_dom_node->class = 'my_class';
$html_dom_node->id = 'element_id';
$html_dom_node->href = 'www.example.com';
$html_dom_node->title = 'My title';
$html_dom_node->my_custom_attribute = 'my_custom_value';

bool removeAttr(string $attributeName)

$html_dom_node->removeAttr($attributeName);

Html_dom_node first_child()

$firstChildElement = $html_dom_node->first_child();

Html_dom_node last_child()

$lastChildElement = $html_dom_node->last_child();

Html_dom_node previous_sibling()

$previousElement = $html_dom_node->previous_sibling();

Html_dom_node next_sibling()

$nextElement = $html_dom_node->next_sibling();

Html_dom_node_collection children()

$elementCollection = $html_dom_node->children();

Html_dom_node_collection siblings()

$elementCollection = $html_dom_node->siblings();

Html_dom_node parent()

$parentElement = $html_dom_node->parent();
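
The navigation methods above presumably return elements only. With a raw DOMDocument, firstChild and nextSibling can also return text nodes, so an element-only walk has to skip them. The sketch below shows one way to do that; skip_to_element() is a hypothetical helper written for this example.

```php
<?php
// Plain DOMDocument sketch (not the Html_dom API).
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<div>lead text<p>first</p>between<p>last</p></div>');
libxml_clear_errors();

// Hypothetical helper: starting at $node, follow $direction
// ('nextSibling' or 'previousSibling') until an element node is found.
function skip_to_element($node, $direction)
{
    while ($node !== null && $node->nodeType !== XML_ELEMENT_NODE) {
        $node = $node->$direction;
    }
    return $node; // null when no element exists in that direction
}

$div = $doc->getElementsByTagName('div')->item(0);

$first    = skip_to_element($div->firstChild, 'nextSibling');         // like first_child()
$last     = skip_to_element($div->lastChild, 'previousSibling');      // like last_child()
$next     = skip_to_element($first->nextSibling, 'nextSibling');      // like next_sibling()
$previous = skip_to_element($first->previousSibling, 'previousSibling'); // like previous_sibling()
$parent   = $first->parentNode;                                       // like parent()
```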

mixed find(string $selector[, int $index])

$elementCollection = $html_dom_node->find('li');
$element = $html_dom_node->find('li', 0);

Html_dom_node getElementById(string $elementId)

$element = $html_dom_node->getElementById('content');

mixed getElementsByTagName(string $tagName[, int $index])

$elementCollection = $html_dom_node->getElementsByTagName('li');
$element = $html_dom_node->getElementsByTagName('li', 0);

mixed remove()

$html_dom_node->remove();

void remove_childs()

$html_dom_node->remove_childs();

Class Html_dom_node_collection

This class extends ArrayObject, so all the methods available on ArrayObject can be used here. See the PHP ArrayObject documentation for the full list.

Here is a list of the most common methods you might need.

Methods

int count()

$html_dom_node_collection->count();

bool offsetExists(mixed $index)

$html_dom_node_collection->offsetExists($index);

Html_dom_node offsetGet(mixed $index)

$html_dom_node_collection->offsetGet($index);

void offsetSet(mixed $index, mixed $value)

$html_dom_node_collection->offsetSet($index, $value);

void offsetUnset(mixed $index)

$html_dom_node_collection->offsetUnset($index);
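
Because these methods are inherited from ArrayObject, their behaviour can be tried on a plain ArrayObject. The sketch below uses strings in place of Html_dom_node items, purely for illustration.

```php
<?php
// Plain ArrayObject sketch; strings stand in for Html_dom_node items.
$collection = new ArrayObject(array('first', 'second', 'third'));

$count     = $collection->count();         // number of items
$hasSecond = $collection->offsetExists(1); // is there an item at index 1?
$third     = $collection->offsetGet(2);    // item at index 2

$collection->offsetSet(3, 'fourth');       // store an item at index 3
$collection->offsetUnset(0);               // remove the item at index 0

// Array syntax and foreach work as well.
$alsoThird = $collection[2];
$remaining = array();
foreach ($collection as $index => $value) {
    $remaining[$index] = $value;
}
```

Removing an item does not renumber the remaining indexes, so index 2 still refers to the same item after offsetUnset(0).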

You can also iterate over the collection using the following methods:

seek()
$html_dom_node_collection->seek();
rewind()
$html_dom_node_collection->rewind();
next()
$html_dom_node_collection->next();
current() // returns the current Html_dom_node
$html_dom_node_collection->current();
valid() // returns a boolean
$html_dom_node_collection->valid();
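
On a plain ArrayObject, these cursor-style methods belong to the ArrayIterator returned by getIterator(). Whether Html_dom_node_collection also exposes them directly is not verified here, so the sketch below goes through the iterator, with strings standing in for nodes.

```php
<?php
// Plain ArrayObject/ArrayIterator sketch; strings stand in for Html_dom_node items.
$collection = new ArrayObject(array('a', 'b', 'c'));
$iterator = $collection->getIterator(); // an ArrayIterator

// Jump straight to a position.
$iterator->seek(2);
$atTwo = $iterator->current();

// Walk the whole collection from the start.
$seen = array();
$iterator->rewind();
while ($iterator->valid()) {
    $seen[] = $iterator->current();
    $iterator->next();
}
```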

You can also apply any of the Html_dom_node methods to all the items of an Html_dom_node_collection.

The examples below assume that we have loaded a document into $html_dom.

$html_dom->find('ul li')->addClass('li_class'); // Will add the class "li_class" to all the "<li>" items
$html_dom->find('ul li')->removeClass('li_class'); // Will remove the class "li_class" from all the "<li>" items
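
One way a collection can forward a method call to each of its items is PHP's __call magic method. The sketch below is an assumption about how such forwarding could be built, not the library's actual code; SketchCollection and SketchNode are hypothetical classes written for this example.

```php
<?php
// Hypothetical stand-in for a node: it only tracks a list of class names.
class SketchNode
{
    public $classes = array();

    public function addClass($name)
    {
        if (!in_array($name, $this->classes, true)) {
            $this->classes[] = $name;
        }
    }

    public function removeClass($name)
    {
        $this->classes = array_values(array_diff($this->classes, array($name)));
    }
}

// Hypothetical collection: any unknown method is forwarded to every item.
class SketchCollection extends ArrayObject
{
    public function __call($method, $arguments)
    {
        foreach ($this as $item) {
            call_user_func_array(array($item, $method), $arguments);
        }
        return $this; // allow chaining
    }
}

$collection = new SketchCollection(array(new SketchNode(), new SketchNode()));
$collection->addClass('li_class');                       // applied to both nodes
$collection->addClass('other')->removeClass('li_class'); // chained calls
```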
