The typical use of html5-php is to parse HTML5 into a DOM, or to serialize a DOM back into HTML5.
To create a new HTML5 parser, write:
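```php
// Composer autoload
require "vendor/autoload.php";

use Masterminds\HTML5;

// $options is an optional array of default options; an empty array keeps the defaults.
$options = array();
$html5 = new HTML5($options);
```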
There are three easy ways to parse HTML5: from a string, from a file, or as a fragment.
Parsing html5 strings
```php
// An example HTML document:
$html = <<<'HERE'
  <html>
  <head>
    <title>TEST</title>
  </head>
  <body id='foo'>
    <h1>Hello World</h1>
    <p>This is a test of the HTML5 parser.</p>
  </body>
  </html>
HERE;

// Parse the document. $dom is a DOMDocument.
$dom = $html5->loadHTML($html);
```
DOMDocument is the same class returned when parsing HTML4, XML, and XHTML with PHP's built-in, libxml-based tools.
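Because the result is an ordinary DOMDocument, code written against PHP's DOM API does not need to know which parser produced it. The sketch below illustrates this with a hypothetical helper, `headingTexts()`. To stay self-contained it builds the DOMDocument with PHP's built-in `DOMDocument::loadHTML()` rather than with html5-php; that substitution is an assumption made only for the demo.

```php
<?php
// Hypothetical helper: collects the text of every <h1>. It relies only on the
// standard DOMDocument API, so it accepts a DOMDocument from any parser.
function headingTexts(DOMDocument $dom): array
{
    $texts = array();
    foreach ($dom->getElementsByTagName('h1') as $h1) {
        $texts[] = trim($h1->textContent);
    }
    return $texts;
}

// For a self-contained demo, parse with PHP's built-in libxml HTML parser.
// With html5-php you would pass the result of $html5->loadHTML($html) instead.
$html = "<html><head><title>TEST</title></head><body id='foo'>"
    . "<h1>Hello World</h1><p>This is a test of the HTML5 parser.</p></body></html>";

$previous = libxml_use_internal_errors(true); // keep libxml warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$headings = headingTexts($dom);
$bodyId = $dom->getElementsByTagName('body')->item(0)->getAttribute('id');

print_r($headings);
echo $bodyId, PHP_EOL;
```

The helper never mentions a parser, which is the point: swapping libxml for html5-php changes only the line that produces `$dom`.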
Parsing html5 files
A file or resource can be parsed without first loading the markup into a string yourself.
```php
// Parse the document. $dom is a DOMDocument.
$dom = $html5->loadHTMLFile('path/to/file.html');
```
Parsing html5 fragments
```php
// An example HTML fragment:
$fragment = "<p>This is a test of the HTML5 parser.</p>";

// Parse the fragment. $dom is a DOMDocumentFragment.
$dom = $html5->loadHTMLFragment($fragment);
```
A DOMDocumentFragment is similar to a DOMDocument in that it is a container for nodes. A DOMDocumentFragment can be attached to a DOMDocument; when that happens, all of the fragment's children are moved into the DOMDocument and the fragment is left empty.
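The move semantics can be observed with PHP's standard DOM classes alone. This sketch builds the fragment with `DOMDocument::createDocumentFragment()` instead of html5-php's `loadHTMLFragment()`, so that the fragment and the target element belong to the same document; that substitution is an assumption made to keep the example self-contained.

```php
<?php
// Build a document with a <body> element using PHP's standard DOM API.
$doc = new DOMDocument();
$body = $doc->appendChild($doc->createElement('body'));

// Fill a fragment owned by the same document with two paragraphs.
$fragment = $doc->createDocumentFragment();
$fragment->appendChild($doc->createElement('p', 'First'));
$fragment->appendChild($doc->createElement('p', 'Second'));

$before = $fragment->childNodes->length; // children waiting in the fragment

// Appending the fragment moves its children, not the fragment itself.
$body->appendChild($fragment);

$after = $fragment->childNodes->length;    // the fragment is now empty
$bodyChildren = $body->childNodes->length; // both <p> elements now live in <body>

echo $doc->saveXML($body), PHP_EOL;
```

The fragment acts only as a temporary holder: after the append, the paragraphs are direct children of `<body>` and the fragment has nothing left in it.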
The serializer can write DOMDocuments and DOMDocumentFragments to strings and files.
Writing to a string
```php
// $dom is either a DOMDocument, DOMDocumentFragment, or DOMNodeList.
$string = $html5->saveHTML($dom);
```
Writing to a file
```php
// $dom is either a DOMDocument, DOMDocumentFragment, or DOMNodeList.
// save() writes the markup to the given file instead of returning a string.
$html5->save($dom, 'path/to/file.html');
```
HTML5 defines a long list of named character references (entities) that goes well beyond the typical use cases. It includes entities for characters such as periods and commas, among many others. An option controls whether to encode using the entire list or only the basic characters, as done by htmlspecialchars. By default, only the basic characters are encoded.
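PHP's own functions show the same distinction between basic and full encoding. In this sketch `htmlspecialchars()` stands in for the basic mode and `htmlentities()` with the `ENT_HTML5` flag stands in for full encoding; treating them as equivalent to html5-php's two modes is an assumption for illustration, not a description of the library's internals.

```php
<?php
$text = 'Café & <crème>';
$flags = ENT_QUOTES | ENT_HTML5;

// Basic encoding: only &, <, >, and quotes are replaced; accented letters pass through.
$basic = htmlspecialchars($text, $flags, 'UTF-8');

// Full encoding: characters with a named reference in the HTML5 table are replaced too.
$full = htmlentities($text, $flags, 'UTF-8');

echo $basic, PHP_EOL;
echo $full, PHP_EOL;
```

The basic form keeps the output short and readable; the full form trades length for pure-ASCII markup.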
To change the default value to encode all entities:
```php
$html5 = new HTML5(array('encode_entities' => TRUE));
```
To encode all entities at call time:
```php
// $dom is either a DOMDocument, DOMDocumentFragment, or DOMNodeList.
$string = $html5->saveHTML($dom, array('encode_entities' => TRUE));
```