Refactored HTML5 class & added HTML5 Helper with static methods #37
Merged
2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,100 +1,107 @@ | ||
<?php | ||
/** | ||
* The main HTML5 front end. | ||
*/ | ||
use HTML5\Parser\StringInputStream; | ||
|
||
use HTML5\Parser\FileInputStream; | ||
use HTML5\Parser\StringInputStream; | ||
use HTML5\Parser\DOMTreeBuilder; | ||
use HTML5\Parser\Scanner; | ||
use HTML5\Parser\Tokenizer; | ||
use HTML5\Parser\DOMTreeBuilder; | ||
use HTML5\Serializer\OutputRules; | ||
use HTML5\Serializer\Traverser; | ||
|
||
/** | ||
* This class offers convenience methods for parsing and serializing HTML5. | ||
* It is roughly designed to mirror the \DOMDocument class that is | ||
* It is roughly designed to mirror the \DOMDocument class that is | ||
* provided with most versions of PHP. | ||
* | ||
* EXPERIMENTAL. This may change or be completely replaced. | ||
*/ | ||
class HTML5 { | ||
|
||
class HTML5 | ||
{ | ||
/** | ||
* Global options for the parser and serializer. | ||
* @var array | ||
*/ | ||
public static $options = array( | ||
|
||
private $options = array( | ||
// If the serializer should encode all entities. | ||
'encode_entities' => FALSE, | ||
'encode_entities' => FALSE | ||
); | ||
|
||
private $errors = array(); | ||
|
||
public function __construct(array $options = array()) { | ||
$this->options = array_merge($this->options, $options); | ||
} | ||
/** | ||
* Get the default options. | ||
* | ||
* @return array | ||
* The default options. | ||
*/ | ||
public function getOptions() { | ||
return $this->options; | ||
} | ||
/** | ||
* Load and parse an HTML file. | ||
* | ||
* This will apply the HTML5 parser, which is tolerant of many | ||
* varieties of HTML, including XHTML 1, HTML 4, and well-formed HTML | ||
* 3. Note that in these cases, not all of the old data will be | ||
* This will apply the HTML5 parser, which is tolerant of many | ||
* varieties of HTML, including XHTML 1, HTML 4, and well-formed HTML | ||
* 3. Note that in these cases, not all of the old data will be | ||
* preserved. For example, XHTML's XML declaration will be removed. | ||
* | ||
* The rules governing parsing are set out in the HTML 5 spec. | ||
* | ||
* @param string $file | ||
* The path to the file to parse. If this is a resource, it is | ||
* assumed to be an open stream whose pointer is set to the first | ||
* The path to the file to parse. If this is a resource, it is | ||
* assumed to be an open stream whose pointer is set to the first | ||
* byte of input. | ||
* @return \DOMDocument | ||
* A DOM document. These object type is defined by the libxml | ||
* A DOM document. These object type is defined by the libxml | ||
* library, and should have been included with your version of PHP. | ||
*/ | ||
public static function load($file) { | ||
|
||
public function load($file) { | ||
// Handle the case where file is a resource. | ||
if (is_resource($file)) { | ||
// FIXME: We need a StreamInputStream class. | ||
return static::loadHTML(stream_get_contents($file)); | ||
return $this->loadHTML(stream_get_contents($file)); | ||
} | ||
|
||
$input = new FileInputStream($file); | ||
return static::parse($input); | ||
return $this->parse($input); | ||
} | ||
|
||
/** | ||
* Parse a HTML Document from a string. | ||
* | ||
* Take a string of HTML 5 (or earlier) and parse it into a | ||
* | ||
* Take a string of HTML 5 (or earlier) and parse it into a | ||
* DOMDocument. | ||
* | ||
* @param string $string | ||
* A html5 document as a string. | ||
* @return \DOMDocument | ||
* A DOM document. DOM is part of libxml, which is included with | ||
* A DOM document. DOM is part of libxml, which is included with | ||
* almost all distributions of PHP. | ||
*/ | ||
public static function loadHTML($string) { | ||
public function loadHTML($string) { | ||
$input = new StringInputStream($string); | ||
return static::parse($input); | ||
return $this->parse($input); | ||
} | ||
|
||
/** | ||
* Convenience function to load an HTML file. | ||
* | ||
* This is here to provide backwards compatibility with the | ||
* PHP DOM implementation. It simply calls load(). | ||
* | ||
* @param string $file | ||
[Review comment] You changed the param to $string
[Reply] fixed
||
* The path to the file to parse. If this is a resource, it is | ||
* assumed to be an open stream whose pointer is set to the first | ||
* The path to the file to parse. If this is a resource, it is | ||
* assumed to be an open stream whose pointer is set to the first | ||
* byte of input. | ||
* | ||
* @return \DOMDocument | ||
* A DOM document. These object type is defined by the libxml | ||
* A DOM document. These object type is defined by the libxml | ||
* library, and should have been included with your version of PHP. | ||
*/ | ||
public static function loadHTMLFile($file, $options = NULL) { | ||
return static::load($file, $options); | ||
public function loadHTMLFile($file) { | ||
return $this->load($file); | ||
} | ||
|
||
/** | ||
* Parse a HTML fragment from a string. | ||
* | ||
|
@@ -105,11 +112,62 @@ public static function loadHTMLFile($file, $options = NULL) { | |
* A DOM fragment. The DOM is part of libxml, which is included with | ||
* almost all distributions of PHP. | ||
*/ | ||
public static function loadHTMLFragment($string) { | ||
public function loadHTMLFragment($string) { | ||
$input = new StringInputStream($string); | ||
return static::parseFragment($input); | ||
return $this->parseFragment($input); | ||
} | ||
/** | ||
* Return all errors encountered during the parsing phase | ||
* @return array | ||
*/ | ||
public function getErrors() { | ||
return $this->errors; | ||
} | ||
/** | ||
* Return true if any errors were encountered during the parsing phase | ||
* @return bool | ||
*/ | ||
public function hasErrors() { | ||
return count($this->errors)>0; | ||
} | ||
|
||
/** | ||
* Parse an input stream. | ||
* | ||
* Lower-level loading function. This requires an input stream instead | ||
* of a string, file, or resource. | ||
*/ | ||
public function parse(\HTML5\Parser\InputStream $input) { | ||
$this->errors = array(); | ||
$events = new DOMTreeBuilder(); | ||
$scanner = new Scanner($input); | ||
$parser = new Tokenizer($scanner, $events); | ||
|
||
$parser->parse(); | ||
|
||
$document = $events->document(); | ||
|
||
if($document){ | ||
$this->errors = $document->errors; | ||
} | ||
|
||
return $document; | ||
} | ||
/** | ||
* Parse an input stream where the stream is a fragment. | ||
* | ||
* Lower-level loading function. This requires an input stream instead | ||
* of a string, file, or resource. | ||
*/ | ||
public function parseFragment(\HTML5\Parser\InputStream $input) { | ||
$events = new DOMTreeBuilder(TRUE); | ||
$scanner = new Scanner($input); | ||
$parser = new Tokenizer($scanner, $events); | ||
|
||
$parser->parse(); | ||
|
||
return $events->fragment(); | ||
} | ||
/** | ||
* Save a DOM into a given file as HTML5. | ||
* | ||
|
@@ -120,19 +178,19 @@ public static function loadHTMLFragment($string) { | |
* @param array $options | ||
* Configuration options when serializing the DOM. These include: | ||
* - encode_entities: Text written to the output is escaped by default and not all | ||
* entities are encoded. If this is set to TRUE all entities will be encoded. | ||
* Defaults to FALSE. | ||
* entities are encoded. If this is set to TRUE all entities will be encoded. | ||
* Defaults to FALSE. | ||
*/ | ||
public static function save($dom, $file, $options = array()) { | ||
$options = $options + static::options(); | ||
public function save($dom, $file, $options = array()) { | ||
$close = TRUE; | ||
if (is_resource($file)) { | ||
$stream = $file; | ||
$close = FALSE; | ||
} | ||
} | ||
else { | ||
$stream = fopen($file, 'w'); | ||
} | ||
$options = array_merge($this->getOptions(), $options); | ||
$rules = new OutputRules($stream, $options); | ||
$trav = new Traverser($dom, $stream, $rules, $options); | ||
|
||
|
@@ -142,7 +200,6 @@ public static function save($dom, $file, $options = array()) { | |
fclose($stream); | ||
} | ||
} | ||
|
||
/** | ||
* Convert a DOM into an HTML5 string. | ||
* | ||
|
@@ -151,70 +208,15 @@ public static function save($dom, $file, $options = array()) { | |
* @param array $options | ||
* Configuration options when serializing the DOM. These include: | ||
* - encode_entities: Text written to the output is escaped by default and not all | ||
* entities are encoded. If this is set to TRUE all entities will be encoded. | ||
* Defaults to FALSE. | ||
* entities are encoded. If this is set to TRUE all entities will be encoded. | ||
* Defaults to FALSE. | ||
* | ||
* @return string | ||
* A HTML5 documented generated from the DOM. | ||
*/ | ||
public static function saveHTML($dom, $options = array()) { | ||
public function saveHTML($dom, $options = array()) { | ||
$stream = fopen('php://temp', 'w'); | ||
static::save($dom, $stream, $options); | ||
return stream_get_contents($stream, -1, 0); | ||
} | ||
|
||
/** | ||
* Parse an input stream. | ||
* | ||
* Lower-level loading function. This requires an input stream instead | ||
* of a string, file, or resource. | ||
*/ | ||
public static function parse(\HTML5\Parser\InputStream $input) { | ||
$events = new DOMTreeBuilder(); | ||
$scanner = new Scanner($input); | ||
$parser = new Tokenizer($scanner, $events); | ||
|
||
$parser->parse(); | ||
|
||
return $events->document(); | ||
} | ||
|
||
/** | ||
* Parse an input stream where the stream is a fragment. | ||
* | ||
* Lower-level loading function. This requires an input stream instead | ||
* of a string, file, or resource. | ||
*/ | ||
public static function parseFragment(\HTML5\Parser\InputStream $input) { | ||
$events = new DOMTreeBuilder(TRUE); | ||
$scanner = new Scanner($input); | ||
$parser = new Tokenizer($scanner, $events); | ||
|
||
$parser->parse(); | ||
|
||
return $events->fragment(); | ||
$this->save($dom, $stream, array_merge($this->getOptions(), $options)); | ||
return stream_get_contents($stream, - 1, 0); | ||
} | ||
|
||
/** | ||
* Get the default options. | ||
* | ||
* @return array | ||
* The default options. | ||
*/ | ||
public static function options() { | ||
return static::$options; | ||
} | ||
|
||
/** | ||
* Set a default option. | ||
* | ||
* @param string $name | ||
* The option name. | ||
* @param mixed $value | ||
* The option value. | ||
*/ | ||
public static function setOption($name, $value) { | ||
static::$options[$name] = $value; | ||
} | ||
|
||
} |
Is there a reason $options is private rather than protected? Isn't protected more appropriate?

protected amounts to a "hidden" dependency. Once someone extends the HTML5 class, we can't change its internal implementation.

Because the visibility is private (only the class that defines the property can access it), methods that use it can't be overridden and still have access to the options (including the constructor). An extending class can't change the defaults.

I can replace all uses of $this->options with $this->getOptions(). Could that be a solution? A subclass could then override the getOptions() method to change the default values.
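
The proposal in the last comment can be sketched as follows. This is a minimal, self-contained illustration and not the code merged in this pull request: OptionsHolder, EncodingOptionsHolder, and effectiveOptions() are hypothetical names standing in for the HTML5 class and for the merge step that save() performs.

```php
<?php
// Hypothetical stand-in for the HTML5 class; only the option handling is
// reproduced here, not the parser or the serializer.
class OptionsHolder
{
    // Private, as in the diff: subclasses cannot read or write it directly.
    private $options = array(
        'encode_entities' => false,
    );

    public function __construct(array $options = array())
    {
        $this->options = array_merge($this->options, $options);
    }

    public function getOptions()
    {
        return $this->options;
    }

    // Mirrors the merge in save(): per-call options are layered over the
    // defaults, which are read through getOptions() instead of the property.
    public function effectiveOptions(array $options = array())
    {
        return array_merge($this->getOptions(), $options);
    }
}

// Hypothetical subclass that changes a default by overriding getOptions().
class EncodingOptionsHolder extends OptionsHolder
{
    public function getOptions()
    {
        return array_merge(parent::getOptions(), array('encode_entities' => true));
    }
}

$base = new OptionsHolder();
$custom = new EncodingOptionsHolder();

// The base class keeps its default; the subclass reports the overridden one.
var_dump($base->effectiveOptions());
var_dump($custom->effectiveOptions());
// Options passed per call still take precedence over the subclass default.
var_dump($custom->effectiveOptions(array('encode_entities' => false)));
```

One caveat with this sketch: because the parent's array always contains the key, the override also replaces a value passed to the constructor. Only per-call options still win, so a subclass that must respect constructor arguments would need a different hook.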