Adding two options required for compatibility with 3rd party components #69
New options:
* implicitHtmlNamespace = Allows the use of createElement instead of createElementNS for HTML elements. This is required for compatibility with PHPPowertools/DOM-Query and for compatibility with symfony/CssSelector.
* target = Allows an existing DOMDocument (or subclass thereof) to be passed to the DOMTreeBuilder instead of creating a new one. This option is required for compatibility with PHPPowertools/DOM-Query.
If option implicitHtmlNamespace is undefined, the following error is generated : *Undefined index: implicitHtmlNamespace* This patch fixes that error.
Changing version number of composer file to avoid potential version conflicts when using third party code that needs the two new options
The conversation about adding namespaces to the elements happened in #43, #44, and #45. This follows the HTML5 spec more closely. So, I'm not sure about the
Putting a DOMDocument (or your subclass of it) in $options['target'] feels a bit weird to me. But the idea of being able to pass in an existing DOMDocument is interesting. Are there any potential pitfalls?
'target' does not explain what the option is for, but I'm not sure what else to recommend instead of it.
Without saying "This is required for compatibility with [...]", can you explain why this option would be generally helpful? Where else could it be used?
In any case, I suspect that if either of these gets in, they'll need to be in separate PRs so that we can discuss each one individually.
@cognifloyd & @stof :
To play well with
Old code :
if (isset($this->nsStack[0][$prefix])) {
    $ele = $this->doc->createElementNS($this->nsStack[0][$prefix], $lname);
} else {
    $ele = $this->doc->createElement($lname);
}
New code :
if (!isset($this->nsStack[0][$prefix]) || ($prefix === "" && $this->options['implicitHtmlNamespace'])) {
    $ele = $this->doc->createElement($lname);
} else {
    $ele = $this->doc->createElementNS($this->nsStack[0][$prefix], $lname);
}
To work well with
Old code :
$impl = new \DOMImplementation();
$dt = $impl->createDocumentType('html');
$this->doc = $impl->createDocument(null, null, $dt);
New code :
if (isset($options['target'])) {
    $this->doc = $options['target'];
} else {
    $impl = new \DOMImplementation();
    $dt = $impl->createDocumentType('html');
    $this->doc = $impl->createDocument(null, null, $dt);
}
Both changes have been submitted in this patch. They should have no impact on any existing codebase, and both changes passed the Travis build.
For me, the second change is essential, as I need the result of any HTML parsing taking place to be an existing instance of class This class also implements the
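The idea behind the `target` option can be tried in isolation with nothing but PHP's DOM extension. The sketch below is not the library's code: `QueryableDocument` and `buildInto()` are hypothetical names, and the function only mimics the new branch that reuses a caller-supplied document instead of creating one.

```php
<?php
// Hypothetical DOMDocument subclass standing in for a third-party document class.
class QueryableDocument extends DOMDocument
{
    // Example of extra behaviour a subclass might add.
    public function countElements($name)
    {
        return $this->getElementsByTagName($name)->length;
    }
}

// Hypothetical helper mirroring the patched branch: reuse the supplied
// document if there is one, otherwise create a fresh HTML document.
function buildInto(array $options = array())
{
    if (isset($options['target'])) {
        $doc = $options['target'];
    } else {
        $impl = new DOMImplementation();
        $dt = $impl->createDocumentType('html');
        // An empty qualified name creates a document without a root element.
        $doc = $impl->createDocument(null, '', $dt);
    }

    // Stand-in for the real tree building: append a tiny tree.
    $html = $doc->createElement('html');
    $html->appendChild($doc->createElement('body'));
    $doc->appendChild($html);

    return $doc;
}

$mine = new QueryableDocument();
$result = buildInto(array('target' => $mine));
```

Because the caller's object is filled in place, `$result` and `$mine` are the same instance, so any methods the subclass adds remain available on the parsed document.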
An alternative to the first change to If I know that whatever HTML5 code I'm parsing doesn't contain SVG, MathML or other tags that aren't HTML tags, I don't need namespaces and I don't want them.
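The practical effect of namespaced elements shows up in XPath, which Symfony's CssSelector component uses as its output format: an unprefixed name test such as `//div` only matches elements that are in no namespace. A minimal sketch using only PHP's DOM extension (the helper name `countMatches` and the prefix `h` are made up for illustration):

```php
<?php
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Count the nodes matched by an XPath expression, optionally binding
// the prefix "h" to the XHTML namespace first.
function countMatches(DOMDocument $doc, $expression, $registerPrefix = false)
{
    $xpath = new DOMXPath($doc);
    if ($registerPrefix) {
        $xpath->registerNamespace('h', XHTML_NS);
    }
    return $xpath->query($expression)->length;
}

// Document whose root element was created without a namespace.
$plain = new DOMDocument();
$plain->appendChild($plain->createElement('div'));

// Document whose root element lives in the XHTML namespace.
$namespaced = new DOMDocument();
$namespaced->appendChild($namespaced->createElementNS(XHTML_NS, 'div'));

$plainHits = countMatches($plain, '//div');                   // 1
$nsHits = countMatches($namespaced, '//div');                 // 0
$nsPrefixedHits = countMatches($namespaced, '//h:div', true); // 1
```

Tools that generate unprefixed XPath therefore find nothing in a fully namespaced tree unless they register a prefix and rewrite their expressions.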
AFAIK, there are no pitfalls. The behavior of the library is the same, except that you don't need to return your result. This is "oldskool" PHP programming, really. I can't say I generally support this technique, but for this use case I can't find an alternative implementation that's even remotely as efficient.
It can be used in any library that intends to subclass the In such a case, you ALWAYS want the very instance that implements the Other use cases would be eg. libraries that implement additional interfaces to
I need these features and I need them now. Do we really need all that red tape for such simple yet powerful changes after they already passed the Travis build? |
I'm okay with this patch. We knew from the outset that namespaces in HTML5 are hackish, and that different tools would implement HTML5 DOM traversal differently. So I think this patch is just fine. It does not break backward compatibility, and it broadens HTML5-PHP's compatibility with other tools. I'll leave it to @goetas or @mattfarina to agree or disagree. |
Feel free to fork the project. BTW, I'm ok with the
I thought about targetDocument as well. I think simply calling it "target" didn't work for me, because at first I connected it with "target" from html (which frame or new window to open the link in). 'targetDocument' avoids that bit of confusion, I think. So, I'm okay with targetDocument if there's nothing better.
That feels better to me. Perhaps a little more verbose would be good: |
Those names sound better to me. What places should we document these two new options? |
well! The documentation about this can be somewhere here https://github.com/Masterminds/html5-php/wiki/Basic-Usage#instantiating and https://github.com/Masterminds/html5-php#xml-namespaces |
Changing names of the two new options
@goetas :
I did. Hence, this pull request.
Done!
Done! (changed it to
I have no experience writing unit tests, so I don't feel comfortable setting them up for someone else's project. Would it be possible any of you write the tests after accepting this pull request that contain the modified code? |
Can anyone already merge this pull request? |
Where are the 2 new options documented? How feasible would it be to do integration tests? |
What exactly would you need to test? The changes to the code are pretty straightforward and very minimal.
See below.
Documentation
Overview
Backwards compatibility
Existing projects using html5-php will not be impacted by this patch. In the absence of the new options, the behavior of the library is identical to the behavior before the patch.
Altered code
These options are implemented only in
Changes for the implicitHtmlNamespace option :
Old code :
if (isset($this->nsStack[0][$prefix])) {
    $ele = $this->doc->createElementNS($this->nsStack[0][$prefix], $lname);
} else {
    $ele = $this->doc->createElement($lname);
}
New code :
if (!isset($this->nsStack[0][$prefix]) || ($prefix === "" && $this->options['implicitHtmlNamespace'])) {
    $ele = $this->doc->createElement($lname);
} else {
    $ele = $this->doc->createElementNS($this->nsStack[0][$prefix], $lname);
}
Changes for the targetDocument option :
Old code :
$impl = new \DOMImplementation();
$dt = $impl->createDocumentType('html');
$this->doc = $impl->createDocument(null, null, $dt);
New code :
if (isset($options['targetDocument'])) {
    $this->doc = $options['targetDocument'];
} else {
    $impl = new \DOMImplementation();
    $dt = $impl->createDocumentType('html');
    $this->doc = $impl->createDocument(null, null, $dt);
}
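How the two element-creation branches differ can be checked without the parser. The sketch below copies the shape of the new condition into a hypothetical free function, `createElementFor()`, with `$nsStack` and `$options` passed in as plain arrays. It guards the option lookup with `!empty()` so that a missing key does not raise an "Undefined index" notice; that guard and the contents of `$nsStack` are assumptions of the sketch, not quotes from the patch.

```php
<?php
// Hypothetical stand-alone version of the element-creation branch.
function createElementFor(DOMDocument $doc, array $nsStack, array $options, $prefix, $lname)
{
    // Treat a missing option the same as false.
    $implicit = !empty($options['implicitHtmlNamespace']);

    if (!isset($nsStack[0][$prefix]) || ($prefix === '' && $implicit)) {
        // No namespace known for the prefix, or HTML namespace left implicit.
        return $doc->createElement($lname);
    }

    return $doc->createElementNS($nsStack[0][$prefix], $lname);
}

$doc = new DOMDocument();
// Assumed layout: the empty prefix maps to the XHTML namespace.
$nsStack = array(array('' => 'http://www.w3.org/1999/xhtml'));

$default = createElementFor($doc, $nsStack, array(), '', 'p');
$implicit = createElementFor($doc, $nsStack, array('implicitHtmlNamespace' => true), '', 'p');
```

With no options set, the element carries the XHTML namespace as before; with the option enabled, its `namespaceURI` is `null`.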
I was thinking that if we plan to add this feature, it would be nice to have an
public function loadHTMLFile($file, array $options = array())
@mattfarina Is it a breaking change? Do we need a
@goetas Sorry for the slow response. Now that we're entering the holiday season here I've been spending a bit of time offline and just saw this. Being that this is an addition and not a backwards incompatible change we should be fine with a 2.1.0 release. |
@mattfarina Thanks! I was planning to add something like this... With this option it should be possible to "customize" the "reading" process somehow. |
Any progress? What's preventing this pull request from being merged? |
@goetas :
Any template for writing them? I have no experience writing unit tests, but I could look into it if I know what to do. Can't this be added in a new, subsequent pull request?
Seems like a simple copy-paste to me. Let me know exactly which content you want me to add where and I'll take care of it. Can't this be added in a new, subsequent pull request?
That feature is unrelated to the implementation of the two options I added. It makes no sense to me to include them in the same pull request. Also, this feature doesn't really make much sense to me. Why not just add a
Proposed implementation :
public function __construct(array $options = array())
{
    $this->setOptions($options);
}

public function setOptions(array $options = array())
{
    $this->options = array_merge($this->options, $options);
    return $this;
}

public function setOption($key, $value)
{
    $this->options[$key] = $value;
    return $this;
}

public function getOptions()
{
    return $this->options;
}

public function getOption($key)
{
    return (isset($this->options[$key]) ? $this->options[$key] : false);
}
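To see how the proposed setters chain, the methods can be dropped into a small stand-in class. `OptionsHolder` and its default option are hypothetical; in the proposal the methods would live on the library's own class.

```php
<?php
// Hypothetical stand-in class carrying the proposed option methods.
class OptionsHolder
{
    // Assumed default, for illustration only.
    protected $options = array('implicitHtmlNamespace' => false);

    public function __construct(array $options = array())
    {
        $this->setOptions($options);
    }

    public function setOptions(array $options = array())
    {
        // Later values override earlier ones with the same key.
        $this->options = array_merge($this->options, $options);
        return $this;
    }

    public function setOption($key, $value)
    {
        $this->options[$key] = $value;
        return $this;
    }

    public function getOptions()
    {
        return $this->options;
    }

    public function getOption($key)
    {
        return (isset($this->options[$key]) ? $this->options[$key] : false);
    }
}

// Each setter returns $this, so calls can be chained.
$holder = new OptionsHolder(array('targetDocument' => 'placeholder'));
$holder->setOption('implicitHtmlNamespace', true)
       ->setOptions(array('extra' => 1));
```

One consequence of this `getOption()` is that an unset option and an option explicitly set to `false` or `null` are indistinguishable to the caller.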
Huh? What does that mean? |
@jslegers having 3-4 pull requests for one piece of functionality is a bad idea |
@goetas :
It's been a month now since I created this pull request. This shouldn't be taking so long. |
The idea with squashing the pull request into one commit is so that the entire feature is rolled into one commit instead of spread across many. Maybe we can relax this restraint for this issue? (I'll defer to @goetas on that) We're not trying to be annoying about these things. But we're all very busy (as I'm sure you are too), and we've committed to maintain a very high standard of code for this project. I hope it's not coming across as arrogance or anything. |
Not sure where to begin. Again, I have no experience with unit tests, so it's going to take some time figuring out what to do... whereas I bet it's only 5 to 10 minutes work for you guys...
Which segments? Where exactly? Could anyone provide the full text for the readme file, that I can just copy-paste? An alternative option would be to submit a pull request to my fork.
A If you want to set one or more options with every call of your loadHTMLFile method without writing extra lines of code, chaining your methods like
Are there any downsides to using multiple commits? Also, how can I merge multiple commits? My
I'm all in favor of high standards, but this much red tape for (1) tiny changes that have (2) passed the Travis build and are (3) 100% backwards-compatible is quite a turn-off for me. This, especially because my Dom-Query library depends on these changes. Blocking this pull request prevents people from using that library in projects that rely on Composer for dependency management. Yet, the only reason for blocking the request seems to be the lack of documentation or unit tests.
If high quality is so important, IMO it's better to focus on actually implementing new features that improve the performance or flexibility of the library (eg. a new implementation of This level of red tape is one of the reasons I'm reluctant to join existing open source projects. It's one of the reasons I prefer building my own libraries or frameworks instead. It's also one of the reasons I prefer working as the only programmer in a small company rather than as one of many programmers in a large company.
While it may be intended to improve the quality of a project's code, all this administration stifles innovation on so many levels that it has mostly an adverse effect. |
…html5-php into feature/domquery-options
I'm disappointed in the direction this thread has gone. I just pulled this into a feature branch (
I've decided not to add a special I'm also trying to normalize option names across the parser. Some are snake case and some are camel case. I'm sticking to snake for new ones, and adding duplicate camel/snake support for old ones. |
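Accepting both spellings of an option name could be done by normalizing keys before they are read. The sketch below is one possible approach, not what the branch actually does; `camelToSnake()` and `normalizeOptionKeys()` are hypothetical helpers, and the option names are only examples.

```php
<?php
// Convert a camelCase key to snake_case; snake_case keys pass through unchanged.
function camelToSnake($key)
{
    // Insert an underscore before every capital that is not the first character.
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $key));
}

// Return a copy of the options with every key in snake_case.
function normalizeOptionKeys(array $options)
{
    $normalized = array();
    foreach ($options as $key => $value) {
        $normalized[camelToSnake($key)] = $value;
    }
    return $normalized;
}

$options = normalizeOptionKeys(array(
    'targetDocument' => 'doc',
    'implicit_html_namespace' => true,
));
```

If a caller passes both spellings of the same option, the one that appears later in the array wins, which may or may not be the desired rule.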
@technosophos most probably I can take a look at it next week (I've finished moving house)... let me try :) |
IMHO, if you want to become a good developer, I would suggest that you learn to write tests and documentation and to collaborate (instead of reinventing the wheel by yourself). |
Unit tests are useful for projects that have multiple programmers working on them at the same time or if you don't really understand enough of the code you're working on to understand the impact of your changes. In the company I work for, I do mostly custom dev work built from scratch and the only other technical guy rarely goes beyond adding and configuring modules in Drupal, so there's no real need for unit testing... nor is there time for it. I might add unit tests to some of my larger open source projects in the future, but as long as they are one-man projects, it doesn't really matter.
For my CSS framework, I built a documentation site with demos of all major components, which I maintain on my own in my spare time. I'm planning to do the same for my PHP framework once I released more components. Until then, I prefer to spend whatever time I have for open source work on improving my existing codebase and writing new components for my PHP framework.
Before I start working on a project, I always look for existing libraries or frameworks to take care of the heavy lifting. Most of the time, though, I end up writing my own libraries/framework or writing my own customized version of a library/framework, because other projects rarely fit my needs.
I got a little bit done and through legal review. Still need tests, and I think there may be an option or two that I missed. (@goetas -- you should probably add your copyright info on the license. I had to update for Google stuff, and noticed that your name is missing.) |
@technosophos did you see my html-parsing-options branch and #72 PR? Are we working on the same thing? :-D |
I think that #72 is not related to this... it is a generic feature. Your work is related to the html-parsing-options branch, and we can merge it somehow, or you can copy-paste from my code (the test suite parts) |
I added two new options.
These options are implemented only in \Masterminds\HTML5\Parser\DOMTreeBuilder and have the following purpose :
* implicitHtmlNamespace = Allows the use of createElement instead of createElementNS for HTML elements. This is required for compatibility with \PHPPowertools\DOM-Query and for compatibility with \Symfony\Component\CssSelector\CssSelector.
* target = Allows an existing DOMDocument (or subclass thereof) to be passed to the DOMTreeBuilder instead of creating a new one. This option is required for compatibility with \PHPPowertools\DOM-Query