Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding two options required for compatibility with 3rd party components #69

Closed
wants to merge 5 commits into from
Closed

Conversation

jslegers
Copy link
Contributor

@jslegers jslegers commented Dec 4, 2014

I added two new options.

These options are implemented only in \Masterminds\HTML5\Parser\DOMTreeBuilder and have the following purpose :

New Options :

* implicitHtmlNamespace = Allows the use of createElement instead of
createElementNS for HTML elements. This is required for compatibility
with PHPPowertools/DOM-Query and for compatibility with
symfony/CssSelector.

* target = allows an existing DOMDocument (or subclass thereof) to be
passsed to the DOMTreeBuilder instead of creating a new one. This
option is required for compatibility with PHPPowertools/DOM-Query
If option implicitHtmlNamespace is undefined, the following error is
generated :
*Undefined index: implicitHtmlNamespace*

This patch fixes that error.
Changing version number of composer file to avoid potential version
conflicts when using third party code that needs the two new options
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.15%) when pulling 0882434 on PHPPowertools:master into b39dd88 on Masterminds:master.

@cognifloyd
Copy link
Contributor

The conversation about adding namespaces to the elements happened in #43, #44, and #45. This follows the HTML5 spec more closely. So, I'm not sure about the implicitHtmlNamespace option. The best would be to get Symfony\CssSelector to support generating xpaths with proper xml namespaces. See more specifically: #45 (comment) and #45 (comment)

Putting a DOMDocument (or your subclass of it) in $options['target'] feels a bit weird to me. But, the idea of being able to pass in an existing DOMDocument is interesting. Are there any potential pitfalls? 'target' does not explain what the option is for, but I'm not sure what else to recommend instead of it. Without saying "This is required for compatibility with [...]" can you explain why this option would be generally helpful? Where else could it be used?

In any case, I suspect that if either of these get in, they'll need to be in separate PRs so that we can discuss each one individually

@jslegers
Copy link
Contributor Author

jslegers commented Dec 5, 2014

@cognifloyd & @stof :

To play well with symfony/CssSelector in its current state, Mastermind/html5-php needs but a single and tiny change :

Old code :
if (isset($this->nsStack[0][$prefix])) {
    $ele = $this->doc->createElementNS($this->nsStack[0][$prefix], $lname);
} else {
    $ele = $this->doc->createElement($lname);
}
New code :
if (!isset($this->nsStack[0][$prefix]) || ($prefix === "" && $this->options['implicitHtmlNamespace'])) {
    $ele = $this->doc->createElement($lname);
} else {
    $ele = $this->doc->createElementNS($this->nsStack[0][$prefix], $lname);
}

To work well with PHPPowertools/DOM-Query, it needs one more change :

Old code :
$impl = new \DOMImplementation();
$dt = $impl->createDocumentType('html');
$this->doc = $impl->createDocument(null, null, $dt);
New code :
if (isset($options['target'])) {
    $this->doc = $options['target'];
} else {
    $impl = new \DOMImplementation();
    $dt = $impl->createDocumentType('html');
    $this->doc = $impl->createDocument(null, null, $dt);
}

Both changes have been submitted in this patch. They should have no impact on any existing codebase and both changes passed the Travis build.

For me, the second change is essential, as I need the result of any HTML parsing taking place to be an existing instance of class \PowerTools\DOM_Document.

This class also implements the querySelector and querySelectorAll methods, that are used by \PowerTools\DOM_Query for selecting DOM elements, and which makes use of symfony/CssSelector under the hood. Alternative implementations would have major impacts on flexibility, syntax, robustness and/or performance that are simply unacceptable for a DOM crawling library intended to mimic the behavior of JavaScript (and especially jQuery) behavior as closely as both possible and reasonable.

The best would be to get Symfony\CssSelector to support generating xpaths with proper xml namespaces.

An alternative to the first change to Mastermind/html5-php would be that Symfony adds support for HTML namespaces, but I can't rely on that for the moment. Also, I do believe that developers should have the freedom to decide for themselves if they want namespaced elements.

If I know that whatever HTML5 code I'm parsing doesn't contain SVG, MathML or other tags that aren't HTML tags, I don't need namespaces and I don't want them.

Putting a DOMDocument (or your subclass of it) in $options['target'] feels a bit weird to me. But, the idea of being able to pass in an existing DOMDocument is interesting. [...] Are there any potential pitfalls?

AFAIK, there are no pitfalls. The behavior of the library is the same, except that you don't need to return your result.

This is "oldskool" PHP programming, really. I can't say I generally support this technique, but for this use case I can't find an alternative implementation that's even remotely as efficient.

Without saying "This is required for compatibility with [...]" can you explain why this option would be generally helpful? Where else could it be used?

It can be used in any library that intends to subclasses the \DOMDocument with the purpose of reïmplementing the loadHTML, loadHTMLFile, saveHTML, saveHTMLFile, save and/or load methods to add support for HTML5.

In such a case, you ALWAYS want the very instance that implements the loadHTML to also contain the results of the parsing. Cloning nodes would be an alternate implementation, but it would be a very expensive one that's totally unacceptable for pretty much any use case.

Other use cases would be eg. libraries that implement additional interfaces to \DOMDocument for easier iteration, serialization, etc. For example, consider goetas/xmldom. By your very own @goetas.

In any case, I suspect that if either of these get in, they'll need to be in separate PRs so that we can discuss each one individually

I need these features and I need them now. Do we really need all that red tape for such simple yet powerful changes after they already passed the Travis build?

@technosophos
Copy link
Member

I'm okay with this patch. We knew from the outset that namespaces in HTML5 are hackish, and that different tools would implement HTML5 DOM traversal differently. So I think this patch is just fine. It does not break backward compatibility, and it broadens HTML5-PHP's compatibility with other tools.

I'll leave it to @goetas or @mattfarina to agree or disagree.

@goetas
Copy link
Member

goetas commented Dec 6, 2014

I need these features and I need them now.

Feel free to fork the project.

BTW, I'm ok with the target option. I'm also ok with the implicitHtmlNamespace feature. I have just some dubts about the implementation.

  • Regarding the target option:
  • Regarding implicitHtmlNamespace:
    • can you call it disableHtmlNamespace? ( @cognifloyd @technosophos Ideas about naming? )
    • can you add some tests for this?

@cognifloyd
Copy link
Contributor

I thought about targetDocument as well. I think simply calling it "target" didn't work for me, because at first I connected it with "target" from html (which frame or new window to open the link in). 'targetDocument' avoids that bit of confusion, I think. So, I'm okay with targetDocument if there's nothing better.

  • Regarding implicitHtmlNamespace:

That feels better to me. Perhaps a little more verbose would be good: disableHtmlNamespaceInDom or disableHtmlNamespaceInTargetDom ...?

@technosophos
Copy link
Member

Those names sound better to me. disableHtmlNamespaceInDom sounds pretty clear. I'm okay with abbreviating Namespace to NS (as most XML stuff, including DOM, does a lot). So we could do disableHtmlNsInDom. Not quite as clear... but shorter.

What places should we document these two new options?

@goetas
Copy link
Member

goetas commented Dec 7, 2014

well! disableHtmlNsInDom and targetDocument will work fine!

The documentation about this can be somewhere here https://github.com/Masterminds/html5-php/wiki/Basic-Usage#instantiating and https://github.com/Masterminds/html5-php#xml-namespaces

Changing names of the two new options
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.15%) when pulling 7481267 on PHPPowertools:master into b39dd88 on Masterminds:master.

@jslegers
Copy link
Contributor Author

jslegers commented Dec 8, 2014

@goetas :

Feel free to fork the project.

I did. Hence, this pull request.

Can you call it targetDocument? ( @cognifloyd @technosophos Ideas about naming? )

Done!

can you call it disableHtmlNamespace? ( @cognifloyd @technosophos Ideas about naming? )

Done! (changed it to disableHtmlNsInDom)

can you add some tests for this?

I have no experience writing unit tests, so I don't feel comfortable setting them up for someone else's project.

Would it be possible any of you write the tests after accepting this pull request that contain the modified code?

@jslegers
Copy link
Contributor Author

Can anyone already merge this pull request?

@mattfarina
Copy link
Member

Where are the 2 new options documented? How feasible would it be to do integration tests?

@jslegers
Copy link
Contributor Author

@mattfarina :

How feasible would it be to do integration tests?

What exactly would you need to test? The changes to the code are pretty straightforward and very minimal.

Where are the 2 new options documented?

See below.

Documentation

Overview

  • implicitHtmlNamespace :
    • If set and true, HTML namespaces are not explicitly defined when parsing HTML files.
    • If not set or false, HTML namespaces are explicitly defined when parsing HTML files.
  • targetDocument :
    • If set, it will be used to store the DOM resulting from parsing an HTML string. It can be a DOMDocument or subclass thereof.
    • If not set, a new DOMDocument will be created to store the DOM resulting from parsing an HTML string.

Backwards compatibility

Existing projects using html5-Php will not be impacted by this patch. In absence of the new options, the behavior of the library is identical to the behavior before the patch.


Altered code

These options are implemented only in \Masterminds\HTML5\Parser\DOMTreeBuilder.

Changes for the implicitHtmlNamespace option :

Old code :
if (isset($this->nsStack[0][$prefix])) {
    $ele = $this->doc->createElementNS($this->nsStack[0][$prefix], $lname);
} else {
    $ele = $this->doc->createElement($lname);
}
New code :
if (!isset($this->nsStack[0][$prefix]) || ($prefix === "" && $this->options['implicitHtmlNamespace'])) {
    $ele = $this->doc->createElement($lname);
} else {
    $ele = $this->doc->createElementNS($this->nsStack[0][$prefix], $lname);
}

Changes for the targetDocument option :

Old code :
$impl = new \DOMImplementation();
$dt = $impl->createDocumentType('html');
$this->doc = $impl->createDocument(null, null, $dt);
New code :
if (isset($options['targetDocument'])) {
    $this->doc = $options['targetDocument'];
} else {
    $impl = new \DOMImplementation();
    $dt = $impl->createDocumentType('html');
    $this->doc = $impl->createDocument(null, null, $dt);
}

@goetas
Copy link
Member

goetas commented Dec 17, 2014

I was thinking that if we plan to add this feature, would be nice to have an $option parameter into HTML5 main class.

public function loadHTMLFile($file, array $options = array())

@mattfarina Is it a breaking change? Do we need a 3.0 release or 2.1 is enough?

@mattfarina
Copy link
Member

@goetas Sorry for the slow response. Now that we're entering the holiday season here I've been spending a bit of time offline and just saw this.

Being that this is an addition and not a backwards incompatible change we should be fine with a 2.1.0 release.

@goetas
Copy link
Member

goetas commented Dec 22, 2014

@mattfarina Thanks! I was planning to add something as this... With this option should be possible to "customize" somehow the "reading" process.
(This option is already available in the "saving" process)

@jslegers
Copy link
Contributor Author

Any progress? What's preventing this pull request from being merged?

@goetas
Copy link
Member

goetas commented Jan 5, 2015

@jslegers
At the moment:

  • no unit tests for new features
  • the "documentation" (this) should be moved to readme.md
  • this feature should be implemented
  • some commits present into this PR should be squashed

@jslegers
Copy link
Contributor Author

jslegers commented Jan 5, 2015

@goetas :

no unit tests for new features

Any template for writing them? I have no experience writing unit tests, but I could look into it if I know what to do.

Can't this be added in a new, subsequent pull request?

the "documentation" (this) should be moved to readme.md

Seems like a simple copy-paste to me. Let me know exactly which content you want me to add where and I'll take care of it.

Can't this be added in a new, subsequent pull request?

this feature should be implemented

That feature is unrelated to the implementation of the two options I added. It makes no sense to me to include them in the same pull request.

Also, this feature doesn't really make much sense to me. Why not just add a public function setOptions(array $options = array()) and/or a public function setOption($options) method instead, analogous to the public function getOptions() method?

Proposed implementation :

    public function __construct(array $options = array())
    {
        $this->setOptions($options);
    }

    public function setOptions(array $options = array())
    {
        $this->options = array_merge($this->options, $options);
        return $this;
    }

    public function setOption($key, $value)
    {
        $this->options[$key] = $value;
        return $this;
    }

    public function getOptions()
    {
        return $this->options;
    }

    public function getOption($key)
    {
        return (isset($this->options[$key]) ? $this->options[$key] : false);
    }

some commits present into this PR should be squashed

Huh? What does that mean?

@goetas
Copy link
Member

goetas commented Jan 5, 2015

@jslegers have 3-4 pull request for one functionality is a bad idea

@jslegers
Copy link
Contributor Author

jslegers commented Jan 5, 2015

@goetas :

have 3-4 pull request for one functionality is a bad idea

  • Can you provide me a template for writing the unit tests?
  • Can you tell me precisely what content should be added to the readme.md file?
  • Can you provide me some feedback on the propose alternative implementation for the new options feature?

It's been a month now since I created this pull request. This shouldn't be taking so long.

@technosophos
Copy link
Member

  • For tests, start here: https://github.com/Masterminds/html5-php/blob/master/test/HTML5/Html5Test.php
  • For documentation, you just copy the text you already wrote to a section called "Options" in the README.
  • Adding an $options argument to parsing functions is a pattern common in many PHP and JavaScript libraries (actually, in C too). I don't mind also having a setter/getter combo for options, too.

The idea with squashing the pull request into one commit is so that the entire feature is rolled into one commit instead of spread across many. Maybe we can relax this restraint for this issue? (I'll defer to @goetas on that)

We're not trying to be annoying about these things. But we're all very busy (as I'm sure you are too), and we've committed to maintain a very high standard of code for this project. I hope it's not coming across as arrogance or anything.

@jslegers
Copy link
Contributor Author

jslegers commented Jan 5, 2015

@technosophos :

For tests, start here: https://github.com/Masterminds/html5-php/blob/master/test/HTML5/Html5Test.php

Not sure where to begin. Again, I have no experience with unit tests, so it's going to take some time figuring out what to do... whereas I bet it's only 5 to 10 minutes work for you guys...

For documentation, you just copy the text you already wrote to a section called "Options" in the README.

Which segments? Where exactly?

Could anyone provide the full text for the readme file, that I can just copy-paste? An alternative option would be to submit a pull request to my fork.

Adding an $options argument to parsing functions is a pattern common in many PHP and JavaScript libraries (actually, in C too). I don't mind also having a setter/getter combo for options, too.

A setter/getter combo is a better approach because it is only executed when needed, which is better for performance and makes more sense in contexts where you want to call loadHTMLFile for the same instance multiple times.

If you want to set one or more options with every call of your loadHTMLFile method without writing extra lines of code, chaining your methods like $html5->setOption($key, $value)->loadHTMLFile($file) is a better approach.

The idea with squashing the pull request into one commit is so that the entire feature is rolled into one commit instead of spread across many.

Are there any downsides to using multiple commits? Also, how can I merge multiple commits? My git expertise is pretty limited.

We're not trying to be annoying about these things. But we're all very busy (as I'm sure you are too), and we've committed to maintain a very high standard of code for this project. I hope it's not coming across as arrogance or anything.

I'm all in favor of high standards, but this much red tape for (1) tiny changes that have (2) passed the Travis build and are (3) 100% backwards-compatible is quite a turn-off for me.

This, especially because my Dom-Query library depends on these changes. Blocking this pull request prevents people from using that library in projects that rely on Composer for dependency management. Yet, the only reason for blocking the request seems to be the lack of documentation or unit tests.

If high quality is so important, IMO it's better to focus on actually implementing new features that improve the performance or flexibility of the library (eg. a new implementation of \Masterminds\HTML5\Parser\FileInputStream) than preventing tiny changes from being added to the codebase because it's lacking documentation or unit tests.

This level of red tape is one of the reasons I'm reluctant to join existing open source projects. It's one of the reasons I prefer building my own libraries or frameworks instead. It's also one of the reasons I prefer working as the only programmer in a small company rather than as one of many programmers in a large company. While it may be intended to improve the quality of a project's code, all this administration stiffles innovation on so many levels it has mostly an adverse effect.

technosophos added a commit that referenced this pull request Jan 6, 2015
@technosophos
Copy link
Member

I'm disappointed in the direction this thread has gone.

I just pulled this into a feature branch (feature/domquery-options). Any additional work on this issue should be done against that feature branch. I have to send my own patches through Google legal review, so it'll take a while before I can get my changes in. I'll commit them to that feature branch when I can. I'm working on:

  • documentation
  • unit tests
  • associated code cleanup

I've decided not to add a special $options to the parse/serialize functions, and will stick to the global new HTML5($options) usage. If anyone wants to add the additional options support, please open a new ticket.

I'm also trying to normalize option names across the parser. Some are snake case and some are camel case. I'm sticking to snake for new ones, and adding duplicate camel/snake support for old ones.

@goetas
Copy link
Member

goetas commented Jan 6, 2015

@technosophos most probably, in the next week I can take a look into (I've completed my house change)... let me try :)

@goetas
Copy link
Member

goetas commented Jan 6, 2015

@jslegers

This level of red tape is one of the reasons I'm reluctant to join existing open source projects. It's one of the reasons I prefer building my own libraries or frameworks instead. It's also one of the reasons I prefer working as the only programmer in a small company rather than as one of many programmers in a large company. While it may be intended to improve the quality of a project's code, all this administration stiffles innovation on so many levels it has mostly an adverse effect.

IMHO, if you want to become a good developer I will suggest to you to learn to write tests, documentation and to collaborate (instead of reinventing the wheel by your self).

@jslegers
Copy link
Contributor Author

jslegers commented Jan 6, 2015

If you want to become a good developer I will suggest to you to learn to write tests, documentation

Unit tests are useful for projects that have multiple programmers working on them at the same time or if you don't really understand enough of the code you're working on to understand the impact of your changes.

In the company I work for, I do mostly custom dev work built from scratch and the only other technical guy rarely goes beyond adding and configuring modules in Drupal, so there's no real need for unit testing... nor is there time for it.

I might add unit tests to some of my larger open source projects in the future, but as long as they are one-man projects, it doesn't really matter.

documentation

For my CSS framework, I built a documentation site with demos of all major components, which I maintain on my own in my spare time. I'm planning to do the same for my PHP framework once I released more components.

Until then, I prefer to spend whatever time I have for open source work on improving my existing codebase and writing new components for my PHP framework.

reinventing the wheel by your self

Before I start working on a project, I always look for existing libraries or frameworks to take care of the heavy lifting. Most of the time, though, I end up writing my own libraries/framework or writing my own customized version of a library/framework, because other projects rarely fit my needs.

html5-php is actually one of those few libraries that does exactly what I want and how I want it... except for those two options I added in this patch.

technosophos added a commit that referenced this pull request Jan 7, 2015
@technosophos
Copy link
Member

I got a little bit done and through legal review. Still need tests, and I think there may be an option or two that I missed. (@goetas -- you should probably add your copyright info on the license. I had to update for Google stuff, and noticed that your name is missing.)

@goetas
Copy link
Member

goetas commented Jan 7, 2015

@technosophos did you see my html-parsing-options branch and #72 PR?

Are we working on the same thing? :-D

@technosophos
Copy link
Member

@goetas I patched a bunch of options names to make them consistent. So I was sorta working on the other half of what you started in #72. I think mine should cleanly merge into yours.

@goetas
Copy link
Member

goetas commented Jan 7, 2015

I think that #72 is not related to this... it is a generic feature.

Your work is related html-parsing-options branch, and we can merge it some how, or you can make some copy-paste from my code (test suite parts)

@goetas goetas mentioned this pull request Jan 18, 2015
@goetas goetas closed this in #74 Feb 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants