Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error handling (HHVM compatibility) #33

Closed
goetas opened this issue May 7, 2014 · 13 comments
Closed

Better error handling (HHVM compatibility) #33

goetas opened this issue May 7, 2014 · 13 comments

Comments

@goetas
Copy link
Member

goetas commented May 7, 2014

Hi!
I suggest to change error handling in DOMTreeBuilder. (
https://github.com/Masterminds/html5-php/blob/master/src/HTML5/Parser/DOMTreeBuilder.php#L79)

Injecting a errors property to DOMDocument is not so clean and won't always work, especially on HHVM.

@technosophos
Copy link
Member

I'm open to alternative suggestions, provided they don't involve writing an alternative DOM implementation.

I think that in almost all cases, the parseErrors() function is used to report errors during parsing, so if there's a better place to put the errors, the patch should be trivial.

@goetas
Copy link
Member Author

goetas commented May 11, 2014

it is not so easy. I'm attempting to remove $dom->errors variable. But I need a different places where store errors.

I'm thinking to raise an exception when error occurs, but this will drastically change errors handling for this lib.

@mattfarina
Copy link
Member

@goetas I'm not sure an exception for each error is a good idea. So many documents have errors. You'd never be able to parse most things. It would really change error handling.

Like @technosophos I'm open to suggestions. I'm just trying to think practically.

@technosophos
Copy link
Member

Oh... sorry... missed this thread and commented straight on the commit.
I suppose we could create a DOMDocument wrapper, attach errors to that, and return it instead of the built-in DOMDocument. We'd have to do the same with DOMFragment, but that sure beats trying to re-implement the DOM.

@technosophos
Copy link
Member

I'm also gonna ask @miso-belica for an opinion. More than anyone else, he's had to actually deal with parser errors.

@miso-belica
Copy link
Contributor

I think to raise an exception is very bad idea - malformed documents will not be possible to parse.

I think the wrapper for DOMDocument is the best idea. It solves the issue in backward compatible manner. And it's not so hard to implement without reimplementation the whole DOM with magic methods __get, __set, __call, ... (or by inheritance?)

Second solution is to return an array with DOMDocument and $errors so API will be changed into this list($dom, $errors) = HTML5::loadHTML($html);. But this solution will lead to breaking change.

@goetas
Copy link
Member Author

goetas commented May 13, 2014

Can be adopted something like this?

$html5 = new HTML5($options = array());
$html5->setOptions($options);
$html5->getOptions(); // return: array

$html5->load($file);  // return: DOMDocument|null
$html5->loadHTML($string);  // return: DOMDocument|null
$html5->loadHTMLFragement($string);  // return: DOMDocument|null

$html5->getErrors(); // return: array
$html5->hasErrors(); // return: boolean

$html5->parse($input); // return: DOMDocument|null
$html5->parseFragment($input); // return: ?|null

$html5->save($dom, $file, $options = array());  // return: boolean
$html5->saveHTML($dom, $options = array());  // return: string

The proposed solution is a big change to the API... (can be created a new branch? 2.0?)
I do not like static methods... HTML5 class uses a lot of them. Static methods are not configurable and HTML5 class is not extensible...

@technosophos
Copy link
Member

I've been thinking a lot about @miso-belica 's proposal for returning two objects ($dom and $errors). I almost proposed something similar yesterday. I write a lot of Go, and this matches the Go error handling pattern... but deep down I feel like it is a little hacky in PHP.

All things considered, I find myself leaning toward @goetas 's most recent solution: Get rid of the statics and have people use HTML5 instances. It feels like the most flexible, and gives us the ability to strategically grow the API later.

According to our semantic versioning scheme, this would mean we'd have to release the change as a 2.0 release, but I'm okay with that.

And if we really wanted to, we could easily adapt @miso-belica 's Go-like solution to provide an "easy" set of static methods that merely wrapped @goetas 's solution.

@mattfarina -- what are your thoughts about committing to this level of API change?

@goetas
Copy link
Member Author

goetas commented May 13, 2014

I agree with @technosophos
We can create a HTML5Helper that contains all current static methods, and simply wrap them into new HTML5 API calls

@miso-belica
Copy link
Contributor

I don't like the method setOptions because I think the HTML5 instance should be configured once and then used. The method leads to usage like $html5 = new HTML5($options = array()); -> $html5->load($file); -> $html5->setOptions($options); -> $html5->load($file); ... But everything else is fine IMHO.

@technosophos Yes returning a tuple from method is very uncommon in PHP (I'm Python guy the most). But I think of the methods html5->load*() like they have 2 results (DOM and errors) so why one of the results should be stored somewhere and not returned? But $html5->getErrors(); is fine, just wrote my thoughts :)

@goetas
Copy link
Member Author

goetas commented May 13, 2014

@miso-belica your "options" argumentation is right.
I will try do create a PR as soon as possible.

@mattfarina
Copy link
Member

I think the idea of setOptions and getOptions is for altering them on the html5 object after they were initially created. I don't see that as something you'd need to do each time. It just provides access to that data which should be a protected property.

I like the idea of an object instead of a singleton. You can have to parsers side by side with different options this way.

I'm torn on the use of list or multiple methods to get the errors. This is one of those cases where I really like how languages like Go handle things. I wish it felt normal here. I think I like the option from @goetas the best. To use some methods to see if the errors exist and get them.

This is one of those cases where we'd need to increment the major api version. But, I think that's ok.

@goetas
Copy link
Member Author

goetas commented May 14, 2014

Moving discussion to #37

@goetas goetas closed this as completed May 14, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants