Consider SgmlReader as alternative default to HtmlAgilityPack #25

Closed
atifaziz opened this issue Aug 23, 2015 · 6 comments

Comments

@atifaziz
Owner

Originally reported on Google Code with ID 25

What new or enhanced feature are you proposing?

Fizzler tools like Visual Fizzler rely on HtmlAgilityPack as the default
implementation. However, the HtmlAgilityPack project seems to have gone
stale at the moment and has a few bugs pending that are also affecting CSS
selection via Fizzler. Consider a more robust default alternative.

What goal would this enhancement help you achieve?

It would make Fizzler look less buggy. :)

Reported by azizatif on 2009-05-06 15:22:18

@atifaziz
Owner Author

See issue #24 for one HtmlAgilityPack bug biting Fizzler.

Reported by azizatif on 2009-05-06 15:22:58

  • Status changed: New

@atifaziz
Owner Author

Two possibilities:

http://developer.mindtouch.com/SgmlReader
http://code.google.com/p/twintsam/

Reported by info%colinramsay.co.uk@gtempaccount.com on 2009-05-06 15:27:22

@atifaziz
Owner Author

> twintsam

The project home page says, "The code is not usable yet." That leaves just SgmlReader
for now.

Reported by azizatif on 2009-05-06 15:34:00

@atifaziz
Owner Author

I think dropping HtmlAgilityPack (at least as the default) is a 
good idea. It isn't actively maintained and its developers don't 
seem too eager to fix bugs in it either. It is an excellent library 
for simple HTML parsing, and is afaik the only one exposing a full 
DOM (which is very convenient), but because of its bugs and 
inactivity, I think it's a wise plan to move away from it.

SgmlReader and Twintsam are both alternatives worth looking into. I know 
Thomas Broyer, the project owner of Twintsam, and it is a very promising 
project with the goal of being the reference implementation of the HTML5 
parsing algorithm in C#. That's a noble goal, imho.

SgmlReader, on the other hand, is a nice, but old and a bit dated 
implementation. I believe, though, that SgmlReader is the best of 
the three at the moment, but the code quality of the project is in 
my humble opinion not too great, which is why I don't consider 
contributing to it. I also don't think there's much testing to 
speak of in the SgmlReader project, although it is being actively 
maintained and bugs are fixed.

I would love to cooperate in implementing either of these (or others, if 
there are any) alternatives. For the long term, I think Twintsam might be 
the best project to bet on, but it does indeed need some work before it's 
production ready, so it might be something worth investigating for version 
2.0 of Fizzler.

Reported by asbjornu on 2009-05-06 19:11:55

@atifaziz
Owner Author

@asbjornu: Thanks for your feedback on the various alternatives.

> It isn't actively maintained and its developers don't 
> seem too eager to fix bugs in it either.

Wonder if it's time to fork?

> would love to cooperate in implementing either of these

Great! I've changed the summary of this issue so that it now points specifically to
SgmlReader, and you can initially submit your contribution as a patch. If you need
assistance with understanding any bits of Fizzler, let us know!

We can open another issue for Twintsam when it makes sense.

Reported by azizatif on 2009-05-08 12:13:52

@atifaziz
Owner Author

New Fizzler.Systems.XmlNodeQuery in r193 will support use of SgmlReader. All tests
pass, including an extra one to test the "form input" CSS selector, which was the root
reason for starting this issue.

Reported by info%colinramsay.co.uk@gtempaccount.com on 2009-05-11 23:46:14

  • Status changed: Fixed

@atifaziz added the enhancement label and removed the Type-Enhancement label on Aug 23, 2015