I've been using Fizzler with great success, but today I came across some HTML that silently
failed to parse correctly.
I was selecting all of the <a> elements and noticed that one was being ignored. Here
are the repro steps:
1. Load the HTML from http://pastebin.com/T1Lsr6w6 (this is the "View Source" for http://www.diapers.com/product/productdetail.aspx?productid=16913)
2. Try to query the selector "#pdp"
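3. Example code (assuming the string html holds the HTML above):
using HtmlAgilityPack;                  // HtmlDocument, HtmlNode
using Fizzler.Systems.HtmlAgilityPack;  // QuerySelector extension method on HtmlNode
var doc = new HtmlDocument();
doc.LoadHtml(html);                     // html: the page source from step 1
var dom = doc.DocumentNode;
var pdpElement = dom.QuerySelector("#pdp"); // should return the <a id="pdp"> element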
What is the expected output? What do you see instead?
I expect pdpElement to be the HtmlNode for <a href="http://c1.diapers.com/images/products/p/pg/pg-256_1z.jpg"
class="MagicZoomPlus" id="pdp" title="Pampers Sensitive Thick Baby Wipes Refill 360ct."
target="_blank">
Instead, it doesn't find a match.
What version of the product are you using? On what operating system?
Fizzler 0.9
Please provide any additional information below.
Reported by portman.wills on 2011-04-06 19:36:38
I narrowed down the error slightly.
Using VisualFizzler (neat tool!) I can see that everything up to line 282 is selectable
(for example "#siteNav").
But from line 283 onward, I can't select anything (for example "div.topToolBox").
So the issue seems to involve very long lines such as line 283 of that pastebin example.
Sure enough, when I remove this line (#283) from the HTML, everything works perfectly.
It's pathologically long (51,553 characters in fact!!) so this is probably a defect
in one of the underlying framework classes that Fizzler is using.
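For anyone hitting the same symptom, a quick way to spot such lines before parsing is to scan the raw HTML for lines over some length. The sketch below is a hypothetical helper (FindLongLines is not part of Fizzler or HtmlAgilityPack); it assumes lines are separated by '\n' and reports 1-based line numbers, matching how line 283 was counted above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a Fizzler/HtmlAgilityPack API): returns the 1-based
// line number and length of every line longer than threshold characters.
static List<(int LineNumber, int Length)> FindLongLines(string html, int threshold)
{
    var result = new List<(int LineNumber, int Length)>();
    var lines = html.Split('\n');
    for (int i = 0; i < lines.Length; i++)
    {
        // Ignore a trailing '\r' so CRLF line endings don't count toward the length.
        int length = lines[i].TrimEnd('\r').Length;
        if (length > threshold)
            result.Add((i + 1, length));
    }
    return result;
}

// Usage: a three-line sample whose middle line mimics the 51,553-character line.
var sample = "<p>ok</p>\n" + new string('x', 51553) + "\n<p>ok</p>";
foreach (var (lineNumber, length) in FindLongLines(sample, 1024))
    Console.WriteLine($"line {lineNumber}: {length} characters");
// prints "line 2: 51553 characters"
```

Running it against the pastebin source should flag line 283, though that only confirms the line is long, not why the parser stops there.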
In the meantime, I've changed my code to chop long lines at 1024 characters before
handing off to Fizzler, and everything is working again. But you still might want to
investigate what precisely is going wrong on that long line, so I'll keep the issue
open.
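The workaround can be sketched roughly as follows. This assumes "chop" means truncating each line to 1024 characters (the exact approach isn't shown in the report), and ChopLongLines is a hypothetical helper name. Note that truncation throws away everything past the limit on a long line, which can cut through a tag or attribute value.

```csharp
using System;
using System.Linq;

// Hypothetical workaround helper: truncates every '\n'-separated line to at most
// maxLength characters. Assumes "chop" means truncate; the tail of a long line is
// discarded. A trailing '\r' (CRLF input) is kept on short lines and dropped
// along with the rest of the tail on truncated ones.
static string ChopLongLines(string html, int maxLength = 1024)
{
    var lines = html.Split('\n');
    return string.Join("\n", lines.Select(
        line => line.Length > maxLength ? line.Substring(0, maxLength) : line));
}

// Usage: the 51,553-character middle line is cut down to 1024 characters.
var sample = "short line\n" + new string('x', 51553) + "\nlast line";
var chopped = ChopLongLines(sample);
Console.WriteLine(chopped.Split('\n')[1].Length); // prints 1024
```

In the repro code, this would amount to calling doc.LoadHtml(ChopLongLines(html)) instead of doc.LoadHtml(html).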
We're using HtmlAgilityPack, so it's probably an issue there, but it should be fairly
trivial to swap out HtmlAgilityPack for another parser. It could also be that this
issue has been fixed in a more recent version of HtmlAgilityPack than the one in the
download.
Reported by info%colinramsay.co.uk@gtempaccount.com on 2011-04-07 13:48:49
Originally reported on Google Code with ID 45