What steps will reproduce the problem?
1. Try to extract the text from this URL:
http://sourceforge.net/projects/xampp/files/XAMPP%20Windows/1.7.4/xampp-win32-1.7.4-VC6-installer.exe/download
I used ArticleExtractor.
It prints the following warning a few times:
Warning: SAX input contains nested A elements -- You have probably hit a bug in
your HTML parser (e.g., NekoHTML bug #2909310). Please clean the HTML
externally and feed it to boilerpipe again. Trying to recover somehow...
and then crashes with an OutOfMemoryError.
I'm using version 1.2.0 and have reproduced this on both Windows and Ubuntu.
Original issue reported on code.google.com by fzr...@gmail.com on 29 Jul 2011 at 1:27
The input was not HTML (its content type was application/x-msdos-program), but boilerpipe
nevertheless accepted it, and NekoHTML choked on it.
In the meantime, checks have been added to boilerpipe trunk so that only text/html
content is fetched; any other content type throws an exception. boilerpipe-web performs
additional checks (e.g., on content length).
In both cases, the NekoHTML bug will no longer be triggered.
Original comment by ckkohl79 on 22 Jan 2012 at 11:03
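For anyone stuck on 1.2.0, a guard of this kind can be applied on the caller's side before anything reaches the parser. The following is a minimal sketch, not boilerpipe's actual trunk code: the class `HtmlFetchGuard`, its methods, the UTF-8 decoding, and the byte limit are all hypothetical choices. It rejects any response whose declared media type is not `text/html`, rejects a declared length above a limit, and also caps the bytes actually read, since servers may omit or misreport `Content-Length`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper: refuses non-HTML or oversized responses before parsing. */
final class HtmlFetchGuard {

    private HtmlFetchGuard() {}

    /** True only if the media type (ignoring parameters such as charset) is text/html. */
    static boolean isHtmlContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        // "text/html; charset=UTF-8" -> "text/html"
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html");
    }

    /** Throws if the declared type is not HTML or the declared length exceeds maxBytes. */
    static void checkFetchable(String contentType, long contentLength, long maxBytes)
            throws IOException {
        if (!isHtmlContentType(contentType)) {
            throw new IOException("Not HTML: " + contentType);
        }
        // -1 means no declared length; fetchHtml still enforces maxBytes while reading.
        if (contentLength > maxBytes) {
            throw new IOException("Content too large: " + contentLength + " bytes");
        }
    }

    /** Fetches a URL as HTML, assuming UTF-8 (hypothetical) and reading at most maxBytes. */
    static String fetchHtml(URL url, long maxBytes) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            // Check headers before reading any of the body.
            checkFetchable(conn.getContentType(), conn.getContentLengthLong(), maxBytes);
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                long total = 0;
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                    if (total > maxBytes) {
                        throw new IOException("Content exceeded " + maxBytes + " bytes");
                    }
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With this guard, the SourceForge installer link above would be rejected with an `IOException` ("Not HTML: application/x-msdos-program") instead of being fed to NekoHTML. A string returned by `fetchHtml` could then be passed to `ArticleExtractor.INSTANCE.getText(html)`. Checking headers first keeps the cost of a rejected URL small, while the bounded read protects against responses that lie about their length.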