StackOverflowError when parsing certain html #135

Open
prismofeverything opened this issue Nov 4, 2015 · 4 comments

Comments

@prismofeverything

Using Enlive to read certain URLs gives me a StackOverflowError, with these parts of the stack trace repeated over and over:

                           clojure.core/mapcat         core.clj: 2660
                             clojure.core/apply         core.clj:  630
                               clojure.core/seq         core.clj:  137
                                            ...                       
                            clojure.core/map/fn         core.clj: 2622
net.cgrand.enlive-html/zip-select-nodes*/select1/fn  enlive_html.clj:  512
   net.cgrand.enlive-html/zip-select-nodes*/select1  enlive_html.clj:  512
                                            ...                       
                            clojure.core/mapcat         core.clj: 2660
                             clojure.core/apply         core.clj:  630
                               clojure.core/seq         core.clj:  137
                                            ...                       
                            clojure.core/map/fn         core.clj: 2622
 net.cgrand.enlive-html/zip-select-nodes*/select1/fn  enlive_html.clj:  512
    net.cgrand.enlive-html/zip-select-nodes*/select1  enlive_html.clj:  512

Is there any way to avoid this? Are we just naively recursing somewhere? Can this be turned into a loop/recur?

Thank you!
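
For context on the loop/recur question: the repeated mapcat/apply/seq/map frames in the trace are consistent with a recursive walk whose depth follows the nesting of the parsed document, although that is an inference from the trace and not something confirmed in this thread. loop/recur only applies to tail calls, so a tree walk generally needs an explicit collection of pending nodes before it can become iterative. The sketch below is not Enlive's code: `deep-tree`, `tags-recursive`, and `tags-iterative` are hypothetical names, and the node shape ({:tag ... :content [...]}) only mimics Enlive-style nodes.

```clojure
;; Hypothetical illustration only; none of this is Enlive's actual code.

(defn deep-tree
  "Builds a :span wrapped in `depth` nested :div nodes."
  [depth]
  (reduce (fn [child _] {:tag :div :content [child]})
          {:tag :span :content []}
          (range depth)))

(defn tags-recursive
  "Naive pre-order walk. mapcat forces the first child while it is being
  set up, so every nesting level adds several JVM stack frames."
  [node]
  (cons (:tag node) (mapcat tags-recursive (:content node))))

(defn tags-iterative
  "Same pre-order walk, but the pending nodes live in a heap-allocated
  list and loop/recur keeps the JVM stack flat."
  [root]
  (loop [pending (list root)
         acc     []]
    (if-let [node (first pending)]
      ;; Push children in reverse so the first child is visited next.
      (recur (into (rest pending) (reverse (:content node)))
             (conj acc (:tag node)))
      acc)))

;; Both walks agree on a small tree.
(def small-tree
  {:tag :div
   :content [{:tag :p :content []}
             {:tag :span :content []}]})
```

On a default JVM, calling `(tags-recursive (deep-tree 100000))` is likely to throw a StackOverflowError with a similar chain of frames, while `(count (tags-iterative (deep-tree 100000)))` only uses heap. Whether Enlive's selector code could be restructured this way is a separate question that this sketch does not answer.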

@retnuh

retnuh commented Feb 2, 2016

I'm getting this as well. Digging through logs now to find some example data...

@fdserr
Collaborator

fdserr commented Feb 3, 2016

Can you provide a failing gist please?

@retnuh

retnuh commented Feb 3, 2016

https://gist.github.com/retnuh/9747891f2d1fb74e787b

I've stripped down the Clojure to more or less bare bones, but haven't had time to dig through the HTML file. At first I thought it might be the STYLE tag outside the HTML tag, but a stripped down version (i.e. most of the body removed) works okay.

bad2.html also triggers StackOverflowError, and it happens much more quickly.

@fdserr
Collaborator

fdserr commented Feb 3, 2016

Thanks Hunter.

"but a stripped down version (i.e. most of the body removed) works okay."

The snippet looks fine, but the HTML file is too large for us to investigate. It would be very helpful if you could track down exactly where it blows up. Alternatively, try JSoup as the parser, since it is more robust than TagSoup.
