Add support for crawling HTML fragments #3

Closed
jhurliman opened this issue Jun 7, 2011 · 1 comment

Comments

@jhurliman

If the crawler encounters a page that contains an HTML fragment rather than a complete document, such as http://hypem.com/blog/a/1?ax=1, it crashes when it tries to appendChild() a jQuery script element onto window.document.body, which is undefined. One option might be to add support for the HTML5 parser library, which should be able to handle fragments: https://github.com/aredridel/html5

A short repro:

var Crawler = require('node-crawler').Crawler;
var crawler = new Crawler({
  maxConnections: 1,
  callback: function(err, res, $) {
    console.log('worked!'); // The app will crash before this point
  }
});

crawler.queue(['http://hypem.com/blog/a/1?ax=1']);
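
One way to picture the failure mode is that the fetched markup has no `<html>` or `<body>` element for a script tag to be appended to. The sketch below is only an illustration of guarding against that by wrapping a bare fragment in a minimal document before it reaches a DOM parser. `ensureDocument` is a hypothetical helper, not part of node-crawler or jsdom, and the regex check is a rough heuristic rather than a real HTML parse.

```javascript
// Hypothetical helper (not part of node-crawler or jsdom): wrap a bare
// HTML fragment in a minimal document so that a <body> element exists.
function ensureDocument(html) {
  // Rough heuristic: treat the input as a full document if it already
  // contains an <html> or <body> opening tag.
  var hasHtml = /<html[\s>]/i.test(html);
  var hasBody = /<body[\s>]/i.test(html);
  if (hasHtml || hasBody) {
    return html;
  }
  return '<!DOCTYPE html><html><head></head><body>' + html + '</body></html>';
}

// Example input: a fragment similar in shape to what an AJAX endpoint might return.
var fragment = '<div class="post"><a href="/track/1">Track</a></div>';
var wrapped = ensureDocument(fragment);

// A complete document is passed through unchanged.
var fullDoc = '<html><body><p>hi</p></body></html>';
var untouched = ensureDocument(fullDoc);
```

Whether a wrapper like this is needed at all depends on the parser in use; some parsers synthesize the missing elements on their own.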

@jhurliman

I apologize. I was running node-crawler with a more recent version of jsdom (0.2.0), which appears to be what caused this issue.

mike442144 pushed a commit that referenced this issue Nov 23, 2020
fix for node URL accessibility for different node version
Miniast added a commit that referenced this issue May 6, 2024