Cannot run Quickscrape in headless mode #63

Open
lanzer opened this issue Oct 28, 2015 · 3 comments

lanzer commented Oct 28, 2015

Running quickscrape with the -h or --headless option does not launch CasperJS.

I noticed that the headless parameter is not being passed when adding the scraper through scraperbox:

quickscrape.js (211)

    scrapers.addScraper(program.scraper);

Also, scraperbox needs to pass the parameter on to the scraper, which already accepts it:

scraperbox.js (48)

ScraperBox.prototype.addScraper = function(def) {
  if (typeof(def) == 'string') {
    def = JSON.parse(fs.readFileSync(def, 'utf8'));
  }
  var scraper = new Scraper(def);
  if (scraper.valid) {
    this.scrapers.push(scraper);
    return true;
  } else {
    return false;
  }
}

So I've made the following adjustments:

quickscrape.js (211)

    scrapers.addScraper(program.scraper, program.headless);

scraperbox.js (48)

ScraperBox.prototype.addScraper = function(def, headless) {
  // A string is treated as a path to a JSON scraper definition
  if (typeof def === 'string') {
    def = JSON.parse(fs.readFileSync(def, 'utf8'));
  }
  // Forward the headless flag to the Scraper constructor
  var scraper = new Scraper(def, headless);
  if (scraper.valid) {
    this.scrapers.push(scraper);
    return true;
  } else {
    return false;
  }
};

I think the scraper-checking routine also needs the parameter added:

quickscrape.js (139)

  var scraper = new Scraper(JSON.parse(definition), program.headless);

Now I can see casperjs running. I also noticed that a 404-type response status causes quickscrape to halt. I will need to look into that.
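
For anyone who wants to check the flag plumbing in isolation, here is a minimal sketch of the change described above. It assumes a stand-in `Scraper` constructor that only records its arguments (the real one in thresher does far more), and the validity rule and the `url` field are hypothetical. Only the `(def, headless)` argument order comes from the snippets in this issue; file-path definitions are left out to keep it self-contained.

```javascript
// Stand-in for thresher's Scraper: it only records what it was given.
// The (definition, headless) argument order is taken from the issue;
// everything else here is an assumption for illustration.
function Scraper(definition, headless) {
  this.definition = definition;
  this.headless = Boolean(headless);
  // Hypothetical validity rule: a definition needs a `url` field.
  this.valid = Boolean(definition && definition.url);
}

function ScraperBox() {
  this.scrapers = [];
}

// Accept the headless flag and forward it to the Scraper constructor.
ScraperBox.prototype.addScraper = function (def, headless) {
  var scraper = new Scraper(def, headless);
  if (!scraper.valid) {
    return false;
  }
  this.scrapers.push(scraper);
  return true;
};

var box = new ScraperBox();
box.addScraper({ url: 'example' }, true); // as if run with --headless
box.addScraper({ url: 'example' });       // flag omitted
console.log(box.scrapers.map(function (s) { return s.headless; }));
```

If the flag were dropped anywhere along the way, both entries would report the same value, which matches the symptom described at the top of this issue.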

blahah commented Oct 28, 2015

Thanks for the report. Pull requests are very welcome with the changes you have made :)

lanzer pushed a commit to lanzer/thresher that referenced this issue Oct 29, 2015
lanzer pushed a commit to lanzer/quickscrape that referenced this issue Oct 29, 2015
…perBox,

this, along with the fix on scraperBox.js will enable headless mode for quickscrape

ContentMine#63

lanzer commented Oct 29, 2015

Done. Should I do the same for the "response status" bug also?

blahah commented Oct 29, 2015

Thank you very much. If you are willing, yes, a PR fixing any bug is welcome :)
