package.json

{
  "name": "scrape-it",
  "description": "A Node.js scraper for humans.",
  "keywords": [
    "scrape",
    "it",
    "a",
    "scraping",
    "module",
    "for",
    "humans"
  ],
  "license": "MIT",
  "version": "5.3.1",
  "main": "lib/index.js",
  "types": "lib/index.d.ts",
  "scripts": {
    "test": "node test"
  },
  "author": "Ionică Bizău <bizauionica@gmail.com> (https://ionicabizau.net)",
  "contributors": [
    "ComFreek <comfreek@outlook.com> (https://github.com/ComFreek)",
    "Jim Buck <jim@jimmyboh.com> (https://github.com/JimmyBoh)"
  ],
  "repository": {
    "type": "git",
    "url": "git+ssh://git@github.com/IonicaBizau/scrape-it.git"
  },
  "bugs": {
    "url": "https://github.com/IonicaBizau/scrape-it/issues"
  },
  "homepage": "https://github.com/IonicaBizau/scrape-it#readme",
  "blah": {
    "h_img": "https://i.imgur.com/j3Z0rbN.png",
    "cli": "scrape-it-cli",
    "description": "Want to save time, or not using Node.js? Try our [hosted API](https://scrape-it.saasify.sh).",
    "installation": [
      {
        "h2": "FAQ"
      },
      {
        "p": "Here are some frequent questions and their answers."
      },
      {
        "h3": "1. How to scrape ajax pages?"
      },
      {
        "p": "`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it. In general, you will run into one of these scenarios:"
      },
      {
        "ol": [
          "**The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.",
          "**The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass the ajax url (e.g. `example.com/api/that-endpoint`) to `scrape-it` and you will be able to parse the response.",
          "**The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from `scrape-it` once you get the HTML loaded on the page."
        ]
      },
      {
        "h3": "2. Crawling"
      },
      {
        "p": "There is no fancy way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of urls from the initial page and then, using Promises, parse each page. Alternatively, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files."
      },
      {
        "h3": "3. Local files"
      },
      {
        "p": "Use the `.scrapeHTML` method to parse the HTML read from local files using `fs.readFile`."
      }
    ]
  },
  "dependencies": {
    "@types/cheerio": "^0.22.29",
    "assured": "^1.0.14",
    "cheerio-req": "^1.2.3",
    "scrape-it-core": "^1.0.0",
    "typpy": "^2.3.11"
  },
  "devDependencies": {
    "lien": "^3.3.0",
    "tester": "^1.4.4"
  },
  "files": [
    "bin/",
    "app/",
    "lib/",
    "dist/",
    "src/",
    "scripts/",
    "resources/",
    "menu/",
    "cli.js",
    "index.js",
    "bloggify.js",
    "bloggify.json",
    "bloggify/"
  ]
}