Skip to content
TypeScript Dockerfile JavaScript
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
src
.babelrc
.dockerignore
.gitattributes
.gitignore
.prettierrc
.travis.yml
Dockerfile
INPUT_SCHEMA.json
LICENSE
README.md
apify.json
jest.config.js
jest.setup.js
launch.json
package-lock.json
package.json
tsconfig.json

README.md

Web Designer News - Scraper

Scraper for www.webdesignernews.com, using Apify.

Input settings

Only startUrl required, for example this could be the main page: https://www.webdesignernews.com, some of the pagination pages: https://www.webdesignernews.com/page/2 or some specific section: https://www.webdesignernews.com/category/apps or https://www.webdesignernews.com/category/apps/page/3

wayToScrape can be old or new, by default old. If, for example, the starting link is on the third page (https://webdesignernews.com/page/3 for example), then scraper will scrape third page, fourth page, fifth and so on, until pagination or maxRequestsPerCrawl limit ends. But if the wayToScrape is new - third, second and first pages will be scraped. So you decide which way actor will work.

Apify use

This actor is published in Apify library: https://apify.com/ganevru/webdesignernews-scraper.

How to run actor in Apify: how to run section

Local use

It is not necessary to use the apify.com service to use actor. Copy this repository and install node dependencies, for example, this way:

git clone git@github.com:Ganevru/actor-webdesignernews-scraper.git
cd actor-webdesignernews-scraper
npm install

Create in the root of project: apify_storage/key_value_stores/default/INPUT.json

In INPUT.json, write your settings, for example:

{
  "startUrl": "https://webdesignernews.com/page/3",
  "wayToScrape": "old",
  "maxRequestsPerCrawl": 3,
  "maxRequestRetries": 3,
  "maxConcurrency": 3,
  "liveView": true,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Start scrapping with command:

npm run local-start

Or, if you need a clean start:

npm run local-start-fresh

This will remove the results of all previous launches!

The result of the launch will be in apify_storage/datasets/default

It will be something like this:

{
  "title": "AMP Pages Fundamentally Changing How the Web Works",
  "vote": 47,
  "thumb": "https://webdesignernews.s3.amazonaws.com/post_images/2657750/vlcsnap-2019-04-17-00h58m00s057-728x407-1d95-140x112.jpg",
  "link": "https://www.webdesignernews.com/redirect/id/2657750",
  "source": "androidpolice.com",
  "date": "2 days ago",
  "requestUrl": "https://webdesignernews.com/page/3"
}
{
  "title": "Managing Complexity",
  "vote": 71,
  "thumb": "https://webdesignernews.s3.amazonaws.com/post_images/2657412/1*RiHC2-4TI3SKYnw4TCSbrg-f33b-140x112.jpeg",
  "link": "https://www.webdesignernews.com/redirect/id/2657412",
  "source": "uxdesign.cc",
  "date": "4 days ago",
  "requestUrl": "https://webdesignernews.com/page/4"
}
{
  "title": "The Psychology of Pricing",
  "vote": 111,
  "thumb": "https://webdesignernews.s3.amazonaws.com/post_images/2659276/price-comparison-old-ug-e5f2-140x112.png",
  "link": "https://www.webdesignernews.com/redirect/id/2659276",
  "source": "quicksprout.com",
  "date": "2 days ago",
  "requestUrl": "https://webdesignernews.com/page/3"
}
You can’t perform that action at this time.