
Node.js Async Crawler

Requirements

Recursively crawl the popular blogging website https://medium.com using Node.js, harvest every hyperlink that belongs to medium.com, and store the results in a database.
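A minimal sketch of the crawling approach is shown below. It is not the repository's actual implementation; the depth limit, HTML parsing via regex, and sequential recursion are assumptions for illustration only.

```js
// Minimal recursive crawler sketch (assumes Node.js 18+ for the global fetch API).
// The real project may use different libraries and concurrency handling.
const START_URL = 'https://medium.com';
const visited = new Set();

async function crawl(url, depth = 0) {
  if (visited.has(url) || depth > 2) return;    // depth limit keeps the sketch bounded
  visited.add(url);

  let html;
  try {
    html = await (await fetch(url)).text();
  } catch (err) {
    return;                                      // skip pages that fail to load
  }

  // Extract href values and keep only links that belong to medium.com.
  const links = [...html.matchAll(/href="([^"]+)"/g)]
    .map(m => { try { return new URL(m[1], url); } catch { return null; } })
    .filter(u => u && u.hostname.endsWith('medium.com'));

  for (const link of links) {
    await crawl(link.href, depth + 1);           // recurse into each discovered URL
  }
}

crawl(START_URL).then(() => console.log(`Visited ${visited.size} URLs`));
```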

What needs to be stored?

  1. Every unique URL encountered.
  2. The total reference count of every URL.
  3. A complete, unique list of parameters associated with each URL (see the schema sketch below).
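A document shape along these lines could hold those three fields. This is only an illustrative Mongoose sketch; the model and field names are assumptions, not the repository's actual schema.

```js
// Illustrative Mongoose schema for storing crawled URLs.
// Model and field names are assumptions, not the repository's actual schema.
const mongoose = require('mongoose');

const urlSchema = new mongoose.Schema({
  url: { type: String, required: true, unique: true },  // every unique URL encountered
  referenceCount: { type: Number, default: 1 },         // total times the URL was referenced
  parameters: { type: [String], default: [] }           // unique list of parameters for this URL
});

module.exports = mongoose.model('Url', urlSchema);
```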

Prerequisites

1) Node.js
2) MongoDB

Running server in development mode

After the MongoDB server is up and running, run the following commands:

1) npm install
2) npm start dev-server

Once the server starts:

1) It will start crawling and upload the harvested data to the database.
2) Users can retrieve all of the uploaded data by sending a GET request to 'http://localhost:3000/api/url/getAllData'.
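For example, assuming the server is running locally on port 3000, the data can be fetched with a short script like the one below (the JSON response format is an assumption).

```js
// Fetch all crawled URL records from the local server (assumes Node.js 18+ for global fetch).
// The response is assumed to be JSON.
fetch('http://localhost:3000/api/url/getAllData')
  .then(res => res.json())
  .then(data => console.log(data))
  .catch(err => console.error('Request failed:', err));
```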

Deployment in production mode

docker-compose build
docker-compose up

Once the server starts, it follows the same steps described in the section above.
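A compose file for this setup would typically define two services, the Node.js app and MongoDB. The sketch below is a hedged illustration and may differ from the repository's actual docker-compose.yml; the service names, ports, and connection string are assumptions.

```yaml
# Illustrative docker-compose.yml; service names, ports, and the connection
# string are assumptions and may differ from the repository's actual file.
version: "3"
services:
  mongo:
    image: mongo
    ports:
      - "27017:27017"
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - MONGO_URL=mongodb://mongo:27017/asyncCrawler
    depends_on:
      - mongo
```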

Built With

1) Node.js
2) MongoDB
3) Docker
