NATS Based User Rating Web Crawler

The goal of this project is to implement a proof of concept for a system that collects rating information from the Roku website using microservices and NATS.

Solutions

Two solutions were implemented: one uses chromedp to perform the crawling, and the other uses an API call. Both solutions share the same architecture; the only difference is the use of a chromedp container, as shown in the diagrams below.

The Reader Client parses a CSV file with URLs and sends them to the system using gRPC calls. The Request Listener receives the gRPC requests and puts them into a NATS queue. The main reason to use a queue is that the crawling process takes several seconds, so we need a way to keep a list of URLs that should be processed. Each URL is associated with a request-id generated by the Reader Client, so it is possible to trace a request to its final result.
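
As a rough illustration, the listener-side enqueue could look like the sketch below. The subject name and message shape are assumptions made for the example, not the repository's actual code:

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

// crawlRequest pairs a URL with the request-id generated by the Reader Client
// so the final result can be traced back to the original request.
type crawlRequest struct {
	RequestID string `json:"request_id"`
	URL       string `json:"url"`
}

// enqueue publishes a crawl request to the NATS subject the crawlers consume.
// The subject name "crawler.requests" is illustrative.
func enqueue(nc *nats.Conn, requestID, url string) error {
	payload, err := json.Marshal(crawlRequest{RequestID: requestID, URL: url})
	if err != nil {
		return err
	}
	return nc.Publish("crawler.requests", payload)
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Placeholder URL; the real ones come from the CSV file.
	if err := enqueue(nc, "req-123", "https://example.com/some-channel"); err != nil {
		log.Fatal(err)
	}
}
```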

On the other hand, the use of a queue allows the system to scale by having multiple Web Crawlers (or API Crawlers) collecting the ratings and storing them in a Postgres DB.
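
A minimal sketch of that scaling mechanism, assuming the same subject and message shape as above: every crawler instance joins the same NATS queue group, so each URL is delivered to exactly one instance, and adding capacity is just a matter of starting more crawlers.

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

type crawlRequest struct {
	RequestID string `json:"request_id"`
	URL       string `json:"url"`
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// All crawler instances subscribe with the same queue group name
	// ("crawlers" here), so NATS load-balances each message to exactly
	// one of them.
	_, err = nc.QueueSubscribe("crawler.requests", "crawlers", func(msg *nats.Msg) {
		var req crawlRequest
		if err := json.Unmarshal(msg.Data, &req); err != nil {
			log.Printf("bad message: %v", err)
			return
		}
		log.Printf("crawling %s (request %s)", req.URL, req.RequestID)
		// ...crawl the URL and store the rating in Postgres...
	})
	if err != nil {
		log.Fatal(err)
	}
	select {} // keep the subscriber running
}
```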

The crawler microservice uses the Fan-In Fan-Out pattern to collect data from the Roku website.
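
The repository's actual implementation is not reproduced here, but in Go the pattern looks roughly like this: a pool of workers consumes URLs from one channel (fan-out) and a single channel gathers their results (fan-in). fetchRating is a stand-in for the real chromedp/API scraping code.

```go
package main

import (
	"fmt"
	"sync"
)

type rating struct {
	URL   string
	Stars float64
}

// fetchRating is a placeholder for the chromedp or API call.
func fetchRating(url string) rating {
	return rating{URL: url, Stars: 4.5}
}

func crawl(urls []string, workers int) []rating {
	jobs := make(chan string)
	results := make(chan rating)

	// Fan-out: N workers read from a single jobs channel.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range jobs {
				results <- fetchRating(url)
			}
		}()
	}

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Feed the jobs channel, then close it so workers can exit.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Fan-in: collect everything from the single results channel.
	var out []rating
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(crawl([]string{"https://example.com/a", "https://example.com/b"}, 4))
}
```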

Chromedp Solution Diagram

[architecture diagram]

API Solution Diagram

[architecture diagram]

Proof Of Concept Results

The results, comments, and conclusions of the PoC are documented here.

Running this project

Dependencies

  • Go 1.18
  • nats.go v1.13.1
  • grpc v1.43.0
  • protobuf v1.5.2
  • cobra v1.4.0
  • viper v1.10.1

Development environment

  • Go 1.18
  • Goland or other IDE
  • Docker

Microservices

There are three microservices that should be executed to run the system:

  • The first one is the reader, which parses the CSV and sends gRPC requests with the URLs
  • The second one is the listener, which runs the gRPC server, listening for the URLs that should be crawled and sending them to the NATS queue
  • The third one is the crawler, which collects the ratings using chromedp or an API call (see the sketch below)
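
For the chromedp variant, the extraction step could look roughly like the following. The URL and CSS selector are placeholders, since the real ones depend on the Roku page markup:

```go
package main

import (
	"context"
	"log"

	"github.com/chromedp/chromedp"
)

func main() {
	// Create a headless browser context.
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	// Navigate to a product page and read the text of a rating element.
	// Both the URL and the selector below are illustrative placeholders.
	var ratingText string
	err := chromedp.Run(ctx,
		chromedp.Navigate("https://example.com/some-channel"),
		chromedp.Text(`[data-testid="star-rating"]`, &ratingText, chromedp.ByQuery),
	)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("rating:", ratingText)
}
```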

The gRPC services and Protocol Buffers are defined at grpcapi/pb
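
As a purely illustrative sketch of the Reader-Client side, a call might look like this. CrawlerClient, SendURL, URLRequest, and the import path are invented names; check the generated code in grpcapi/pb for the real ones:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	// Hypothetical import path for the generated stubs in grpcapi/pb.
	pb "github.com/StevenRojas/natscrawler/grpcapi/pb"
)

func main() {
	// Connect to the listener's gRPC server (address is illustrative).
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Send one URL together with its request-id. These names are invented
	// for the sketch; the generated code defines the real ones.
	client := pb.NewCrawlerClient(conn)
	_, err = client.SendURL(ctx, &pb.URLRequest{
		RequestId: "req-123",
		Url:       "https://example.com/some-channel",
	})
	if err != nil {
		log.Fatal(err)
	}
}
```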

Run microservices

The application CLI is implemented with the Cobra and Viper libraries, so it is possible to override the configuration with flags and environment variables. The precedence when resolving a configuration value is: flag -> environment variable -> configuration file value
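
A minimal sketch of that precedence wiring, assuming illustrative flag, environment variable, and key names (each command defines its own):

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
	"github.com/spf13/viper"
)

func main() {
	cmd := &cobra.Command{
		Use: "crawler",
		Run: func(cmd *cobra.Command, args []string) {
			// Viper resolves in order: flag, then env var, then config file.
			fmt.Println("nats url:", viper.GetString("nats.url"))
		},
	}
	cmd.Flags().String("nats-url", "", "NATS server URL")

	// Lowest precedence: the configuration file.
	viper.SetConfigFile("config/local.yaml")
	_ = viper.ReadInConfig()

	// Middle precedence: an environment variable (name is illustrative).
	_ = viper.BindEnv("nats.url", "CRAWLER_NATS_URL")

	// Highest precedence: the command-line flag.
	_ = viper.BindPFlag("nats.url", cmd.Flags().Lookup("nats-url"))

	if err := cmd.Execute(); err != nil {
		fmt.Println(err)
	}
}
```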

Here are some examples of how to run the application:

  • ./reader -f csv/target_urls_test.csv runs the reader microservice
  • ./listener runs the listener microservice
  • ./crawler runs the crawler microservice

To run with Docker, use docker-compose build and then docker-compose up --scale crawler=10

Configuration

Each microservice has its configuration file at config/local.yaml, already set up to be used with Docker. To use the API option in the crawler microservice, set the crawler.use_api flag to true
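
For instance, the crawler's choice between the two modes presumably reduces to something like this (illustrative; the real selection logic lives in the crawler service):

```go
package main

import (
	"log"

	"github.com/spf13/viper"
)

func main() {
	viper.SetConfigFile("config/local.yaml")
	if err := viper.ReadInConfig(); err != nil {
		log.Fatal(err)
	}

	// Branch on the crawler.use_api flag from the configuration file.
	if viper.GetBool("crawler.use_api") {
		log.Println("collecting ratings via the API")
	} else {
		log.Println("collecting ratings via chromedp")
	}
}
```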

TODOs

  • Check chromedp image configuration to improve performance
  • Add unit tests, especially mocks for DB, NATS and Browser
