The goal of this project is to implement a proof of concept for a system that collects rating information from the Roku website using microservices and NATS.
Two solutions were implemented: one uses chromedp to perform the crawling and the other uses an API call. Both
solutions share the same architecture; the only difference is the use of a chromedp container, as shown in the diagrams below.
The Reader Client parses a CSV file with URLs and sends them to the system using gRPC calls.
The Request Listener receives the gRPC requests and puts them into a NATS queue.
The main reason to use a queue is that crawling a page takes several seconds, so we need a way to keep a backlog of URLs waiting to be processed.
Each URL is associated with a request-id generated by the Reader Client, so it is possible to trace a request to its final result.
The queue also makes it possible to scale the system by having multiple Web Crawlers (or API Crawlers) collecting the ratings and storing them in a Postgres DB.
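As a rough illustration of the queueing step, the listener could publish each URL together with its request-id like this (a minimal sketch using nats.go; the subject name and message shape are assumptions, not the actual implementation):

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

// URLRequest pairs a URL with the request-id generated by the Reader Client,
// so the final result can be traced back to the original request.
// The field names here are hypothetical.
type URLRequest struct {
	RequestID string `json:"request_id"`
	URL       string `json:"url"`
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	msg, err := json.Marshal(URLRequest{RequestID: "req-123", URL: "https://channelstore.roku.com/details/example"})
	if err != nil {
		log.Fatal(err)
	}

	// "urls" is an assumed subject name; crawler instances subscribe to it
	// as a queue group so each URL is processed by exactly one crawler.
	if err := nc.Publish("urls", msg); err != nil {
		log.Fatal(err)
	}
}
```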
The crawler microservice uses the Fan-In/Fan-Out pattern to collect data from the Roku website.
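The pattern can be sketched in a few lines of Go (illustrative only; `fetchRating` and the channel layout are hypothetical stand-ins for the real crawler code): URLs are fanned out to a fixed pool of workers, and their results are fanned back in on a single channel.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchRating is a stand-in for the real scraping/API call.
func fetchRating(url string) string {
	return "rating for " + url
}

func main() {
	urls := make(chan string)
	results := make(chan string)

	// Fan-out: a fixed pool of workers consumes URLs concurrently.
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range urls {
				results <- fetchRating(u)
			}
		}()
	}

	// Fan-in: close the results channel once every worker is done,
	// so the single consumer below can range over it.
	go func() {
		wg.Wait()
		close(results)
	}()

	go func() {
		for _, u := range []string{"url-1", "url-2", "url-3"} {
			urls <- u
		}
		close(urls)
	}()

	for r := range results {
		fmt.Println(r)
	}
}
```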
The results, comments, and conclusions of the PoC are here.
- Go 1.18
- nats.go v1.13.1
- grpc v1.43.0
- protobuf v1.5.2
- cobra v1.4.0
- viper v1.10.1
- Go 1.18
- GoLand or another IDE
- Docker
There are three microservices that should be executed to run the system:
- The first one is the `reader`, which parses the CSV and sends gRPC requests with the URLs
- The second microservice is the `listener`, which runs the gRPC server listening for the URLs that should be parsed and sends them to the NATS queue
- The third one is the `crawler`, which collects the ratings using chromedp or an API URL call
The gRPC services and Protocol Buffers are defined at `grpcapi/pb`.
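As an illustration of how the reader might call the generated client (a sketch only; the import path, client, method, and message names below are hypothetical, not the actual definitions in `grpcapi/pb`):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"

	pb "example.com/poc/grpcapi/pb" // hypothetical module path
)

func main() {
	// Connect to the listener's gRPC server (the address is an assumption).
	conn, err := grpc.Dial("localhost:50051", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// NewURLCollectorClient, SendURL, and URLRequest are hypothetical names;
	// the real ones are generated from the protos in grpcapi/pb.
	client := pb.NewURLCollectorClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if _, err := client.SendURL(ctx, &pb.URLRequest{
		RequestId: "req-123",
		Url:       "https://channelstore.roku.com/details/example",
	}); err != nil {
		log.Fatal(err)
	}
}
```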
The application CLI is implemented with the Cobra and Viper libraries, so it is possible to override the configuration with flags and environment variables.
The precedence when resolving a configuration value is: flag -> environment variable -> configuration file field.
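A minimal sketch of how this precedence is commonly wired with Cobra and Viper (the flag name is illustrative; `crawler.use_api` is the key mentioned in the configuration notes below):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/spf13/cobra"
	"github.com/spf13/viper"
)

func main() {
	cmd := &cobra.Command{
		Use: "crawler",
		Run: func(cmd *cobra.Command, args []string) {
			// Viper resolves the key in precedence order:
			// flag > environment variable > config file.
			fmt.Println("use_api =", viper.GetBool("crawler.use_api"))
		},
	}

	// Highest precedence: an explicitly set flag.
	cmd.Flags().Bool("use-api", false, "collect ratings via the API instead of chromedp")
	_ = viper.BindPFlag("crawler.use_api", cmd.Flags().Lookup("use-api"))

	// Middle precedence: environment variables, e.g. CRAWLER_USE_API=true.
	viper.SetEnvKeyReplacer(strings.NewReplacer(".", "_"))
	viper.AutomaticEnv()

	// Lowest precedence: the per-service config file.
	viper.SetConfigFile("config/local.yaml")
	_ = viper.ReadInConfig() // ignore a missing file in this sketch

	if err := cmd.Execute(); err != nil {
		fmt.Println(err)
	}
}
```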
Here are some examples of how to run the application:
- `./reader -f csv/target_urls_test.csv` runs the `reader` microservice
- `./listener` runs the `listener` microservice
- `./crawler` runs the `crawler` microservice
To run with Docker, just use `docker-compose build` and then `docker-compose up --scale crawler=10`.
Each microservice has its configuration file at `config/local.yaml`, already set up to be used with Docker.
To use the API option in the crawler microservice, set the `crawler.use_api` flag to `true`.
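For reference, the relevant fragment of `config/local.yaml` might look like this (the file layout is an assumption; only the `crawler.use_api` key is mentioned above):

```yaml
crawler:
  # true = collect ratings via the API call; false = use the chromedp crawler
  use_api: true
```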
- Check the `chromedp` image configuration to improve performance
- Add unit tests, especially mocks for the DB, NATS, and browser

