# API server and scraper application
The stack consists of the following elements:
- Python 3.9.x
- Postgres 13
- Redis 5.x
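A minimal `docker-compose.yml` along these lines would wire the stack together. This is an illustrative sketch only: the service names are taken from the scale command below, but the images, build contexts, and any services not shown are assumptions — the file in the repository is authoritative.

```yaml
# Illustrative sketch -- the repository's docker-compose.yml is authoritative.
version: "3.8"
services:
  db:
    image: postgres:13
    ports:
      - "127.0.0.1:5432:5432"
  redis:
    image: redis:5
  api-service:
    build: .
    depends_on: [db, redis]
  worker-scraping:
    build: .
    depends_on: [db, redis]
  worker-mapping:
    build: .
    depends_on: [db, redis]
```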
To replicate the development environment faithfully on your machine, we use Docker Compose. Using Docker for development is highly recommended; if you know what you are doing, you can instead set up the individual parts of the stack yourself.
If you are on Windows or macOS, use Docker Desktop for your operating system. On Linux, install the `docker` and `docker-compose` tools via your distro's preferred installation method.
Before running the Docker cluster for the first time, build the images:

```shell
docker-compose build
```
To start the Docker cluster, run:

```shell
docker-compose up
```
Or, to scale selected services, start it with:

```shell
docker-compose up -d --scale worker-scraping=3 --scale worker-mapping=2 --scale api-service=1
```
Once it's up, the development server is available on port 80.
To remove all containers defined in `docker-compose.yml`, stop the cluster and run:

```shell
docker-compose down --remove-orphans
```

or, to also remove volumes and images:

```shell
docker-compose down -v --rmi all --remove-orphans
```

After that, set everything up again as if for the first time.
A database service running a PostgreSQL instance. It can be accessed at 127.0.0.1:5432.
A Celery monitoring utility, accessible at 127.0.0.1:6660. It is used for monitoring and checking tasks (scraping, mapping/transforming), including the time consumed and any errors.
A mini Flask web API with a single endpoint.
A reverse proxy for our mini Flask API, accessible at 127.0.0.1. It accepts a date for our single endpoint; for example, 127.0.0.1/2021/04/16 returns the result for the date 2021-04-16.
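The date-in-path convention can be illustrated with a small Python sketch. The helper name here is ours, not part of the API:

```python
from datetime import date

def parse_date_path(path: str) -> str:
    """Convert a /YYYY/MM/DD request path into an ISO date string.

    Hypothetical helper illustrating the endpoint's URL convention.
    """
    year, month, day = (int(part) for part in path.strip("/").split("/"))
    return date(year, month, day).isoformat()

print(parse_date_path("/2021/04/16"))  # prints 2021-04-16
```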
The beat service acts as a cron job: it triggers a scraping task every 60 seconds.
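In Celery terms, the 60-second trigger could be expressed as a beat schedule entry like the following sketch. The entry and task names (`tasks.scrape`) are hypothetical placeholders, not taken from the actual codebase:

```python
# Hypothetical Celery beat schedule fragment; the task path is a placeholder.
beat_schedule = {
    "scrape-every-60s": {
        "task": "tasks.scrape",  # placeholder task path
        "schedule": 60.0,        # run every 60 seconds
    },
}
```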
These two services do the scraping and mapping/transformation jobs. They are independent of each other, so each can be scaled on its own.