ETL pipeline for monitoring cryptocurrency prices and building an analytical dashboard on the data collected in a data warehouse.
The following system diagram represents the project structure. The system is composed of 4 Docker containers with the following purposes:
- pipeline performs the complete ETL cycle (cron job)
- warehouse contains the main storage of cleaned data (ClickHouse)
- stagedb plays the role of a backup storage for raw data (MongoDB)
- dashboard generates and shows reports from the cleaned data (Metabase)
Additionally, dashboard uses a PostgreSQL database container as its internal storage.
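The division of roles above can be illustrated with a minimal sketch of one ETL cycle. It is not the code from `etl.py` or `run.py`: the function names are hypothetical, plain Python lists stand in for MongoDB (stagedb) and ClickHouse (warehouse), and the payload fields `data`, `id` and `priceUsd` are assumptions about the shape of the API response.

```python
from datetime import datetime, timezone


def transform(raw, collected_at):
    """Keep only the fields a report needs and cast prices to float."""
    rows = []
    for asset in raw.get("data", []):  # "data" envelope is an assumption
        try:
            row = {
                "asset_id": asset["id"],
                "price_usd": float(asset["priceUsd"]),
                "collected_at": collected_at,
            }
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed records instead of failing the whole run
        rows.append(row)
    return rows


def run_cycle(fetch, stage, warehouse):
    """One ETL pass: extract, back up the raw payload, clean, load."""
    collected_at = datetime.now(timezone.utc)
    raw = fetch()  # extract; injected so the sketch runs offline
    stage.append({"collected_at": collected_at, "payload": raw})  # stagedb role
    rows = transform(raw, collected_at)
    warehouse.extend(rows)  # warehouse role
    return len(rows)
```

Calling `run_cycle(fetch, stage, warehouse)` with any zero-argument `fetch` callable and two lists returns the number of cleaned rows loaded. Storing the untouched payload before transforming it mirrors the backup role of stagedb: if the cleaning rules change, the raw data can be reprocessed.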
├── docs
│
├── pipeline
│   ├── cron              # Scheduler configs
│   ├── docker            # Environment configs
│   ├── logs              # Logs for pipeline service
│   │
│   ├── src               # ETL source code
│   │   ├── config.py     # Environment parsers
│   │   ├── db.py         # Warehouse management
│   │   ├── etl.py        # ETL functions
│   │   ├── run.py        # Pipeline script
│   │   └── stagedb.py    # StageDB management
│   │
│   └── tests             # Unit tests for ETL source code
│
├── warehouse
│   ├── db                # Warehouse database files (ClickHouse)
│   └── logs              # Logs for warehouse service
│
├── stagedb
│   └── db                # StageDB database files (MongoDB)
│
└── dashboard
    ├── db                # Dashboard database files (PostgreSQL)
    ├── docker            # Environment configs
    └── logs              # Logs for dashboard service
To run the project, perform the following steps:
- Clone the repo to your machine:
git clone https://github.com/Genvekt/coincap_monitor.git
cd coincap_monitor
- Create a .env file with the following environment parameters:
  - API_KEY: key for the CoinCap API that you must obtain yourself
  - API_URL: URL of the CoinCap API
  - STAGEDB_HOST, STAGEDB_DB, STAGEDB_USER, STAGEDB_PASSWORD, STAGEDB_PORT: MongoDB access data
  - CLICKHOUSE_HOST, CLICKHOUSE_DB, CLICKHOUSE_USER, CLICKHOUSE_PASSWORD, CLICKHOUSE_PORT: ClickHouse access data
  - POSTGRES_HOST, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_PORT: PostgreSQL access data

  Example .env file:
  API_KEY={YOUR_API_KEY}
  API_URL=http://api.coincap.io/v2
  STAGEDB_HOST=stagedb
  STAGEDB_DB=stagedbdb
  STAGEDB_USER=stagedbuser
  STAGEDB_PASSWORD={YOUR_MONGODB_PASSWORD}
  STAGEDB_PORT=27017
  CLICKHOUSE_HOST=warehouse
  CLICKHOUSE_DB=clickhousedb
  CLICKHOUSE_USER=clickhouseuser
  CLICKHOUSE_PASSWORD={YOUR_CLICKHOUSE_PASSWORD}
  CLICKHOUSE_PORT=9000
  POSTGRES_HOST=dashboard_db
  POSTGRES_DB=postgres
  POSTGRES_USER=postgres
  POSTGRES_PASSWORD={YOUR_POSTGRESQL_PASSWORD}
  POSTGRES_PORT=5432
- Run the application:
docker network create CoinCapNet
docker-compose up --build -d
- Stop the application (the -v flag also removes the volumes declared in the compose file):
docker-compose down -v
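The database parameters in the .env file share one naming pattern: a service prefix followed by HOST, DB, USER, PASSWORD or PORT. As an illustration of how such variables might be parsed on the Python side, here is a minimal sketch. It is not the contents of `config.py`; `DbSettings` and `load_db_settings` are hypothetical names, and the sketch assumes the variables are already present in the process environment (for example, passed in by Docker Compose).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DbSettings:
    host: str
    db: str
    user: str
    password: str
    port: int


def load_db_settings(prefix, env=None):
    """Build DbSettings from <PREFIX>_HOST, _DB, _USER, _PASSWORD and _PORT."""
    env = os.environ if env is None else env
    fields = ("HOST", "DB", "USER", "PASSWORD", "PORT")
    # Report every missing or empty variable at once, not just the first one.
    missing = [f"{prefix}_{name}" for name in fields if not env.get(f"{prefix}_{name}")]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return DbSettings(
        host=env[f"{prefix}_HOST"],
        db=env[f"{prefix}_DB"],
        user=env[f"{prefix}_USER"],
        password=env[f"{prefix}_PASSWORD"],
        port=int(env[f"{prefix}_PORT"]),  # ports arrive as strings
    )
```

With the example .env values, `load_db_settings("STAGEDB")` would yield host `stagedb` and port `27017` as an integer. Failing early with the full list of missing variables makes a misconfigured .env file easier to diagnose than a connection error later in the pipeline.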