Spacebox is a set of open-source tools for indexing and storing chain data, with ClickHouse as the storage backend. It is built to provide quick access to large data sets, keep the indexed data consistent, and stay lightweight to set up.
The overall architecture is shown in the diagram:
The following SDK modules are supported in the current version:
- bank
- core
- auth
- authz
- distribution
- gov
- mint
- staking
- slashing
- feegrant
- ibc
- liquidity
The stack consists of the following components:
- Spacebox-crawler - pulls and parses blocks, then passes them to Kafka
- Crawler MongoDB - lightweight DB that stores the status of crawled blocks (e.g., OK, Error, missing)
- Apache Kafka - data broker, a buffer between the crawler and the main DB
- ClickHouse - column-oriented DB that stores the indexed data
- Zookeeper - helps maintain the Kafka cluster (to be removed in release)
- Kafka UI - helps track and check performance (to be removed in release)
This architecture was chosen to keep the parsed data consistent: not a single block should be missed, nor a single tx message. Separating the data processing layers also allows relatively easy modification; for example, you can swap the main DB for one that suits your project better.
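The "no block missed" goal can be illustrated with a small sketch. This is not the crawler's actual implementation: the function name `find_missing_heights` and the in-memory list of heights are assumptions made for illustration. Given the heights recorded as processed, it reports every height in the requested range that has no record, which is the kind of check a block status store makes possible.

```python
from typing import Iterable, List


def find_missing_heights(processed: Iterable[int], start: int, stop: int) -> List[int]:
    """Return the heights in [start, stop] that are absent from `processed`."""
    seen = set(processed)
    return [h for h in range(start, stop + 1) if h not in seen]


if __name__ == "__main__":
    # Hypothetical data: two heights in the range were never recorded.
    done = [5048767, 5048768, 5048770, 5048772]
    print(find_missing_heights(done, 5048767, 5048772))
```

Any height returned by such a check would need to be crawled again before the data set can be considered complete.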
First of all, you'll need an x86-64 Linux machine, preferably with a fast drive, with Docker Engine and Docker Compose installed, and, of course, the node you're going to pull chain data from.
To start Spacebox, populate .env with chain settings such as the RPC and gRPC endpoints and the start/stop heights:
START_HEIGHT=5048767 # Start block height
STOP_HEIGHT=0 # Stop block height, 0 for actual height
WORKERS_COUNT=15 # go workers to pull data in async mode
SUBSCRIBE_NEW_BLOCKS=true # pull actual blocks
# Chain settings
CHAIN_PREFIX=cosmos # Prefix of indexing chain
WS_ENABLED=true # Websocket enabled
RPC_URL=http://0.0.0.0:26657 # RPC API
GRPC_URL=0.0.0.0:9090 # GRPC API, no http(s):// prefix
GRPC_SECURE_CONNECTION=false # GRPC secure connection
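Before starting the containers, it can help to sanity-check these values. The sketch below is a hypothetical helper, not part of Spacebox: `parse_env` and `check_settings` are illustrative names, and the rules are assumptions inferred from the comments above (numeric heights and worker count, a stop height of 0 or not lower than the start height, no scheme prefix on GRPC_URL). The parser is deliberately simple and does not support values that contain `#`.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, dropping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip full-line and trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def check_settings(env: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    for key in ("START_HEIGHT", "STOP_HEIGHT", "WORKERS_COUNT"):
        if not env.get(key, "").isdigit():
            problems.append(f"{key} must be a non-negative integer")
    if "://" in env.get("GRPC_URL", ""):
        problems.append("GRPC_URL must not include a scheme prefix")
    start, stop = env.get("START_HEIGHT", ""), env.get("STOP_HEIGHT", "")
    if start.isdigit() and stop.isdigit() and int(stop) != 0 and int(stop) < int(start):
        problems.append("STOP_HEIGHT must be 0 or not lower than START_HEIGHT")
    return problems


if __name__ == "__main__":
    # Inline sample mirroring a few of the settings above.
    sample = "START_HEIGHT=5048767\nSTOP_HEIGHT=0\nWORKERS_COUNT=15\nGRPC_URL=0.0.0.0:9090\n"
    print(check_settings(parse_env(sample)))
```

To check a real file, pass its contents instead of the inline sample, for example `check_settings(parse_env(open(".env").read()))`.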
Once .env is filled in, start all containers with:
docker-compose up -d
To stop all containers use:
docker-compose down
It is also a good idea to give Docker permissions and ownership for the ./volumes folder (to be fixed in release):
chown -R 1001:1001 volumes/
It usually takes about 2 minutes to start everything. Check the log of each container with docker logs <container_name> to see what's going on under the hood. It is also possible to adjust some parameters on the fly: edit .env and restart the appropriate container.
If you're indexing a node that runs on the same host machine, you may also need to add the following lines to the crawler section of docker-compose.yaml:
ports:
- '2112:2112'
extra_hosts:
- "host.docker.internal:host-gateway"
and set the RPC and gRPC addresses in .env accordingly:
RPC_URL=http://host.docker.internal:26657
GRPC_URL=host.docker.internal:9090
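To confirm that the RPC address is reachable before starting the crawler, a quick status query can help. The sketch below assumes the node exposes the standard Tendermint/CometBFT RPC `/status` endpoint on the configured port; `summarize_status` and `fetch_status` are hypothetical helper names, and the sample response is trimmed and made up for illustration.

```python
import json
import urllib.request
from typing import Any, Dict


def summarize_status(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the chain id, latest height and sync state from a /status response."""
    result = payload["result"]
    sync = result["sync_info"]
    return {
        "network": result["node_info"]["network"],
        "latest_height": int(sync["latest_block_height"]),  # heights arrive as strings
        "catching_up": bool(sync["catching_up"]),
    }


def fetch_status(rpc_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET <rpc_url>/status and return the summarized fields."""
    url = f"{rpc_url.rstrip('/')}/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize_status(json.load(resp))


if __name__ == "__main__":
    # Offline demonstration with a trimmed, made-up response body.
    sample = {
        "result": {
            "node_info": {"network": "example-chain-1"},
            "sync_info": {"latest_block_height": "5048800", "catching_up": False},
        }
    }
    print(summarize_status(sample))
```

From the host, this would be called as `fetch_status("http://localhost:26657")`; the host.docker.internal name is meant to be resolved from inside the crawler container.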
Both spacebox-crawler and spacebox-writer expose Prometheus-compatible metrics for monitoring. Add the following to your Prometheus config:
- job_name: 'spacebox-crawler'
  scrape_interval: 5s
  static_configs:
    - targets: ['localhost:2112']
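To verify what the scrape target returns without a running Prometheus, the text exposition format can be read directly. The sketch below is a minimal parser under stated assumptions: `parse_metrics` is a hypothetical helper, `example_blocks_total` is an invented metric name, and the presence of `go_goroutines` in the crawler's output is assumed rather than confirmed by this document.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each sample (name plus label set) to its value, ignoring comments and timestamps."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        if "}" in line:
            cut = line.rindex("}") + 1  # keep the label set attached to the name
            name, rest = line[:cut], line[cut:].split()
        else:
            name, *rest = line.split()
        if rest:
            samples[name] = float(rest[0])  # a trailing timestamp, if any, is ignored
    return samples


if __name__ == "__main__":
    # Made-up sample in the Prometheus text exposition format.
    sample = (
        "# HELP go_goroutines Number of goroutines that currently exist.\n"
        "# TYPE go_goroutines gauge\n"
        "go_goroutines 42\n"
        'example_blocks_total{status="ok"} 1027\n'
    )
    print(parse_metrics(sample))
```

With the port mapping shown earlier, the real text would come from `http://localhost:2112/metrics`, since `/metrics` is the default path Prometheus scrapes when the job does not override it.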
Example of the Grafana dashboard:
TODO: add Grafana board JSON