
database_from_Bitcoin_Core

This project extracts Bitcoin blockchain data from Bitcoin Core via its RPC API using a Python ETL process and stores the data in Parquet files. The schema for the database is as follows:

| Schema       | Field       | Type   | Description                     | Index      |
| ------------ | ----------- | ------ | ------------------------------- | ---------- |
| Blocks       | height      | int32  | Height of the block             | Unique     |
|              | block_hash  | string | Hash of the block               |            |
|              | time        | int64  | Timestamp of the block (Unix)   |            |
|              | tx_count    | int32  | Number of transactions in block |            |
| Transactions | height      | int32  | Block height                    | Non-Unique |
|              | block_hash  | string | Hash of the related block       |            |
|              | txid        | string | Transaction ID                  |            |
|              | is_coinbase | bool_  | Is the transaction a coinbase?  |            |
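The two tables above can be sketched in code. The following is a minimal, hand-rolled validator (the field names come from the schema table; the validator itself is hypothetical and not part of the repository) showing the expected shape of one row per table:

```python
# Sketch of the two table schemas as plain Python type maps.
# Field names follow the README's schema table; int32/int64 both map
# to Python's int here, with the Parquet width noted in comments.

BLOCKS_SCHEMA = {
    "height": int,       # int32 in Parquet
    "block_hash": str,
    "time": int,         # int64 Unix timestamp
    "tx_count": int,     # int32
}

TRANSACTIONS_SCHEMA = {
    "height": int,
    "block_hash": str,
    "txid": str,
    "is_coinbase": bool,
}

def validate_row(row: dict, schema: dict) -> bool:
    """Return True if the row has exactly the schema's fields and types."""
    return set(row) == set(schema) and all(
        isinstance(row[name], type_) for name, type_ in schema.items()
    )
```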

Bitcoin Core setup

This guide assumes you're already running a full Bitcoin node on your machine and have properly configured the RPC API. If this is not the case, please follow these instructions first: https://bitcoin.org/en/full-node. Once your node is fully synced you can start using this repo to generate your database.

Python environment setup

Setting up a virtual environment:

python3 -m venv venv
source venv/bin/activate
pip install -e . 

Run deactivate in your shell when done.

You need to set your RPC API credentials, RPC_USER and RPC_PASSWORD, for the code to work. The code uses the python-dotenv library, so add them to a .env file in your root folder like this:

RPC_USER=your_rpc_username
RPC_PASSWORD=your_rpc_password
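As an illustration of what python-dotenv's load_dotenv() does with that file, here is a minimal hand-rolled parser (a sketch, not the project's code and not a replacement for the library):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal sketch of python-dotenv's behavior: read KEY=VALUE lines
    from a .env file and place them into os.environ without overriding
    variables that are already set."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# The ETL code can then read the credentials, e.g.:
# rpc_user = os.environ["RPC_USER"]
```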

Then, it's important to run the following unit tests first, as they ensure the RPC API is properly set up.

python -m unittest discover

ETL process

The following commands execute the complete population of each dataset:

python src/blocks/populate_blocks.py
python src/transactions/populate_transactions.py
...

The same commands accept optional parameters; for example:

# Selecting 'start' and 'end' block height
python src/blocks/populate_blocks.py --start 10000 --end 20000
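A sketch of how such --start/--end flags could be wired up with argparse (the defaults and help text here are assumptions, not the script's actual code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for populate_blocks.py's optional parameters."""
    parser = argparse.ArgumentParser(description="Populate the blocks dataset.")
    parser.add_argument("--start", type=int, default=0,
                        help="first block height to fetch (inclusive)")
    parser.add_argument("--end", type=int, default=None,
                        help="last block height to fetch (defaults to the chain tip)")
    return parser
```

With this shape, omitting both flags would populate the full chain, matching the "complete population" commands above.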

DQ process

The following commands execute the relevant Data Quality checks for each dataset:

python src/blocks/blocks_dq.py
python src/transactions/transactions_dq.py
...
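As an illustration of the kind of check a *_dq.py script could run, here is a hypothetical function (not the repository's code) verifying the Blocks index constraints from the schema table: heights should be unique, and a populated range should have no gaps:

```python
def check_blocks_dq(heights: list[int]) -> list[str]:
    """Hypothetical Data Quality check for the Blocks dataset.
    Returns a list of issue descriptions; empty means the checks passed."""
    issues = []
    # The schema declares height as a Unique index.
    if len(set(heights)) != len(heights):
        issues.append("duplicate block heights found")
    # A populated [min, max] range should be contiguous.
    if heights:
        missing = set(range(min(heights), max(heights) + 1)) - set(heights)
        if missing:
            issues.append(f"missing heights: {sorted(missing)}")
    return issues
```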

Workflow automation

You can run these files manually or set up an automated workflow using cron (on Linux).

For the cron job to work seamlessly I prefer to use SSH. I start the agent on each machine reboot and keep the agent, environment, and key paths properly configured by adding the relevant configuration to ~/.bashrc.

Cron can be set up as follows:

crontab -e

Line to add to cron for a scheduled midnight run:

0 0 * * * ~/Projects/database_from_Bitcoin_Core/workflow.sh

Example manual run:

~/Projects/database_from_Bitcoin_Core/workflow.sh
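A workflow.sh along these lines might look as follows. This is a sketch only: the README does not show the actual script, and the ordering of steps is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical workflow.sh: activate the venv, refresh each dataset,
# then run the Data Quality checks. Exact contents are an assumption.
set -euo pipefail

cd ~/Projects/database_from_Bitcoin_Core
source venv/bin/activate

python src/blocks/populate_blocks.py
python src/transactions/populate_transactions.py

python src/blocks/blocks_dq.py
python src/transactions/transactions_dq.py
```

Remember to make the script executable (chmod +x workflow.sh) so cron can run it.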
