Clone this wiki locally
This Hive Serde will be used to represent blocks and transactions from crypto ledgers, such as the Bitcoin Blockchain, as tables in Hive. These tables can then be joined with other tables containing other relevant information, such as stock market movements or weather patterns. The following Hive Serde is available
- BitcoinBlockSerde: Represents information about BitcoinBlock(s) containing transactions transparently as a table in Hive
Note that the Hadoop File Format and Hive Serde are available on Maven Central and you do not need to build and publish it to a local Maven anymore to use it. Furthermore, they are available from the releases page.
git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger
You can build the application by changing to the directory hadoopcryptoledger/hiveserde and using the following command:
../gradlew clean build
After the build you will find the Serde in ./build/libs/hadoopcryptoledger-hiveserde-1.0.3.jar It is assumed that you have build hadoopcryptoledger-fileformat-1.0.3.jar Alternatively you can download both from the release page.
You need to put it into a local directory (e.g. /tmp/hadoopcryptoledger-hiveserde-1.0.3.jar) and execute hive. Enter the following line every time you start hive:
add jar /tmp/hadoopcryptoledger-fileformat-1.0.3.jar add jar /tmp/hadoopcryptoledger-hiveserde-1.0.3.jar
You can create any database in hive:
create database blockchains;
You can then create a table containing Bitcoin Blockchain data as follows:
Create table BitcoinBlockchain ROW FORMAT SERDE 'org.zuinnote.hadoop.bitcoin.hive.serde.BitcoinBlockSerde' STORED AS INPUTFORMAT 'org.zuinnote.hadoop.bitcoin.format.mapred.BitcoinBlockFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.mapred.lib.NullOutputFormat' LOCATION '/path/to/bitcoin/blockchain/files';
The table has the following structure:
magicno binary from deserializer blocksize int from deserializer version int from deserializer time int from deserializer bits binary from deserializer nonce int from deserializer transactioncounter bigint from deserializer hashprevblock binary from deserializer hashmerkleroot binary from deserializer transactions array<struct<version:int,incounter:binary,outcounter:binary,listofinputs:array<struct<prevtransactionhash:binary,previoustxoutindex:bigint,txinscriptlength:binary,txinscript:binary,seqno:bigint>>,listofoutputs:array<struct<value:bigint,txoutscriptlength:binary,txoutscript:binary>>,locktime:int>> from deserializer
Find more examples on how to configure the InputFileFormat via TBLPROPERTIES here: https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/hive-bitcoin
Hint: It is recommended to select the relevant transactions and blocks from the provided tables and do CTAS to store them in another table using a format suitable for analytics, such as ORC or Parquet.
Understanding the structure of Bitcoin data: