Hive UDF

Jörn Franke edited this page Dec 14, 2019 · 26 revisions

The Hive UDF adds cryptoledger-specific functionality to facilitate working with the data in Hive. The same results can often also be achieved with standard Hive functionality (see here for examples using standard Hive with the Bitcoin blockchain).

Currently the Hive UDF has the following functionality:

  • Bitcoin
    • hclBitcoinScriptPattern: extracts information about the destination of a transaction based on txOutScript. It returns one of the following:
      • (1) for a transfer to pay-to-witness-public-key-hash (P2WPKH): P2WPKH_address
      • (2) for a transfer to pay-to-witness-public-key-hash (P2WPKH) nested in BIP16 P2SH: P2WPKHP2SH_address
      • (3) for a transaction for 1-of-2 multi-signature version 0 pay-to-witness-script-hash (P2WSH): P2WSH_address
      • (4) for a standard transfer to a Bitcoin address: "bitcoinaddress_ADDRESS", where ADDRESS is the Bitcoin address
      • (5) for an (obsolete) transfer to a public key: "bitcoinpupkey_PUBKEY", where PUBKEY is the public key
      • (6) for an output that cannot be spent: "unspendable"
      • (7) for an output that anyone can spend: "anyone"
      • (8) for a transaction puzzle: "puzzle_HASH256", where HASH256 is the puzzle
      • (9) in all other cases (a different type of Bitcoin transaction): null
    • hclBitcoinTransactionHash: calculates the hash of a BitcoinTransaction (txid) and returns it as byte array. This can be used to link the input of a transaction with the output of the originating transaction to build a transaction graph.
    • hclBitcoinTransactionHashSegwit: calculates the hash of a BitcoinTransaction including Segwit information (wtxid) and returns it as byte array. This can be used to link the input of a transaction with the output of the originating transaction to build a transaction graph.
  • Namecoin
    • hclNamecoinExtractField: extracts further information from a firstUpdate or update Namecoin operation (e.g. domain name or domain information). Returns an array of size 2, where the first item is the domain name and the second item is the domain information. As input, pass the output script of a transaction.
    • hclNamecoinGetNameOperation: extracts the type of Namecoin operation (e.g. OP_NAME_NEW, OP_NAME_FIRSTUPDATE, OP_NAME_UPDATE or unknown)
  • Ethereum
    • hclEthereumGetTransactionHash: returns the transaction hash of an EthereumTransaction
    • hclEthereumGetSendAddress: returns the send address (from) of an EthereumTransaction; you need to provide a chainId (1 Mainnet, 2 Morden, 3 Ropsten)
    • hclEthereumCalculateChainId: calculates the chain ID of an EthereumTransaction
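
To make two of the calculations named in the list above more concrete, here is a minimal Python sketch. It is not the UDF implementation. It assumes that the Bitcoin transaction hash is the usual double SHA-256 over the serialized transaction (without witness data for the txid) and that the chain ID is derived from the signature value v as described in EIP-155. The function names are hypothetical and chosen only for illustration; whether the UDFs return the hash bytes in internal or reversed (display) order is not stated on this page.

```python
import hashlib
from typing import Optional


def double_sha256(data: bytes) -> bytes:
    """Return SHA-256(SHA-256(data)), the hash Bitcoin uses for transaction ids."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def txid_display_hex(serialized_tx: bytes) -> str:
    """Hex form commonly shown by block explorers: the double hash, byte-reversed.

    Assumption: serialized_tx is the transaction serialization without
    witness data; this sketch does not parse or validate it.
    """
    return double_sha256(serialized_tx)[::-1].hex()


def eip155_chain_id(v: int) -> Optional[int]:
    """Derive the chain id from the signature value v following EIP-155.

    EIP-155 sets v = chain_id * 2 + 35 (or + 36). Signatures created
    before EIP-155 use v = 27 or 28 and carry no chain id, so None is returned.
    """
    if v >= 35:
        return (v - 35) // 2
    return None
```

To build a transaction graph, the hash of a transaction is compared with the previous-transaction hash stored in the inputs of later transactions; both sides must use the same byte order for the comparison to match.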

Build

Note that the Hadoop File Format, the Hive Serde and the Hive UDF are available on Maven Central, so you no longer need to build them and publish them to a local Maven repository to use them. Furthermore, they are available from the releases page.

git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger

You can build the application by changing to the directory hadoopcryptoledger/hiveudf and using the following command:

../gradlew clean build

Deploy

Before you can deploy the Hive UDF, you need to deploy the Hive Serde.

After the build you will find the UDF in ./build/libs/hadoopcryptoledger-hiveudf-1.2.1.jar. Alternatively, you can download it from the releases page.

You need to put it into a local directory (e.g. /tmp/hadoopcryptoledger-hiveudf-1.2.1.jar) and start Hive.

Note that the Ethereum UDFs (and only they) require the BouncyCastle library (download here). In this case, enter the following line every time you start Hive:

add jar /tmp/bcprov-ext-jdk15on-1.64.jar;

Enter the following line every time you start Hive:

add jar /tmp/hadoopcryptoledger-hiveudf-1.2.1.jar;

Furthermore, you need to add the UDF in Hive either as

Use