Crypto UDF for Impala, including support for MD5, SHA and AES (and others).


impala-crypto-udf by Scalefree International GmbH

This repo contains user-defined functions (UDFs) for Apache Impala that make cryptographic functions available in the query language, for example to encrypt and decrypt data or to hash values. This is required when building a GDPR-compliant data lake, with or without Data Vault 2.0.

Supported Algorithms

This library is based on Crypto++ (https://cryptopp.com/) and aims to implement the following functionality:

AES and AES candidates (encryption/decryption)

AES (Rijndael), RC6, MARS, Twofish, Serpent, CAST-256

other block ciphers (encryption/decryption)

ARIA, Blowfish, Camellia, CHAM, HIGHT, IDEA, Kalyna (128/256/512), LEA, SEED, RC5, SHACAL-2, SIMECK, SIMON (64/128), Skipjack, SPECK (64/128), SM4, Threefish (256/512/1024), Triple-DES (DES-EDE2 and DES-EDE3), TEA, XTEA

block cipher modes of operation (encryption/decryption)

ECB, CBC, CBC ciphertext stealing (CTS), CFB, OFB, counter mode (CTR)

hash functions

BLAKE2b, BLAKE2s, Keccak (F1600), SHA-1, SHA-2, SHA-3, SHAKE (128/256), SipHash, Tiger, RIPEMD (128/160/256/320), SM3, WHIRLPOOL

public-key cryptography (encryption/decryption)

RSA, DSA, Deterministic DSA (RFC 6979), ElGamal, Nyberg-Rueppel (NR), Rabin-Williams (RW), EC-based German Digital Signature (ECGDSA), LUC, LUCELG, DLIES (variants of DHAES), ESIGN

padding schemes for public-key systems

PKCS#1 v2.0, OAEP, PSS, PSSR, IEEE P1363 EMSA2 and EMSA5

elliptic curve cryptography

ECDSA, Deterministic ECDSA (RFC 6979), ed25519, ECGDSA, ECNR, ECIES, x25519, ECDH, ECMQV

insecure or obsolescent algorithms retained for backwards compatibility and historical value

MD2, MD4, MD5, Panama Hash, DES, ARC4, SEAL 3.0, WAKE-OFB, DESX (DES-XEX3), RC2, SAFER, 3-WAY, GOST, SHARK, CAST-128, Square

Funding

This project is funded by Scalefree to support cryptographic functions in Impala. These functions are required to secure a data lake and to support the deletion of consumer records, a requirement of the GDPR. Transparent, filesystem-level encryption is not sufficient for this purpose because it does not meet the legal requirements (consult your lawyers).

More information about GDPR and Data Vault 2.0 can be found at https://kb.scalefr.ee/lphesS

About Scalefree

Founded by Dan Linstedt and Michael Olschimke, Scalefree gives companies across the world the knowledge they need to leverage Big Data in an innovative way. Building upon its name, the services offered by Scalefree allow companies to build comprehensive Business Intelligence solutions without the confining and stifling concern of scalability in the long term.

That said, the focus of the company remains growing its consulting and partnership programs in which organizations from a variety of countries as well as industries can learn to leverage their own Business Intelligence solutions tailored to their own venture.

About Data Vault 2.0

Designed and developed for branches of the US government, Data Vault 2.0 reshapes the way individuals and organizations can utilize Big Data within their own businesses. Though its first iteration was developed for the large data load of a government agency, the newest version of Data Vault 2.0 is specifically designed to be easily implemented in a variety of capacities, leading to its success across the world in the financial, governmental, and automotive sectors, to name but a few.

Get Started

To get started:

  1. Install the Impala UDF development package: http://archive.cloudera.com/cdh7/ (see https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/managing-clusters/topics/cm-managing-parcels.html for how to install parcels on Cloudera)
  2. git clone https://github.com/ScalefreeCOM/impala-crypto-udf.git
  3. cmake .
  4. make

The crypto UDFs are built into build/. This directory contains test executables that you can run locally without the Impala service installed, as well as the shared object artifacts that can be run on Impala.
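To give a rough idea of what a locally runnable check can look like, here is a minimal sketch that does not need a running Impala service. Everything in it is an assumption made for illustration: `StringArg`, `HexEncode` and `RunLocalChecks` are hypothetical names, the struct is only a stand-in for a nullable pointer-plus-length string argument, and none of it is taken from this repository or from the Impala UDF headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// HYPOTHETICAL stand-in for a nullable string argument (pointer, length,
// null flag). It is not the Impala UDF API and not this repository's code.
struct StringArg {
    const std::uint8_t* ptr;
    std::size_t len;
    bool is_null;
};

// Hex-encodes raw bytes (for example a digest) as lowercase text.
// For NULL input this sketch returns an empty string; a real UDF would
// presumably return SQL NULL instead.
std::string HexEncode(const StringArg& in) {
    static const char kDigits[] = "0123456789abcdef";
    if (in.is_null) return std::string();
    std::string out;
    out.reserve(in.len * 2);
    for (std::size_t i = 0; i < in.len; ++i) {
        out.push_back(kDigits[in.ptr[i] >> 4]);    // high nibble
        out.push_back(kDigits[in.ptr[i] & 0x0F]);  // low nibble
    }
    return out;
}

// A local check: compares the function's output against known values.
bool RunLocalChecks() {
    const std::uint8_t bytes[] = {0xDE, 0xAD, 0xBE, 0xEF};
    const StringArg value{bytes, sizeof(bytes), false};
    const StringArg null_value{nullptr, 0, true};
    return HexEncode(value) == "deadbeef" && HexEncode(null_value).empty();
}
```

Calling `RunLocalChecks()` from a small `main` and returning a non-zero exit code on failure would be one way to turn this into a standalone test executable; the actual executables in build/ may be structured differently.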

How do I contribute code?

Our goal is to implement as much functionality from Crypto++ (https://cryptopp.com/) as possible.

Please send contributions to molschimke@scalefree.com

Find

We use GitHub issues to track bugs for this project. Find an issue that you would like to work on (or file one if you have discovered a new issue!). If no one is working on it, assign it to yourself, but only if you intend to work on it shortly.

It’s a good idea to discuss your intended approach on the issue. You are much more likely to have your patch reviewed and committed if you’ve already got buy-in from the impala-crypto-udf community before you start.

Fix

Now start coding! As you are writing your patch, please keep the following things in mind:

First, please include tests with your patch. If your patch adds a feature or fixes a bug and does not include tests, it will generally not be accepted. If you are unsure how to write tests for a particular component, please ask on the issue for guidance.

Second, please keep your patch narrowly targeted to the problem described by the issue. It’s better for everyone if we maintain discipline about the scope of each patch. In general, if you find a bug while working on a specific feature, file an issue for the bug, check if you can assign it to yourself, and fix it independently of the feature. This helps us to differentiate between bug fixes and features and allows us to build stable maintenance releases.

Finally, please write a good, clear commit message, with a short, descriptive title and a message that is exactly long enough to explain what the problem was, and how it was fixed.
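As an illustration of the testing requirement above, one test shape that may suit an encrypt/decrypt pair is a round-trip check: decrypting the encrypted input should return the original input. The sketch below is only an assumption about how such a test could look. `XorWithKey` is a deliberately insecure placeholder for whichever cipher a patch adds, and `RoundTrips` and `RunRoundTripTest` are hypothetical helper names, not part of this project or of Crypto++.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// PLACEHOLDER cipher: XOR with a repeating key. It only stands in for the
// encrypt/decrypt pair under test; it is NOT secure and not from Crypto++.
std::string XorWithKey(const std::string& data, const std::string& key) {
    if (key.empty()) return data;
    std::string out = data;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}

// Returns true when decrypt(encrypt(s)) == s for every sample input.
template <typename Encrypt, typename Decrypt>
bool RoundTrips(Encrypt encrypt, Decrypt decrypt,
                const std::vector<std::string>& samples) {
    for (const std::string& s : samples) {
        if (decrypt(encrypt(s)) != s) return false;
    }
    return true;
}

// Example test: covers empty input, short input, long input and binary data.
bool RunRoundTripTest() {
    const std::string key = "k3y";
    auto encrypt = [&key](const std::string& s) { return XorWithKey(s, key); };
    auto decrypt = [&key](const std::string& s) { return XorWithKey(s, key); };
    const std::vector<std::string> samples = {
        "", "a", "hello world", std::string(1000, 'x'),
        std::string("\0\x01\xff", 3)};
    return RoundTrips(encrypt, decrypt, samples);
}
```

A round-trip check only shows that the two functions are inverses of each other. For a real algorithm it is usually worth adding comparisons against published test vectors as well, so that a patch cannot pass while producing non-standard output.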
