Skip to content

Latest commit

 

History

History

HiveOperator

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 

HiveOperator (Source code)

This operator provides functionality to query a Hive Metastore server using a HiveQL string and returns a response in the format of a delimited string.

The operator runs on a custom Docker images that extends the public docker image python:3.6.4-slim-stretch and uses the Kerberos client binary krb5-user as well as libsasl2 for Ubuntu. The PyHive python module is developed and maintained by Dropbox: https://github.com/dropbox/PyHive

alt text

(Graph source code)

alt text

Requirements

Content

1. Dockerfile

  • Extends the SAP-delivered Dockerfile python:3.6.4-slim-stretch
  • Imports the python library pyhive
  • Install Kerberos client libraries

2. Custom operator 'HiveOperator'

  • Derived from Python3Operator
  • Uses image tags python36:"", pyhive:pip3 and tornado:5.0.2
  • input port inSql of type string: expects a single HiveQL-compliant string without a semicolon
  • output port output of type string: outputs the response from the Hive Metastore server, columns are delimited by a comma (default) but can be overriden using the delimiter configuration parameter (See description below)

3. Sample graph HiveOperator_test

  • Provides an interactive terminal to query a Hive Metastore server and display the results. Note, the HiveOperator can only process one HiveQL statement at a time.

How to run

  • Import solution/HiveOperator-1.0.tgz via SAP Data Intelligence System Management -> Files -> Import Solution
  • Run the Graph -> examples.HiveOperator_test

Operator configuration parameters

database:                 Specify which database in Hive metastore to connect to
delimiter:		  Used to separate columns in HiveOperator output e.g. 1.34;Hello;World;
hive_hostname:		  Hostname or IP address to Hive Metastore server
hive_port:		  Port used by Hive Metastore server
http_mode:		  If hive.server2.transport.mode is set to http, set this parameter to true
kerberos_enabled:	  If Hive cluster is kerberized set to true and read additional notes below
kerberos_keytab_filename: The file name of the uploaded keytab file (case sensetive)
kerberos_principal: 	  Kerberos principal used with uploaded keytab file
kerberos_realm: 	  Kerberos realm used with principal and keytab file
username: 		  Username for plain authentication
password: 		  Password for plain authentication

Kerberos configuration (Optional) Upload .keytab and krb5.conf file via the HiveOperator designer. These will be copied into the docker container at runtime. Remember to specify the kerberos realm and principal name in the operator's configuration section when designing your graph.

alt text

Troubleshooting

For detailed errors messages check the failing vflow pod's log. Note: Bad HiveQL syntax will cause the graph to terminate. HiveQL statements must not include a semicolon.