Connecting Superset to Hive with Kerberos failing #4951

Closed
3 tasks done
mayukhghoshme opened this issue May 8, 2018 · 9 comments
Comments

@mayukhghoshme

mayukhghoshme commented May 8, 2018

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if any
  • I have reproduced the issue with at least the latest released version of superset
  • I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.22.1

I am trying to connect Superset to a Kerberized Hive cluster, but the connection fails with the following Kerberos error.

2018-05-08 01:59:37,051:ERROR:root:Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: /tmp/krb5cc_0))
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/superset/views/core.py", line 1507, in testconn
    engine.connect()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2091, in connect
    return self._connection_cls(self, **kwargs)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 90, in __init__
    if connection is not None else engine.raw_connection()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2177, in raw_connection
    self.pool.unique_connection, _connection)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 328, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1138, in _do_get
    self._dec_overflow()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1135, in _do_get
    return self._create_connection()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 461, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 651, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 105, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 393, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/lib/python2.7/site-packages/pyhive/hive.py", line 64, in connect
    return Connection(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pyhive/hive.py", line 159, in __init__
    self._transport.open()
  File "/usr/lib/python2.7/site-packages/thrift_sasl/__init__.py", line 79, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: /tmp/krb5cc_0))

Below is the setup and what I have tried:

  1. Installed Superset on an AWS EC2 instance, following the docs.
  2. Started the Superset web server as root on port 80.
  3. Installed the necessary Kerberos packages.
  4. Created a user (let us call it X), obtained its keytab, and was able to run kinit.
  5. Creating a new data source from the UI with the connection string below fails:
    hive://xx.xx.xx.xx:10000/default?auth=KERBEROS&kerberos_service_name=hive
  6. Tried both with and without "Impersonate the logged on user".

I am able to connect to Hive from the Python shell as user X with SQLAlchemy:

import sqlalchemy
engine = sqlalchemy.create_engine('hive://xx.xx.xx.xx:10000/default', connect_args={'auth': 'KERBEROS','kerberos_service_name': 'hive'})
c = engine.connect()
result = c.execute('SELECT count(*) from my_schema.my_table')
result.fetchall()
[(5132,)]

Judging by the error message, it seems to be looking for a Kerberos credential cache for the user root: No Kerberos credentials available (default cache: /tmp/krb5cc_0)

Am I missing something here?

@mistercrunch
Member

mistercrunch commented May 9, 2018

Have you tried using the engine_params key in the extra JSON blob of the database configuration? It would look something like this:
[screenshot]

This should result in much the same connection as in your Python shell session.
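
For readers who cannot see the screenshot, here is a minimal sketch of what such an extra blob might contain, generated from Python so the JSON is well-formed. The connect_args values are copied from the Python shell session in the original post. It is an assumption that the screenshot showed this exact shape, and that Superset forwards engine_params to sqlalchemy.create_engine in this version.

```python
import json

# Assumed shape: "engine_params" is forwarded to sqlalchemy.create_engine,
# so "connect_args" plays the same role as in the Python shell example.
extra = {
    "metadata_params": {},
    "engine_params": {
        "connect_args": {
            "auth": "KERBEROS",
            "kerberos_service_name": "hive",
        }
    },
}

# json.dumps guarantees double quotes and balanced braces.
extra_json = json.dumps(extra, indent=4)
print(extra_json)
```

The printed text is what would go into the extra field of the database configuration. Building it this way avoids hand-typed mistakes such as single quotes or a missing brace.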

@mayukhghoshme
Author

image

Above is the error I am getting.

@mayukhghoshme
Author

I think I had not formed the JSON string correctly. After fixing it, I still get the same error:

image

@mayukhghoshme
Author

Okay, I figured it out. As I mentioned earlier, it was looking for the Kerberos credential cache of the user 'root': No Kerberos credentials available (default cache: /tmp/krb5cc_0). The file name gave it away, since the default credential cache file name ends with the uid of the user. In this case it is _0, which is 'root'.
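
A minimal sketch of that reasoning: given a uid and an environment, guess which credential cache the Kerberos libraries would look at. The helper name expected_ccache is made up for illustration, and the /tmp/krb5cc_<uid> default is an assumption that holds for the file-based cache shown in the error; krb5.conf can configure a different default.

```python
import os


def expected_ccache(uid, environ):
    """Guess the credential cache a process would use.

    Assumes the file-based default /tmp/krb5cc_<uid>. A KRB5CCNAME
    environment variable, if set, takes precedence. Treat the result
    as a hint, since krb5.conf may override the default location.
    """
    return environ.get("KRB5CCNAME") or "/tmp/krb5cc_{}".format(uid)


# Run as the same OS user that starts the Superset web server.
print(expected_ccache(os.getuid(), os.environ))
```

If this prints a cache path for root while kinit was run as another user, the web server will not find the ticket.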

So the resolution is:

  1. All the steps below should be performed by a user who can authenticate against the KDC, not by a user like 'root'.

image

  2. Using Impyla seemed more elegant than PyHive. Install impyla using pip as root.

pip install impyla

  3. You need the keytab file for the user from step 1. Run kinit with it.

  4. Configure the connection in Superset something like this.

SQLAlchemy URI = impala://<hive_host>:10000/default

{
    "metadata_params": {},
    "engine_params": {
        "connect_args": {
            "auth_mechanism": "GSSAPI",
            "kerberos_service_name": "hive"
        }
    }
}

  5. If something goes wrong with the packages, you can log in to the Superset server and use the Python shell to check that the modules work:

import sqlalchemy
engine = sqlalchemy.create_engine('impala://<hive_host>:10000/default', connect_args={'auth_mechanism': 'GSSAPI','kerberos_service_name': 'hive'})
c = engine.connect()
result = c.execute('SELECT count(*) from my_table')
print(result.fetchall())
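
Before testing the connection, it can help to confirm that the OS user running Superset actually holds a ticket after kinit. A minimal sketch, assuming MIT Kerberos's klist is on the PATH (its -s flag suppresses output and reports validity through the exit status). The helper name has_valid_ticket and its run parameter are made up for illustration.

```python
import subprocess


def has_valid_ticket(run=subprocess.call):
    """Return True if `klist -s` reports a readable, unexpired cache.

    `run` is injectable only so the function can be exercised without
    Kerberos installed; by default it shells out to the real klist.
    """
    try:
        return run(["klist", "-s"]) == 0
    except OSError:
        # klist is not installed or not on PATH
        return False
```

Calling has_valid_ticket() from a Python shell started by the same user as the web server shows whether that user, rather than the one who ran kinit, can see the ticket.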

@juthikashenoy

I tried using Impala, connecting to the Thrift Hive server, and I get this error:

ERROR: {"error": "Connection failed!\n\nThe error message returned was:\n(thrift.Thrift.TApplicationException) Invalid method name: 'OpenSession' [SQL: u'SHOW TABLES']"}
Can someone suggest how to fix this? Using jdbc+hive or hive+jdbc gives an error saying the dialect doesn't exist. I am using Superset v0.25. Thanks!

@mistercrunch
Member

Impala or "Impyla"?

@juthikashenoy

I installed the Impyla library, as suggested in the thread above, and tried adding more connect args, like below:
impala://xxx.xxx.com:11030/
"connect_args": {
"auth_mechanism": "PLAIN",
"username":"jshenoy",
"password":"fudgetit"
}
I don't have a Kerberized Hive instance. Here is the stack trace:
2018-06-07 11:22:11,583:INFO:werkzeug:10.21.165.80 - - [07/Jun/2018 11:22:11] "POST /superset/testconn HTTP/1.1" 500 -
2018-06-07 11:22:42,236:DEBUG:impala.hiveserver2:Connecting to HiveServer2 xxx.xxx.com:11030 with PLAIN authentication mechanism
2018-06-07 11:22:42,237:DEBUG:impala._thrift_api:get_socket: host=xxx.xxx.com port=11030 use_ssl=False ca_cert=None
2018-06-07 11:22:42,237:DEBUG:impala._thrift_api:get_transport: socket=<thrift.transport.TSocket.TSocket instance at 0x7f2f41df2128> host=xxx.xxx.com kerberos_service_name=impala auth_mechanism=PLAIN user=jshenoy password=fuggetaboutit
2018-06-07 11:22:42,243:ERROR:root:[Errno 104] Connection reset by peer

@mistercrunch
Member

Maybe open an issue with Impyla?

@juthikashenoy

OK, I will do that. How about the jdbc+hive dialect, is it supposed to work?
