Superset, Presto and Kerberos #8794

Closed
elukey opened this issue Dec 9, 2019 · 10 comments

Comments

@elukey
Contributor

elukey commented Dec 9, 2019

Superset cannot work with a Kerberized Presto cluster due to a PyHive limitation: Kerberos support for Presto has not been released yet.

Expected results

Superset to fetch data correctly from a kerberized Presto cluster.

Actual results

Support for querying a Kerberized Presto cluster was added to PyHive, but upstream has not made a release that includes it (0.6.2 is in fact missing from PyPI: https://pypi.org/project/PyHive).

If it were only for Hive support I wouldn't spend much time on Superset to fix this, but Presto's performance is a real game changer for data dashboarding. Is there any plan to explore alternative clients like presto-python-client? I could also offer some dev time to help if needed!

@dpgaspar
Member

dpgaspar commented Dec 9, 2019

Hi @elukey, I'm not fully aware of this, but are you referring to this new PyHive feature, dropbox/PyHive@6925cd7?

The last release is 0.6.1, from 10 Sep 2018.

Also, it seems that presto-python-client only provides a DB-API driver; Superset also needs a SQLAlchemy dialect.

Probably the best way forward would be to kindly ask the PyHive maintainers to create a new release (0.6.2 or 0.7.0).

@elukey
Contributor Author

elukey commented Dec 9, 2019

@dpgaspar thanks for answering!

There is already an issue, opened a long time ago, in PyHive's upstream: dropbox/PyHive#288

Judging from other issues like dropbox/PyHive#296, it seems that Dropbox is no longer maintaining it?

Too bad about presto-python-client; I'll do more research and see if there is an alternative to use.

@elukey
Contributor Author

elukey commented Dec 20, 2019

@dpgaspar does anybody in the Superset community have contacts at Dropbox who could restore PyHive maintenance and/or release version 0.6.2?

@elukey
Contributor Author

elukey commented Jan 14, 2020

It was suggested in the Presto-related issue that https://github.com/cloudera/impyla could be evaluated; I'm not sure whether it is a good alternative.

@elukey
Contributor Author

elukey commented Feb 5, 2020

Any ideas or news?

@willbarrett
Member

Preset has been chatting with one of the Dropbox engineers about this; discussions are ongoing.

@clearnote01

@elukey I was planning to use Superset for a project and assumed that Kerberized Presto clusters would be supported, but then I found this. I can see that PyHive recently released version 0.6.2 (dropbox/PyHive#288). I imagine SQLAlchemy needs to update its Presto dialect to use this PyHive version, and then Superset needs to update to use that dialect?

@elukey
Contributor Author

elukey commented Mar 29, 2020

@clearnote01 everything should work if you use PyHive 0.6.2. I had to use the following configuration in Superset:

SQLAlchemyURL: presto://hostname.something.com:1234/analytics_hive?protocol=https

Extra:
{
    "metadata_params": {},
    "engine_params": {
        "connect_args": {
            "KerberosConfigPath": "/etc/krb5.conf",
            "KerberosKeytabPath": "/path/of/the/superset/keytab/superset.keytab",
            "KerberosPrincipal": "superset/hostname.something.com@REALM",
            "KerberosRemoteServiceName": "presto",
            "requests_kwargs": {
                "verify": "/etc/superset/presto_ca/ca.crt.pem"
            }
        }
    },
    ..cut..
}
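To make the structure of the Extra field easier to reproduce, here is a minimal Python sketch that assembles only the Kerberos part shown above and checks a JSON string for the four Kerberos keys. The helper names (build_presto_kerberos_extra, missing_kerberos_keys) are hypothetical and not part of Superset or PyHive; the assumption is that PyHive 0.6.2 forwards these connect_args to its Presto connection, and the "..cut.." settings are deliberately left out.

```python
import json

# Kerberos keys used in the connect_args above (assumed to be forwarded by
# PyHive 0.6.2 to its Presto connection; not an official list).
REQUIRED_KERBEROS_KEYS = (
    "KerberosConfigPath",
    "KerberosKeytabPath",
    "KerberosPrincipal",
    "KerberosRemoteServiceName",
)


def build_presto_kerberos_extra(keytab_path, principal, ca_cert_path,
                                krb5_conf="/etc/krb5.conf",
                                service_name="presto"):
    """Hypothetical helper: return the Kerberos part of the Extra field as a dict.

    Any other Extra settings (the "..cut.." part) are intentionally omitted.
    """
    connect_args = {
        "KerberosConfigPath": krb5_conf,
        "KerberosKeytabPath": keytab_path,
        "KerberosPrincipal": principal,
        "KerberosRemoteServiceName": service_name,
        # Handed to the requests library; "verify" is the CA bundle used to
        # validate Presto's self-signed TLS certificate.
        "requests_kwargs": {"verify": ca_cert_path},
    }
    return {"metadata_params": {}, "engine_params": {"connect_args": connect_args}}


def missing_kerberos_keys(extra_json):
    """Hypothetical helper: list Kerberos keys absent from an Extra JSON string."""
    extra = json.loads(extra_json)
    connect_args = extra.get("engine_params", {}).get("connect_args", {})
    return [key for key in REQUIRED_KERBEROS_KEYS if key not in connect_args]


if __name__ == "__main__":
    extra = build_presto_kerberos_extra(
        keytab_path="/path/of/the/superset/keytab/superset.keytab",
        principal="superset/hostname.something.com@REALM",
        ca_cert_path="/etc/superset/presto_ca/ca.crt.pem",
    )
    text = json.dumps(extra, indent=4)
    print(text)
    print(missing_kerberos_keys(text))  # prints [] since all keys are present
```

Generating the JSON with json.dumps avoids the quoting and brace mistakes that are easy to make when editing the Extra field by hand.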

Some notes:

  • analytics_hive in the SQLAlchemyURL is the name of the Hive catalog; replace it with the catalog configured in your Presto cluster.
  • The above config assumes Presto uses self-signed TLS certificates, which is why requests_kwargs.verify points to the CA certificate.
  • KerberosKeytabPath is the path of the superset keytab on the Superset host.
  • KerberosPrincipal is the Kerberos principal of superset.
  • If you want to use "Impersonate the logged on user", you'll need to configure the superset user as a proxy user in hadoop-site.xml.
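As a rough illustration of how the SQLAlchemy URL above breaks down, the following stdlib-only Python sketch splits it into dialect, host, port, catalog, optional schema and query parameters. It assumes (from our reading of PyHive's Presto dialect, not from this thread) that the URL path is "/<catalog>[/<schema>]" and that query parameters such as protocol are passed to the Presto connection as keyword arguments; describe_presto_url is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def describe_presto_url(url):
    """Hypothetical helper: split a Presto SQLAlchemy URL into its parts.

    Assumes the path has the form "/<catalog>[/<schema>]"; schema is None
    when the URL does not specify one.
    """
    parts = urlsplit(url)
    path = [segment for segment in parts.path.split("/") if segment]
    return {
        "dialect": parts.scheme,  # "presto" selects the Presto dialect
        "host": parts.hostname,
        "port": parts.port,
        "catalog": path[0] if path else None,
        "schema": path[1] if len(path) > 1 else None,
        # parse_qs returns a list per key; keep the last value of each
        "query": {key: values[-1] for key, values in parse_qs(parts.query).items()},
    }


if __name__ == "__main__":
    info = describe_presto_url(
        "presto://hostname.something.com:1234/analytics_hive?protocol=https"
    )
    print(info["catalog"], info["query"])  # analytics_hive {'protocol': 'https'}
```

Swapping analytics_hive for your own catalog name is then just a change to the first path segment.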

Hope it helps :)

@clearnote01

@elukey Oh, this was extremely helpful. The Superset documentation for the Presto connector is currently barebones; I think this would be an excellent addition there.

@elukey elukey closed this as completed Apr 12, 2020