Getting thrift timeouts with large number of tables in hive #153
Comments
The call made to HiveServer2 to list the tables is pretty slow; we recently changed it to 'show tables'. Does that make it better, at least for now? https://issues.cloudera.org/browse/HUE-2243 I think we currently return a maximum of 5000 tables per DB. We could raise it, but that might create problems as it is a lot. @enricoberti, I thought we were doing #2 already?
Romain, I can give that patch a try. I assume it's already committed to master, yes?
Yes, you have the master commit id link in the JIRA
@romainr
@enricoberti is going to check why it is not loading just the tables of the currently selected DB
I just checked on master and the tables are loaded asynchronously and just for the selected database (and only if not cached already). |
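To illustrate the behavior described above (tables fetched only for the selected database, only when not already cached, and capped per database), here is a minimal sketch. It is not Hue's actual code: `TableListCache`, `fetch_tables`, and `MAX_TABLES` are hypothetical names, the 5000 cap is taken from the earlier comment, and the asynchronous part is left out for brevity.

```python
from typing import Callable, Dict, List

# Cap mentioned earlier in the thread; how Hue actually enforces it is an assumption.
MAX_TABLES = 5000


class TableListCache:
    """Hypothetical per-database cache of table names (not Hue's implementation)."""

    def __init__(self, fetch_tables: Callable[[str], List[str]], limit: int = MAX_TABLES):
        # fetch_tables stands in for whatever issues 'show tables' against HiveServer2.
        self._fetch_tables = fetch_tables
        self._limit = limit
        self._cache: Dict[str, List[str]] = {}

    def tables_for(self, database: str) -> List[str]:
        # Only hit the backend the first time a database is selected.
        if database not in self._cache:
            self._cache[database] = self._fetch_tables(database)[: self._limit]
        return self._cache[database]
```

With this shape, selecting a database triggers one listing call for that database only, and switching back to it later reuses the cached names instead of listing again.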
We recently upgraded a Hadoop cluster from CDH4 to CDH 5.2 and also picked up a pretty big set of changes in Hue.
The environment currently has approximately 17k tables of varying complexity (some are simple with just a few columns, others are far more complex with 10+ columns). A behavior change introduced in Hue at some point seems to have changed a few things.
The exception we tend to get:
The old layout of Hue (at least in 2.2.0; not sure if or when the change took place) had a simple editor for the main Hive editor, with a separate tab for the table listing. Each table had a preview button, so it was far less complex.
The layout of the Hive editor now has a drop-down to select databases (default or users, as I understand it). Below the drop-down is a list of tables and an option to see sample data for each table. This seems to be loaded and cached. The issue we are hitting is that, given the volume of tables and the cost of generating previews for all of them in a single HTTP request, our users' browsers time out (the web servers too).
To "get by" we have bumped the timeouts up, but even now this is still not enough (the timeouts were bumped on 11/Feb).
I would suggest one of two possible fixes, certainly not exhaustive:
Versions:
cdh 5.2.1
hue version is from 5.2.3
Host os: Centos 6.2
python: Python 2.6.6
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
hive> show tables;
(...)
Time taken: 0.76 seconds, Fetched: 16455 row(s)
Not sure if you guys need/want any additional information.