Getting thrift timeouts with large number of tables in hive #153

Closed
bilsch opened this issue Feb 19, 2015 · 7 comments

bilsch commented Feb 19, 2015

We recently upgraded a Hadoop cluster from CDH4 to CDH 5.2 and, with it, picked up a large set of changes in Hue.

The environment currently has approximately 17k tables of varying complexity (some have just a few columns, others 10+). A behavior change introduced at some point in Hue seems to be causing problems.

The errors we typically see:

[18/Feb/2015 19:23:09 +0000] thrift_util  WARNING  Not retrying thrift call ExecuteStatement due to socket timeout
[18/Feb/2015 19:23:09 +0000] thrift_util  INFO     Thrift saw a socket error: timed out

The old Hue layout (at least in 2.2.0; I'm not sure if or when this changed) had a simple main Hive editor with a separate tab for the table listing. Each table had its own preview button, so the page was far less complex.

The Hive editor now has a drop-down to select a database (default or users, as I understand it). Below the drop-down is a list of tables with an option to see sample data for each one. This list seems to be loaded and cached; however, because of the number of tables and the cost of generating previews for all of them in a single HTTP request, our users' browsers time out (and so do the web servers).

To get by, we raised the timeouts (on 11 Feb), but even that is still not enough.

I would suggest one of two possible fixes (certainly not an exhaustive list):

  1. Add a config option controlling whether preview data is fetched automatically. It can default to enabled, but at least we could disable it and retain usability.
  2. Simplify the page design so that the preview fetch happens asynchronously and the browser polls for it in a separate HTTP request, allowing the page to render (perhaps with a "...loading" div in the meantime).
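Both suggestions can be sketched together in Python. This is a minimal sketch, not Hue's code: `AUTO_FETCH_PREVIEWS`, `fetch_preview`, `start_preview`, `render_table_list`, and `poll_preview` are hypothetical names, and the in-process thread pool stands in for whatever job mechanism a real server would use. The table list is returned right away; each preview runs in the background, and a separate, non-blocking request reports its status.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Suggestion 1 (hypothetical setting): should previews be fetched automatically?
AUTO_FETCH_PREVIEWS = False

_executor = ThreadPoolExecutor(max_workers=4)
_jobs = {}  # job id -> Future holding the preview rows
_jobs_lock = threading.Lock()


def fetch_preview(table):
    """Hypothetical stand-in for the slow per-table sample-data query."""
    time.sleep(0.05)
    return [("row1", table), ("row2", table)]


def start_preview(table):
    """Suggestion 2: run the preview in the background and return a job id."""
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = _executor.submit(fetch_preview, table)
    return job_id


def render_table_list(tables):
    """Build the page data immediately; no preview query blocks this request."""
    page = {"tables": list(tables), "preview_jobs": {}}
    if AUTO_FETCH_PREVIEWS:
        for table in page["tables"]:
            page["preview_jobs"][table] = start_preview(table)
    return page


def poll_preview(job_id):
    """Handler for the separate polling request; it never blocks."""
    with _jobs_lock:
        future = _jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "loading"}  # the page can show "...loading"
    if future.exception() is not None:
        return {"status": "error", "message": str(future.exception())}
    return {"status": "done", "rows": future.result()}


if __name__ == "__main__":
    page = render_table_list(["t1", "t2"])  # returns at once
    job = start_preview("t1")               # e.g. when the user clicks "preview"
    while poll_preview(job)["status"] == "loading":
        time.sleep(0.01)                    # the browser would poll over HTTP
    print(poll_preview(job))
```

Even with the flag enabled, previews are only submitted, not awaited, so the initial request stays fast; a real deployment would also need to expire finished jobs from `_jobs`.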

Versions:

CDH: 5.2.1
Hue: version from 5.2.3
Host OS: CentOS 6.2
Python: 2.6.6
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

hive> show tables;
(...)
Time taken: 0.76 seconds, Fetched: 16455 row(s)

Not sure if you guys need/want any additional information.

romainr (Contributor) commented Feb 19, 2015

The call made to HiveServer2 to list the tables is pretty slow; we recently changed it to 'show tables'.

Does that make it better, at least for now? https://issues.cloudera.org/browse/HUE-2243

I think we currently return a maximum of 5000 tables per DB. We could raise it, but that might create problems since it is a lot.

@enricoberti, I thought we were already doing #2?

bilsch (Author) commented Feb 19, 2015

Romain, I can give that patch a try. I assume it's already committed to master?

romainr (Contributor) commented Feb 19, 2015

Yes, the master commit ID is linked in the JIRA.

grisha commented Feb 19, 2015

@romainr SHOW TABLES doesn't make it significantly better: it still takes several minutes and causes our nginx to time out. Ideally it should only load the currently selected database, not all of them (it seems to be trying to load every single one at once), and limit itself to no more than 1000 tables or so (maybe configurable).
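A minimal Python sketch of that request, assuming a hypothetical `fetch_table_names(database)` callable (for example, a wrapper around SHOW TABLES) and a hypothetical `MAX_TABLES_PER_DB` setting: only the selected database is queried, and at most `limit` names are returned, along with a flag saying whether the list was cut off.

```python
from itertools import islice

# Hypothetical setting: cap on the number of tables returned for one database.
MAX_TABLES_PER_DB = 1000


def list_tables(fetch_table_names, database, limit=MAX_TABLES_PER_DB):
    """Return (names, truncated) for ONE database; other databases are not queried.

    `fetch_table_names(database)` is a hypothetical callable returning an
    iterable of table names. One extra name is read to detect truncation.
    """
    names = list(islice(fetch_table_names(database), limit + 1))
    truncated = len(names) > limit
    return names[:limit], truncated


if __name__ == "__main__":
    fake_metastore = {"default": ["t%d" % i for i in range(16455)], "users": ["a", "b"]}
    names, truncated = list_tables(lambda db: iter(fake_metastore[db]), "default")
    print(len(names), truncated)  # 1000 True
```

The cap limits what is sent to the browser; if the underlying call still materializes every table name, the HiveServer2 side of the cost is unchanged.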

romainr (Contributor) commented Feb 19, 2015

@enricoberti is going to check why it is not loading only the tables of the currently selected DB.

enricoberti (Contributor) commented:

I just checked on master: the tables are loaded asynchronously and only for the selected database (and only if not already cached).
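The server-side half of that behavior (fetch only the requested database, and only on a cache miss) might look like the following sketch. It is not Hue's implementation: `TableListCache` and `fetch_table_names` are hypothetical, and the asynchronous request from the browser is not shown.

```python
import threading


class TableListCache:
    """Per-database cache of table names (hypothetical sketch).

    Only the requested database is fetched, and only when it is not cached.
    """

    def __init__(self, fetch_table_names):
        self._fetch = fetch_table_names  # hypothetical: database -> iterable of names
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, database):
        with self._lock:
            if database in self._cache:
                return self._cache[database]
        # The slow call runs outside the lock so other databases stay responsive.
        names = list(self._fetch(database))
        with self._lock:
            # If another thread filled this entry meanwhile, keep the first result.
            return self._cache.setdefault(database, names)

    def invalidate(self, database=None):
        """Drop one database's entry, or everything when database is None."""
        with self._lock:
            if database is None:
                self._cache.clear()
            else:
                self._cache.pop(database, None)
```

Two concurrent misses for the same database may both fetch; `setdefault` keeps whichever result is stored first, which trades a rare duplicate query for never holding the lock during a slow call.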

romainr (Contributor) commented Jul 29, 2015

FYI: one timeout was not being set, and the call could also deadlock:
26d9545
3d4a0e8
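As a general illustration of why a missing timeout can hang a caller (this does not reproduce the commits above): a blocking read on a socket with no timeout waits forever if the peer never answers. The sketch below uses only the standard `socket` module and refuses to read without a timeout; `read_reply` is a hypothetical helper. In the Apache Thrift Python library, the corresponding setting is `TSocket.setTimeout(ms)`, which takes milliseconds.

```python
import socket


def read_reply(sock, nbytes, timeout):
    """Read up to `nbytes` from `sock`, waiting at most `timeout` seconds.

    Raises socket.timeout if the peer stays silent, instead of blocking forever.
    """
    if timeout is None:
        # settimeout(None) would put the socket in blocking mode with no limit.
        raise ValueError("refusing to read without a timeout")
    sock.settimeout(timeout)
    return sock.recv(nbytes)


if __name__ == "__main__":
    client, server = socket.socketpair()
    try:
        read_reply(client, 1024, timeout=0.2)  # the server side never writes
    except socket.timeout:
        print("timed out instead of hanging")
    finally:
        client.close()
        server.close()
```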

romainr closed this as completed Aug 10, 2015
abraslavskiy referenced this issue in mapr/hue Apr 19, 2017
[MAPR-HUE-9] Hue doesn't display correctly progress of Oozie Job