Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-22074: Slow compilation due to IN to OR transformation #746

Closed
wants to merge 1 commit into from

Conversation

vineetgarg02
Copy link
Contributor

No description provided.


HiveConf conf;
try {
conf = Hive.get().getConf();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of obtaining the conf object here statically, let's just pass the int value using the ctx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcamachor ctx is passed on to genExprNode by the callers. There are several callers which uses genExprNode. For this we will have to update all of these callers to set the config in TypeCheckCtx.
Fetching HiveConf only occurs if the expression at hand is GenericUDFIn and CBO wasn't executed therefore this shouldn't add much overhead.
Let me know if you think otherwise

Copy link
Contributor

@jcamachor jcamachor Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to pass this value from the callers in the context. You would not need to change all callers; if value is not passed, rewriting could be skipped. I see mainly two advantages of doing this:

  1. if transformation is never happening, we will not be retrieving the conf and this value for every IN clause in a query (note that isCBOExecuted method is misleading, the value returned is foldExpr boolean which is false sometimes even for calls coming from CBO cf. first line in genFilterRelNode method in CalcitePlanner), and
  2. removing the static call to Hive object from within the folding logic.

I see there are other calls to Hive.get() in the class, that information should probably be moved to context too.
These can all be tackled together in a follow-up, but I think since we are cleaning up this logic, it would make sense to do it at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll create a separate jira and tackle this in a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcamachor I have created https://issues.apache.org/jira/browse/HIVE-22135 for refactoring Hive.get().getConf().
For the current patch I have gotten green run. Let me know if everything looks good to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants