[CARBONDATA-2134] Prevent implicit column filter list from getting serialized while submitting task to executor #1935

manishgupta88
Contributor

Problem
In the current store, blocklet pruning is done in the driver and no further pruning takes place on the executor side. Even so, the implicit column filter list is still sent to the executor. As the list grows, the cost of serializing and deserializing it increases, which can hurt query performance.

Solution
Remove the list from the filter expression before submitting the task to the executor.
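
For readers unfamiliar with the idea, the following is a minimal sketch of stripping a large value list out of a filter expression tree before it is shipped to an executor. It is not the code in this PR: the class names (Expr, ImplicitListExpr, TrueExpr) and both helper methods are hypothetical stand-ins, and replacing the node with a constant TRUE is an assumption made for illustration, not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: every class and method name here is hypothetical,
// not CarbonData's actual expression API.
final class ImplicitFilterPruneSketch {

  /** Minimal stand-in for a filter expression tree node. */
  static class Expr {
    final String label;
    final List<Expr> children = new ArrayList<>();

    Expr(String label, Expr... kids) {
      this.label = label;
      this.children.addAll(Arrays.asList(kids));
    }
  }

  /** Stand-in for the implicit column filter, which carries the large list. */
  static class ImplicitListExpr extends Expr {
    final List<String> values;

    ImplicitListExpr(List<String> values) {
      super("implicit-in");
      this.values = new ArrayList<>(values);
    }
  }

  /** Stand-in for a constant TRUE leaf used as a lightweight placeholder. */
  static class TrueExpr extends Expr {
    TrueExpr() {
      super("true");
    }
  }

  /** Replaces every implicit list node with TRUE and returns the (possibly new) root. */
  static Expr stripImplicitLists(Expr node) {
    if (node instanceof ImplicitListExpr) {
      return new TrueExpr();
    }
    for (int i = 0; i < node.children.size(); i++) {
      node.children.set(i, stripImplicitLists(node.children.get(i)));
    }
    return node;
  }

  /** Counts list values still reachable from the tree, i.e. what would be serialized. */
  static int countImplicitValues(Expr node) {
    int total = (node instanceof ImplicitListExpr) ? ((ImplicitListExpr) node).values.size() : 0;
    for (Expr child : node.children) {
      total += countImplicitValues(child);
    }
    return total;
  }
}
```

In this sketch the substitution is only safe under the stated premise: the driver has already applied the implicit filter during pruning, so the executor gains nothing from re-evaluating it. A placeholder TRUE keeps the shape of the surrounding AND nodes intact while dropping the payload.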

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    No
  • Any backward compatibility impacted?
    No
  • Document update required?
    No
  • Testing done
    UT added
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3535/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2299/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3377/

*
* @param expression
*/
public void removeInExpressionFromFilterExpression(Expression expression) {

This method should not belong here. It would be better to do this in scanrdd only.

@manishgupta88 manishgupta88 force-pushed the executor_filter_list_serialization branch from 4630dbf to 252daa4 Compare February 8, 2018 08:39
…and deserializing to executor to improve query performance
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3585/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2348/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3434/

@ravipesala
Contributor

LGTM

@asfgit asfgit closed this in 11a795c Feb 9, 2018
asfgit pushed a commit that referenced this pull request Feb 9, 2018
…rialized while submitting task to executor

This closes #1935
anubhav100 pushed a commit to anubhav100/incubator-carbondata that referenced this pull request Jun 22, 2018
…rialized while submitting task to executor

This closes apache#1935