Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-3217] Optimize implicit filter expression performance by removing extra serialization #3039

Conversation

manishgupta88
Copy link
Contributor

@manishgupta88 manishgupta88 commented Dec 31, 2018

Fixed performance issue for Implicit filter column

  1. Removed serialization all the implicit filter values in each task. Instead serialized values only for the blocks going to particular task
  2. Removed 2 times deserialization of implicit filter values in executor for each task. 1 time is sufficient
  • Any interfaces changed?
    No
  • Any backward compatibility impacted?
    No
  • Document update required?
    No
  • Testing done
    Added UT
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA

1. Removed serialization all the implicit filter values in each task. Instead serialized values only for the blocks going to particular task
2. Removed 2 times deserialization of implicit filter values in executor for each task. 1 time is sufficient
@manishgupta88 manishgupta88 changed the title [WIP] Optimize implicit filter expression performance by removing extra serialization [CARBONDATA-3217] Optimize implicit filter expression performance by removing extra serialization Dec 31, 2018
@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2094/

@CarbonDataQA
Copy link

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10348/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2299/

blockletIds = new HashSet<>();
blockIdToBlockletIdMapping.put(blockId, blockletIds);
}
blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to catch the NumberFormatException for Integer.parseInt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required to catch the NumberFormatException. Blocklet Id is always expected to be an integer number and in the complete flow it is used as an integer. So if there is any exception in the conversion then the code problem is not here but from the other part of code. So if we handle the exception here the actual cause will get suppressed here

@apache apache deleted a comment from manishgupta88 Jan 2, 2019
@ravipesala
Copy link
Contributor

LGTM

1 similar comment
@kumarvishal09
Copy link
Contributor

LGTM

@asfgit asfgit closed this in bc1e944 Jan 4, 2019
asfgit pushed a commit that referenced this pull request Jan 21, 2019
…removing extra serialization

Fixed performance issue for Implicit filter column
1. Removed serialization all the implicit filter values in each task. Instead serialized values only for the blocks going to particular task
2. Removed 2 times deserialization of implicit filter values in executor for each task. 1 time is sufficient

This closes #3039
qiuchenjian pushed a commit to qiuchenjian/carbondata that referenced this pull request Jun 14, 2019
…removing extra serialization

Fixed performance issue for Implicit filter column
1. Removed serialization all the implicit filter values in each task. Instead serialized values only for the blocks going to particular task
2. Removed 2 times deserialization of implicit filter values in executor for each task. 1 time is sufficient

This closes apache#3039
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants