[CARBONDATA-3217] Optimize implicit filter expression performance by removing extra serialization #3039
1. Removed serialization of all the implicit filter values in each task. Instead, only the values for the blocks assigned to a particular task are serialized.
2. Removed the second deserialization of implicit filter values in the executor for each task. Deserializing once is sufficient.
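The first point can be illustrated with a minimal sketch. It is not the code from this PR: the class and method names are hypothetical, and it assumes each implicit filter value is a blocklet path of the form `<blockId>/<blockletId>`. The idea is to group the values by block once on the driver, then hand each task only the values for the blocks it will scan, so that only this subset is serialized with the task.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these names are not CarbonData APIs.
final class ImplicitFilterSplitter {

  /**
   * Groups blocklet paths by their block id.
   * Assumes every path looks like "<blockId>/<blockletId>"; a path without
   * a '/' makes substring throw StringIndexOutOfBoundsException.
   */
  static Map<String, List<String>> groupByBlock(List<String> blockletPaths) {
    Map<String, List<String>> byBlock = new HashMap<>();
    for (String path : blockletPaths) {
      String blockId = path.substring(0, path.lastIndexOf('/'));
      byBlock.computeIfAbsent(blockId, k -> new ArrayList<>()).add(path);
    }
    return byBlock;
  }

  /**
   * Returns only the filter values that belong to the blocks of one task.
   * This smaller list is what would be serialized for that task.
   */
  static List<String> valuesForTask(Map<String, List<String>> byBlock,
      List<String> taskBlockIds) {
    List<String> selected = new ArrayList<>();
    for (String blockId : taskBlockIds) {
      selected.addAll(byBlock.getOrDefault(blockId, Collections.emptyList()));
    }
    return selected;
  }
}
```

With many blocks and many tasks, each task then carries a payload proportional to its own blocks rather than to the whole filter.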
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2094/
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10348/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2299/
  blockletIds = new HashSet<>();
  blockIdToBlockletIdMapping.put(blockId, blockletIds);
}
blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1)));
Better to catch the NumberFormatException for Integer.parseInt
There is no need to catch the NumberFormatException. The blocklet ID is always expected to be an integer, and it is used as an integer throughout the flow. If the conversion fails, the bug is in another part of the code, not here, and handling the exception here would suppress the actual cause.
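To make the point of this thread concrete, here is a small self-contained sketch built around the four lines under review. The enclosing method is not visible in the diff, so the helper `addBlocklet` and the class name are hypothetical, and the `/` separator in the sample paths is an assumption. It shows the mapping being filled and a malformed blocklet path surfacing as a NumberFormatException at the call site instead of being swallowed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper around the reviewed lines; not the PR's actual method.
final class BlockletIdMapping {

  /**
   * Parses the blocklet id that follows "<blockId>" plus one separator
   * character in blockletPath and records it under blockId.
   */
  static void addBlocklet(Map<String, Set<Integer>> blockIdToBlockletIdMapping,
      String blockId, String blockletPath) {
    Set<Integer> blockletIds = blockIdToBlockletIdMapping.get(blockId);
    if (blockletIds == null) {
      blockletIds = new HashSet<>();
      blockIdToBlockletIdMapping.put(blockId, blockletIds);
    }
    // Deliberately no try/catch: a non-numeric suffix means the path was
    // built incorrectly elsewhere, so the exception is left to propagate.
    blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1)));
  }

  public static void main(String[] args) {
    Map<String, Set<Integer>> mapping = new HashMap<>();
    addBlocklet(mapping, "seg0/part-0", "seg0/part-0/3");
    addBlocklet(mapping, "seg0/part-0", "seg0/part-0/7");
    System.out.println(mapping.get("seg0/part-0").size()); // two blocklet ids recorded
    try {
      addBlocklet(mapping, "seg0/part-0", "seg0/part-0/x");
    } catch (NumberFormatException e) {
      // The caller sees the failure together with the offending input.
      System.out.println("malformed blocklet path: " + e.getMessage());
    }
  }
}
```

Catching the exception inside `addBlocklet` and skipping the entry would silently drop a blocklet from the filter, which is the kind of suppression the reply argues against.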
LGTM
…removing extra serialization

Fixed performance issue for Implicit filter column
1. Removed serialization of all the implicit filter values in each task. Instead, only the values for the blocks assigned to a particular task are serialized.
2. Removed the second deserialization of implicit filter values in the executor for each task. Deserializing once is sufficient.

This closes #3039
Fixed performance issue for Implicit filter column
No
No
No
Added UT
NA