WIP:[CARBONDATA-2785][ExternalFormat] Optimize table pruning info for pruning by segment #2564
1. Create a CSV-based carbon table using
CREATE TABLE fact_table (col1 bigint, col2 string, ..., col100 string) STORED BY 'CarbonData' TBLPROPERTIES('format'='csv', 'csv.delimiter'=',', 'csv.header'='col1,col2,col100')
2. Load data into this table using
ALTER TABLE fact_table ADD SEGMENT LOCATION 'path/to/data1'
This closes apache#2374
In the previous implementation, table pruning was performed once for all segments, so the pruning info was updated a single time, after default/CG/FG datamap pruning. Now we want table pruning to be performed segment by segment, so the pruning info is updated after default/CG/FG datamap pruning for each segment. This means the pruning info must be accumulated across segments during pruning.
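A minimal sketch of what accumulating the pruning info per segment could look like. The class `TablePruningInfo`, its fields, and `addSegmentResult` are hypothetical names chosen for illustration; they are not the classes or methods touched by this PR.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical accumulator for table-level pruning info.
 * Each segment is pruned on its own (default, then CG, then FG datamap),
 * and its counts are added to the running totals instead of overwriting them.
 */
final class TablePruningInfo {
    private int totalBlocklets;
    private int hitAfterDefault;
    private int hitAfterCg;
    private int hitAfterFg;
    private final List<String> prunedSegments = new ArrayList<>();

    /** Adds the blocklet counts of one segment to the table-level totals. */
    void addSegmentResult(String segmentId, int total,
                          int afterDefault, int afterCg, int afterFg) {
        prunedSegments.add(segmentId);
        totalBlocklets += total;
        hitAfterDefault += afterDefault;
        hitAfterCg += afterCg;
        hitAfterFg += afterFg;
    }

    int getTotalBlocklets() { return totalBlocklets; }
    int getHitAfterDefault() { return hitAfterDefault; }
    int getHitAfterCg() { return hitAfterCg; }
    int getHitAfterFg() { return hitAfterFg; }
    int getSegmentCount() { return prunedSegments.size(); }

    /** One-line summary of the accumulated counts, e.g. for logging. */
    String summary() {
        return "segments=" + prunedSegments.size()
            + ", total=" + totalBlocklets
            + ", default=" + hitAfterDefault
            + ", cg=" + hitAfterCg
            + ", fg=" + hitAfterFg;
    }
}
```

In this sketch the caller would invoke `addSegmentResult` once per segment inside the per-segment loop and read the totals only after the last segment has been pruned.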
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7524/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6278/
retest sdv please
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6051/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7641/
null, updateStatusManager);
List<InputSplit> splits = new ArrayList<>();
for (Segment segment : filteredSegmentToAccess) {
  List<InputSplit> splitsPerSegment = getSplitsForSegment(job, filterInterface, segment,
There is already a function called getSplitsOfOneSegment. Can you rename both functions so the difference between them is easier to read?
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6383/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6198/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7825/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6550/
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
Any interfaces changed?
Only internal interfaces changed
Any backward compatibility impacted?
NO
Document update required?
NO
Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
NO, this change only optimizes the pruning procedure
- How it is tested? Please attach test report.
Tested locally
- Is it a performance related change? Please attach the performance test report.
NO
- Any additional information to help reviewers in testing this change.
NO
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA