WIP:[CARBONDATA-2785][ExternalFormat] Optimize table pruning info for pruning by segment #2564

Closed

xuchuanyin
Contributor

In the previous implementation, table pruning was performed once for all
segments, so the pruning info was updated a single time after
default/CG/FG datamap pruning.

Now we want table pruning to be performed segment by segment, so the
pruning info will be updated after default/CG/FG datamap pruning for
each segment.

This means that we need to accumulate the pruning info across segments
during pruning.
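
To illustrate the idea, here is a minimal sketch of accumulating pruning info across segments. It is not the code in this PR: the class `PruningInfoSketch`, its fields, and the per-segment counts are hypothetical names and made-up numbers, chosen only to show that each segment adds its default/CG/FG results to running totals instead of overwriting them.

```java
/**
 * Hypothetical holder for pruning statistics. This is an illustration only,
 * not CarbonData's actual class or API.
 */
final class PruningInfoSketch {
  private int segmentsPruned;
  private long totalBlocklets;
  private long hitAfterDefault;
  private long hitAfterCG;
  private long hitAfterFG;

  /** Adds the counts produced by pruning one segment to the running totals. */
  void accumulate(long total, long afterDefault, long afterCG, long afterFG) {
    segmentsPruned++;
    totalBlocklets += total;
    hitAfterDefault += afterDefault;
    hitAfterCG += afterCG;
    hitAfterFG += afterFG;
  }

  int getSegmentsPruned() { return segmentsPruned; }
  long getTotalBlocklets() { return totalBlocklets; }
  long getHitAfterDefault() { return hitAfterDefault; }
  long getHitAfterCG() { return hitAfterCG; }
  long getHitAfterFG() { return hitAfterFG; }

  /** Renders the accumulated totals as a single line. */
  String summary() {
    return "segments=" + segmentsPruned
        + ", total=" + totalBlocklets
        + ", afterDefault=" + hitAfterDefault
        + ", afterCG=" + hitAfterCG
        + ", afterFG=" + hitAfterFG;
  }

  public static void main(String[] args) {
    PruningInfoSketch info = new PruningInfoSketch();
    // Made-up per-segment counts: {total, afterDefault, afterCG, afterFG}.
    long[][] perSegment = {{100, 40, 20, 5}, {80, 30, 30, 10}};
    for (long[] s : perSegment) {
      // Pruning runs once per segment, so the info is updated once per segment.
      info.accumulate(s[0], s[1], s[2], s[3]);
    }
    System.out.println(info.summary());
  }
}
```

With pruning done once for all segments, a single update would be enough. With pruning done per segment, the totals only stay meaningful if every segment's result is added to what was recorded before.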

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    Only internal interfaces changed

  • Any backward compatibility impacted?
    NO

  • Document update required?
    NO

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    NO, this change only optimizes the pruning procedure
    - How it is tested? Please attach test report.
    Tested locally
    - Is it a performance related change? Please attach the performance test report.
    NO
    - Any additional information to help reviewers in testing this change.
    NO

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA

1. Create a CSV-based carbon table using
CREATE TABLE fact_table (col1 bigint, col2 string, ..., col100 string)
STORED BY 'CarbonData'
TBLPROPERTIES(
  'format'='csv',
  'csv.delimiter'=',',
  'csv.header'='col1,col2,col100')

2. Load data to this table using
ALTER TABLE fact_table ADD SEGMENT LOCATION 'path/to/data1'

This closes apache#2374
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7524/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6278/

@brijoobopanna
Contributor

retest sdv please

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6051/

@brijoobopanna
Contributor

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7641/

    null, updateStatusManager);
List<InputSplit> splits = new ArrayList<>();
for (Segment segment : filteredSegmentToAccess) {
  List<InputSplit> splitsPerSegment = getSplitsForSegment(job, filterInterface, segment,
Contributor

There is already a function called getSplitsOfOneSegment. Can you rename both functions so they are easier to tell apart?

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6383/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6198/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7825/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6550/

@xuchuanyin xuchuanyin changed the title [CARBONDATA-2785][ExternalFormat] Optimize table pruning info for pruning by segment WIP:[CARBONDATA-2785][ExternalFormat] Optimize table pruning info for pruning by segment Aug 25, 2018
@xuchuanyin xuchuanyin closed this Sep 3, 2018
5 participants