[CARBONDATA-3523] Store data file size into index file #3356

Closed
wants to merge 1 commit into from

QiangCai
Contributor

Store data file size into the index file
[Background]
In BlockIndex, the file_size field is always zero. We can set the actual value during data loading and use it at query time to improve performance in deployments where compute and storage are disaggregated.
[Benefit]
For both cloud and local tables, this improves driver performance for the first query:

  1. avoid invoking listFiles for each segment
  2. avoid invoking getFileStatus for each data file
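
The idea can be illustrated with a minimal sketch in Python. It is not the CarbonData implementation: `BlockIndexEntry`, `resolve_file_size`, and the fallback to a file-system lookup when the stored size is zero are all assumptions made for illustration, with `os.path.getsize` standing in for a `getFileStatus` call.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class BlockIndexEntry:
    """Hypothetical stand-in for one BlockIndex record in an index file."""
    file_name: str
    file_size: int = 0  # 0 mimics index files written before this change


def resolve_file_size(entry, segment_dir, stat_calls):
    """Return the data file size, touching the file system only if needed."""
    if entry.file_size > 0:
        # Size was recorded at load time: no file-system round trip.
        return entry.file_size
    # Assumed fallback for older index files: one status call per data file.
    path = os.path.join(segment_dir, entry.file_name)
    stat_calls.append(path)  # record the call so its cost is visible
    return os.path.getsize(path)


# Usage: one entry written with the size stored, one legacy entry with size 0.
segment_dir = tempfile.mkdtemp()
data_file = os.path.join(segment_dir, "part-0.carbondata")
with open(data_file, "wb") as f:
    f.write(b"x" * 128)

stat_calls = []
new_size = resolve_file_size(
    BlockIndexEntry("part-0.carbondata", 128), segment_dir, stat_calls)
calls_after_new = len(stat_calls)
legacy_size = resolve_file_size(
    BlockIndexEntry("part-0.carbondata"), segment_dir, stat_calls)
print(new_size, legacy_size, calls_after_new, len(stat_calls))
```

Both entries resolve to the same size, but only the legacy entry triggers a status call. On a remote object store each such call is a network round trip, which is why recording the size at load time matters most for the first query.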

Be sure to complete the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/317/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/313/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/313/

QiangCai changed the title from "[WIP] Store data file size into index file" to "[CARBONDATA-3523] Store data file size into index file" on Sep 23, 2019
@jackylk
Contributor

jackylk commented Sep 23, 2019

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/552/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/553/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/556/

@jackylk
Contributor

jackylk commented Sep 25, 2019

LGTM

@asfgit asfgit closed this in 64a574e Sep 25, 2019
QiangCai added a commit to QiangCai/carbondata that referenced this pull request Sep 29, 2019
In BlockIndex, the file_size is always zero. We can set the actual value during data loading and use it during the query to improve the query performance.

1. avoid invoking listFiles for each segment
2. avoid invoking getFileStatus for each data file

This closes apache#3356
asfgit pushed a commit that referenced this pull request Oct 4, 2019
In BlockIndex, the file_size is always zero. We can set the actual value during data loading and use it during the query to improve the query performance.

1. avoid invoking listFiles for each segment
2. avoid invoking getFileStatus for each data file

This closes #3356
3 participants