[CARBONDATA-3523] Store data file size into index file #3356

QiangCai · 2019-08-13T02:42:32Z

Store data file size into the index file
[Background]
In BlockIndex, file_size is always zero. We can set the actual value during data loading and read it back at query time, which helps most when compute and storage are disaggregated.
[Benefit]
For both cloud and local tables, this improves driver performance for the first query:

avoid invoking listFiles for each segment
avoid invoking getFileStatus for each data file
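
The idea above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `IndexEntry`, `resolveFileSize`, and `fileSystemCalls` are hypothetical names, and the fallback for a zero `fileSize` (an index written before the size was recorded) is an assumption about how older index files would be handled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileSizeLookup {

    /** Hypothetical stand-in for one index entry; not CarbonData's BlockIndex class. */
    static final class IndexEntry {
        final Path dataFile;
        final long fileSize; // 0 means "size was not recorded at load time"

        IndexEntry(Path dataFile, long fileSize) {
            this.dataFile = dataFile;
            this.fileSize = fileSize;
        }
    }

    /** Counts fallback file system lookups so the saving is visible. */
    static int fileSystemCalls = 0;

    /** Returns the recorded size if present, otherwise asks the file system. */
    static long resolveFileSize(IndexEntry entry) {
        if (entry.fileSize > 0) {
            return entry.fileSize; // no file system round trip needed
        }
        fileSystemCalls++; // assumed fallback for entries without a recorded size
        try {
            return Files.size(entry.dataFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dataFile = Files.createTempFile("part-0", ".carbondata");
        Files.write(dataFile, new byte[] {1, 2, 3, 4, 5});

        IndexEntry oldEntry = new IndexEntry(dataFile, 0L);                   // size not recorded
        IndexEntry newEntry = new IndexEntry(dataFile, Files.size(dataFile)); // size recorded at load

        System.out.println(resolveFileSize(oldEntry)); // 5, via file system lookup
        System.out.println(resolveFileSize(newEntry)); // 5, read from the entry
        System.out.println(fileSystemCalls);           // 1

        Files.delete(dataFile);
    }
}
```

With the size recorded at load time, the per-file status call drops out of the query path, which matters most on object stores where each metadata request is a remote call.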


CarbonDataQA · 2019-08-13T03:00:30Z

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/317/

CarbonDataQA · 2019-08-13T04:01:30Z

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/313/

CarbonDataQA · 2019-08-13T04:08:19Z

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/313/

jackylk · 2019-09-23T14:28:43Z

retest this please

CarbonDataQA · 2019-09-23T14:46:59Z

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/552/

CarbonDataQA · 2019-09-23T16:12:33Z

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/553/

CarbonDataQA · 2019-09-23T16:17:57Z

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/556/

jackylk · 2019-09-25T15:53:26Z

LGTM

In BlockIndex, the file_size is always zero. We can set the actual value during data loading and use it during the query to improve the query performance.
1. avoid invoking listFiles for each segment
2. avoid invoking getFileStatus for each data file

This closes apache#3356


store data file size into index file

8a64809

QiangCai changed the title ~~[WIP] Store data file size into index file~~ [CARBONDATA-3523] Store data file size into index file Sep 23, 2019

asfgit closed this in 64a574e Sep 25, 2019
