[opt](file-meta-cache) reduce file meta cache size and disable cache for some cases#32340

Merged
morningman merged 1 commit into apache:master from morningman:reduce_file_meta_cache
Mar 18, 2024
@morningman commented Mar 17, 2024

Proposed changes

The file meta cache on BE caches metadata for external table files, such as Parquet footers.
The cache is bounded by entry count, not by memory consumption.
So if the cached objects are big (e.g., a large Parquet footer), the total memory consumption of the cache
can grow large enough to cause OOM.
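
To make the failure mode concrete, here is a minimal sketch of a cache that is bounded only by entry count. `CountBoundedCache` is a hypothetical stand-in, not the Doris implementation, and it stores just a byte size per entry in place of the real cached object.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only: an LRU cache limited by entry count.
// Each entry records a byte size instead of holding a real footer object.
class CountBoundedCache {
public:
    explicit CountBoundedCache(std::size_t max_entries) : max_entries_(max_entries) {
        assert(max_entries > 0);
    }

    // Inserts or replaces `key`, then evicts the least recently used entry
    // if the entry count exceeds the limit. Entry size is never considered.
    void put(const std::string& key, std::size_t value_bytes) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            total_bytes_ -= it->second->second;
            lru_.erase(it->second);
            index_.erase(it);
        }
        lru_.emplace_front(key, value_bytes);
        index_[key] = lru_.begin();
        total_bytes_ += value_bytes;
        if (lru_.size() > max_entries_) {
            const auto& victim = lru_.back();
            total_bytes_ -= victim.second;
            index_.erase(victim.first);
            lru_.pop_back();
        }
    }

    std::size_t entries() const { return lru_.size(); }
    std::size_t total_bytes() const { return total_bytes_; }

private:
    using Entry = std::pair<std::string, std::size_t>;
    std::size_t max_entries_;
    std::size_t total_bytes_ = 0;
    std::list<Entry> lru_;  // front = most recently inserted
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With a limit of 2 entries, inserting entries of 1 MiB, 8 MiB and 8 MiB leaves 2 entries holding 16 MiB. The entry limit is respected, but the memory held depends entirely on how large each entry happens to be.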

This PR mainly changes:

  1. Add a new method exceed_prune_limit() to CachePolicy

    For ObjLRUCache, it always returns true, so that every minor or full GC on BE prunes the cache.

  2. Reduce the default capacity of the file meta cache from 20000 to 1000

    Also reduce the default capacity of the HDFS file handle cache from 20000 to 1000

  3. Change the check for whether the file meta cache is enabled when querying

    If the number of files to be read is larger than 1/3 of the file meta cache's capacity, the file meta cache
    is disabled for this query, because the cache is of little use when there are too many files.
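
Changes 1 and 3 can be sketched roughly as follows. This is not the code from the PR: only the names `exceed_prune_limit()`, `CachePolicy` and `ObjLRUCache` come from the description, while the `...Sketch` classes, `gc_should_prune()`, `should_use_file_meta_cache()`, all signatures, and the handling of a zero capacity are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the CachePolicy base class.
class CachePolicySketch {
public:
    virtual ~CachePolicySketch() = default;
    // Change 1: each cache type reports whether a GC pass should prune it.
    virtual bool exceed_prune_limit() const = 0;
};

// Hypothetical stand-in for ObjLRUCache. Per the description it always
// returns true, so every minor or full GC prunes it.
class ObjLRUCacheSketch : public CachePolicySketch {
public:
    bool exceed_prune_limit() const override { return true; }
};

// Hypothetical helper showing how a GC pass might consult the hook.
inline bool gc_should_prune(const CachePolicySketch* cache) {
    assert(cache != nullptr);
    return cache->exceed_prune_limit();
}

// Change 3: disable the file meta cache for a query when the number of files
// to read is larger than 1/3 of the cache capacity.
// For integers, num_files <= capacity / 3 (integer division) is equivalent to
// 3 * num_files <= capacity, so no rounding error is introduced.
// Treating a capacity of 0 as "cache disabled" is an assumption.
inline bool should_use_file_meta_cache(std::uint64_t num_files,
                                       std::uint64_t cache_capacity) {
    return cache_capacity > 0 && num_files <= cache_capacity / 3;
}
```

With the new default capacity of 1000, this sketch keeps the cache enabled for a query reading 333 files and disables it for one reading 334 files. Whether the real check treats the exact boundary the same way is not stated in the description.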


@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@github-actions bot added the approved label Mar 17, 2024
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@morningman

run buildall

@doris-robot

TPC-H: Total hot run time: 38705 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5501fa18588c0af7fcc353da466a681237de3404, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	min hot (ms)
q1	17671	4584	4130	4130
q2	2024	151	147	147
q3	10612	1132	929	929
q4	7775	753	767	753
q5	7496	2781	2756	2756
q6	188	123	121	121
q7	1190	844	819	819
q8	9332	2039	1995	1995
q9	7232	6446	6455	6446
q10	8542	3567	3678	3567
q11	433	222	219	219
q12	645	306	292	292
q13	17798	2914	2867	2867
q14	287	251	247	247
q15	498	458	450	450
q16	501	397	399	397
q17	963	569	545	545
q18	7259	6461	6462	6461
q19	3466	1459	1482	1459
q20	551	286	283	283
q21	6312	3513	3543	3513
q22	354	317	309	309
Total cold run time: 111129 ms
Total hot run time: 38705 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	min hot (ms)
q1	4120	4095	4058	4058
q2	321	219	215	215
q3	3016	2896	2863	2863
q4	1896	1549	1654	1549
q5	5223	5247	5294	5247
q6	193	116	119	116
q7	2287	1833	1892	1833
q8	3141	3315	3309	3309
q9	8548	8581	8553	8553
q10	3752	3676	3686	3676
q11	537	443	437	437
q12	737	563	579	563
q13	16921	2854	2866	2854
q14	273	251	255	251
q15	489	443	441	441
q16	456	414	411	411
q17	1738	1499	1459	1459
q18	7586	7065	7280	7065
q19	1618	1517	1469	1469
q20	1921	1708	1734	1708
q21	4790	4776	4703	4703
q22	521	443	450	443
Total cold run time: 70084 ms
Total hot run time: 53223 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 34.94% (8576/24542)
Line Coverage: 26.66% (69544/260818)
Region Coverage: 25.94% (36114/139205)
Branch Coverage: 22.89% (18446/80582)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5501fa18588c0af7fcc353da466a681237de3404_5501fa18588c0af7fcc353da466a681237de3404/report/index.html


@xinyiZzz left a comment

LGTM

@morningman merged commit 6a36d61 into apache:master Mar 18, 2024
morningman added a commit to morningman/doris that referenced this pull request Mar 18, 2024
…for some cases (apache#32340)

morningman added a commit to morningman/doris that referenced this pull request Mar 18, 2024
…for some cases (apache#32340)

morningman added a commit to morningman/doris that referenced this pull request Mar 18, 2024
pick part of apache#32340

2. Reduce the default capacity of the file meta cache from 20000 to 1000

    Also reduce the default capacity of the HDFS file handle cache from 20000 to 1000

3. Change the check for whether the file meta cache is enabled when querying

    If the number of files to be read is larger than 1/3 of the file meta cache's capacity, the file meta cache
    is disabled for this query, because the cache is of little use when there are too many files.
morningman added a commit that referenced this pull request Mar 18, 2024
pick part of #32340

yiguolei pushed a commit that referenced this pull request Mar 21, 2024
…for some cases (#32340)

yiguolei pushed a commit that referenced this pull request Mar 21, 2024
…for some cases (#32340)

@xiaokang mentioned this pull request Mar 22, 2024
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
…ache#32367)

pick part of apache#32340


Labels

approved, dev/2.0.7-merged, reviewed
