Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARCTIC-1057][AMS] Only load self-optimizing enabled tables into cache #1145

Merged
merged 27 commits into from
Mar 27, 2023

Conversation

XBaith
Copy link
Contributor

@XBaith XBaith commented Feb 23, 2023

Why are the changes needed?

This PR was inspired by related issue #1057 , but was not created for implementation.

AMS can not handle large scale Iceberg tables which has a certain number of small files/delete files. Some of tables in iceberg catalog are not optimizing enabled, so these tables are not nessessary for AMS to scan or load into memory.
I assume that self-optimizing disabled tables allocate a mount of memory and compare the GC times after remove them:

  • Load all valid tables:
    image
  • Only load self-optimizing tables:
    image

The below sreenshot lists number of instances and retained heap per class:
image

I also dump it and find that ConcurrentHashMap in OptimizeService retained too much heap

  • case 1: Too many snapshots entry
    image
    image

  • case 2: Too many delete files
    image

Brief change log

  • Reduce AMS memory
  • Avoid load/display useless tables for AMS optimize
  • Clean the tasks when self-optimizing disabled

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduces a new feature? (no)
  • If yes, how is the feature documented? (not documented)

@github-actions github-actions bot added module:ams-server Ams server module module:ams-dashboard Ams dashboard module labels Feb 23, 2023
@XBaith
Copy link
Contributor Author

XBaith commented Feb 23, 2023

#1057

@wangtaohz
Copy link
Contributor

Thanks for your contribution! I will help to review it ASAP.

@zhoujinsong
Copy link
Contributor

#1057

Thanks a lot for your contribution!
But I have also noticed that the PR title and description do not match the Arctic community requirement.
You can find the Arctic community requirement here:
https://docs.google.com/document/d/1Pwv1LflW3z5yq34qDpEYwSxOB3LIE5qoKQ_5Sv0IV1Y/edit?usp=sharing

@XBaith XBaith changed the title [Imporvement][Optimize] Only load self-optimizing enabled tables into cache [ARCTIC-1057][AMS] Only load self-optimizing enabled tables into cache Feb 24, 2023
@XBaith
Copy link
Contributor Author

XBaith commented Feb 24, 2023

Okay

XBaith and others added 2 commits February 27, 2023 14:23
# Conflicts:
#	ams/ams-server/src/main/java/com/netease/arctic/ams/server/optimize/OptimizeService.java
Copy link
Contributor

@wangtaohz wangtaohz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XBaith I left some comments, please take a look.

XBaith and others added 3 commits February 28, 2023 17:27
# Conflicts:
#	ams/ams-server/src/main/java/com/netease/arctic/ams/server/ArcticMetaStore.java
@XBaith XBaith requested a review from wangtaohz March 1, 2023 09:34
@codecov
Copy link

codecov bot commented Mar 2, 2023

Codecov Report

Patch coverage has no change and project coverage change: +23.41 🎉

Comparison is base (0e5140e) 29.27% compared to head (76231da) 52.68%.

Additional details and impacted files
@@              Coverage Diff              @@
##             master    #1145       +/-   ##
=============================================
+ Coverage     29.27%   52.68%   +23.41%     
+ Complexity     5374      523     -4851     
=============================================
  Files           695       43      -652     
  Lines         70786     3705    -67081     
  Branches       8180      354     -7826     
=============================================
- Hits          20723     1952    -18771     
+ Misses        48058     1623    -46435     
+ Partials       2005      130     -1875     
Flag Coverage Δ
core ?
trino 52.68% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

see 652 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@XBaith XBaith closed this Mar 6, 2023
@XBaith XBaith reopened this Mar 6, 2023
# Conflicts:
#	ams/ams-server/src/main/java/com/netease/arctic/ams/server/optimize/OptimizeService.java
@XBaith
Copy link
Contributor Author

XBaith commented Mar 7, 2023

Hi @wangtaohz, change complete , please review if you have time.

@wangtaohz
Copy link
Contributor

It's a bit difficult to understand disableSelfOptimizing and enableSelfOptimizing, since they are not doing what the method name describes.

I suggest it to be:

validTables(or cachedTables) contains two collections:

  • optimizeTables
  • unOptimizeTables

and the process refreshAndListTables contains 4 steps:

  • add new tables: add tables not in validTables into optimizeTables or unOptimizeTables
  • clear tables: remove tables from validTables (optimizeTables or unOptimizeTables)
  • enable optimize: move tables from unOptimizeTables to optimizeTables
  • disable optimize: move tables from optimizeTables to unOptimizeTables

@zhoujinsong zhoujinsong merged commit 55a7aa3 into apache:master Mar 27, 2023
baiyangtx pushed a commit that referenced this pull request Mar 28, 2023
#1145)

* [Optimize] Only load self-optimizing enabled tables into cache

* checkstyle

* fix log level

* refactor code base on the review

* load un-optimized table into db

* [Optimize] Only load self-optimizing enabled tables into cache

* checkstyle

* fix log level

* refactor code base on the review

* import CatalogLoader

* checkstyle

* avoid insert duplicate record into sysdb

* [WAP] remove table from unOptimizeTables when clear table

* rewrite base on code review

* rewrite base on code review

---------

Co-authored-by: Xavier Bai <xuba@cisco.com>
Co-authored-by: luting <1004611953@qq.com>
Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>
zhoujinsong added a commit that referenced this pull request May 31, 2023
#1145)

* [Optimize] Only load self-optimizing enabled tables into cache

* checkstyle

* fix log level

* refactor code base on the review

* load un-optimized table into db

* [Optimize] Only load self-optimizing enabled tables into cache

* checkstyle

* fix log level

* refactor code base on the review

* import CatalogLoader

* checkstyle

* avoid insert duplicate record into sysdb

* [WAP] remove table from unOptimizeTables when clear table

* rewrite base on code review

* rewrite base on code review

---------

Co-authored-by: Xavier Bai <xuba@cisco.com>
Co-authored-by: luting <1004611953@qq.com>
Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>
@XBaith XBaith deleted the reduce-memo branch December 25, 2023 06:10
ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
apache#1145)

* [Optimize] Only load self-optimizing enabled tables into cache

* checkstyle

* fix log level

* refactor code base on the review

* load un-optimized table into db

* [Optimize] Only load self-optimizing enabled tables into cache

* checkstyle

* fix log level

* refactor code base on the review

* import CatalogLoader

* checkstyle

* avoid insert duplicate record into sysdb

* [WAP] remove table from unOptimizeTables when clear table

* rewrite base on code review

* rewrite base on code review

---------

Co-authored-by: Xavier Bai <xuba@cisco.com>
Co-authored-by: luting <1004611953@qq.com>
Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:ams-dashboard Ams dashboard module module:ams-server Ams server module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants