[AMORO-1951] Support parallelized planning in one optimizer group #2282

Merged
merged 34 commits into from
Dec 5, 2023

Conversation

majin1102

Why are the changes needed?

Close #1951 .

Brief change log

  • Support parallelized planning through a new parameter named maxPlanningParallelism
  • Refactor OptimizingQueue and DefaultOptimizingService
  • Refactor the suspending optimizer checker to use DelayedQueue for a better SLA
  • Add two other global parameters for optimizing:
  private final long pollingTimeout; // max waiting time when polling for optimizing tasks, default 3000 ms
  private final long minPlanningInterval; // min planning interval for a table, default 60 seconds
  • Add unit tests for OptimizingQueue and DefaultOptimizingService
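
To make the interplay of maxPlanningParallelism and minPlanningInterval easier to picture, here is a minimal sketch. It is not the OptimizingQueue code from this PR: the class PlanningGate and its methods tryStart and finish are hypothetical names, and the real scheduling logic is more involved. The sketch only assumes that a table may start planning when fewer than maxPlanningParallelism tables are currently planning and at least minPlanningInterval has passed since that table's last plan.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the OptimizingQueue implementation from this PR. */
class PlanningGate {
  private final int maxPlanningParallelism;
  private final long minPlanningIntervalMs;
  // Tables that are planning right now.
  private final Set<String> planningTables = new HashSet<>();
  // Start time of the most recent plan per table.
  private final Map<String, Long> lastPlanTimeMs = new HashMap<>();

  PlanningGate(int maxPlanningParallelism, long minPlanningIntervalMs) {
    this.maxPlanningParallelism = maxPlanningParallelism;
    this.minPlanningIntervalMs = minPlanningIntervalMs;
  }

  /** Returns true if the table is allowed to start planning at time nowMs. */
  synchronized boolean tryStart(String table, long nowMs) {
    if (planningTables.contains(table)) {
      return false; // this table is already planning
    }
    if (planningTables.size() >= maxPlanningParallelism) {
      return false; // parallelism limit reached
    }
    Long last = lastPlanTimeMs.get(table);
    if (last != null && nowMs - last < minPlanningIntervalMs) {
      return false; // planned too recently
    }
    planningTables.add(table);
    lastPlanTimeMs.put(table, nowMs);
    return true;
  }

  /** Marks planning of the table as finished and frees one slot. */
  synchronized void finish(String table) {
    planningTables.remove(table);
  }
}
```

With a limit of 2, a third table is rejected until one of the running plans finishes, and a table whose last plan started at time 0 is rejected again until 60 seconds have passed.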

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (under Configurations)
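
The change log above also mentions moving the suspending optimizer checker onto a delayed queue. As a rough picture of that idea, the sketch below uses the JDK's java.util.concurrent.DelayQueue; whether the PR uses this class or its own DelayedQueue type is an assumption here, and OptimizerKeepAlive, SuspendingChecker, watch and pollExpired are hypothetical names. Each optimizer is queued with a deadline, and the checker only sees entries whose deadline has passed, so it does not need to scan all optimizers on a fixed schedule.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical queue entry: an optimizer token plus the time at which it counts as expired. */
class OptimizerKeepAlive implements Delayed {
  final String token;
  private final long deadlineNanos;

  OptimizerKeepAlive(String token, long timeoutMs) {
    this.token = token;
    this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
  }

  @Override
  public long getDelay(TimeUnit unit) {
    // Remaining time until the deadline; zero or negative means expired.
    return unit.convert(deadlineNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
  }

  @Override
  public int compareTo(Delayed other) {
    return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
  }
}

/** Hypothetical checker; not the class from this PR. */
class SuspendingChecker {
  private final DelayQueue<OptimizerKeepAlive> queue = new DelayQueue<>();

  /** Starts watching an optimizer that must report back within timeoutMs. */
  void watch(String token, long timeoutMs) {
    queue.add(new OptimizerKeepAlive(token, timeoutMs));
  }

  /** Returns the token of one expired optimizer, or null if none has expired yet. */
  String pollExpired() {
    OptimizerKeepAlive expired = queue.poll();
    return expired == null ? null : expired.token;
  }
}
```

A real checker would more likely block on take() in a dedicated thread and re-queue optimizers that are still alive; the non-blocking poll() is used here only to keep the sketch short.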

majin1102 and others added 6 commits October 31, 2023 19:24
# Conflicts:
#	ams/server/src/main/java/com/netease/arctic/server/ArcticManagementConf.java
#	ams/server/src/main/java/com/netease/arctic/server/DefaultOptimizingService.java
#	ams/server/src/main/java/com/netease/arctic/server/optimizing/OptimizingQueue.java
#	ams/server/src/main/java/com/netease/arctic/server/optimizing/SchedulingPolicy.java
#	ams/server/src/main/java/com/netease/arctic/server/optimizing/TaskRuntime.java
#	ams/server/src/main/java/com/netease/arctic/server/persistence/mapper/OptimizingMapper.java
#	ams/server/src/main/java/com/netease/arctic/server/resource/OptimizerInstance.java
#	ams/server/src/main/java/com/netease/arctic/server/table/DefaultTableService.java
#	ams/server/src/test/java/com/netease/arctic/server/optimizing/TestOptimizingQueue.java
#	ams/server/src/test/java/com/netease/arctic/server/table/TableServiceTestBase.java
merge from newest master
@github-actions github-actions bot added module:ams-server Ams server module module:ams-dashboard Ams dashboard module labels Nov 10, 2023
majin1102 added 3 commits November 21, 2023 16:50
# Conflicts:
#	ams/server/src/main/java/com/netease/arctic/server/optimizing/OptimizingQueue.java
#	ams/server/src/main/java/com/netease/arctic/server/optimizing/TaskRuntime.java

codecov bot commented Nov 21, 2023

Codecov Report

Attention: 63 lines in your changes are missing coverage. Please review.

Comparison is base (365100a) 52.97% compared to head (e6efd97) 53.52%.

Files | Patch % | Lines
...ease/arctic/server/optimizing/OptimizingQueue.java | 80.00% | 15 Missing and 8 partials ⚠️
...etease/arctic/server/DefaultOptimizingService.java | 88.32% | 13 Missing and 3 partials ⚠️
...ase/arctic/server/optimizing/SchedulingPolicy.java | 63.63% | 7 Missing and 5 partials ⚠️
...etease/arctic/server/resource/OptimizerThread.java | 66.66% | 5 Missing ⚠️
...rctic/server/persistence/StatedPersistentBase.java | 84.61% | 4 Missing ⚠️
.../netease/arctic/server/optimizing/TaskRuntime.java | 91.66% | 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2282      +/-   ##
============================================
+ Coverage     52.97%   53.52%   +0.55%     
- Complexity     4311     4359      +48     
============================================
  Files           516      517       +1     
  Lines         29805    29831      +26     
  Branches       2902     2903       +1     
============================================
+ Hits          15789    15968     +179     
+ Misses        12738    12576     -162     
- Partials       1278     1287       +9     
Flag | Coverage Δ
core | 54.01% <83.76%> (+0.66%) ⬆️
trino | 50.93% <ø> (-0.03%) ⬇️

CLAassistant commented Nov 22, 2023

CLA assistant check
All committers have signed the CLA.

@HuangFru HuangFru left a comment

I've tested it with max-planning-parallelism 1 and 5. Works fine.

@HuangFru

There is a very important point here. Iceberg's native planning logic is multi-threaded by default, and the thread pool it uses is shared: commits and other operations run on the same pool. So when a large table (which may have unhealthy metadata) takes up a lot of resources during planning, planning for other tables becomes very slow even if multiple tables are allowed to plan at the same time, and commit speed can be greatly affected as well. This PR does not solve this problem yet.
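
The concern above can be illustrated without Iceberg at all. The sketch below is a plain-JDK illustration under assumptions, not something this PR implements: IsolatedPools and commitSurvivesHeavyPlan are hypothetical names. It gives planning its own fixed pool, so that even when every planning thread is blocked by a heavy table, a commit submitted to a separate pool still completes. As far as I know, Iceberg scans accept a caller-supplied executor through planWith(ExecutorService), which is one way such isolation might be wired in, but that has not been checked against this code base.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of separating planning threads from commit threads. */
class IsolatedPools {
  final ExecutorService planPool;
  final ExecutorService commitPool;

  IsolatedPools(int planThreads, int commitThreads) {
    this.planPool = Executors.newFixedThreadPool(planThreads);
    this.commitPool = Executors.newFixedThreadPool(commitThreads);
  }

  /** Returns true if a commit finishes while every planning thread is still busy. */
  static boolean commitSurvivesHeavyPlan() {
    IsolatedPools pools = new IsolatedPools(2, 1);
    CountDownLatch release = new CountDownLatch(1);
    try {
      // Simulate a heavy table: occupy both planning threads until released.
      for (int i = 0; i < 2; i++) {
        pools.planPool.submit(() -> {
          release.await();
          return null;
        });
      }
      // The commit runs on its own pool, so it is not queued behind the planning work.
      Future<String> commit = pools.commitPool.submit(() -> "committed");
      return "committed".equals(commit.get(5, TimeUnit.SECONDS));
    } catch (Exception e) {
      return false;
    } finally {
      release.countDown();
      pools.planPool.shutdown();
      pools.commitPool.shutdown();
    }
  }
}
```

Had both kinds of work shared one two-thread pool, the commit would have waited behind the blocked planning tasks, which is the slowdown described in the comment.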

@zhoujinsong zhoujinsong left a comment

I left some comments, please take another look.
The code style seems to be incorrect; you can follow this guide to correct it: https://github.com/NetEase/amoro/blob/master/CONTRIBUTING.md#code-formatting

@zhoujinsong zhoujinsong left a comment

@majin1102 I left some comments, please take a look.

@zhongqishang zhongqishang left a comment

@majin1102 I left some comments, PTAL.

I also tested the following scenario, and it did not work properly:
When AMS is restarted while a table is in the optimizing running state, execution of the unfinished tasks is not resumed.

@github-actions github-actions bot added type:docs Improvements or additions to documentation module:core Core module labels Dec 4, 2023
majin1102 added 2 commits December 4, 2023 16:37
# Conflicts:
#	ams/server/src/main/java/com/netease/arctic/server/optimizing/OptimizingQueue.java
#	ams/server/src/main/java/com/netease/arctic/server/optimizing/SchedulingPolicy.java

@zhoujinsong zhoujinsong left a comment

LGTM.

@zhongqishang zhongqishang left a comment

LGTM.

@majin1102 majin1102 merged commit add2836 into apache:master Dec 5, 2023
7 checks passed
@majin1102 majin1102 deleted the concurrent-plan branch December 5, 2023 07:13
ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
…ache#2282)

* [AMORO-1951] Support parallelized planning in one optimizer group

* [AMORO-1951] add unit test for OptimizingQueue and DefaultOptimizingService

* [AMORO-1951] optimize default parameters

* fix bugs

* fix warnings and spotless issues

* merge from apache#2290

* add apache license and fix spotless

* fix config error

* Update ams/server/src/main/java/com/netease/arctic/server/DefaultOptimizingService.java

Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>

* add annotations

* fix compile errors

* fix import problem

* remove isDebugEnabled()

* spotless apply

* Update ArcticManagementConf.java

* fix reboot bug and supply document content

* use MoreObjects.toStringHelper for OptimizerThread.java

* Merged from [AMORO-2376] Print right log info after calculating and sorting tables

* fix import problem

* remove unused codes

* spotless

* remove incorrect comments

* add max-planning-parallelism to config

---------

Co-authored-by: majin1102 <majin1102@163.com>
Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>
zhoujinsong added a commit that referenced this pull request Feb 22, 2024
* [AMORO-1812] support spark-based external optimizer

* resolve code style error

* [AMORO-1951] Support parallelized planning in one optimizer group (#2282)

* [AMORO-2378] The optimizer based on Flink on YARN should prioritize loading the optimizer-job.jar (#2379)

* load optimizer jar first

* fix code style

* change config name

* add config taskmanager.memory.managed.fraction

* fix

* [AMORO-2222] [Improvement]: Skip cleaning up dangling delete files for Iceberg V1 table (#2361)

* [AMORO-2222] [Improvement]: Skip cleaning up dangling delete files for Iceberg V1 table

* Update IcebergTableMaintainer.java

The `total-delete-files` could be 0.

---------

Co-authored-by: wangtaohz <103108928+wangtaohz@users.noreply.github.com>

* [AMORO-2404] fix Mixed Hive table mistakenly deletes hive files during expiring snapshots (#2405)

get hive locations return the uri path

* [AMORO-2407] Fix access data file from dashboard of non-partitioned table (#2408)

* fix null partition

* fix listing files of non-partitioned iceberg table

* [AMORO-2383] Add serialVersionUID to RewriteFilesInput (#2384)

* add serialVersionUID

* fix comment

* [AMORO-1720] Fix Mixed Format KeyedTable expiring all the snapshots with optimized sequence (#2394)

* should not expire the latest snapshot contains optimized sequence

* add visible for testing

* add fetchLatestNonOptimizedSnapshotTime for base store

* get hive locations return the uri path

* refactor codes and fix comments

* improve for exclude files is empty for expring snapshots

---------

Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>

* [AMORO-2386][AMS] Configure `iceberg.worker.num-threads` in the config.yaml (#2393)

* [AMORO-2386][AMS] Configure `iceberg.worker.num-threads` in the config.yaml

* Fix

* [AMORO-2386][AMS] reuse config `table-manifest-io.thread-count` and reuse thread pool

* Add comment

* [AMORO-1716] [Improvement]: sort the table list returned by server (#2362)

* improve: sort the table list returned by server

* optimize: sort tables by format

* optimize: optimiz tables sorting

* style: udpate comment

---------

Co-authored-by: chenyuzhi <chenyuzhi@corp.netease.com>
Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>

* [HotFix] Re-add table-filter to Server ExternalCatalog (#2310)

* re add table filter

* implement in external catalog

* add ut case

* fix comment

* fix comment

* fix comment

* fix ut

* fix update properties

* roll back the engine side's filter

* resolve conflicts

* add ut

---------

Co-authored-by: baiyangtx <xiangnebula@163.com>
Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>

* [AMORO-2299]: Cancel the running optimizing process from ams web console (#2297)

* cancel the running opimizing process from ams web console

* refact code to avoid NPE

* add o comment for com.netease.arctic.server.table.TableService#getServerTableIdentifier

* change the cancel post api to be more restful style

* [AMORO-2415] Print GC date stamps  (#2416)

add gc timestamp

* Update wrong comments in SnapshotsExpiringExecutor.java (#2422)

* [AMORO-2276]: UnifiiedCatalog for Spark Engine (#2269)

* Add UnifiedSparkCatalog under spark common module
* Extract MixedSparkCatalogBase and MixedSparkSessionCatalogBase to spark common module
* Refactor spark unit test framework to adapt unifed catalog tests and mixed format tests.

* [AMORO-2261] Extract the deleting dangling files from the cleaning orphan files (#2403)

* [Improvement]: Extract the deleting dangling files from the cleaning orphan files

* [Improvement]: Extract the deleting dangling files from the cleaning orphan files

* [Improvement]: Extract the deleting dangling files from the cleaning orphan files

* [AMORO-1341] [Flink]: Support UnifiedCatalog to contain Mixed format table in Flink Engine (#2370)

* [AMORO-1341] [Flink]: Support UnifiedCatalog to contain Mixed format table in Flink Engine

* [AMORO-2413] Need to select the first db after switching to another Catalog (#2419)

* fix: If the current catalog is not the one in the query, the first db is selected by default.

* build dashboard frontend

---------

Co-authored-by: wangtao <wangtao3@corp.netease.com>

* [HotFix] Fix loading the optimizing snapshot id of change store for Mixed Format KeyedTable (#2430)

fix load target change snapshot id

* [AMORO-2260] Show the format version of iceberg table (#2425)

[AMORO-2260] Show the format version of Iceberg Table

Signed-off-by: tcodehuber <tcodehuber@gmail.com>

* [AMORO-2115] Support displaying Optimizing tasks (#2322)

* dashboard: rename optimized to optimizing

* dashboard: support optimizing taskes

* add optimizer token

* dashboard: modify column width

* dashboard: build

* sort the metrics field and change record cnt to long

* modify MetricsSummary Compatibility

* dashbard: build

* Update ams/server/src/main/java/com/netease/arctic/server/optimizing/TaskRuntime.java

Co-authored-by: Qishang Zhong <zhongqishang@gmail.com>

* fix

* support input metrics and output metrics for optimizing process

* dashboard: support optimizing metrics

* dashbard: build

* dashboard:rebuild

* support MetricsSummary to map

* optimizing task supports input output

* dashboard: optimizing tasks support input and output

* dashboard: not display seconds when longer than 1 hour

* dashboard: optimizing process show summary

* remove useless import

* dashboard: build

* as head

* dashbard: build

* change process status to CLOSED after cancel process

* remove useless log

* dashboard: refresh after cancelled

* support cancel optimizing tasks

* dashboard: handle exception when can't cancel optimizing process

* throw exception when can't cancel optimizing process

* dashboard: build

* dashboard: refresh optimizing process when exist optimizing detail page

* dashboard: build

* fix cost time is 0ms

* change metrics name

* fix task startTime and endTime

* fix costTime

* using Preconditions.checkArgument

* fix task reset

* add comments

* cancel tasks before closing optimizing process

* fix unit test

* fix cancel task

* as head

* Revert "as head"

This reverts commit e469e71.

* dashboard: build

---------

Co-authored-by: Qishang Zhong <zhongqishang@gmail.com>

* [AMORO-2385] Make the maximum input file size for per optimize thread configurable (#2387)

* add config self-optimizing.max-input-file-size-per-thread

* add doc

* add resource group property max-input-file-size-per-thread

* add doc

* fix compile

* [Hotfix] Add database filter to Server ExternalCatalog (#2414)

* [Hotfix] Add database filter to Server ExternalCatalog

* [Hotfix] Add database filter to Server ExternalCatalog

* Rename config database.filter-regular-expression to database-filter

---------

Co-authored-by: baiyangtx <xiangnebula@163.com>

* [AMORO-2423] [Flink]: Using 'mixed_iceberg' and 'mixed_hive' indentifier to CREATE CATALOG and deprecate 'arctic' identifier (#2424)

* [AMORO-2423] [Flink]: Using 'mixed_iceberg' and 'mixed_hive' identifiers to CREATE CATALOG and deprecate 'arctic' identifier

* [AMORO-2316] The Files page supports filtering by partition name and sorting by dictionary value. (#2420)

* AMORO-2316: The Files page supports filtering by partition name and sorting by dictionary value.

* build dashboard frontend

* dashboard: build

---------

Co-authored-by: wangtao <wangtao3@corp.netease.com>

* [AMORO-1892] Improve the SQL Shortcuts in Terminal Web UI (#2434)

* [AMORO-1892] Improve the SQL Shortcuts in Terminal Web UI

* refactor some code

---------

Signed-off-by: tcodehuber <tcodehuber@gmail.com>

* [AMORO-2440] Fix the batch deletion of Change Store files for Mixed Format Table (#2439)

fix remove change files

* [AMORO-2418] Exclude kryo dependency from flink-optimizer (#2437)

exclude kryo

* [AMORO-2441] Fix `TableEntriesScan` when no file format is specified in the file name (#2442)

* fix TableEntriesScan without format suffix

* using the file format from entries

* [Hotfix] Fix fetchLatestNonOptimizedSnapshotTime (#2396)

* fix fetchLatestNonOptimizedSnapshotTime

* fix

* spotless

* fix ut

* rename data-expire.since

* using UTC zone for snapshot timestamp

* resolve conflict

* rerview

* spotless

* review

* review

* [AMORO-2344] [Flink]: Support UnifiedCatalog to contain Iceberg format table in Flink Engine (#2427)

* [AMORO-2330] Improve major plan (#2332)

* [AMORO-2330][AMS] Improve major plan

* [AMORO-2330][AMS] remove filter dataFileWith1Pos

* Fix comments

* Fix

* Move the rollback logic of undersized segment to the split task stage

* Rename

* Fix

* Rollback method name `fileShouldRewrite`

* Rollback mixed table format full trigger condition & reuse `isUndersizedSegmentFile`

* Rollback testSegmentFilesBase()

* TreeNodeTaskSplitter logic keep same with bin-pack

* Improve code duplicate

---------

Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>

* [AMORO-1810] Check the validity of the heatbeat interval when an opti… (#2432)

* [AMORO-1810] Check the validity of the heatbeat interval when an optimizer start

* adjust the import way

* resolve some logic code

* refactor code

* refactor code

* fix ut error

* resolve ut error

* fix ut error

* fix ut error

* fix ut error

* fix ci error

* refactor code

* refactor code

* refactor code

* [AMORO-1812] support spark-based external optimizer

* resolve code style error

* refactor code

* refactor code

* bugfix

* refactor code

* refactor code

* code style

* bugfix

* bugfix

---------

Signed-off-by: tcodehuber <tcodehuber@gmail.com>
Co-authored-by: JinMat <majin1102@gmail.com>
Co-authored-by: majin1102 <majin1102@163.com>
Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>
Co-authored-by: wangzeyu <hameizi369@gmail.com>
Co-authored-by: ConradJam <jam.gzczy@gmail.com>
Co-authored-by: wangtaohz <103108928+wangtaohz@users.noreply.github.com>
Co-authored-by: yeatsliao <liaoyt66066@gmail.com>
Co-authored-by: Qishang Zhong <zhongqishang@gmail.com>
Co-authored-by: chenyuzhi459 <553673833@qq.com>
Co-authored-by: chenyuzhi <chenyuzhi@corp.netease.com>
Co-authored-by: HuangFru <68625618+HuangFru@users.noreply.github.com>
Co-authored-by: baiyangtx <xiangnebula@163.com>
Co-authored-by: xujiangfeng001 <104614523+xujiangfeng001@users.noreply.github.com>
Co-authored-by: Xianxun Ye <yesorno828423@gmail.com>
Co-authored-by: liuweimin <minteliu.l@gmail.com>
Co-authored-by: wangtao <wangtao3@corp.netease.com>
Co-authored-by: Xavier Bai <xuba@cisco.com>
@zhoujinsong zhoujinsong mentioned this pull request Jun 25, 2024
66 tasks
Labels
module:ams-dashboard Ams dashboard module module:ams-server Ams server module module:core Core module type:docs Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

[Improvement]: Support parallelized planning in one optimizer group
8 participants