Implement cluster shrink (2nd phase) by whitehawk · Pull Request #2247 · arenadata/gpdb

whitehawk · 2026-02-16T03:03:42Z

Implement cluster shrink (2nd phase)

List of changes:

- Add support for redistributing materialized views, writable external tables, partitioned tables, and unlogged tables. Skip temp tables. This is done to comply with the requirements.
- Check that the database and the table exist before actually starting to rebalance the table. This is needed because someone could drop either of them concurrently after the rebalance table list has been created.
- Add retry logic to the table rebalance worker. This is needed, for example, when another session opens a transaction after the rebalance table list is created, drops the table before we start rebalancing it, and commits the transaction after we have started rebalancing the table (while we are waiting on the table's locks).
- Change the order in which the shrunk segment processes are stopped. Mirrors are now stopped strictly after primaries to avoid hanging replication processes.
- Do not stop the tool if some of the shrunk segments could not be stopped; only emit a warning instead. This is done to comply with the requirements.
- Rework fault injection for segment stopping because of the previous item: an exception inside the 'SegmentStopAfterShrink' worker no longer stops the tool, so when a fault is injected, SIGINT is now sent to the ggrebalance process to halt its work.
- Improve logging inside 'SegmentStopAfterShrink'.
- Remove the unused flag 'needs_repopulate'.
- Add new behave test cases and update old ones to cover the new functionality.
- Add new behave step definitions to support the test updates.
- Fix the behave test steps for view/matview creation: they opened a connection but didn't use it. Instead, they tried to use the connection from the context, which was not properly configured.
- Update the behave utils to support the new test step definitions for materialized views and unlogged tables.
- Add to the fault injector the ability to suspend execution instead of crashing.
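The existence checks and the retry logic described in the list above can be sketched as follows. This is a minimal Python sketch under assumptions, not the code from `shrink.py`: `run_query`, `do_rebalance`, `rebalance_table_with_retry`, the `postgres` maintenance database, and the catalog queries are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical query runner: run_query(dbname, sql, params) -> list of rows.
QueryFn = Callable[[str, str, Tuple], List[tuple]]

# Assumed maintenance database used for the database-level check.
MAINTENANCE_DB = "postgres"

DB_EXISTS_SQL = "SELECT 1 FROM pg_database WHERE datname = %s"
TABLE_EXISTS_SQL = (
    "SELECT 1 FROM pg_class c "
    "JOIN pg_namespace n ON n.oid = c.relnamespace "
    "WHERE n.nspname = %s AND c.relname = %s"
)


def database_exists(run_query: QueryFn, db: str) -> bool:
    return bool(run_query(MAINTENANCE_DB, DB_EXISTS_SQL, (db,)))


def table_exists(run_query: QueryFn, db: str, schema: str, table: str) -> bool:
    # The table check has to run inside the table's own database.
    return bool(run_query(db, TABLE_EXISTS_SQL, (schema, table)))


def rebalance_table_with_retry(run_query: QueryFn,
                               do_rebalance: Callable[[str, str, str], None],
                               db: str, schema: str, table: str,
                               max_attempts: int = 3,
                               delay_s: float = 0.0) -> str:
    """Rebalance one table, re-checking existence before every attempt."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        # A concurrent transaction may have dropped the database or the table
        # after the rebalance list was built, possibly while we were blocked
        # on the table's locks.
        if not database_exists(run_query, db):
            return "skipped: database no longer exists"
        if not table_exists(run_query, db, schema, table):
            return "skipped: table no longer exists"
        try:
            do_rebalance(db, schema, table)
            return "rebalanced"
        except Exception as err:  # sketch only: real code would catch narrower errors
            last_error = err
            time.sleep(delay_s)
    raise RuntimeError(
        f"giving up on {db}.{schema}.{table} after {max_attempts} attempts"
    ) from last_error
```

The existence checks sit inside the loop on purpose: if a drop commits while the worker waits on the table's locks, the failed attempt becomes a clean skip on the next iteration instead of an error.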

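The reworked stop procedure from the list above (primaries strictly before mirrors, a warning instead of aborting) might look roughly like this. It is a minimal serial sketch: `Segment`, `stop_shrunk_segments`, and `stop_one` are hypothetical names, and any parallel worker pool used by the real 'SegmentStopAfterShrink' worker is not modelled.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger("shrink_stop_sketch")


@dataclass(frozen=True)
class Segment:
    content: int
    role: str  # 'p' = primary, 'm' = mirror, as in gp_segment_configuration
    hostname: str


def stop_shrunk_segments(segments: List[Segment],
                         stop_one: Callable[[Segment], None]) -> List[Segment]:
    """Stop all primaries, then all mirrors; warn on failure instead of aborting.

    Returns the segments that could not be stopped.
    """
    failed: List[Segment] = []
    primaries = [s for s in segments if s.role == "p"]
    mirrors = [s for s in segments if s.role == "m"]
    # The mirror phase starts only after every primary stop has been attempted,
    # which is the ordering the PR uses to avoid hanging replication processes.
    for phase in (primaries, mirrors):
        for seg in phase:
            try:
                stop_one(seg)
            except Exception as err:
                # Per the requirements: keep going, only emit a warning.
                logger.warning("could not stop segment content=%d role=%s on %s: %s",
                               seg.content, seg.role, seg.hostname, err)
                failed.append(seg)
    return failed
```

Returning the failed segments rather than raising lets the caller report them at the end of the run while still completing the shrink.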

whitehawk · 2026-02-19T10:07:20Z

2nd to perform 'REFRESH MATERIALIZED VIEW'.

Why is just rebalancing not enough? Does gpexpand refresh mat views after expanding?

After a face-to-face discussion: we need to evaluate the CTAS approach for mat views, as the current approach may hit a race condition if one mat view depends on another mat view. Created GG-225.

KnightMurloc · 2026-02-19T10:24:28Z

Skip processing of temp tables

Why? If the database is still available to users during shrink, what will be the state of their temporary tables after that?

bimboterminator1 · 2026-02-19T10:51:42Z

Created GG-225.

So the current approach won't be dropped for now?

gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py

gpMgmt/bin/gprebalance_modules/shrink.py

gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py

Problem description: Before this patch, rebalancing a materialized view required 2 steps: the actual rebalance, which updated the distribution policy, and a refresh step to update the data in the materialized view. This approach had 2 problems when used by the 'ggrebalance' tool for cluster shrink:

1. The refresh could change the data in the materialized view, so its content before and after the shrink could differ if the view was not up to date. We intend to keep the logical data in the cluster unaltered.
2. If a materialized view depends on another materialized view, the refresh could race: one view could be refreshed based on another view that has not been refreshed yet.

Fix: Use the CTAS approach from EXPAND TABLE specifically when rebalancing a materialized view. It creates a temp table with the correct distribution policy, copies all data from the materialized view into it, and then swaps the relfilenode of the materialized view with that of the temp table. This keeps the data as it was before the rebalance, even if it was not up to date (so the user is not surprised by unexpected view content), and it eliminates dependencies on any objects other than the materialized view itself.

(cherry picked from commit 37dc7e7)
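The two problems in the commit message above can be illustrated with a toy Python model. It is not GPDB code: `MatView`, `build_world`, `rebalance_by_refresh`, and `rebalance_by_copy` are hypothetical, and "copy the stored rows" stands in for the CTAS copy plus relfilenode swap, which happens inside the server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MatView:
    """Toy matview: stored rows plus the query that would recompute them."""
    query: Callable[[], List[int]]
    data: List[int] = field(default_factory=list)


def build_world() -> Dict[str, MatView]:
    base = [1, 2, 3]
    views: Dict[str, MatView] = {}
    views["mv_a"] = MatView(lambda: [x * 10 for x in base])
    views["mv_b"] = MatView(lambda: [x + 1 for x in views["mv_a"].data])  # depends on mv_a
    views["mv_a"].data = views["mv_a"].query()  # initial REFRESH of mv_a
    views["mv_b"].data = views["mv_b"].query()  # initial REFRESH of mv_b
    base.append(4)  # the base table changes later; both views are now stale
    return views


def rebalance_by_refresh(views: Dict[str, MatView], order: List[str]) -> Dict[str, List[int]]:
    """Old approach: redistribute, then REFRESH each view in the given order."""
    for name in order:
        views[name].data = views[name].query()
    return {name: v.data for name, v in views.items()}


def rebalance_by_copy(views: Dict[str, MatView]) -> Dict[str, List[int]]:
    """CTAS-style approach: copy the stored rows as-is and never re-run the query."""
    for v in views.values():
        v.data = list(v.data)
    return {name: v.data for name, v in views.items()}
```

With order `["mv_a", "mv_b"]` both views change content (problem 1); with `["mv_b", "mv_a"]`, `mv_b` is rebuilt from the stale `mv_a` and ends up inconsistent with it (problem 2). The copy-based variant returns exactly the rows stored before the rebalance.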

whitehawk · 2026-02-20T00:41:21Z

2nd to perform 'REFRESH MATERIALIZED VIEW'.

Why is just rebalancing not enough? Does gpexpand refresh mat views after expanding?

After a face-to-face discussion: we need to evaluate the CTAS approach for mat views, as the current approach may hit a race condition if one mat view depends on another mat view. Created GG-225.

I've updated the handling of matviews; they no longer require a REFRESH step.

Please note that there are changes in src/backend/* and src/test/regress/*. They are included here for convenience (to make the shrink changes work in this branch). They will be reviewed and committed in a separate PR (#2249) before this PR is merged.

whitehawk · 2026-02-20T01:03:44Z

Skip processing of temp tables

Why? If the database is still available to users during shrink, what will be the state of their temporary tables after that?

According to the requirements, we need to rebalance tables with relpersistence = 'p' or relpersistence = 'u' (permanent and unlogged tables). gpexpand also skips temp tables.
In the normal workflow, all sessions must be disconnected by the end of the shrink procedure so that the shrunk segments can be stopped, so under normal conditions no temp tables survive the shrink procedure anyway.
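The relpersistence rule above can be sketched as a catalog filter. This is a hedged illustration, not the query used by ggrebalance: `LIST_TABLES_SQL`, `should_rebalance`, and `select_tables` are hypothetical, and restricting to `relkind = 'r'` is a simplification, since the tool also handles matviews, partitioned tables, and writable external tables.

```python
from typing import Iterable, List, Tuple

# pg_class.relpersistence values: 'p' = permanent, 'u' = unlogged, 't' = temporary.
REBALANCE_PERSISTENCE = frozenset({"p", "u"})

# Hypothetical catalog query. relkind = 'r' (ordinary tables) keeps the sketch
# short; other relation kinds handled by the tool are not modelled here.
LIST_TABLES_SQL = """
SELECT n.nspname, c.relname, c.relpersistence
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND c.relpersistence IN ('p', 'u')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
"""


def should_rebalance(relpersistence: str) -> bool:
    """Temp tables ('t') are skipped, as gpexpand does."""
    return relpersistence in REBALANCE_PERSISTENCE


def select_tables(rows: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Client-side equivalent of the relpersistence filter in LIST_TABLES_SQL."""
    return [(nsp, rel) for nsp, rel, persistence in rows if should_rebalance(persistence)]
```

Filtering in SQL keeps temp tables out of the rebalance list entirely; the Python helper is only there to make the rule easy to test.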



whitehawk added 12 commits February 10, 2026 12:57

Exclude temp tables from shrink + test unlogged tables

affd3c0

Support shrink of matviews

ae5133e

Add check for shrink of partitioned tables

ff134f9

Support ext writable tables in shrink

afec7ba

Rework matviews handling, add more table types into tests, improve logging

11024d2

Add more interruption points into the test with cluster restart

8014899

Check table and db existence

1c3f8fe

Updates for mat views

0848321

Update segments stop procedure

7e1ecf2

Fix the case when table is dropped in a parallel transaction

643f838

Cosmetic changes

ad6430e

Merge branch 'feature/ADBDEV-6608' into GG-110

3ff91f0

whitehawk changed the title from "Gg 110" to "Implement cluster shrink (2nd phase)" Feb 18, 2026

whitehawk added 3 commits February 18, 2026 15:21

Fix tests

0c4487d

Remove redundant test

6fec958

Cosmetic changes

35bab85

whitehawk marked this pull request as ready for review February 18, 2026 05:48

This comment was marked as resolved.


bimboterminator1 reviewed Feb 19, 2026


gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py (outdated, resolved)

KnightMurloc reviewed Feb 19, 2026


gpMgmt/bin/gprebalance_modules/shrink.py (outdated, resolved)

gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py (outdated, resolved)

whitehawk added 3 commits February 20, 2026 09:35

Remove refresh step for MV handling

cabac88

Reduce delta

aa00ef2

whitehawk added 2 commits February 20, 2026 11:59

Use existing test steps instead of new ones

793a963

Add timeout into wait for logs step

57e45a9

whitehawk added 3 commits February 20, 2026 14:52

Reduce delta accross files

cd8394d

Revert "Use CTAS approach for rebalancing the materialized view"

08481d6

This reverts commit af75169.

Merge branch 'feature/ADBDEV-6608' into GG-110

2718221

bimboterminator1 approved these changes Feb 24, 2026


KnightMurloc reviewed Feb 24, 2026


gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py (outdated, resolved)

gpMgmt/bin/gprebalance_modules/shrink.py (resolved)

Use stderr

f090cb9

KnightMurloc approved these changes Feb 25, 2026


bimboterminator1 approved these changes Feb 25, 2026


whitehawk merged commit a57039b into feature/ADBDEV-6608 Feb 25, 2026
1 check passed

whitehawk deleted the GG-110 branch February 25, 2026 11:21

whitehawk merged 24 commits into feature/ADBDEV-6608 from GG-110

3 participants