[GG-181] Use intermediate host for swap moves#2248

Merged
bimboterminator1 merged 13 commits into feature/ADBDEV-6608 from GG-181
Mar 6, 2026

@bimboterminator1 bimboterminator1 commented Feb 18, 2026

Previously, when a plan required a primary and its mirror to swap hosts,
the two segments could coexist on the same host while the moves were
being executed. This violates the HA rule for the whole cluster.
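
The coexistence problem can be illustrated by replaying a plan move by move and checking, after each step, whether a primary and its mirror end up on the same host. This is only a rough sketch: `Move`, `find_ha_violations`, and the placement dictionary are hypothetical names chosen for illustration and are not taken from planner.py.

```python
from typing import Dict, List, NamedTuple, Tuple


class Move(NamedTuple):
    content: int   # segment content ID
    role: str      # 'p' for primary, 'm' for mirror
    source: str    # host the segment is moved from
    target: str    # host the segment is moved to


def find_ha_violations(placement: Dict[Tuple[int, str], str],
                       moves: List[Move]) -> List[int]:
    """Replay moves in order and return the 1-based indexes of moves
    after which a primary and its mirror share a host."""
    state = dict(placement)  # do not mutate the caller's mapping
    violations = []
    for index, move in enumerate(moves, start=1):
        state[(move.content, move.role)] = move.target
        primary_host = state.get((move.content, 'p'))
        mirror_host = state.get((move.content, 'm'))
        if primary_host is not None and primary_host == mirror_host:
            violations.append(index)
    return violations


# Hypothetical starting placement: content 3 has its primary on sdw1
# and its mirror on sdw2; the plan swaps them directly.
placement = {(3, 'p'): 'sdw1', (3, 'm'): 'sdw2'}
direct_swap = [Move(3, 'p', 'sdw1', 'sdw2'), Move(3, 'm', 'sdw2', 'sdw1')]
print(find_ha_violations(placement, direct_swap))  # [1]
```

After the first move both segments of content 3 sit on sdw2, which is the state this patch is meant to avoid.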

When a suboptimal rebalance plan requires swapping the locations
of a primary segment and its mirror, the planner now decomposes
the swap into three safe phases, using an intermediate host to prevent
primary-mirror coexistence violations.
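
A minimal sketch of the three-phase decomposition, assuming a simplified move record. The names `Move`, `find_swap`, and `decompose_swap` are hypothetical and do not reflect the actual code in `form_moves()`; only the ordering (mirror out, primary across, mirror back) follows the description.

```python
from typing import List, NamedTuple, Optional, Tuple


class Move(NamedTuple):
    content: int   # segment content ID
    role: str      # 'p' for primary, 'm' for mirror
    source: str    # current host
    target: str    # destination host


def find_swap(moves: List[Move]) -> Optional[Tuple[Move, Move]]:
    """Return the first (primary_move, mirror_move) pair that exchanges hosts."""
    for first in moves:
        for second in moves:
            if (first.role == 'p' and second.role == 'm'
                    and first.content == second.content
                    and first.source == second.target
                    and first.target == second.source):
                return first, second
    return None


def decompose_swap(primary_move: Move, mirror_move: Move,
                   intermediate: str) -> List[Move]:
    """Expand a swap into three moves routed through an intermediate host."""
    if intermediate in (primary_move.source, primary_move.target):
        raise ValueError("intermediate host must differ from both swap hosts")
    return [
        # Phase 1: park the mirror on the intermediate host.
        mirror_move._replace(target=intermediate),
        # Phase 2: move the primary to the host the mirror just vacated.
        primary_move,
        # Phase 3: bring the mirror to the primary's original host.
        mirror_move._replace(source=intermediate),
    ]


moves = [Move(2, 'p', 'sdw1', 'sdw2'), Move(2, 'm', 'sdw2', 'sdw1')]
swap = find_swap(moves)
plan = decompose_swap(*swap, intermediate='sdw3')
for step in plan:
    print(step)
```

At no point in the resulting sequence do the primary and the mirror of the same content share a host.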

planner.py now detects swap moves in form_moves() and
chooses an appropriate third host for the mirror movement.
The search is based on available space (taking other moves
into account), host status, and other swap counts.
Thus, a plan that previously looked like:


---------------------------------BALANCE MOVES----------------------------------
Total moves planned: 2

  [1] Move Segment(content=3, dbid=5, role=p) [254.73 MB]
      From: sdw1:7005 → /home/gpadmin/.data/primary/gpseg3
      To:   sdw2:7005 → /home/gpadmin/.data/primary/gpseg3

  [2] Move Segment(content=3, dbid=11, role=m) [190.44 MB]
      From: sdw2:7053 → /home/gpadmin/.data/mirror/gpseg3
      To:   sdw1:7053 → /home/gpadmin/.data/mirror/gpseg3

now expands into three moves:


---------------------------------BALANCE MOVES----------------------------------
Total moves planned: 3

  [1] Move Segment(content=2, dbid=10, role=m) [190.45 MB]
      From: sdw2:7052 → /home/gpadmin/.data/mirror/gpseg2
      To:   sdw3:7052 → /home/gpadmin/.data/mirror/gpseg2

  [2] Move Segment(content=2, dbid=4, role=p) [254.74 MB]
      From: sdw1:7004 → /home/gpadmin/.data/primary/gpseg2
      To:   sdw2:7004 → /home/gpadmin/.data/primary/gpseg2

  [3] Move Segment(content=2, dbid=10, role=m) [190.45 MB]
      From: sdw3:7052 → /home/gpadmin/.data/mirror/gpseg2
      To:   sdw1:7054 → /home/gpadmin/.data/mirror/gpseg2
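
How the intermediate host (sdw3 in the plan above) might be picked can be sketched as follows. The description only names the inputs (available space, other moves, host status, swap counts), so the `HostInfo` fields, the function name, and the ranking rule below are all assumptions made for illustration, not the planner's actual logic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class HostInfo:
    name: str
    is_up: bool            # host status
    free_bytes: int        # free space known for the host
    reserved_bytes: int    # space already claimed by other planned moves
    swaps_assigned: int    # swaps already routed through this host


def choose_intermediate_host(hosts: Iterable[HostInfo],
                             excluded: Set[str],
                             required_bytes: int) -> Optional[str]:
    """Pick a host that is up, is not part of the swap, and has enough room."""
    candidates = [
        h for h in hosts
        if h.is_up
        and h.name not in excluded
        and h.free_bytes - h.reserved_bytes >= required_bytes
    ]
    if not candidates:
        return None
    # Assumed ranking: fewer swaps first, then more spare space.
    best = min(candidates,
               key=lambda h: (h.swaps_assigned,
                              -(h.free_bytes - h.reserved_bytes)))
    return best.name


hosts = [
    HostInfo('sdw1', True, 10 * 1024**3, 0, 0),
    HostInfo('sdw2', True, 10 * 1024**3, 0, 0),
    HostInfo('sdw3', True, 5 * 1024**3, 0, 0),
    HostInfo('sdw4', False, 10 * 1024**3, 0, 0),
]
mirror_size = 200 * 1024**2
print(choose_intermediate_host(hosts, {'sdw1', 'sdw2'}, mirror_size))  # sdw3
```

Returning `None` when no host qualifies leaves it to the caller to decide whether the swap has to be dropped from the plan or reported as an error.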

Moreover, the available-space check for the intermediate host now uses
cached filesystem info, so the ResourceEstimator class was refactored
and its unit tests were adjusted.

Additionally, some unit tests were fixed, because we forgot to check them
in previous patches.

Example of a configuration with a single swap for manual testing:

conf
QD_PRIMARY_ARRAY=cdw~cdw~7000~/home/gpadmin/.data/gpseg-1~1~-1~0
declare -a PRIMARY_ARRAY=(
sdw1~sdw1~7002~/home/gpadmin/.data/primary/gpseg0~2~0~11100
sdw1~sdw1~7003~/home/gpadmin/.data/primary/gpseg1~3~1~11110
sdw1~sdw1~7004~/home/gpadmin/.data/primary/gpseg2~4~2~11220
sdw2~sdw2~7003~/home/gpadmin/.data/primary/gpseg3~5~3~11350
sdw3~sdw3~7004~/home/gpadmin/.data/primary/gpseg4~6~4~11360
sdw3~sdw3~7005~/home/gpadmin/.data/primary/gpseg5~7~5~11370
)
declare -a MIRROR_ARRAY=(
sdw3~sdw3~7050~/home/gpadmin/.data/mirror/gpseg0~8~0~51130
sdw3~sdw3~7051~/home/gpadmin/.data/mirror/gpseg1~9~1~51140
sdw2~sdw2~7052~/home/gpadmin/.data/mirror/gpseg2~10~2~51160
sdw1~sdw1~7053~/home/gpadmin/.data/mirror/gpseg3~11~3~51160
sdw2~sdw2~7054~/home/gpadmin/.data/mirror/gpseg4~12~4~51200
sdw2~sdw2~7055~/home/gpadmin/.data/mirror/gpseg5~13~5~51136
)

@bimboterminator1 bimboterminator1 left a comment

It would be nice to see cluster configurations where several swaps are required.

So far I couldn't come up with a 3-host configuration where 2 swaps are required.

@bimboterminator1

Existing tests like

occurred after the suggested refactoring; fixed

# Try parent directory
parent_dir = TemplateParser.extract_parent_directory(directory)
if parent_dir in self.space_info_by_host[host_address]:
    return self.space_info_by_host[host_address][parent_dir]

I suggest adding an explicit return at the end of the function:

Suggested change
-    return self.space_info_by_host[host_address][parent_dir]
+    return self.space_info_by_host[host_address][parent_dir]
+return None

This one is not yet addressed.
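
For reference, a standalone sketch of a cached lookup with a parent-directory fallback and the explicit `return None` suggested above. Assumptions: the cache maps host address to directory to free bytes, `os.path.dirname` stands in for `TemplateParser.extract_parent_directory`, and `lookup_space_info` is a hypothetical name.

```python
import os
from typing import Dict, Optional


def lookup_space_info(space_info_by_host: Dict[str, Dict[str, int]],
                      host_address: str,
                      directory: str) -> Optional[int]:
    """Return cached free space for a directory, or None if nothing is cached."""
    host_cache = space_info_by_host.get(host_address, {})
    if directory in host_cache:
        return host_cache[directory]
    # Try parent directory (os.path.dirname is a stand-in for the real helper).
    parent_dir = os.path.dirname(directory.rstrip('/'))
    if parent_dir in host_cache:
        return host_cache[parent_dir]
    # Explicit fall-through so callers can see that None is a possible result.
    return None


cache = {'sdw3': {'/home/gpadmin/.data/mirror': 5 * 1024**3}}
print(lookup_space_info(cache, 'sdw3', '/home/gpadmin/.data/mirror/gpseg2'))
print(lookup_space_info(cache, 'sdw3', '/tmp/elsewhere'))  # None
```

Python returns `None` implicitly at the end of a function anyway, so the explicit statement changes readability rather than behavior.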

@bimboterminator1

The issue still exists

I'll check it out.

@bimboterminator1

I'll check it out.

Now it's OK. rebalance_basics has been adjusted to avoid recreating the demo cluster each time.

@whitehawk

It looks like the unit tests don't pass on the latest code version:

log
======================================================================
ERROR: test_validate_complex_overlapping_filesystems_insufficient (gprebalance_modules.test.test_unit_resources.TestResourceEstimator)
Test multiple segments with overlapping filesystems - insufficient space
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/test/test_unit_resources.py", line 958, in test_validate_complex_overlapping_filesystems_insufficient
    estimator._validate_and_build_allocations(moves)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1448, in _validate_and_build_allocations
    issues = self._find_space_issues()
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1521, in _find_space_issues
    'target_dirs': sorted(fs_req.datadir_paths | fs_req.tablespace_paths),
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1270, in datadir_paths
    paths.add(item[0])
AttributeError: 'dict' object has no attribute 'add'

======================================================================
ERROR: test_validate_multiple_moves_same_filesystem_insufficient (gprebalance_modules.test.test_unit_resources.TestResourceEstimator)
Test validation aggregates space requirements on same filesystem
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/test/test_unit_resources.py", line 487, in test_validate_multiple_moves_same_filesystem_insufficient
    estimator._validate_and_build_allocations(moves)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1448, in _validate_and_build_allocations
    issues = self._find_space_issues()
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1521, in _find_space_issues
    'target_dirs': sorted(fs_req.datadir_paths | fs_req.tablespace_paths),
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1270, in datadir_paths
    paths.add(item[0])
AttributeError: 'dict' object has no attribute 'add'

======================================================================
ERROR: test_validate_multiple_tablespaces_different_filesystems (gprebalance_modules.test.test_unit_resources.TestResourceEstimator)
Test validation with multiple tablespaces on different filesystems
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/test/test_unit_resources.py", line 721, in test_validate_multiple_tablespaces_different_filesystems
    estimator._validate_and_build_allocations(moves)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1448, in _validate_and_build_allocations
    issues = self._find_space_issues()
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1521, in _find_space_issues
    'target_dirs': sorted(fs_req.datadir_paths | fs_req.tablespace_paths),
TypeError: unsupported operand type(s) for |: 'dict' and 'set'

======================================================================
ERROR: test_validate_no_double_counting_same_segment (gprebalance_modules.test.test_unit_resources.TestResourceEstimator)
Test that the same segment is not double-counted on same filesystem
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/test/test_unit_resources.py", line 1020, in test_validate_no_double_counting_same_segment
    estimator._validate_and_build_allocations(moves)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1448, in _validate_and_build_allocations
    issues = self._find_space_issues()
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1521, in _find_space_issues
    'target_dirs': sorted(fs_req.datadir_paths | fs_req.tablespace_paths),
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1270, in datadir_paths
    paths.add(item[0])
AttributeError: 'dict' object has no attribute 'add'

======================================================================
ERROR: test_validate_tablespace_same_filesystem_as_datadir_insufficient (gprebalance_modules.test.test_unit_resources.TestResourceEstimator)
Test when tablespace and datadir share the same filesystem - insufficient space
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/test/test_unit_resources.py", line 771, in test_validate_tablespace_same_filesystem_as_datadir_insufficient
    estimator._validate_and_build_allocations(moves)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1448, in _validate_and_build_allocations
    issues = self._find_space_issues()
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1521, in _find_space_issues
    'target_dirs': sorted(fs_req.datadir_paths | fs_req.tablespace_paths),
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1270, in datadir_paths
    paths.add(item[0])
AttributeError: 'dict' object has no attribute 'add'

======================================================================
ERROR: test_validate_tablespace_space_insufficient (gprebalance_modules.test.test_unit_resources.TestResourceEstimator)
Test validation fails when tablespace has insufficient space
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/test/test_unit_resources.py", line 666, in test_validate_tablespace_space_insufficient
    estimator._validate_and_build_allocations(moves)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1448, in _validate_and_build_allocations
    issues = self._find_space_issues()
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1521, in _find_space_issues
    'target_dirs': sorted(fs_req.datadir_paths | fs_req.tablespace_paths),
TypeError: unsupported operand type(s) for |: 'dict' and 'set'

======================================================================
ERROR: test_validate_target_space_insufficient (gprebalance_modules.test.test_unit_resources.TestResourceEstimator)
Test validation fails when insufficient space for datadir
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/test/test_unit_resources.py", line 433, in test_validate_target_space_insufficient
    estimator._validate_and_build_allocations(moves)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1448, in _validate_and_build_allocations
    issues = self._find_space_issues()
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1521, in _find_space_issues
    'target_dirs': sorted(fs_req.datadir_paths | fs_req.tablespace_paths),
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/gprebalance_modules/planner.py", line 1270, in datadir_paths
    paths.add(item[0])
AttributeError: 'dict' object has no attribute 'add'

----------------------------------------------------------------------
Ran 185 tests in 317.503s

FAILED (errors=7)
make: *** [Makefile:112: unitdevel] Error 1
make: Leaving directory '/home/gpadmin/gpdb_src/gpMgmt/bin'
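
Both error messages in the log are what Python produces when a container meant to be a set is actually a dict, for example when it is initialised with `{}` instead of `set()`. The sketch below reproduces them in isolation. Whether planner.py initialised the container this way is an assumption, since the traceback only shows the `.add()` call and the `|` expression. The function names are hypothetical.

```python
def collect_paths_buggy(items):
    paths = {}            # BUG: {} is an empty dict, not an empty set
    for item in items:
        paths.add(item[0])
    return paths


def collect_paths_fixed(items):
    paths = set()         # an empty set must be spelled set()
    for item in items:
        paths.add(item[0])
    return paths


items = [("/data/primary/gpseg0", 100)]

try:
    collect_paths_buggy(items)
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'add'

try:
    # With no items the loop body never runs, so the dict is returned
    # and the failure surfaces later, at the union.
    collect_paths_buggy([]) | {"/data/tablespace"}
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for |: 'dict' and 'set'

print(sorted(collect_paths_fixed(items) | {"/data/tablespace"}))
```

This would also explain why some tests fail with `AttributeError` and others with `TypeError`: the first appears when there is at least one path to add, the second when the collection stays empty.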

@whitehawk

Typo in the description:
'the ResourceEstimator cllass' -> 'the ResourceEstimator class'

@whitehawk

Approving the PR, but please fix the couple of remaining minor issues.

@bimboterminator1 bimboterminator1 merged commit 0c8cb58 into feature/ADBDEV-6608 Mar 6, 2026
1 check passed
@bimboterminator1 bimboterminator1 deleted the GG-181 branch March 6, 2026 05:56