
[ADBDEV-6857] Rollback handler #1265

Merged
bimboterminator1 merged 36 commits into feature/ADBDEV-6608 from ADBDEV-6857 on Mar 20, 2025

Conversation

@bimboterminator1 (Member) commented Mar 9, 2025

This PR introduces the rollback handler in the gprebalance MVP. The rollback
function creates a new plan of movements by calculating the difference between
the current configuration and the original state loaded from the previously
pickled plan.
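A minimal sketch of that diff-based planning, with hypothetical names (the actual gprebalance code differs): the pre-rebalance state is unpickled, and every segment whose current location differs from its recorded original yields a reverse move.

# Minimal sketch of the diff-based rollback planning described above.
# All names are hypothetical; this is not the actual gprebalance code.
import pickle

def build_rollback_plan(state_path, current_config):
    """current_config: dict mapping dbid -> (host, port, datadir)."""
    with open(state_path, 'rb') as f:
        original_config = pickle.load(f)  # state pickled before the rebalance started
    moves = []
    for dbid, original in original_config.items():
        current = current_config.get(dbid)
        if current is not None and current != original:
            # The segment was moved by the rebalance: plan the reverse move,
            # from its current location back to the original one.
            moves.append({'dbid': dbid, 'from': current, 'to': original})
    return moves

Each planned move then surfaces in the logs below as an "About to run gprecoverseg for mirror move" line pairing the current and target host|port|datadir.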

How to test:
Generate an imbalanced configuration either in Docker or in the cloud, as
described in #1236. Run gprebalance and interrupt it in the middle, or
wait until the movements are completed. Then run gprebalance --rollback.

@hilltracer self-requested a review March 10, 2025 11:35
@hilltracer

@bimboterminator1 Is this the expected behaviour after --rollback? How can I run rebalance again?

log
gpadmin@gpdb7u:~/src$ gprebalance -n 1
20250311:07:11:44:111672 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:07:11:44:111672 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 12:59:24 (with assert checking) Bhuvnesh C.'
20250311:07:11:44:111672 gprebalance:cdw:gpadmin-[INFO]:-Rebalance has already completed
20250311:07:11:44:111672 gprebalance:cdw:gpadmin-[INFO]:-If you want to rebalance again, run gprebalance -c to perform cleanup
20250311:07:11:44:111672 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@gpdb7u:~/src$ gprebalance -c
20250311:07:12:02:111712 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:07:12:02:111712 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 12:59:24 (with assert checking) Bhuvnesh C.'
20250311:07:12:02:111712 gprebalance:cdw:gpadmin-[INFO]:-Rebalance has already completed
20250311:07:12:02:111712 gprebalance:cdw:gpadmin-[INFO]:-If you want to rebalance again, run gprebalance -c to perform cleanup
20250311:07:12:02:111712 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@gpdb7u:~/src$ gprebalance -c -n 1
20250311:07:12:28:111783 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:07:12:28:111783 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 12:59:24 (with assert checking) Bhuvnesh C.'
20250311:07:12:28:111783 gprebalance:cdw:gpadmin-[INFO]:-Rebalance has already completed
20250311:07:12:28:111783 gprebalance:cdw:gpadmin-[INFO]:-If you want to rebalance again, run gprebalance -c to perform cleanup
20250311:07:12:28:111783 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

@hilltracer

Please fix the --help output for rollback. Currently it prints

  -r, --rollback        remove the rebalance schema.

@bimboterminator1 (Member, Author)

Is this the expected behaviour after --rollback? How can I run rebalance again?

updated the branch

Please fix the --help output for rollback. Currently it prints

fixed

@hilltracer

See the first rollback after cleanup. Is it expected behavior?

log
gpadmin@gpdb7u:~/src$ gprebalance -c
20250311:10:55:59:245343 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:10:55:59:245343 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:10:55:59:245343 gprebalance:cdw:gpadmin-[INFO]:-Dropping status file
20250311:10:55:59:245343 gprebalance:cdw:gpadmin-[INFO]:-Dropping rebalance schema
20250311:10:55:59:245343 gprebalance:cdw:gpadmin-[INFO]:-Dropping rebalance directory
20250311:10:55:59:245343 gprebalance:cdw:gpadmin-[INFO]:-Cleanup Finished.  exiting...
20250311:10:55:59:245343 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@gpdb7u:~/src$ gprebalance -r
20250311:10:56:04:245396 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:10:56:04:245396 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:10:56:04:245396 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be 
using more hosts than the number of segments per host for spread mirroring. 
Grouped mirroring places all of a given hosts segments on a single 
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> 
20250311:10:56:05:245396 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250311:10:56:05:245396 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250311:10:56:05:245396 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema
20250311:10:56:07:245396 gprebalance:cdw:gpadmin-[ERROR]:-%d format: a real number is required, not NoneType
20250311:10:56:07:245396 gprebalance:cdw:gpadmin-[ERROR]:-%d format: a real number is required, not NoneType
20250311:10:56:07:245396 gprebalance:cdw:gpadmin-[ERROR]:-%d format: a real number is required, not NoneType
20250311:10:56:07:245396 gprebalance:cdw:gpadmin-[ERROR]:-%d format: a real number is required, not NoneType
20250311:10:56:07:245396 gprebalance:cdw:gpadmin-[ERROR]:-%d format: a real number is required, not NoneType
20250311:10:56:12:245396 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@gpdb7u:~/src$ gprebalance -r
20250311:10:56:14:245607 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:10:56:14:245607 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:10:56:14:245607 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
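
An aside on the repeated ERROR lines above: "%d format: a real number is required, not NoneType" is the standard Python TypeError raised when a %-style format string receives None instead of a number, so some log call is formatting a value that was never populated. A minimal reproduction (not the gprebalance code itself):

# %-formatting with %d requires a real number; passing None raises exactly
# the TypeError text seen in the log above.
value = None  # hypothetical stand-in for the unpopulated field
print("dbid = %d" % value)  # TypeError: %d format: a real number is required, not NoneType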


@hilltracer commented Mar 11, 2025

These proposals can be ignored: I would prefer not to use single quotes in log messages, to make text viewers with color schemes happier.

Log with single quote
gpadmin@gpdb7u:~/src$ gprebalance -n 1
20250310:15:28:10:081314 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250310:15:28:10:081314 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 12:59:24 (with assert checking) Bhuvnesh C.'
20250310:15:28:10:081314 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be 
using more hosts than the number of segments per host for spread mirroring. 
Grouped mirroring places all of a given hosts segments on a single 
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> 
20250310:15:28:30:081314 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250310:15:28:30:081314 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250310:15:28:30:081314 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema
20250310:15:28:35:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 11, content = 5) sdw2|50250|/home/gpadmin/.data/sdw2/mirror/gpseg5 sdw1|50151|/home/gpadmin/.data/sdw1/mirror/gpseg5
20250310:15:29:34:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 11): /home/gpadmin/.data/sdw2/mirror/gpseg5
20250310:15:29:34:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1 sdw2|10223|/home/gpadmin/.data/sdw2/mirror/gpseg1
20250310:15:30:38:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 5): /home/gpadmin/.data/sdw3/mirror/gpseg1
20250310:15:30:38:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3 sdw3|50137|/home/gpadmin/.data/sdw3/mirror/gpseg3
20250310:15:31:44:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 7): /home/gpadmin/.data/sdw1/mirror/gpseg3
20250310:15:31:44:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 15, content = 7) sdw1|50170|/home/gpadmin/.data/sdw1/mirror/gpseg7 sdw1|50154|/home/gpadmin/.data/sdw1/primary/gpseg7
20250310:15:32:48:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 15): /home/gpadmin/.data/sdw1/mirror/gpseg7
20250310:15:32:48:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 16, content = 8) sdw1|50180|/home/gpadmin/.data/sdw1/mirror/gpseg8 sdw2|10236|/home/gpadmin/.data/sdw2/primary/gpseg8
20250310:15:33:55:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 16): /home/gpadmin/.data/sdw1/mirror/gpseg8
20250310:15:34:00:081314 gprebalance:cdw:gpadmin-[INFO]:-Executing role swaps for 2 segments
20250310:15:34:14:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw3|10370|/home/gpadmin/.data/sdw3/primary/gpseg7 sdw2|10235|/home/gpadmin/.data/sdw2/mirror/gpseg7
20250310:15:35:18:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 18): /home/gpadmin/.data/sdw3/primary/gpseg7
20250310:15:35:18:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 19, content = 8) sdw3|10380|/home/gpadmin/.data/sdw3/primary/gpseg8 sdw3|50147|/home/gpadmin/.data/sdw3/mirror/gpseg8
20250310:15:36:24:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 19): /home/gpadmin/.data/sdw3/primary/gpseg8
20250310:15:36:29:081314 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
Log without single quote
gpadmin@gpdb7u:~/src$ gprebalance -n 1
20250310:15:28:10:081314 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250310:15:28:10:081314 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 12:59:24 (with assert checking) Bhuvnesh C.'
20250310:15:28:10:081314 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You have not specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be 
using more hosts than the number of segments per host for spread mirroring. 
Grouped mirroring places all of a given hosts segments on a single 
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> 
20250310:15:28:30:081314 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250310:15:28:30:081314 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250310:15:28:30:081314 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema
20250310:15:28:35:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 11, content = 5) sdw2|50250|/home/gpadmin/.data/sdw2/mirror/gpseg5 sdw1|50151|/home/gpadmin/.data/sdw1/mirror/gpseg5
20250310:15:29:34:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment datadir (dbidi = 11): /home/gpadmin/.data/sdw2/mirror/gpseg5
20250310:15:29:34:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1 sdw2|10223|/home/gpadmin/.data/sdw2/mirror/gpseg1
20250310:15:30:38:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment datadir (dbidi = 5): /home/gpadmin/.data/sdw3/mirror/gpseg1
20250310:15:30:38:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3 sdw3|50137|/home/gpadmin/.data/sdw3/mirror/gpseg3
20250310:15:31:44:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment datadir (dbidi = 7): /home/gpadmin/.data/sdw1/mirror/gpseg3
20250310:15:31:44:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 15, content = 7) sdw1|50170|/home/gpadmin/.data/sdw1/mirror/gpseg7 sdw1|50154|/home/gpadmin/.data/sdw1/primary/gpseg7
20250310:15:32:48:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment datadir (dbidi = 15): /home/gpadmin/.data/sdw1/mirror/gpseg7
20250310:15:32:48:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 16, content = 8) sdw1|50180|/home/gpadmin/.data/sdw1/mirror/gpseg8 sdw2|10236|/home/gpadmin/.data/sdw2/primary/gpseg8
20250310:15:33:55:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment datadir (dbidi = 16): /home/gpadmin/.data/sdw1/mirror/gpseg8
20250310:15:34:00:081314 gprebalance:cdw:gpadmin-[INFO]:-Executing role swaps for 2 segments
20250310:15:34:14:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw3|10370|/home/gpadmin/.data/sdw3/primary/gpseg7 sdw2|10235|/home/gpadmin/.data/sdw2/mirror/gpseg7
20250310:15:35:18:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment datadir (dbidi = 18): /home/gpadmin/.data/sdw3/primary/gpseg7
20250310:15:35:18:081314 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 19, content = 8) sdw3|10380|/home/gpadmin/.data/sdw3/primary/gpseg8 sdw3|50147|/home/gpadmin/.data/sdw3/mirror/gpseg8
20250310:15:36:24:081314 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment datadir (dbidi = 19): /home/gpadmin/.data/sdw3/primary/gpseg8
20250310:15:36:29:081314 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

@bimboterminator1 (Member, Author)

See the first rollback after cleanup. Is it expected behavior?

Now logging an error.

I would prefer not to use single quotes in log messages, to make text viewers with color schemes happier.

Yep, this should be done in the final implementation.

@hilltracer

Behavior when rollback fails with an error. Is it expected?

make rebalance
gpadmin@gpdb7u:~/src$ gprebalance
20250311:11:56:51:297736 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:11:56:51:297736 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:11:56:51:297736 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be 
using more hosts than the number of segments per host for spread mirroring. 
Grouped mirroring places all of a given hosts segments on a single 
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> 
20250311:11:56:53:297736 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250311:11:56:53:297736 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250311:11:56:53:297736 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema
20250311:11:56:57:297736 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3 sdw3|50137|/home/gpadmin/.data/sdw3/mirror/gpseg3
20250311:11:56:57:297736 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 15, content = 7) sdw1|50170|/home/gpadmin/.data/sdw1/mirror/gpseg7 sdw1|50154|/home/gpadmin/.data/sdw1/primary/gpseg7
20250311:11:56:57:297736 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 11, content = 5) sdw2|50250|/home/gpadmin/.data/sdw2/mirror/gpseg5 sdw1|50151|/home/gpadmin/.data/sdw1/mirror/gpseg5
20250311:11:56:57:297736 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1 sdw2|10223|/home/gpadmin/.data/sdw2/mirror/gpseg1
20250311:11:57:57:297736 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 7): /home/gpadmin/.data/sdw1/mirror/gpseg3
20250311:11:57:57:297736 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 5): /home/gpadmin/.data/sdw3/mirror/gpseg1
20250311:11:57:57:297736 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 11): /home/gpadmin/.data/sdw2/mirror/gpseg5
20250311:11:57:57:297736 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 15): /home/gpadmin/.data/sdw1/mirror/gpseg7
20250311:11:57:57:297736 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 16, content = 8) sdw1|50180|/home/gpadmin/.data/sdw1/mirror/gpseg8 sdw2|10236|/home/gpadmin/.data/sdw2/primary/gpseg8
20250311:11:59:02:297736 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 16): /home/gpadmin/.data/sdw1/mirror/gpseg8
20250311:11:59:07:297736 gprebalance:cdw:gpadmin-[INFO]:-Executing role swaps for 2 segments
20250311:11:59:21:297736 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 19, content = 8) sdw3|10380|/home/gpadmin/.data/sdw3/primary/gpseg8 sdw3|50147|/home/gpadmin/.data/sdw3/mirror/gpseg8
20250311:11:59:21:297736 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw3|10370|/home/gpadmin/.data/sdw3/primary/gpseg7 sdw2|10235|/home/gpadmin/.data/sdw2/mirror/gpseg7
20250311:12:00:28:297736 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 18): /home/gpadmin/.data/sdw3/primary/gpseg7
20250311:12:00:28:297736 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 19): /home/gpadmin/.data/sdw3/primary/gpseg8
20250311:12:00:31:297736 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
Set up the rollback to fail
gpadmin@gpdb7u:~/src$ mkdir /home/gpadmin/.data/sdw2/mirror/gpseg5
gpadmin@gpdb7u:~/src$ chmod 000 /home/gpadmin/.data/sdw2/mirror/gpseg5
Rollback finished with error
gpadmin@gpdb7u:~/src$ gprebalance -r
20250311:12:01:22:307326 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:12:01:22:307326 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:12:01:23:307326 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 19, content = 8) sdw3|50147|/home/gpadmin/.data/sdw3/mirror/gpseg8 sdw3|10380|/home/gpadmin/.data/sdw3/primary/gpseg8
20250311:12:01:23:307326 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw2|10223|/home/gpadmin/.data/sdw2/mirror/gpseg1 sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1
20250311:12:01:23:307326 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw3|50137|/home/gpadmin/.data/sdw3/mirror/gpseg3 sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3
20250311:12:01:23:307326 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 11, content = 5) sdw1|50151|/home/gpadmin/.data/sdw1/mirror/gpseg5 sdw2|50250|/home/gpadmin/.data/sdw2/mirror/gpseg5
20250311:12:02:32:307326 gprebalance:cdw:gpadmin-[ERROR]:-Could not perform mirror dbid=11 move with content 5 due to recoverseg error: Gprecoverseg failed with exit code: 1. See the /home/gpadmin/.data/qddir/demoDataDir-1/rebalance/gprecoverseg_dbid11_20250311_120123.log
Check the gprecoverseg l og file, fix any problems, and re-run
20250311:12:02:35:307326 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw2|10235|/home/gpadmin/.data/sdw2/mirror/gpseg7 sdw3|10370|/home/gpadmin/.data/sdw3/primary/gpseg7
20250311:12:02:37:307326 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 7): /home/gpadmin/.data/sdw3/mirror/gpseg3
20250311:12:02:38:307326 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 19): /home/gpadmin/.data/sdw3/mirror/gpseg8
20250311:12:02:38:307326 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 5): /home/gpadmin/.data/sdw2/mirror/gpseg1
20250311:12:03:46:307326 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 18): /home/gpadmin/.data/sdw2/mirror/gpseg7
20250311:12:03:48:307326 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

Note the typo inside "Check the gprecoverseg l og file".

gprecoverseg_dbid11_20250311_120123.log (excerpt)
20250311:12:01:25:307326 gprebalance:cdw:gpadmin-[INFO]:-Starting to create new pg_hba.conf on primary segments
20250311:12:01:25:307326 gprebalance:cdw:gpadmin-[INFO]:-killing existing walsender process on primary sdw3:10350 to refresh replication connection
20250311:12:01:25:307326 gprebalance:cdw:gpadmin-[INFO]:-Successfully modified pg_hba.conf on primary segments to allow replication connections
20250311:12:01:25:307326 gprebalance:cdw:gpadmin-[INFO]:-1 segment(s) to recover
20250311:12:01:25:307326 gprebalance:cdw:gpadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped
20250311:12:01:26:307326 gprebalance:cdw:gpadmin-[INFO]:-302194: /home/gpadmin/.data/sdw1/mirror/gpseg5
..
20250311:12:01:28:307326 gprebalance:cdw:gpadmin-[INFO]:-Waiting for segments to be marked down.
20250311:12:01:28:307326 gprebalance:cdw:gpadmin-[INFO]:-This may take up to 1800 seconds on large clusters.
............................................................
 20250311:12:02:29:307326 gprebalance:cdw:gpadmin-[INFO]:-1 of 1 segments have been marked down.
20250311:12:02:29:307326 gprebalance:cdw:gpadmin-[INFO]:-Setting up the required segments for recovery
20250311:12:02:29:307326 gprebalance:cdw:gpadmin-[INFO]:-Updating configuration for mirrors
20250311:12:02:29:307326 gprebalance:cdw:gpadmin-[INFO]:-Initiating segment recovery. Upon completion, will start the successfully recovered segments
20250311:12:02:29:307326 gprebalance:cdw:gpadmin-[INFO]:-era is 638eae55d28b225e_250311115600
sdw2 (dbid 11): pg_basebackup: error: could not access directory "/home/gpadmin/.data/sdw2/mirror/gpseg5": Permission denied
20250311:12:02:30:307326 gprebalance:cdw:gpadmin-[INFO]:-----------------------------------------------------------
20250311:12:02:30:307326 gprebalance:cdw:gpadmin-[INFO]:-Failed to recover the following segments
20250311:12:02:30:307326 gprebalance:cdw:gpadmin-[INFO]:- hostname: sdw2; port: 50250; logfile: /home/gpadmin/gpAdminLogs/pg_basebackup.20250311_120229.dbid11.out; recoverytype: full; error: pg_basebackup: error: could not access directory "/home/gpadmin/.data/sdw2/mirror/gpseg5": Permission denied
20250311:12:02:31:307326 gprebalance:cdw:gpadmin-[INFO]:-Triggering FTS probe
20250311:12:02:32:307326 gprebalance:cdw:gpadmin-[ERROR]:-gprecoverseg failed. Please check the output for more details.
Fix and rerun rollback
gpadmin@gpdb7u:~/src$ sudo rm -d /home/gpadmin/.data/sdw2/mirror/gpseg5
gpadmin@gpdb7u:~/src$ gprebalance -r
20250311:12:04:15:312978 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:12:04:15:312978 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:12:04:16:312978 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1 sdw2|10223|/home/gpadmin/.data/sdw2/mirror/gpseg1
20250311:12:04:16:312978 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3 sdw3|50137|/home/gpadmin/.data/sdw3/mirror/gpseg3
20250311:12:04:16:312978 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 19, content = 8) sdw3|10380|/home/gpadmin/.data/sdw3/primary/gpseg8 sdw3|50147|/home/gpadmin/.data/sdw3/mirror/gpseg8
20250311:12:04:16:312978 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw3|10370|/home/gpadmin/.data/sdw3/primary/gpseg7 sdw2|10235|/home/gpadmin/.data/sdw2/mirror/gpseg7
20250311:12:05:03:312978 gprebalance:cdw:gpadmin-[ERROR]:-Could not perform mirror dbid=19 move with content 8 due to recoverseg error: Error in gprecoverseg process: [Errno 2] No such file or directory: '/home/gpadmin/gpAdminLogs/recovery_progress.file'
Check the gprecoverseg l og file, fix any problems, and re-run
20250311:12:05:04:312978 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 7): /home/gpadmin/.data/sdw1/mirror/gpseg3
20250311:12:05:04:312978 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 5): /home/gpadmin/.data/sdw3/mirror/gpseg1
20250311:12:05:04:312978 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 18): /home/gpadmin/.data/sdw3/primary/gpseg7
20250311:12:05:06:312978 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

Why does the rollback undo its own previous operations?
Why could it not perform the mirror dbid=19 move?

Next rollback
gpadmin@gpdb7u:~/src$ gprebalance -r
20250311:12:40:48:319682 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:12:40:48:319682 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:12:40:49:319682 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 19, content = 8) sdw3|50147|/home/gpadmin/.data/sdw3/mirror/gpseg8 sdw3|10380|/home/gpadmin/.data/sdw3/primary/gpseg8
20250311:12:40:49:319682 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw3|50137|/home/gpadmin/.data/sdw3/mirror/gpseg3 sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3
20250311:12:40:49:319682 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw2|10223|/home/gpadmin/.data/sdw2/mirror/gpseg1 sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1
20250311:12:40:49:319682 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw2|10235|/home/gpadmin/.data/sdw2/mirror/gpseg7 sdw3|10370|/home/gpadmin/.data/sdw3/primary/gpseg7
20250311:12:42:14:319682 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 7): /home/gpadmin/.data/sdw3/mirror/gpseg3
20250311:12:42:14:319682 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 19): /home/gpadmin/.data/sdw3/mirror/gpseg8
20250311:12:42:14:319682 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 5): /home/gpadmin/.data/sdw2/mirror/gpseg1
20250311:12:42:14:319682 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 18): /home/gpadmin/.data/sdw2/mirror/gpseg7
20250311:12:42:19:319682 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
Cluster condition before rebalance
postgres=# SELECT * FROM gp_segment_configuration ORDER BY dbid;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                 datadir                 
------+---------+------+----------------+------+--------+-------+----------+---------+-----------------------------------------
    1 |      -1 | p    | p              | n    | u      |  7000 | cdw      | cdw     | /home/gpadmin/.data/qddir/demoDataDir-1
    2 |       0 | p    | p              | s    | u      | 10100 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 10110 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg1
    4 |       0 | m    | m              | s    | u      | 50200 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg0
    5 |       1 | m    | m              | s    | u      | 50310 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg1
    6 |       2 | m    | m              | s    | u      | 50320 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg2
    7 |       3 | m    | m              | s    | u      | 50130 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg3
    8 |       4 | m    | m              | s    | u      | 50140 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg4
    9 |       2 | p    | p              | s    | u      | 10220 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg2
   10 |       3 | p    | p              | s    | u      | 10230 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg3
   11 |       5 | m    | m              | s    | u      | 50250 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg5
   12 |       6 | m    | m              | s    | u      | 50160 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg6
   13 |       4 | p    | p              | s    | u      | 10340 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg4
   14 |       5 | p    | p              | s    | u      | 10350 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg5
   15 |       7 | m    | m              | s    | u      | 50170 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg7
   16 |       8 | m    | m              | s    | u      | 50180 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg8
   17 |       6 | p    | p              | s    | u      | 10360 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg6
   18 |       7 | p    | p              | s    | u      | 10370 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg7
   19 |       8 | p    | p              | s    | u      | 10380 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg8
(19 rows)
Cluster condition after rebalance
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                 datadir                 
------+---------+------+----------------+------+--------+-------+----------+---------+-----------------------------------------
    1 |      -1 | p    | p              | n    | u      |  7000 | cdw      | cdw     | /home/gpadmin/.data/qddir/demoDataDir-1
    2 |       0 | p    | p              | s    | u      | 10100 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 10110 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg1
    4 |       0 | m    | m              | s    | u      | 50200 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg0
    5 |       1 | m    | m              | s    | u      | 10223 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg1
    6 |       2 | m    | m              | s    | u      | 50320 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg2
    7 |       3 | m    | m              | s    | u      | 50137 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg3
    8 |       4 | m    | m              | s    | u      | 50140 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg4
    9 |       2 | p    | p              | s    | u      | 10220 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg2
   10 |       3 | p    | p              | s    | u      | 10230 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg3
   11 |       5 | m    | m              | s    | u      | 50151 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg5
   12 |       6 | m    | m              | s    | u      | 50160 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg6
   13 |       4 | p    | p              | s    | u      | 10340 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg4
   14 |       5 | p    | p              | s    | u      | 10350 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg5
   15 |       7 | p    | p              | s    | u      | 50154 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg7
   16 |       8 | p    | p              | s    | u      | 10236 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg8
   17 |       6 | p    | p              | s    | u      | 10360 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg6
   18 |       7 | m    | m              | s    | u      | 10235 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg7
   19 |       8 | m    | m              | s    | u      | 50147 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg8
(19 rows)
Cluster condition after last rollback
postgres=# SELECT * FROM gp_segment_configuration ORDER BY dbid;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                 datadir                 
------+---------+------+----------------+------+--------+-------+----------+---------+-----------------------------------------
    1 |      -1 | p    | p              | n    | u      |  7000 | cdw      | cdw     | /home/gpadmin/.data/qddir/demoDataDir-1
    2 |       0 | p    | p              | s    | u      | 10100 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 10110 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg1
    4 |       0 | m    | m              | s    | u      | 50200 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg0
    5 |       1 | m    | m              | s    | u      | 50310 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg1
    6 |       2 | m    | m              | s    | u      | 50320 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg2
    7 |       3 | m    | m              | s    | u      | 50130 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg3
    8 |       4 | m    | m              | s    | u      | 50140 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg4
    9 |       2 | p    | p              | s    | u      | 10220 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg2
   10 |       3 | p    | p              | s    | u      | 10230 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg3
   11 |       5 | m    | m              | n    | d      | 50151 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg5
   12 |       6 | m    | m              | s    | u      | 50160 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg6
   13 |       4 | p    | p              | s    | u      | 10340 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg4
   14 |       5 | p    | p              | n    | u      | 10350 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg5
   15 |       7 | p    | p              | s    | u      | 50154 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg7
   16 |       8 | p    | p              | s    | u      | 10236 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg8
   17 |       6 | p    | p              | s    | u      | 10360 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg6
   18 |       7 | m    | m              | s    | u      | 10370 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg7
   19 |       8 | m    | m              | s    | u      | 10380 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg8
(19 rows)

The same experiment but without sudo rm -d /home/gpadmin/.data/sdw2/mirror/gpseg5:

Don't fix and rerun rollback
gpadmin@gpdb7u:~/src$ mkdir /home/gpadmin/.data/sdw2/mirror/gpseg5
gpadmin@gpdb7u:~/src$ chmod 000 /home/gpadmin/.data/sdw2/mirror/gpseg5
gpadmin@gpdb7u:~/src$ gprebalance -r
20250311:11:41:03:262182 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:11:41:03:262182 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:11:41:05:262182 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 19, content = 8) sdw3|50147|/home/gpadmin/.data/sdw3/mirror/gpseg8 sdw3|10380|/home/gpadmin/.data/sdw3/primary/gpseg8
20250311:11:41:05:262182 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw3|50137|/home/gpadmin/.data/sdw3/mirror/gpseg3 sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3
20250311:11:41:05:262182 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw2|10223|/home/gpadmin/.data/sdw2/mirror/gpseg1 sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1
20250311:11:41:05:262182 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 11, content = 5) sdw1|50151|/home/gpadmin/.data/sdw1/mirror/gpseg5 sdw2|50250|/home/gpadmin/.data/sdw2/mirror/gpseg5
20250311:11:42:00:262182 gprebalance:cdw:gpadmin-[ERROR]:-Could not perform mirror dbid=11 move with content 5 due to recoverseg error: Gprecoverseg failed with exit code: 1. See the /home/gpadmin/.data/qddir/demoDataDir-1/rebalance/gprecoverseg_dbid11_20250311_114105.log
Check the gprecoverseg l og file, fix any problems, and re-run
20250311:11:42:01:262182 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw2|10235|/home/gpadmin/.data/sdw2/mirror/gpseg7 sdw3|10370|/home/gpadmin/.data/sdw3/primary/gpseg7
20250311:11:42:05:262182 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 5): /home/gpadmin/.data/sdw2/mirror/gpseg1
20250311:11:42:05:262182 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 19): /home/gpadmin/.data/sdw3/mirror/gpseg8
20250311:11:42:06:262182 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 7): /home/gpadmin/.data/sdw3/mirror/gpseg3
20250311:11:43:11:262182 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 18): /home/gpadmin/.data/sdw2/mirror/gpseg7
20250311:11:43:15:262182 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@gpdb7u:~/src$ gprebalance -r
20250311:11:44:10:267808 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250311:11:44:10:267808 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250311:11:44:11:267808 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3 sdw3|50137|/home/gpadmin/.data/sdw3/mirror/gpseg3
20250311:11:44:11:267808 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw3|10370|/home/gpadmin/.data/sdw3/primary/gpseg7 sdw2|10235|/home/gpadmin/.data/sdw2/mirror/gpseg7
20250311:11:44:11:267808 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 19, content = 8) sdw3|10380|/home/gpadmin/.data/sdw3/primary/gpseg8 sdw3|50147|/home/gpadmin/.data/sdw3/mirror/gpseg8
20250311:11:44:11:267808 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1 sdw2|10223|/home/gpadmin/.data/sdw2/mirror/gpseg1
20250311:11:45:22:267808 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 7): /home/gpadmin/.data/sdw1/mirror/gpseg3
20250311:11:45:22:267808 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 18): /home/gpadmin/.data/sdw3/primary/gpseg7
20250311:11:45:22:267808 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 5): /home/gpadmin/.data/sdw3/mirror/gpseg1
20250311:11:45:22:267808 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 19): /home/gpadmin/.data/sdw3/primary/gpseg8
20250311:11:45:26:267808 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

@bimboterminator1
Copy link
Member Author

bimboterminator1 commented Mar 11, 2025

Why could it not perform the mirror dbid=19 move?

I am currently fixing this in ADBDEV-6855; as soon as the commit is ready I'll merge it here.

Why does the rollback undo its own previous operations?

I'll check it now

And about reruns: a proper rerun (resume operation) after an error is not implemented. So rollback continuation after a rollback failure is simply not implemented.

@hilltracer

So rollback continuation after a rollback failure is simply not implemented

Should it be done in the MVP?

@bimboterminator1 (Member, Author)

Should it be done in the MVP?

No, it is confirmed in the comments in the parent ticket for the MVP. Resumption is not implemented.

Why does the rollback undo its own previous operations?

Now this path is stubbed out with an error.

Overall, the current implementation is enough for the MVP.

Base automatically changed from ADBDEV-6855 to feature/ADBDEV-6608 March 13, 2025 11:34
@hilltracer commented Mar 17, 2025

I see some strange behavior of rollback after a failed gprebalance:

Init unbalanced condition
postgres=# SELECT * FROM gp_segment_configuration ORDER BY dbid;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                 datadir                 
------+---------+------+----------------+------+--------+-------+----------+---------+-----------------------------------------
    1 |      -1 | p    | p              | n    | u      |  7000 | cdw      | cdw     | /home/gpadmin/.data/qddir/demoDataDir-1
    2 |       0 | p    | p              | s    | u      | 10100 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 10110 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg1
    4 |       0 | m    | m              | s    | u      | 50200 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg0
    5 |       1 | m    | m              | s    | u      | 50310 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg1
    6 |       2 | m    | m              | s    | u      | 50320 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg2
    7 |       3 | m    | m              | s    | u      | 50130 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg3
    8 |       4 | m    | m              | s    | u      | 50140 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg4
    9 |       2 | p    | p              | s    | u      | 10220 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg2
   10 |       3 | p    | p              | s    | u      | 10230 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg3
   11 |       5 | m    | m              | s    | u      | 50250 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg5
   12 |       6 | m    | m              | s    | u      | 50160 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg6
   13 |       4 | p    | p              | s    | u      | 10340 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg4
   14 |       5 | p    | p              | s    | u      | 10350 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg5
   15 |       7 | m    | m              | s    | u      | 50170 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg7
   16 |       8 | m    | m              | s    | u      | 50180 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg8
   17 |       6 | p    | p              | s    | u      | 10360 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg6
   18 |       7 | p    | p              | s    | u      | 10370 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg7
   19 |       8 | p    | p              | s    | u      | 10380 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg8
(19 rows)
Perform gprebalance and crash it with `pkill postgres` in another session
gpadmin@gpdb7u:~/src$ gprebalance
20250317:08:17:09:529966 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250317:08:17:09:529966 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250317:08:17:09:529966 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be 
using more hosts than the number of segments per host for spread mirroring. 
Grouped mirroring places all of a given hosts segments on a single 
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> 
20250317:08:17:11:529966 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250317:08:17:11:529966 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250317:08:17:11:529966 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema
20250317:08:17:15:529966 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 7, content = 3) sdw1|50130|/home/gpadmin/.data/sdw1/mirror/gpseg3 sdw3|10347|/home/gpadmin/.data/sdw3/mirror/gpseg3
20250317:08:17:15:529966 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 5, content = 1) sdw3|50310|/home/gpadmin/.data/sdw3/mirror/gpseg1 sdw2|50203|/home/gpadmin/.data/sdw2/mirror/gpseg1
20250317:08:17:15:529966 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 15, content = 7) sdw1|50170|/home/gpadmin/.data/sdw1/mirror/gpseg7 sdw1|50144|/home/gpadmin/.data/sdw1/primary/gpseg7
20250317:08:17:15:529966 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 11, content = 5) sdw2|50250|/home/gpadmin/.data/sdw2/mirror/gpseg5 sdw1|50141|/home/gpadmin/.data/sdw1/mirror/gpseg5
20250317:08:18:19:529966 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 7): /home/gpadmin/.data/sdw1/mirror/gpseg3
20250317:08:18:19:529966 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 5): /home/gpadmin/.data/sdw3/mirror/gpseg1
20250317:08:18:19:529966 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 11): /home/gpadmin/.data/sdw2/mirror/gpseg5
20250317:08:18:19:529966 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 15): /home/gpadmin/.data/sdw1/mirror/gpseg7
20250317:08:18:19:529966 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 16, content = 8) sdw1|50180|/home/gpadmin/.data/sdw1/mirror/gpseg8 sdw2|50216|/home/gpadmin/.data/sdw2/primary/gpseg8
20250317:08:18:23:529966 gprebalance:cdw:gpadmin-[ERROR]:-Could not perform mirror dbid=16 move with content 8 due to recoverseg error: Error in gprecoverseg process: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 7000?
could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 7000?

Check the gprecoverseg l og file, fix any problems, and re-run
20250317:08:18:23:529966 gprebalance:cdw:gpadmin-[ERROR]:-terminating connection due to administrator command
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
20250317:08:18:25:529966 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: terminating connection due to administrator command
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
 

Exiting...
20250317:08:18:25:529966 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@gpdb7u:~/src$ gprebalance --rollback
20250317:08:18:39:535453 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250317:08:18:39:535453 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 7000?
could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 7000?
 

Exiting...
Start cluster
gpadmin@gpdb7u:~/src$ gpstart -a
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Starting gpstart with args: -a
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Greenplum Catalog Version: '302307241'
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Starting Coordinator instance in admin mode
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /home/gpadmin/.data/qddir/demoDataDir-1 -l /home/gpadmin/.data/qddir/demoDataDir-1/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Obtaining Greenplum Coordinator catalog information
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Setting new coordinator era
20250317:08:18:50:535456 gpstart:cdw:gpadmin-[INFO]:-Coordinator Started...
20250317:08:18:51:535456 gpstart:cdw:gpadmin-[INFO]:-Shutting down coordinator
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-Process results...
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-----------------------------------------------------
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-   Successful segment starts                                            = 18
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-----------------------------------------------------
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-Successfully started 18 of 18 segment instances 
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-----------------------------------------------------
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-Starting Coordinator instance cdw directory /home/gpadmin/.data/qddir/demoDataDir-1 
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=638eae55d28b225e_250317081850 $GPHOME/bin/pg_ctl -D /home/gpadmin/.data/qddir/demoDataDir-1 -l /home/gpadmin/.data/qddir/demoDataDir-1/log/startup.log -w -t 600 -o " -c gp_role=dispatch " start
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-Command pg_ctl reports Coordinator cdw instance active
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-Connecting to db template1 on host localhost
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-No standby coordinator configured.  skipping...
20250317:08:18:53:535456 gpstart:cdw:gpadmin-[INFO]:-Database successfully started
Try to perform rollback
gpadmin@gpdb7u:~/src$ gprebalance --rollback
20250317:08:20:23:536548 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g191b04220ce build dev'
20250317:08:20:23:536548 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g191b04220ce build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 11 2025 10:34:13 (with assert checking) Bhuvnesh C.'
20250317:08:20:23:536548 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

Expected behavior after rollback: the cluster is back in its initial condition.

Actual behavior:

No changes after failed gprebalance
postgres=# SELECT * FROM gp_segment_configuration ORDER BY dbid;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                 datadir                 
------+---------+------+----------------+------+--------+-------+----------+---------+-----------------------------------------
    1 |      -1 | p    | p              | n    | u      |  7000 | cdw      | cdw     | /home/gpadmin/.data/qddir/demoDataDir-1
    2 |       0 | p    | p              | s    | u      | 10100 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 10110 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg1
    4 |       0 | m    | m              | s    | u      | 50200 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg0
    5 |       1 | m    | m              | s    | u      | 50203 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/mirror/gpseg1
    6 |       2 | m    | m              | s    | u      | 50320 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg2
    7 |       3 | m    | m              | s    | u      | 10347 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/mirror/gpseg3
    8 |       4 | m    | m              | s    | u      | 50140 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg4
    9 |       2 | p    | p              | s    | u      | 10220 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg2
   10 |       3 | p    | p              | s    | u      | 10230 | sdw2     | sdw2    | /home/gpadmin/.data/sdw2/primary/gpseg3
   11 |       5 | m    | m              | s    | u      | 50141 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg5
   12 |       6 | m    | m              | s    | u      | 50160 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg6
   13 |       4 | p    | p              | s    | u      | 10340 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg4
   14 |       5 | p    | p              | s    | u      | 10350 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg5
   15 |       7 | m    | m              | s    | u      | 50144 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/primary/gpseg7
   16 |       8 | m    | m              | s    | u      | 50180 | sdw1     | sdw1    | /home/gpadmin/.data/sdw1/mirror/gpseg8
   17 |       6 | p    | p              | s    | u      | 10360 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg6
   18 |       7 | p    | p              | s    | u      | 10370 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg7
   19 |       8 | p    | p              | s    | u      | 10380 | sdw3     | sdw3    | /home/gpadmin/.data/sdw3/primary/gpseg8
(19 rows)

Is it expected?

@hilltracer left a comment

Is it OK that this PR targets the branch ADBDEV-6855, which has already been merged into feature/ADBDEV-6608?

@bimboterminator1 (Member, Author)

I see some strange behavior of rollback after a failed gprebalance:

Fixed this case.

Currently there is no proper state machine for gprebalance, so not all possible erroneous cases are covered or implemented. Overall, it has been discussed that the current functionality is enough for the MVP. All extra suggestions are welcome and will definitely be considered for the final SRS.

@hilltracer

It still doesn't perform rollback in this case:

  • Init unbalanced condition
  • Perform gprebalance and crash it with pkill postgres in another session
  • !!! Try to perform rollback <-- I've added this command
log message
gpadmin@cdw:~/src$ gprebalance --rollback
20250318:12:09:10:831077 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.64.ge3dd99372dc build dev'
20250318:12:09:10:831077 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 7000?
could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 7000?
Exiting...
  • Start cluster
  • Try to perform rollback

@bimboterminator1 (Member, Author)

It still doesn't perform rollback in this case:

Probably the last changes will improve the situation.

@hilltracer

Still doesn't work in this scenario. I printed the status:

Last rollback
gpadmin@gpdb7u:~/src$ gprebalance --rollback
20250320:05:20:51:078408 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.64.ge3dd99372dc build dev'
20250320:05:20:51:078408 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.64.ge3dd99372dc build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 18 2025 10:51:28 (with assert checking) Bhuvnesh C.'
20250320:05:20:51:078408 gprebalance:cdw:gpadmin-[INFO]:-RebalanceStatus.COMPLETED <<<------ see status here
20250320:05:20:51:078408 gprebalance:cdw:gpadmin-[INFO]:-Rollback has already completed
20250320:05:20:51:078408 gprebalance:cdw:gpadmin-[INFO]:-If you want to rebalance again, run gprebalance -c to perform cleanup
20250320:05:20:51:078408 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
--- a/gpMgmt/bin/gprebalance
+++ b/gpMgmt/bin/gprebalance
@@ -413,7 +413,7 @@ def main(options, args, parser):
             sys.exit(0)
 
         from gprebalance_modules.rebalance_status import RebalanceStatus  # nopep8
-
+        logger.info(gprebalance_db_status)
         if (gprebalance_db_status == RebalanceStatus.COMPLETED or
             gprebalance_file_status == 'EXECUTION_DONE') and not options.rollback:
             logger.info('Rebalance has already completed')

But LGTM. I'm ready to approve if needed.
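
For context on the status check in the diff above: the "Rollback has already completed" message suggests a rollback-side guard symmetric to the rebalance-side branch shown in the diff. A hypothetical reconstruction, assuming such a guard exists (it is not shown in this conversation):

# Hypothetical reconstruction (not the actual gprebalance code) of the
# guard implied by the "Rollback has already completed" message.
import sys
import logging

logger = logging.getLogger('gprebalance')

def guard_completed_rollback(db_status, rollback_requested):
    # If a previous run already marked the operation COMPLETED, refuse to
    # run the rollback again and point the user at cleanup instead.
    if db_status == 'COMPLETED' and rollback_requested:
        logger.info('Rollback has already completed')
        logger.info('If you want to rebalance again, run gprebalance -c to perform cleanup')
        sys.exit(0)

If the interrupted run leaves the status at COMPLETED, a guard of this shape would explain why --rollback exits without doing anything in the scenario above.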

@bimboterminator1 bimboterminator1 merged commit fcc28c4 into feature/ADBDEV-6608 Mar 20, 2025
1 check passed
@bimboterminator1 bimboterminator1 deleted the ADBDEV-6857 branch March 20, 2025 06:01
