feat: re-implementation of sp exit #1279
…ge-provider into adapt-sp-exit
cmd/storage_provider/main.go
@@ -131,6 +131,16 @@ func init() {
command.SetQuotaCmd,
// block syncer
bs_data_migration.BsDataMigrationCmd,
// sp exit v2
do we officially call it "v2"?
}

func SpExitAction(ctx *cli.Context) error {
	cfg, err := utils.MakeConfig(ctx)
This action looks very critical.
It would be nice to have a "re-confirm" mechanism (e.g. output a warning and ask the SP operator to input its SP address to reconfirm).
BTW, can this operation be canceled?
The operation method has been added
The commit message can be resolved using a squash merge, and other checks have passed
@@ -156,12 +156,32 @@ func (gc *GCWorker) checkGVGMatchSP(ctx context.Context, objectInfo *storagetype

if redundancyIndex == piecestore.PrimarySPRedundancyIndex {
	if gvg.GetPrimarySpId() != spID {
		swapInInfo, err := gc.e.baseApp.Consensus().QuerySwapInInfo(ctx, gvg.FamilyId, virtualgrouptypes.NoSpecifiedGVGId)
Can we add a check in isAllowGCCheck: if the SP is in STATUS_GRACEFUL_EXITING, we do not GC for this SP?
The exiting SP can still do GC; it does not matter.
if !ok {
	return true
}
return len(stats.SucceedSegments)+len(stats.FailedSegments) == stats.SegmentCount && len(stats.FailedSegments) > 0
If either of these two conditions occurs, is it an error?
len(stats.SucceedSegments)+len(stats.FailedSegments) != stats.SegmentCount || len(stats.FailedSegments) > 0
len(stats.SucceedSegments)+len(stats.FailedSegments) != stats.SegmentCount will be true if only 1 piece has responded so far.
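The difference between the two predicates discussed above can be checked with a small sketch. The `objectSegmentsStats` type below is a hypothetical stand-in (the real field types are not visible in the diff); only the field names and the two boolean expressions come from the review thread.

```go
package main

import "fmt"

// objectSegmentsStats is a hypothetical stand-in for the stats type in the
// diff; the real field types may differ.
type objectSegmentsStats struct {
	SegmentCount    int
	SucceedSegments map[uint32]struct{}
	FailedSegments  map[uint32]struct{}
}

// isFailed mirrors the predicate in the diff: every segment has reported
// back AND at least one of them failed.
func isFailed(s objectSegmentsStats) bool {
	return len(s.SucceedSegments)+len(s.FailedSegments) == s.SegmentCount &&
		len(s.FailedSegments) > 0
}

// isFailedWithOr is the alternative raised in review. It also fires while
// responses are still outstanding.
func isFailedWithOr(s objectSegmentsStats) bool {
	return len(s.SucceedSegments)+len(s.FailedSegments) != s.SegmentCount ||
		len(s.FailedSegments) > 0
}

func main() {
	// Only 1 of 3 segments has responded so far, and it succeeded.
	inProgress := objectSegmentsStats{
		SegmentCount:    3,
		SucceedSegments: map[uint32]struct{}{0: {}},
		FailedSegments:  map[uint32]struct{}{},
	}
	// All 3 segments responded, one of them failed.
	doneWithFailure := objectSegmentsStats{
		SegmentCount:    3,
		SucceedSegments: map[uint32]struct{}{0: {}, 1: {}},
		FailedSegments:  map[uint32]struct{}{2: {}},
	}

	fmt.Println(isFailed(inProgress), isFailedWithOr(inProgress))           // false true
	fmt.Println(isFailed(doneWithFailure), isFailedWithOr(doneWithFailure)) // true true
}
```

With the `||` form, an object whose recovery is still in progress would already be classified as failed, which is the case the reply points out.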
Description
This PR implements the SP exit data recovery process and the required commands.
The SP exit process is outlined below:
1. The exiting SP sends MsgStorageProviderExit to Greenfield; its status will become STATUS_GRACEFUL_EXITING.
2. The successor SP sends MsgReserveSwapIn to reserve the position. The recovery will happen between the successor and the secondary SPs of the VGF. If the exiting SP is the secondary SP, the recovery will happen between the successor and the primary SP of the GVG.
3. The successor SP sends MsgCompleteSwapIn to ack the success; within the GVGF/GVG, the SP replacement will take place.
4. MsgCompleteStorageProviderExit is sent to complete the SP's exit.

For more details, refer to bnb-chain/BEPs#338
Specifically, this PR mainly focuses on steps 3 and 4, which allow the successor SP to recover data so that the SP exit can complete.
Implementation:
- RecoverGVGScheduler: the GVG is the unit for which a scheduler is initialized. The scheduler constantly fetches batches of objects through the meta API ListObjectsInGVG with the params StartAfter and Limit, and pushes each object's recover piece tasks to recoverQueue. After it has iterated over all objects in the GVG, regardless of whether all objects were recovered or some failed, it marks the GVG status as processed.
- RecoverFailedObjectScheduler: a scheduler specifically for recovering objects that failed to be recovered. The failures come from RecoverGVGScheduler and VerifyGVGScheduler.
- VerifyGVGScheduler: verifies whether every object has indeed been recovered; if not, the object will be picked up by RecoverFailedObjectScheduler. Once all objects are found to be recovered, it automatically sends a MsgCompleteSwapIn tx to the chain and then stops all schedulers. If there are objects that can't be recovered and that exceed the retry limit, the scheduler also stops, and the user needs to query the failed objects with the CMD listed below, then either discontinue or retry.

SP exit CMD
exit sp
./gnfd-sp spExit --config ./config.toml
./gnfd-sp completeSpExit --config ./config.toml
successor sp CMD
./gnfd-sp swapIn --config ./config.toml -f vgf id --gid gvg id -sp target sp id
./gnfd-sp recover-gvg --config ./config.toml --gid gvg id
./gnfd-sp recover-vgf --config ./config.toml -f vgf id
./gnfd-sp query-recover-p --config ./config.toml -f vgf id --gid gvg id
./gnfd-sp completeSwapIn --config ./config.toml -f vgf id --gid gvg id
exit tool CMD
./gnfd-sp query-gvg-by-sp --config ./config.toml -sp spid
./gnfd-sp query-vgf-by-sp --config ./config.toml -sp spid
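For reference, the batch iteration that RecoverGVGScheduler performs (described under Implementation) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `object`, `listObjectsInGVG`, and `recoverGVG` are hypothetical names, the metadata call is replaced by an in-memory stand-in, and objects are assumed to be ordered by ascending ID starting at 1.

```go
package main

import "fmt"

// object is a hypothetical, simplified object record.
type object struct{ ID uint64 }

// listObjectsInGVG is an in-memory stand-in for the metadata API named in
// the description; the real signature is an assumption. It returns up to
// limit objects whose ID is greater than startAfter.
func listObjectsInGVG(all []object, startAfter uint64, limit int) []object {
	var batch []object
	for _, o := range all {
		if o.ID > startAfter && len(batch) < limit {
			batch = append(batch, o)
		}
	}
	return batch
}

// recoverGVG walks the GVG batch by batch, pushes every object ID to
// recoverQueue, and reports the GVG as processed once no objects remain.
// Failures are not tracked here; in the description they are left to the
// failed-object and verify schedulers.
func recoverGVG(all []object, limit int, recoverQueue chan<- uint64) (processed bool) {
	var startAfter uint64
	for {
		batch := listObjectsInGVG(all, startAfter, limit)
		if len(batch) == 0 {
			return true // iterated over every object in the GVG
		}
		for _, o := range batch {
			recoverQueue <- o.ID
		}
		// Continue after the last object of this batch.
		startAfter = batch[len(batch)-1].ID
	}
}

func main() {
	objects := []object{{1}, {2}, {3}, {4}, {5}}
	queue := make(chan uint64, len(objects))

	processed := recoverGVG(objects, 2, queue)
	fmt.Println(processed, len(queue)) // true 5
}
```

Using the last ID of each batch as the next StartAfter keeps the loop stateless between calls, so a restarted scheduler only needs to remember that one cursor.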
Rationale
The process for SP exit needs to be simpler and easier to maintain.
Example
To exit an SP, the operator only needs to run two commands.
Changes
Notable changes:
Potential Impacts