feat: bucket migration state improvement #1250

Merged on Dec 18, 2023 (24 commits)

Contributor

@jingjunLi jingjunLi commented Nov 15, 2023

Description

Optimizations for bucket migration:

  • Persistence of Quota State
  • Optimization of state persistence in DB for each phase of bucket migration
  • Bucket Migration Progress Query
  • Confirming Complete Event & RejectMigrateBucketEvent
  • Bucket Migration GC Trigger

Feat 1: Persistence of Quota State

Previously, the source storage provider (Src SP) used an in-memory map (ManageModular::migratingBuckets) to track whether the quota for a bucket migration had been deducted. That state was lost whenever the Src SP went down and restarted, so it could not be kept accurate. The in-memory map has been replaced with a persistent MySQL table.

Some logic from preMigrateBucketHandler and postMigrateBucketHandler has been moved into Manager::NotifyPreMigrateBucketAndDeductQuota and Manager::NotifyPostMigrateBucketAndRecoupQuota.

    1. After the Src SP deducts quota, the corresponding state (mainly the deducted and recouped quota amounts) is persisted in the database and checked.
    2. The Src SP recoups quota.
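
The persisted quota state described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `QuotaRecord`, `QuotaStore`, `memStore`, and `Manager` are hypothetical names, and an in-memory map stands in for the MySQL table. The point is that a manager created after a restart can still see the deduction and validate the recouped amount against it.

```go
package main

import (
	"errors"
	"fmt"
)

// QuotaRecord is a hypothetical row layout; the real schema is not shown in this description.
type QuotaRecord struct {
	BucketID uint64
	Deducted uint64 // quota deducted before migration
	Recouped uint64 // quota returned after migration
}

// QuotaStore stands in for the persistent table.
type QuotaStore interface {
	Get(bucketID uint64) (QuotaRecord, bool)
	Put(rec QuotaRecord)
}

// memStore keeps the sketch runnable; a real store would be backed by MySQL.
type memStore map[uint64]QuotaRecord

func (s memStore) Get(id uint64) (QuotaRecord, bool) { r, ok := s[id]; return r, ok }
func (s memStore) Put(r QuotaRecord)                 { s[r.BucketID] = r }

var (
	ErrNoDeduction    = errors.New("no deduction recorded for bucket")
	ErrRecoupTooLarge = errors.New("recouped quota exceeds deducted quota")
)

// Manager holds no quota state of its own; everything lives in the store.
type Manager struct{ store QuotaStore }

// Deduct persists the deducted amount for a bucket.
func (m *Manager) Deduct(bucketID, amount uint64) {
	m.store.Put(QuotaRecord{BucketID: bucketID, Deducted: amount})
}

// Recoup checks the requested amount against the persisted deduction before recording it.
func (m *Manager) Recoup(bucketID, amount uint64) (QuotaRecord, error) {
	rec, ok := m.store.Get(bucketID)
	if !ok {
		return QuotaRecord{}, ErrNoDeduction
	}
	if rec.Recouped+amount > rec.Deducted {
		return rec, ErrRecoupTooLarge
	}
	rec.Recouped += amount
	m.store.Put(rec)
	return rec, nil
}

func main() {
	store := memStore{}
	before := &Manager{store: store}
	before.Deduct(7, 100)

	// Simulated restart: a fresh Manager over the same store still sees the deduction.
	after := &Manager{store: store}
	rec, err := after.Recoup(7, 40)
	fmt.Println(rec, err)
	_, err = after.Recoup(7, 80)
	fmt.Println(err)
}
```

With the old in-memory map, the second manager would have had nothing to check the recoup against.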

Feat 2: Optimization of state persistence in DB for each phase of bucket migration

  • Introduces the MigrateBucketProgressTable table to record the state and meta of a bucket migration, and encapsulates the interfaces for modifying state and meta.
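
As a rough sketch of what "encapsulating the interfaces for modifying state" can look like: the state names below, the `ProgressStore` type, and the forward-only rule are all assumptions made for illustration, and a map stands in for the table. Callers go through `UpdateState` and `QueryState` instead of writing rows directly.

```go
package main

import "fmt"

// MigrationState values are hypothetical; the real enum is not listed in this description.
type MigrationState int

const (
	StateInit MigrationState = iota
	StatePreNotified
	StateMigrating
	StateFinished
)

// ProgressMeta models one row per bucket holding its current state.
type ProgressMeta struct {
	BucketID uint64
	State    MigrationState
}

// ProgressStore hides the table behind two small methods.
type ProgressStore struct {
	rows map[uint64]ProgressMeta
}

func NewProgressStore() *ProgressStore {
	return &ProgressStore{rows: make(map[uint64]ProgressMeta)}
}

// UpdateState only moves a bucket forward (an assumed rule), so a replayed
// event cannot rewind progress that was already persisted.
func (s *ProgressStore) UpdateState(bucketID uint64, next MigrationState) error {
	cur, ok := s.rows[bucketID]
	if ok && next < cur.State {
		return fmt.Errorf("bucket %d: cannot move from state %d back to %d", bucketID, cur.State, next)
	}
	s.rows[bucketID] = ProgressMeta{BucketID: bucketID, State: next}
	return nil
}

// QueryState returns the stored meta, or false if the bucket has no row yet.
func (s *ProgressStore) QueryState(bucketID uint64) (ProgressMeta, bool) {
	m, ok := s.rows[bucketID]
	return m, ok
}

func main() {
	s := NewProgressStore()
	fmt.Println(s.UpdateState(1, StatePreNotified))
	fmt.Println(s.UpdateState(1, StateFinished))
	fmt.Println(s.UpdateState(1, StateInit)) // rejected: would move backwards
}
```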

Feat 3: Bucket Migration Progress Query

Adds a MigratedBytesSize field to both MigrateGVGUnitMeta (core/spdb/entity.go) and MigrateGVGTable.

  1. After each successful migration, the task's MigratedBytesSize is accumulated. When progress is reported and updateMigrateGVGStatus is called, UpdateMigrateGVGMigratedBytesSize updates the MigratedBytesSize field.
  2. Users can query the progress of all bucket migrations through listExecutePlan.
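
A minimal sketch of how accumulated byte counts can be turned into a progress figure. `GVGUnit`, its `TotalBytes` field, and the `Progress` helper are hypothetical; the description does not say how the total is obtained or how listExecutePlan formats its result.

```go
package main

import "fmt"

// GVGUnit is a trimmed, hypothetical view of one migrate-GVG row.
type GVGUnit struct {
	GVGID             uint32
	TotalBytes        uint64 // assumed to be known up front; not stated in the PR
	MigratedBytesSize uint64
}

// AddMigrated accumulates the bytes reported by one finished task.
func (u *GVGUnit) AddMigrated(n uint64) { u.MigratedBytesSize += n }

// Progress returns migrated bytes over total bytes across all units, as a percentage.
func Progress(units []GVGUnit) float64 {
	var done, total uint64
	for _, u := range units {
		done += u.MigratedBytesSize
		total += u.TotalBytes
	}
	if total == 0 {
		return 0
	}
	return float64(done) / float64(total) * 100
}

func main() {
	units := []GVGUnit{{GVGID: 1, TotalBytes: 100}, {GVGID: 2, TotalBytes: 300}}
	units[0].AddMigrated(50)
	units[1].AddMigrated(100)
	units[1].AddMigrated(50)
	fmt.Printf("%.1f%%\n", Progress(units))
}
```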

Rationale

Example

NA

Changes

Notable changes:

Notification from Destination SP to Source SP - PreMigrateBucket:
On receiving the notification, the source SP's handler (preMigrateBucketHandler) does the following:

  1. Query the MigrateBucketTable for the MigrateStatus. If the quota has already been deducted, return "ok" directly.
  2. If quota deduction is pending, invoke DeductQuotaForBucketMigrate to deduct the quota.
  3. After quota deduction, call NotifyDonePreMigrateBucket to update the MigrateStatus in the MigrateBucketTable to "PreNotified."
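
The three steps above amount to an idempotent handler. A minimal sketch under stated assumptions: `stateTable`, `handler`, and `deductQuota` are hypothetical stand-ins (a map for the table, a counter for the real deduction), and only the control flow is meant to match the description.

```go
package main

import "fmt"

type MigrateState int

const (
	StateInit        MigrateState = iota
	StatePreNotified              // quota already deducted
)

// stateTable stands in for the persistent migrate-bucket table.
type stateTable map[uint64]MigrateState

// handler carries a counter so the sketch can show how often quota was deducted.
type handler struct {
	table       stateTable
	deductCalls int
}

// deductQuota is a placeholder for the real quota deduction.
func (h *handler) deductQuota(bucketID uint64) error {
	h.deductCalls++
	return nil
}

// PreMigrateBucket follows the three steps listed above.
func (h *handler) PreMigrateBucket(bucketID uint64) error {
	// Step 1: quota already deducted, so answer ok without side effects.
	if h.table[bucketID] == StatePreNotified {
		return nil
	}
	// Step 2: deduct the quota.
	if err := h.deductQuota(bucketID); err != nil {
		return err
	}
	// Step 3: persist the new state so a retry or a restart sees it.
	h.table[bucketID] = StatePreNotified
	return nil
}

func main() {
	h := &handler{table: stateTable{}}
	fmt.Println(h.PreMigrateBucket(42), h.PreMigrateBucket(42))
	fmt.Println("deductions:", h.deductCalls)
}
```

A repeated notification from the destination SP returns ok but does not deduct quota a second time.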

Notification from Destination SP to Source SP - PostMigrateBucket:

  • PostMigrateBucket can only be invoked once.

Potential Impacts

  • add potential impacts for other components here
  • ...

@jingjunLi jingjunLi added the wip Working in process label Nov 16, 2023
@jingjunLi jingjunLi changed the title feat: add bucket status feat: pre & post MigrateBucketHandler persist improvement Nov 17, 2023
@jingjunLi jingjunLi added r4r Ready for review wip Working in process and removed wip Working in process r4r Ready for review labels Nov 17, 2023
@jingjunLi jingjunLi force-pushed the feat-add-bucket-status branch 2 times, most recently from 8560aa1 to 29c40d6 Compare December 6, 2023 06:06
@jingjunLi jingjunLi changed the title feat: pre & post MigrateBucketHandler persist improvement feat: bucket migration state improvement Dec 6, 2023
@jingjunLi jingjunLi added r4r Ready for review and removed wip Working in process labels Dec 6, 2023
@jingjunLi jingjunLi assigned RenRick and sysvm and unassigned RenRick and sysvm Dec 6, 2023
@jingjunLi jingjunLi force-pushed the feat-add-bucket-status branch 2 times, most recently from 8a68460 to 578d0e0 Compare December 11, 2023 05:20
Collaborator

@sysvm sysvm left a comment

LGTM

NotifyPostMigrateBucket(ctx context.Context, bmInfo *gfsptask.GfSpBucketMigrationInfo) error
GetMigrationBucketState(ctx context.Context, bucketID uint64) (*gfspserver.MigrateBucketProgressMeta, error)
NotifyPostMigrateBucketAndRecoupQuota(ctx context.Context, bmInfo *gfsptask.GfSpBucketMigrationInfo) (*gfsptask.GfSpBucketQuotaInfo, error)
NotifyPreMigrateBucketAndDeductQuota(ctx context.Context, bucketID uint64) (*gfsptask.GfSpBucketQuotaInfo, error)
Collaborator

Please try to keep a consistent style.

@@ -127,6 +128,7 @@ type MetadataAPI interface {
ListGlobalVirtualGroupsByBucket(ctx context.Context, bucketID uint64, opts ...grpc.DialOption) ([]*virtualgrouptypes.GlobalVirtualGroup, error)
ListGlobalVirtualGroupsBySecondarySP(ctx context.Context, spID uint32, opts ...grpc.DialOption) ([]*virtualgrouptypes.GlobalVirtualGroup, error)
ListMigrateBucketEvents(ctx context.Context, blockID uint64, spID uint32, opts ...grpc.DialOption) ([]*types.ListMigrateBucketEvents, error)
ListCompleteMigrationBucketEvents(ctx context.Context, blockID uint64, srcSpID uint32, opts ...grpc.DialOption) ([]*storagetypes.EventCompleteMigrationBucket, error)
Collaborator

Is it possible to distinguish the difference between ListMigrateBucketEvents and ListCompleteMigrationBucketEvents through naming?

Contributor Author

The semantics of the two interfaces currently differ:

  • ListMigrateBucketEvents returns all events between (0-height), while ListCompleteMigrationBucketEvents returns the events at the current block height.
  • ListMigrateBucketEvents does not return CompleteMigrationBucketEvent. ListCompleteMigrationBucketEvents is mainly used by the src SP to listen for CompleteMigrationBucketEvent.
  • ListMigrateBucketEvents is backed by an internal table, EventMigrationBucket, that uses dst_primary_sp_id. There is no src_sp_id there, which would make the implementation more complex.

For these reasons, ListCompleteMigrationBucketEvents has been implemented separately.
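
To make the difference concrete, here is a small sketch of the two query shapes as plain filters. The `Event` struct and both functions are hypothetical simplifications, and treating "(0-height)" as an inclusive upper bound is an assumption.

```go
package main

import "fmt"

// Event is a hypothetical, flattened event record used only for this sketch.
type Event struct {
	Height   uint64
	SrcSPID  uint32
	DstSPID  uint32
	Complete bool // true for a complete-migration event
}

// listMigrateEvents returns non-complete events for dstSPID at any height up to the given one.
func listMigrateEvents(all []Event, height uint64, dstSPID uint32) []Event {
	var out []Event
	for _, e := range all {
		if !e.Complete && e.DstSPID == dstSPID && e.Height <= height {
			out = append(out, e)
		}
	}
	return out
}

// listCompleteEvents returns complete events for srcSPID at exactly the given height.
func listCompleteEvents(all []Event, height uint64, srcSPID uint32) []Event {
	var out []Event
	for _, e := range all {
		if e.Complete && e.SrcSPID == srcSPID && e.Height == height {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	events := []Event{
		{Height: 5, SrcSPID: 1, DstSPID: 2},
		{Height: 9, SrcSPID: 1, DstSPID: 2},
		{Height: 10, SrcSPID: 1, DstSPID: 2, Complete: true},
		{Height: 10, SrcSPID: 3, DstSPID: 2, Complete: true},
		{Height: 8, SrcSPID: 1, DstSPID: 2, Complete: true},
	}
	fmt.Println(len(listMigrateEvents(events, 10, 2)))
	fmt.Println(len(listCompleteEvents(events, 10, 1)))
}
```

The first filter is keyed on the destination SP and a height range; the second is keyed on the source SP and a single height, which is why the two were hard to merge into one call.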

Contributor Author

  1. src SP, dest SP comment

@@ -373,3 +373,38 @@ func (m *GfSpGCBucketMigrationTask) GetBucketID() uint64 {
func (m *GfSpGCBucketMigrationTask) SetBucketID(bucketID uint64) {
	m.BucketId = bucketID
}

func (m *GfSpGCBucketMigrationTask) SetLastGCObjectID(lastGCObjectID uint64) {
	m.LastGcObjectId = lastGCObjectID
Collaborator

Gc or GC? Keep a consistent style.

// used by persist bucket migration progress and meta
type MigrateBucketProgressTable struct {
	BucketID              uint64 `gorm:"primary_key"`
	SubscribedBlockHeight uint64 `gorm:"primary_key"`
Collaborator

What is the meaning of the two primary keys in gorm?

Contributor Author

100 -> xx init, migrate bucket process
+++ 101 102;
103 -> migrate finished;
add more comment;

@@ -119,8 +119,9 @@ type MigrateGVGUnitMeta struct {
SrcSPID uint32
DestSPID uint32
LastMigratedObjectID uint64
MigrateStatus int // scheduler assign unit status.
RetryTime int //
MigrateStatus int // scheduler assign unit status.
Collaborator

@will-2012 will-2012 Dec 13, 2023

Status or state? Please distinguish the semantics and be consistent.

Contributor Author

This is the migrate GVG task's migrate status.

@@ -384,13 +384,16 @@ func (e *ExecuteModular) gcMetaReadRecord(ctx context.Context, task coretask.GCM
}

func (e *ExecuteModular) HandleGCBucketMigrationBucket(ctx context.Context, task coretask.GCBucketMigrationTask) {
Collaborator

Should this support resuming GC from the last breakpoint?

Contributor Author

  1. parallel
  2. resumable gc, breakpoint

err = e.gcWorker.deleteObjectSegmentsAndIntegrity(ctx, objectInfo)
log.CtxInfow(ctx, "succeed to delete objects by gvg and bucket for gc", "object", objectInfo, "error", err)
for {
if objectList, err = e.baseApp.GfSpClient().ListObjectsByGVGAndBucketForGC(ctx, gvg.GetId(), bucketID, startAfter, limit); err != nil {
Contributor

I can't find where startAfter is updated in this for loop.

Contributor Author

fixed
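
A paged loop of this shape only makes progress if the cursor advances on every pass. The sketch below shows that pattern in isolation; `listPage` and `gcAll` are hypothetical stand-ins for the listing call and the GC loop, object IDs are assumed to be sorted ascending, and the actual fix in the PR is not shown here.

```go
package main

import "fmt"

// listPage stands in for a paged listing call: it returns up to limit IDs greater than startAfter.
func listPage(ids []uint64, startAfter uint64, limit int) []uint64 {
	var page []uint64
	for _, id := range ids { // ids is assumed to be sorted ascending
		if id > startAfter && len(page) < limit {
			page = append(page, id)
		}
	}
	return page
}

// gcAll walks every page. Moving startAfter to the last ID seen is what lets
// the loop advance and terminate; without it the same page is fetched forever.
func gcAll(ids []uint64, startAfter uint64, limit int) (deleted []uint64, last uint64) {
	for {
		page := listPage(ids, startAfter, limit)
		if len(page) == 0 {
			return deleted, startAfter
		}
		for _, id := range page {
			deleted = append(deleted, id) // placeholder for the real delete
		}
		startAfter = page[len(page)-1]
	}
}

func main() {
	ids := []uint64{1, 2, 3, 4, 5}
	deleted, last := gcAll(ids, 0, 2)
	fmt.Println(deleted, last)

	// Resuming from a previously recorded cursor skips what was already handled.
	resumed, _ := gcAll(ids, 3, 2)
	fmt.Println(resumed)
}
```

The returned cursor is also what a resumable GC would persist, so a later run can pass it back in as startAfter.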

log.CtxErrorw(reqCtx.Context(), "failed to pre migrate bucket, the bucket may already notified", "bucket_id",
bucketID, "error", err)
// if the bucket has already pre notified ignore the error
if strings.Contains(err.Error(), "the bucket has already notified") {
Contributor

Can err be nil here?

Contributor Author

Yes. If the quota has already been deducted, preMigrateBucket should return true.
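
For comparison, a nil-safe variant of the "already notified counts as success" check. This swaps in a different technique from the snippet above: a sentinel error matched with `errors.Is` instead of `strings.Contains` on the message. `ErrAlreadyNotified`, `preMigrate`, and the helper functions are hypothetical, and the approach only works where the error value is passed in-process; an error that crosses an RPC boundary would need a status code or similar to be recognised on the other side.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyNotified is a hypothetical sentinel; the snippet under review matches on message text instead.
var ErrAlreadyNotified = errors.New("the bucket has already notified")

// preMigrate treats a nil error and "already notified" as success,
// and passes every other error through unchanged.
func preMigrate(notify func() error) error {
	err := notify()
	// errors.Is handles a nil err safely, unlike calling err.Error() on it.
	if err == nil || errors.Is(err, ErrAlreadyNotified) {
		return nil
	}
	return err
}

// The three helpers below simulate possible outcomes of the remote call.
func succeeds() error        { return nil }
func alreadyNotified() error { return fmt.Errorf("pre migrate bucket: %w", ErrAlreadyNotified) }
func otherFailure() error    { return errors.New("connection refused") }

func main() {
	fmt.Println(preMigrate(succeeds))
	fmt.Println(preMigrate(alreadyNotified))
	fmt.Println(preMigrate(otherFailure))
}
```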

@@ -526,32 +555,68 @@ func (s *BucketMigrateScheduler) doneMigrateBucket(bucketID uint64) error {
s.deleteExecutePlanByBucketID(bucketID)
executePlan.stopSPSchedule()
err = s.manager.baseApp.GfSpDB().DeleteMigrateGVGUnitsByBucketID(bucketID)
Contributor

Would these records stay in the DB if this is skipped?

if err = UpdateBucketMigrationProgress(s.manager.baseApp, bucketID, storetypes.BucketMigrationState_MIGRATION_FINISHED); err != nil {
return
}
log.CtxInfow(ctx, "succeed to confirm complete events", "EventMigrationBucket", event, "error", err)
Contributor

There is no err at this point.

Contributor Author

deleted

// get bucket quota and check
if bucketSize, err = m.getBucketTotalSize(ctx, bucketID); err != nil {
	log.Errorf("failed to get bucket total object size", "bucket_id", bucketID, "error", err)
	return latestQuota, err
}

if bmInfo.GetFinished() {
Contributor

This if seems useless.

Contributor Author

Deleted. This was the old way of triggering bucket migration GC.

@@ -243,7 +245,7 @@ func (s *GfSpClient) QueryLatestBucketQuota(ctx context.Context, endpoint string
return gfsptask.GfSpBucketQuotaInfo{}, fmt.Errorf("failed to query latest bucket quota, bucket(%s), status_code(%d), endpoint(%s)", queryMsg, resp.StatusCode, endpoint)
}

signedMsg, err := hex.DecodeString(resp.Header.Get(GnfdSignedApprovalMsgHeader))
signedMsg, err := hex.DecodeString(resp.Header.Get(GnfdQuotaInfoHeader))
Collaborator

The name signedMsg may be misleading; it could be improved.

Contributor Author

fixed

type MigrateBucketProgressMeta struct {
	BucketID              uint64 // as primary key
	SubscribedBlockHeight uint64
	MigrationState        int
Collaborator

migrateState?

Contributor Author

fixed

@jingjunLi jingjunLi merged commit e7aeec1 into develop Dec 18, 2023
12 checks passed
@jingjunLi jingjunLi deleted the feat-add-bucket-status branch December 18, 2023 06:25
Labels
r4r Ready for review

6 participants