feat: gc for zombie piece & metaTask & bucket migration #1190

Merged
jingjunLi merged 10 commits into develop from feat-gc-zombine on Nov 24, 2023

Conversation


@jingjunLi jingjunLi commented Oct 18, 2023

Description

  • GCZombiePiece
    GCZombiePieceTask is an abstract interface that records the information needed to reclaim piece store space by deleting zombie piece data: pieces that, because of some exception, have been stored in the piece store while their metadata is not on chain.

  • GCMeta
    GCMetaTask is an abstract interface that records the information needed to reclaim SP meta store space by deleting expired data.

  • GC for Bucket Migration
    When bucket migration is completed or has failed, we need to delete redundant data to free up space:

    1. If the bucket migration is successful, we delete the data from the source node.
    2. If the bucket migration fails, we delete the data from the destination node.

Implementation of the GCZombie

  • New Configurations
    • GCConfig::EnableGCZombie: Enables or disables the GCZombie feature.
    • GCConfig::GCZombiePieceTimeInterval: Time interval for generating GCZombie tasks (default: DefaultGlobalBatchGcZombiePieceTimeInterval, i.e. 10*60 seconds).
    • ParallelConfig::GlobalGCZombieParallel: Maximum allowed parallel GCZombie tasks (default: 1).
    • GCZombieSafeObjectIDDistance: A reserve of object IDs during GCZombie deletion. If the scanned object ID range plus DefaultGlobalGcZombieSafeObjectIDDistance exceeds the current maximum system ID, the scan resumes from 0.
    • GCZombiePieceObjectIDInterval: Size of the object ID range covered by each GCZombie task. For example, task 1 might handle IDs 0-100, task 2 101-200, and so on (default: DefaultGlobalGcZombiePieceObjectIDInterval, i.e. 100).
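
The interaction between GCZombiePieceObjectIDInterval and GCZombieSafeObjectIDDistance can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `zombieRange`, `nextZombieRange`, and the half-open `[Start, End)` convention are hypothetical names and choices made for the example.

```go
package main

// zombieRange is a hypothetical stand-in for the (StartObjectId, EndObjectId)
// pair carried by a GCZombiePieceTask. The range is treated as half-open:
// Start is included, End is excluded (an assumption for this sketch).
type zombieRange struct {
	Start uint64
	End   uint64
}

// nextZombieRange returns the object ID range for the next task and the
// cursor from which the following task should start.
//
// If the end of the range plus the safe distance would exceed the current
// maximum object ID, the scan restarts from 0, so that objects which were
// created very recently are not scanned.
func nextZombieRange(cursor, interval, safeDistance, maxObjectID uint64) (zombieRange, uint64) {
	end := cursor + interval
	if end+safeDistance > maxObjectID {
		// Too close to the newest objects: wrap around to the beginning.
		cursor = 0
		end = interval
	}
	return zombieRange{Start: cursor, End: end}, end
}
```

With an interval of 100, a safe distance of 1000, and a maximum object ID of 5000, a cursor at 0 yields the range 0-100, while a cursor at 4000 wraps back to 0 because 4100 + 1000 exceeds 5000.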

Core Processes

  • Generation of GCZombiePieceTask: Triggered by gcZombiePieceTicker resulting in the creation of GCZombiePieceTask.
  • Processing of GCZombiePieceTask: Handled in ExecuteModular::HandleGCZombiePieceTask.
    • gcZombiePieceFromIntegrityMeta: Determines whether a piece is a ZombiePiece based on the IntegrityMeta table. Scans all IntegrityMeta within the current object ID range specified in GCZombiePieceTask (StartObjectId, EndObjectId).
    • gcZombiePieceFromPieceHash: Determines whether a piece is a ZombiePiece based on the PieceHash table. Scans all PieceHash within the current object ID range specified in GCZombiePieceTask (StartObjectId, EndObjectId).
  • Report Handling of GCZombiePieceTask: ManageModular::HandleGCZombiePieceTask.
    • If the task is successful, it is removed from the gcZombieQueue.
    • If the task fails, it is reinserted into the queue. If the retry limit is exceeded, the task is canceled.
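
The report handling described above (remove on success, requeue on failure, cancel once the retry limit is exceeded) could look roughly like this. It is a simplified sketch: `gcTask`, `gcQueue`, `handleReport`, and the outcome constants are hypothetical and do not mirror the actual queue types used by ManageModular.

```go
package main

import "errors"

// errGCFailed is a placeholder error used to simulate a failed task report.
var errGCFailed = errors.New("gc task failed")

// reportOutcome describes what the manager did with a reported task.
type reportOutcome int

const (
	outcomeRemoved  reportOutcome = iota // task succeeded and left the queue
	outcomeRequeued                      // task failed and was pushed back
	outcomeCanceled                      // task failed after exhausting its retries
)

// gcTask is a hypothetical, simplified task record.
type gcTask struct {
	Key      string
	Retry    int
	MaxRetry int
}

// gcQueue is a hypothetical stand-in for gcZombieQueue, indexed by task key.
type gcQueue struct {
	tasks map[string]*gcTask
}

func newGCQueue() *gcQueue {
	return &gcQueue{tasks: make(map[string]*gcTask)}
}

func (q *gcQueue) push(task *gcTask) {
	q.tasks[task.Key] = task
}

// handleReport applies the three rules: a successful task is removed, a failed
// task is reinserted, and a task that exceeded its retry limit is dropped.
func (q *gcQueue) handleReport(task *gcTask, err error) reportOutcome {
	delete(q.tasks, task.Key)
	if err == nil {
		return outcomeRemoved
	}
	task.Retry++
	if task.Retry > task.MaxRetry {
		return outcomeCanceled
	}
	q.push(task)
	return outcomeRequeued
}
```

Keying the queue by task key keeps the reinsertion idempotent: reporting the same task twice never produces two queue entries.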

Implementation of the Meta GC

Meta GC garbage-collects the different kinds of SP metadata. It currently deletes expired data periodically from two tables: bucketTraffic and readRecord.

New Configurations:

  • Expiration Time and Interval Configuration:
    • bucketTrafficKeepLatestDay: Configured in ExecutorConfig::BucketTrafficKeepTimeDay, defaults to DefaultExecutorBucketTrafficKeepTimeDay (retaining the latest 180 days).
    • readRecordKeepLatestDay: Configured in ExecutorConfig::ReadRecordKeepTimeDay, defaults to DefaultExecutorReadRecordKeepTimeDay (retaining the latest 30 days).
    • DefaultGlobalGCMetaTimeInterval: Time interval for generating GC Meta Tasks, set to 10 * 60 seconds (10 minutes).
    • ReadRecordDeleteLimit: Maximum number of records deleted from the read record table in each GC meta task (default: 100).
  • Functional Configuration:
    • GCConfig::EnableGCMeta: Controls whether GCMeta is enabled or not.
    • SQLDBConfig::EnableTracePutEvent: Controls whether Trace Put Event is enabled or not.

Core Process:

  1. Generation of GCMetaTask:
    Triggered by gcMetaTicker, resulting in the creation of GCMetaTask.
  2. Processing of GCMetaTask:
    Handled in ExecuteModular::HandleGCMetaTask
    • gcMetaBucketTraffic: Deletes expired entries from the BucketTraffic table using SpDBImpl::DeleteAllBucketTrafficExpired.
    • gcMetaReadRecord: Deletes expired entries from the ReadRecord table using SpDBImpl::DeleteAllReadRecordExpired.
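
To make the expiration settings concrete, the following sketch shows how a retention period in days and a per-task delete limit might combine. It is an illustration only: `metaGCConfig`, `expiryCutoff`, `readRecord`, and `selectExpired` are hypothetical, timestamps are assumed to be Unix seconds, and the real deletion is performed in the database by the SpDBImpl methods named above.

```go
package main

const secondsPerDay = 24 * 60 * 60

// metaGCConfig mirrors the settings described above; the field names are
// hypothetical and only the default values come from the PR description.
type metaGCConfig struct {
	BucketTrafficKeepDays int64
	ReadRecordKeepDays    int64
	ReadRecordDeleteLimit int
}

func defaultMetaGCConfig() metaGCConfig {
	return metaGCConfig{
		BucketTrafficKeepDays: 180,
		ReadRecordKeepDays:    30,
		ReadRecordDeleteLimit: 100,
	}
}

// expiryCutoff returns the Unix timestamp before which a row counts as expired.
func expiryCutoff(nowUnix, keepDays int64) int64 {
	return nowUnix - keepDays*secondsPerDay
}

// readRecord is a simplified row of the read record table.
type readRecord struct {
	ID         uint64
	ReadAtUnix int64
}

// selectExpired returns at most limit records that are strictly older than
// the cutoff, in the order in which they appear in records.
func selectExpired(records []readRecord, cutoff int64, limit int) []readRecord {
	var expired []readRecord
	for _, r := range records {
		if len(expired) >= limit {
			break
		}
		if r.ReadAtUnix < cutoff {
			expired = append(expired, r)
		}
	}
	return expired
}
```

Capping each task at ReadRecordDeleteLimit rows keeps a single GC run short; the remaining expired rows are picked up by later tasks.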

Implementation of the BucketMigration GC

Purpose:

  1. After successful bucket migration, delete data from the source node.
  2. In case of failed or canceled bucket migration, delete migrated data from the destination node.
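
The two rules above amount to a small decision function. The sketch below is illustrative; `migrationOutcome`, `spRole`, and `gcTarget` are hypothetical names that do not appear in the PR.

```go
package main

// migrationOutcome is a hypothetical enumeration of how a bucket migration ended.
type migrationOutcome int

const (
	migrationSucceeded migrationOutcome = iota
	migrationFailed
	migrationCanceled
)

// spRole identifies which side of the migration an SP is on.
type spRole string

const (
	sourceSP      spRole = "source"
	destinationSP spRole = "destination"
)

// gcTarget returns the SP that has to delete data once the migration ended:
// the source after a success, the destination after a failure or cancellation.
func gcTarget(outcome migrationOutcome) spRole {
	if outcome == migrationSucceeded {
		return sourceSP
	}
	return destinationSP
}
```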

New Configurations:

  • GCBucketMigrationTask-related settings and parameters:
    • Two new parameters:
      • gcBucketMigrationTimeout (MinGCBucketMigrationTime 0.5-1 hour): Timeout duration for the task.
      • gcBucketMigrationRetry (MinGCBucketMigrationRetry 3-5 times): Retry attempts for the task.
    • Task Type: GfSpGCBucketMigrationTask.
    • Task scheduling priority: TypeTaskGCZombiePiece. Adjusted priority:
      • TypeTaskGCMeta DefaultSmallerPriority / 4
      • TypeTaskGCBucketMigration DefaultSmallerPriority / 4

Core Workflow:

  • Generation of GCBucketMigrationTask:
    • For successful migration (source node deletes old data):
      1. Destination SP notifies source SP of migration completion: GfSpClient::NotifyPostMigrateBucket -> GfSpNotifyPostMigrate -> ManageModular::NotifyPostMigrateBucket.
      2. Upon successful migration, ManageModular::GenerateGCBucketMigrationTask generates a GCBucketMigrationTask.
      3. Enqueues the task in ManageModular::gcBucketMigrationQueue.
    • For failed migration (destination node deletes migrated data):
      • BucketMigrateScheduler::PostMigrateBucket generates a GCBucketMigrationTask on the destination node if migration fails.
  • Execution of GCBucketMigrationTask:
    • Handled in ExecuteModular::HandleGCBucketMigrationBucket.
    • Iterates through all GVGs of the bucket and retrieves objects within each GVG using the ListObjectsByGVGAndBucketForGC interface.
    • Checks if the location information of each object is valid. If a segment should not be on this SP, it is cleaned.
  • Report Handling of GCBucketMigrationTask:
    • If the task is successful, it is removed from gcBucketMigrationQueue.
    • If the task fails, it is reinserted into the queue. If the retry limit is exceeded, the task is canceled.
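
The execution step (iterate over the GVGs, list the objects in each, and clean whatever should no longer be on this SP) can be sketched as follows. This is a simplified model under stated assumptions: `objectLocation`, `storedOn`, and `objectsToClean` are hypothetical, and the `listByGVG` callback merely stands in for ListObjectsByGVGAndBucketForGC, whose real signature is not shown in this PR description.

```go
package main

// objectLocation is a hypothetical, simplified view of where an object's
// pieces are expected to live according to its current location information.
type objectLocation struct {
	ObjectID     uint64
	PrimarySP    uint32
	SecondarySPs []uint32
}

// storedOn reports whether the given SP is expected to hold data for the object.
func (o objectLocation) storedOn(spID uint32) bool {
	if o.PrimarySP == spID {
		return true
	}
	for _, id := range o.SecondarySPs {
		if id == spID {
			return true
		}
	}
	return false
}

// objectsToClean walks every GVG of the bucket, lists its objects through the
// listByGVG callback, and collects the IDs of objects that selfSP should no
// longer store.
func objectsToClean(selfSP uint32, gvgIDs []uint32, listByGVG func(gvgID uint32) []objectLocation) []uint64 {
	var stale []uint64
	for _, gvgID := range gvgIDs {
		for _, obj := range listByGVG(gvgID) {
			if !obj.storedOn(selfSP) {
				stale = append(stale, obj.ObjectID)
			}
		}
	}
	return stale
}
```

Passing the listing function as a parameter keeps the check itself free of any metadata client, which makes it easy to exercise with in-memory data.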


@jingjunLi jingjunLi added the wip Working in process label Oct 18, 2023
@jingjunLi jingjunLi changed the title feat: gc zombine feat: gc zombine piece Oct 18, 2023
@jingjunLi jingjunLi force-pushed the feat-gc-zombine branch 2 times, most recently from 7f39fe6 to 197925f Compare October 24, 2023 07:13
@jingjunLi jingjunLi force-pushed the feat-gc-zombine branch 5 times, most recently from 298ba56 to 1188bae Compare November 1, 2023 01:51
@jingjunLi jingjunLi changed the title feat: gc zombine piece feat: gc for zombine piece & metaTask & bucket migration Nov 1, 2023
@jingjunLi jingjunLi force-pushed the feat-gc-zombine branch 2 times, most recently from 4ca9246 to 0a63e38 Compare November 2, 2023 01:37
@jingjunLi jingjunLi changed the title feat: gc for zombine piece & metaTask & bucket migration feat: gc for zombie piece & metaTask & bucket migration Nov 2, 2023
@jingjunLi jingjunLi force-pushed the feat-gc-zombine branch 2 times, most recently from 1c05242 to a2e18f0 Compare November 2, 2023 11:17
@jingjunLi jingjunLi force-pushed the feat-gc-zombine branch 3 times, most recently from 4614ebb to 3d180ca Compare November 8, 2023 01:22
@jingjunLi jingjunLi added r4r Ready for review and removed wip Working in process labels Nov 8, 2023
modular/executor/execute_task.go
}

func (e *ExecuteModular) HandleGCBucketMigrationBucket(ctx context.Context, task coretask.GCBucketMigrationTask) {
// TODO gc progress persist in db
Collaborator

TODO.

Contributor Author

I added a TODO here; it will be resolved in the next PR.
If the BucketMigrationBucket task fails or crashes, the data GC tasks cannot proceed. This will be addressed in the future by reusing the state in bucketMigrateTable to drive the process and record the GC status.

Collaborator

@sysvm sysvm left a comment

LGTM

modular/executor/execute_gc.go
}
}()

if includePrivate {
Collaborator

Maybe we don't have to distinguish between private and public objects here, as the SP needs to GC all zombie pieces.

Contributor Author

Yes, I will remove the handling logic related to includePrivate=false.

Collaborator

ruojunm commented Nov 24, 2023

LGTM

@jingjunLi jingjunLi merged commit 56498b0 into develop Nov 24, 2023
12 checks passed
@jingjunLi jingjunLi deleted the feat-gc-zombine branch November 24, 2023 08:38