Scan and Garbage collection #82

Open
dtcray opened this issue Jul 24, 2017 · 1 comment

Comments

@dtcray
Contributor

dtcray commented Jul 24, 2017

Would it be difficult to modify the GC code to make it more parallel?
The workflow would be as follows (I might be missing some details):

  • Define a new flavor of worker or stage that manages GC (STAGE_GC)

  • (stage gc) push non-directory entries to SOFT_RM as needed and remove them from the other tables

  • (stage gc) push directory entries to a gc temp table

  • (gc thread) at end of scan:

  1. (gc thread) create the gc temp table
  2. get the list of "old" file ids (serial)
  3. (gc thread) push the entries to the stage queue
  4. (gc thread) wait for STAGE_GC to be empty
  5. get the entries from the gc temp table, push them to SOFT_RM as needed and remove them from the other tables
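
The workflow above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the `Tables` class is a hypothetical in-memory stand-in for the database, `seen_ids` stands for whatever the scan uses to tell fresh entries from "old" ones, and a thread pool plays the role of STAGE_GC.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Tables:
    """Hypothetical in-memory stand-in for the database tables."""

    def __init__(self, entries):
        self.entries = dict(entries)  # id -> (kind, wants_soft_rm)
        self.soft_rm = set()          # stand-in for SOFT_RM
        self.gc_temp = []             # stand-in for the gc temp table
        self.lock = threading.Lock()  # a real database would handle concurrency itself

    def remove(self, entry_id):
        """Push to SOFT_RM if needed, then drop the entry. Caller holds the lock."""
        _kind, wants_soft_rm = self.entries.pop(entry_id)
        if wants_soft_rm:
            self.soft_rm.add(entry_id)


def list_old_ids(tables, seen_ids):
    """Step 2 (serial): ids present in the tables but not seen by the scan."""
    return sorted(i for i in tables.entries if i not in seen_ids)


def stage_gc(tables, entry_id):
    """STAGE_GC: remove non-directories now, defer directories to the temp table."""
    with tables.lock:
        kind, _ = tables.entries[entry_id]
        if kind == "dir":
            tables.gc_temp.append(entry_id)
        else:
            tables.remove(entry_id)


def run_gc(tables, seen_ids, nb_workers=4):
    """GC thread, run at the end of the scan."""
    tables.gc_temp.clear()                                    # step 1
    old_ids = list_old_ids(tables, seen_ids)                  # step 2
    with ThreadPoolExecutor(max_workers=nb_workers) as pool:  # step 3
        futures = [pool.submit(stage_gc, tables, i) for i in old_ids]
        for future in futures:                                # step 4
            future.result()  # also re-raises any worker error
    with tables.lock:                                         # step 5
        for entry_id in tables.gc_temp:
            tables.remove(entry_id)
        tables.gc_temp.clear()
```

Because the sketch guards everything with a single lock, it shows the data flow rather than any real speed-up; the gain would come from the database handling several removals concurrently.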
@tl-cea
Member

tl-cea commented Sep 5, 2017

> Define a new flavor of worker or stage that manages GC (STAGE_GC)

It is easy to define a new pipeline stage, but a simple queue with a configurable number of workers would probably be enough to handle this case.
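
To make the "simple queue" option concrete, here is a minimal sketch in Python of a queue drained by a configurable number of worker threads. `WorkerQueue`, `push`, `wait_empty` and `close` are hypothetical names chosen for this example; they are not part of the existing code base.

```python
import queue
import threading


class WorkerQueue:
    """Hypothetical helper: a FIFO queue drained by nb_workers threads."""

    def __init__(self, handler, nb_workers=4):
        self._queue = queue.Queue()
        self._handler = handler
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(nb_workers)
        ]
        for thread in self._threads:
            thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:  # sentinel: stop this worker
                    return
                self._handler(item)
            finally:
                self._queue.task_done()

    def push(self, item):
        """Queue one item (None is reserved as the stop sentinel)."""
        self._queue.put(item)

    def wait_empty(self):
        """Block until every pushed item has been handled."""
        self._queue.join()

    def close(self):
        """Stop all workers after the items already queued."""
        for _ in self._threads:
            self._queue.put(None)
        for thread in self._threads:
            thread.join()
```

The GC thread would `push` each old entry id, call `wait_empty` where the proposal says "wait for STAGE_GC to be empty", and then process the deferred directory entries itself.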

> (stage gc) push non-directory entries to SOFT_RM as needed and remove them from the other tables
> (stage gc) push directory entries to a gc temp table

Does that mean there would be two GC requests at the end of the scan, one to select directories and one to select non-directories? Or are they split by the GC thread you mention below?

> (gc thread) at end of scan:

The previous tasks also run at the end of the scan, don't they?

> (gc thread) create the gc temp table
> get the list of "old" file ids (serial)
> (gc thread) push the entries to the stage queue
> (gc thread) wait for STAGE_GC to be empty
> get the entries from the gc temp table, push them to SOFT_RM as needed and remove them from the other tables

So to summarize, if I understand correctly, this would parallelize the steps of inserting entries into SOFT_RM and dropping them from the other tables. In your experience, is that the longest operation? I guess creating the temp table is also a long step...
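
One way to answer the "which step is longest" question would be to time each GC phase separately before parallelizing anything. The following Python sketch shows the idea; the `timed` and `slowest` helpers and the phase names are made up for illustration and do not correspond to existing functions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, timings):
    """Accumulate the wall-clock time spent in the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed


def slowest(timings):
    """Return the name of the phase with the largest accumulated time."""
    return max(timings, key=timings.get)
```

Each phase (creating the temp table, listing old ids, inserting into SOFT_RM, dropping from the other tables) would be wrapped in its own `with timed("phase_name", timings):` block, and `slowest(timings)` then points at the step most worth parallelizing.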
