Define a new flavor of worker or stage that manages GC (STAGE_GC)
It is easy to define a new pipeline stage, but probably a simple queue with a configurable number of workers would be enough to manage this case.
(stage gc) push non-directory entries to SOFT_RM as needed and remove them from the other tables
(stage gc) push dir entries in gc temp table
Does that mean there would be 2 GC requests at the end of the scan, one to select directories and one to select non-directories? Or are they split by the GC thread you mention below?
(gc thread) at end of scan:
The previous tasks are also run at the end of the scan, aren't they?
(gc thread) create gc temp table
get list of "old" file ids (serial)
(gc thread) push entries in stage queue
(gc thread) wait for STAGE_GC to be empty
get entries from gc temp table, push to SOFT_RM as needed and remove from other tables
So to summarize, if I understand correctly, this would parallelize the steps of inserting entries into SOFT_RM and dropping them from the other tables. In your experience, is that the longest operation? I guess creating the temp table is also a long step...
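To make the discussion more concrete, here is a minimal Python sketch of the workflow as I understand it, using a plain queue with a configurable number of workers. It is only an illustration, not the project's actual code: `run_gc`, `n_workers`, and the in-memory lists standing in for SOFT_RM, the GC temp table, and the other tables are all hypothetical names, and the "as needed" check is left out.

```python
import queue
import threading


def run_gc(old_entries, n_workers=4):
    """Hypothetical sketch: old_entries is an iterable of (entry_id, is_dir)."""
    gc_queue = queue.Queue()   # stands in for the STAGE_GC queue
    soft_rm = []               # stands in for the SOFT_RM table
    gc_temp = []               # stands in for the GC temp table (directories)
    removed = []               # ids dropped from the other tables
    lock = threading.Lock()    # protects the three lists above

    def worker():
        while True:
            item = gc_queue.get()
            if item is None:   # sentinel: no more work for this worker
                gc_queue.task_done()
                return
            entry_id, is_dir = item
            with lock:
                if is_dir:
                    # (stage gc) directories are deferred to the GC thread
                    gc_temp.append(entry_id)
                else:
                    # (stage gc) non-directories are handled right away
                    soft_rm.append(entry_id)
                    removed.append(entry_id)
            gc_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # (gc thread) push the "old" entries into the stage queue
    for entry in old_entries:
        gc_queue.put(entry)

    # (gc thread) wait for the stage queue to be empty
    gc_queue.join()

    # stop the workers
    for _ in threads:
        gc_queue.put(None)
    for t in threads:
        t.join()

    # (gc thread) serial pass over the directories kept in the temp table
    for entry_id in gc_temp:
        soft_rm.append(entry_id)
        removed.append(entry_id)

    return soft_rm, removed
```

With this shape, only the per-entry SOFT_RM insert and removal run in parallel; building the list of "old" entries and the final directory pass stay serial, which is why it matters which of these steps actually dominates the GC time.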
Would it be difficult to modify the GC code to be more parallel?
The workflow would be (I might be missing some details):
Define a new flavor of worker or stage that manages GC (STAGE_GC)
(stage gc) push non-directory entries to SOFT_RM as needed and remove them from the other tables
(stage gc) push dir entries in gc temp table
(gc thread) at end of scan: