-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Colossus] Add pruning service #4845
[Colossus] Add pruning service #4845
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good and will be quite valuable. Left some feedback. Also needs an update from master to fix some merge conflicts.
storage-node/src/commands/server.ts
Outdated
@@ -62,6 +66,16 @@ export default class Server extends ApiCommandBase { | |||
description: 'Interval between synchronizations (in minutes)', | |||
default: 1, | |||
}), | |||
cleanup: flags.boolean({ | |||
char: 's', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char 's' already assigned for sync. Better remove it as I don't know what the behavior is if short flag is used and multiple flags assigned same char.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, changed the short flag to c
. WDYT?
storage-node/src/commands/server.ts
Outdated
cleanupInterval: flags.integer({ | ||
char: 'i', | ||
description: 'Interval between periodic cleanup actions (in minutes)', | ||
default: 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say this should be much higher like 1h, maybe even longer?
Depends on how much work/time it takes per cleanup run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, ideally it should be higher that the sync
interval, I just set it to 1min, basically test the pruning behavior locally. Now I have set it to 6hrs
): Promise<{ dataObjectId: DataObjectId; pinnedCount: DataObjectPinCount } | undefined> { | ||
if (idCache.has(dataObjectId)) { | ||
await lock.acquireAsync() | ||
const id = { dataObjectId, pinnedCount: idCache.get(dataObjectId) as DataObjectPinCount } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return type of function doesn't seem to match id
, dataObjectId
is a string where as in the return type it is DataObjectId
(u64)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DataObjectId is also a string, as you can see
type DataObjectId = string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, i didn't spot that type definition.
|
||
const timeoutMs = 60 * 1000 // 1 minute since it's only a HEAD request | ||
const deletionTasks: DeleteLocalFileTask[] = [] | ||
await Promise.all( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
await Promise.all( | |
await Promise.allSettled( |
To avoid the promise being rejected when HEAD request to one data object fails, Promise.allSettled() is more appropriate.
Also I can image if the number of dataobjects could potentially be very large, doing a massive number of HTTP HEAD requests in parallel might be a bad idea. Perhaps we should do it in batches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but I have to change target
option in tsconfig.json
to es2020
as Promise.allSettled
does not seem to be available in es2017
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final small change and this should be ready to test in production :)
And updating from master to resolve merge conflict.
dataObjectId: string | ||
): Promise<{ dataObjectId: DataObjectId; pinnedCount: DataObjectPinCount } | undefined> { | ||
if (idCache.has(dataObjectId)) { | ||
await lock.acquireAsync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the lock be acquired before checking idCache.has()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. However, placing a lock before the if condition can lead to unnecessary blocking if the dataObjectId
is not present in the cache. So I reimplemented by first checking the cache without acquiring the lock. This is a quick operation and if the dataObjectId
is not present, just return without the overhead of acquiring the lock.
5e4be6e
to
f692d15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge this work into a new branch colossus-beta
where we can do final tweaks for a "beta" release.
addresses #4813
This PR adds:
How the pruning of assets work
The
cleanupService
runs the data objects cleanup/pruning workflow. It removes all the locally stored data objects that the operator is no longer obliged to keep, because either the data object has been deleted from the runtime or it has been moved to some other bucket/sPRECONDITIONS:
MAXIMUM_QN_LAGGING_THRESHOLD
blocks, otherwise the cleanup workflow would not be performed in order to avoid pruning assets based on outdated state.MINIMUM_REPLICATION_THRESHOLD