# Utilize BucketObjectEntry in GC logic #8238
```diff
@@ -26,7 +26,7 @@ final _log = Logger('api_export:exported_bucket');
 /// This ensures that we don't delete files we've just created.
 /// It's entirely possible that one process is writing files, while another
 /// process is running garbage collection.
-const _minGarbageAge = Duration(days: 1);
+const _minGarbageAge = Duration(hours: 3);

 /// Interface for [Bucket] containing exported API that is served directly from
 /// Google Cloud Storage.
```
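The 3-hour threshold boils down to a simple timestamp comparison. A minimal standalone sketch of that guard (the names `minGarbageAge` and `isOldEnoughForGc` are illustrative, not pub.dev APIs; the real code uses `clock.agoBy` from package:clock):

```dart
// Minimal sketch of the GC age guard implied by _minGarbageAge.
// Names here are illustrative, not the pub.dev implementation.
const minGarbageAge = Duration(hours: 3);

/// An object is eligible for garbage collection only if its `updated`
/// timestamp is older than `now - minGarbageAge`.
bool isOldEnoughForGc(DateTime updated, DateTime now) =>
    updated.isBefore(now.subtract(minGarbageAge));

void main() {
  final now = DateTime.utc(2024, 1, 10, 12);
  // 4 hours old: eligible for deletion.
  print(isOldEnoughForGc(DateTime.utc(2024, 1, 10, 8), now)); // true
  // 2 hours old: still protected against delete/create races.
  print(isOldEnoughForGc(DateTime.utc(2024, 1, 10, 10), now)); // false
}
```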
**Member (Author):** The change here simplified things a bit: just list all files, extract the package name from the object name, and if the package doesn't exist, we delete the file if it's older than

```diff
@@ -98,40 +98,37 @@ final class ExportedApi {
   Future<void> _gcPrefix(String prefix, Set<String> allPackageNames) async {
     _log.info('Garbage collecting "$prefix"');

-    await _listBucket(prefix: prefix + '/api/packages/', delimiter: '/',
-        (item) async {
-      final String packageName;
-      if (item.isObject) {
-        assert(!item.name.endsWith('/'));
-        packageName = item.name.split('/').last;
-      } else {
-        assert(item.name.endsWith('/'));
-        packageName = item.name.without(suffix: '/').split('/').last;
-      }
-      if (!allPackageNames.contains(packageName)) {
-        final info = await _bucket.info(item.name);
-        if (info.updated.isBefore(clock.agoBy(_minGarbageAge))) {
-          // Only delete the item if it's older than _minGarbageAge
-          // This avoids any races where we delete files we've just created
-          await package(packageName).delete();
-        }
-      }
-    });
+    final gcFilesBefore = clock.agoBy(_minGarbageAge);
+
+    final pkgPrefix = '$prefix/api/packages/';
+    await _listBucket(prefix: pkgPrefix, delimiter: '', (entry) async {
+      final item = switch (entry) {
+        final BucketDirectoryEntry _ => throw AssertionError('unreachable'),
+        final BucketObjectEntry item => item,
+      };
+      final packageName = item.name.without(prefix: pkgPrefix).split('/').first;
+      if (!allPackageNames.contains(packageName) &&
+          item.updated.isBefore(gcFilesBefore)) {
+        // Only delete the item if it's older than _minGarbageAge
+        // This avoids any races where we delete files we've just created
+        // TODO: Conditionally deletion API from package:gcloud would be better!
+        await _bucket.tryDelete(item.name);
+      }
+    });

-    await _listBucket(prefix: prefix + '/api/archives/', delimiter: '-',
-        (item) async {
-      if (item.isObject) {
-        throw AssertionError('Unknown package archive at ${item.name}');
-      }
-      assert(item.name.endsWith('-'));
-      final packageName = item.name.without(suffix: '-').split('/').last;
-      if (!allPackageNames.contains(packageName)) {
-        final info = await _bucket.info(item.name);
-        if (info.updated.isBefore(clock.agoBy(_minGarbageAge))) {
-          // Only delete the item if it's older than _minGarbageAge
-          // This avoids any races where we delete files we've just created
-          await package(packageName).delete();
-        }
-      }
-    });
+    final archivePrefix = '$prefix/api/archives/';
+    await _listBucket(prefix: archivePrefix, delimiter: '', (entry) async {
+      final item = switch (entry) {
+        final BucketDirectoryEntry _ => throw AssertionError('unreachable'),
+        final BucketObjectEntry item => item,
+      };
+      final packageName =
+          item.name.without(prefix: archivePrefix).split('-').first;
+      if (!allPackageNames.contains(packageName) &&
+          item.updated.isBefore(gcFilesBefore)) {
+        // Only delete the item if it's older than _minGarbageAge
+        // This avoids any races where we delete files we've just created
+        await _bucket.tryDelete(item.name);
+      }
+    });
   }
```
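Both listings derive the package name from the object name. A standalone sketch of that extraction, using plain `substring`/`split` in place of the repository's `without(prefix:)` string extension (the object-name shapes are assumed from the prefixes in the diff):

```dart
// Sketch of the name-extraction logic in the new GC code. Helper names are
// illustrative; object names are assumed to look like:
//   '<prefix>/api/packages/<package>/...'
//   '<prefix>/api/archives/<package>-<version>.tar.gz'
String packageFromPackagesEntry(String name, String pkgPrefix) =>
    name.substring(pkgPrefix.length).split('/').first;

String packageFromArchiveEntry(String name, String archivePrefix) =>
    name.substring(archivePrefix.length).split('-').first;

void main() {
  print(packageFromPackagesEntry(
      'latest/api/packages/retry/versions/3.1.2',
      'latest/api/packages/')); // retry
  print(packageFromArchiveEntry(
      'latest/api/archives/retry-3.1.2.tar.gz',
      'latest/api/archives/')); // retry
}
```

Splitting the archive name on `-` is safe because pub package names may contain only lowercase letters, digits, and underscores, never hyphens.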
|
|
```diff
@@ -452,14 +449,14 @@ const _validatedCustomHeader = 'validated';
 ///
 /// This allows us to detect files that are present, but not being updated
 /// anymore. We classify such files as _stray files_ and write alerts to logs.
-const _updateValidatedAfter = Duration(days: 1);
+const _updateValidatedAfter = Duration(days: 3);
```
|
**Member (Author):** The idea being that we don't want to update the metadata all the time.
```diff
 /// Duration after which a file that haven't been updated is considered stray!
 ///
 /// We don't delete stray files, because there shouldn't be any, so a stray file
 /// is always indicative of a bug. Nevertheless, we write alerts to logs, so
 /// that these inconsistencies can be detected.
-const _unvalidatedStrayFileAfter = Duration(days: 7);
+const _unvalidatedStrayFileAfter = Duration(days: 12);
```
|
**Member (Author):** Let's wait a LONG time before we consider something a stray file.

**Collaborator:** I'm wondering: will we look at the logs, or have alerts that fire within this period?

**Member (Author):** No, after 12 days we'll consider a file stray, and then we'll see it in logs.
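Taken together, the two constants imply a three-state lifecycle for an exported file's `validated` timestamp: fresh, due for a metadata refresh, or stray. A hypothetical illustration (the `classify` helper and state names are not pub.dev code):

```dart
// Hypothetical classification implied by the two constants: a file whose
// `validated` timestamp is older than _updateValidatedAfter should get its
// metadata refreshed; one older than _unvalidatedStrayFileAfter is stray
// and is reported in logs rather than deleted.
const updateValidatedAfter = Duration(days: 3);
const unvalidatedStrayFileAfter = Duration(days: 12);

String classify(DateTime validated, DateTime now) {
  final age = now.difference(validated);
  if (age > unvalidatedStrayFileAfter) return 'stray';
  if (age > updateValidatedAfter) return 'needs-validation-update';
  return 'fresh';
}

void main() {
  final validated = DateTime.utc(2024, 1, 1);
  print(classify(validated, DateTime.utc(2024, 1, 2))); // fresh
  print(classify(validated, DateTime.utc(2024, 1, 6))); // needs-validation-update
  print(classify(validated, DateTime.utc(2024, 1, 20))); // stray
}
```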
```diff
 /// Interface for an exported JSON file.
 ///
```
This shouldn't be high: if something is published, it'll be at least 3 hours before we can delete it. I think that's perhaps somewhat reasonable, but we don't want this to be too high. However, 3 hours should be more than enough for any eventual consistency issues between Datastore and GCS to have been resolved.