Merged
65 changes: 31 additions & 34 deletions app/lib/package/api_export/exported_api.dart
@@ -26,7 +26,7 @@ final _log = Logger('api_export:exported_bucket');
/// This ensures that we don't delete files we've just created.
/// It's entirely possible that one process is writing files, while another
/// process is running garbage collection.
-const _minGarbageAge = Duration(days: 1);
+const _minGarbageAge = Duration(hours: 3);
Member Author:
This shouldn't be high: if something is published, it'll be at least 3 hours before we can delete it.

I think that's perhaps somewhat reasonable, but we don't want this to be too high.

However, 3 hours should be more than enough for any eventual-consistency issues between Datastore and GCS to have been resolved.
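The guard described above can be sketched in Python (the actual code is Dart; `MIN_GARBAGE_AGE` and `is_garbage` are illustrative names, not part of the PR):

```python
from datetime import datetime, timedelta, timezone

MIN_GARBAGE_AGE = timedelta(hours=3)

def is_garbage(updated: datetime, package_exists: bool, now: datetime) -> bool:
    """An object is garbage only if its package is gone AND the object is
    older than MIN_GARBAGE_AGE, so files written moments ago survive a
    concurrently running garbage-collection pass."""
    return (not package_exists) and updated < now - MIN_GARBAGE_AGE
```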


/// Interface for [Bucket] containing exported API that is served directly from
/// Google Cloud Storage.
@@ -98,40 +98,37 @@ final class ExportedApi {
Future<void> _gcPrefix(String prefix, Set<String> allPackageNames) async {
Member Author:
The change here is simplified a bit: just list all files, extract the package name from the object name, and, if the package doesn't exist, delete the file if it's older than _minGarbageAge.

_log.info('Garbage collecting "$prefix"');

-    await _listBucket(prefix: prefix + '/api/packages/', delimiter: '/',
-        (item) async {
-      final String packageName;
-      if (item.isObject) {
-        assert(!item.name.endsWith('/'));
-        packageName = item.name.split('/').last;
-      } else {
-        assert(item.name.endsWith('/'));
-        packageName = item.name.without(suffix: '/').split('/').last;
-      }
-      if (!allPackageNames.contains(packageName)) {
-        final info = await _bucket.info(item.name);
-        if (info.updated.isBefore(clock.agoBy(_minGarbageAge))) {
-          // Only delete the item if it's older than _minGarbageAge
-          // This avoids any races where we delete files we've just created
-          await package(packageName).delete();
-        }
+    final gcFilesBefore = clock.agoBy(_minGarbageAge);
+
+    final pkgPrefix = '$prefix/api/packages/';
+    await _listBucket(prefix: pkgPrefix, delimiter: '', (entry) async {
+      final item = switch (entry) {
+        final BucketDirectoryEntry _ => throw AssertionError('unreachable'),
+        final BucketObjectEntry item => item,
+      };
+      final packageName = item.name.without(prefix: pkgPrefix).split('/').first;
+      if (!allPackageNames.contains(packageName) &&
+          item.updated.isBefore(gcFilesBefore)) {
+        // Only delete the item if it's older than _minGarbageAge
+        // This avoids any races where we delete files we've just created
+        // TODO: Conditionally deletion API from package:gcloud would be better!
+        await _bucket.tryDelete(item.name);
       }
     });

-    await _listBucket(prefix: prefix + '/api/archives/', delimiter: '-',
-        (item) async {
-      if (item.isObject) {
-        throw AssertionError('Unknown package archive at ${item.name}');
-      }
-      assert(item.name.endsWith('-'));
-      final packageName = item.name.without(suffix: '-').split('/').last;
-      if (!allPackageNames.contains(packageName)) {
-        final info = await _bucket.info(item.name);
-        if (info.updated.isBefore(clock.agoBy(_minGarbageAge))) {
-          // Only delete the item if it's older than _minGarbageAge
-          // This avoids any races where we delete files we've just created
-          await package(packageName).delete();
-        }
+    final archivePrefix = '$prefix/api/archives/';
+    await _listBucket(prefix: archivePrefix, delimiter: '', (entry) async {
+      final item = switch (entry) {
+        final BucketDirectoryEntry _ => throw AssertionError('unreachable'),
+        final BucketObjectEntry item => item,
+      };
+      final packageName =
+          item.name.without(prefix: archivePrefix).split('-').first;
+      if (!allPackageNames.contains(packageName) &&
+          item.updated.isBefore(gcFilesBefore)) {
+        // Only delete the item if it's older than _minGarbageAge
+        // This avoids any races where we delete files we've just created
+        await _bucket.tryDelete(item.name);
       }
     });
  }
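The new _gcPrefix flow above can be sketched in Python (the real code is Dart; `gc_candidates` and the example object names are illustrative, and the `<pkg>-<version>` archive-name shape is an assumption inferred from the `split('-').first` in the diff):

```python
from datetime import datetime, timedelta, timezone

MIN_GARBAGE_AGE = timedelta(hours=3)

def gc_candidates(objects, all_package_names, now):
    """objects: (name, updated) pairs listed under '<prefix>/api/packages/'
    or '<prefix>/api/archives/'. Yields the names safe to delete: package
    unknown AND object older than MIN_GARBAGE_AGE."""
    for name, updated in objects:
        if "/api/packages/" in name:
            # '<prefix>/api/packages/<pkg>/...': first segment after the prefix
            package = name.split("/api/packages/")[1].split("/")[0]
        else:
            # assumed archive naming: '<prefix>/api/archives/<pkg>-<version>...'
            package = name.split("/api/archives/")[1].split("-")[0]
        if package not in all_package_names and updated < now - MIN_GARBAGE_AGE:
            yield name
```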
@@ -452,14 +449,14 @@ const _validatedCustomHeader = 'validated';
///
/// This allows us to detect files that are present, but not being updated
/// anymore. We classify such files as _stray files_ and write alerts to logs.
-const _updateValidatedAfter = Duration(days: 1);
+const _updateValidatedAfter = Duration(days: 3);
Member Author:
The idea being that we don't want to update metadata all the time.
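A minimal Python sketch of that decision (the actual code is Dart; `should_revalidate` is an illustrative name, not the PR's API):

```python
from datetime import datetime, timedelta, timezone

UPDATE_VALIDATED_AFTER = timedelta(days=3)

def should_revalidate(validated_at: datetime, now: datetime) -> bool:
    """Rewrite the 'validated' custom header only when it is older than
    UPDATE_VALIDATED_AFTER, so object metadata isn't touched on every pass."""
    return now - validated_at > UPDATE_VALIDATED_AFTER
```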


/// Duration after which a file that hasn't been updated is considered stray!
///
/// We don't delete stray files, because there shouldn't be any, so a stray file
/// is always indicative of a bug. Nevertheless, we write alerts to logs, so
/// that these inconsistencies can be detected.
-const _unvalidatedStrayFileAfter = Duration(days: 7);
+const _unvalidatedStrayFileAfter = Duration(days: 12);
Member Author:
Let's wait a LONG time before we consider something a stray file

Collaborator:
I'm wondering: will we look at the logs or have alerts that fire within this period?

Member Author:
No, after 12 days we'll consider a file stray, and then we'll see it in logs.
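The interaction of the two thresholds discussed above can be sketched in Python (illustrative names; the real code is Dart, and stray files are only logged, never deleted):

```python
from datetime import datetime, timedelta, timezone

UPDATE_VALIDATED_AFTER = timedelta(days=3)
UNVALIDATED_STRAY_FILE_AFTER = timedelta(days=12)

def classify(validated_at: datetime, now: datetime) -> str:
    """Files get their 'validated' header refreshed after 3 days; a file
    still untouched after 12 days is classified as stray, which indicates
    a bug, so an alert is written to the logs (the file is not deleted)."""
    age = now - validated_at
    if age > UNVALIDATED_STRAY_FILE_AFTER:
        return "stray"        # log an alert; do not delete
    if age > UPDATE_VALIDATED_AFTER:
        return "revalidate"   # refresh the 'validated' header
    return "fresh"
```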


/// Interface for an exported JSON file.
///