Added a StorageIndex for the source storage to reduce LIST calls #547

Merged 4 commits on Sep 28, 2023

@blootsvoets blootsvoets commented Sep 14, 2023

Added a StorageIndex for the source storage to reduce the number of LIST calls. Addresses part of #543. The index is updated before every restructuring or cleaning operation. A full sync runs at a configurable interval; between full syncs, an update only lists directories that are already known to contain files. A separate, configurable interval controls how often empty directories are scanned as well. The first implementation is an InMemoryStorageIndex only. For very large datasets, a file-based index might be needed. During partial updates, it uses the start-after flag in S3 to list only files that come after the last one scanned.
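To make the three update modes concrete, here is a minimal sketch in Python of how such an index could decide what to list. It is not the code in this PR: the class name `InMemoryIndexSketch`, the `list_files` callback, the flat list of directories, and the default intervals (taken from the test setup below) are all assumptions for illustration. The callback stands in for an S3 `ListObjectsV2` request with its `StartAfter` parameter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical lister: returns the sorted keys in `directory` that sort strictly
# after `start_after` (None means "list everything"). One call = one LIST request.
Lister = Callable[[str, Optional[str]], List[str]]


@dataclass
class InMemoryIndexSketch:
    list_files: Lister
    full_sync_interval: float = 3600.0      # assumed: full scan once per hour
    empty_dir_sync_interval: float = 900.0  # assumed: empty dirs every 15 minutes
    files: Dict[str, List[str]] = field(default_factory=dict)
    last_full_sync: float = float("-inf")
    last_empty_sync: float = float("-inf")

    def update(self, directories: List[str], now: float) -> str:
        """Refresh the index and report which kind of update was performed."""
        if now - self.last_full_sync >= self.full_sync_interval:
            # Full sync: relist every directory from scratch.
            self.files = {d: self.list_files(d, None) for d in directories}
            self.last_full_sync = self.last_empty_sync = now
            return "full"

        scan_empty = now - self.last_empty_sync >= self.empty_dir_sync_interval
        for d in directories:
            known = self.files.setdefault(d, [])
            if not known and not scan_empty:
                continue  # skip directories without files: no LIST call at all
            # Partial update: only ask for keys after the last one already indexed.
            start_after = known[-1] if known else None
            known.extend(self.list_files(d, start_after))

        if scan_empty:
            self.last_empty_sync = now
            return "partial+empty"
        return "partial"
```

In this sketch a partial update only appends keys, so deletions and keys that sort before the last indexed one are picked up only by the next full sync.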

The change was tested in radar-k3s-test and gives the following results:
old behaviour: 128 list operations, every time
full scan (once per hour, configurable): 110 list operations
partial update (most frequent): 17 operations
partial update including empty directories (once per 15 minutes, configurable): 97 operations
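For reference, the relative savings implied by these counts can be worked out with a few lines of Python. This only restates the numbers listed above against the old 128 operations; the helper name `reduction_percent` is made up for this snippet.

```python
BASELINE = 128  # LIST operations per run with the old behaviour


def reduction_percent(operations: int, baseline: int = BASELINE) -> float:
    """Percentage of LIST operations saved relative to the baseline."""
    return round(100 * (baseline - operations) / baseline, 1)


measured = {
    "full scan": 110,
    "partial update": 17,
    "partial update including empty directories": 97,
}

for mode, operations in measured.items():
    saved = reduction_percent(operations)
    print(f"{mode}: {operations} operations, {saved}% fewer than {BASELINE}")
```

This gives roughly 14.1% fewer operations for a full scan, 86.7% fewer for the most frequent partial update, and 24.2% fewer when empty directories are included.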

@Bdegraaf1234 Bdegraaf1234 left a comment

I think it looks good, see comments about interfaces.

I would be interested to go through a bit of the code to get a feel for some of the design/decision making, maybe tomorrow?

@blootsvoets blootsvoets merged commit 05ec3b7 into dev Sep 28, 2023
2 checks passed
@blootsvoets blootsvoets deleted the addStorageIndex branch September 28, 2023 15:26