Determine depth level for directories #3
From the Physical Layout design doc:
If we move to 3 levels, we get 256^3 directories == 16,777,216, or ~16M -- a big jump from the 65,536 (~65K) directories at 2 levels. So the main discussion is whether 65K or 16M directories is better for organizing our content. How many files do we have now and expect in the future, and, assuming they are fairly evenly distributed across directories, how many files per directory would we expect?
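The 256^depth counts come from sharding an object's content hash two hexadecimal characters at a time, one segment per level. Below is a minimal sketch of that layout in Python; shard_path is an illustrative helper (not the actual hashstore API), and keeping the full digest as the file name is an assumption for demonstration:

```python
import hashlib

def shard_path(digest_hex: str, depth: int, width: int = 2) -> str:
    """Split a hex digest into `depth` directory segments of `width`
    hex characters each, keeping the full digest as the file name."""
    segments = [digest_hex[i * width:(i + 1) * width] for i in range(depth)]
    return "/".join(segments + [digest_hex])

digest = hashlib.sha256(b"urn:uuid:example-pid").hexdigest()
print(shard_path(digest, depth=2))  # 2 levels -> 256^2 = 65,536 possible dirs
print(shard_path(digest, depth=3))  # 3 levels -> 256^3 = 16,777,216 possible dirs
```

Each added level multiplies the number of possible directories by 16^2 = 256, which is where the 65K vs 16M comparison comes from.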
Referencing the ADC Report Y7Q2: we have approximately 7,000 datasets uploaded, consisting of ~1,000,000 objects from 2016-2022 and totaling ~75-80TB of data.
The type of dataset we are trying to better accommodate is one like the KNB Ofir Levy dataset, which is about 2TB in size and contains ~450K files.
Context: If we follow the existing growth rate of ~12TB and 1 million objects per year, over 10 years that would be 10 million objects and ~120TB of data. Assuming an even distribution of objects, that works out to roughly 153 objects per directory at 2 levels (10,000,000 / 65,536) and under 1 object per directory at 3 levels (10,000,000 / 16,777,216).
If we use an aggressive estimate of ~60TB and 5 million objects per year, over 10 years that would be 50 million objects and ~600TB of data, or roughly 763 objects per directory at 2 levels versus ~3 at 3 levels.
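The even-distribution arithmetic for both growth scenarios can be checked directly; a quick sketch using the estimates above:

```python
# Expected objects per directory under even distribution,
# for each growth scenario and sharding depth.
scenarios = {
    "current growth, 10 yrs": 10_000_000,
    "aggressive growth, 10 yrs": 50_000_000,
}
for label, n_objects in scenarios.items():
    for depth in (2, 3):
        n_dirs = 256 ** depth  # two hex chars per level -> 256 branches/level
        print(f"{label}, depth {depth}: {n_objects / n_dirs:,.1f} objects/dir")
```

Even in the aggressive scenario, 3 levels keeps directories at a handful of objects each, while 2 levels climbs into the hundreds.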
Findings & Rationale:
Recommended Directory Levels:
Next Steps:
After discussing with the team, we have agreed to proceed with a directory depth of 3 levels.
Discuss the implications of exponentially larger dataset submissions over time, and how they would affect our proposed hashstore solution at its current depth.
We may need to increase the depth to ensure the solution accounts for future submissions that exceed current estimates.
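One caveat worth noting when weighing a later depth increase, sketched here under the same illustrative sharding as above: because an object's path is derived from its digest, changing the depth changes every object's path, so raising the depth after launch would require migrating or re-linking the existing store.

```python
import hashlib

def shard_path(digest_hex: str, depth: int, width: int = 2) -> str:
    # Same illustrative sharding as the earlier sketch.
    segments = [digest_hex[i * width:(i + 1) * width] for i in range(depth)]
    return "/".join(segments + [digest_hex])

digest = hashlib.sha256(b"urn:uuid:example-pid").hexdigest()
print(shard_path(digest, depth=3))  # e.g. 'ab/cd/ef/<digest>'
print(shard_path(digest, depth=4))  # e.g. 'ab/cd/ef/gh/<digest>' -- a new path
```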