In #31 and #43, @e-belfer ran into an issue: if we download all the EQR data at once, it uses a ton of disk space (the full EQR dataset is 15.5 GB).
Apart from this being slow, the problem is that GitHub Actions runners only have 14 GB of disk space. We'd have to either manually partition the work across multiple runners or reduce disk usage by keeping only a small subset of the data on disk at any one time.
I think we can lazily load the files from the downloader instead:
1. Initialize a new deposition version.
2. For each resource we have, individually download, checksum, and upload it to the deposition, with a concurrency limit set at the dataset level.
3. Delete any files we didn't see in step 2 from the pending deposition.
4. Regenerate the datapackage.
5. Update settings, etc.
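Here's a rough, synchronous sketch of steps 1 through 4. It assumes the pending deposition can be modeled as a plain dict of filename to MD5 (pre-populated with files carried over from the previous version) and that each resource comes with a function that downloads it to a given path. `archive_lazily`, `md5_of_file`, and the dict-based deposition are hypothetical stand-ins, not the real depositor API; dataset-level concurrency and step 5 are left out.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through MD5 so it never has to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_lazily(
    deposition: dict[str, str],
    resources: Iterable[tuple[str, Callable[[Path], None]]],
) -> dict:
    """Hypothetical sketch of the proposed flow.

    `deposition` stands in for the new pending deposition version (step 1):
    a mapping of filename -> MD5. `resources` yields (name, download_fn)
    pairs, where download_fn writes one file to the given path.
    """
    seen: set[str] = set()
    with tempfile.TemporaryDirectory() as tmp:
        for name, download in resources:
            local = Path(tmp) / name
            download(local)                    # step 2: download one file...
            checksum = md5_of_file(local)      # ...checksum it...
            if deposition.get(name) != checksum:
                deposition[name] = checksum    # ...and "upload" it if it changed
            local.unlink()                     # only one file on disk at a time
            seen.add(name)
    for stale in set(deposition) - seen:       # step 3: drop files we didn't see
        del deposition[stale]
    # Step 4: regenerate the datapackage from what's actually in the deposition.
    return {
        "resources": [
            {"path": name, "hash": deposition[name]} for name in sorted(deposition)
        ]
    }
```

Skipping the upload when the checksum is unchanged is my assumption about how step 2 avoids redundant uploads; the property that matters for disk usage is that each file is deleted before the next one is downloaded.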
Also, only 3 users globally can download EQR data at a time.
Scope:
- we only run one EQR download at a time
- we only keep one EQR dataset on disk at a time
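A minimal asyncio sketch of these two scope constraints, assuming "one EQR dataset on disk" means one downloaded file at a time, with downloads and uploads simulated by local writes and sleeps. `archive_eqr`, `archive_one`, and `DiskStats` are hypothetical names for illustration only. A single-slot `asyncio.Semaphore` enforces both constraints, because each task holds the slot until its file has been uploaded and deleted.

```python
import asyncio
import os
import tempfile
from pathlib import Path


class DiskStats:
    """Instrumentation for the sketch: peak concurrent downloads and files on disk."""

    def __init__(self) -> None:
        self.active = 0
        self.peak_active = 0
        self.peak_files = 0


async def archive_one(name: str, data: bytes, workdir: Path,
                      slots: asyncio.Semaphore, stats: DiskStats,
                      uploaded: list[str]) -> None:
    # Hold the slot for the whole download -> upload -> delete cycle, so the
    # next download can't start until this file is gone from disk.
    async with slots:
        stats.active += 1
        stats.peak_active = max(stats.peak_active, stats.active)
        path = workdir / name
        try:
            path.write_bytes(data)        # simulated download
            stats.peak_files = max(stats.peak_files, len(os.listdir(workdir)))
            await asyncio.sleep(0.01)     # simulated upload
            uploaded.append(name)
        finally:
            path.unlink(missing_ok=True)  # free the disk before releasing the slot
            stats.active -= 1


async def archive_eqr(resources: dict[str, bytes], max_concurrent: int = 1):
    # Created inside the running loop; one slot means one EQR download at a time.
    slots = asyncio.Semaphore(max_concurrent)
    stats = DiskStats()
    uploaded: list[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        await asyncio.gather(*(
            archive_one(name, data, Path(tmp), slots, stats, uploaded)
            for name, data in resources.items()
        ))
    return uploaded, stats
```

Even though every task is scheduled up front by `gather`, only one holds the slot at a time. Raising `max_concurrent` trades disk space for throughput.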
Next steps:
- allow datasets to override the aiohttp client connection pool size
- add a method to AbstractDatasetArchiver that yields (name, resource) tuples from a generator instead of returning a dict mapping names to resources
  - hopefully the rate at which this generator is consumed is effectively limited by the session concurrency limits above
- in DepositorOrchestrator, diff files by MD5 checksum one at a time instead of all at once
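A rough sketch of the generator and one-at-a-time diff steps. `ToyArchiver` is a hypothetical stand-in for an AbstractDatasetArchiver subclass, `iter_resources` is a hypothetical name for the new generator method, and the remote checksums are assumed to be available as a filename-to-MD5 dict. Because the consumer pulls one (name, resource) pair at a time and diffs it before asking for the next, nothing is fetched ahead of the diff. For the pool-size knob, aiohttp's own mechanism is passing `aiohttp.TCPConnector(limit=N)` to `aiohttp.ClientSession(connector=...)`; how each dataset would set N is an assumption here.

```python
import asyncio
import hashlib
from typing import AsyncIterator


class ToyArchiver:
    """Hypothetical stand-in for an AbstractDatasetArchiver subclass."""

    def __init__(self, files: dict[str, bytes], log: list[str]) -> None:
        self.files = files
        self.log = log

    async def iter_resources(self) -> AsyncIterator[tuple[str, bytes]]:
        # Hypothetical generator method: each file is "downloaded" only when
        # the consumer asks for the next item, rather than all up front.
        for name, data in self.files.items():
            self.log.append(f"download {name}")
            await asyncio.sleep(0)  # stands in for network I/O
            yield name, data


async def diff_one_by_one(archiver: ToyArchiver, remote_md5: dict[str, str],
                          log: list[str]) -> dict[str, list[str]]:
    """Compare each local resource to the remote MD5 as soon as it arrives."""
    changes: dict[str, list[str]] = {"create": [], "update": [], "unchanged": []}
    seen: set[str] = set()
    async for name, data in archiver.iter_resources():
        local_md5 = hashlib.md5(data).hexdigest()
        log.append(f"diff {name}")
        if name not in remote_md5:
            changes["create"].append(name)
        elif remote_md5[name] != local_md5:
            changes["update"].append(name)
        else:
            changes["unchanged"].append(name)
        seen.add(name)
    # Anything remote that the archiver never yielded should be deleted.
    changes["delete"] = sorted(set(remote_md5) - seen)
    return changes
```

In the real archiver the resource would be a file on disk streamed through MD5 rather than in-memory bytes. A plain `async for` never runs ahead of the consumer, so any extra concurrency would have to come from a bounded prefetch or the session's connection limit.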