Discuss how EFS might be used for inputs/outputs folders
The inputs/outputs folders differ from the workspace folder:
- they are registered in the osparc DB (tables projects, comp_tasks, file_meta_data),
- when an output port has more than one file, the files are packed into a zip file that is uploaded to S3 every time, since there is no way to compare the zip file in S3 with the local one,
- when pulling inputs, the same happens in reverse (the zip file is downloaded and extracted).
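A minimal Python sketch of the zip round trip described above, for illustration only; `pack_output_port` and `unpack_input_port` are hypothetical names, not the simcore-sdk API. It also hints at why comparing archives is hard: `zipfile` records each file's modification time inside the archive, and extracted files get fresh timestamps, so zipping the same content again after a download usually produces different bytes.

```python
from __future__ import annotations

import tempfile
import zipfile
from pathlib import Path


def pack_output_port(files: list[Path], archive: Path) -> Path:
    """Bundle all files of one output port into a single zip (hypothetical helper).

    ZipFile.write stores each file's modification time in the archive,
    so the archive bytes depend on timestamps, not only on content.
    """
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)
    return archive


def unpack_input_port(archive: Path, dest: Path) -> list[Path]:
    """Reverse step when pulling an input: extract the zip into `dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return sorted(dest / name for name in zf.namelist())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("alpha")
        (root / "b.txt").write_text("beta")
        zipped = pack_output_port([root / "a.txt", root / "b.txt"], root / "output_1.zip")
        print(unpack_input_port(zipped, root / "inputs"))
```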
--> if we move away from zipping, we need:
- a way to keep these inputs/outputs in the projects table, both for the reproducibility of projects and for copying them (this might need a discussion with @pcrespov; I am not sure whether the JSON definition of inputs/outputs could help there) --> changes in how we handle the projects and comp_tasks tables --> simcore-sdk package, webserver, director-v2, dask-sidecar, and possibly api-server
- to handle the file_meta_data table, which could perhaps be removed entirely --> refactoring of storage and dropping the whole table, but listing from S3 is expected to be slower than querying the DB (unless accessing the S3 metadata directly is fast, which I honestly doubt)
- changes in the simcore-sdk package to stop zipping/unzipping while staying backward compatible (at least when a zip is found)
- changes in the dask-sidecar service to stop zipping/unzipping while staying backward compatible
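The backward-compatibility requirement for simcore-sdk and dask-sidecar could look roughly like this minimal sketch, which assumes the port data has already been fetched to a local path (S3 access is left out). `pull_port` is a hypothetical name: a legacy zip is extracted, a plain folder is copied as-is, and a single file is copied next to the other inputs.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def pull_port(source: Path, dest: Path) -> Path:
    """Materialize one input port into `dest` (hypothetical helper).

    Legacy ports are stored as one zip archive and are extracted;
    new-style ports are assumed to be plain folders and are copied as-is.
    """
    dest.mkdir(parents=True, exist_ok=True)
    if source.is_dir():
        # new layout: files stored individually, no zipping involved
        shutil.copytree(source, dest, dirs_exist_ok=True)
    elif zipfile.is_zipfile(source):
        # backward compatibility: an old zipped port is unpacked
        with zipfile.ZipFile(source) as zf:
            zf.extractall(dest)
    else:
        # a single plain file is copied next to the other inputs
        shutil.copy2(source, dest / source.name)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        legacy = root / "input_1.zip"
        with zipfile.ZipFile(legacy, "w") as zf:
            zf.writestr("data.csv", "1,2,3\n")
        print(sorted(p.name for p in pull_port(legacy, root / "inputs").iterdir()))
```

Checking the folder case first keeps the new layout as the default path, and the zip branch can be dropped once no legacy archives remain.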
On the other hand, we could also avoid listing what is inside the folders and register only the base folder name.
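Registering only the base folder name amounts to collapsing the individual object keys to their port folder. A minimal sketch, assuming a hypothetical key layout `<project_id>/<node_id>/<port_folder>/<file path>`; the real storage layout may differ, and `base_folders` is an illustrative name.

```python
from __future__ import annotations


def base_folders(object_keys: list[str], depth: int = 3) -> list[str]:
    """Collapse object keys to their first `depth` path components.

    Keys with no more than `depth` components are skipped, since they
    cannot be files inside a port folder under the assumed layout.
    """
    folders = {
        "/".join(key.split("/")[:depth])
        for key in object_keys
        if key.count("/") >= depth
    }
    return sorted(folders)


if __name__ == "__main__":
    keys = [
        "project_a/node_1/output_1/a.txt",
        "project_a/node_1/output_1/sub/b.txt",
        "project_a/node_1/output_2/c.txt",
    ]
    print(base_folders(keys))  # ['project_a/node_1/output_1', 'project_a/node_1/output_2']
```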
Nevertheless, we currently rely on output hashes to decide whether a computational service needs to re-run, so this would have to change as well: if a file inside the folder changes, we need to mark it somehow. Today this happens automatically, without accessing S3, directly via the projects/comp_tasks tables.
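One possible way to keep change detection for unzipped folders is a content hash over the whole folder, as in this minimal sketch. `folder_hash` and `needs_recompute` are hypothetical names, not existing osparc functions; to keep avoiding reads from S3, such a hash would presumably have to be computed where the files are produced and stored alongside the port in the DB.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def folder_hash(folder: Path) -> str:
    """SHA-256 over relative paths and file contents, in sorted path order.

    Modification times are ignored, so re-downloading identical files
    does not change the result.
    """
    digest = hashlib.sha256()
    files = [p for p in folder.rglob("*") if p.is_file()]
    for path in sorted(files, key=lambda p: p.relative_to(folder).as_posix()):
        data = path.read_bytes()
        digest.update(path.relative_to(folder).as_posix().encode())
        digest.update(b"\0")  # paths cannot contain NUL, so this ends the name
        digest.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguity
        digest.update(data)
    return digest.hexdigest()


def needs_recompute(stored_hash: str | None, folder: Path) -> bool:
    """True if no hash was stored yet or the folder content changed."""
    return stored_hash != folder_hash(folder)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "result.dat").write_bytes(b"42")
        stored = folder_hash(out)
        print(needs_recompute(stored, out))  # False
        (out / "result.dat").write_bytes(b"43")
        print(needs_recompute(stored, out))  # True
```

Sorting by the relative POSIX path makes the hash independent of filesystem iteration order and of the operating system's path separator.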