[question] Best approach to dealing with institution-shared datasets on a crippled file system #7547

Open
mattcieslak opened this issue Jan 11, 2024 · 0 comments

@mattcieslak
Contributor

Hi! I was hoping to see if anyone in the community has any ideas on a problem we ran into recently.

Background

An HPC has a giant crippled file system with many large open datasets stored on it. These are free to access and it's very inexpensive to store data on this file system too. We'll call this system A.

There is a second file system available that is not crippled but is more expensive to store data on. This one is system B.

Both A and B are network-mounted and available to the entire HPC.

Goal

We'd like to use DataLad/BABS to process these large datasets without copying their content from A to B. We would do the processing on B and would like to ultimately store the results as a RIA store on A.

Current plan

Our current idea is to create a DataLad dataset on file system B, similar to how the HCP openaccess dataset was created. We would use addurls, with the URLs being paths to files on system A. Then we'd use BABS/fairlybig to process the data on system B. After all the results branches are merged, we'd move the results RIA store to system A.
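
As a rough sketch of the addurls step, the table of URLs could be generated by walking the source directory on system A. This is only an illustration under assumptions: the function name `write_addurls_table`, the column names `url` and `filename`, and any paths are hypothetical and not part of DataLad or BABS.

```python
import csv
from pathlib import Path


def write_addurls_table(source_root, table_path):
    """Write a CSV with one row per file found under source_root.

    Columns (names are arbitrary choices for this sketch):
      url      - file:// URL pointing at the file on system A
      filename - path relative to source_root, to be used as the
                 file path inside the dataset on system B
    Returns the number of data rows written.
    """
    source_root = Path(source_root).resolve()
    rows = 0
    with open(table_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url", "filename"])
        # Sort so the table is stable across runs.
        for path in sorted(source_root.rglob("*")):
            if not path.is_file():
                continue  # skip directories
            relative = path.relative_to(source_root).as_posix()
            writer.writerow([path.as_uri(), relative])
            rows += 1
    return rows
```

The resulting table could then be passed to something like `datalad addurls table.csv '{url}' '{filename}'` from inside the dataset on system B. Depending on the git-annex configuration, `file://` URLs may need to be allowed explicitly (for example through `annex.security.allowed-url-schemes`), so this is worth trying on a small subset first.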

Question

Does this sound reasonable?

In this setup we'd have data stored only on system A with shasums and remote locations stored in the datalad dataset on B. Is there a way to regularly check that the data on A hasn't been changed?
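
To make the "has anything changed" check concrete, here is a minimal standalone sketch. It does not use DataLad or git-annex: it assumes the recorded checksums are SHA-256 digests available as a plain mapping from file path to hex digest, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_files(expected):
    """Compare files on disk against recorded checksums.

    expected: dict mapping file path (str) -> recorded SHA-256 hex digest.
    Returns a dict mapping each problematic path to a reason,
    either "missing" or "checksum mismatch".
    """
    problems = {}
    for path, recorded in expected.items():
        if not Path(path).is_file():
            problems[str(path)] = "missing"
        elif sha256_of(path) != recorded:
            problems[str(path)] = "checksum mismatch"
    return problems
```

Something like this could run as a periodic job on the HPC. git-annex also provides `git annex fsck`, which may cover this more directly, though whether it behaves well with URL-registered content on system A would need testing.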

Has anyone else attempted to use BABS/fairlybig on a crippled file system?

Thanks in advance! Tagging @shreyagudapati9
