Use Case: Citing very large volume datasets that are too large for current repositories #17

hsu000001 opened this Issue Jan 23, 2015 · 1 comment



hsu000001 commented Jan 23, 2015

Use Case Title: Citation of very large volume datasets that are too large for current repositories

  • Contributors: Leslie Hsu @hsu000001
  • Label: citing data

Goals and Summary

An investigator runs experiments whose main raw data type is high-resolution images and videos. The raw data amount to about 4 TB per experiment, and even the processed data needed to reproduce the results are about 1 TB. Current data repositories usually do not offer this much storage, so it is very hard to obtain a citable DOI for such a large dataset. Dataset DOIs are usually not assigned unless the "trusted" allocating agent has possession of the data resource (so that the DOI will not point to a resource that moves or changes).

Why is it important and to whom?

  • Important to investigators who produce such large datasets and are required by funding agencies to make their data available to the public.
  • Important to data repositories that serve communities producing such large, long-tail datasets.

Why hasn’t it been solved yet?

  • Storage of large datasets is expensive.

Actionable Outcomes

If guidelines, best practices, or another solution emerge during the workshop, they will be disseminated through the EarthCube Research Coordination Network SEN (Sediment Experimentalist Network) and shared with the several investigators who have asked me about this issue for large image/video datasets.

Additional Information and Links



jklump commented Jan 28, 2015

Also, it might be impracticable to move a very large data set.
