Use Case: Citing very large volume datasets that are too large for current repositories #17
Use Case Title: Citation of very large volume datasets that are too large for current repositories
Goals and Summary
An investigator runs experiments in which the main raw data are high-resolution images and videos. Raw data amount to about 4 TB per experiment, and even the processed dataset needed to reproduce the results is still about 1 TB. Current data repositories usually do not offer this much storage, so it is very hard to obtain a citable DOI for a dataset this large. Dataset DOIs are usually not assigned unless the "trusted" allocating agent has possession of the data resource (so that the DOI will not point to a resource that moves or changes).
Why is it important and to whom?
Why hasn’t it been solved yet?
If guidelines, best practices, or some other solution emerges during the workshop, it will be disseminated through the EarthCube Research Coordination Network SEN (Sediment Experimentalist Network) and shared with the several investigators who have asked me about this issue for large image/video datasets.
Additional Information and Links