GREI 5: HDV Task - Large Data Support #176

Open
4 of 12 tasks
cmbz opened this issue Feb 3, 2024 · 5 comments
Labels: Dataverse Project, GREI 5 Use Cases, Harvard Dataverse, Project: NIH GREI

Comments

cmbz commented Feb 3, 2024

Overview

"Support the sharing of very large datasets (>TBs) by integrating the metadata in the repository with the data in the research computing storage" (Source: NIH OTA)

Tasks

Issues

Pending

Completed

Resources

cmbz added the GREI 5 Use Cases label Feb 3, 2024
cmbz self-assigned this Feb 3, 2024
cmbz mentioned this issue Feb 3, 2024
8 tasks

cmbz commented Feb 3, 2024

Status: January 2024

Several Globus improvements and bug fixes were made to support large data deposits and integration with research computing services such as the Northeast Storage Exchange (NESE).

Completed

cmbz commented Mar 1, 2024

Status: February 2024

A containerized Dataverse instance was deployed on Mass Open Cloud (MOC), using Northeast Storage Exchange (NESE) compute and resources, to demonstrate how Dataverse can support computing on large Dataverse datasets stored on NESE tape resources. A demo of the proof of concept was presented at the Mass Open Cloud Alliance Conference on 2024/02/28. Work has begun to fully operationalize the strategy in Epic: Operationalize Large Data and Compute Infrastructure.

Completed

cmbz commented Mar 28, 2024

Status: April 2024

Large Data Support Working Group

Large Data Support Pilot

cmbz added the Harvard Dataverse, Dataverse Project, and Project: NIH GREI labels May 7, 2024

cmbz commented May 7, 2024

Status: May 2024

  • Followed up on Geos-Chem to inquire about the PDF files in this collection.
  • Successfully tested the Globus connection and the NESE download of one file from OMAMA.
  • @landreev will create a demo large-data dataset so we can test downloads by non-Harvard affiliates and use the results to write instructions for data download.
  • See other updates and details here: Project: Pilot Large Data Support Service #178 (comment)
  • Added two projects to the new "under consideration for support" section in the Pilot Large Data issue.

cmbz commented May 7, 2024

Status: June 2024

  • TBD
