Data Locality Module (DLM) support (an alternative to LOCKSS?) #3403

Closed
pameyer opened this issue Oct 7, 2016 · 8 comments
Labels
Feature: File Upload & Handling User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh Vote to Close: pdurbin

Comments

@pameyer
Contributor

pameyer commented Oct 7, 2016

This is the overall issue for supporting a Data Locality Module (DLM) within Dataverse. Why would a user want to install and configure a DLM? To replicate datasets to remote storage sites, for preservation, to facilitate remote access, to facilitate local access at the remote sites, and to keep data close to compute resources.

Similar to the DCM, this will be another separate component coupled to the Dataverse application, communicating via HTTP API calls and sharing a filesystem. This will most likely need more changes on the Dataverse end, since Dataverse's "model of the world" will need to be expanded (to include "remote sites" and "storage locations for datasets"), and there will be more administrator- and user-level interactions.
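To make the expanded "model of the world" a bit more concrete, here is a minimal sketch of how "remote sites" and "storage locations for datasets" might be represented. This is purely illustrative: the class names `RemoteSite` and `DatasetLocations`, their fields, and the "at least two sites" rule are assumptions made for this example, not part of Dataverse or of any DLM design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RemoteSite:
    """A storage site that can hold dataset replicas (hypothetical model)."""
    name: str
    country: str


@dataclass
class DatasetLocations:
    """Tracks which sites hold a copy of one dataset (hypothetical model)."""
    dataset_id: str
    sites: list = field(default_factory=list)

    def add_replica(self, site):
        # Record each site only once, even if replication is reported twice.
        if site not in self.sites:
            self.sites.append(site)

    def is_replicated(self):
        # Assumption: "replicated" means at least two distinct sites hold the data.
        return len(self.sites) >= 2


# Illustrative site names only; these are not real installations.
us = RemoteSite("site-us", "US")
se = RemoteSite("site-se", "Sweden")

locations = DatasetLocations("dataset-1")
locations.add_replica(us)
locations.add_replica(se)
locations.add_replica(us)  # duplicate report, ignored
```

Whatever shape the real model takes, the point is that a dataset would carry a list of storage locations rather than a single implicit one.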

We should be getting this more specified next week.

@pdurbin pdurbin added the SBGrid label Oct 7, 2016
@pdurbin pdurbin changed the title from "DLM support" to "Data Locality Module (DLM) support (an alternative to LOCKSS?)" Oct 23, 2016
@pdurbin
Member

pdurbin commented Oct 30, 2016

On Friday @joehand presented Dat ( http://dat-data.com ) to me, @djbrooke @scolapasta @landreev @sekmiller @kcondon and @bsilverstein . Joe knows all about LOCKSS and explained the vision for using Dat to replicate data to various data centers. I showed Joe https://data.sbgrid.org/dataset/1/ (the system we are migrating to Dataverse) and how there are rsync URLs to sites in the US, Sweden, Uruguay, and China:

[Image: screenshot "sbdg_1_-_2016-10-30_08 16 21"]

Joe's reaction was that you could replace all those rsync links with a single Dat link and have the data in multiple data centers. While this is interesting, it's not quite what we have in mind. The technology we plan to use to replicate data from one site to others is Globus/GridFTP. (I still owe @pameyer feedback on his DLM write up at https://docs.google.com/document/d/1VCblZjSnC71MuX78GBKDuPk0HyweswNkJxbzinMq2Kw/edit?ts=57f5026a ). The next logical step in my mind is to open source the DLM code so developers like Joe can see how it works.

Note that #3249 is the issue tracking what the end user sees on a dataset page in terms of how to download the files (such as rsync). There's a big difference between pushing data around between data centers and the download mechanisms available to end users for ultimately downloading the data from their data center of choice (probably the one geographically closest to them). Anyway, great presentation by Joe. Dat is a very interesting and promising new technology.

@eugene-barsky

Maybe this would be of interest to @axfelix over at Compute Canada

@axfelix

axfelix commented Nov 7, 2016

Yup, we've done some work with Globus, would be nice if Dataverse somehow implemented the Globus APIs in a way that facilitated easy transfer of data from a Dataverse instance to a Globus Endpoint.

@eugene-barsky

Similar to work Dataverse did with Open Science Framework...

E.


@pameyer
Contributor Author

pameyer commented Nov 7, 2016

@axfelix - Globus is the first protocol we're working on. This is more 'move dataset from a Globus Endpoint in the same installation as Dataverse to another Endpoint' than 'Dataverse with no endpoint into Globus'.
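As a rough illustration of the "endpoint in the same installation as Dataverse to another Endpoint" direction, the sketch below builds a plain list of transfer descriptions, one per remote endpoint. It does not call Globus or any real API: the function name `plan_replication`, the dictionary keys, and the endpoint names are all hypothetical and exist only for this example.

```python
def plan_replication(dataset_path, local_endpoint, remote_endpoints):
    """Return one transfer description per remote endpoint (hypothetical)."""
    plan = []
    for remote in remote_endpoints:
        # The local endpoint already holds the data, so never copy to itself.
        if remote == local_endpoint:
            continue
        plan.append({
            "source_endpoint": local_endpoint,
            "destination_endpoint": remote,
            "path": dataset_path,
        })
    return plan


# Placeholder endpoint names and path, for illustration only.
plan = plan_replication(
    "/data/dataset-1",
    "local-endpoint",
    ["remote-a", "remote-b", "local-endpoint"],
)
```

Every entry has the local endpoint as its source, which matches the direction described above: data moves outward from the endpoint that sits next to Dataverse.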

I'm not fully up to speed on the Open Science Framework, but I imagine @pdurbin could bring me up to speed if necessary.

@pdurbin pdurbin added the User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh label Jul 4, 2017
@pdurbin
Member

pdurbin commented Jan 11, 2018

Related: #4396

@pdurbin
Member

pdurbin commented Oct 24, 2018

@pameyer I think visuals help so I'm attaching slide 7 from the slides you used during the Biomedical Dataverse: Structural Biology and Beyond talk over the summer during the 2018 Dataverse Community Meeting:

[Image: screenshot of slide 7, "screen shot 2018-10-24 at 12 20 02 pm"]

@pdurbin
Member

pdurbin commented Sep 30, 2022


5 participants