Support for Globus transfer? #71

Closed
dlebauer opened this Issue Jan 19, 2018 · 5 comments

@dlebauer

dlebauer commented Jan 19, 2018

It would be great to have support for Globus transfer within Parsl. We are interested in, for example, transferring tens to thousands of GB from a server at NPCF to an XSEDE resource such as Comet or Bridges for (re)processing.

We aren't currently using Parsl, but the abstraction of queue managers is an attractive feature. Globus transfer would be another particularly useful feature, and one we will need to implement somehow. The docs say Globus support will appear "in the near future" and I see it listed in the roadmap; however, I can't find an open issue, so I am opening one.

My questions are:

  1. What is the status of this feature?
  2. How might it be implemented?
@kylechard

Collaborator

kylechard commented Jan 19, 2018

We have started working on Globus integration but don't yet have a timeline for when it will be released.

The basic model is built around a Globus file abstraction that will allow scripts to be written with references to remote (Globus-accessible) data. Parsl will handle staging of that data to the endpoint/worker node based on the site configuration. As part of this work we're also adding Globus Auth support so that users can authenticate from within the Parsl script.
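
Roughly, a script might look like the sketch below. The globus:// URL form, endpoint UUIDs, and app interface shown here are illustrative assumptions rather than the final API:

```python
# A sketch only: the globus:// URL form, endpoint UUIDs, and staging behaviour
# shown here are assumptions about the planned feature, not a released API.
import parsl
from parsl.app.app import python_app
from parsl.configs.local_threads import config
from parsl.data_provider.files import File

parsl.load(config)  # a real run would use a Globus-enabled site configuration

# Reference data sitting on a remote Globus endpoint (placeholder UUID and path).
remote_scan = File('globus://<source-endpoint-uuid>/raw/scan_001.tif')

@python_app
def summarize(inputs=[], outputs=[]):
    # Parsl would stage inputs[0] to the worker before this body runs,
    # and ship outputs[0] back out afterwards.
    with open(inputs[0].filepath, 'rb') as f:
        data = f.read()
    with open(outputs[0].filepath, 'w') as out:
        out.write(str(len(data)))

summary = File('globus://<dest-endpoint-uuid>/results/scan_001.txt')
summarize(inputs=[remote_scan], outputs=[summary]).result()
```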

Do you have any thoughts or requirements for how you'd like Globus to be integrated? We could also put together a proof of concept by using Parsl + the Globus SDK to get a better idea of how this could/should work.
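
For reference, a standalone proof of concept with the Globus SDK alone might look roughly like this (the client ID, endpoint UUIDs, and paths are placeholders to be replaced with real values):

```python
# Proof-of-concept sketch using the Globus SDK directly; client ID, endpoint
# UUIDs, and paths are placeholders.
import globus_sdk

CLIENT_ID = '<native-app-client-id>'
SRC_ENDPOINT = '<npcf-endpoint-uuid>'
DST_ENDPOINT = '<comet-endpoint-uuid>'

# Native-app login flow (Globus Auth) to obtain transfer tokens.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(refresh_tokens=True)
print('Please log in at:', auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input('Auth code: '))
transfer_tokens = tokens.by_resource_server['transfer.api.globus.org']

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.RefreshTokenAuthorizer(
        transfer_tokens['refresh_token'], auth_client,
        access_token=transfer_tokens['access_token'],
        expires_at=transfer_tokens['expires_at_seconds']))

# Describe the transfer, submit it, and block until it completes.
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                label='parsl-poc', sync_level='checksum')
tdata.add_item('/data/raw/', '/scratch/incoming/', recursive=True)
task_id = tc.submit_transfer(tdata)['task_id']
tc.task_wait(task_id, timeout=3600, polling_interval=30)
```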

@dlebauer

dlebauer commented Jan 19, 2018

We have a meeting Feb 2 with your group that @danielskatz set up. We can discuss then, but I wanted to get the discussion started so that we can make the most of that time. For requirements, I'll defer to the expert, @max-zilla, but here is a start:

  • we want to be able to transfer data, run scripts, then return output data via Globus. There is also a lot of communication with APIs.
  • between July and November we would like to process about a PB of data this way.
  • we also want users to be able to execute and develop pipelines on XSEDE and other computers using our datasets.
  • the file abstraction sounds nice. From the user perspective, is it similar to how THREDDS works, where I can just replace a file path with a URL?

@yadudoc yadudoc added this to the Parsl-0.5.0 milestone Feb 2, 2018

@yadudoc yadudoc self-assigned this Feb 2, 2018

yadudoc added a commit that referenced this issue Mar 24, 2018

@yadudoc

Contributor

yadudoc commented Mar 27, 2018

@dlebauer, we have Globus support going into the upcoming release. Please note that this feature is still in an experimental state, and we'll be pushing refinements to the File model quickly in minor releases.
If your model is to transport data in -> compute workflows -> transport data out, the current explicit staging model should be sufficient.
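
As a rough sketch of that data-in -> compute -> data-out pattern (the stage_in()/stage_out() names below are illustrative assumptions, so please check the 0.5.0 docs for the released interface):

```python
# Illustration only: stage_in()/stage_out() are assumed names for the explicit
# staging calls and may not match the released Parsl 0.5.0 interface.
import parsl
from parsl.app.app import python_app
from parsl.configs.local_threads import config
from parsl.data_provider.files import File

parsl.load(config)  # a Globus-enabled site configuration is assumed

raw = File('globus://<source-endpoint-uuid>/project/raw.h5')
processed = File('globus://<dest-endpoint-uuid>/project/processed.h5')

staged_raw = raw.stage_in()      # 1. transport data in via Globus

@python_app
def reprocess(inputs=[], outputs=[]):
    # 2. compute workflow over the staged local copy
    with open(inputs[0].filepath, 'rb') as f, open(outputs[0].filepath, 'wb') as out:
        out.write(f.read())

reprocess(inputs=[staged_raw], outputs=[processed]).result()

processed.stage_out()            # 3. transport results back out via Globus
```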

@yadudoc

Contributor

yadudoc commented Mar 30, 2018

Explicit staging support is ready for v0.5.0, but implicit staging is deferred to 0.6.0.

@yadudoc yadudoc modified the milestones: Parsl-0.5.0, Parsl-0.6.0 Mar 30, 2018

@yadudoc yadudoc assigned lukaszlacinski and unassigned yadudoc Apr 23, 2018

@yadudoc

Contributor

yadudoc commented Jul 11, 2018

Explicit staging is now supported in master along with Globus support.

@yadudoc yadudoc closed this Jul 11, 2018
