Data Management, representation and implementation #281

Closed
yadudoc opened this Issue May 16, 2018 · 5 comments

yadudoc commented May 16, 2018

We are moving away from a config definition that the DFK uses to initialize resources, toward a model in which the user passes already-initialized resources to a Config class as parameters. That discussion is in #133. Each executor in this model represents an elastic pool of compute resources. How data assets and transfer methods should map to this model is unclear, because only preliminary work has been done on representing data management in our current static, config-based system.

Some terminology

  1. We have the concept of a DataManager (DM), which is an executor private to the DFK. The DM is responsible only for moving files.
  2. An endpoint (EP) is a block of attributes that defines the transfer methods and details such as file paths, ports, and auth.
  3. An executor represents a collection of resources gathered from a computation site. A single computation site such as NERSC could have several executors, each handling specific queues, node requirements, etc.
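
To make the terminology concrete, here is a minimal sketch of how an EP and an executor could relate. Everything in it is an assumption for illustration: the `Endpoint` and `Executor` classes, their field names, and the example path and labels are hypothetical and are not the classes Parsl ships.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Endpoint:
    """Hypothetical EP: attributes describing how to reach a storage location."""
    scheme: str                      # transfer method, e.g. "globus" or "file"
    path: str                        # base file path at the site (placeholder value below)
    host: Optional[str] = None
    port: Optional[int] = None
    auth: Dict[str, str] = field(default_factory=dict)


@dataclass
class Executor:
    """Hypothetical executor: a pool of resources tied to one endpoint."""
    label: str
    endpoint: Endpoint


# One computation site, several executors (e.g. one per queue), one shared EP.
nersc_ep = Endpoint(scheme="globus", path="/scratch/user")
executors = [
    Executor(label="nersc_debug", endpoint=nersc_ep),
    Executor(label="nersc_regular", endpoint=nersc_ep),
]
```

In this sketch the EP is attached per executor, which is the "one EP per executor" reading; whether that is the right mapping is exactly the open question in this issue.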

The DM assumes that it can move files between any pair of EPs. Since the DM is generally hidden from the user, one possibility is to make it a default kwarg, like this:
Config(data_manager=DataManager(threads=4), .... )
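
A minimal sketch of the default-kwarg idea follows. The `Config` and `DataManager` classes here are hypothetical stand-ins written for this example, not the real Parsl classes, and the default of one thread is an assumption.

```python
from typing import List, Optional


class DataManager:
    """Hypothetical stand-in for the DFK-private executor that moves files."""

    def __init__(self, threads: int = 1):  # default thread count is an assumption
        self.threads = threads


class Config:
    """Hypothetical Config that takes already-initialized resources."""

    def __init__(self,
                 executors: Optional[List[object]] = None,
                 data_manager: Optional[DataManager] = None):
        self.executors = list(executors or [])
        # Build a fresh DataManager per Config rather than sharing one
        # default instance across every Config object.
        self.data_manager = data_manager if data_manager is not None else DataManager()


default_cfg = Config()                                  # user never sees the DM
tuned_cfg = Config(data_manager=DataManager(threads=4))  # user overrides it
```

Using `None` as the sentinel keeps the DM hidden in the common case while still letting a user pass a tuned instance.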

If the EP definition maps naturally onto the notion of an executor, the new class structure and our executor implementations should be updated to reflect that. I do not know whether this is the case.

yadudoc added the enhancement label May 16, 2018

yadudoc added this to the Parsl-0.6.0 milestone May 16, 2018

yadudoc self-assigned this May 16, 2018


annawoodard commented May 16, 2018

So to summarize, I think the question is: is the current model (one EP / executor) sufficient, or do we need to tweak it?

@danielskatz @lukaszlacinski @kylechard Does this match your take on yesterday's Slack discussion? If not, can you flesh out your concerns here and we'll go from there?


danielskatz commented May 16, 2018

where's the picture?


annawoodard commented May 31, 2018

I think we should support multiple schemes for accessing the same physical location on disk. That way the user can specify fallback methods for moving data in case the primary method fails. It could also provide rudimentary load balancing (by choosing a transfer scheme at random).
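
As a rough sketch of that idea, assuming each scheme is just a named callable that raises on failure: the function `transfer_with_fallback`, the `TransferError` exception, and the two toy schemes are all hypothetical names invented for this example.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

TransferFn = Callable[[str, str], None]  # (source, destination) -> None, raises on failure


class TransferError(Exception):
    """Raised when every configured scheme has failed."""


def transfer_with_fallback(src: str,
                           dst: str,
                           schemes: Sequence[Tuple[str, TransferFn]],
                           shuffle: bool = False,
                           rng: Optional[random.Random] = None) -> str:
    """Try each scheme in turn and return the name of the one that worked."""
    ordered = list(schemes)
    if shuffle:
        # Rudimentary load balancing: randomize which scheme is tried first.
        chooser = rng if rng is not None else random
        chooser.shuffle(ordered)
    errors: List[str] = []
    for name, transfer in ordered:
        try:
            transfer(src, dst)
            return name
        except Exception as exc:  # any failure falls through to the next scheme
            errors.append(f"{name}: {exc}")
    raise TransferError("all transfer schemes failed: " + "; ".join(errors))


def failing_globus(src: str, dst: str) -> None:
    raise ConnectionError("endpoint unreachable")


def working_copy(src: str, dst: str) -> None:
    pass  # pretend the copy succeeded


used = transfer_with_fallback(
    "in.dat", "out.dat",
    [("globus", failing_globus), ("local_copy", working_copy)],
)
```

With `shuffle=False` the list order is the priority order, so the first entry is the primary method and the rest are fallbacks.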


annawoodard commented Jun 5, 2018

I am copying and pasting this from the current DataManager docstring. We shouldn't lose it, but it does not describe what is currently implemented, so I do not think it belongs in the docstring.

In general, the site where a remote file will be staged in is unknown until
the Executor submits an app that depends on the file. However, in most practical
cases, the site where an app is executed and a file needs to be staged in is
known. Such cases should be detected by the DataManager to optimize file
transfers. Possible cases are:
1. Config defines one site only.
2. Config defines several sites, but all of the sites share the filesystem /
use the same Globus endpoint.
3. Config defines several sites with different Globus endpoints, but a user
specified explicitly that apps must be executed on a particular site.
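
The three cases above could be detected along these lines. This is only a sketch of the logic as the docstring describes it: the `Site` class, its `globus_endpoint` field, and the `known_staging_site` function are hypothetical, and the endpoint identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    name: str
    globus_endpoint: str  # hypothetical identifier for the site's endpoint


def known_staging_site(sites: Sequence[Site],
                       pinned_site: Optional[str] = None) -> Optional[Site]:
    """Return the site to stage into if it is known up front, else None."""
    if not sites:
        return None
    # Case 3: the user explicitly pinned apps to a particular site.
    if pinned_site is not None:
        for site in sites:
            if site.name == pinned_site:
                return site
        raise ValueError(f"unknown site: {pinned_site}")
    # Case 1: only one site is defined.
    if len(sites) == 1:
        return sites[0]
    # Case 2: all sites share one endpoint, so any of them is equivalent
    # as a staging target.
    if len({site.globus_endpoint for site in sites}) == 1:
        return sites[0]
    # Otherwise the target is unknown until an app is submitted.
    return None


shared = [Site("queue_a", "ep-1"), Site("queue_b", "ep-1")]
split = [Site("site_x", "ep-1"), Site("site_y", "ep-2")]
```

When the function returns a site, staging can begin before the dependent app is scheduled; when it returns `None`, the transfer has to wait for the submission.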


yadudoc commented Jul 11, 2018

I believe this is safe to close with the assumption that the current implementation is being documented for 0.6.0.

yadudoc closed this Jul 11, 2018
