Support for Ad-hoc clusters #941

Closed
yadudoc opened this issue May 14, 2019 · 0 comments · Fixed by #1297

yadudoc (Member) commented May 14, 2019

Ad-hoc clusters are collections of machines that can be reached via SSH but have no batch scheduler or container orchestrator. Our general approach in these situations is to create one executor per node, with the workers launched via an SSH channel. Because each node then has its own executor, tasks are not load-balanced across nodes. This situation is common in clouds and private clusters, and the driving use case comes from DESC (@jchiang87).
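For reference, the per-node workaround might look roughly like this. It is a sketch assuming the Parsl 0.8/0.9-era API, where `LocalProvider` accepts a `channel` argument; the hostnames and labels are placeholders:

```python
from parsl.addresses import address_by_hostname
from parsl.channels import SSHChannel
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

node_addresses = ['node1', 'node2', 'node3']  # placeholder hostnames

# One executor per node: each has its own interchange and task queue,
# so work cannot move from a busy node to an idle one.
config = Config(
    executors=[
        HighThroughputExecutor(
            label='htex_{}'.format(n),       # executor labels must be unique
            address=address_by_hostname(),   # remote workers must reach the interchange
            cores_per_worker=1,
            provider=LocalProvider(
                channel=SSHChannel(hostname=n),  # launch workers on node n over SSH
                init_blocks=1,
                max_blocks=1,
            ),
        )
        for n in node_addresses
    ],
)
```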

We could prototype an ad-hoc provider that takes a list of channels, one per available node, so that a single executor spans all of them:

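```python
from parsl.addresses import address_by_hostname
from parsl.channels import SSHChannel
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import AdHocProvider  # the provider proposed in this issue

node_addresses = ['node1', 'node2', 'node3']  # ... one hostname per node
config = Config(
    executors=[
        HighThroughputExecutor(
            label="Ad-Hoc",
            address=address_by_hostname(),  # remote workers must be able to reach the interchange
            worker_debug=True,
            cores_per_worker=1,
            provider=AdHocProvider(
                # One SSH channel per node; a single executor spans all of them
                channels=[SSHChannel(hostname=n) for n in node_addresses],
            ),
        )
    ],
    strategy=None,  # no automatic scaling: the set of nodes is fixed
)
```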
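To illustrate why one executor spanning all nodes matters, here is a toy model, not Parsl's actual scheduler. It assumes tasks sent to per-node executors are split up front (round-robin here, for determinism), whereas a shared executor lets whichever node frees up first pull the next task. Both functions report the makespan (time until the last task finishes):

```python
import heapq
from typing import List


def static_makespan(durations: List[float], n_nodes: int) -> float:
    """Tasks pre-assigned round-robin to per-node queues; nodes cannot share work."""
    loads = [0.0] * n_nodes
    for i, d in enumerate(durations):
        loads[i % n_nodes] += d
    return max(loads)


def shared_queue_makespan(durations: List[float], n_nodes: int) -> float:
    """One shared queue: the node that becomes idle first takes the next task."""
    free_at = [0.0] * n_nodes  # min-heap of times at which each node becomes idle
    heapq.heapify(free_at)
    for d in durations:
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + d)
    return max(free_at)


# Uneven workload: every third task is long.
durations = [10, 1, 1, 10, 1, 1, 10, 1, 1]
print(static_makespan(durations, 3))        # → 30.0
print(shared_queue_makespan(durations, 3))  # → 13.0
```

With static assignment, all three long tasks land on the same node, while the shared queue spreads them out and gets close to the ideal of 36 / 3 = 12.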
yadudoc self-assigned this May 14, 2019
yadudoc added this to the Parsl-0.8.0 milestone May 14, 2019
annawoodard modified the milestones: Parsl-0.8.0, Parsl-0.9.0 May 22, 2019
yadudoc added a commit that referenced this issue Oct 9, 2019
* Adding the new ad-hoc provider

* Updating dfk to support executor scaling when multiple channels exist for ad-hoc

* Minor updates

* Adding an ad-hoc test

* Part 1 of fixes to Ben's comments

* Updating docstrings

* Fixing the script dir checks in dflow

* Removing min/max/init blocks as configurables

* Removing redundant options from test

* Removing redundant translate table and fixing spaces

* Minor future proofing

* Removing fstring

* Adding new least_loaded method to determine appropriate channels for manager restart

* Splitting out a helper to handle channel dir creation and other minor cleanups

* convert adhoc cluster test config to use AdHoc provider (#1302)

* convert adhoc cluster test config to use AdHoc provider

* fix flake8

* Removing redundant _roundrobin method

* Fixed two assign vs equality check issues

* Updating kill command with Ben's recommended string

* Removing wording around round-robin
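The commit log mentions a `least_loaded` method for choosing which channel to use when restarting a manager. The sketch below shows the general idea only. The function names, and the representation of load as a count of running blocks per hostname, are hypothetical and are not taken from Parsl's implementation:

```python
from typing import Dict, List


def least_loaded(channels: List[str], running: Dict[str, int]) -> str:
    """Hypothetical: pick the channel (hostname) with the fewest running blocks.

    Ties go to the channel that appears first in `channels`, because
    min() returns the first minimal element it encounters.
    """
    return min(channels, key=lambda ch: running.get(ch, 0))


def place_blocks(channels: List[str], n_blocks: int) -> List[str]:
    """Hypothetical: launch blocks one at a time on the least-loaded channel."""
    running = {ch: 0 for ch in channels}
    placements = []
    for _ in range(n_blocks):
        ch = least_loaded(channels, running)
        running[ch] += 1
        placements.append(ch)
    return placements


nodes = ['node1', 'node2', 'node3']
print(place_blocks(nodes, 5))  # → ['node1', 'node2', 'node3', 'node1', 'node2']

# After the manager on node2 exits, a restart goes back to node2.
print(least_loaded(nodes, {'node1': 1, 'node2': 0, 'node3': 1}))  # → node2
```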