Support for Ad-hoc clusters #941

yadudoc · 2019-05-14T16:06:32Z

An ad-hoc cluster is a set of machines that can be reached via SSH but that have no batch scheduler or container orchestrator. Our general approach in these situations is to create an executor for each node, with the workers launched via an SSH channel. Because each node gets its own executor, load-balancing across nodes will not work. This situation is common in clouds and private clusters, and the driving use-case is from DESC (@jchiang87).
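To make the load-balancing concern concrete, here is a toy model in plain Python (not Parsl code). It assumes that with one executor per node each task is pinned to a node at submit time in round-robin order, and compares that against a single shared queue where whichever node frees up first takes the next task. The function names and the task durations are made up for illustration.

```python
import heapq


def makespan_static(durations, n_nodes):
    """Tasks are pinned round-robin to per-node executors at submit time."""
    busy = [0.0] * n_nodes
    for i, d in enumerate(durations):
        busy[i % n_nodes] += d
    # The run finishes when the most heavily loaded node finishes.
    return max(busy)


def makespan_shared(durations, n_nodes):
    """One shared queue: the node that frees up first takes the next task."""
    free_at = [0.0] * n_nodes  # time at which each node becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest idle node
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Alternating long and short tasks (hypothetical durations) on two nodes.
durations = [8, 1, 8, 1, 8, 1]
static = makespan_static(durations, 2)
shared = makespan_shared(durations, 2)
print(static, shared)
```

In this example the pinned assignment puts all three long tasks on the same node, so it finishes later than the shared queue does. A single executor spanning all nodes avoids that kind of imbalance.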

We could prototype an ad-hoc provider that takes a list of channels, one pointing to each of the available nodes, like this:

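from parsl.channels import SSHChannel
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
# AdHocProvider is the provider proposed in this issue; the import path is assumed.
from parsl.providers import AdHocProvider

# Hostnames of the nodes reachable over SSH.
node_addresses = ['node1', 'node2', 'node3', ...]
config = Config(
    executors=[
        HighThroughputExecutor(
            label="Ad-Hoc",
            worker_debug=True,
            cores_per_worker=1,
            provider=AdHocProvider(
                # One SSH channel per node.
                channels=[SSHChannel(hostname=n) for n in node_addresses],
            ),
        )
    ],
    # No scaling strategy.
    strategy=None,
)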

* Adding the new ad-hoc provider
* Updating dfk to support executor scaling when multiple channels exist for ad-hoc
* Minor updates
* Adding an ad-hoc test
* Part 1 of fixes to Ben's comments
* Updating docstrings
* Fixing the script dir checks in dflow
* Removing min/max/init blocks as configurables
* Removing redundant options from test
* Removing redundant translate table and fixing spaces
* Minor future proofing
* Removing fstring
* Adding new least_loaded method to determine appropriate channels for manager restart
* Splitting out a helper to handle channel dir creation and other minor cleanups
* convert adhoc cluster test config to use AdHoc provider (#1302)
* convert adhoc cluster test config to use AdHoc provider
* fix flake8
* Removing redundant _roundrobin method
* Fixed two assign vs equality check issues
* Updating kill command with Ben's recommended string
* Removing wording around round-robin
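The `least_loaded` method mentioned above is not shown in this thread. As a rough sketch of the idea only, a least-loaded choice could look like the following, assuming channels are identified by hostname and the provider keeps a mapping from block ID to the channel it was launched on. The signature, the `assignments` mapping, and the tie-breaking rule are all assumptions, and the merged implementation may differ.

```python
from collections import Counter


def least_loaded(channels, assignments):
    """Return the channel that currently has the fewest blocks assigned.

    channels:    ordered list of channel identifiers (hostnames here)
    assignments: dict mapping block ID -> channel identifier (hypothetical)
    Ties go to the channel that appears first in `channels`.
    """
    load = Counter(assignments.values())  # missing channels count as 0
    return min(channels, key=lambda ch: load[ch])


channels = ['node1', 'node2', 'node3']
assignments = {'block0': 'node1', 'block1': 'node2', 'block2': 'node1'}

# node3 has no blocks yet, so a restarted manager would be placed there.
target = least_loaded(channels, assignments)
assignments['block3'] = target
```

Unlike a fixed round-robin order, picking by current load still does the right thing after blocks on some nodes have exited.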

yadudoc self-assigned this May 14, 2019

yadudoc added the enhancement label May 14, 2019

yadudoc added this to the Parsl-0.8.0 milestone May 14, 2019

annawoodard modified the milestones: Parsl-0.8.0, Parsl-0.9.0 May 22, 2019

yadudoc mentioned this issue May 29, 2019

Update example configurations in docs with htex #813

Closed

yadudoc mentioned this issue Jul 23, 2019

Split LocalProvider functionality #1157

Closed

yadudoc mentioned this issue Sep 20, 2019

Add ad-hoc provider #941 #1297

Merged

yadudoc closed this as completed in #1297 Oct 9, 2019
