
Conversation

@jlnav (Member) commented Nov 8, 2021

Addresses #713

Initial attempts to draft an optional funcX interface so workers can easily launch non-persistent user functions on remote resources:

https://funcx.org/

Some notes:

  • Security is taken care of by funcX. Users must authenticate with Globus when initializing an endpoint.
  • This user-prompted authentication may make CI testing impossible for now. See: Allow globus login from command line utility (globus/globus-compute#619)
  • funcX endpoints are limited to 20 function calls every 10 seconds unless batching is used (see the SDK sketch after these notes).
  • Endpoints can only be instantiated on Linux. If using clusters, endpoints can (and should) be configured by the user to launch their functions to compute nodes. Many examples, including for Theta, are available on the funcX docs.
  • This (currently) isn't an alternative to MPI or local comms, or any Executor. This currently only allows user functions to run on a different resource than libEnsemble's processes.
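
For context, the basic client-side funcX interaction (roughly what a worker performs under this interface) looks like the following. This is a minimal sketch based on the funcX tutorial of the time rather than this PR's code; the endpoint uuid is a placeholder, and the batch-method names should be verified against your installed SDK version.

from funcx import FuncXClient  # pip install funcx

fxc = FuncXClient()  # prompts for Globus authentication on first use

def double(x):
    return 2 * x

# Register the function with the funcX web service, then invoke it on an endpoint
func_uuid = fxc.register_function(double)
task_id = fxc.run(21, endpoint_id='<endpoint-uuid>', function_id=func_uuid)

# get_result raises while the task is still pending, so poll until it completes
result = fxc.get_result(task_id)

# To stay under the 20-calls-per-10-seconds limit, group calls into a batch
batch = fxc.create_batch()
for x in range(5):
    batch.add(x, endpoint_id='<endpoint-uuid>', function_id=func_uuid)
task_ids = fxc.batch_run(batch)  # poll each returned task id with fxc.get_result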

To try this out locally:

  1. pip install funcx funcx-endpoint
  2. funcx-endpoint configure my-endpoint
  3. (You may be asked to authenticate with Globus. Do so using the funcx-endpoint generated URL.)
  4. funcx-endpoint start my-endpoint. A message will confirm startup and print the endpoint's uuid.
  5. Set sim_specs['funcx_endpoint'] to this uuid.
  6. Run like normal: python test_funcx.py --comms local --nworkers 4 (a sketch of such a calling script follows these steps)
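
A minimal calling-script sketch for step 6, assuming a hypothetical user function remote_sim (imported from a hypothetical my_sims module and importable where the endpoint runs) and a placeholder endpoint uuid; only the sim_specs['funcx_endpoint'] field is specific to this PR:

import numpy as np
from libensemble.libE import libE
from libensemble.gen_funcs.sampling import uniform_random_sample
from libensemble.tools import parse_args, add_unique_random_streams

from my_sims import remote_sim  # hypothetical sim_f; must be importable on the endpoint side

nworkers, is_manager, libE_specs, _ = parse_args()

sim_specs = {
    'sim_f': remote_sim,
    'in': ['x'],
    'out': [('f', float)],
    'funcx_endpoint': '<endpoint-uuid>',  # uuid printed by funcx-endpoint start
}

gen_specs = {
    'gen_f': uniform_random_sample,
    'out': [('x', float, (1,))],
    'user': {'gen_batch_size': 20, 'lb': np.array([-3.0]), 'ub': np.array([3.0])},
}

persis_info = add_unique_random_streams({}, nworkers + 1)

H, persis_info, flag = libE(sim_specs, gen_specs, {'sim_max': 40},
                            persis_info, libE_specs=libE_specs)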

Documentation and/or other improvements coming soon?

TODO:

@coveralls (Collaborator) commented Nov 8, 2021

Pull Request Test Coverage Report for Build 1491745447

  • 17 of 35 (48.57%) changed or added relevant lines in 3 files are covered.
  • 18 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+40.2%) to 95.163%

Changes missing coverage:

  File                    Covered lines   Changed/Added lines   %
  libensemble/worker.py   12              30                    40.0%

Files with coverage reduction:

  File                                      New missed lines   %
  libensemble/comms/comms.py                1                  95.25%
  libensemble/resources/mpi_resources.py    1                  88.44%
  libensemble/executors/mpi_runner.py       16                 83.72%

Totals:

  Change from base Build 1473635638: 40.2%
  Covered Lines: 6316
  Relevant Lines: 6575

💛 - Coveralls

@jlnav requested review from jmlarson1 and shuds13 November 9, 2021 21:40
@jmlarson1 marked this pull request as ready for review November 11, 2021 17:22
@jmlarson1 (Member) left a comment

I'd remove the changes in the alloc_fs and make it so the manager doesn't error when receiving a None persis_info

@jmlarson1 (Member)

Thoughts on loss of coverage in openmpi runners?

@jlnav (Member Author) commented Nov 15, 2021

> Thoughts on loss of coverage in openmpi runners?

> - [ ] still need to fix it by allowing the executor_hworld tests to run with local comms when using openmpi

This was never the problem in the first place, since the simple routines were only running with local comms anyway! And all the runners are covered in the simple coverage jobs... https://coveralls.io/jobs/90108856/source_files/14509258687 . So still trying to figure out what's up.

@jlnav marked this pull request as draft November 17, 2021 23:35
@jlnav (Member Author) commented Nov 17, 2021

I've completely rewritten the funcX test to be a scaling test. Try this out on Theta:

pip install funcx-endpoint
funcx-endpoint configure forces

Configure ~/.funcx/forces/config.py to have the following:

from parsl.addresses import address_by_hostname
from parsl.launchers import AprunLauncher, SimpleLauncher
from parsl.providers import CobaltProvider

from funcx_endpoint.endpoint.utils.config import Config
from funcx_endpoint.executors import HighThroughputExecutor

# fmt: off

# PLEASE UPDATE user_opts BEFORE USE
user_opts = {
    'theta': {
        'worker_init': 'source ~/startup.sh',
        'scheduler_options': '',
        # Specify the account/allocation to which jobs should be charged
        'account': 'CSC250STMS07'
    }
}

config = Config(
    executors=[
        HighThroughputExecutor(
            max_workers_per_node=1,
            worker_debug=False,
            address=address_by_hostname(),
            provider=CobaltProvider(
                queue='debug-flat-quad',
                account=user_opts['theta']['account'],
                #launcher=AprunLauncher(),
                launcher=SimpleLauncher(),

                # string to prepend to #COBALT blocks in the submit
                # script to the scheduler eg: '#COBALT -t 50'
                scheduler_options=user_opts['theta']['scheduler_options'],

                # Command to be run before starting a worker, such as:
                # 'module load Anaconda; source activate funcx_env'.
                worker_init=user_opts['theta']['worker_init'],

                # Scale between 0-1 blocks with 1 node per block
                nodes_per_block=1,
                init_blocks=0,
                min_blocks=0,
                max_blocks=1,

                # Hold each block for up to 10 minutes
                walltime='00:10:00'
            ),
        )
    ],
)

Then finally on Theta:

funcx-endpoint start forces

On your local machine (or any machine with internet access where you want to run libEnsemble), set sim_specs['funcx_endpoint'] to the printed uuid; see funcx_forces.yaml. Adjust the other paths in that file as necessary.

Then on your machine where you'll run libEnsemble, run the calling script as usual. As long as the local exit_criteria have not been met, funcX will dynamically submit allocations via Cobalt to the queue specified in the above config, then run the sim_f.

Using launcher=SimpleLauncher() in the above config means the simulator function will be run on the MOM nodes as usual. Replace it with launcher=AprunLauncher() to experiment with submitting the sim_f directly to the allocated compute nodes instead!

@jlnav changed the title [WIP] Experimental/funcx → Experimental/funcx Nov 22, 2021
@jlnav marked this pull request as ready for review November 22, 2021 20:20
@jlnav requested a review from jmlarson1 November 22, 2021 20:20
@jlnav linked an issue Nov 22, 2021 that may be closed by this pull request

-        if 'persis_info' in D_recv and len(D_recv['persis_info']):
+        if D_recv.get('persis_info') is not None:
             persis_info[w].update(D_recv['persis_info'])
Member:

Remind me - why not just if D_recv.get('persis_info') instead of 436/437

Member Author:

That's probably valid. The original work was to address how persis_info is handled by the manager if the user didn't specify one but the alloc tried packing it up anyway.
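
For reference, the two checks differ only when 'persis_info' is present but empty; this is plain Python truthiness:

D_recv = {'persis_info': {}}

D_recv.get('persis_info') is not None   # True: an empty dict still enters the branch
bool(D_recv.get('persis_info'))         # False: an empty dict is skipped, like the old len() check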

@shuds13 (Member) commented Dec 3, 2021

Wondering if we can avoid some of the duplication of forces stuff. Also looks like there is a binary there (forces.x).

@jlnav (Member Author) commented Dec 6, 2021

> Wondering if we can avoid some of the duplication of forces stuff. Also looks like there is a binary there (forces.x).

Addressed.

@jlnav merged commit 0e3e2cb into develop Dec 9, 2021
@jlnav deleted the experimental/funcx branch December 9, 2021 18:34
@shuds13 mentioned this pull request Mar 15, 2022

Development

Successfully merging this pull request may close these issues:

  • Experiment with funcX interoperability