Fix launchers #11

hategan · 2021-03-30T21:41:46Z

This should fix #1. The basic idea if this PR is:

use (bash) scripts for all launchers
- there is common code in the scripts, which is stored separately from individual launchers
- the individual launchers (.sht) are processed through the C pre-processor to get the final launchers (.sh); there is a "launcher-scripts" target in the Makefile for this
there is a new abstract class, script_based_launcher, which is used by all the other (current) launchers (single, multi, mpi)

Proper testing needs to be added for all the knobs, but, for now, I'm committing the infrastructure and only doing some basic local tests.

…e used.

… and move the process property to the base class.

…ethod existed to close those files, but was not properly called. This commit fixes that.

…mation that is present in the job object can be used (notably the ID for debugging purposes).

…ng the only type of launcher that allows the spec to be implemented)

…cher scripts are not built by #including (using the standard C pre-processor) a common library. The reason for this instead of sourcing is a desire to reduce I/O if many jobs are being launched. The building of the scripts is done with a make target called "launcher-scripts". The makefile now also has two new targets, "install" and "develop" which invoke the setup script while adding a dependency to build the launcher scripts.

andre-merzky

I have some comments (more questions really), but nothing suspicious sticks out to me.

andre-merzky · 2021-03-31T19:59:17Z

src/psi/j/executors/local.py

@@ -205,14 +205,17 @@ def submit(self, job: Job) -> None:
            raise InvalidJobException('Missing specification')

        launcher = self._get_launcher(self._get_launcher_name(spec))
-        args = launcher.get_launch_command(spec)
+        args = launcher.get_launch_command(job)


The changes to the launch command later in this PR don't seem to require a job instance. On what attributes (other than the spec) would you think the LC may depend?

The job id.

andre-merzky · 2021-03-31T20:02:48Z

src/psi/j/executors/local.py


        p = _ChildProcessEntry(job, self)

        try:
            with job._status_cv:
                if job.status.state == JobState.CANCELED:
                    raise SubmitException('Job canceled')
+            if logger.isEnabledFor(logging.DEBUG):
+                logger.debug('Running {},  out={}, err={}'.format(args, spec.stdout_path,
+                                                                  spec.stderr_path))


The test for logger.isEnabledFor('DEBUG') is already performed by the implementation of logger.debug - so why are the logging calls shielded? I am probably missing something...

If it is to avoid early string expansion on the format, that is usually addressed by:

logger.debug('Running %s, out=%s, err=%s', args, spec.stdout_path, spec.stderr_path)

the logging module will the only expand if the respective log level is enabled. There is probably an option for format like string expansion as you seem to prefer that, but I don't actually know...

Yes, it is to avoid early expansion.

That said, I should probably use the built-in formatting, as you suggest.

Removed the guards in the local executor. Will also remove in other places, but not in this PR.

andre-merzky · 2021-03-31T20:08:56Z

src/psi/j/launchers/script_based_launcher.py

+    The remaining arguments to the script are the job executable and arguments.
+
+    A simple script library is provided in scripts/lib.sh. Its use is optional and it is intended
+    to be included in a main launcher script using a standard C preprocessor. It does the following:


I am a bit confused why the C preprocessor is needed - couldn't the same be achieved by this just being a shell script? But I am probably missing something again... You seem to invoke the preprocessor to build the actual scripts during install - but you could also just install lib.sh and source it at runtime? Or are you trying to optimize the sourcing away?

I'm trying to optimize the sourcing away. This is probably premature optimization.

I am also trying to avoid figuring out where lib.sh is when the parent script is invoked.

There are probably alternative ways to do things here which may be better.

I'm trying to optimize the sourcing away. This is probably premature optimization.

Indeed, I would say it is... :-)

I am also trying to avoid figuring out where lib.sh is when the parent script is invoked.
There are probably alternative ways to do things here which may be better.

Fair point. Just adding this as input:

we install shell scripts along with the python sources, so the python module always finds the script with

script = '%s/script.sh` % os.path.dirname(__file__)

Probably not foolproof, but it served us well so far. Another option is to install into bin: . and source actually search the path, so it will be found via $PATH.

script = '%s/script.sh` % os.path.dirname(__file__)

That's what's happening. But there's also the step of sourcing lib.sh from another .sh, and there is no file at that stage. It's possible to figure it out using $0, but I opted against that for now.

There's also the soon-to-be-an-issue of submit scripts (e.g., for PBS) and whether launcher scripts would be called from them or inlined into them.

…anism.

Ensure that, when loading plugins, only distinct items in sys.path ar…

Add __repr__ for _ProcessEntry, so that it's informative when logged,…

When job output is redirected, a file is opened for that purpose. A m…

The version lookup code for executors was broken. This commit fixes it.

…mation that is present in the job object can be used (notably the ID for debugging purposes).

…ng the only type of launcher that allows the spec to be implemented)

…cher scripts are not built by #including (using the standard C pre-processor) a common library. The reason for this instead of sourcing is a desire to reduce I/O if many jobs are being launched. The building of the scripts is done with a make target called "launcher-scripts". The makefile now also has two new targets, "install" and "develop" which invoke the setup script while adding a dependency to build the launcher scripts.

…anism.

hategan · 2021-04-29T22:43:23Z

@SteVwonder, would you like to take a look at this?

SteVwonder

I haven't given it a full trial run, but a review of the code generally LGTM. Seems as though there are some test issues though:

❯ make checks                                                                                                                                                           23:02:06 ()
mypy --config-file=.mypy --strict src tests
Success: no issues found in 22 source files
flake8 src
flake8 tests
tests/test_local_jobs.py:74:1: W391 blank line at end of file
make: *** [stylecheck] Error 1

❯ make tests                                                                                                                                                            23:02:55 ()
PYTHONPATH=/Users/herbein1/Repositories/exaworks/psi-j-python/src python -m pytest
=============================================================================== test session starts ================================================================================
platform darwin -- Python 3.9.4, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /Users/herbein1/Repositories/exaworks/psi-j-python
collected 8 items                                                                                                                                                                  

tests/test_doc_examples.py ..                                                                                                                                                [ 25%]
tests/test_local_jobs.py .F.F..                                                                                                                                              [100%]

===================================================================================== FAILURES =====================================================================================
_____________________________________________________________________________ test_simple_job_redirect _____________________________________________________________________________

    def test_simple_job_redirect() -> None:
        with TemporaryDirectory() as td:
            outp = Path(td, 'stdout.txt')
            job = Job(JobSpec(executable='/bin/echo', arguments=['-n', '_x_'], stdout_path=outp))
            exec = JobExecutor.get_instance('local')
            exec.submit(job)
            job.wait()
            f = outp.open("r")
            contents = f.read()
>           assert contents == '_x_'
E           AssertionError: assert '' == '_x_'
E             - _x_

tests/test_local_jobs.py:23: AssertionError
___________________________________________________________________________________ test_cancel ____________________________________________________________________________________

    def test_cancel() -> None:
        job = Job(JobSpec(executable='/bin/sleep', arguments=['1']))
        exec = JobExecutor.get_instance('local')
        exec.submit(job)
        job.wait(target_states=[JobState.ACTIVE])
        job.cancel()
        status = job.wait()
        assert status is not None
>       assert status.state == JobState.CANCELED
E       assert FAILED == CANCELED
E         Use -v to get the full diff

tests/test_local_jobs.py:47: AssertionError
============================================================================= short test summary info ==============================================================================
FAILED tests/test_local_jobs.py::test_simple_job_redirect - AssertionError: assert '' == '_x_'
FAILED tests/test_local_jobs.py::test_cancel - assert FAILED == CANCELED
=========================================================================== 2 failed, 6 passed in 1.63s ============================================================================
make: *** [tests] Error 1

src/psi/j/launchers/scripts/lib.sh

… fix_launchers

hategan · 2021-04-30T07:19:31Z

there are some test issues though:

Interesting. I am not getting these. Unfortunately, I do not have a mac close by, so I'm not sure how to approach the issue.

SteVwonder

LGTM! And we will spin out the Mac issue into a separate issue.

hategan added 9 commits March 30, 2021 14:07

Ensure that, when loading plugins, only distinct items in sys.path ar…

c7960b5

…e used.

Add __repr__ for _ProcessEntry, so that it's informative when logged,…

c6a4820

… and move the process property to the base class.

When job output is redirected, a file is opened for that purpose. A m…

0b25970

…ethod existed to close those files, but was not properly called. This commit fixes that.

The version lookup code for executors was broken. This commit fixes it.

2f55e51

Added an option to specify a launcher log in the JobExecutorConfig

dd9505d

Modified launchers to accept a job instead of a jobspec so that infor…

0f8f1f7

…mation that is present in the job object can be used (notably the ID for debugging purposes).

Added a generic script-based launcher (which may very well end up bei…

7f3ba5c

…ng the only type of launcher that allows the spec to be implemented)

Updated local executor to pass job instead of spec to launcher

28e632a

hategan requested review from andre-merzky and SteVwonder March 30, 2021 21:41

andre-merzky approved these changes Mar 31, 2021

View reviewed changes

hategan and others added 13 commits March 31, 2021 21:38

Do not guard debug statements and use the built-in interpolation mech…

d5897d5

…anism.

removed old script

794df11

Merge pull request #7 from ExaWorks/ensure_plugins_loaded_once

0a583b2

Ensure that, when loading plugins, only distinct items in sys.path ar…

Merge pull request #8 from ExaWorks/process_entry_improvements

094f102

Add __repr__ for _ProcessEntry, so that it's informative when logged,…

Merge pull request #9 from ExaWorks/ensure_files_closed_on_process_exit

9e31dfd

When job output is redirected, a file is opened for that purpose. A m…

Merge pull request #10 from ExaWorks/fix_executor_version_lookup

baa8948

The version lookup code for executors was broken. This commit fixes it.

Added an option to specify a launcher log in the JobExecutorConfig

d451538

Modified launchers to accept a job instead of a jobspec so that infor…

adca589

…mation that is present in the job object can be used (notably the ID for debugging purposes).

Added a generic script-based launcher (which may very well end up bei…

eb1da38

…ng the only type of launcher that allows the spec to be implemented)

Updated local executor to pass job instead of spec to launcher

9efe9c2

Do not guard debug statements and use the built-in interpolation mech…

6378949

…anism.

removed old script

3020350

SteVwonder reviewed Apr 30, 2021

View reviewed changes

src/psi/j/launchers/scripts/lib.sh Show resolved Hide resolved

hategan added 2 commits April 29, 2021 23:53

Merge branch 'fix_launchers' of github.com:ExaWorks/psi-j-python into…

73bcd19

… fix_launchers

Removed new line at the end of file

7004cc9

SteVwonder mentioned this pull request Apr 30, 2021

Missing dev requirements #17

Closed

SteVwonder approved these changes May 7, 2021

View reviewed changes

SteVwonder merged commit 6f6eabb into main May 7, 2021

SteVwonder deleted the fix_launchers branch May 7, 2021 19:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix launchers #11

Fix launchers #11

hategan commented Mar 30, 2021

andre-merzky left a comment

andre-merzky Mar 31, 2021

hategan Mar 31, 2021 •

edited

Loading

andre-merzky Mar 31, 2021

hategan Mar 31, 2021 •

edited

Loading

hategan Mar 31, 2021

hategan Apr 1, 2021

andre-merzky Mar 31, 2021

hategan Mar 31, 2021

andre-merzky Mar 31, 2021

hategan Mar 31, 2021

hategan commented Apr 29, 2021

SteVwonder left a comment

hategan commented Apr 30, 2021

SteVwonder left a comment

Fix launchers #11

Fix launchers #11

Conversation

hategan commented Mar 30, 2021

andre-merzky left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hategan Mar 31, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hategan Mar 31, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hategan commented Apr 29, 2021

SteVwonder left a comment

Choose a reason for hiding this comment

hategan commented Apr 30, 2021

SteVwonder left a comment

Choose a reason for hiding this comment

hategan Mar 31, 2021 •

edited

Loading

hategan Mar 31, 2021 •

edited

Loading