Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix launchers #11

Merged
merged 24 commits into from
May 7, 2021
Merged

Fix launchers #11

merged 24 commits into from
May 7, 2021

Conversation

hategan
Copy link
Collaborator

@hategan hategan commented Mar 30, 2021

This should fix #1. The basic idea if this PR is:

  • use (bash) scripts for all launchers
    • there is common code in the scripts, which is stored separately from individual launchers
    • the individual launchers (.sht) are processed through the C pre-processor to get the final launchers (.sh); there is a "launcher-scripts" target in the Makefile for this
  • there is a new abstract class, script_based_launcher, which is used by all the other (current) launchers (single, multi, mpi)

Proper testing needs to be added for all the knobs, but, for now, I'm committing the infrastructure and only doing some basic local tests.

… and move the process property to the base class.
…ethod existed to close those files, but was not properly called. This commit fixes that.
…mation that is present in the job object can be used (notably the ID for debugging purposes).
…ng the only type of launcher that allows the spec to be implemented)
…cher scripts are not built by #including (using the standard C pre-processor) a common library. The reason for this instead of sourcing is a desire to reduce I/O if many jobs are being launched. The building of the scripts is done with a make target called "launcher-scripts". The makefile now also has two new targets, "install" and "develop" which invoke the setup script while adding a dependency to build the launcher scripts.
Copy link
Collaborator

@andre-merzky andre-merzky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments (more questions really), but nothing suspicious sticks out to me.

@@ -205,14 +205,17 @@ def submit(self, job: Job) -> None:
raise InvalidJobException('Missing specification')

launcher = self._get_launcher(self._get_launcher_name(spec))
args = launcher.get_launch_command(spec)
args = launcher.get_launch_command(job)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the launch command later in this PR don't seem to require a job instance. On what attributes (other than the spec) would you think the LC may depend?

Copy link
Collaborator Author

@hategan hategan Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The job id.


p = _ChildProcessEntry(job, self)

try:
with job._status_cv:
if job.status.state == JobState.CANCELED:
raise SubmitException('Job canceled')
if logger.isEnabledFor(logging.DEBUG):
logger.debug('Running {}, out={}, err={}'.format(args, spec.stdout_path,
spec.stderr_path))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test for logger.isEnabledFor('DEBUG') is already performed by the implementation of logger.debug - so why are the logging calls shielded? I am probably missing something...

If it is to avoid early string expansion on the format, that is usually addressed by:

logger.debug('Running %s,  out=%s, err=%s', args, spec.stdout_path, spec.stderr_path)

the logging module will the only expand if the respective log level is enabled. There is probably an option for format like string expansion as you seem to prefer that, but I don't actually know...

Copy link
Collaborator Author

@hategan hategan Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is to avoid early expansion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I should probably use the built-in formatting, as you suggest.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the guards in the local executor. Will also remove in other places, but not in this PR.

The remaining arguments to the script are the job executable and arguments.

A simple script library is provided in scripts/lib.sh. Its use is optional and it is intended
to be included in a main launcher script using a standard C preprocessor. It does the following:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused why the C preprocessor is needed - couldn't the same be achieved by this just being a shell script? But I am probably missing something again... You seem to invoke the preprocessor to build the actual scripts during install - but you could also just install lib.sh and source it at runtime? Or are you trying to optimize the sourcing away?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to optimize the sourcing away. This is probably premature optimization.

I am also trying to avoid figuring out where lib.sh is when the parent script is invoked.

There are probably alternative ways to do things here which may be better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to optimize the sourcing away. This is probably premature optimization.

Indeed, I would say it is... :-)

I am also trying to avoid figuring out where lib.sh is when the parent script is invoked.
There are probably alternative ways to do things here which may be better.

Fair point. Just adding this as input:

we install shell scripts along with the python sources, so the python module always finds the script with

script = '%s/script.sh` % os.path.dirname(__file__)

Probably not foolproof, but it served us well so far. Another option is to install into bin: . and source actually search the path, so it will be found via $PATH.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

script = '%s/script.sh` % os.path.dirname(__file__)

That's what's happening. But there's also the step of sourcing lib.sh from another .sh, and there is no file at that stage. It's possible to figure it out using $0, but I opted against that for now.

There's also the soon-to-be-an-issue of submit scripts (e.g., for PBS) and whether launcher scripts would be called from them or inlined into them.

hategan and others added 13 commits March 31, 2021 21:38
Ensure that, when loading plugins, only distinct items in sys.path ar…
Add __repr__ for _ProcessEntry, so that it's informative when logged,…
When job output is redirected, a file is opened for that purpose. A m…
The version lookup code for executors was broken. This commit fixes it.
…mation that is present in the job object can be used (notably the ID for debugging purposes).
…ng the only type of launcher that allows the spec to be implemented)
…cher scripts are not built by #including (using the standard C pre-processor) a common library. The reason for this instead of sourcing is a desire to reduce I/O if many jobs are being launched. The building of the scripts is done with a make target called "launcher-scripts". The makefile now also has two new targets, "install" and "develop" which invoke the setup script while adding a dependency to build the launcher scripts.
@hategan
Copy link
Collaborator Author

hategan commented Apr 29, 2021

@SteVwonder, would you like to take a look at this?

Copy link
Contributor

@SteVwonder SteVwonder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't given it a full trial run, but a review of the code generally LGTM. Seems as though there are some test issues though:

❯ make checks                                                                                                                                                           23:02:06 ()
mypy --config-file=.mypy --strict src tests
Success: no issues found in 22 source files
flake8 src
flake8 tests
tests/test_local_jobs.py:74:1: W391 blank line at end of file
make: *** [stylecheck] Error 1

❯ make tests                                                                                                                                                            23:02:55 ()
PYTHONPATH=/Users/herbein1/Repositories/exaworks/psi-j-python/src python -m pytest
=============================================================================== test session starts ================================================================================
platform darwin -- Python 3.9.4, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /Users/herbein1/Repositories/exaworks/psi-j-python
collected 8 items                                                                                                                                                                  

tests/test_doc_examples.py ..                                                                                                                                                [ 25%]
tests/test_local_jobs.py .F.F..                                                                                                                                              [100%]

===================================================================================== FAILURES =====================================================================================
_____________________________________________________________________________ test_simple_job_redirect _____________________________________________________________________________

    def test_simple_job_redirect() -> None:
        with TemporaryDirectory() as td:
            outp = Path(td, 'stdout.txt')
            job = Job(JobSpec(executable='/bin/echo', arguments=['-n', '_x_'], stdout_path=outp))
            exec = JobExecutor.get_instance('local')
            exec.submit(job)
            job.wait()
            f = outp.open("r")
            contents = f.read()
>           assert contents == '_x_'
E           AssertionError: assert '' == '_x_'
E             - _x_

tests/test_local_jobs.py:23: AssertionError
___________________________________________________________________________________ test_cancel ____________________________________________________________________________________

    def test_cancel() -> None:
        job = Job(JobSpec(executable='/bin/sleep', arguments=['1']))
        exec = JobExecutor.get_instance('local')
        exec.submit(job)
        job.wait(target_states=[JobState.ACTIVE])
        job.cancel()
        status = job.wait()
        assert status is not None
>       assert status.state == JobState.CANCELED
E       assert FAILED == CANCELED
E         Use -v to get the full diff

tests/test_local_jobs.py:47: AssertionError
============================================================================= short test summary info ==============================================================================
FAILED tests/test_local_jobs.py::test_simple_job_redirect - AssertionError: assert '' == '_x_'
FAILED tests/test_local_jobs.py::test_cancel - assert FAILED == CANCELED
=========================================================================== 2 failed, 6 passed in 1.63s ============================================================================
make: *** [tests] Error 1

src/psi/j/launchers/scripts/lib.sh Show resolved Hide resolved
@hategan
Copy link
Collaborator Author

hategan commented Apr 30, 2021

there are some test issues though:

Interesting. I am not getting these. Unfortunately, I do not have a mac close by, so I'm not sure how to approach the issue.

Copy link
Contributor

@SteVwonder SteVwonder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! And we will spin out the Mac issue into a separate issue.

@SteVwonder SteVwonder merged commit 6f6eabb into main May 7, 2021
@SteVwonder SteVwonder deleted the fix_launchers branch May 7, 2021 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement pre/post-launch
3 participants