-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix launchers #11
Fix launchers #11
Conversation
… and move the process property to the base class.
…ethod existed to close those files, but was not properly called. This commit fixes that.
…mation that is present in the job object can be used (notably the ID for debugging purposes).
…ng the only type of launcher that allows the spec to be implemented)
…cher scripts are not built by #including (using the standard C pre-processor) a common library. The reason for this instead of sourcing is a desire to reduce I/O if many jobs are being launched. The building of the scripts is done with a make target called "launcher-scripts". The makefile now also has two new targets, "install" and "develop" which invoke the setup script while adding a dependency to build the launcher scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some comments (more questions really), but nothing suspicious sticks out to me.
@@ -205,14 +205,17 @@ def submit(self, job: Job) -> None: | |||
raise InvalidJobException('Missing specification') | |||
|
|||
launcher = self._get_launcher(self._get_launcher_name(spec)) | |||
args = launcher.get_launch_command(spec) | |||
args = launcher.get_launch_command(job) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to the launch command later in this PR don't seem to require a job instance. On what attributes (other than the spec) would you think the LC may depend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job id.
src/psi/j/executors/local.py
Outdated
|
||
p = _ChildProcessEntry(job, self) | ||
|
||
try: | ||
with job._status_cv: | ||
if job.status.state == JobState.CANCELED: | ||
raise SubmitException('Job canceled') | ||
if logger.isEnabledFor(logging.DEBUG): | ||
logger.debug('Running {}, out={}, err={}'.format(args, spec.stdout_path, | ||
spec.stderr_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test for logger.isEnabledFor('DEBUG')
is already performed by the implementation of logger.debug
- so why are the logging calls shielded? I am probably missing something...
If it is to avoid early string expansion on the format
, that is usually addressed by:
logger.debug('Running %s, out=%s, err=%s', args, spec.stdout_path, spec.stderr_path)
the logging module will the only expand if the respective log level is enabled. There is probably an option for format
like string expansion as you seem to prefer that, but I don't actually know...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is to avoid early expansion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I should probably use the built-in formatting, as you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the guards in the local executor. Will also remove in other places, but not in this PR.
The remaining arguments to the script are the job executable and arguments. | ||
|
||
A simple script library is provided in scripts/lib.sh. Its use is optional and it is intended | ||
to be included in a main launcher script using a standard C preprocessor. It does the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused why the C preprocessor is needed - couldn't the same be achieved by this just being a shell script? But I am probably missing something again... You seem to invoke the preprocessor to build the actual scripts during install - but you could also just install lib.sh
and source it at runtime? Or are you trying to optimize the sourcing away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to optimize the sourcing away. This is probably premature optimization.
I am also trying to avoid figuring out where lib.sh is when the parent script is invoked.
There are probably alternative ways to do things here which may be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to optimize the sourcing away. This is probably premature optimization.
Indeed, I would say it is... :-)
I am also trying to avoid figuring out where lib.sh is when the parent script is invoked.
There are probably alternative ways to do things here which may be better.
Fair point. Just adding this as input:
we install shell scripts along with the python sources, so the python module always finds the script with
script = '%s/script.sh` % os.path.dirname(__file__)
Probably not foolproof, but it served us well so far. Another option is to install into bin
: .
and source
actually search the path, so it will be found via $PATH
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
script = '%s/script.sh` % os.path.dirname(__file__)
That's what's happening. But there's also the step of sourcing lib.sh from another .sh, and there is no file at that stage. It's possible to figure it out using $0, but I opted against that for now.
There's also the soon-to-be-an-issue of submit scripts (e.g., for PBS) and whether launcher scripts would be called from them or inlined into them.
Ensure that, when loading plugins, only distinct items in sys.path ar…
Add __repr__ for _ProcessEntry, so that it's informative when logged,…
When job output is redirected, a file is opened for that purpose. A m…
The version lookup code for executors was broken. This commit fixes it.
…mation that is present in the job object can be used (notably the ID for debugging purposes).
…ng the only type of launcher that allows the spec to be implemented)
…cher scripts are not built by #including (using the standard C pre-processor) a common library. The reason for this instead of sourcing is a desire to reduce I/O if many jobs are being launched. The building of the scripts is done with a make target called "launcher-scripts". The makefile now also has two new targets, "install" and "develop" which invoke the setup script while adding a dependency to build the launcher scripts.
@SteVwonder, would you like to take a look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't given it a full trial run, but a review of the code generally LGTM. Seems as though there are some test issues though:
❯ make checks 23:02:06 ()
mypy --config-file=.mypy --strict src tests
Success: no issues found in 22 source files
flake8 src
flake8 tests
tests/test_local_jobs.py:74:1: W391 blank line at end of file
make: *** [stylecheck] Error 1
❯ make tests 23:02:55 ()
PYTHONPATH=/Users/herbein1/Repositories/exaworks/psi-j-python/src python -m pytest
=============================================================================== test session starts ================================================================================
platform darwin -- Python 3.9.4, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /Users/herbein1/Repositories/exaworks/psi-j-python
collected 8 items
tests/test_doc_examples.py .. [ 25%]
tests/test_local_jobs.py .F.F.. [100%]
===================================================================================== FAILURES =====================================================================================
_____________________________________________________________________________ test_simple_job_redirect _____________________________________________________________________________
def test_simple_job_redirect() -> None:
with TemporaryDirectory() as td:
outp = Path(td, 'stdout.txt')
job = Job(JobSpec(executable='/bin/echo', arguments=['-n', '_x_'], stdout_path=outp))
exec = JobExecutor.get_instance('local')
exec.submit(job)
job.wait()
f = outp.open("r")
contents = f.read()
> assert contents == '_x_'
E AssertionError: assert '' == '_x_'
E - _x_
tests/test_local_jobs.py:23: AssertionError
___________________________________________________________________________________ test_cancel ____________________________________________________________________________________
def test_cancel() -> None:
job = Job(JobSpec(executable='/bin/sleep', arguments=['1']))
exec = JobExecutor.get_instance('local')
exec.submit(job)
job.wait(target_states=[JobState.ACTIVE])
job.cancel()
status = job.wait()
assert status is not None
> assert status.state == JobState.CANCELED
E assert FAILED == CANCELED
E Use -v to get the full diff
tests/test_local_jobs.py:47: AssertionError
============================================================================= short test summary info ==============================================================================
FAILED tests/test_local_jobs.py::test_simple_job_redirect - AssertionError: assert '' == '_x_'
FAILED tests/test_local_jobs.py::test_cancel - assert FAILED == CANCELED
=========================================================================== 2 failed, 6 passed in 1.63s ============================================================================
make: *** [tests] Error 1
Interesting. I am not getting these. Unfortunately, I do not have a mac close by, so I'm not sure how to approach the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! And we will spin out the Mac issue into a separate issue.
This should fix #1. The basic idea if this PR is:
Proper testing needs to be added for all the knobs, but, for now, I'm committing the infrastructure and only doing some basic local tests.