Skip to content
This repository has been archived by the owner on Feb 10, 2021. It is now read-only.

Copy user-provided script to local directory #47

Merged
merged 3 commits into from
Nov 17, 2017

Conversation

mrocklin
Copy link
Member

SGE seems to have some issues running scripts that are not in the local directory. I haven't spent much time investigating why this is, but have pushed up a change to copy the script into the current working directory that I hope both effective and innocuous.

Critical feedback welcome

@mrocklin
Copy link
Member Author

Have you seen this error before @jakirkham ?

____________________ test_dont_request_on_many_short_tasks _____________________
loop = <tornado.platform.epoll.EPollIOLoop object at 0x7f656fece9b0>
    def test_dont_request_on_many_short_tasks(loop):
        with SGECluster(scheduler_port=0) as cluster:
            adapt = Adaptive(cluster, interval=50, startup_cost=10)
            with Client(cluster, loop=loop) as client:
                cluster.scheduler.task_duration['slowinc'] = 0.001
                futures = client.map(slowinc, range(1000), delay=0.001)
    
                while not cluster.scheduler.workers:
                    sleep(0.01)
    
                for i in range(20):
                    sleep(0.1)
>                   assert len(cluster.workers) < 2
E                   AssertionError: assert 2 < 2
E                    +  where 2 = len({'15.1': WorkerSpec(job_id='15.1', kwargs={'nativeSpecification': '', 'cpus': 1, 'memory': None, 'memory_fraction': 0...., 'memory': None, 'memory_fraction': 0.5}, stdout='/dask-drmaa/worker.16.1.out', stderr='/dask-drmaa/worker.16.1.err')})
E                    +    where {'15.1': WorkerSpec(job_id='15.1', kwargs={'nativeSpecification': '', 'cpus': 1, 'memory': None, 'memory_fraction': 0...., 'memory': None, 'memory_fraction': 0.5}, stdout='/dask-drmaa/worker.16.1.out', stderr='/dask-drmaa/worker.16.1.err')} = <SGECluster: 2 workers>.workers
dask_drmaa/tests/test_adaptive.py:105: AssertionError

@jakirkham
Copy link
Member

Yep, it seems to be another sporadic issue that has yet to be solved. It's documented in issue ( #41 ). Maybe we can look at it after fixing this issue?

@jakirkham jakirkham merged commit 86aeb1e into dask:master Nov 17, 2017
@jakirkham
Copy link
Member

Thanks @mrocklin!

@azjps
Copy link
Collaborator

azjps commented Nov 21, 2017

Hi @mrocklin, the following line

               script = shutil.copy(script, os.path.curdir)

unfortunately only works for python 3.3+; it returns None for python 2.x.

Also, since this issue apparently only affects SGE, could we add an option to toggle whether to copy to the local directory? The default can be left as True (should copy). Otherwise this can litter the current working directory while the cluster is running, or if an exception is raised in DRMAACluster.__init__. Happy to submit a PR if you'd like.

@mrocklin
Copy link
Member Author

The file should be cleaned up if the Python process exits cleanly. A keyword for this behavior does sound sensible though.

It's a shame about the shutil.copy difference. Presumably we can correct this with os.path.join(os.path.curdir, os.path.split(script)[-1]) or something similar.

A PR for one or both issues would be welcome.

azjps added a commit to azjps/dask-drmaa that referenced this pull request Nov 21, 2017
In dask#47 a work around for a SGE-related issue was introduced,
where the worker script was copied to the current working
directory. The should_copy_script argument allows toggling
this behavior (defaulted to True so that SGE users do not
need to explicitly specify the fix).

Also fixed a minor python2 backwards-compatibility issue,
where shutil.copy() returns None.

Change-Id: I1158eca5701bd63b8939c3d08aef827dc2930afc
@mrocklin mrocklin deleted the self.script branch November 21, 2017 20:16
azjps added a commit to azjps/dask-drmaa that referenced this pull request Nov 22, 2017
In dask#47 a work around for a SGE-related issue was introduced,
where the worker script was copied to the current working
directory. The copy_script argument allows toggling
this behavior (defaulted to True so that SGE users do not
need to explicitly specify the fix).

Also fixed a minor python2 backwards-compatibility issue,
where shutil.copy() returns None.
jakirkham pushed a commit that referenced this pull request Nov 24, 2017
In #47 a work around for a SGE-related issue was introduced,
where the worker script was copied to the current working
directory. The copy_script argument allows toggling
this behavior (defaulted to True so that SGE users do not
need to explicitly specify the fix).

Also fixed a minor python2 backwards-compatibility issue,
where shutil.copy() returns None.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants