👌 Allow copying between different computers with same hostname #6136

Open
mbercx opened this issue Sep 24, 2023 · 1 comment · May be fixed by #6196

mbercx commented Sep 24, 2023

Is your feature request related to a problem? Please describe

In some cases, I want to use two different schedulers on the same remote, e.g. hyperqueue for small jobs that I want to run on partial nodes, but Slurm for bigger jobs where I need multiple nodes and a solid chunk of walltime. Currently, this means I have to set up two computers with different schedulers. However, if one calculation needs to copy/symlink files from a previous one run on a different scheduler (i.e. computer), this currently fails with a NotImplementedError since the execmanager compares the computer UUIDs:

    if remote_computer_uuid == computer.uuid:
        logger.debug(
            f'[submission of calculation {node.pk}] copying {dest_rel_path} '
            f'remotely, directly on the machine {computer.label}'
        )
        try:
            transport.copy(remote_abs_path, dest_rel_path)
        except FileNotFoundError:
            logger.warning(
                f'[submission of calculation {node.pk}] Unable to copy remote '
                f'resource from {remote_abs_path} to {dest_rel_path}! NOT Stopping but just ignoring!.'
            )
        except (IOError, OSError):
            logger.warning(
                f'[submission of calculation {node.pk}] Unable to copy remote '
                f'resource from {remote_abs_path} to {dest_rel_path}! Stopping.'
            )
            raise
    else:
        raise NotImplementedError(
            f'[submission of calculation {node.pk}] Remote copy between two different machines is '
            'not implemented yet'
        )

Describe the solution you'd like

One solution that I've been running with locally is to compare the hostname of the computers instead, which seemed sensible at first glance. There may be certain cases where this breaks, however?
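
To make the hostname comparison concrete, here is a minimal standalone sketch. It does not use aiida-core: `ComputerInfo` and `can_copy_directly` are hypothetical names made up for illustration, and the case-insensitive normalization of the hostname is my own assumption. The check still treats identical UUIDs as a match and otherwise falls back to comparing hostnames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputerInfo:
    """Hypothetical stand-in holding only the computer fields needed for the check."""

    uuid: str
    label: str
    hostname: str


def can_copy_directly(source: ComputerInfo, target: ComputerInfo) -> bool:
    """Return True if a plain transport copy between the two computers is assumed safe."""
    # Same computer: this is the existing behaviour (UUID comparison).
    if source.uuid == target.uuid:
        return True
    # Assumption: two computers with the same hostname share a file system.
    # Hostnames are compared case-insensitively and without surrounding whitespace.
    return source.hostname.strip().lower() == target.hostname.strip().lower()


# Two computers on the same remote with different schedulers, plus an unrelated one.
slurm = ComputerInfo(uuid='uuid-slurm', label='cluster-slurm', hostname='cluster.example.org')
hq = ComputerInfo(uuid='uuid-hq', label='cluster-hq', hostname='Cluster.example.org')
other = ComputerInfo(uuid='uuid-other', label='other', hostname='other.example.org')

print(can_copy_directly(slurm, hq))     # same hostname, different UUID
print(can_copy_directly(slurm, other))  # different hostname
```

One case where this could plausibly break is two computers that both use a generic hostname such as `localhost` but are reached through different tunnels or proxies, so an extra safeguard may be needed there.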

Describe alternatives you've considered

It's clear that a computer can be used with multiple schedulers. Besides the hyperqueue case, you might want to run e.g. an aiida-shell job directly on the login node. Instead of setting up multiple computers, maybe a computer could be configured with multiple schedulers, with one as the default and the others selectable by setting an option?
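
A rough sketch of what this alternative could look like, purely as a design illustration. Nothing here exists in aiida-core: `MultiSchedulerComputer`, `resolve_scheduler` and the idea of passing the scheduler name as an option are all hypothetical, and the scheduler names are just example strings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MultiSchedulerComputer:
    """Hypothetical computer that supports several schedulers with one default."""

    label: str
    hostname: str
    schedulers: Tuple[str, ...]
    default_scheduler: str

    def __post_init__(self) -> None:
        # The default has to be one of the configured schedulers.
        if self.default_scheduler not in self.schedulers:
            raise ValueError(f'default scheduler {self.default_scheduler!r} is not configured')

    def resolve_scheduler(self, requested: Optional[str] = None) -> str:
        """Return the scheduler to use: the requested one if given, else the default."""
        if requested is None:
            return self.default_scheduler
        if requested not in self.schedulers:
            raise ValueError(f'scheduler {requested!r} is not configured on {self.label!r}')
        return requested


cluster = MultiSchedulerComputer(
    label='cluster',
    hostname='cluster.example.org',
    schedulers=('core.slurm', 'hyperqueue', 'core.direct'),
    default_scheduler='core.slurm',
)

print(cluster.resolve_scheduler())              # falls back to the default
print(cluster.resolve_scheduler('hyperqueue'))  # explicitly selected via an option
```

Since all calculations would then run on the same computer, the UUID comparison in the execmanager would keep working unchanged.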

Additional context

Related to #5084

mbercx commented Nov 29, 2023

Note: This is also important if you e.g. share a work chain with another user and this work chain has files stashed on the remote that need to be copied for a next step.
