Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine: recover behaviour of upload_calculation #6447

Merged
merged 6 commits into from
Jun 4, 2024

Conversation

mbercx
Copy link
Member

@mbercx mbercx commented Jun 4, 2024

This is a replacement PR for #6348 that contains a set of commits to recover the behaviour of upload_calculation the latest release (v2.5.1).

Copy link

codecov bot commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 77.86%. Comparing base (ef60b66) to head (c49b963).
Report is 24 commits behind head on main.

Files Patch % Lines
src/aiida/engine/daemon/execmanager.py 90.48% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6447      +/-   ##
==========================================
+ Coverage   77.51%   77.86%   +0.36%     
==========================================
  Files         560      562       +2     
  Lines       41444    41794     +350     
==========================================
+ Hits        32120    32539     +419     
+ Misses       9324     9255      -69     
Flag Coverage Δ
presto 73.19% <92.60%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mbercx mbercx requested a review from sphuber June 4, 2024 10:24
@mbercx
Copy link
Member Author

mbercx commented Jun 4, 2024

@sphuber I could still reorder or merge the commits a bit, but I tried to separate them in order to keep them more atomic and clear.

I forgot to add you as a co-author for the engine fix commit, apologies! Will fix that during review.

@sphuber
Copy link
Contributor

sphuber commented Jun 4, 2024

@sphuber I could still reorder or merge the commits a bit, but I tried to separate them in order to keep them more atomic and clear.

Think they look fine (minus the emojis of course ;) ). I would maybe just merge the second and third, since they are both dealing with centralizing the serialize_file_hierarchy fixture. Think it is fine and perhaps better even to have them in a single commit.

I forgot to add you as a co-author for the engine fix commit, apologies! Will fix that during review.

S'all good man

@mbercx
Copy link
Member Author

mbercx commented Jun 4, 2024

Think they look fine (minus the emojis of course ;) )

Oops, did I add emojis? I didn't even notice 😇

I would maybe just merge the second and third, since they are both dealing with centralizing the serialize_file_hierarchy fixture. Think it is fine and perhaps better even to have them in a single commit.

Makes sense, will do! Do I go ahead and do it now or wait for your first round of review?

@sphuber
Copy link
Contributor

sphuber commented Jun 4, 2024

Makes sense, will do! Do I go ahead and do it now or wait for your first round of review?

Going through it now. Might as well hold on a bit

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @mbercx . Fantastic piece of work. Since we have already gone through the most important changes in the upload_calculation implementation in detail, they should be good to go. I had a look at the other changes and they also look good. The only minor nit would be that I found the naming of the overwrite parametrized argument in test_put_and_get_overwrite a bit confusing. I would expect it to be used to actually define overwrite in the transport calls. But it is not critical to change it.

I am marking this as "changes requested" but that is really just to merge the second and third commit.

@mbercx
Copy link
Member Author

mbercx commented Jun 4, 2024

@sphuber hierarchy commits squashed and nit picked! Let's I didn't mess up and the tests still pass ;)

@mbercx mbercx requested a review from sphuber June 4, 2024 11:50
@mbercx mbercx force-pushed the test/upload-calc-order-fix branch from 557bfbe to 2446ede Compare June 4, 2024 15:16
mbercx and others added 6 commits June 4, 2024 22:22
The transport plugin tests in `tests/transports/test_all_plugins.py` were still
using classes to sort the tests. This comes from a historical use of `unittest`, that
required defining test classes instead of simple functions.

The classes added extra complexity (that could lead to test failures) for little
benefit. Here we remove the classes in favour of simple test functions.
The `serialize_file_hierarchy` function is a tool that is used in testing both the
repository and `execmanager`, and could be used in testing other parts of the code base
as well. Here we move the function to a fixture in the root `conftest.py` so it can be
accessed in every test file, and remove the two functions.

Similarly, we remove the `create_file_hierarchy` function from the `execmanager` tests
and create a new fixture that can create hierarchies both in a target `Path` and
`Folder`.
The `node_and_calc_info` fixture is currently limited to only local transports. Since
this fixture is used to test various functionalities of the `execmanager`, we
parametrize this fixture to also test the `core.ssh` transport and adapt the tests
that use it accordingly.
Currently, trying to overwrite an already existing folder in the
destination directory/localpath with `puttree`/`gettree` will raise an exception because
the folder already exists. This behaviour is counter-intuitive when the `overwrite`
argument is set to `True`, which is the default.

Here we make the following changes:

* `LocalTransport`: add the `dirs_exist_ok` input argument to the `shutil.copytree`
  commands for both `puttree` and `gettree`, setting it to the `overwrite` input.
* `SshTransport`: use the `makedirs` command instead of `mkdir` for both `puttree` and
  `gettree`, setting its `exist_ok` input argument equal to the `overwrite` input.
…es not exist

In 101a8d6 the engine was adapted to simply log
a warning in case a `FileNotFoundError` was raised by a remove copy operation. The logic
is that in case the source file does not exist this operation will fail every time.
Hence, it is not useful to trigger the exponential backoff mechanism and have this
operation fail multiple times.

However, when the parent folder of the target doesn't exist the remote copy command will
fail as well. In this case, the returned error was still an `OSError`, which is not
caught by the engine and logged as a warning. To obtain similar behaviour for these
two failures, we now parse the `stderr` returned by the `exec_command_wait_bytes`
method. In case it refers to a file or directory not being found, we raise a
`FileNoteFoundError` instead of a generic `OSError`.

Since this also deals with the case of a missing remote source file, we remove the check
in the `SshTransport.copy()` method.
In 6898ff4 the implementation of the processing of the
`local_copy_list` in the `upload_calculation` method was changed. Originally, the files
specified by the `local_copy_list` were first copied into the `SandboxFolder` before
copying its contents to the working directory using the transport. The commit allowed
the order in which the local and sandbox files were copied, so the local files were now
no longer copied through the sandbox. Rather, they were copied to a temporary directory
on disk, which was then copied over using the transport.

The problem is that this changed the behaviour of `upload_calculation` in several ways:

1. if the sandbox contained the directory `folder` and the `local_copy_list` contained
   the items `(_, 'file', './folder/file')` this would work just fine in the original
   implementation as the `file` would be written to the `folder` on the remote folder.
   The current implementation, however, would write the file content to `folder/file` in
   a local temporary directory, and then iterate over the directories and copy them
   over. Essentially it would be doing:

       Transport.put('/tmpdir/folder', 'folder')

   but since `folder` would already exist on the remote working directory the local
   folder would be _nested_ and so the final path on the remote would be
   `/workingdir/folder/folder/file`.

2. if the sandbox contained the directory `folder` and the `local_copy_list` tries to
   copy a file to that folder via e.g. `(_, 'file', 'folder')`, this would raise an
   `IsADirectoryError` in the original implementation. Instead, the current
   implementation creates a file named `folder` with the contents of `file` and then
   copies this into the `folder` directory.

3. if the sandbox contained the directory `folder` and the `local_copy_list` tries to
   copy a local folder in a `FolderData` to that directory via e.g.
   `(_, 'folder', 'folder')`, this would simply copy the _contents_ of that directory
   to `folder` in the original implementation. The current implementation creates a
   nested `folder/folder` hierarchy.

Although some of the new behaviour might be more intuitive and correspond closer to
that of `remote_copy_list` (e.g. [3] above), changing it would be a
backwards-incompatible change that could break existing plugins. Here, we adapt the
`_copy_local_files` function to replicate the original behaviour of
`upload_calculation`:

1. When copying a local directory, the contents of the temporary directory are copied
   to the remote using the `Transport.put` interface with `overwrite` set to `True`.
2. When copying a local directory, the parent directory is created on the remote,
   after which the single file is copied from the temporary directory to the remote.
3. In case the source type is `FileType.FILE` and the target is a directory, an
   `IsADirectoryError` is raised.
4. In case the source filename is specified and it is a directory that already exists
   on the remote, we avoid nested directories in the target path by setting the target
   filename to '.'. This means the contents of the node will be copied in the top level
   of the temporary directory, whose contents are then copied into the target directory.

To verify that the behaviour is unchanged versus the original implementation, we add a
large number of tests for various use cases of `upload_calculation` that have been run
successfully against the latest release tag (v2.5.1).

Co-Authored-By: Sebastiaan Huber <mail@sphuber.net>
@sphuber sphuber force-pushed the test/upload-calc-order-fix branch from 2446ede to c49b963 Compare June 4, 2024 20:22
@sphuber sphuber merged commit 7de136c into aiidateam:main Jun 4, 2024
16 checks passed
@sphuber
Copy link
Contributor

sphuber commented Jun 4, 2024

Fan-fucking-tastic @mbercx thanks a lot

@mbercx mbercx deleted the test/upload-calc-order-fix branch June 5, 2024 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants