Engine: implement functionality to import completed CalcJobs #5086

Merged

2 commits merged on Sep 13, 2021


@sphuber (Contributor) commented Aug 18, 2021

Fixes #1892
Implementation of this AEP.

Example of usage:

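```python
from aiida.engine import run
from aiida.orm import RemoteData, load_computer
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
importer = ArithmeticAddCalculation.get_importer()

# ``RemoteData`` pointing to the directory of a job that was run outside of AiiDA
remote_data = RemoteData('/some/absolute/path', computer=load_computer('computer'))

# Reconstruct the inputs from the files; passing ``remote_folder`` marks the job as an import
inputs = importer.parse_remote_data(remote_data)
inputs['remote_folder'] = remote_data
results, node = run.get_node(ArithmeticAddCalculation, **inputs)
assert node.is_imported
```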

This is as standardized as it can be, because the `parse_remote_data` method of each importer implementation will probably need custom keyword arguments. For the Python API, we could provide a simple wrapper function, for example:

```python
from typing import Any, Type

from aiida.engine import CalcJob, run
from aiida.orm import CalcJobNode, RemoteData


def import_calcjob(cls: Type[CalcJob], remote_data: RemoteData, **kwargs: Any) -> CalcJobNode:
    """Import the contents of a completed calcjob.

    :param cls: the ``CalcJob`` class to which the completed calcjob corresponds, e.g. ``ArithmeticAddCalculation``.
    :param remote_data: a ``RemoteData`` node that contains the input and output files of the completed job.
    :param kwargs: any additional keyword arguments that should be passed to the importer plugin.
    :return: the node representing the imported calcjob when successful.
    """
    importer = cls.get_importer()
    inputs = importer.parse_remote_data(remote_data, **kwargs)
    # The ``remote_folder`` input signals the engine to import instead of running a new job
    inputs['remote_folder'] = remote_data
    _, node = run.get_node(cls, **inputs)
    return node
```

However, because the importer must support arbitrary keyword arguments, it will be tricky to turn this into a CLI command. That would require some plugin system that dynamically defines the options based on the spec of the importer, as we do for the transport CLI.
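As a rough illustration of defining the options "based on the spec of the importer", here is a self-contained sketch. It derives command-line options from the keyword-only arguments of a `parse_remote_data` signature using `inspect.signature` and `argparse`, and it does not use AiiDA. The following are all assumptions made for this example:

- `ExampleImporter` and its `input_filename` and `num_retries` arguments;
- the convention that only keyword-only parameters with defaults become options.

```python
import argparse
import inspect
from typing import Any, Callable, Dict, List


def build_parser(func: Callable[..., Any]) -> argparse.ArgumentParser:
    """Create a CLI parser whose options mirror the keyword-only arguments of ``func``."""
    parser = argparse.ArgumentParser(description='Import a completed calculation job (sketch).')
    parser.add_argument('remote_path', help='Absolute path of the completed job on the computer.')
    for name, param in inspect.signature(func).parameters.items():
        # Only keyword-only parameters that have a default value become ``--options``.
        if param.kind != inspect.Parameter.KEYWORD_ONLY or param.default is inspect.Parameter.empty:
            continue
        # The option type is taken from the default value; booleans would need special handling.
        parser.add_argument(
            '--' + name.replace('_', '-'),
            dest=name,
            type=type(param.default),
            default=param.default,
            help=f'default: {param.default!r}',
        )
    return parser


class ExampleImporter:
    """Hypothetical importer; its keyword arguments are made up for illustration."""

    @staticmethod
    def parse_remote_data(remote_data: Any, *, input_filename: str = 'aiida.in', num_retries: int = 1) -> Dict[str, Any]:
        return {'remote_data': remote_data, 'input_filename': input_filename, 'num_retries': num_retries}


def main(argv: List[str]) -> Dict[str, Any]:
    parser = build_parser(ExampleImporter.parse_remote_data)
    kwargs = vars(parser.parse_args(argv))
    remote_path = kwargs.pop('remote_path')
    return ExampleImporter.parse_remote_data(remote_path, **kwargs)
```

For example, `main(['/some/absolute/path', '--num-retries', '3'])` passes `num_retries=3` and the default `input_filename='aiida.in'` to the importer. The trade-off is that option types and help texts can only be as informative as the signature itself.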

I think it would be great if we could release this with v2.0, as it is a very useful feature that some users have been waiting a long time for. Of course, plugin developers will need time to implement their importers, so the sooner we finish this implementation, the sooner we can help them prepare their plugins.

P.S.: this should probably be addressed in the AEP (and I will do so), but we should maybe discuss the naming of "Immigrator" and "immigrating". I have merely kept it because the original concept was implemented for `PwCalculation` and was called the `PwImmigrant`. I am not sure what the reasoning behind this naming was since I wasn't there (@giovannipizzi maybe you know), but maybe "importer/importing" would be better. It sounds more neutral and is also technically more correct. Immigrating, in my understanding at least, describes something moving as seen from the point of view of the place it is leaving, rather than the place it is moving towards. That said, I am not sure whether also calling this "importing" may cause confusion with the importing of archives. As I said, I will probably add this to the discussion on the AEP, but I thought I would mention it here as well.

@giovannipizzi (Member)

Thanks @sphuber!
I haven't looked at the implementation yet, but I think the current interface looks fine, and I like the wrapper (I understand the need for kwargs, and I think your suggestion works fine).

I'm not worried about the command line: I think each plugin can decide to create a custom CLI command that explicitly declares which kwargs are needed and then calls `import_calcjob`. In any case it's a small wrapper, and it's good to let plugin developers define it. With your implementation the wrapper will be minimal, and developers can define custom parameters and perform some validation or logic on the kwargs before passing them to `import_calcjob`.
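A minimal sketch of this per-plugin approach, using only `argparse`, is shown below. The command declares its options explicitly and validates them before handing off. To keep it runnable without an AiiDA profile, `import_calcjob` here is a stub that only echoes its arguments. The command name `import-example`, the `--input-filename` option and the absolute-path check are all hypothetical.

```python
import argparse
from typing import Any, Dict, List


def import_calcjob(remote_path: str, **kwargs: Any) -> Dict[str, Any]:
    """Stub standing in for the real wrapper; it only echoes what it receives."""
    return {'remote_path': remote_path, **kwargs}


def cli(argv: List[str]) -> Dict[str, Any]:
    """Hypothetical plugin-specific command with explicitly declared options."""
    parser = argparse.ArgumentParser(prog='import-example')
    parser.add_argument('remote_path', help='Absolute path of the completed job.')
    parser.add_argument('--input-filename', default='aiida.in', help='Name of the main input file.')
    args = parser.parse_args(argv)

    # Plugin-specific validation happens before anything reaches the importer;
    # ``parser.error`` prints the message and exits with status 2.
    if not args.remote_path.startswith('/'):
        parser.error('remote_path must be an absolute path')

    return import_calcjob(args.remote_path, input_filename=args.input_filename)
```

Compared with generating options from the importer signature, this keeps the help text and validation fully under the plugin developer's control, at the cost of writing one small command per plugin.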

Finally: the name was proposed by Eric Hontz, who first designed it for Quantum ESPRESSO and contributed it (for QE) to AiiDA back in 2014-2015 (to AiiDA core at the time; it was later moved into aiida-quantumespresso).
I admit I never liked it, so I would also vote for changing it, ideally to something different from "importing" to avoid confusion with AiiDA archives, but I'm not sure I have a great idea at the moment. Some random thoughts for a verb that could denote the action: inject, reconstruct, replicate, mirror, mock.
So, e.g., `get_replicator`? `get_reconstructor`?

sphuber force-pushed the feature/1892/calcjob-immigrant branch 2 times, most recently from ef97e1f to 7175a85 on August 25, 2021 12:26
@sphuber (Contributor, Author) commented Sep 2, 2021

Note that I have updated the AEP to officially change the naming from "immigrator" to "importer". I will hold off on adapting this implementation until we have discussed and approved this decision, to avoid unnecessary work should it be rejected.

sphuber force-pushed the feature/1892/calcjob-immigrant branch 4 times, most recently from 3957d62 to db45fb1 on September 9, 2021 08:59

@codecov (bot) commented Sep 9, 2021

Codecov Report

Merging #5086 (8910074) into develop (cce0e30) will increase coverage by 0.05%.
The diff coverage is 97.03%.


```diff
@@             Coverage Diff             @@
##           develop    #5086      +/-   ##
===========================================
+ Coverage    80.84%   80.88%   +0.05%
===========================================
  Files          534      536       +2
  Lines        36974    37057      +83
===========================================
+ Hits         29889    29971      +82
- Misses        7085     7086       +1
```
| Flag | Coverage Δ |
|---|---|
| django | 75.72% <97.03%> (+0.08%) ⬆️ |
| sqlalchemy | 74.82% <97.03%> (+0.06%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| `aiida/engine/__init__.py` | 100.00% <ø> (ø) |
| `aiida/engine/processes/__init__.py` | 100.00% <ø> (ø) |
| `aiida/plugins/__init__.py` | 100.00% <ø> (ø) |
| `aiida/plugins/factories.py` | 92.19% <88.89%> (+0.53%) ⬆️ |
| `aiida/calculations/importers/arithmetic/add.py` | 95.00% <95.00%> (ø) |
| `aiida/engine/processes/calcjobs/calcjob.py` | 89.81% <98.19%> (+1.10%) ⬆️ |
| `aiida/calculations/arithmetic/add.py` | 100.00% <100.00%> (ø) |
| `aiida/engine/launch.py` | 97.73% <100.00%> (ø) |
| `aiida/engine/processes/calcjobs/__init__.py` | 100.00% <100.00%> (ø) |
| `aiida/engine/processes/calcjobs/importer.py` | 100.00% <100.00%> (ø) |

... and 3 more


Legend: `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
Powered by Codecov. Last update cce0e30...8910074.

sphuber changed the title from "Engine: implement functionality to immigrate completed CalcJobs" to "Engine: implement functionality to import completed CalcJobs" on Sep 13, 2021

When people start using AiiDA, they typically already have many
calculation jobs that were completed without AiiDA, and they wish to
import these so that they can be included in the provenance graph
along with the future calculations they will run through AiiDA.

This concept was originally implemented for the `PwCalculation` in the
`aiida-quantumespresso` plugin and worked, but the approach required a
separate `CalcJob` implementation for each existing `CalcJob` class that
one might want to import.

Here we implement a generic mechanism directly in `aiida-core` that
allows any `CalcJob` implementation to import already completed jobs.
The calculation job is launched just as one would launch a normal one
through AiiDA, except that one additional input is passed: a
`RemoteData` instance under the name `remote_folder` that contains the
output files of the completed calculation. This name is deliberately
the same as that of the `RemoteData` that the engine normally creates
during a regular calculation job run.

When the engine detects this input, it skips the normal sequence of
transport tasks: it simply performs the presubmit and then goes
straight to the "retrieve" step, where it retrieves the files from the
provided `RemoteData` as if they had just been produced by an actual
run. In this way, the process is executed almost exactly like a normal
run, except that the job itself is never actually executed.
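To make the branching concrete, here is a simplified, self-contained model of it that does not use AiiDA. The step names only loosely mirror the engine's upload, submit, update and retrieve tasks. Two further simplifications are assumptions of this sketch: `presubmit` is treated as a separate first step, and parsing is modeled as conditional on a `metadata.options.parser_name` input.

```python
from typing import Any, Dict, List

# Simplified model of the transport tasks a normal calculation job goes through.
NORMAL_STEPS = ['presubmit', 'upload', 'submit', 'update', 'retrieve']
# An imported job already ran, so only the presubmit and the retrieval remain.
IMPORT_STEPS = ['presubmit', 'retrieve']


def plan_steps(inputs: Dict[str, Any]) -> List[str]:
    """Return the steps a job would go through for the given inputs (simplified model)."""
    if 'remote_folder' in inputs:
        # Import: nothing needs to be uploaded, submitted or monitored.
        steps = list(IMPORT_STEPS)
    else:
        steps = list(NORMAL_STEPS)
    # Parsing is modeled as happening only when a parser was specified.
    parser_name = inputs.get('metadata', {}).get('options', {}).get('parser_name')
    if parser_name:
        steps.append('parse')
    return steps


if __name__ == '__main__':
    print(plan_steps({'x': 1, 'y': 2}))
    print(plan_steps({'x': 1, 'y': 2, 'remote_folder': '/some/absolute/path'}))
```

The design point the model captures is that an import reuses the tail of the normal pipeline instead of defining a separate process, which is why the rest of the job (retrieval, parsing, provenance) behaves like a regular run.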

The `CalcJobImporter` class is added, which defines a single abstract
static method, `parse_remote_data`. The idea is that plugins can define
an importer for a `CalcJob` implementation by implementing this method.
The method takes a `RemoteData` node that points to a path on the
associated computer containing the input and output files of a
calculation that was run outside of AiiDA, but with the executable that
is normally run by this particular `CalcJob`.
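A self-contained sketch of what an importer could look like follows. It assumes nothing from AiiDA, and the following names are all hypothetical illustrations:

- `ImporterBase` mimics a base class with a single abstract static method.
- `LocalFolder` stands in for `RemoteData` and simply wraps a local directory.
- `AddImporter` parses the two integers of an `x + y` job.
- The `input_filename` keyword argument is made up for the example.

The input-file format, a line such as `echo $((4 + 5))` in a file named `aiida.in`, is an assumption about what the arithmetic add job writes.

```python
import re
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class ImporterBase(ABC):
    """Stand-in for an importer base class with one abstract static method."""

    @staticmethod
    @abstractmethod
    def parse_remote_data(remote_data: Any, **kwargs: Any) -> Dict[str, Any]:
        """Return the inputs that would reproduce the job found in ``remote_data``."""


class LocalFolder:
    """Hypothetical stand-in for ``RemoteData``: wraps a local directory."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)


class AddImporter(ImporterBase):
    """Toy importer for a job whose input file contains e.g. ``echo $((4 + 5))``."""

    @staticmethod
    def parse_remote_data(remote_data: LocalFolder, **kwargs: Any) -> Dict[str, Any]:
        # The input filename can be overridden through a hypothetical keyword argument.
        filename = kwargs.get('input_filename', 'aiida.in')
        content = (remote_data.path / filename).read_text()
        match = re.search(r'echo \$\(\((-?\d+) \+ (-?\d+)\)\)', content)
        if match is None:
            raise ValueError(f'could not parse `x` and `y` from: {content!r}')
        return {'x': int(match.group(1)), 'y': int(match.group(2))}


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, 'aiida.in').write_text('echo $((4 + 5))\n')
        print(AddImporter.parse_remote_data(LocalFolder(tmp)))  # {'x': 4, 'y': 5}
```

Marking the static method as abstract only guards instantiation: a subclass that forgets to implement it raises `TypeError` when instantiated. A complete importer can be used directly on the class, as in the usage lines.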

The `parse_remote_data` implementation should read the input files
found in the remote data and parse their content into the input nodes
that, when used to launch the calculation job, would produce similar
input files. These inputs, together with the `RemoteData` as the
`remote_folder` input, can then be used to run an instance of this
particular `CalcJob`. The engine recognizes the `remote_folder` input as
signalling an import job: instead of running a normal job that creates
the input files on the remote and submits it to the scheduler, it
passes straight to the retrieve step. This retrieves the files from the
`RemoteData` as if they had been created by the job itself. If a parser
was defined in the inputs, the retrieved contents are parsed and the
returned output nodes are attached.

The `CalcJobImporter` can be loaded through its entry point name using
the `CalcJobImporterFactory`, just as the entry points of all other
entry point groups have their associated factory. As a shortcut, the
`CalcJob` class provides the `get_importer` class method, which will
attempt to load the `CalcJobImporter` class registered under exactly the
same entry point name as the `CalcJob` class. Alternatively, the caller
can specify the desired entry point name should it not correspond to
that of the `CalcJob` class.
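The lookup logic can be sketched with a plain dictionary standing in for the entry point group. Everything here is hypothetical: `IMPORTER_REGISTRY`, `register_importer`, `importer_factory`, `CalcJobSketch` and the `arithmetic.add.alternative` name. The sketch only illustrates how `get_importer` falls back from an explicit entry point name to the calculation's own one; it is not how aiida-core resolves entry points.

```python
from typing import Callable, Dict, Optional, Type


class ImporterBase:
    """Placeholder base class for importers in this sketch."""


# Hypothetical registry standing in for the entry point group of importers.
IMPORTER_REGISTRY: Dict[str, Type[ImporterBase]] = {}


def register_importer(name: str) -> Callable[[Type[ImporterBase]], Type[ImporterBase]]:
    """Register an importer class under an entry point name."""
    def decorator(cls: Type[ImporterBase]) -> Type[ImporterBase]:
        IMPORTER_REGISTRY[name] = cls
        return cls
    return decorator


def importer_factory(name: str) -> Type[ImporterBase]:
    """Load an importer class by entry point name, in the spirit of a plugin factory."""
    try:
        return IMPORTER_REGISTRY[name]
    except KeyError:
        raise ValueError(f'no importer registered under the entry point `{name}`') from None


class CalcJobSketch:
    """Stand-in for a calculation class that knows its own entry point name."""

    entry_point_name: Optional[str] = None

    @classmethod
    def get_importer(cls, entry_point_name: Optional[str] = None) -> Type[ImporterBase]:
        # Fall back to the entry point name of the calculation class itself.
        name = entry_point_name if entry_point_name is not None else cls.entry_point_name
        if name is None:
            raise ValueError(f'{cls.__name__} has no entry point: pass `entry_point_name` explicitly')
        return importer_factory(name)


@register_importer('arithmetic.add')
class AddImporter(ImporterBase):
    """Importer registered under the same name as the calculation."""


@register_importer('arithmetic.add.alternative')
class AlternativeAddImporter(ImporterBase):
    """Importer registered under a different, hypothetical name."""


class AddCalculation(CalcJobSketch):
    entry_point_name = 'arithmetic.add'
```

With these definitions, `AddCalculation.get_importer()` resolves to `AddImporter`, while `AddCalculation.get_importer('arithmetic.add.alternative')` resolves to `AlternativeAddImporter`.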

To test the functionality, a `CalcJobImporter` is implemented for the
`ArithmeticAddCalculation` class.
sphuber force-pushed the feature/1892/calcjob-immigrant branch from db45fb1 to 8910074 on September 13, 2021 14:53
sphuber merged commit 22d4e2e into aiidateam:develop on Sep 13, 2021
sphuber deleted the feature/1892/calcjob-immigrant branch on September 13, 2021 15:33
Successfully merging this pull request may close these issues: Add ImmigrantJobProcess

2 participants