
Hashing, caching and fast-forwarding #652

Merged · 122 commits · Feb 9, 2018

Conversation

@greschd (Member) commented Aug 17, 2017

Fixes #119.

Edit / Note: The description below is somewhat out of date, as it describes the initial state of the PR. For a current description, either follow the changes proposed and discussed subsequently, or refer to the documentation added in this PR.

The hashing, caching and fast-forwarding project is a work in progress. Since it is nearing completion, I'm creating this PR to start collecting feedback.

Hashing

The hashing.make_hash function creates a hash from a given Python data structure. Nested structures are handled by recursive hashing, meaning that, for example, a dict is turned into

{key: make_hash(value) for key, value in dict.items()}

before hashing it. AiiDA Folder data can also be hashed by recursively taking the hash of filenames and contents. For floating-point numbers, the last 4 bits of the mantissa are truncated before creating the hash. However, this does not apply to ArrayData, where the array is saved as a file on disk and is therefore handled by the Folder hashing.
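The recursive hashing and the mantissa truncation described above can be sketched in plain Python. This is a simplified illustration of the idea, not the actual hashing.make_hash implementation:

```python
import hashlib
import struct

def truncate_float(value: float) -> bytes:
    # Clear the last 4 bits of the 52-bit mantissa so that tiny
    # floating-point noise does not change the hash.
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    return struct.pack("<Q", bits & ~0xF)

def make_hash(obj) -> str:
    if isinstance(obj, dict):
        # Hash each value recursively, then hash the sorted (key, hash) pairs.
        items = sorted((str(k), make_hash(v)) for k, v in obj.items())
        payload = repr(items).encode()
    elif isinstance(obj, (list, tuple)):
        payload = repr([make_hash(v) for v in obj]).encode()
    elif isinstance(obj, float):
        payload = truncate_float(obj)
    else:
        payload = repr(obj).encode()
    return hashlib.sha256(payload).hexdigest()
```

With this scheme, two floats that differ only in the last bits of the mantissa produce the same hash, which is the point of the truncation.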

Using the make_hash function, the Node class gains a get_hash method. For basic nodes, this method creates a hash from get_attrs(), the database folder, and the __version__ of the module where the Node class is defined. If an error occurs, get_hash returns None.

Node subclasses can define a class-level _hash_ignored_attributes list with names of attributes that will not be taken into account when creating the hash. Edit: _updatable_attributes are also ignored now.

The WorkCalculation subclass adds get_inputs() to the objects that are hashed in get_hash, and exposes a _hash_ignored_inputs list to ignore certain inputs.
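How _hash_ignored_attributes could filter the attributes before hashing might look roughly like this; the FakeNode and TimestampedNode classes here are purely illustrative stand-ins, not the real AiiDA classes:

```python
import hashlib
import json

class FakeNode:
    # Illustrative stand-in for an AiiDA Node; not the real class.
    _hash_ignored_attributes = []

    def __init__(self, **attrs):
        self._attrs = attrs

    def get_attrs(self):
        return dict(self._attrs)

    def get_hash(self):
        # Drop ignored attributes, then hash a canonical representation.
        attrs = {
            key: value for key, value in self.get_attrs().items()
            if key not in self._hash_ignored_attributes
        }
        payload = json.dumps(attrs, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

class TimestampedNode(FakeNode):
    # A subclass excludes a volatile attribute from its hash.
    _hash_ignored_attributes = ["last_accessed"]
```

Two nodes differing only in an ignored attribute then hash identically, so they can be matched in the cache.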

Caching

When a Node is stored, its hash is saved as an extra. When subsequent nodes are stored, caching can be enabled by passing use_cache=True to the store method: if a stored node of the same type and hash already exists, it is returned; otherwise a new node is created.

Nodes can determine whether they may be used as a cache by implementing _is_valid_cache. This is used to avoid using failed calculations as a cache.
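The store-time caching and the _is_valid_cache check could be sketched with a hypothetical in-memory registry (this is not the actual AiiDA store logic, just the shape of the idea):

```python
class FakeCalc:
    # Minimal illustrative stand-in for a calculation node.
    def __init__(self, hash_value, failed=False):
        self._hash = hash_value
        self.failed = failed

    def get_hash(self):
        return self._hash

    def _is_valid_cache(self):
        # Failed calculations must never be served from the cache.
        return not self.failed

_registry = {}  # hash -> stored node

def store(node, use_cache=False):
    node_hash = node.get_hash()
    cached = _registry.get(node_hash)
    if use_cache and cached is not None and cached._is_valid_cache():
        # Cache hit: return the equivalent, already-stored node.
        return cached
    _registry[node_hash] = node
    return node
```

A failed calculation stays in the registry but is skipped by the validity check, so a fresh node is stored instead.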

The verdi rehash command can be used to re-calculate the hash either for all nodes, or just for a specific class.

Fast-forwarding

Fast-forwarding is implemented by using caching for the calculation of a given Process. If fast-forwarding is enabled and such a calculation already exists, the finished calculation is used, the remaining steps of running the process are skipped, and the process completes immediately.

The process decides whether fast-forwarding is used based on the _fast_forward_enabled method. There are two options to set this:

  • First priority: Passing the _fast_forward=True / False input to the process.
  • Second priority: A config file cache_config.yml in the .aiida folder. A default can be set, and fast-forwarding for specific calculation / process classes can be enabled or disabled (by class name). An example file would look like this:
my_profile_name:
  use_cache:  # could be omitted (it's the default)
    default: False
  fast_forward:
    default: False  # could be omitted (it's the default)
    enabled:
      - SomeCalculation
      - OtherCalculation
      - SomeWorkChainClass

Since fast-forwarding is implemented at the level of Process, it does not work for InlineCalculation, nor for a JobCalculation that is not launched via work.run.run or work.run.submit.
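The two-level priority described above could resolve roughly like this; the function name, the explicit keyword, and the "disabled" key are assumptions for illustration, not the actual API:

```python
def fast_forward_enabled(class_name, explicit=None, profile_config=None):
    # First priority: the _fast_forward input passed to the process.
    if explicit is not None:
        return explicit
    # Second priority: the fast_forward section of cache_config.yml
    # for the current profile.
    section = (profile_config or {}).get("fast_forward", {})
    if class_name in section.get("enabled", []):
        return True
    if class_name in section.get("disabled", []):
        return False
    return section.get("default", False)
```

An explicitly passed value always wins over the config file, which in turn overrides the default.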

To-Do's

EDIT: Updated To-Do's after a discussion with Giovanni.

  • Also ignore _updatable_attributes in the hash, in addition to _hash_ignored_attributes.
  • Check if only WorkCalculation adds the inputs to the hash, or if JobCalculation also does it. If it doesn't, this needs to be fixed. Update: This is implemented in _get_objects_to_hash of the AbstractCalculation, meaning that it works for both WorkCalculation and JobCalculation.
  • Check that plum is compatible with the caching, and change .travis.yml to use a release version. Update: Plum version "0.7.11" (not merged) is compatible, but the goal is to merge with plum >= 0.9.
  • verdi rehash should take PKs as variadic arguments. Add a -a/--all option to rehash all nodes.
  • Avoid ambiguity in cache_config.yml by using the full type string for cached classes.
  • Document that _is_valid_cache can be used to permanently disable fast-forwarding on a calculation class. Also add some tests using this, e.g. to have a calculation which creates random output. Update: To disable caching on a class level, the _cacheable attribute was added. This avoids doing a hash query when the class cannot be cached at all. This does not mean that the nodes don't have a hash, though.
  • Copy nodes instead of just returning them when there is a cache hit. This means that caching will be useless in terms of memory use, but allows for the fast-forwarding functionality and retains the integrity of the graph. Nodes that are copied should get a cached_from extra, as a stand-in for the LinkType.CACHE we might want to add if lazy / reference copying is implemented.
  • Document what ignored_folder_content (in make_hash) does. Update: This is exposed to the get_hash() method, and is used to ignore the raw_input for the JobCalculation.
  • For Folder hashing, handle symbolic links separately. The link target should be hashed instead of the file content, with ps as type string / salt.
  • Add documentation
  • Make _get_config() configurable through a keyword argument giving the file path.
  • Add tests for caching configuration.
  • Make setting the caching default work, or remove it completely.
  • Merge the notions of use_cache and fast_forward.
  • Fix and add tests for caching JobCalculation instances.
  • Add "advanced" section to the documentation.
  • Add has_failed and has_finished methods to AbstractCalculation, implement in InlineCalculation.
  • Take Computer uuid into account when creating the Node hash.

lekah and others added 30 commits June 1, 2017 18:00
…ckend) that replaces the _dbnode member if a similar node already exists
…None for hash, which obviously should not be checked in the DB
* Failing test for two ArrayData with unequal content
* (Accidentally) passing test for two ArrayData of different size, with same str representation

For the ArrayData, we need to take the actual array into account when creating the hash,
not just the shape, which is returned by get_attrs().
Close files after reading them to create the hash.
The caching.defaults.use_cache parameter should be used by
plugin developers to mark whether a specific .store() call
CAN actually use caching. The user decides whether to use it
in the end by setting the default to True / False.
@greschd (Member, Author) commented Feb 1, 2018

@sphuber @giovannipizzi I've gone through all your comments now. Please check the changes and let me know if there's something else to be changed.

sphuber (Contributor) previously approved these changes Feb 2, 2018 and left a comment:

I am happy with the changes. Thanks again for an amazing job Dominik

@greschd (Member, Author) commented Feb 2, 2018

My pleasure. Thanks for all your help!

@giovannipizzi (Member) commented:

I'm not sure where @greschd wrote the comment on all attributes being updatable now... anyway, it's a big bug that was undiscovered and I opened #1109 for that.

@giovannipizzi (Member) left a comment:

Hi Dominik, thanks a lot! I'm approving and merging this, so we also have more people testing it and giving feedback for potential issues.
I think that there are two more things to do after this

  • Check and fix Attributes of a calculation can be changed! #1109
  • Discuss whether it is a good idea to store the hash in an extra, or whether we should have a separate internal table or a column in DbNode. The reason is that now all tests have to assume that there is an additional extra, which to me is a bit of a strange assumption. Another way would be to decide that all extras starting with aiida are not shown by default by get_extra, with these methods gaining an additional show_internals=True flag, and set_extra complaining if one tries to store something starting with aiida (unless an allow_internal flag, False by default, is passed to allow it). I'm not sure it's a great idea, though; maybe it just complicates the logic a lot, and a new column is the simplest option?

@giovannipizzi giovannipizzi merged commit f593ca5 into aiidateam:develop Feb 9, 2018
@greschd (Member, Author) commented Feb 9, 2018

Cool, thanks for merging!

For the _aiida_hash and _aiida_cached_from extras, I think we discussed at some point adding a column for the hash, and a link type for cached_from. I don't know how much effort both of these changes are, though. Probably we will want to use "internal" extras again in the future when developing new features, so maybe adding the logic to hide the _aiida_* extras still makes sense?

sphuber added a commit to sphuber/aiida-core that referenced this pull request Feb 22, 2018
…utes

This was already done in PR aiidateam#652 that was merged into develop but
needs to be done in this branch as well for consistency, which will
be merged into the v0.11.1 patch release
@greschd greschd deleted the node-hashing branch April 11, 2018 15:16
Merging this pull request closed the issue "Hashing of nodes" (#119). 4 participants.