node definition loops and caching #389

Merged
jmilloy merged 4 commits into develop from feature/definition-loops-and-caching on Apr 15, 2020


@jmilloy (Collaborator) commented Apr 15, 2020

  • raise a NodeDefinitionError when a circular node definition is detected
  • raise a NodeDefinitionError when the node definition is accessed before the node traits are fully initialized
  • catch NodeDefinitionError in node caching methods and add a more informative message, including the key
  • cache put methods overwrite/update the cache by default
  • LoadFileMixin caches the dataset independent of the other node attrs (such as interpolation or data_key).
  • Node.__init__ calls the tl.HasTraits.__init__ within the hold_trait_notifications context in order to avoid calling some validation methods before other traits have even been initialized.
  • CacheCtrl is cleaned up, with bugfixes and tests
  • small bugfixes in CSV and WCS nodes

The circular node definition handling prevents the cache deadlock. Then I had to change LoadFileMixin dataset caching in order to avoid hitting those exceptions, and it is better now. As a result, we do not have to remove the lock in has_cache if we don't want to.

Note also that I did not return False in has_cache when the NodeDefinitionError occurs, as we discussed. If that error is reached, the code trying to use the cache will never succeed. It is not like it is going to succeed in another state. The user should just not be caching that object, or needs to find another way/place to cache it. (Same for put_cache, etc).

Note also that I did not have to omit default traits from the node definition, so I did not yet add the podpac version to the definition.

The meat of this PR is in commit a425106.

…tries. Use overwrite=False or update=False to raise an exception instead of overwriting.
_cache_mode is removed (it is never used and not settable)
references to node._cache_mode are removed (it does not exist)
repr added
error handling is cleaned up, including some typos
units added for CacheCtrl
cache_stores property added
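
A minimal sketch of the overwrite-by-default behavior described in this commit. `DictCacheStore` and `CacheException` are hypothetical names made up for illustration; this is not the podpac CacheCtrl implementation, only the shape of the default.

```python
class CacheException(Exception):
    """Hypothetical error type used only in this sketch."""


class DictCacheStore:
    """Toy in-memory store; an assumption, not podpac's CacheCtrl."""

    def __init__(self):
        self._data = {}

    def put(self, key, value, overwrite=True):
        # Overwrite by default; refuse only when the caller opts out.
        if not overwrite and key in self._data:
            raise CacheException("Cache entry already exists for key %r" % key)
        self._data[key] = value

    def get(self, key):
        return self._data[key]


store = DictCacheStore()
store.put("output", 1)
store.put("output", 2)  # default: silently replaces the existing entry
```

Passing `overwrite=False` to the second `put` would raise instead of replacing the entry.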
…zed traits when using the cache or otherwise accessing the node definition.

NodeDefinitionError is added
A guard is added to detect circular dependencies in the node definition, which can cause an infinite loop or a cache deadlock. A NodeDefinitionError exception is raised.
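
One way such a guard can work, sketched with a re-entrancy flag. `SketchNode` and `_building_definition` are hypothetical; the actual check in podpac/core/node.py may be implemented differently.

```python
class NodeDefinitionError(Exception):
    """Stand-in for the exception added in this PR."""


class SketchNode:
    """Toy node whose definition embeds the definitions of its inputs."""

    def __init__(self, name):
        self.name = name
        self.inputs = []
        self._building_definition = False  # hypothetical re-entrancy flag

    @property
    def definition(self):
        if self._building_definition:
            # We came back to this node while still building its definition.
            raise NodeDefinitionError(
                "node definition has a circular dependency: %s" % self.name
            )
        self._building_definition = True
        try:
            return {
                "node": self.name,
                "inputs": [n.definition for n in self.inputs],
            }
        finally:
            # Always reset so the node stays usable after an error.
            self._building_definition = False


a = SketchNode("a")
b = SketchNode("b")
a.inputs.append(b)
acyclic = a.definition  # fine: a -> b
b.inputs.append(a)      # introduces the loop a -> b -> a
```

After the last line, reading `a.definition` raises `NodeDefinitionError` instead of recursing until the interpreter gives up.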
A guard is added to detect if the node definition is accessed before the traits are fully initialized, which can cause traitlets to use a trait default_value or default method even if an argument has been passed to the constructor. This is due in part to our __init__ method and its handling of some traits prior to calling the super tl.HasTraits __init__. I also saw tl.default methods called multiple times, with different values for other traits in each call, and other subtly weird behavior, especially if an exception is caught and handled. A NodeDefinitionError exception is raised.
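
A rough illustration of the initialization guard, written without traitlets so it stays self-contained. The `_traits_initialized` flag and both classes are assumptions for this sketch; a property setter stands in for a trait validator that touches the definition too early.

```python
class NodeDefinitionError(Exception):
    """Stand-in for the exception added in this PR."""


class SketchNode:
    def __init__(self, **attrs):
        self._traits_initialized = False  # hypothetical flag name
        for name, value in attrs.items():
            setattr(self, name, value)
        self._traits_initialized = True

    @property
    def definition(self):
        if not self._traits_initialized:
            raise NodeDefinitionError(
                "node definition accessed before the node is fully initialized"
            )
        # Public attributes only; private bookkeeping is left out.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}


class EagerNode(SketchNode):
    """Misbehaving subclass: its setter reads the definition during __init__."""

    @property
    def source(self):
        return self._source

    @source.setter
    def source(self, value):
        self.definition  # premature access, trips the guard
        self._source = value
```

Constructing `EagerNode(source="a.nc")` fails loudly, rather than quietly producing a definition built from half-initialized attributes.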
The super tl.HasTraits __init__ is moved within the hold trait notifications context during Node __init__, which helps mitigate some of the issues above, especially with regard to accessing the cache during trait validation.
The Node caching methods catch the NodeDefinitionError and add an informative error message, including the key. We discussed that has_cache could return False, get_cache could just return, etc. in these cases, but note that when these exceptions are raised, the line calling the cache will *never* succeed, so I think raising an exception is appropriate.
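
The re-raise pattern could look roughly like this. The `hash` property, the `definition_ok` switch, and the message wording are assumptions for the sketch, not the text used in podpac.

```python
class NodeDefinitionError(Exception):
    """Stand-in for the exception added in this PR."""


class SketchNode:
    def __init__(self, definition_ok=True):
        self._definition_ok = definition_ok  # toggles the failure for the demo
        self._cache = {}

    @property
    def hash(self):
        # In this sketch the cache key depends on the node definition.
        if not self._definition_ok:
            raise NodeDefinitionError(
                "node definition accessed before the node is fully initialized"
            )
        return "abc123"

    def has_cache(self, key):
        try:
            node_hash = self.hash
        except NodeDefinitionError as e:
            # Re-raise with the cache key so the failing call is easy to find.
            raise NodeDefinitionError(
                "Cache unavailable for key '%s': %s" % (key, e)
            ) from e
        return (node_hash, key) in self._cache
```

Returning False here would hide a call site that can never hit the cache, which is the argument for raising.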
The LoadFileMixin dataset was the primary culprit triggering these kinds of errors because the data_key default and validation required the dataset, but the dataset caching required the data_key. In order to actually cache the dataset, it is now cached using a stand-in "dataset_caching_node" that only includes the source attr. Note that this should actually improve cache usage here because the binary dataset data is the same regardless of the data_key (or other datasource attributes, such as interpolation).
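
A toy version of the stand-in idea: the dataset is cached under a helper node that knows only the source, so nodes that differ in data_key or interpolation share one cached dataset. Everything here (`DatasetCachingNode`, `SketchFileNode`, the module-level `_CACHE` and `LOADS`) is hypothetical scaffolding, not the LoadFileMixin code.

```python
_CACHE = {}  # toy cache keyed by (definition, key)
LOADS = []   # records each time a file is "opened"


class DatasetCachingNode:
    """Stand-in node whose definition includes only the source."""

    def __init__(self, source):
        self.source = source

    @property
    def definition(self):
        return (("source", self.source),)

    def has_cache(self, key):
        return (self.definition, key) in _CACHE

    def get_cache(self, key):
        return _CACHE[(self.definition, key)]

    def put_cache(self, data, key):
        _CACHE[(self.definition, key)] = data


class SketchFileNode:
    def __init__(self, source, data_key=None, interpolation="nearest"):
        self.source = source
        self.data_key = data_key
        self.interpolation = interpolation

    def _open(self):
        LOADS.append(self.source)  # pretend to read the file
        return {"source": self.source}

    @property
    def dataset(self):
        # Cache through the stand-in, never through self, so data_key and
        # interpolation cannot affect (or be needed for) the cache key.
        caching_node = DatasetCachingNode(self.source)
        if caching_node.has_cache("dataset"):
            return caching_node.get_cache("dataset")
        data = self._open()
        caching_node.put_cache(data, "dataset")
        return data


a = SketchFileNode("f.nc", data_key="x")
b = SketchFileNode("f.nc", data_key="y", interpolation="bilinear")
shared = a.dataset is b.dataset  # both resolve to the same cached object
```

Because the stand-in never reads data_key, the data_key default can safely depend on the dataset without looping back into the cache.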
These updates revealed a small bug in the WCS node base_ref when no layer_name is supplied.
We also discussed omitting node attributes that are using the default value from the node definition (and adding the podpac version to the definition) as a way to mitigate these issues, but I believe that may no longer be necessary.
@jmilloy added the bug, enhancement, testing, and fixes-and-maintenance labels Apr 15, 2020
@jmilloy requested a review from mpu-creare April 15, 2020 16:31
@jmilloy self-assigned this Apr 15, 2020
@jmilloy added this to In progress in 2.0.0 Release via automation Apr 15, 2020
@coveralls

Coverage Status

Coverage remained the same at 0.0% when pulling e4c4a18 on feature/definition-loops-and-caching into 6d774e6 on develop.

@mpu-creare (Contributor) left a comment

Nice work. I do suspect we'll run into strange errors so we'll have to test the notebooks early.

podpac/core/node.py (2 resolved review threads)
@jmilloy merged commit d03c52e into develop Apr 15, 2020
2.0.0 Release automation moved this from In progress to Done Apr 15, 2020
@jmilloy deleted the feature/definition-loops-and-caching branch April 15, 2020 18:49