New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
node definition loops and caching #389
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…tries. Use overwrite=False or update=False to raise an exception instead of overwriting.
_cache_mode is removed (it is never used and not settable) references to node._cache_mode are removed (it does not exist) repr added error handling is cleaned up, including some typos units added for CacheCtrl cache_stores property added
…zed traits when using the cache or otherwise accessing the node definition. NodeDefinitionError is added A guard is added to detect circular dependencies in the node definition, which can cause an infinite loop or a cache deadlock. A NodeDefinitionError exception is raised. A guard is added to detect if the node definition is accessed before the traits are fully initialized, which can cause traitlets to use a trait default_value or default method even if an argument has been passed to the constructor. This is due in part to our __init__ method and handling of some traits prior to calling the super tl.HasTraits __init__. I also saw tl.default methods called multiple times with different values for other traits in each call, and other subtly weird behavior, especially if an exception is caught and handled. A NodeDefinition exception is raised. The super tl.HasTraits __init__ is moved within the hold trait notifications context during Node __init__, which helps mitigate some of the issues above, especially with regard to accessing the cache during trait validation. The Node caching methods catch the NodeDefinitionError and add an informative error message, including the key. We discussed that has_cache can return False, get_cache can just return, etc in these cases, but note that when these exceptions are raised, the line calling the cache will *never* succeed, so I think raising an exception is appropriate. The LoadFileMixin dataset was the primary culprit triggering these kinds of errors because the data_key default and validation required the dataset, but the dataset caching required the data_key. In order to actually cache the dataset, it is now cached using a stand-in "dataset_caching_node" that only includes the source attr. Note that this should actually improve cache usage here because the binary dataset data is the same regardless of the data_key (or other datasource attributes, such as interpolation). These updates revealed a small bug in the WCS node base_ref when no layer_name is supplied. We also discussed omitting node attributes that are using the default value from the node definition (and adding the podpac version to the definition) as a way to mitigate these issues, but I believe that may no longer be necessary.
jmilloy
added
bug
Something isn't working
enhancement
New feature or request
testing
fixes-and-maintenance
labels
Apr 15, 2020
mpu-creare
approved these changes
Apr 15, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I do suspect we'll run into strange errors so we'll have to test the notebooks early.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
NodeDefinitionError
when a circular node definition is detectedNodeDefinitionError
when the node definition is accessed before the node traits are fully initializedNodeDefinitionError
in node caching methods and add a more informative message, including thekey
LoadFileMixin
caches thedataset
independent of the other node attrs (such asinterpolation
ordata_key
).Node.__init__
calls thetl.HasTraits.__init__
within thehold_trait_notifications
context in order to avoid calling some validation methods before other traits have even ben initialized.The circular node definition handling prevents the cache deadlock. Then I had to change
LoadFileMixin
dataset caching in order to avoid hitting those exceptions, and it is better now. As a result, we do not have to remove the lock inhas_cache
if we don't want to.Note also that I did not return
False
inhas_cache
when theNodeDefinitionError
occurs, as we discussed. If that error is reached, the code trying to use the cache will never succeed. It is not like it is going to succeed in another state. The user should just not be caching that object, or needs to find another way/place to cache it. (Same forput_cache
, etc).Note also that I did not have to omit default traits from the node definition, so I did not yet add the podpac version to the definition.
The meat of this PR is in commit a425106.