Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checks for multioutput plugins and loop dependencies #312

Merged
merged 3 commits into from Sep 9, 2020

Conversation

darrylmasson
Copy link
Contributor

@darrylmasson darrylmasson commented Sep 2, 2020

What is the problem / what does the code in this PR do

If the first dependency of a loop plugin is produced by a multi-output plugin, the automatic determination of loop_over fails. During the automatic determination of loop order, loop_over = self.deps[self.depends_on[0]].data_kind (plugin.py L610) evaluates to a dictionary rather than a string, and this fails 4 lines down when it dereferences kwargs.

Can you briefly describe how it works?

This PR checks to see if the data_kind of the plugin providing the first dependency is a multi-output plugin and asks the user to be more specific if it is. Fixes #311 .

@WenzDaniel
Copy link
Collaborator

Looks good, maybe here an additional MWE since it took me quite some time to get deps initialized:

plugins = st._get_plugins(targets=('events',), run_id='0')  # Generates plugins up to events
loop_over = plugins['records'].deps[plugins['records'].depends_on[0]].data_kind

returns a dict

plugins = st._get_plugins(targets=('events',), run_id='0')  # Generates plugins up to events
loop_over = plugins['records'].deps[plugins['records'].depends_on[0]].data_kind
loop_over[plugins['records'].depends_on[0]]

returns 'raw_records'

strax/plugin.py Outdated
@@ -608,6 +608,9 @@ def compute(self, **kwargs):
loop_over = self.loop_over
else:
loop_over = self.deps[self.depends_on[0]].data_kind
if isinstance(loop_over, dict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to changes this line in isinstance(loop_over, (dict, immutabledict)) since the DAQReader has an immutable dict as data_kind. See

plugins = st._get_plugins(targets=('events',), run_id='0')
plugins['records'].deps[plugins['records'].depends_on[0]].data_kind

strax/plugin.py Outdated
@@ -608,6 +608,9 @@ def compute(self, **kwargs):
loop_over = self.loop_over
else:
loop_over = self.deps[self.depends_on[0]].data_kind
if isinstance(loop_over, dict):
loop_over = loop_over[self.depends_on[0]]
# We can't have nested dictionaries here, can we?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you showed, that we cannot. So maybe remove this comment.

Copy link
Member

@JoranAngevaare JoranAngevaare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Darryl for spotting this, the error is indeed very non-instructive and unpacking it from a multioutput plugin like this will help recognizing when something is off / badly constructed.

Maybe it's good to add that the high-level problem is that this plugin wasn't designed for these kinds of low-level datatypes. You will (with anything below the non-overlapping peaklets) run into this assertion error:

assert np.all(base[1:]['time'] >= strax.endtime(base[:-1])), \
. Your base needs to be split and sorted by time. Records have overlapping time-information and therefore aren't suitable for this kind of plugin.

strax/plugin.py Outdated
@@ -608,6 +608,9 @@ def compute(self, **kwargs):
loop_over = self.loop_over
else:
loop_over = self.deps[self.depends_on[0]].data_kind
if isinstance(loop_over, dict):
loop_over = loop_over[self.depends_on[0]]
# We can't have nested dictionaries here, can we?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can be on the safe side by making a check just below:

if not istinstance(loop_over, str): 
    raise TypeError(f'Trying to loop over {loop_over} which is not a string? Please add loop_over = <base> to your plugin')

@darrylmasson
Copy link
Contributor Author

Maybe it's good to add that the high-level problem is that this plugin wasn't designed for these kinds of low-level datatypes.

So what's the high-level solution we want to aim for? Prevent loop plugins from depending on records or lower (or other overlapping data types)?

@JoranAngevaare
Copy link
Member

Hi Darryl,
Thanks, I think your solution is fine, for example for peaklets this should work (although presumably slow). We just shouldn't expect this plugin to work for things like records.

The lines I suggested were in addition to what you added before.

Combining this would lead to :

...
if isinstance(loop_over, (dict, immutabledict)):
    loop_over = loop_over[self.depends_on[0]]
if not isinstance(loop_over, str):
    raise TypeError("Please add \"loop_over = <base>\""
                            " to your plugin definition")
...

@WenzDaniel
Copy link
Collaborator

Would not your proposal the lines above obsolete?

if not isinstance(loop_over, str):
    raise TypeError("Please add \"loop_over = <base>\""
                            " to your plugin definition")

Would always be raised when

if isinstance(loop_over, (dict, immutabledict)):
    loop_over = loop_over[self.depends_on[0]]

is true. Hence we could remove it.

@JoranAngevaare
Copy link
Member

Would not your proposal the lines above obsolete?

if not isinstance(loop_over, str):
    raise TypeError("Please add \"loop_over = <base>\""
                            " to your plugin definition")

Would always be raised when

if isinstance(loop_over, (dict, immutabledict)):
    loop_over = loop_over[self.depends_on[0]]

is true. Hence we could remove it.

No, you redefine loop_over the line just above it?

@darrylmasson
Copy link
Contributor Author

If we do go for something that looks like this:

if hasattr(self, 'loop_over'):
    loop_over = self.loop_over
else
    loop_over = self.deps[self.depends_on[0]].data_kind
if isinstance(loop_over, (dict, immutabledict)):
    loop_over = loop_over[self.depends_on[0]]
if not isinstance(loop_over, str):
    raise TypeError("message")

The second isinstance will only do anything in the situation where a plugin is particularly convoluted. I would argue to skip the first check. The second is a catch-all case that should handle any ambiguous situation.

@JoranAngevaare
Copy link
Member

JoranAngevaare commented Sep 8, 2020

The first isinstance changes the loop_over and is not a check as such. I don't follow your reasoning as you will end up with the TypeError if you skip the first isinstance for the multioutputplugins.

@darrylmasson
Copy link
Contributor Author

That's my point - the second isinstance is a catch-all that should handle any ambiguous situation, and multioutput plugins are at least somewhat ambiguous in this regard. Rather than individually handle these cases, only having if not isinstance(loop_over, str) both makes the code easier to maintain and also deals with any possible future features that could affect this in some way.

@JoranAngevaare
Copy link
Member

There is nothing wrong with multi-output plugins per se but perhaps you are right that rather than doing the thinking for the analysts that just accepting a simple format is clearer (and throwing an error otherwise).

Copy link
Collaborator

@WenzDaniel WenzDaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat

@WenzDaniel WenzDaniel merged commit 06fd303 into AxFoundation:master Sep 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loop plugin fails when it depends on a multi-output plugin
3 participants