Add warning when select with require_all discards most trains #497
Seems reasonable. LGTM in principle, with a minor suggestion.
extra_data/reader.py
Outdated
@@ -1013,6 +1015,12 @@ def select(self, seln_or_source_glob, key_glob='*', require_all=False,
            else:  # require_any
                train_ids = np.union1d(train_ids, source_tids)

            if len(train_ids) < (n_trains_prev / 2):
How about a configurable threshold via a keyword parameter? Then this feature can be adapted to any situation and actively used to test against expectations. 0.5 could be the default value.
Thinking about it more after committing, I might make the default 0% left (= 100% dropped). The most common case for this is a source that recorded no data at all, so this would still catch that, but avoid any annoying warnings when they're not wanted.
I think that should be fine with HED data, as currently we get the train ID of interest and filter the run based on that:
That said, I think I'd favor warning only if all trains are dropped by default, and making it configurable if we want a higher threshold.
b051b9c to f0f32b3
OK, the warning will only show by default if all trains are dropped. Configuring it looks like:
The fraction is out of the trains left just before selecting each source, not from the original number. So if you select a lot of sources with different trains missing, many trains could be dropped without any one source triggering the warning. There's potential for confusion there, but I don't think it's too likely to come up, and I think there's potential for confusion with any option.
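To illustrate the per-source fraction described above, here is a minimal standalone sketch. It is not the code from this PR: it uses plain Python sets instead of NumPy arrays, and the function name `select_require_all`, the parameter name `warn_drop_frac`, the source names and the warning text are all assumptions made for the example.

```python
import warnings


def select_require_all(train_ids, source_train_ids, warn_drop_frac=1.0):
    """Keep only the trains for which every source has data.

    warn_drop_frac is a hypothetical parameter name: warn when a single
    source drops at least this fraction of the trains that were left
    just before that source was applied (1.0 = only if all are dropped).
    """
    remaining = set(train_ids)
    for source, tids in source_train_ids.items():
        n_prev = len(remaining)
        remaining &= set(tids)
        n_dropped = n_prev - len(remaining)
        # The fraction is relative to n_prev, not to the original run length.
        if n_dropped > 0 and n_dropped / n_prev >= warn_drop_frac:
            warnings.warn(
                f"{source} has data for only {len(remaining)} "
                f"of {n_prev} trains"  # illustrative wording only
            )
    return sorted(remaining)


# Source A drops 40 of 100 trains (40%), then source B drops 30 of the
# remaining 60 (50%). Neither reaches a 0.6 threshold on its own, yet
# only 30 of the original 100 trains are left.
left = select_require_all(
    range(100),
    {"A": range(60), "B": range(30, 100)},
    warn_drop_frac=0.6,
)
print(len(left))
```

With the default of 1.0 in this sketch, a warning is raised only when one source removes every train that was still selected, which matches the "source recorded no data at all" case discussed in this thread.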
f0f32b3 to 83fa3c6
83fa3c6 to 0e3f1ba
(Got LGTM in chat)
We sometimes see an instrument source which is in the run but has no data for any train. If you call run.select(..., require_all=True) including such a source, you get a valid selection with 0 trains, which is surprising, and you then have to work out which source caused it.
With this change, the selection will come with a warning like:
I've currently made this appear when filtering by a single source drops more than half the trains we had just before. IDK if that's right, e.g. with HED-style experiments where only a very few pulses are important. Another option is to show the warning only if we're deselecting all trains.
Naturally we could also make it configurable, but I hope we can have a default that works for 99% of use cases without config.
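The situation described above can be reproduced without EXtra-data. The following is a rough sketch in plain Python, with made-up source names and train IDs, showing how one empty source silently empties a require-all style selection and how the culprit could be found by hand. The helper names `intersect_trains` and `sources_without_data` are hypothetical and not part of the library.

```python
def intersect_trains(train_ids, source_train_ids):
    """Return the trains for which every source has data."""
    remaining = set(train_ids)
    for tids in source_train_ids.values():
        remaining &= set(tids)
    return sorted(remaining)


def sources_without_data(source_train_ids):
    """Return the names of sources that recorded no trains at all."""
    return sorted(
        name for name, tids in source_train_ids.items() if len(tids) == 0
    )


# Made-up example data: CAM_B is present in the run but recorded nothing.
run_trains = list(range(1000, 1010))
sources = {
    "CAM_A": list(run_trains),
    "CAM_B": [],
    "MOTOR_X": run_trains[:8],
}

print(intersect_trains(run_trains, sources))  # empty selection
print(sources_without_data(sources))          # the source responsible
```

Without a warning, the user would have to run a check like `sources_without_data` themselves after noticing the empty selection, which is the manual step this change aims to remove.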