Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning when select with require_all discards most trains #497

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

takluyver
Copy link
Member

We sometimes see an instrument source which is in the run but has no data for any train. If you run.select(..., require_all=True) including such a source, you get a valid selection with 0 trains, which is surprising, and you then have to work out which source cause it.

With this change, the selection will come with a warning like:

480/480 trains lost when filtering by FXE_XAD_GEC/CAM/CAMERA_NODATA:daqOutput

I've currently made this appear when filtering by a single source drops more than half the trains we had just before. IDK if that's right, e.g. with HED-style experiments where only a very few pulses are important. Another option is to show the warning only if we're deselecting all trains.

Naturally we could also make it configurable, but I hope we can have a default that works for 99% of use cases without config.

@takluyver takluyver added the enhancement New feature or request label Mar 6, 2024
Copy link
Contributor

@philsmt philsmt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. LGTM in principle, with a minor suggestion.

@@ -1013,6 +1015,12 @@ def select(self, seln_or_source_glob, key_glob='*', require_all=False,
else: # require_any
train_ids = np.union1d(train_ids, source_tids)

if len(train_ids) < (n_trains_prev / 2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a configurable threshold via keyword parameter? Then this feature can be adapted to any situation and actively used to test against expectations. 0.5 could be the default value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it more after commmitting, I might make the default 0% left (= 100% dropped). The most common case for this is a source that recorded no data at all, so this would still catch that, but avoid any annoying warnings when they're not wanted.

extra_data/reader.py Outdated Show resolved Hide resolved
@tmichela
Copy link
Member

tmichela commented Mar 7, 2024

with HED-style experiments where only a very few pulses are important.

I think that should be fine with HED data, as currently we get the train ID of interest and filter the run based on that: run.select_trains(by_id[[tid]]), eventually in future something like: run.select_trains(PPU).

That said, I think I'd favor warning only if all trains a dropped by default and make it configurable if we want a higher threshold.

@takluyver takluyver force-pushed the warn-require-all-discarding branch 2 times, most recently from b051b9c to f0f32b3 Compare March 7, 2024 09:31
@takluyver
Copy link
Member Author

OK, the warning will only show by default if all trains are dropped. Configuring it looks like: .select(..., require_all=True, warn_drop_trains_frac=0.9).

The fraction is out of the trains left just before selecting each source, not from the original number. So if you select a lot of sources with different trains missing, many trains could be dropped without any one source triggering the warning. There's potential for confusion there, but I don't think it's too likely to come up, and I think there's potential for confusion with any option. .plot_missing_data() is a better way to get an overview of what data is missing.

@takluyver takluyver force-pushed the warn-require-all-discarding branch from f0f32b3 to 83fa3c6 Compare March 7, 2024 10:03
@takluyver takluyver force-pushed the warn-require-all-discarding branch from 83fa3c6 to 0e3f1ba Compare March 7, 2024 10:17
@takluyver
Copy link
Member Author

(Got LGTM in chat)

@takluyver takluyver merged commit 3bdce20 into master Mar 11, 2024
10 checks passed
@takluyver takluyver deleted the warn-require-all-discarding branch March 11, 2024 09:43
@takluyver takluyver added this to the 1.17 milestone Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants