
Fix get_array() when introducing gaps in multi-module detector data #234

Merged: 4 commits into master on Nov 1, 2021

Conversation

@takluyver (Member) commented Oct 29, 2021

@daviddoji alerted me to a bug when reading LPD data, where the same data would appear in the array twice.

This turned out to happen when we introduce a gap in the data for one module, i.e. where that module missed a train which other modules recorded. In that case, a contiguous chunk of data in the file has to be split into two or more pieces in the output array. After each split, the code went back to the start of the source chunk instead of continuing from after the split point, so the same data appeared twice. This only affects the new code path for reading with a pulse selection.
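The cursor mistake can be sketched in isolation (an illustrative toy, not extra_data's actual implementation; `contiguous_runs`, `fill_buggy` and `fill_fixed` are hypothetical names):

```python
import numpy as np

def contiguous_runs(mask):
    """Yield slices covering each contiguous run of True values in mask."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append(slice(start, i))
            start = None
    if start is not None:
        runs.append(slice(start, len(mask)))
    return runs

def fill_buggy(chunk, have_train):
    # Buggy variant: after each split, copying restarts from the
    # chunk's start, so the same data appears twice in the output.
    out = np.zeros(len(have_train), dtype=chunk.dtype)
    for dest in contiguous_runs(have_train):
        out[dest] = chunk[:dest.stop - dest.start]  # bug: always reads from index 0
    return out

def fill_fixed(chunk, have_train):
    # Fixed variant: keep a cursor into the source chunk and continue
    # from where the previous piece ended.
    out = np.zeros(len(have_train), dtype=chunk.dtype)
    pos = 0
    for dest in contiguous_runs(have_train):
        n = dest.stop - dest.start
        out[dest] = chunk[pos:pos + n]
        pos += n
    return out

chunk = np.arange(1, 7)  # 6 frames stored contiguously in the file
present = [True, True, True, False, True, True, True]  # this module missed one train
print(fill_buggy(chunk, present).tolist())  # [1, 2, 3, 0, 1, 2, 3] - data duplicated
print(fill_fixed(chunk, present).tolist())  # [1, 2, 3, 0, 4, 5, 6] - gap zero-filled
```

The gap slot stays at zero in both variants; the difference is that the buggy version re-reads the start of the source chunk after the split.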

Edit: I found another bug when introducing gaps and not using pulse selection. Another win for tests.

I'm not exactly sure how to have a good test for this.

@takluyver takluyver added the bug Something isn't working label Oct 29, 2021
@takluyver (Member Author):

Loading a single pulse from selected trains for all available LPD modules, then taking a single module and averaging each frame to get one number per train.

Before:

[plot omitted]

After:

[plot omitted]

The downward spike is the gap: because this is integer data (uncorrected), the gap is filled with zeros.

@takluyver takluyver added this to the 1.8.1 milestone Oct 29, 2021
@takluyver (Member Author):

Good job I did decide to write a test, because I immediately discovered another bug. 😅

@tmichela (Member) left a comment:

LGTM.

Any feel for how often these gaps appear in real life? Do we need a bugfix release for this?

(Review comment on extra_data/tests/make_examples.py — outdated, resolved)
Co-authored-by: Thomas Michelat <32831491+tmichela@users.noreply.github.com>
@takluyver (Member Author):

Thanks! I think these gaps are reasonably common - @daviddoji noticed a couple of repeats by inspecting data from a single LPD module, and if there were a couple of fairly visible ones, chances are there were others that weren't so easy to see. So I'm planning to do a bugfix with this and Philipp's #236.

@takluyver takluyver merged commit 1e53e5a into master Nov 1, 2021
@takluyver takluyver deleted the fix-multimod-get-array-gaps branch November 1, 2021 09:51
@takluyver (Member Author):

I meant to say, though, I think people often want only trains with data for all modules (min_modules=16), in which case they can avoid this issue. So that mitigates it a bit. But unfortunately, the default when you use classes like LPD1M is to find trains with data from any module.
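The effect of a min_modules threshold on train selection can be sketched with a standalone toy (not extra_data's actual selection code; `select_trains` is a hypothetical helper):

```python
from collections import Counter

def select_trains(trains_per_module, min_modules):
    """Keep train IDs recorded by at least min_modules modules."""
    counts = Counter(t for trains in trains_per_module for t in trains)
    return sorted(t for t, n in counts.items() if n >= min_modules)

# Three toy modules; module 1 missed train 12.
mods = [[10, 11, 12, 13], [10, 11, 13], [10, 11, 12, 13]]
print(select_trains(mods, min_modules=3))  # [10, 11, 13] - no module needs a gap
print(select_trains(mods, min_modules=1))  # [10, 11, 12, 13] - module 1 gets a gap at 12
```

With the strict threshold, every selected train is present in every module, so no gap-filling is needed; with the any-module default, a module that missed a train gets a zero-filled gap, which is exactly the code path this PR fixes.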

@takluyver (Member Author):

David mentioned today that using min_modules to select only trains with data for all modules dropped about 600 trains from a big run of ~30000 trains. That implies that each module is missing about 1 train in 650 - or there might be some flaky modules and others more reliable. This is from data taken just a couple of weeks ago (p900226, r172).
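As a rough sanity check on that per-module rate, under the simplifying assumption that missed trains rarely overlap between modules:

```python
total_trains = 30_000  # approximate run length
dropped = 600          # trains missing at least one of the 16 modules
n_modules = 16

# If at most one module is missing on any given dropped train, each
# module misses about dropped / n_modules trains over the run.
per_module_misses = dropped / n_modules   # 37.5 trains per module
rate = total_trains / per_module_misses   # one missed train per ~800
print(round(rate))  # 800
```

That lands in the same ballpark as the 1-in-650 estimate above; the exact figure depends on how often modules drop the same train and whether some modules are flakier than others.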

@daviddoji:

> I meant to say, though, I think people often want only trains with data for all modules (min_modules=16), in which case they can avoid this issue. So that mitigates it a bit. But unfortunately, the default when you use classes like LPD1M is to find trains with data from any module.

As we've already discussed, giving min_modules a value greater than 1 by default would not be ideal. With LPD data, it's unfortunately rare to get all modules working simultaneously, so one often has to settle for 12 or 13.

Since users can decide how many modules they want in the analysis via the min_modules flag, I'd leave it as it is.

@tmichela with long runs this happens quite frequently, unfortunately.
