
Conversation

h-mayorquin
Contributor

Hi,
Currently, we have a check to determine whether the timestamps associated with the data exhibit discontinuities. If they do, we raise an assertion error and prevent the file from being read. I believe this is beneficial, as it serves as a strong signal that the user should take a closer look at the file. The assertion is warranted.

But I also believe that once we have informed the users that there is an error, we should still allow them to open the file if they so desire. This is good for two reasons:

  1. Users can use the machinery of neo to find out what might have gone wrong, which would be a valuable service.
  2. Sometimes there is no other way: the person who ran the experiments is no longer available, but people might still want to analyze the data. I think this is a use case that should be covered.

This PR allows this to happen by introducing another boolean at instantiation that bypasses the error and opens the file anyway. That is, it enables the desired behavior with a boolean and keeps the default as it is for backwards compatibility.
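Concretely, the proposed behavior amounts to something like the following sketch. All names here (the class, the flag, the attribute) are illustrative placeholders for the design discussed in this thread, not neo's actual API:

```python
class ExampleReader:
    """Illustrative sketch of the bypass flag proposed in this PR.

    Class, flag, and attribute names are placeholders, not neo's API.
    """

    def __init__(self, filename, strict_timestamp_check=True):
        self.filename = filename
        self.discontinuous_timestamps = False
        timestamps = self._read_timestamps()
        # Continuous Intan timestamps increase by exactly one per sample.
        if any(b - a != 1 for a, b in zip(timestamps, timestamps[1:])):
            if strict_timestamp_check:
                # Default: refuse to read, preserving current behavior.
                raise ValueError(
                    f"Timestamps in {filename} are discontinuous; "
                    "pass strict_timestamp_check=False to load anyway."
                )
            # Opt-in path: record the problem and keep going.
            self.discontinuous_timestamps = True

    def _read_timestamps(self):
        # Placeholder: a real reader would parse these from the file.
        return [0, 1, 2, 10, 11]
```

The default keeps the strict behavior, so existing code is unaffected; only users who explicitly opt out of the check reach the degraded-data path.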

@zm711
Contributor

zm711 commented May 6, 2024

@h-mayorquin, two comments: one a suggestion, and one a more philosophical discussion point.

I think I would raise a warning even if someone set `strict_mode_for_timestamps=False`, so they know which datasets have errors and which ones don't. I think that if the user decides to switch that argument to False, we still need to distinguish between "good" and "corrupted" datasets (but in this case it would be a warning instead of an error).

My only worry is that, if we think of the case of a block of data (in the Intan sense, not the neo.Block sense), a discontinuity could mean the whole dataset is scrambled, because all packets after that point are messed up. (I think; I don't know this empirically.) Is the user choice valuable enough to allow them to play around with potentially truly messed-up data? (i.e., how many users could truly go block by block and salvage some data, versus opening a file and trying to process it like normal despite it needing some post-curation?)

I'm really torn on this, but I'm willing to be convinced. (I've had one corrupted file due to another user of our shared computer; I thought about trying to salvage it, but decided I didn't want to expend the effort.)

@h-mayorquin
Contributor Author

h-mayorquin commented May 6, 2024

> I think I would raise a warning even if someone set `strict_mode_for_timestamps=False`, so they know which datasets have errors and which ones don't. I think that if the user decides to switch that argument to False, we still need to distinguish between "good" and "corrupted" datasets (but in this case it would be a warning instead of an error).

I did it like this at the beginning and then changed it. If they switch the default to False, it means they know what they need to know. A warning should be actionable and should tell you what to do to avoid it. If we added a warning, what would be the action to take to remove it?

> My only worry is that, if we think of the case of a block of data (in the Intan sense, not the neo.Block sense), a discontinuity could mean the whole dataset is scrambled, because all packets after that point are messed up. (I think; I don't know this empirically.) Is the user choice valuable enough to allow them to play around with potentially truly messed-up data? (i.e., how many users could truly go block by block and salvage some data, versus opening a file and trying to process it like normal despite it needing some post-curation?)

Yes: we told them there is a problem and stopped them in their tracks, then gave them a big bool so they can go on to the forbidden land of their own free will, with no guarantees on our part. Why play the police?

As I don't want to play the police, I think the improvement would be to make it clearer that they are going to a dangerous place. Maybe the variable should be `unsafe_loading=False` or `load_data_even_if_corrupted`. Something to make it clear that "don't use this to load data in bulk, as there might be dragons".

@zm711
Contributor

zm711 commented May 6, 2024

> I did it like this at the beginning and then changed it. If they switch the default to False, it means they know what they need to know. A warning should be actionable and should tell you what to do to avoid it. If we added a warning, what would be the action to take to remove it?

Maybe instead we could add an attribute or a header element that provides a record of whether the file is corrupted or uncorrupted. Someone might globally take on the risk, but I think we should still give them a way to know that this particular dataset is corrupt. So a header element is probably easiest and most visible.

> Maybe the variable should be `unsafe_loading=False` or `load_data_even_if_corrupted`. Something to make it clear that "don't use this to load data in bulk, as there might be dragons".

I think this is a good point. The name timestamps doesn't quite convey the potential gravity for the dataset. Since we also get this error in some legitimate cases, I don't know that `load_data_even_if_corrupted` is fair either. `unsafe_loading` might be fair, with an explanation that it doesn't check for proper timestamps, which in Intan are a proxy for data integrity.

@h-mayorquin
Contributor Author

> Maybe instead we could add an attribute or a header element that provides a record of whether the file is corrupted or uncorrupted. Someone might globally take on the risk, but I think we should still give them a way to know that this particular dataset is corrupt. So a header element is probably easiest and most visible.

This makes sense to me: people can then use it in loops, if they so desire, by checking against that attribute.
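For example, the batch workflow mentioned here might look like the following. The reader class and its argument and attribute names are stand-ins for whatever this PR finally settles on:

```python
# Hypothetical batch workflow using the corruption-flag attribute
# discussed above. FakeReader is a stand-in, not neo's actual API.
class FakeReader:
    def __init__(self, filename, ignore_integrity_checks=False):
        self.filename = filename
        # Pretend files containing "bad" in the name have a discontinuity.
        self.discontinuous_timestamps = "bad" in filename

good, corrupted = [], []
for path in ["session_a.rhd", "session_bad.rhd", "session_c.rhd"]:
    reader = FakeReader(path, ignore_integrity_checks=True)
    (corrupted if reader.discontinuous_timestamps else good).append(path)

print(good)       # ['session_a.rhd', 'session_c.rhd']
print(corrupted)  # ['session_bad.rhd']
```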

@h-mayorquin
Contributor Author

@zm711 I added your suggestions. I am not yet satisfied with the docstring writing and the variable name, but this will give an idea of the structure, and we can improve from there.

@zm711
Contributor

zm711 commented May 6, 2024

I think the structure looks good to me now, but I agree the naming probably needs a bit more thought. It's more a `load_data_unsafely`, because we aren't applying all checks, which does make the data "unsafe", but saying "unsafe data" doesn't sound quite right.

@h-mayorquin
Contributor Author

`load_data_unsafely`: I think this is a good idea.

`integrity_checks=True` by default is another option.

What do you think of the name of the attribute that keeps the state of the check?

@h-mayorquin
Contributor Author

Changed here to `load_data_unsafely`.

@zm711 zm711 left a comment

Just noticed a typo to fix. Otherwise I think this is fine for me. Unless @alejoe91 or @samuelgarcia are against this? But this works for me if it helps you.

```python
BaseRawIO.__init__(self)
self.filename = filename
self.load_data_unsafely = load_data_unsafely
self.unsafe_timestamps = False
```
Contributor

That works for me.

Contributor Author

I just noticed the default for `load_data_unsafely` is wrong.

Contributor Author

Corrected in the last commit.

Contributor

Actually, just thinking about this: maybe, rather than `unsafe_timestamps`, we should be more explicit and say something like `noncontinuous_timestamps`?

Contributor Author

Yes. Agree. Changing.

Contributor Author

Done.

@h-mayorquin
Contributor Author

What do you think of `ignore_integrity_checks` vs `load_data_unsafely`?

@zm711
Contributor

zm711 commented May 29, 2024

I think that might be a better name, although technically we are only doing one check, so it would be `ignore_integrity_check`. I'm not the biggest fan of saying unsafe or unsafely in general, because it is ambiguous. But `ignore_integrity_check`, or, if we want to be more explicit, `ignore_timestamp_integrity_check`.

Looks like CI failed for some reason, since both RTD and testing didn't work. Probably just needs a fresh push.

@h-mayorquin
Contributor Author

> I think that might be a better name, although technically we are only doing one check, so it would be `ignore_integrity_check`. I'm not the biggest fan of saying unsafe or unsafely in general, because it is ambiguous. But `ignore_integrity_check`, or, if we want to be more explicit, `ignore_timestamp_integrity_check`.
>
> Looks like CI failed for some reason, since both RTD and testing didn't work. Probably just needs a fresh push.

I am leaving open the possibility that we might have more than one check in the future, thinking that we might also want something similar in other extractors (for a consistent API), and they might have different integrity checks.

So: integrity checks as a generic name, and then a flag for each type of check (here, the discontinuous timestamps).

@h-mayorquin h-mayorquin force-pushed the add_possiblity_of_reading_file branch from 40e6b6e to 8fc1955 on May 30, 2024 08:37
```
ignore_integrity_checks: bool, default: False
    If True, data that violates integrity assumptions will be loaded. At the moment the only integrity
    check we perform is that timestamps are continuous. Setting this to True will ignore this check and set
    the attribute `discontinous_timestamps` to True if the timestamps are not continous. This attribute can be checked
```
Contributor

Suggested change

```diff
- the attribute `discontinous_timestamps` to True if the timestamps are not continous. This attribute can be checked
+ the attribute `discontinuous_timestamps` to True if the timestamps are not continuous. This attribute can be checked
```
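For concreteness, the continuity check described in the docstring above can be sketched in a few lines. This is an illustration only, not neo's actual implementation:

```python
def timestamps_are_continuous(timestamps):
    """Check the integrity assumption named in the docstring: consecutive
    Intan sample indices should increase by exactly one.

    A sketch of the idea, not neo's actual implementation.
    """
    return all(b - a == 1 for a, b in zip(timestamps, timestamps[1:]))

print(timestamps_are_continuous([0, 1, 2, 3]))  # True
print(timestamps_are_continuous([0, 1, 5, 6]))  # False: a dropped block
```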

@zm711 zm711 left a comment

Thank you!
