Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve spikeglx file naming #1125

Merged
merged 10 commits into from
Aug 5, 2022

Conversation

samuelgarcia
Copy link
Contributor

No description provided.

@pep8speaks
Copy link

pep8speaks commented May 31, 2022

Hello @samuelgarcia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-08-04 08:14:39 UTC

@samuelgarcia
Copy link
Contributor Author

@TomBugnon @grahamfindlay
This PR should fix the problem of multi trigger or/and multi gate folder.
With this new way both multi trigger and multi gate will become multi segment recording.
You can even read nested gate/trigger folder, segment would be cumulated in that case.

If you have time to test and give feed back it would be great.
Normally no changes are need in spikeinterface side.

@TomBugnon
Copy link

TomBugnon commented Jun 7, 2022

Hi @samuelgarcia ,

Thanks a lot for implementing this.!

I tried instantiating recordings for a few real life datasets (not covering all the cases that graham provided you) and it seems to be working fine when the gate index is 0 (I didn't check what the data actually loaded looks like, but the number of segments and total duration look correct)

However I still get the same error when trying to load folder with gate index > 0. It seems like the gate is not parsed properly from the folder/filename?

$ ls /Volumes/neuropixel_archive/Data/chronic/CNPIX10-Charles/7-11-2021/SpikeGLX/7-11-2021_g1/7-11-2021_g1_imec0

7-11-2021_g1_t0.imec0.ap.bin   7-11-2021_g1_t2.imec0.ap.meta  7-11-2021_g1_t4.imec0.lf.bin   7-11-2021_g1_t6.imec0.lf.meta
7-11-2021_g1_t0.imec0.ap.meta  7-11-2021_g1_t2.imec0.lf.bin   7-11-2021_g1_t4.imec0.lf.meta  7-11-2021_g1_t7.imec0.ap.bin
7-11-2021_g1_t0.imec0.lf.bin   7-11-2021_g1_t2.imec0.lf.meta  7-11-2021_g1_t5.imec0.ap.bin   7-11-2021_g1_t7.imec0.ap.meta
7-11-2021_g1_t0.imec0.lf.meta  7-11-2021_g1_t3.imec0.ap.bin   7-11-2021_g1_t5.imec0.ap.meta  7-11-2021_g1_t7.imec0.lf.bin
7-11-2021_g1_t1.imec0.ap.bin   7-11-2021_g1_t3.imec0.ap.meta  7-11-2021_g1_t5.imec0.lf.bin   7-11-2021_g1_t7.imec0.lf.meta
7-11-2021_g1_t1.imec0.ap.meta  7-11-2021_g1_t3.imec0.lf.bin   7-11-2021_g1_t5.imec0.lf.meta  7-11-2021_g1_t8.imec0.ap.bin
7-11-2021_g1_t1.imec0.lf.bin   7-11-2021_g1_t3.imec0.lf.meta  7-11-2021_g1_t6.imec0.ap.bin   7-11-2021_g1_t8.imec0.ap.meta
7-11-2021_g1_t1.imec0.lf.meta  7-11-2021_g1_t4.imec0.ap.bin   7-11-2021_g1_t6.imec0.ap.meta  7-11-2021_g1_t8.imec0.lf.bin
7-11-2021_g1_t2.imec0.ap.bin   7-11-2021_g1_t4.imec0.ap.meta  7-11-2021_g1_t6.imec0.lf.bin   7-11-2021_g1_t8.imec0.lf.meta
rec = se.SpikeGLXRecordingExtractor(
    "/Volumes/neuropixel_archive/Data/chronic/CNPIX10-Charles/7-11-2021/SpikeGLX/7-11-2021_g1/7-11-2021_g1_imec0",
    stream_id='imec0.ap'
)

KeyError Traceback (most recent call last)
/home/tbugnon/projects/ecephys_dev/misc/sglx/test_spikeinterface_sglx.ipynb Cell 17' in <cell line: 4>()
1 folder_path = files.path[0].parent
2 print(folder_path)
----> 4 rec = se.SpikeGLXRecordingExtractor(
5 folder_path,
6 stream_id='imec0.ap'
7 )

File ~/projects/misc/test_sglx_spikeinterface/spikeinterface/spikeinterface/extractors/neoextractors/spikeglx.py:38, in SpikeGLXRecordingExtractor.init(self, folder_path, stream_id)
36 if HAS_NEO_10_2:
37 neo_kwargs["load_sync_channel"] = False
---> 38 NeoBaseRecordingExtractor.init(self, stream_id=stream_id, **neo_kwargs)
40 # ~ # open the corresponding stream probe
41 if HAS_NEO_10_2 and "nidq" not in self.stream_id:

File ~/projects/misc/test_sglx_spikeinterface/spikeinterface/spikeinterface/extractors/neoextractors/neobaseextractor.py:26, in NeoBaseRecordingExtractor.init(self, stream_id, **neo_kwargs)
24 def init(self, stream_id=None, **neo_kwargs):
---> 26 _NeoBaseExtractor.init(self, **neo_kwargs)
28 # check channel
29 # TODO propose a meachanisim to select the appropriate channel groups
30 # in neo one channel group have the same dtype/sampling_rate/group_id
31 # ~ channel_indexes_list = self.neo_reader.get_group_signal_channel_indexes()
32 stream_channels = self.neo_reader.header['signal_streams']

File ~/projects/misc/test_sglx_spikeinterface/spikeinterface/spikeinterface/extractors/neoextractors/neobaseextractor.py:16, in _NeoBaseExtractor.init(self, **neo_kwargs)
14 neoIOclass = eval('neo.rawio.' + self.NeoRawIOClass)
15 self.neo_reader = neoIOclass(**neo_kwargs)
---> 16 self.neo_reader.parse_header()
18 assert self.neo_reader.block_count() == 1,
19 'This file is neo multi block spikeinterface support one block only dataset'

File ~/projects/misc/test_sglx_spikeinterface/python-neo/neo/rawio/baserawio.py:185, in BaseRawIO.parse_header(self)
172 def parse_header(self):
173 """
174 This must parse the file header to get all stuff for fast use later on.
175
(...)
183
184 """
--> 185 self._parse_header()
186 self._check_stream_signal_channel_characteristics()

File ~/projects/misc/test_sglx_spikeinterface/python-neo/neo/rawio/spikeglxrawio.py:108, in SpikeGLXRawIO._parse_header(self)
105 signal_channels = []
106 for stream_name in stream_names:
107 # take first segment
--> 108 info = self.signals_info_dict[0, stream_name]
110 stream_id = stream_name
111 stream_index = stream_names.index(info['stream_name'])

KeyError: (0, 'imec0.ap')

Besides that the main comment I have is about user control of gates/triggers that are being concatenated:

  • You said the preferred way to subselect segments is to instantiate the whole recording and then use SpikeGLXRecordingExtractor.select_segments() to subset segments. I think this is quite error-prone atm since select_segments only takes as argument segment indices (rather than, in this case actual gate and trigger ranges).
    I can't see in the recording attributes that the mapping between segments and trigger/gates is conserved anywhere, and it is not straightforward (all the more since triggers can be concatenated across gates ). So maybe SpikeGLXRecordingExtractor.select_segments() should take ranges of triggers/gates as kwarg, or there could at least be an attribute to record the info for each segment

  • This might be too specific, but if not too costly I still think it would be nice to have control over trigger and gate ranges (and not only stream type) during instantiation. For instance we have a folder of recordings for which the metadata of the last trigger is corrupt (due to a SGLX crash or something), in this particular case I can't do this method of subselecting segments of interest a posteriori since the instantiation from the folder fails (so I would need to delete or rename the faulty files even though I'm not interested in them to start with)

Couple other comments that came to my mind

  • Is there any way to access the metadata of each segment from the parent recording? or the full path to their .meta or .bin files?

  • I don't know for sure what the t_start of each segment (eg at recording._recording_segments[0].t_start) means or how it is used but it seems like it is set to 0.0 for all segments currently.

Best
Tom

@samuelgarcia
Copy link
Contributor Author

Hi Tom,
thank you for the feedback.
I see. The way I did the code is that because there is no g0 then the seg_index is not done properly.
I will change this.
My guess is that if your read from the parent folder (both g0 and g1) the actual implementation should work.
I will fix this soon.

Ok for putting some kwargs for filtering directly in init for g and t selection.

For metadata : yes you will be able to access then. This PR #627 have been merged recently.
So neo array_annoation are mapped to spikeinterface property (channel wise) and neo annoation are mamper to spikeinetrface annonation.
Which metadata you want to have for spikeglx ?

For t_start: in SI you should not acces recording._recording_segments. You can acces this with
recording.get_times(segment_index=) t_start is not directly exposed. We could do it of course. It is the first sample of times vector.

@JuliaSprenger JuliaSprenger added this to the 0.11.0 milestone Jun 23, 2022
@samuelgarcia
Copy link
Contributor Author

Hi @TomBugnon
Sorry for long delay.
I have just push a patch.

Normaly all case now handled:

  • multi trigger
  • multi gate
  • multi trigger and multi gate

Depending the depth of the folder we can have several case, wwith the dataset you provide.

Given the dataset provied by @grahamfindlay we can have:

  • this is only gate0 multi index 'spikeglx/multi_trigger_multi_gate/SpikeGLX/5-19-2022-CI0/5-19-2022-CI0_g0'
  • this is only gate1 multi index 'spikeglx/multi_trigger_multi_gate/SpikeGLX/5-19-2022-CI0/5-19-2022-CI0_g1'
  • this mix both multi gate and multi trigger 'spikeglx/sample_data_v2/SpikeGLX/5-19-2022-CI0' because this is parent folder

Tell me if this is ok for you.

@TomBugnon
Copy link

Hi Samuel,

Thank you so much for this!

I haven't checked that the data actually loaded is correct but the extractor seems to find the right number of segments in my case!
I do get the warning ".meta file has falty value however". It doesn't say which file exactly triggers this so I can check whether it is corrupt but I don't think that is the case.

Regarding our previous comment, it doesn't seem like there are kwargs to subselect gate and trigger indices at initiation?
Otherwise, as I said, can the mapping between segment and trigger/gate be accessed from the recording object? If not it's not clear to me how one can subselect specific trigger/gates

@samuelgarcia
Copy link
Contributor Author

This warning is anoying and I plan to remove it soon. It just check that real file size is exactly the one in meta.
But very often files are reduced!!
@JuliaSprenger

@JuliaSprenger
Copy link
Member

@TomBugnon Did you intentionally reduce the file size before loading?

@grahamfindlay
Copy link

@JuliaSprenger I suspect I know what’s going on here: we use an OpenZFS file system that performs on-the-fly compression for our large datasets, so size on disk will not match file size reported in the metadeta. You can get the file system to report the uncompressed file size, but naive checks might just be asking for size on disk.

@JuliaSprenger
Copy link
Member

I see, so bin_filename.stat().st_size is not the appropriate way to access the file size of the bin file for file systems with automatic compression. @grahamfindlay Do you know a way to access the uncompressed size from python?

@grahamfindlay
Copy link

@JuliaSprenger Well, that was my theory, but it doesn't seem to hold up. os.stat(path).st_size does seem to correctly report the uncompressed file size, rather than size on disk. And at least for the files I just checked, it matches what is in the SpikeGLX metadata. @TomBugnon Can you let me know which files you used to test, and I can take a closer look?

@grahamfindlay
Copy link

@JuliaSprenger I just checked out all the files that @TomBugnon was using, and everything looks square to me. Using Python 3.7.12, os.stat(path).st_size properly reports uncompressed file size, matching SpikeGLX metadata content, for every file. @samuelgarcia How confident are you that this is the check that is leading to the warning in question?

@TomBugnon
Copy link

Also @samuelgarcia , if I try to instantiate from a folder in which there are only 'lf' files, no 'ap' file, I get the following errors

extractor = SpikeGLXRecordingExtractor(
    folder_path,
    stream_id='imec0.lf'
)

FileNotFoundError Traceback (most recent call last)
/home/tbugnon/projects/misc/test_sglx_spikeinterface/test_spikeinterface_sglx.ipynb Cell 6' in <cell line: 1>()
----> 1 extractor = SpikeGLXRecordingExtractor(
2 '/Volumes/neuropixel_archive/Data/chronic/CNPIX13-Al/1-6-2022/SpikeGLX/1-6-2022_g0/1-6-2022_g0_imec0',
3 stream_id='imec0.lf'
4 )

File ~/projects/misc/test_sglx_spikeinterface/spikeinterface/spikeinterface/extractors/neoextractors/spikeglx.py:49, in SpikeGLXRecordingExtractor.init(self, folder_path, stream_id)
47 if "lf" in self.stream_id:
48 meta_filename = meta_filename.replace(".lf", ".ap")
---> 49 probe = pi.read_spikeglx(meta_filename)
51 if probe.shank_ids is not None:
52 self.set_probe(probe, in_place=True, group_mode="by_shank")

File ~/miniconda3/envs/test_sglx_spikeinterface/lib/python3.10/site-packages/probeinterface/io.py:696, in read_spikeglx(file)
694 meta_file = Path(file)
695 assert meta_file.suffix == ".meta", "'meta_file' should point to the .meta SpikeGLX file"
--> 696 with meta_file.open(mode='r') as f:
697 lines = f.read().splitlines()
699 meta = {}

File ~/miniconda3/envs/test_sglx_spikeinterface/lib/python3.10/pathlib.py:1117, in Path.open(self, mode, buffering, encoding, errors, newline)
1115 if "b" not in mode:
1116 encoding = io.text_encoding(encoding)
-> 1117 return self._accessor.open(self, mode, buffering, encoding, errors,
1118 newline)

FileNotFoundError: [Errno 2] No such file or directory: '/Volumes/neuropixel_archive/Data/chronic/CNPIX13-Al/1-6-2022/SpikeGLX/1-6-2022_g0/1-6-2022_g0_imec0/1-6-2022_g0_t0.imec0.ap.meta'

And with stream_id = 'imec0.ap' (there is no imec0.ap in the folder)

extractor = SpikeGLXRecordingExtractor(
    folder_path,
    stream_id='imec0.ap'
)

AssertionError Traceback (most recent call last)
/home/tbugnon/projects/misc/test_sglx_spikeinterface/test_spikeinterface_sglx.ipynb Cell 6' in <cell line: 1>()
----> 1 extractor = SpikeGLXRecordingExtractor(
2 '/Volumes/neuropixel_archive/Data/chronic/CNPIX13-Al/1-6-2022/SpikeGLX/1-6-2022_g0/1-6-2022_g0_imec0',
3 stream_id='imec0.ap'
4 )

File ~/projects/misc/test_sglx_spikeinterface/spikeinterface/spikeinterface/extractors/neoextractors/spikeglx.py:38, in SpikeGLXRecordingExtractor.init(self, folder_path, stream_id)
36 if HAS_NEO_10_2:
37 neo_kwargs["load_sync_channel"] = False
---> 38 NeoBaseRecordingExtractor.init(self, stream_id=stream_id, **neo_kwargs)
40 # ~ # open the corresponding stream probe
41 if HAS_NEO_10_2 and "nidq" not in self.stream_id:

File ~/projects/misc/test_sglx_spikeinterface/spikeinterface/spikeinterface/extractors/neoextractors/neobaseextractor.py:40, in NeoBaseRecordingExtractor.init(self, stream_id, **neo_kwargs)
38 stream_id = stream_ids[0]
39 else:
---> 40 assert stream_id in stream_ids, f'stream_id {stream_id} is no in {stream_ids}'
42 self.stream_index = list(stream_ids).index(stream_id)
43 self.stream_id = stream_id

AssertionError: stream_id imec0.ap is no in ['imec0.lf']

@samuelgarcia
Copy link
Contributor Author

@TomBugnon : for the LF metadata there is no description of the probe because the meta file of LF do not contain the snsShankMap section.
So we read the meta of AP instead.
But in case you have there no AP so this trigger the bug we need avoid this.
But you will have probe loaded in that case.

@JuliaSprenger
Copy link
Member

@samuelgarcia any news on this?

@samuelgarcia
Copy link
Contributor Author

Hi julia,
for me this PR is ready to merge because it solve the original bug : file naming with multi gate and/or multi trigger.

The latest bug is related to spikeinterface/probeinterface because for LF there is no probe description in meta.

@JuliaSprenger
Copy link
Member

@samuelgarcia Cool, can you track the remaining problem in a separate issue and link it here? Then we can merge this for now.

@JuliaSprenger JuliaSprenger merged commit 50abfbc into NeuralEnsemble:master Aug 5, 2022
@apdavison apdavison modified the milestones: 0.10.3, 0.11.0 Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants