SpikeGLXRecordingExtractor fails when gate > 0 #628
If you want to parse the filenames themselves for trigger/gate/run info, here's a snippet I wrote for that and often use (source):

```python
import re


def parse_sglx_fname(fname):
    """Parse recording identifiers from a SpikeGLX style filename.

    Parameters
    ----------
    fname : str
        The filename to parse, e.g. "my-run-name_g0_t1.imec2.lf.bin"

    Returns
    -------
    run : str
        The run name, e.g. "my-run-name".
    gate : str
        The gate identifier, e.g. "g0".
    trigger : str
        The trigger identifier, e.g. "t1".
    probe : str
        The probe identifier, e.g. "imec2".
    stream : str
        The data type identifier, "lf" or "ap".
    ftype : str
        The file type identifier, "bin" or "meta".

    Examples
    --------
    >>> parse_sglx_fname('3-1-2021_A_g1_t0.imec0.lf.meta')
    ('3-1-2021_A', 'g1', 't0', 'imec0', 'lf', 'meta')
    """
    x = re.search(
        r"_g\d+_t\d+\.imec\d+\.(ap|lf)\.(bin|meta)\Z", fname
    )  # \Z forces the match at the end of the string.
    run = fname[: x.span()[0]]  # The run name is everything before the match.
    gate = re.search(r"g\d+", x.group()).group()
    trigger = re.search(r"t\d+", x.group()).group()
    probe = re.search(r"imec\d+", x.group()).group()
    stream = re.search(r"(ap|lf)", x.group()).group()
    ftype = re.search(r"(bin|meta)", x.group()).group()
    return (run, gate, trigger, probe, stream, ftype)
```
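A quick illustration of how such a parser can be used downstream: grouping t-files by run and gate, so each gate's triggers can later be mapped to segments of one recording. The filenames and the `sglx_key` helper here are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical file list for illustration.
fnames = [
    "myrun_g0_t0.imec0.ap.bin",
    "myrun_g0_t1.imec0.ap.bin",
    "myrun_g1_t0.imec0.ap.bin",
]

def sglx_key(fname):
    """Return (run, gate) parsed from a SpikeGLX-style filename."""
    m = re.search(r"_g(\d+)_t\d+\.imec\d+\.(ap|lf)\.(bin|meta)\Z", fname)
    return fname[: m.start()], "g" + m.group(1)

groups = defaultdict(list)
for f in sorted(fnames):
    groups[sglx_key(f)].append(f)

# groups[("myrun", "g0")] holds the two g0 t-files; g1 gets its own entry.
```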
Thanks for the snippet @grahamfindlay! Anyway, I think that NEO should allow parsing gates as well as streams. @samuelgarcia, thoughts?
Hi. But seeing your dataset, it appears that I was totally wrong! The g0 or g1 must relate to something. What does a gate represent?
@TomBugnon: I was not aware of your fork here: https://github.com/CSC-UW/spikeinterface.
Hi @samuelgarcia, If I understood it properly, within a single "run" the gate can be opened or closed to stop and restart acquisition, saving files in a different directory each time, and the trigger increments split the ongoing recording (for a given gate) into multiple files. I am not certain whether or not files are necessarily contiguous across triggers (which is the definition of spikeinterface segments?), but they are definitely not necessarily contiguous across gates. The output directory structure is one of the following, depending on the "folder-per-probe" parameter:
One other thing is, many people use the
I'll try to find short realistic files to send you, and ping Bill Karsh so he can double-check I'm not telling you nonsense.
@TomBugnon do you know if the meta file can change across triggers/gates? In order to have a multi-segment object, we need channels and locations to be the same (no reconfiguration of the probe).
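A sketch of how one might check this empirically: SpikeGLX .meta files are plain key=value text, so they can be parsed and compared across triggers/gates. The key names `nSavedChans`, `snsChanMap`, and `imroTbl` are assumptions about which entries encode the probe configuration; verify them against real meta files.

```python
def parse_sglx_meta(text):
    """Parse SpikeGLX .meta content (one key=value pair per line) into a dict.
    Some keys are prefixed with '~' in the file; that prefix is stripped here."""
    meta = {}
    for line in text.splitlines():
        if "=" in line:
            key, val = line.split("=", 1)
            meta[key.strip().lstrip("~")] = val.strip()
    return meta

def same_probe_config(meta_a, meta_b, keys=("nSavedChans", "snsChanMap", "imroTbl")):
    """True if the (assumed) probe-configuration entries agree between two
    parsed meta dicts, i.e. no probe reconfiguration between the files."""
    return all(meta_a.get(k) == meta_b.get(k) for k in keys)

# Usage sketch (paths hypothetical):
# from pathlib import Path
# a = parse_sglx_meta(Path("run_g0_t0.imec0.ap.meta").read_text())
# b = parse_sglx_meta(Path("run_g0_t1.imec0.ap.meta").read_text())
# same_probe_config(a, b)
```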
(@alejoe91 I just updated my comment)
And here is the trigger help. This suggests that recorded channels can vary across gates within a run (but probably not across triggers? I can't say; Bill will need to confirm). However, CatGT definitely allows concatenating files across gates. I don't know whether it checks that the recorded channels haven't changed when it does so (I assume it ought to).
OK I see. If I understand correctly, triggers should be used as neo segments. They are in the same folder. Different gates are in different folders, so they can be ignored or only propagated to annotations. Normally CatGT will no longer be necessary with SI, because SI will read an entire folder of multiple triggers at once.
That sounds good to me. About CatGT, I was more hinting that it would be good for the SGLX extractors to also support CatGT-processed files (even though concatenation could be performed via SI), because CatGT performs non-trivial preprocessing and is widely used. For instance, in our pipeline we first preprocess contiguous files with CatGT, and then use SI to (possibly concatenate non-contiguous CatGT-processed files and) run the sorter. Also, I think the
Once SpikeGLX acquisition is started most functional parameters can not be changed, so should be the same across multiple g and t indices. Things that are changeable in metadata include additional user annotations and file {length, SHA, timestamp}.

There are several mini programs that determine how a sequence of files is written; these are called 'triggers' and produce files with advancing t index. Depending on the trigger parameters, the t-files may or may not overlap. A given trigger program (t-series) can be 'run' multiple times, and each time, the g index is advanced.

All files with the same base run-name share parameters and come from the same underlying SpikeGLX hardware run (a continuous stream of consecutive samples), so have a time relation that allows them to be sewn back together (but with possible gaps and/or overlaps that need to be trimmed). The metadata 'firstSample' item is the starting sample number of this file (in that common underlying stream).

CatGT can sew g and t series files back together, but optionally also performs various types of filtering and artifact removal; notably tshift, which undoes a subsample shape distortion due to the probe's ADC multiplexing. CatGT works correctly and will be properly maintained to keep up with new types of probes and special handling considerations. It makes sense to support CatGT output. It makes less sense to bypass CatGT and duplicate its operations into multiple downstream components.
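Bill's description of 'firstSample' suggests a simple way to detect the gaps and overlaps he mentions. This sketch assumes each t-file's parsed metadata exposes 'firstSample' (named in the comment above), plus 'fileSizeBytes' and 'nSavedChans' (assumed key names), with int16 (2-byte) samples:

```python
def t_file_gaps(metas):
    """Sample gap between consecutive t-files of one run: the next file's
    'firstSample' minus the expected continuation of the previous file.
    0 means contiguous, >0 means missing samples, <0 means an overlap that
    would need trimming before concatenation."""
    gaps = []
    for a, b in zip(metas, metas[1:]):
        # Per-file sample count from size and channel count (int16 = 2 bytes).
        n_samples = int(a["fileSizeBytes"]) // (2 * int(a["nSavedChans"]))
        expected_next = int(a["firstSample"]) + n_samples
        gaps.append(int(b["firstSample"]) - expected_next)
    return gaps
```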
@TomBugnon: both @billkarsh: thank you for this precision. I will try to propagate this into the neo reader and so it will be available in spikeinterface. @billkarsh @TomBugnon: Also, note that we also have the To also handle the CatGT format we will need some testing files that will be public, for CI purposes.
If some simple recordings in saline (i.e. non-neural) are okay to get started with, I can make you any testing files you need tomorrow. If I am following everything correctly, this should cover all the cases that might need handling:
So this will be 6 runs in total. Assuming the minimum of 2 probes, 2 t-files, and 2 g-indices per run, with very short (~10s each) t-files, this will be ~2GB per run, so ~12GB total. Is that okay, @samuelgarcia?
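For what it's worth, the ~2 GB-per-run figure checks out under typical Neuropixels AP-stream assumptions (385 saved channels, 30 kHz, 2-byte samples; these numbers are assumptions, not stated in the thread):

```python
# Rough size estimate for one run: probes x g-indices x t-files, AP stream only.
n_chans = 385         # assumed saved channels per probe (Neuropixels 1.0 default)
fs = 30_000           # assumed AP sample rate, Hz
bytes_per_sample = 2  # int16
secs = 10             # ~10 s per t-file, as above
one_t_file = n_chans * fs * bytes_per_sample * secs  # one 10 s AP t-file
files_per_run = 2 * 2 * 2                            # 2 probes x 2 g x 2 t
run_bytes = one_t_file * files_per_run
print(run_bytes / 1e9)  # ~1.85 GB, consistent with "~2GB per run"
```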
Hi, these files will be pushed here: https://gin.g-node.org/NeuralEnsemble/ephy_testing_data I was not aware of the (3b) possibility. Overlapping could potentially and conceptually be a problem. Files have to be ultra small.
Guys, overlapping files is not a rare case. I really think you should let CatGT be your front end for ingesting SpikeGLX data. As the author of both programs I know exactly how they work and keep them in sync. That's my full time job. Why don't you want to use an existing and correct resource? |
@billkarsh I think we should also allow loading CatGT-processed data directly into SI. On the other hand, the sample-shift / destriping is also useful for NPIX Open Ephys data. Note that our implementation is ported from the IBL code base, which was yet another version of the same process.
Tom and I agree with Bill that supporting CatGT-processed files is a good idea. As Tom mentioned, it is widely used, and as Bill mentioned, it is fastidiously maintained. I would add that it is also multi-platform and very fast. And many of the functions it performs are quite necessary for correctness (e.g. the demultiplexed common-average referencing). If CatGT-processed files were not supported, it would be necessary to re-implement much of its functionality. And that would almost certainly be more work than simply supporting CatGT. Of course, it would be nice to just But in any case, I think we are all on the same page that it would be great for
I want the community to have access to correct implementations, where that is possible. A few points.

(1) Regarding "demultiplexed common-average referencing," that sounds like the old global demux, which is now replaced by the preferred (tshift + gblcar).
(2) But in either case, each probe class/type has its own channel-to-ADC mapping, and doing any multiplexing correction needs an awareness of the probe type, which is in SpikeGLX metadata and observed in CatGT.
(3) Does the IBL destriper case out probe type for all probes?
(4) Does Open Ephys record probe type?
(5) Just a thought. Instead of having the user first call CatGT and then have SI import that, why not have SI call CatGT? Admittedly the CatGT commandline is complicated and automating the call is helpful.
(6) If SI could generate a SpikeGLX metadata file for Open Ephys data, then all data sets could be processed by CatGT. Not all SpikeGLX metadata items are needed for processing by CatGT, but if the needed ones are not available in an Open Ephys dataset then fixing that deficit is super valuable.
@billkarsh let's have a chat about this! I would include @jsiegle and @oliche in the loop :) (btw IBL and SI have Tshift + global demux too) |
Another possibility would also be to collaborate on the SI side and make sure that the implementations match and are constantly up to date. The nice thing about the SI implementation is that it's lazy, so you just process the traces that you ask for (e.g. for visualization) |
Yes, you're right @billkarsh. I'm stuck on the old name, but I mean tshift + gblcar. And actually, as you suggest, here in our lab we do something similar to what you describe. We build up a CatGT command (source here)* and then call the command-line utility from Python. This makes things very user-friendly, especially for our users who aren't so comfortable with UNIX or the shell. But it is also just very nice because all our data organization / path management / experiment info databases / processing pipelines that influence the CatGT call are in Python. It works really well for us, and our calls to SpikeInterface are basically wrappers that include CatGT handling.

For what it's worth, I think it would be really cool if the Python SpikeGLX DataFile tools that you and Jennifer distribute here were hosted on PyPI. Maybe such a utility could include an easy Python interface for calling CatGT.

By the way, re-reading this thread, I wonder if one of your questions was about why one would ever want to load the raw SpikeGLX (i.e. before CatGT) into Python. If so, the reason is just storage costs. I regularly do experiments that are 4+ TB of raw data. If I have to keep that around for archival/backup purposes, and then also keep a CatGT-processed version around for analysis, it's pricey! So I basically run CatGT, write the output to a RAID0 NVMe array, sort the CatGT output, then delete the CatGT files immediately. Then I load raw SpikeGLX LFPs for later analysis alongside the spike-sorted data. Sorry if that answered a question you weren't asking :)

*And yes, this is old, from back before you offered native CatGT for Linux, and we were running the Windows executable with Wine :) We now use the Linux utility.
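A hypothetical sketch of the "build a CatGT command in Python" approach described above. The flag spellings (-dir, -run, -g, -t, -prb, -ap, -prb_fld) follow CatGT's documented command line, but check the ReadMe of your CatGT version before relying on them; the paths and run names are made up.

```python
def build_catgt_cmd(catgt_exe, data_dir, run, gate, t_range, probe):
    """Assemble a CatGT command line as an argv list.
    Flag names are assumptions based on CatGT's documented syntax."""
    return [
        catgt_exe,
        f"-dir={data_dir}",
        f"-run={run}",
        f"-g={gate}",
        f"-t={t_range}",   # e.g. "0,1" sews t0 and t1 into a _tcat file
        f"-prb={probe}",
        "-ap",             # process the AP stream
        "-prb_fld",        # input uses folder-per-probe layout
    ]

# To actually execute (untested sketch):
# import subprocess
# subprocess.run(build_catgt_cmd("CatGT", "/data", "myrun", 0, "0,1", 0), check=True)
```

Keeping the command builder pure (returning an argv list rather than a shell string) makes it easy to log, test, and audit the exact CatGT invocation.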
I'm a super friendly guy, trust me. I don't have any free time and have a very full agenda already. I'm not keen on working on a parallel version in SI (or other) of what I already cleared off my plate in CatGT. It's a command line tool so that it can plug into other workflows. Also the source code is there which answers essentially all questions about how I actually do it, so attending meetings to say "how I think I did it" wouldn't even be as accurate. BTW, searching the CatGT sources for ["xxx"] finds CatGT reads of metadata items, so you can see what items are required and how they are parsed and interpreted. That's my first order answer while I reconsider, since I am trying to help as best I can. |
@samuelgarcia Sample data is ready. I opened a GIN issue per the instructions at the link you provided.
I am also adding here a collection of CatGT output produced from these sample files. This output uses the latest version of CatGT (v3.0). It should be enough to properly test ingestion of CatGT output in Neo (it covers concatenation of t-files within and across g-indices, concatenation of continuous and discontinuous recordings, supercat concatenation of multiple runs, and output in both folder-per-probe and single-folder format). Data: sample_data_v2.zip
Hi Graham. |
@grahamfindlay I have this file list
Some files have the
CatGT output is labeled _tcat. One of the CatGT operations sews a t-series together, e.g. _t0+_t1+_t2+...+_tN becomes _tcat, signifying "concatenated t's." But we generalize so that all CatGT-processed output bin/meta files are labeled _tcat.
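Given this naming rule, a filename pattern that accepts both raw trigger indices and CatGT's _tcat output could look like this (a sketch; the pattern mirrors standard SpikeGLX naming, and the example filenames are made up):

```python
import re

# Trigger field is either t<digits> (raw SpikeGLX) or the literal "tcat" (CatGT).
SGLX_NAME = re.compile(r"_g(\d+)_(t\d+|tcat)\.imec(\d+)\.(ap|lf)\.(bin|meta)\Z")

m = SGLX_NAME.search("myrun_g0_tcat.imec0.ap.bin")
# m.group(2) is "tcat" for CatGT output, "t0"/"t1"/... for raw files.
```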
Thanks. |
Hi Sam, Sorry about that; it must have been a mistake. If you check the metadata for the
The second command format is the one I intended to use. I must have just forgotten to remove the first files from the sample data. You can remove them. But in general, although I never actually do this, it is possible to have CatGT output in the direct output folder if the Thanks,
FWIW, CatGT sends output to the input folders by default, and to an optional destination path if specified. Having the input file set and the CatGT output for that set is an ideal opportunity to test that ingestion of these agree. If there is any difference, that's a bug. Also, the nomenclature is "_tcat". There would never be a following underscore; it is always followed by a file type string (dot something).
@grahamfindlay @billkarsh: I propose to move the discussion to where I did the code, here:
Hi @samuelgarcia and all,
Initializing a spikeinterface.extractors.SpikeGLXRecordingExtractor object from a probe directory with gate index 0 works properly... but it fails when the gate index is > 0.
Same issue when instantiating the SpikeGLXRawIO object directly from neo. It seems like the gate (and not only the stream) should be either passed as a kwarg or parsed from the folder name? Thanks!
Spikeinterface version = 0.94.1.dev0
neo version = 0.12.2