Skip to content

Analyzer extension exit status#3347

Merged
samuelgarcia merged 17 commits intoSpikeInterface:mainfrom
jonahpearl:analyzer_extension_exit_status
Sep 11, 2024
Merged

Analyzer extension exit status#3347
samuelgarcia merged 17 commits intoSpikeInterface:mainfrom
jonahpearl:analyzer_extension_exit_status

Conversation

@jonahpearl
Copy link
Copy Markdown
Contributor

@jonahpearl jonahpearl commented Aug 28, 2024

Fixes #3329

Here's a first pass at adding information about an extension's completion (or not) to its metadata.

This PR adds a run_info.json file to each extension's folder. This contains keys:

  • run_completed (whether or not the call to AnalyzerExtension._run() finished),
  • data_loadable (whether or not AnalyzerExtension.load_data() can be called without an error)
  • runtime_s (the runtime of AnalyzerExtension._run() in seconds).

The core logic is implemented in AnalyzerExtension.run() and wraps the call to the extension-specific _run(). Then when AnalyzerExtension.get_extension() is called (and redirects to AnalyzerExtension.load() internally), if the run wasn't completed, it simply returns None as if the extension doesn't exist, and the user would be able to catch that and re-run the extension.

There is a mild complication with extensions that use the run_node_pipeline if called through compute_several_extensions, because then AnalyzerExtension.run() is never actually called. Instead, it looks like AnalyzerExtension.save() is used, so I added the relevant lines there to catch that, and assume that if the code makes it there, the run is completed.

TODO:

  • I'm not sure if the implementations in merge() and copy() are correct, I don't totally understand when those methods are expected to be called / what's wrapping them.
  • Write some tests?

Notes:

  • It is a bit of a slow-down to check if the data are loadable after each run. I'm ambivalent as to whether that check is actually useful, especially since data corruption is more likely to happen sometimes significantly after the run, not immediately after.
  • It seemed more reasonable to add a separate run_info.json file than trying to cram this into info.json, since that is more about the code itself.

Comment thread src/spikeinterface/core/sortinganalyzer.py Outdated
@samuelgarcia
Copy link
Copy Markdown
Member

Hi Jonah.
Thanks a lot. this looks like really cool.
I will have a deeper look into this.

data_loadable and run_completed are a bit redundant. (unles the computation is done but the wring failed.)

@alejoe91 alejoe91 added the core Changes to core module label Aug 29, 2024
Comment thread src/spikeinterface/core/sortinganalyzer.py Outdated
@jonahpearl
Copy link
Copy Markdown
Contributor Author

I also fixed the edge case where (for some reason) the run completes, but then the data file gets deleted; it will now raise a warning upon loading the sorting analyzer, and return None on calls to get_extension(), since the extension is no longer usable and should be re-computed as if it had never been.

@jonahpearl
Copy link
Copy Markdown
Contributor Author

jonahpearl commented Aug 29, 2024

Also — I still think my naive / default usage of this code would be to check has_extension(), and if that's true, assume everything is ok. Sure, the "check for None" pattern can be shown in the docs, but it's still odd that has_extension() now essentially means, is there a folder, regardless of if there's data. It looks like internally, sometimes it's used to mean, "is there a folder", like here, but then sometimes it means, "is there usable data", like here or here or here. One could imagine having a private function for "is there a folder here", and either getting rid of the public-facing version in favor of get_extension and checking none, or just making has_extension essentially do the none check for you.

Just taking the third example:

if sorting_analyzer.has_extension("spike_amplitudes"):
    peak_amplitudes = sorting_analyzer.get_extension("spike_amplitudes").get_data()
else:
    peak_amplitudes = None

right now, that will fail if the extension doesn't have data, since has_extension() will come back True but then get_extension() will return None and the call to get_data() will raise an AttributeError. I guess currently the preferred implementation would be

spike_amps = sorting_analyzer.get_extension("spike_amplitudes")
if spike_amps:
    peak_amplitudes = spike_amps.get_data()
else:
    peak_amplitudes = None

(or with a tertiary operator plus assignment expression it can fit in one line)

peak_amplitudes = _ext.get_data() if (_ext := sorting_analyzer.get_extension("spike_amplitudes")) else None

but the pattern still just feels confusing 🤷

Comment thread src/spikeinterface/core/sortinganalyzer.py
Comment thread src/spikeinterface/core/sortinganalyzer.py Outdated
@alejoe91
Copy link
Copy Markdown
Member

alejoe91 commented Sep 5, 2024

@samuelgarcia this is good to merge on my side. I added a back-compatibility logic for loading folders/zarr produced prior to this change

@alejoe91 alejoe91 added this to the 0.101.1 milestone Sep 10, 2024
Comment thread src/spikeinterface/core/sortinganalyzer.py Outdated
Comment thread src/spikeinterface/core/sortinganalyzer.py Outdated
Comment thread src/spikeinterface/core/sortinganalyzer.py Outdated
Comment thread src/spikeinterface/core/sortinganalyzer.py
Comment thread src/spikeinterface/core/sortinganalyzer.py Outdated
for r, result in enumerate(results):
extension_name, variable_name = result_routage[r]
extension_instances[extension_name].data[variable_name] = result
extension_instances[extension_name].run_info["runtime_s"] = runtime_s
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for me but if we want the total run ttime then suming of run_time will be sring because this run time is shared.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the best estimate we can get :)

@samuelgarcia
Copy link
Copy Markdown
Member

Merci beaucoup Jonah et Alessio.

@samuelgarcia samuelgarcia merged commit 469187a into SpikeInterface:main Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Changes to core module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sorting analyzer extensions don't know if they've been interrupted

3 participants