# Postprocessing

We're going to process some sorting results and calculate:
- Waveforms
- Templates
- Features, such as amplitudes and principal components

This is a good chance to visualize some of the output, and look for suspicious sorted units!

For this tutorial, we'll use some simulated data that has already been sorted. This can be found in the `dataset_postprocessing` folder.

In [None]:
import spikeinterface as si
import spikeinterface.postprocessing as sipp
import spikeinterface.widgets as sw

from pathlib import Path

base_folder = Path("path/to/SpikeInterface Dataset Tutorial")

postprocessing_folder = base_folder/"dataset_postprocessing"

recording = si.load_extractor(postprocessing_folder / "recording")
sorting = si.load_extractor(postprocessing_folder / "sorting_mysterious")

Let's take a look...

In [None]:
recording

In [None]:
sorting

Now we'll combine these into a single sorting analyzer called `analyzer_in_memory`.

By default, the analyzer is saved in memory. When it's in memory, computations involving the analyzer are very fast. But, of course, this takes up memory.

In [None]:
analyzer_in_memory = si.create_sorting_analyzer(sorting=sorting, recording=recording)


Note that when you create a sorting analyzer, it automatically calculates the _sparsity_. This creates a mask for each unit, so that only relevant channels are kept. This can greatly speed up computation for high density probes. We'll see it in action later!
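
To make the idea of a sparsity mask concrete, here is a minimal sketch in plain Python. It is not SpikeInterface's implementation: the probe layout, the unit position, the 50 µm radius and the helper name `radius_sparsity_mask` are all invented for illustration, and it assumes one simple rule, namely keeping the channels within a fixed distance of the unit.

```python
import math

def radius_sparsity_mask(unit_position, channel_positions, radius_um):
    """Return one boolean per channel: True if the channel is within radius_um of the unit."""
    return [math.dist(unit_position, pos) <= radius_um for pos in channel_positions]

# Hypothetical linear probe: 8 channels spaced 20 um apart along y.
channel_positions = [(0.0, 20.0 * i) for i in range(8)]
# Hypothetical unit sitting right next to channel 3.
unit_position = (0.0, 60.0)

mask = radius_sparsity_mask(unit_position, channel_positions, radius_um=50.0)
kept_channels = [i for i, keep in enumerate(mask) if keep]
print(kept_channels)  # [1, 2, 3, 4, 5]
```

Any later computation for this toy unit would then only need to touch 5 of the 8 channels, which is where the speed-up comes from.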

If you're working locally (e.g. on your laptop at a workshop) you'll probably want to save your analyzer in a folder. Let's do that now.


In [None]:
analyzer = analyzer_in_memory.save_as(format="binary_folder", folder="my_sorting_analyzer")

Note that `analyzer_in_memory` is still in memory, while `analyzer` is not.

_Note: you can save your analyzer in a folder from the start by running:_ \
`analyzer = si.create_sorting_analyzer(sorting=sorting, recording=recording, format="binary_folder", folder="my_sorting_analyzer")`

In [None]:
print(analyzer)
print(analyzer_in_memory)

## Extensions

The physical information we're interested in is computed using _extensions_. We can compute the `waveforms` extension as follows:

In [None]:
analyzer.compute("waveforms")

Oh no - an error! The `waveforms` extension requires the `random_spikes` extension. In fact, many extensions depend on other extensions. Here are all the current extensions in SpikeInterface and how they depend on one another:

![Hello](parent_child.svg)

So, we should calculate `random_spikes` as well as `waveforms`. In fact, we can calculate several extensions in one function call. Here we'll calculate `random_spikes`, `waveforms` and `templates`:

In [None]:
si.set_global_job_kwargs(n_jobs=4)
analyzer.compute(["random_spikes", "waveforms", "templates"])

Since `analyzer` is saved as a folder, the extensions will appear in the folder too. Let's have a look...

...

...

...

Great! 

What happens when you recalculate an extension? Well, it depends on what depends on it. For example, waveforms depend on random spikes, because the waveforms are calculated from a random sample of spikes. If we recalculate random spikes, we draw a new sample. If our sampling is good and representative, the waveforms won't change much, but they will change. So the existing waveforms would be inconsistent with the new random spikes. To keep things consistent when an extension is recomputed, SpikeInterface _deletes_ the extensions which depend on it. Let's see this in action:

In [None]:
analyzer.compute("random_spikes")

Now, check your folder. The waveforms and templates are gone.

But we do want them for this tutorial. So let's calculate them again.

In [None]:
analyzer.compute(["waveforms", "templates"])
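
The two behaviours seen so far (an error when a parent extension is missing, and deletion of children when a parent is recomputed) can be summarised in a toy model. This is only a sketch of the bookkeeping, not SpikeInterface code: the class `ToyAnalyzer` is hypothetical, and its dependency table is a simplified assumption covering just the three extensions used above.

```python
class ToyAnalyzer:
    """Toy model of extension bookkeeping (not SpikeInterface code)."""

    # Assumed, simplified dependency table: extension -> the extensions it needs.
    PARENTS = {
        "random_spikes": [],
        "waveforms": ["random_spikes"],
        "templates": ["waveforms"],
    }

    def __init__(self):
        self.computed = set()

    def _descendants(self, name):
        """All extensions that depend, directly or indirectly, on `name`."""
        found = set()
        for child, parents in self.PARENTS.items():
            if name in parents:
                found.add(child)
                found |= self._descendants(child)
        return found

    def compute(self, name):
        missing = [p for p in self.PARENTS[name] if p not in self.computed]
        if missing:
            raise ValueError(f"{name} requires {missing} to be computed first")
        # Recomputing invalidates everything downstream, so drop it.
        self.computed -= self._descendants(name)
        self.computed.add(name)


toy = ToyAnalyzer()
for name in ["random_spikes", "waveforms", "templates"]:
    toy.compute(name)
print(sorted(toy.computed))  # ['random_spikes', 'templates', 'waveforms']

toy.compute("random_spikes")
print(sorted(toy.computed))  # ['random_spikes']
```

Calling `ToyAnalyzer().compute("waveforms")` on a fresh instance raises a `ValueError`, mirroring the error we hit earlier.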

You can access the extension data using the `get_extension` and `get_data` methods.

In [None]:
analyzer.get_extension("templates").get_data()
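
To get a feel for this kind of data without plotting, it helps to reduce it to something small. The sketch below assumes the templates are indexed as (unit, sample, channel), which you can check against the shape of your own output, and finds the channel with the largest peak-to-peak amplitude for each unit. The data and the helper `peak_channel` are made up, and plain nested lists are used so that it runs without any dependencies.

```python
def peak_channel(template):
    """Return the index of the channel with the largest peak-to-peak amplitude.

    `template` is a list of samples; each sample is a list with one value per channel.
    """
    num_channels = len(template[0])
    peak_to_peak = []
    for ch in range(num_channels):
        trace = [sample[ch] for sample in template]
        peak_to_peak.append(max(trace) - min(trace))
    return peak_to_peak.index(max(peak_to_peak))

# Made-up templates for 2 units, 4 samples and 3 channels.
toy_templates = [
    [[0, 0, 0], [-1, -8, -2], [1, 3, 0], [0, 0, 0]],  # unit 0: largest on channel 1
    [[0, 0, 0], [-6, -1, 0], [2, 0, 0], [0, 0, 0]],   # unit 1: largest on channel 0
]

best_channels = [peak_channel(t) for t in toy_templates]
print(best_channels)  # [1, 0]
```

Under the same layout assumption, the NumPy equivalent for a real templates array would be `np.ptp(templates, axis=1).argmax(axis=1)`.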

But it's a little awkward to work with without visualization...

Soon, we'll visualise lots of interesting stuff. This will rely on more extensions, which we'll now calculate in a slightly different way: using a dictionary. This might suit your coding style better.

In [None]:
extensions_to_compute = {
    #'principal_components': {
    #    'n_components': 4
    #},
    'spike_amplitudes': {},
    'amplitude_scalings': {},
    'spike_locations': {},
    'template_metrics': {},
    'template_similarity': {},
    'unit_locations': {
        'method': 'monopolar_triangulation'
    },
}

analyzer.compute(extensions_to_compute)

### Exercises

1. Run this notebook
2. Save the sorting analyzer using the Zarr format. More details here: https://spikeinterface.readthedocs.io/en/latest/modules/postprocessing.html
3. Try varying the keywords in one of the extensions. More details here: https://spikeinterface.readthedocs.io/en/latest/modules/postprocessing.html#available-postprocessing-extensions
4. Run the next code block, the first one in the "Widgets" section.

## Widgets

We'll now have a look at the information we've calculated, using a _widget_. These are used to make graphical, interactive output in Jupyter notebooks. They can be a bit fiddly to set up. For instance, some of the most interactive features fail in VSCode (for me!).

Let's plot the unit summary. This contains the unit location, the template, the template on the most important channel, the autocorrelogram and the amplitude density plot (phew!).

In [None]:
# activate the matplotlib widget
import matplotlib.pyplot
%matplotlib widget

sw.plot_unit_summary(analyzer, unit_id=39, figsize=(8,4))

_(Note: the recording has 32 channels, but only ~15 are shown. This is thanks to the sparsity discussed earlier.)_

Beautiful! There are _a lot_ of widgets: https://spikeinterface.readthedocs.io/en/latest/modules/widgets.html#available-plotting-functions

They can be very useful when checking if your units look reasonable. For instance, we can have a look at the unit locations. One way is to get the data and have a look:

In [None]:
print(analyzer.get_extension("unit_locations").get_data())

Another way is to use a widget:


In [None]:
sw.plot_unit_locations(analyzer, backend="ipywidgets")

Earlier, I had a look at the data and noticed that units 0, 16, 29 and 34 were very close together...
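
A check like this doesn't have to be done by eye. Here is a minimal sketch that lists every pair of units closer than a chosen distance. The positions, the unit ids, the 10 µm threshold and the helper `close_pairs` are all invented for illustration. To use it on real data you would first need to pair each row of the unit locations with its unit id, for example assuming the rows follow the order of `analyzer.unit_ids`.

```python
import math
from itertools import combinations

def close_pairs(locations, max_distance_um):
    """Return (unit_a, unit_b, distance) for every pair closer than max_distance_um."""
    pairs = []
    for (id_a, pos_a), (id_b, pos_b) in combinations(sorted(locations.items()), 2):
        distance = math.dist(pos_a, pos_b)
        if distance < max_distance_um:
            pairs.append((id_a, id_b, distance))
    return pairs

# Made-up (x, y) positions in micrometres, keyed by made-up unit ids.
toy_locations = {
    1: (10.0, 100.0),
    2: (12.0, 104.0),
    3: (13.0, 104.0),
    4: (40.0, 300.0),
}

for id_a, id_b, distance in close_pairs(toy_locations, max_distance_um=10.0):
    print(f"units {id_a} and {id_b} are {distance:.1f} um apart")
```

In this toy example, units 1, 2 and 3 are flagged as mutually close, while unit 4 is not paired with anything.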

Units that are close together are candidates for oversplitting. Maybe our sorting algorithm has split one unit into two. We can investigate how their spike times relate to each other by looking at the cross-correlograms.
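
If you haven't met cross-correlograms before: for two units, you take every pair of spikes (one from each unit), compute the time lag between them, and build a histogram of the lags that fall inside a small window. The sketch below does exactly that in plain Python with made-up spike times in milliseconds. It is a deliberately naive double loop meant to show the idea, not how SpikeInterface computes it, and the function name `cross_correlogram` is hypothetical.

```python
def cross_correlogram(spikes_a, spikes_b, window_ms, bin_ms):
    """Histogram of time lags (b - a) falling in [-window_ms, window_ms).

    Spike times, window and bin width are integers in milliseconds.
    """
    num_bins = (2 * window_ms) // bin_ms
    counts = [0] * num_bins
    for t_a in spikes_a:
        for t_b in spikes_b:
            lag = t_b - t_a
            if -window_ms <= lag < window_ms:
                counts[(lag + window_ms) // bin_ms] += 1
    return counts

# Made-up spike trains (ms): the two units fire together most of the time.
toy_spikes_a = [10, 50, 90, 130]
toy_spikes_b = [10, 50, 91, 200]

# Bins cover lags [-3,-2), [-2,-1), [-1,0), [0,1), [1,2), [2,3) ms.
counts = cross_correlogram(toy_spikes_a, toy_spikes_b, window_ms=3, bin_ms=1)
print(counts)  # [0, 0, 0, 2, 1, 0]
```

A tall peak around zero lag, as in this toy example, means the two units tend to spike at (almost) the same time.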

In [None]:
sw.plot_crosscorrelograms(analyzer.sorting,  unit_ids=[0,16,29,34])

Very suspicious! Units 16 and 29 almost always spike at the same time. Let's take a look at their templates...

In [None]:
sw.plot_unit_templates(analyzer, unit_ids=[16,29])

Should these be merged? That's not the point of this tutorial! Instead, we've seen how the widgets can be used to do some detective work. Another very useful widget is related to spike amplitudes:

In [None]:
sw.plot_amplitudes(analyzer, plot_histograms=True, backend="ipywidgets")

Two things to look out for: drift and sudden amplitude cut-offs.
There's also a nice way to view all the amplitude distributions at once:

In [None]:
sw.plot_all_amplitudes_distributions(analyzer, figsize=(10,4))
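
Drift, the first of the two warning signs mentioned above, can also be checked numerically. One simple approach, sketched here with made-up data, is to split a unit's spikes into consecutive time bins and compare the median amplitude in each bin. The helper `binned_medians` and the 10 s bin width are assumptions for illustration. What counts as "too much" change is a judgment call this sketch does not make.

```python
from statistics import median

def binned_medians(times_s, amplitudes, bin_s):
    """Median amplitude in consecutive time bins of width bin_s (empty bins are skipped)."""
    bins = {}
    for t, amp in zip(times_s, amplitudes):
        bins.setdefault(int(t // bin_s), []).append(amp)
    return [median(bins[k]) for k in sorted(bins)]

# Made-up unit: one spike per second for 40 s, with the (negative) amplitude
# shrinking in magnitude by 1 unit per second.
toy_times = list(range(40))
toy_amplitudes = [-100.0 + t for t in toy_times]

medians = binned_medians(toy_times, toy_amplitudes, bin_s=10)
print(medians)  # [-95.5, -85.5, -75.5, -65.5]
```

A steady trend in the binned medians, like the one in this toy unit, is the numerical counterpart of the slanted band you would see in the amplitude plot.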

### Exercises

1. Run this notebook
2. Go have a look at the **Widget Tutorial** page of the documentation: (https://spikeinterface.readthedocs.io/en/latest/tutorials/index.html#widgets-tutorials)
and plot a widget we've not looked at yet. Note: some of the widgets take in a _recording_ or _sorting_. For these you need to pass `analyzer.recording` or `analyzer.sorting` instead of `analyzer`.
3. Try and find a suspicious unit. One that should be merged or split!



# END

That's the end of this notebook. Hopefully you've learned about
- Combining your recording and sorting into a sorting analyzer
- Calculating extensions, their dependencies and what happens when you recompute
- Accessing extension data and visualising them using widgets