# Ground truth recordings for validation of spike sorting algorithms


Spampinato from the Institut de la Vision has published data from mouse retina recorded with a dense array.
The data include ground truth that can be used to benchmark spike sorting tools.

Here is the official publication of this open dataset:
https://zenodo.org/record/1205233#.W9mq1HWLTIF


This dataset was used by Pierre Yger in the publication of SpyKING CIRCUS:
https://elifesciences.org/articles/34518


Here is a notebook that compares some sorters on these recordings.

Each recording has several units, and **one** of these has a ground truth recorded with a juxtacellular electrode.
The SNR on the MEA differs from file to file, so we can easily compare the false positive and true positive scores by sorter and SNR.
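
To make these two scores concrete, here is a minimal sketch of how a sorted unit could be scored against the juxtacellular spike train. It is not the comparison code used by this notebook or by spiketoolkit: the function names, the greedy matching, the 10-sample tolerance, the toy spike times and the exact definition of each rate are all assumptions chosen for illustration.

```python
import numpy as np

def count_matches(gt_times, sorted_times, tolerance):
    """Greedy one-to-one matching of two spike trains (times in samples)."""
    gt_times = np.sort(np.asarray(gt_times))
    sorted_times = np.sort(np.asarray(sorted_times))
    n_match = 0
    i = j = 0
    while i < len(gt_times) and j < len(sorted_times):
        delta = sorted_times[j] - gt_times[i]
        if abs(delta) <= tolerance:
            # both spikes are consumed by the match
            n_match += 1
            i += 1
            j += 1
        elif delta < 0:
            # detected spike is too early: no ground-truth partner
            j += 1
        else:
            # ground-truth spike was missed by the sorter
            i += 1
    return n_match

def score_unit(gt_times, sorted_times, tolerance=10):
    """Assumed definitions: fraction of ground-truth spikes found,
    and fraction of detected spikes without a ground-truth partner."""
    n_match = count_matches(gt_times, sorted_times, tolerance)
    return {
        'true_positive_rate': n_match / len(gt_times),
        'false_positive_rate': (len(sorted_times) - n_match) / len(sorted_times),
    }

# toy example with hypothetical spike times, in samples
gt = [100, 500, 900, 1300]
detected = [102, 498, 700, 1305]
scores = score_unit(gt, detected, tolerance=10)
print(scores)
```

In this toy case three of the four ground-truth spikes are matched and one detected spike (at sample 700) has no partner. Other conventions for these rates exist, so the numbers are only comparable between sorters when the same definition is used throughout.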


All tar.gz files must be in the "rawfiles" path.


In [10]:
import zipfile, tarfile
import re
import os, shutil

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from spiketoolkit.sorters import run_sorters
import spikeextractors as se

%matplotlib notebook

## global variables

In [9]:
# my working path
basedir = '/media/samuel/SamCNRS/DataSpikeSorting/pierre/zenodo/'

# input file
recording_folder = basedir + 'rawfiles/'

# where output will be
working_folder = basedir + 'run_comparison/'

# file_list
rec_names = ['20160415_patch2', '20170630_patch1', '20170627_patch1']

# sorter list
sorter_list = ['tridesclous', 'spykingcircus']


## Step 1 : unzip all

This extracts the tar.gz files into folders.

In [None]:
for rec_name in rec_names:
    filename = recording_folder + rec_name + '.tar.gz'
    target_dir = recording_folder + rec_name

    # skip archives that are already extracted
    if os.path.isdir(target_dir):
        continue
    with tarfile.open(filename, mode='r|gz') as t:
        t.extractall(target_dir)

## Step 2: run sorters on all files

Important note: the files have 256 channels but only 252 are useful.
The PRB file lists all the channels needed, so we need to explicitly set **grouping_property='group'**
to be sure to only take into account the channels in the unique group.
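
To illustrate how a PRB file separates useful channels from unused ones, here is a small sketch. The PRB content is a hypothetical, shortened example with 8 channels and one group. The real `mea_256.prb` is not reproduced here, and the `total_nb_channels` field is assumed to follow the SpyKING CIRCUS convention.

```python
# hypothetical, shortened PRB content (not the real mea_256.prb)
prb_text = """
total_nb_channels = 8
channel_groups = {
    1: {
        'channels': [0, 1, 2, 4, 5, 6],
        'geometry': {0: [0, 0], 1: [0, 30], 2: [0, 60],
                     4: [30, 0], 5: [30, 30], 6: [30, 60]},
    }
}
"""

# PRB files are plain Python, so they can be executed into a namespace
# (only do this with files you trust)
namespace = {}
exec(prb_text, namespace)
channel_groups = namespace['channel_groups']

# channels listed in at least one group are the ones kept for sorting
all_channels = set(range(namespace['total_nb_channels']))
used = {ch for group in channel_groups.values() for ch in group['channels']}
unused = sorted(all_channels - used)

print('groups:', list(channel_groups.keys()))
print('used channels:', sorted(used))
print('unused channels:', unused)
```

Channels that appear in no group are the ones that grouping by 'group' leaves out of the sorting.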

In [None]:
# make a recordings dict
recordings = {}
for rec_name in rec_names:
    dirname = recording_folder + rec_name + '/'

    # find the MEA raw file (skip the juxtacellular one)
    for f in os.listdir(dirname):
        if f.endswith('.raw') and not f.endswith('juxta.raw'):
            raw_filename = dirname + f

    # raw files have an internal offset that depends on the channel count
    # a simple header file can be parsed to get it
    with open(raw_filename.replace('.raw', '.txt'), mode='r') as f:
        offset = int(re.findall(r'padding = (\d+)', f.read())[0])

    # recording: 20 kHz, 256 channels, uint16 samples
    rec = se.BinDatRecordingExtractor(raw_filename, 20000, 256, 'uint16', offset=offset, frames_first=True)
    chan_ids = rec.getChannelIds()

    # load the probe (PRB) file
    rec = se.loadProbeFile(rec, basedir + 'mea_256.prb')

    recordings[rec_name] = rec
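
The offset parsing in the cell above can be tried in isolation. This is a minimal sketch: the header text is hypothetical (only the `padding = <int>` pattern comes from the cell above), and `read_padding` is an illustrative helper name, not part of any library.

```python
import re

# hypothetical header content; only the 'padding = <int>' line
# mirrors the pattern used in the cell above
header_text = """
sampling_rate = 20000
nb_channels = 256
padding = 1866
"""

def read_padding(text):
    """Return the integer after 'padding = ', or raise if it is missing."""
    match = re.search(r'padding = (\d+)', text)
    if match is None:
        raise ValueError("no 'padding = <int>' entry found in header")
    return int(match.group(1))

offset = read_padding(header_text)
print(offset)
```

Raising an explicit error when the entry is missing gives a clearer message than the `IndexError` that `re.findall(...)[0]` produces on an unexpected header.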


In [None]:
# run them all
results = run_sorters(sorter_list, recordings, working_folder, grouping_property='group', debug=False)