# EPhys Data Analysis Tutorial

This tutorial walks step by step through preparing electrophysiology (EPhys) data for analysis. It uses three inputs: Meghan's multirecording_spikeanalysis script, the RCE Pilot 2 behavior spreadsheet, and a directory of phy folders, one per ephys recording, each containing spike_times.npy, spike_clusters.npy, and cluster_group.tsv.

## Setup

Import all the libraries you'll be using, including Meghan's multirecording_spikeanalysis.py:

In [1]:
import pandas as pd
import numpy as np
import ast
import pickle
from pathlib import Path
import multirecording_spikeanalysis as spike

## Data Loading
First, we load the relevant EPhys data from the RCE Pilot 2 spreadsheet:

In [2]:
cols = ['condition ', 'session_dir', 'all_subjects', 'tone_start_timestamp', 'tone_stop_timestamp']

# Load the data
df = pd.read_excel('rce_pilot_2_per_video_trial_labels.xlsx', usecols=cols, engine='openpyxl')

## Preprocessing

Next, we clean the spreadsheet so that its rows can be matched to the ephys recordings:

In [3]:
df2 = df.dropna() # Drop the rows missing data
df3 = df2.copy()
df3['all_subjects'] = df3['all_subjects'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x) # Make the 'all_subjects' column readable as a list
df4 = df3[df3['all_subjects'].apply(lambda x: len(x) < 3)] # Ignore novel sessions for now
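
To see what these three steps do, here is a minimal sketch on a toy table. The session names and subject IDs are made up for illustration; only the column name `all_subjects` and the three operations come from the cell above.

```python
import ast
import pandas as pd

# Toy stand-in for the spreadsheet; all values are made up for illustration.
toy = pd.DataFrame({
    "session_dir": ["sess_a", "sess_b", "sess_c"],
    "all_subjects": ["['1.1', '1.2']", "['1.1', '1.2', '1.3']", None],
})

# 1. Drop rows with missing data (removes sess_c).
toy = toy.dropna().copy()

# 2. Turn the string "['1.1', '1.2']" into a real Python list.
toy["all_subjects"] = toy["all_subjects"].apply(
    lambda x: ast.literal_eval(x) if isinstance(x, str) else x
)

# 3. Keep only sessions with fewer than 3 subjects (removes sess_b).
kept = toy[toy["all_subjects"].apply(lambda x: len(x) < 3)]
print(kept["session_dir"].tolist())  # ['sess_a']
```

`ast.literal_eval` only parses Python literals, so it is a safer choice than `eval` for cells read from a spreadsheet.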

## Data Structuring
We'll structure the data into a new DataFrame that aligns with our analysis goals:

In [4]:
# Initialize an empty list to collect data for the new DataFrame
new_df_data = []

for _, row in df4.iterrows():
    session_dir = row['session_dir']
    subjects = row['all_subjects']
    condition = row['condition ']

    # Split session_dir on '_subj_' and take the first part only
    # This ensures everything after '_subj_' is ignored
    base_session_dir = session_dir.split('_subj_')[0]

    for subject in subjects:
        subject_formatted = subject.replace('.', '-')
        # Append formatted subject to the base session_dir correctly
        subj_recording = f"{base_session_dir}_subj_{subject_formatted}"
        new_df_data.append({
            'session_dir': session_dir,
            'subject': subject,
            'subj_recording': subj_recording,
            'condition': condition if condition in ['rewarded', 'omission', 'both_rewarded', 'tie'] else ('win' if str(condition) == str(subject) else 'lose'),
            'tone_start_timestamp': row['tone_start_timestamp'],
            'tone_stop_timestamp': row['tone_stop_timestamp']
        })

# Convert list to DataFrame
new_df = pd.DataFrame(new_df_data)
new_df = new_df.drop_duplicates()
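
The two pieces of per-subject logic in the loop above can be pulled out and tried on their own. This is a sketch: the helper names `label_condition` and `make_subj_recording` are hypothetical, and the example session name is made up. It also assumes that any condition value outside the four shared labels is the ID of the winning subject, which is what the comparison in the loop implies.

```python
SHARED_LABELS = ["rewarded", "omission", "both_rewarded", "tie"]

def label_condition(condition, subject):
    """Return the trial label from one subject's point of view (hypothetical helper)."""
    if condition in SHARED_LABELS:
        return condition
    # Assumption: otherwise the condition cell holds the winner's ID.
    return "win" if str(condition) == str(subject) else "lose"

def make_subj_recording(session_dir, subject):
    """Build the per-subject key used to match ephys recordings (hypothetical helper)."""
    base = session_dir.split("_subj_")[0]          # drop everything after '_subj_'
    return f"{base}_subj_{subject.replace('.', '-')}"

print(label_condition("rewarded", "1.4"))  # rewarded
print(label_condition("1.4", "1.4"))       # win
print(label_condition("1.4", "1.2"))       # lose
print(make_subj_recording("20230101_000000_demo_subj_1-2_and_1-4", "1.4"))
# 20230101_000000_demo_subj_1-4
```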

## Timestamp Dictionary Preparation
Prepare dictionaries of event timestamps to match with the ephys recordings:

In [5]:
# Prepare timestamp_dicts from new_df
timestamp_dicts = {}
for _, row in new_df.iterrows():
    key = row['subj_recording']
    condition = row['condition']
    timestamp_start = int(row['tone_start_timestamp']) // 20
    timestamp_end = int(row['tone_stop_timestamp']) // 20
    tuple_val = (timestamp_start, timestamp_end)

    if key not in timestamp_dicts:
        timestamp_dicts[key] = {cond: [] for cond in ['rewarded', 'win', 'lose', 'omission', 'both_rewarded', 'tie']}
    timestamp_dicts[key][condition].append(tuple_val)

# Convert lists in timestamp_dicts to numpy arrays
for subj_recording in timestamp_dicts:
    for condition in timestamp_dicts[subj_recording]:
        timestamp_dicts[subj_recording][condition] = np.array(timestamp_dicts[subj_recording][condition], dtype=np.int64)
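
The integer division by 20 is not explained in the spreadsheet itself. A likely reading, and an assumption here, is that the raw timestamps are sample indices at 20 kHz, so dividing by 20 converts them to milliseconds. The sketch below uses made-up raw values and also shows the array shapes you end up with, including for conditions that have no trials.

```python
import numpy as np

SAMPLES_PER_MS = 20  # assumption: 20 kHz sampling, i.e. 20 samples per millisecond

raw_start, raw_stop = 1_099_240, 1_299_240  # made-up raw timestamps
window = (raw_start // SAMPLES_PER_MS, raw_stop // SAMPLES_PER_MS)
print(window)  # (54962, 64962)

# A condition with trials becomes an (n_trials, 2) array of [start, stop] rows.
filled = np.array([window], dtype=np.int64)
print(filled.shape)  # (1, 2)

# A condition with no trials becomes an empty 1-D array, not a (0, 2) array.
empty = np.array([], dtype=np.int64)
print(empty.shape)  # (0,)
```

Downstream code that indexes columns, such as `events[:, 0]`, will fail on the empty case, so it is worth checking `len(events)` first.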

## EPhys Recording Collection
Load EPhys recordings:

In [6]:
# Construct the path in a platform-independent way (HiPerGator or Windows)
ephys_path = Path('.') / 'export' / 'updated_phys' / 'non-novel' / 'all_non_novel'

ephys_data = spike.EphysRecordingCollection(str(ephys_path))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
20230618_100636_standard_comp_to_omission_D2_subj_1-4_t4b3L_box1_merged.rec
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
20230620_114347_standard_comp_to_omission_D4_subj_1-2_t3b3L_box_1_merged.rec
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
Unit 96 is unsorted & has 5811 spikes
Unit 96 will be deleted
Unit 95 is unsorted & has 6458 spikes
Unit 95 will be deleted
20230625_112913_standard_comp_to_both_rewarded_D4_subj_1-4_t3b3L_box1_merged.rec
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
20230621_111240_standard_comp_to_omission_D5_subj_1-4_t3b3L_box1_merged.rec
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
20230617_115521_standard_comp_to_omission_D1_subj_1-1_t1b3L_box1_merged.rec
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
20230624_105855_standard_comp_to_both_rewarded_D3_subj_1-2_t1b2L_box1_merged.rec
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
20230619_115321_standard_comp_to_omission_D3_subj_1-4_t3b3L_box2_merged.r

## Assign Dictionaries to Collection
Attach each recording's event dictionary and subject number to the corresponding recording object:

In [7]:
for recording in ephys_data.collection.keys():
    # Check if the recording key (without everything after subject #) is in timestamp_dicts
    start_pos = recording.find('subj_')
    # Add the length of 'subj_' and 3 additional characters to include after 'subj_'
    end_pos = start_pos + len('subj_') + 3
    # Slice the recording key to get everything up to and including the subject identifier plus three characters
    recording_key_without_suffix = recording[:end_pos]
    if recording_key_without_suffix in timestamp_dicts:
        # Assign the corresponding timestamp_dicts dictionary to event_dict
        ephys_data.collection[recording].event_dict = timestamp_dicts[recording_key_without_suffix]
        
        # Extract the subject from the recording key
        start = recording.find('subj_') + 5  # Start index after 'subj_'
        subject = recording[start:start+3]
        
        # Assign the extracted subject
        ephys_data.collection[recording].subject = subject
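
The slicing in the loop above is easier to follow on a single recording name. The sketch below repeats the same string operations on one of the names printed when the collection was loaded. Note the built-in assumption that every subject ID is exactly three characters long (for example `1-4`); a longer ID would be cut off.

```python
recording = "20230618_100636_standard_comp_to_omission_D2_subj_1-4_t4b3L_box1_merged.rec"

prefix = "subj_"
start = recording.find(prefix) + len(prefix)  # index of the first character of the ID
id_length = 3                                 # assumption: IDs look like '1-4'

key = recording[:start + id_length]           # matches the keys of timestamp_dicts
subject = recording[start:start + id_length]

print(key)      # 20230618_100636_standard_comp_to_omission_D2_subj_1-4
print(subject)  # 1-4
```

Recordings whose key is not found in `timestamp_dicts` are skipped silently by the loop, so they keep whatever `event_dict` and `subject` they had before.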

### (Optional) Save the ephys_data for later use:
Otherwise, you'll have to rerun the previous steps every time you want to run an analysis:

In [8]:
# with open("ephys_data.pkl", "wb") as f:
#     pickle.dump(ephys_data, f)

### To import the pickle you can use this:

In [9]:
# with open("ephys_data.pkl", "rb") as f:
#     ephys_data = pickle.load(f)

## Analysis Initialization
Finally, initialize the spike analysis with the organized EPhys data. It would be convenient to pickle this object as well, but the attempt crashed even with 4 CPUs and 64 GB of RAM:

In [21]:
spike_analysis = spike.SpikeAnalysis_MultiRecording(ephys_data, timebin = 5, smoothing_window=250, ignore_freq = 0.5)

All set to analyze


### Now you can inspect the recordings, for example:

In [31]:
# Access the collection dictionary
recordings = spike_analysis.ephyscollection.collection

In [33]:
recording_name = '20230618_100636_standard_comp_to_omission_D2_subj_1-4_t4b3L_box1_merged.rec'
recording1 = recordings.get(recording_name)

if recording1 is None:
    print(f"Recording named {recording_name} not found.")
else:
    print(f"Recording {recording_name} successfully retrieved.")

Recording 20230618_100636_standard_comp_to_omission_D2_subj_1-4_t4b3L_box1_merged.rec successfully retrieved.


In [34]:
if recording1:
    # Example of accessing an attribute or method of the recording
    print(dir(recording1))  # Lists all methods and attributes of the recording
    # If there are specific operations you need to perform, do them here.

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'event_dict', 'freq_dict', 'get_spike_specs', 'get_unit_labels', 'get_unit_timestamps', 'labels_dict', 'path', 'sampling_rate', 'spiketrain', 'subject', 'timestamps_var', 'unit_array', 'unit_firing_rate_array', 'unit_firing_rates', 'unit_spiketrains', 'unit_timestamps', 'wilcox_dfs', 'zscored_events']


In [48]:
recording1.event_dict['win']

array([[  54962,   64962],
       [ 379962,  389962],
       [ 484962,  494962],
       [ 579962,  589962],
       [ 654962,  664962],
       [1554961, 1564961]])

In [49]:
preevent = recording1.event_dict['win'] - 10000

In [50]:
preevent

array([[  44962,   54962],
       [ 369962,  379962],
       [ 474962,  484962],
       [ 569962,  579962],
       [ 644962,  654962],
       [1544961, 1554961]])
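
Subtracting 10000 shifts both the start and the stop of every window back by 10,000 ms. Because each `win` window here is itself 10,000 ms long, the shifted window ends exactly where the event begins. A more general sketch, using a hypothetical helper `pre_event_windows`, builds the baseline window from the event start only, so it also works when events have other durations:

```python
import numpy as np

def pre_event_windows(events, duration_ms):
    """Return [start - duration_ms, start] for each [start, stop] row (hypothetical helper)."""
    events = np.asarray(events)
    starts = events[:, 0]
    return np.column_stack((starts - duration_ms, starts))

# First two 'win' windows from the output above.
win = np.array([[54962, 64962],
                [379962, 389962]])

pre = pre_event_windows(win, 10_000)
print(pre)
# [[ 44962  54962]
#  [369962 379962]]
```

For these 10-second events the result is identical to `win - 10000`.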

In [None]:
recording1.unit_firing_rates

{65: array([4., 4., 4., ..., 0., 0., 0.]),
 123: array([4.8, 4.8, 4.8, ..., 0. , 0. , 0. ]),
 103: array([8.8, 8.8, 8.8, ..., 3.2, 3.2, 3.2]),
 83: array([2.4, 2.4, 2.4, ..., 8.8, 8.8, 8. ]),
 118: array([1.6, 1.6, 1.6, ..., 0. , 0. , 0. ]),
 93: array([0. , 0. , 0. , ..., 1.6, 1.6, 1.6]),
 99: array([0. , 0. , 0. , ..., 2.4, 2.4, 2.4]),
 105: array([0., 0., 0., ..., 0., 0., 0.]),
 87: array([0. , 0. , 0. , ..., 1.6, 1.6, 1.6]),
 19: array([0., 0., 0., ..., 0., 0., 0.]),
 9: array([0., 0., 0., ..., 0., 0., 0.])}

In [None]:
len(recording1.unit_firing_rates[65])

670988

In [None]:
len(recording1.unit_firing_rates[123])

670988

In [56]:
spike_analysis = spike.SpikeAnalysis_MultiRecording(ephys_data, timebin = 100, smoothing_window=250, ignore_freq = 0.5)

All set to analyze


In [57]:
recordings = spike_analysis.ephyscollection.collection

recording_name = '20230618_100636_standard_comp_to_omission_D2_subj_1-4_t4b3L_box1_merged.rec'
recording1 = recordings.get(recording_name)

In [69]:
recording1.spiketrain

array([14,  9,  7, ..., 11,  7,  9])

In [70]:
len(recording1.spiketrain)

33549

In [71]:
recording1.unit_spiketrains

{65: array([2, 2, 0, ..., 0, 0, 0]),
 123: array([2, 0, 0, ..., 0, 0, 0]),
 103: array([2, 2, 1, ..., 1, 0, 1]),
 83: array([0, 1, 1, ..., 1, 3, 1]),
 118: array([0, 1, 0, ..., 0, 0, 0]),
 93: array([0, 0, 0, ..., 1, 0, 1]),
 99: array([0, 0, 0, ..., 1, 1, 1]),
 105: array([0, 0, 0, ..., 0, 0, 0]),
 87: array([0, 0, 0, ..., 0, 1, 0]),
 19: array([0, 0, 0, ..., 0, 0, 0]),
 9: array([0, 0, 0, ..., 0, 0, 0])}

In [72]:
len(recording1.unit_spiketrains[65])

33549
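
The array lengths are consistent with the bin widths. With `timebin = 5` each unit had 670988 bins; with `timebin = 100` it has 33549. Both describe the same recording of roughly 3,355 seconds. The arithmetic, using only the lengths printed above:

```python
n_bins_5ms = 670_988    # len(recording1.unit_firing_rates[65]) with timebin = 5
n_bins_100ms = 33_549   # len(recording1.unit_spiketrains[65]) with timebin = 100

duration_from_5ms = n_bins_5ms * 5        # total milliseconds covered
duration_from_100ms = n_bins_100ms * 100

print(duration_from_5ms, duration_from_100ms)  # 3354940 3354900
print(duration_from_5ms // 100)                # 33549
```

The 40 ms difference is smaller than one 100 ms bin, which would fit a partial final bin being dropped, though that is an assumption about how the script bins spikes.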

In [73]:
recording1.unit_firing_rates

{65: array([3.  , 3.  , 3.  , ..., 0.92, 0.92, 0.92]),
 123: array([2.  , 2.04, 2.04, ..., 1.92, 1.92, 1.92]),
 103: array([6.16, 6.2 , 6.24, ..., 1.68, 1.68, 1.64]),
 83: array([3.16, 3.2 , 3.24, ..., 9.  , 8.96, 8.8 ]),
 118: array([2.04, 2.08, 2.12, ..., 0.12, 0.12, 0.12]),
 93: array([0.44, 0.44, 0.44, ..., 0.56, 0.56, 0.56]),
 99: array([0.2, 0.2, 0.2, ..., 2.4, 2.4, 2.4]),
 105: array([0.04, 0.04, 0.04, ..., 0.  , 0.  , 0.  ]),
 87: array([0.04, 0.04, 0.04, ..., 0.84, 0.84, 0.84]),
 19: array([0.36, 0.36, 0.36, ..., 0.08, 0.04, 0.04]),
 9: array([0.  , 0.  , 0.  , ..., 0.96, 0.96, 0.96])}

In [74]:
len(recording1.unit_firing_rates[65])

33549

In [77]:
np.mean(recording1.unit_spiketrains[65])

0.2686816298548392

In [78]:
np.mean(recording1.unit_firing_rates[65])

2.6793370890339507
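
These two means are linked by the bin width: a spiketrain holds spike counts per bin, while a firing rate is in spikes per second. With 100 ms bins there are 10 bins per second, so the mean count converts to a rate as follows (the input is the value printed above):

```python
timebin_ms = 100
mean_spikes_per_bin = 0.2686816298548392  # np.mean(recording1.unit_spiketrains[65])

# spikes per bin * bins per second = spikes per second
mean_rate_hz = mean_spikes_per_bin * 1000 / timebin_ms
print(round(mean_rate_hz, 4))  # 2.6868
```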

In [84]:
def get_firing_rate(spiketrain, smoothing_window=250, timebin=1):
    """
    Calculates the firing rate (spikes/second) with a moving average.

    Args (3 total, 1 required):
        spiketrain: numpy array of spike counts in timebin (ms) bins
        smoothing_window: int, default=250, width of the moving-average
            window in bins (equal to ms only when timebin = 1)
            min smoothing_window = 1
        timebin: int, default=1, timebin (ms) of spiketrain

    Return (1):
        firing_rate: numpy array of firing rates in timebin sized windows

    """
    weights = np.ones(smoothing_window) / smoothing_window * 1000 / timebin
    firing_rate = np.convolve(spiketrain, weights, mode="same")

    return firing_rate

fr_65_man = get_firing_rate(recording1.unit_spiketrains[65], 250, 100)

In [85]:
fr_65_man

array([3.  , 3.  , 3.  , ..., 0.92, 0.92, 0.92])

In [86]:
len(fr_65_man)

33549

In [87]:
np.mean(fr_65_man)

2.6793370890339507
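
`np.convolve` with `mode="same"` keeps the output the same length as the input by treating samples beyond the ends of the spiketrain as zeros, so the first and last few values are underestimated. That is probably why the mean of the smoothed trace (2.679) is slightly below ten times the mean spike count per 100 ms bin (0.2687 × 10 ≈ 2.687). A toy sketch with made-up numbers shows the effect:

```python
import numpy as np

timebin = 100          # ms per bin
smoothing_window = 5   # bins in the moving average (kept small for the demo)

# Made-up spiketrain: exactly 1 spike in every 100 ms bin, i.e. a true rate of 10 Hz.
spiketrain = np.ones(20)

weights = np.ones(smoothing_window) / smoothing_window * 1000 / timebin
firing_rate = np.convolve(spiketrain, weights, mode="same")

print(len(firing_rate))   # 20, same length as the input
print(firing_rate[:3])    # [ 6.  8. 10.]  edges are pulled toward zero
print(firing_rate[-3:])   # [10.  8.  6.]
print(firing_rate.mean()) # about 9.4, below the true 10 Hz
```

With a 250-bin window on a 33549-bin recording the affected edges are a small fraction of the trace, which fits the small gap between the two means.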

In [88]:
smoothing_window = 250
timebin=100
weights = np.ones(smoothing_window) / smoothing_window * 1000 / timebin

In [89]:
weights

array([0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.

In [94]:
recording1.freq_dict

{65: 2.6867823023312964,
 123: 2.4372996323677625,
 103: 8.95037707725806,
 83: 10.921200749658166,
 118: 1.4062834371421187,
 106: 0.39225710116130313,
 93: 0.8390605925296871,
 99: 3.601850159903637,
 105: 0.6047793755746839,
 87: 2.4703851477392704,
 19: 0.5144648606416482,
 9: 1.6411607895092208}

In [97]:
recording_name2 = '20230620_114347_standard_comp_to_omission_D4_subj_1-2_t3b3L_box_1_merged.rec'
recording2 = recordings.get(recording_name2)

In [100]:
recording2.freq_dict

{42: 12.966737226679907,
 207: 4.308202978206947,
 262: 2.037371013189812,
 144: 1.7530050417911194,
 41: 1.583321643219883,
 162: 11.206125687711454,
 48: 10.816731543748842,
 45: 15.460205512925853,
 68: 15.887047027607634,
 259: 1.7571008479635286,
 153: 4.2567128434680885,
 269: 0.8229644830704971,
 53: 2.4940534014134332,
 168: 3.099647599762502,
 58: 3.0358700465064166,
 258: 2.8448299443219036,
 14: 1.9080605611751804,
 241: 5.830965201591923,
 245: 0.9730465235309185,
 28: 1.4361651785968967,
 43: 1.4897032164219592,
 124: 8.735476892997502,
 221: 1.8194156133008963,
 172: 0.4265489570980394,
 244: 0.5236780749008851,
 227: 0.7208618863440118,
 226: 1.869442960121037,
 266: 1.4376279665156142}

In [99]:
recording2.unit_spiketrains

{42: array([2, 1, 0, ..., 4, 2, 0]),
 207: array([2, 1, 1, ..., 0, 3, 0]),
 262: array([1, 0, 0, ..., 0, 0, 0]),
 144: array([1, 1, 4, ..., 0, 0, 1]),
 41: array([1, 0, 0, ..., 0, 0, 0]),
 162: array([2, 1, 1, ..., 2, 1, 0]),
 48: array([2, 0, 2, ..., 1, 0, 1]),
 45: array([1, 1, 3, ..., 3, 3, 0]),
 68: array([0, 1, 3, ..., 1, 0, 3]),
 259: array([0, 1, 0, ..., 0, 0, 0]),
 153: array([0, 1, 0, ..., 2, 0, 0]),
 269: array([0, 1, 0, ..., 0, 0, 0]),
 53: array([0, 1, 0, ..., 0, 0, 0]),
 168: array([0, 1, 0, ..., 0, 0, 0]),
 58: array([0, 0, 1, ..., 0, 0, 1]),
 258: array([0, 0, 1, ..., 0, 1, 1]),
 14: array([0, 0, 1, ..., 0, 1, 0]),
 241: array([0, 0, 1, ..., 0, 1, 1]),
 245: array([0, 0, 0, ..., 0, 0, 0]),
 28: array([0, 0, 0, ..., 0, 0, 0]),
 43: array([0, 0, 0, ..., 0, 0, 0]),
 124: array([0, 0, 0, ..., 1, 0, 0]),
 221: array([0, 0, 0, ..., 0, 1, 0]),
 244: array([0, 0, 0, ..., 0, 0, 0]),
 227: array([0, 0, 0, ..., 0, 0, 0]),
 226: array([0, 0, 0, ..., 0, 1, 0]),
 266: array([0, 0, 0, 

In [101]:
len(recording2.freq_dict)

28

In [102]:
len(recording2.unit_spiketrains)

27
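
The two dictionaries differ by one unit. A set difference finds it. The sketch below uses three of the `recording2` values printed above, rounded, in place of the real objects. The explanation that the missing unit was dropped for falling below `ignore_freq = 0.5` is an inference from these numbers, not something the script's output states.

```python
# Rounded subset of recording2.freq_dict from the output above.
freq_dict = {42: 12.97, 172: 0.43, 244: 0.52}
# Stand-in for recording2.unit_spiketrains: only the keys matter here.
unit_spiketrains = {42: None, 244: None}

missing_units = set(freq_dict) - set(unit_spiketrains)
print(missing_units)  # {172}

ignore_freq = 0.5
below_threshold = {unit for unit, freq in freq_dict.items() if freq < ignore_freq}
print(below_threshold == missing_units)  # True

# On the real objects the same check would be:
# set(recording2.freq_dict) - set(recording2.unit_spiketrains)
```

The same pattern holds for `recording1`, where unit 106 (0.39 Hz) appears in `freq_dict` but not in `unit_firing_rates`.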