<br>


# Collect CG data windows during SWR occurrence

This notebook collects the CG activity windows around previously detected ripples (SWRs).

To do so, it:
- Applies a 500 ms exclusion window to prevent contamination, so only ripples separated by at least 500 ms from both the previous and the next ripple are included in the analysis;
- Collects the necessary CG data and stores it in a dataframe.

### Imports

In [1]:
import pandas as pd
from functools import reduce
import os
import re
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import random
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

<br>

### Define function(s)

In [55]:
def associate_ripples_with_chunk_file(row, chunk_info):
    
    '''
    Return the chunk number(s) that hold the start and the end of the ripple window,
    looked up in the chunk_info dataframe (NaN where there is no match)
    '''
    mask = (chunk_info['session']==row['path']) & (
        # check for ripple window start time
        ((chunk_info['t_first']<=row['ephys_tfirst']) & (chunk_info['t_final']>=row['ephys_tfirst'])) |
        # check for ripple window end time
        ((chunk_info['t_first']<=row['ephys_tlast']) & (chunk_info['t_final']>=row['ephys_tlast']))
        )
    
    # Get match(es)
    chunk=chunk_info[mask]
    
    if len(chunk)==1:
        return pd.Series([chunk.nr.iloc[0], np.nan])
    elif len(chunk)==2:
        return pd.Series([chunk.nr.iloc[0], chunk.nr.iloc[1]])
    else:
        # no match (or an unexpected number of matches)
        return pd.Series([np.nan, np.nan])
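A minimal sketch of the same lookup on a toy chunk table, to show what happens when a window crosses a chunk boundary. The helper name `chunks_for_window` and the toy values are hypothetical. It uses a plain interval-overlap test, which is slightly more general than the endpoint test in the function above.

```python
import pandas as pd

def chunks_for_window(chunk_info, session, t_first, t_last):
    """Return the chunk numbers whose time span overlaps [t_first, t_last]."""
    same_session = chunk_info["session"] == session
    # Two intervals overlap when each one starts before the other ends
    overlaps = (chunk_info["t_first"] <= t_last) & (chunk_info["t_final"] >= t_first)
    return chunk_info.loc[same_session & overlaps, "nr"].tolist()

# Toy chunk table (made-up values): three consecutive 10 s chunks
toy_chunks = pd.DataFrame({
    "session": ["s1", "s1", "s1"],
    "nr": ["1", "2", "3"],
    "t_first": [0.0, 10.0, 20.0],
    "t_final": [9.9995, 19.9995, 29.9995],
})

inside = chunks_for_window(toy_chunks, "s1", 4.5, 5.5)     # window inside chunk 1
boundary = chunks_for_window(toy_chunks, "s1", 9.6, 10.6)  # window spanning chunks 1 and 2
print(inside, boundary)
```

A window that crosses a boundary returns two chunk numbers, which corresponds to the case where both chunk_start and chunk_end are filled.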
    

<br>

### Open datasets

In [56]:
main_path = '/VOLUMES/E/EPHYS/data/'    
local_path = 'PreProcessedData/'

# Open SWR events
ripples_classified = pd.read_csv(os.path.join(local_path, 'ripples_classified.csv'))

<br>

### Apply exclusion window 
Only keep ripples with no other ripple starting within window_size seconds (0.5 s) before or after them.

In [57]:
window_size = 0.5

In [58]:
# Add exclusion window
ripples_sorted=ripples_classified.sort_values(['session_code','start_time'], ascending=True)
# Calculate time from previous event detected
ripples_sorted['time_diff_to_previous']=ripples_sorted.groupby(['session_code']).start_time.diff()
# Calculate time to following event detected
ripples_sorted['time_diff_to_next']=ripples_sorted.groupby(['session_code']).start_time.diff(-1)

In [59]:
# Create exclusion mask
exclusion_mask =(
    ((ripples_sorted.time_diff_to_previous>=window_size) & \
     (ripples_sorted.time_diff_to_next <= - window_size)) |\
    # for first ripple of sessions
    ((ripples_sorted.time_diff_to_previous.isna()) & \
     (ripples_sorted.time_diff_to_next <= - window_size)) |\
    # for last ripple of sessions
    ((ripples_sorted.time_diff_to_previous >= window_size) & \
     (ripples_sorted.time_diff_to_next.isna()))
)

# Get ripples for analysis (copy, so that new columns can be added safely later on)
ripples_analysis=ripples_sorted[exclusion_mask].copy()
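The exclusion logic can be checked on a toy table. The sketch below is a compact variant, not the notebook's exact mask: the function name `isolated_events` and the toy values are hypothetical, and unlike the mask above it also keeps a ripple that is the only one in its session.

```python
import pandas as pd

def isolated_events(events, window_size, group_col="session_code", time_col="start_time"):
    """Keep events at least `window_size` seconds away from their neighbours."""
    ev = events.sort_values([group_col, time_col])
    # Time since the previous event and time until the next one, per session
    to_prev = ev.groupby(group_col)[time_col].diff()
    to_next = ev.groupby(group_col)[time_col].diff(-1).abs()
    # A missing neighbour (first or last event of a session) counts as far enough
    keep = (to_prev.isna() | (to_prev >= window_size)) & \
           (to_next.isna() | (to_next >= window_size))
    return ev[keep]

# Toy events (made-up values): the first two are only 0.3 s apart
toy = pd.DataFrame({
    "session_code": [1, 1, 1, 1, 2],
    "start_time": [0.0, 0.3, 2.0, 5.0, 1.0],
})
kept = isolated_events(toy, window_size=0.5)
print(kept)
```

The two events 0.3 s apart are both dropped, since each one falls inside the other's exclusion window.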

In [60]:
len(ripples_analysis)

628

In [61]:
ripples_analysis.head()

Unnamed: 0,start_time,end_time,trial_nr,outcome,run_type,rat_code,session_code,x_ripple,y_ripple,duration_sec,duration_ms,phase,time_diff_to_previous,time_diff_to_next
706,127.722667,127.87,1.0,1.0,S,MAG,20190126160731,179.647251,185.001791,0.147333,147.333333,Sample,3.162333,-1.289667
707,129.012333,129.256,1.0,1.0,S,MAG,20190126160731,179.584625,183.255619,0.243667,243.666667,Sample,1.289667,-9.183333
710,139.889333,139.945,1.0,1.0,S,MAG,20190126160731,207.251575,142.12047,0.055667,55.666667,Sample,1.38,-5.138667
711,145.028,145.082667,1.0,1.0,S,MAG,20190126160731,178.55532,172.187209,0.054667,54.666667,Sample,5.138667,-4.355667
712,149.383667,149.442,1.0,1.0,S,MAG,20190126160731,178.33335,169.518474,0.058333,58.333333,Sample,4.355667,-111.42


<br>

### Label ripples by chunk

Each ripple is labelled with the chunk file(s) that contain its window, so that only the required CG chunk files need to be opened.

In [62]:
folders=os.listdir(main_path)

# Add path to each ripple
ripples_analysis['path']=ripples_analysis.apply(lambda x: 
                                            [f for f in folders if str(x.session_code) in f][0], 
                                            axis=1)


In [65]:
chunk_info = pd.DataFrame(columns=['session', 'nr', 't_first', 't_final'])

# For each path and ephys csv file get first and last timestamps
for folder in ripples_analysis.path.unique():
    
    timestamps_path = os.path.join(main_path, folder, 'Ephys_timestamps')
    
    for csv in os.listdir(timestamps_path):
        
        # Get first and last timestamps from each ephys timestamps csv
        csv_content = pd.read_csv(os.path.join(timestamps_path, csv))
        t1 = csv_content.head(1)['0'].iloc[0]
        tn = csv_content.tail(1)['0'].iloc[0]
        # Store all info in dataframe
        row = pd.DataFrame([{
            'session':folder, 
            'nr': re.search(r'([0-9]{1,2})', csv).group(1),
            't_first': t1,
            't_final': tn}])
        chunk_info = pd.concat([chunk_info, row])

chunk_info = chunk_info.reset_index(drop=True)
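A possible alternative way to build the chunk table, sketched on temporary toy files: rows are collected in a list and the dataframe is built once. The function name `summarize_chunks` and the file-name pattern in the regular expression are assumptions, and the chunk number is converted to an integer here. In the notebook it stays a string, which would explain why sorting on chunk_start later orders chunks as 1, 10, 12, ..., 2.

```python
import os
import re
import tempfile

import numpy as np
import pandas as pd

def summarize_chunks(timestamps_path, session):
    """First and last timestamp of every chunk file in a folder."""
    rows = []
    for name in sorted(os.listdir(timestamps_path)):
        # Assumed naming: the chunk number follows the word 'chunk'
        match = re.search(r"chunk(\d+)", name)
        if match is None:
            continue
        ts = pd.read_csv(os.path.join(timestamps_path, name))["0"]
        rows.append({"session": session, "nr": int(match.group(1)),
                     "t_first": ts.iloc[0], "t_final": ts.iloc[-1]})
    table = pd.DataFrame(rows, columns=["session", "nr", "t_first", "t_final"])
    return table.sort_values("nr").reset_index(drop=True)

# Toy timestamp files (made-up values) written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for nr, start in [(1, 0.0), (2, 1.0), (10, 9.0)]:
        toy_ts = pd.DataFrame({"0": start + np.arange(4) * 0.25})
        toy_ts.to_csv(os.path.join(tmp, "timestamps_chunk{}.csv".format(nr)))
    toy_chunk_info = summarize_chunks(tmp, "toy_session")

print(toy_chunk_info)
```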

In [70]:
window = .5

In [71]:
# Calculate timestamps to read for each ripple
ripples_analysis['ephys_tfirst']=ripples_analysis['start_time']-window
ripples_analysis['ephys_tlast']=ripples_analysis['start_time']+window

In [72]:
# Add chunk number(s) to each row using the chunk_info dataframe, based on the start and end time of each ripple window
ripples_analysis[['chunk_start', 'chunk_end']]=ripples_analysis.apply(associate_ripples_with_chunk_file, args=(chunk_info,), axis=1)

In [73]:
# Order by dataset and label ripple events
# This will be the ref_id to cross between ripples_analysis and the dataframe containing ephys data
ripples_analysis.sort_values(by=['path', 'chunk_start'], inplace=True)
ripples_analysis['ripple_nr']=range(0,len(ripples_analysis))

<br>
<br>

### Get data from CG tetrodes

This section:
- Uses ripples_analysis, already ordered by dataset and with each ripple event numbered (ripple_nr);
- For each dataset x TT x chunk combination, opens the tetrode data and collects the 500 ms before and after the start of each event into a dataframe;
- Stores the CG activity data in the list __data_list__, one dataframe per ripple, holding the CG ephys data surrounding the detected event;
- Leaves the properties of each event in __ripples_analysis__.

In [156]:
# Create list with CG tetrode folder names
cg_tetrodes =['TT{}'.format(tt_nr) for tt_nr in range(1,15)]
cg_tetrodes

['TT1',
 'TT2',
 'TT3',
 'TT4',
 'TT5',
 'TT6',
 'TT7',
 'TT8',
 'TT9',
 'TT10',
 'TT11',
 'TT12',
 'TT13',
 'TT14']

In [75]:
ripples_analysis.head()

Unnamed: 0,start_time,end_time,trial_nr,outcome,run_type,rat_code,session_code,x_ripple,y_ripple,duration_sec,duration_ms,phase,time_diff_to_previous,time_diff_to_next,path,ephys_tfirst,ephys_tlast,chunk_start,chunk_end,ripple_nr
816,83.75725,83.81275,1.0,1.0,S,HOM,20191113131818,142.895008,123.167376,0.0555,55.5,Sample,,-17.771,HOMERO_DNMP16_20trials_20191113131818,83.25725,84.25725,1,,0
872,492.71775,492.79775,4.0,1.0,S,HOM,20191113131818,181.876754,188.205562,0.08,80.0,Sample,53.589,-2.1755,HOMERO_DNMP16_20trials_20191113131818,492.21775,493.21775,10,,1
875,550.10425,550.15575,4.0,1.0,T,HOM,20191113131818,185.688022,23.871025,0.0515,51.5,Test (Past-choice),54.8125,-1.0605,HOMERO_DNMP16_20trials_20191113131818,549.60425,550.60425,12,,2
876,551.16475,551.18825,4.0,1.0,T,HOM,20191113131818,183.437648,21.86766,0.0235,23.5,Test (Past-choice),1.0605,-91.5735,HOMERO_DNMP16_20trials_20191113131818,550.66475,551.66475,12,,3
877,642.73825,642.79525,5.0,1.0,T,HOM,20191113131818,32.472432,117.420806,0.057,57.0,Delay,91.5735,-3.0075,HOMERO_DNMP16_20trials_20191113131818,642.23825,643.23825,14,,4


In [76]:
ripples_analysis.ripple_nr.nunique()

628

### Some datasets don't have all CG TT folders, so only the available ones are fetched

In [199]:
from functools import reduce 

In [203]:
def fetch_cg_data(ripple, chunk):
    
    '''
    Fetches the CG data for the given chunk, for all TTs available. 
    Returns a dataframe with ripple nr and timestamp as index, TT as column and ephys data as values.    
    '''
    
    df_list = []
    
    # -- Read CG data from tetrodes
    tt_folders = [f for f in os.listdir(os.path.join(main_path, ripple.path)) if f in cg_tetrodes]

    # -- Read timestamps
    timestamps = pd.read_csv(
                        os.path.join(
                            main_path,
                            ripple.path,
                            'Ephys_timestamps', 
                            'timestamps_chunk{}.csv'.format(chunk)
                        )
                    )
       
        
    for folder in tt_folders:
        
        file_to_read = '{}_chunk{}.csv'.format(folder, chunk)          
                
        # Get ripples ephys from that chunk    
        chunk_data = pd.read_csv(os.path.join(
                main_path, 
                ripple.path, 
                folder,
                file_to_read))
                
        # Timestamps and chunk data must be the same length. 
        # Otherwise they might be wrongly paired or have differences due to crashes
        if len(chunk_data)!=len(timestamps):
            print("Length difference found! Chunk len:{}, Timestamps len: {}".format(
            len(chunk_data), len(timestamps)))
            continue
                                
        # Get rows to read from timestamps file
        timestamps_ripple = timestamps.loc[timestamps['0'].between(
                ripple['ephys_tfirst'], ripple['ephys_tlast']), '0']
                    
        indices_to_read=timestamps_ripple.index.tolist()
           
        
        # Get chunk data using indices_to_read - read first channel (column 0 is the index)
        ripple_data = chunk_data.iloc[indices_to_read, 1].values
                      
        # Store into a multi-index dataframe
        iterables = [[ripple['ripple_nr']], timestamps_ripple.values.tolist()]
        multi_index = pd.MultiIndex.from_product(
                        iterables, 
                        names=['ripple_nr', 'timestamp'])
                    
        tt_stored = pd.DataFrame(ripple_data, index=multi_index, columns=[folder])
        
        df_list.append(tt_stored)  
     
    
    # Merge DataFrames in list of dataframes        
    all_tts_stored = reduce(
                        lambda left, right:     
                             pd.merge(left , right, on = ['ripple_nr', 'timestamp'], how = 'left'),
                         df_list
                    )
    
    return all_tts_stored
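The window extraction in `fetch_cg_data` can be illustrated without any files. The sketch below uses in-memory toy arrays. The helper name `window_frame` and the values are hypothetical, and it combines the per-tetrode columns with `pd.concat(..., axis=1)` instead of the chained merge, which should give the same layout when all tetrodes share the same index.

```python
import numpy as np
import pandas as pd

def window_frame(timestamps, signals, ripple_nr, t_first, t_last):
    """Samples of every tetrode inside [t_first, t_last], indexed by ripple and time."""
    in_window = timestamps.between(t_first, t_last)  # inclusive on both ends
    index = pd.MultiIndex.from_product(
        [[ripple_nr], timestamps[in_window].tolist()],
        names=["ripple_nr", "timestamp"],
    )
    # One column per tetrode, all sharing the same (ripple_nr, timestamp) index
    columns = [
        pd.Series(values[in_window.to_numpy()], index=index, name=tt)
        for tt, values in signals.items()
    ]
    return pd.concat(columns, axis=1)

# Toy data (made-up values): 10 samples, 0.5 s apart, for two tetrodes
timestamps = pd.Series(np.arange(10) * 0.5)
signals = {"TT1": np.arange(10) * 1.0, "TT2": np.arange(10) * -1.0}

toy_window = window_frame(timestamps, signals, ripple_nr=7, t_first=1.0, t_last=2.0)
print(toy_window)
```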

In [216]:
def collect_cg_data_for_ripple(ripple):
    
    ''' 
    For each ripple it collects all CG data for the given window, for all chunks required and TTs available   
    Returns a dataframe with ripple_nr and timestamp as index, TT as column and ephys points as values
    '''

    # Chunks to open for ripple
    chunks_to_open = ripple[['chunk_start', 'chunk_end']].dropna().unique()
    
    print('RIPPLE NR: {}'.format(ripple.ripple_nr))
    
    chunk_frames = []
    
    for chunk in chunks_to_open:
        print('---- CHUNK: {}'.format(chunk))
        
        # Fetch all data for the ripple window in this chunk (all TTs)
        chunk_frames.append(fetch_cg_data(ripple, chunk))
          
    # In case more than 1 chunk was opened for a given ripple, concat data
    all_data = pd.concat(chunk_frames)
    
    return all_data

In [217]:
data_list = []

for i, row in ripples_analysis.iterrows():
    
    ripple_cg_data = collect_cg_data_for_ripple(row)
    data_list.append(ripple_cg_data)

RIPPLE NR: 0
---- CHUNK: 1
RIPPLE NR: 1
---- CHUNK: 10
RIPPLE NR: 2
---- CHUNK: 12
RIPPLE NR: 3
---- CHUNK: 12
RIPPLE NR: 4
---- CHUNK: 14
RIPPLE NR: 5
---- CHUNK: 14
RIPPLE NR: 6
---- CHUNK: 14
RIPPLE NR: 7
---- CHUNK: 14
RIPPLE NR: 8
---- CHUNK: 14
RIPPLE NR: 9
---- CHUNK: 14
RIPPLE NR: 10
---- CHUNK: 14
RIPPLE NR: 11
---- CHUNK: 14
RIPPLE NR: 12
---- CHUNK: 15
RIPPLE NR: 13
---- CHUNK: 15
RIPPLE NR: 14
---- CHUNK: 16
RIPPLE NR: 15
---- CHUNK: 17
RIPPLE NR: 16
---- CHUNK: 17
RIPPLE NR: 17
---- CHUNK: 17
RIPPLE NR: 18
---- CHUNK: 17
RIPPLE NR: 19
---- CHUNK: 2
RIPPLE NR: 20
---- CHUNK: 2
RIPPLE NR: 21
---- CHUNK: 20
RIPPLE NR: 22
---- CHUNK: 21
RIPPLE NR: 23
---- CHUNK: 21
RIPPLE NR: 24
---- CHUNK: 21
RIPPLE NR: 25
---- CHUNK: 21
RIPPLE NR: 26
---- CHUNK: 22
RIPPLE NR: 27
---- CHUNK: 23
RIPPLE NR: 28
---- CHUNK: 23
RIPPLE NR: 29
---- CHUNK: 23
---- CHUNK: 24
RIPPLE NR: 30
---- CHUNK: 25
RIPPLE NR: 31
---- CHUNK: 25
RIPPLE NR: 32
---- CHUNK: 25
RIPPLE NR: 33
---- CHUNK: 25
RIPPLE NR: 3

RIPPLE NR: 275
---- CHUNK: 23
RIPPLE NR: 276
---- CHUNK: 23
RIPPLE NR: 277
---- CHUNK: 23
RIPPLE NR: 278
---- CHUNK: 23
RIPPLE NR: 279
---- CHUNK: 25
RIPPLE NR: 280
---- CHUNK: 27
RIPPLE NR: 281
---- CHUNK: 27
RIPPLE NR: 282
---- CHUNK: 29
RIPPLE NR: 283
---- CHUNK: 31
RIPPLE NR: 284
---- CHUNK: 33
RIPPLE NR: 285
---- CHUNK: 35
RIPPLE NR: 286
---- CHUNK: 37
RIPPLE NR: 287
---- CHUNK: 37
RIPPLE NR: 288
---- CHUNK: 37
RIPPLE NR: 289
---- CHUNK: 4
RIPPLE NR: 290
---- CHUNK: 4
RIPPLE NR: 291
---- CHUNK: 4
RIPPLE NR: 292
---- CHUNK: 4
RIPPLE NR: 293
---- CHUNK: 4
RIPPLE NR: 294
---- CHUNK: 41
RIPPLE NR: 295
---- CHUNK: 43
RIPPLE NR: 296
---- CHUNK: 43
RIPPLE NR: 297
---- CHUNK: 43
RIPPLE NR: 298
---- CHUNK: 43
---- CHUNK: 44
RIPPLE NR: 299
---- CHUNK: 6
RIPPLE NR: 300
---- CHUNK: 6
RIPPLE NR: 301
---- CHUNK: 6
RIPPLE NR: 302
---- CHUNK: 7
RIPPLE NR: 303
---- CHUNK: 8
RIPPLE NR: 304
---- CHUNK: 8
RIPPLE NR: 305
---- CHUNK: 8
RIPPLE NR: 306
---- CHUNK: 8
RIPPLE NR: 307
---- CHUNK: 9
RIPPLE NR

RIPPLE NR: 547
---- CHUNK: 10
---- CHUNK: 11
RIPPLE NR: 548
---- CHUNK: 12
RIPPLE NR: 549
---- CHUNK: 14
RIPPLE NR: 550
---- CHUNK: 15
RIPPLE NR: 551
---- CHUNK: 16
RIPPLE NR: 552
---- CHUNK: 16
RIPPLE NR: 553
---- CHUNK: 19
RIPPLE NR: 554
---- CHUNK: 21
RIPPLE NR: 555
---- CHUNK: 21
RIPPLE NR: 556
---- CHUNK: 25
RIPPLE NR: 557
---- CHUNK: 27
RIPPLE NR: 558
---- CHUNK: 28
RIPPLE NR: 559
---- CHUNK: 28
RIPPLE NR: 560
---- CHUNK: 28
RIPPLE NR: 561
---- CHUNK: 30
RIPPLE NR: 562
---- CHUNK: 30
RIPPLE NR: 563
---- CHUNK: 32
RIPPLE NR: 564
---- CHUNK: 32
RIPPLE NR: 565
---- CHUNK: 32
RIPPLE NR: 566
---- CHUNK: 33
RIPPLE NR: 567
---- CHUNK: 34
RIPPLE NR: 568
---- CHUNK: 34
RIPPLE NR: 569
---- CHUNK: 34
RIPPLE NR: 570
---- CHUNK: 34
RIPPLE NR: 571
---- CHUNK: 36
RIPPLE NR: 572
---- CHUNK: 36
RIPPLE NR: 573
---- CHUNK: 36
RIPPLE NR: 574
---- CHUNK: 36
---- CHUNK: 37
RIPPLE NR: 575
---- CHUNK: 39
RIPPLE NR: 576
---- CHUNK: 39
RIPPLE NR: 577
---- CHUNK: 4
RIPPLE NR: 578
---- CHUNK: 4
RIPPLE NR: 5

AttributeError: module 'pandas' has no attribute 'contact'

In [228]:
cg_data = pd.concat(data_list).reset_index()
cg_data.head()

Unnamed: 0,ripple_nr,timestamp,TT1,TT2,TT3,TT4,TT5,TT6,TT7,TT8,TT9,TT10,TT11,TT12,TT13,TT14
0,0,83.25725,151.905,105.105,158.73,191.685,-38.61,121.485,115.245,146.835,149.76,147.225,8.97,73.905,37.83,70.59
1,0,83.25775,144.495,125.19,151.515,148.98,-86.775,116.415,108.42,145.47,145.08,130.455,7.02,70.785,44.07,67.275
2,0,83.25825,76.44,95.94,108.42,106.47,-139.62,68.25,33.15,111.735,109.59,75.855,-79.755,15.405,12.87,23.01
3,0,83.25875,0.0,17.16,54.99,18.525,-220.155,-26.715,-52.455,40.365,65.52,-0.78,-129.87,-38.61,-71.955,-40.365
4,0,83.25925,-20.28,-10.92,28.665,15.795,-262.47,-54.015,-74.685,16.38,18.915,-39.585,-130.65,-69.615,-82.485,-88.14


In [229]:
print('Shape of dataframe: {}'.format(cg_data.shape))
print('Number of ripples: {}'.format(cg_data.ripple_nr.nunique()))

Shape of dataframe: (1809626, 16)
Number of ripples: 628



<br>

### Drop dead channels

Candidate dead channel, still to be confirmed: TT6 in rat MAG.

<br>

### Prepare dataset for storage

In [235]:
ripple_info = ripples_analysis[['ripple_nr', 'start_time', 'end_time', 'phase']]

In [236]:
cg_data_all = pd.merge(
    cg_data, 
    ripple_info, 
    on='ripple_nr', 
    how='left'
)

<br>

### Calculate relative timestamp

In [237]:
cg_data_all['relative_timestamp']=cg_data_all['timestamp'] - cg_data_all['start_time']

In [238]:
cg_data_all.head()

Unnamed: 0,ripple_nr,timestamp,TT1,TT2,TT3,TT4,TT5,TT6,TT7,TT8,TT9,TT10,TT11,TT12,TT13,TT14,start_time,end_time,phase,relative_timestamp
0,0,83.25725,151.905,105.105,158.73,191.685,-38.61,121.485,115.245,146.835,149.76,147.225,8.97,73.905,37.83,70.59,83.75725,83.81275,Sample,-0.5
1,0,83.25775,144.495,125.19,151.515,148.98,-86.775,116.415,108.42,145.47,145.08,130.455,7.02,70.785,44.07,67.275,83.75725,83.81275,Sample,-0.4995
2,0,83.25825,76.44,95.94,108.42,106.47,-139.62,68.25,33.15,111.735,109.59,75.855,-79.755,15.405,12.87,23.01,83.75725,83.81275,Sample,-0.499
3,0,83.25875,0.0,17.16,54.99,18.525,-220.155,-26.715,-52.455,40.365,65.52,-0.78,-129.87,-38.61,-71.955,-40.365,83.75725,83.81275,Sample,-0.4985
4,0,83.25925,-20.28,-10.92,28.665,15.795,-262.47,-54.015,-74.685,16.38,18.915,-39.585,-130.65,-69.615,-82.485,-88.14,83.75725,83.81275,Sample,-0.498
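The merge and the relative timestamp can be reproduced on a toy example. This is a sketch with made-up values. The `validate="many_to_one"` argument is an addition that is not in the notebook: it makes pandas raise an error if a ripple_nr appears more than once in the event table, which would otherwise silently duplicate samples.

```python
import pandas as pd

# Toy samples (made-up values): two ripples, a few samples each
samples = pd.DataFrame({
    "ripple_nr": [0, 0, 0, 1, 1],
    "timestamp": [9.5, 10.0, 10.5, 19.5, 20.0],
    "TT1": [1.0, 2.0, 3.0, 4.0, 5.0],
})
# Toy event table: one row per ripple
events = pd.DataFrame({"ripple_nr": [0, 1], "start_time": [10.0, 20.0]})

# Attach the event properties to every sample of that ripple
aligned = samples.merge(events, on="ripple_nr", how="left", validate="many_to_one")

# Time of each sample relative to the ripple start (negative = before the ripple)
aligned["relative_timestamp"] = aligned["timestamp"] - aligned["start_time"]
print(aligned)
```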


<br>

### Save data in local folder

In [239]:
# Save CG data
#cg_data.to_csv(os.path.join(main_path, 'cg_data.csv'))
cg_data_all.to_csv(os.path.join(local_path, 'cg_data.csv'), index=False)

# Save ripple data 
#ripples_analysis.to_csv(os.path.join(main_path, 'cg_analysis_ripple_library.csv'))
ripples_analysis.to_csv(os.path.join(local_path, 'cg_analysis_ripple_library.csv'), index=False)

<br>
<br>
<br>

#### THE END.