# Convert Immobilized Data to NWB

For a first analysis task, in which we try out different algorithms, we will use immobilized data from former coworkers Kerem and Rebecca. They used the [whole brain analyzer](https://github.com/Zimmer-lab/whole_brain_analyzer_MATLAB) to preprocess the raw data, which includes:

- normalization (ΔF/F, delta F over F)
- bleach correction
- background (bg) correction, for cases where light from neighbouring neurons contaminates the signal
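
As a rough illustration of the normalization step, ΔF/F expresses each fluorescence value relative to a baseline F0. The sketch below is not taken from the whole brain analyzer; the function name `delta_f_over_f` and the choice of a low percentile as baseline are assumptions made for this example only.

```python
import numpy as np


def delta_f_over_f(trace, baseline_percentile=20):
    """Return (F - F0) / F0 for a 1-D fluorescence trace.

    F0 is taken as a low percentile of the trace. This baseline choice is an
    assumption for illustration, not necessarily what the whole brain analyzer does.
    """
    trace = np.asarray(trace, dtype=float)
    f0 = np.percentile(trace, baseline_percentile)
    return (trace - f0) / f0


# Toy trace: flat baseline followed by a rise in fluorescence
raw = np.array([100.0, 100.0, 100.0, 150.0, 200.0])
dff = delta_f_over_f(raw)
```

With this toy trace the baseline is 100, so a raw value of 150 becomes 0.5 and 200 becomes 1.0.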

We should inspect the bleach-corrected version of the data to see whether further preprocessing is needed, in case some artifacts remain.

Unfortunately, the ID columns in Rebecca's data are empty, so her IDs have to be read from separate Excel sheets.

In [6]:
import copy      # used below for copy.deepcopy
import pickle    # used below to load the cached dictionary
import pandas as pd
import numpy as np
from collections import defaultdict
import helper
import dill
import importlib

# in case we want to reload our module
#importlib.reload(helper)

<module 'helper' from 'c:\\Users\\LAK\\Documents\\helper.py'>

### Get deltaFOverF Data

In [10]:
# defining directory paths where all recordings are located, first is from Rebecca and second from Kerem Uzel
directory_paths = ["Y:\\lisc\\project\\neurobiology\\zimmer\\Rebecca",
                   "Y:\\lisc\\project\\neurobiology\\zimmer\\Kerem_Uzel\\Whole_brain_imaging\\Cleaned_up_datasets\\WT"]

# defining target file name which should be wbstruct.mat since this is the file from the wba that we want to convert into a DF
target_file = "wbstruct.mat"

# since we usually have multiple recordings per animal we want to include only those that are relevant for our analysis
# following is specific to Rebecca's and Kerem's Data
include_Rebecca=["Ctrl","used Datasets"]
include_Kerem=["Head","Tail"]
exclude=["not_used","not used","notUsed","cat-2_tdc-1_tph-1"]
recording_type="deltaFOverF_bc"

In [11]:
# load Rebecca's data into a dictionary
datasets_rebecca = helper.get_datasets_dict(directory_paths[0],target_file,include_Rebecca,exclude,recording_type)

Searching for paths
Found 56 paths


Loading Files: 100%|██████████| 56/56 [01:50<00:00,  1.97s/it]


In [13]:
# load Kerem's data into a dictionary
datasets_kerem = helper.get_datasets_dict(directory_paths[1],target_file,include_Kerem,exclude,recording_type)

Searching for paths
Found 12 paths


Loading Files: 100%|██████████| 12/12 [29:18<00:00, 146.51s/it]
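
Since loading can take a long time, it may be worth caching the resulting dictionary on disk. A later cell reads `datasets_kerem_dFOF_bc.pkl`, but the cell that wrote that file is not shown. The following is only a sketch of how such a cache could be written and read back with `pickle`; the toy dictionary and the temporary file path are stand-ins invented for this example.

```python
import os
import pickle
import tempfile

import numpy as np

# Toy stand-in for the nested dictionary returned by the loader (assumed layout)
toy_datasets = {"trial_a": {"Head": {"deltaFOverF_bc": np.zeros((5, 3))}}}

# Hypothetical cache location; the notebook's real file name may differ
cache_path = os.path.join(tempfile.mkdtemp(), "datasets_cache.pkl")

# Write the dictionary once ...
with open(cache_path, "wb") as f:
    pickle.dump(toy_datasets, f)

# ... and read it back in a later session instead of re-parsing the .mat files
with open(cache_path, "rb") as f:
    restored = pickle.load(f)
```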


## Specific to Rebecca's Data
### Get IDs from Excel Sheets
The following is specific to Rebecca's data (see the directory path below), where the IDs are stored in separate Excel sheets. In `get_IDs_dict`, we iterate through the Excel sheets and, whenever a sheet has columns for both the ID and the index, we collect them.
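
The implementation of `helper.get_IDs_dict` is not shown here, so the following is only a sketch of the idea under stated assumptions. The function name `collect_ids`, the column names `ID` and `Index`, and the use of the recording name as sheet name are all hypothetical. The input has the shape that `pd.read_excel(path, sheet_name=None)` returns, namely a dictionary mapping sheet names to DataFrames.

```python
import pandas as pd


def collect_ids(sheets, id_col="ID", index_col="Index"):
    """Collect per-sheet ID lists, ordered by the index column.

    `sheets` maps sheet name -> DataFrame. The column names are
    hypothetical and would need to match the real Excel files.
    """
    ids = {}
    for name, df in sheets.items():
        # Skip sheets that lack either of the two required columns
        if id_col not in df.columns or index_col not in df.columns:
            continue
        ordered = df.sort_values(index_col)
        ids[name] = ordered[id_col].tolist()
    return ids


# Toy sheets: one with ID and index columns, one without
sheets = {
    "20200629_w1": pd.DataFrame({"Index": [2, 1, 3], "ID": ["AVAL", "RIML", None]}),
    "notes": pd.DataFrame({"comment": ["no ID columns here"]}),
}
ids = collect_ids(sheets)
```

Sheets without both columns are skipped, and unidentified neurons stay as missing values so that they can be replaced by placeholder names later.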

In [4]:
# we want to import the excel files that contain the information about the recordings
directory_path_ID = "Y:\\lisc\\project\\neurobiology\\zimmer\\Rebecca\\Analyses"
target_ID = ".xlsx"
include_ID = ["Analyses_"]
exclude_ID = ["._","cat-2_tdc-1_tph-1"]

In [5]:
dictofIDs_og = helper.get_IDs_dict(directory_path_ID, target_ID, include_ID, exclude_ID)

Searching for paths
Found 3 paths


Loading Files: 100%|██████████| 3/3 [00:03<00:00,  1.07s/it]


### Merging Head and Tail of Rebecca's Data and converting to DF

In [6]:
datasets_rebecca_copy = copy.deepcopy(datasets_rebecca)
dictofIDs = copy.deepcopy(dictofIDs_og)

In [7]:
for trial, trialvalue in datasets_rebecca.items():

    if "notUsed" not in trial:

        id_names = dictofIDs[trial]
        head_data = trialvalue["Head"]['deltaFOverF']

        # merging head and tail data if both are available
        try:
            merged_datasets = np.hstack((head_data, trialvalue["Tail"]['deltaFOverF']))
        except (KeyError, ValueError):
            # no tail recording, or head and tail cannot be stacked: fall back to head only
            # (np.hstack on a single 2-D array would flatten it, so use the array directly)
            merged_datasets = head_data

        id_length = merged_datasets.shape[1]

        # if the number of neurons does not match the number of IDs, check whether the IDs
        # cover only the tail; if so keep the tail alone, otherwise exclude this recording
        if len(id_names) != id_length:
            if len(id_names) != (id_length - head_data.shape[1]):
                del datasets_rebecca_copy[trial]
                continue
            else:
                id_length = id_length - head_data.shape[1]
                merged_datasets = trialvalue["Tail"]['deltaFOverF']

        # creating column names for the DF from the IDs that we got;
        # neurons without an ID keep a placeholder name such as neuron_007
        colnames = [f"neuron_{i:03d}" for i in range(id_length)]
        colnames = [dummy if pd.isna(ID) else ID for dummy, ID in zip(colnames, id_names)]
        datasets_rebecca_copy[trial] = pd.DataFrame(merged_datasets, columns=colnames)
            

In [33]:
print("Available Recordings:",list(datasets_rebecca_copy.keys()))

Available Recordings: ['20200629_w1', '20200708_w3', '20200714_w3', '20200716_w2', '20200724_w3', '20200724_w4', '20200826_w3', '20200826_w6', '20200922_w1', '20200930_w4', '20201007_w4', '20210112_w3', '20210113_w1', '20210126_w2', '20210203_w2', '20210323_w6', '20210324_w5', '20210330_w5', '20210414_w7']
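
To make the merge and naming logic easier to follow, here is a small self-contained example on toy arrays. The array sizes and the ID list are made up for illustration. Only the stacking with `np.hstack` and the placeholder naming scheme mirror the loop above.

```python
import numpy as np
import pandas as pd

# Toy data: 4 time points, 2 head neurons and 3 tail neurons
head = np.zeros((4, 2))
tail = np.ones((4, 3))
id_names = ["AVAL", np.nan, "RIML", np.nan, "DA07"]  # made-up IDs, NaN = unidentified

# Stack neurons column-wise: rows are time points, columns are neurons
merged = np.hstack((head, tail))

# Placeholder names, replaced by the real ID wherever one exists
placeholders = [f"neuron_{i:03d}" for i in range(merged.shape[1])]
colnames = [p if pd.isna(name) else name for p, name in zip(placeholders, id_names)]

df = pd.DataFrame(merged, columns=colnames)
```

The placeholder index is the column position in the merged array, so an unidentified neuron in the second column is named `neuron_001`.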


## Merging Kerem's Head and Tail
The following is specific to Kerem's data. Here the IDs are already stored in the datasets dictionary, so the merge is simpler.

In [23]:
# load kerem's dictionary
with open('datasets_kerem_dFOF_bc.pkl', 'rb') as f:
    datasets_kerem = pickle.load(f)

datasets_kerem_copy = copy.deepcopy(datasets_kerem)

In [28]:
for trial, trialvalue in datasets_kerem.items():

    # IDs of head and tail neurons, in the same order as the merged columns
    id_names = trialvalue["Head"]["ID1"] + trialvalue["Tail"]["ID1"]

    # merging head and tail data
    merged_datasets = np.hstack((trialvalue["Head"][recording_type], trialvalue["Tail"][recording_type]))
    id_length = merged_datasets.shape[1]

    # neurons without an ID keep a placeholder name such as neuron_007
    colnames = [f"neuron_{i:03d}" for i in range(id_length)]
    colnames = [dummy if pd.isna(ID) else ID for dummy, ID in zip(colnames, id_names)]
    datasets_kerem_copy[trial] = pd.DataFrame(merged_datasets, columns=colnames)

            

In [35]:
print("Available Recordings:",list(datasets_kerem_copy.keys()))

Available Recordings: ['Dataset1_20190125_ZIM1428_Ctrl_w2', 'Dataset2_20180207_TKU862_Ctrl_w6', 'Dataset3_20180112_TKU761_Ctrl_w1', 'Dataset4_20170927_ZIM1428_Ctrl_w7', 'Dataset5_20181217_ZIM1428_w2', 'Dataset6_20190315_ZIM1428_1xHis_w1']
