# Converting MATLAB wbstructs to pandas dataframes

Check the README.md for more information on when to use this tool. This notebook simply calls the functions in the `utils` modules `wbstruct_to_dicts.py` and `wbstruct_dicts_to_dataframes.py`; refer to those files for a better understanding of the code. If you run into unhandled errors, please open an issue on GitHub.

In [9]:
import utils.wbstruct_to_dicts as wbstruct_dictioniaries
import utils.wbstruct_dicts_to_dataframes as wbstruct_dataframes
import dill


# reload the modules so that edits to them take effect without restarting the kernel
import importlib
importlib.reload(wbstruct_dictioniaries)
importlib.reload(wbstruct_dataframes)

<module 'utils.wbstruct_dicts_to_dataframes' from 'c:\\Users\\LAK\\Documents\\neumand\\data\\wbstruct_converter\\utils\\wbstruct_dicts_to_dataframes.py'>

### Convert the wbstructs to Python dictionaries

In [4]:
# directories containing all recordings: the first holds Rebecca's data, the second Kerem Uzel's
directory_paths = ["Y:\\lisc\\project\\neurobiology\\zimmer\\Rebecca",
                   "Y:\\lisc\\project\\neurobiology\\zimmer\\Kerem_Uzel\\Whole_brain_imaging\\Cleaned_up_datasets\\WT"]

# the target file is wbstruct.mat, the output of the wba that we want to convert into a dataframe
target_file = "wbstruct.mat"

# each animal usually has multiple recordings, so we include only those relevant for our analysis
# (the keywords below are specific to Rebecca's and Kerem's data)
include_Rebecca=["Ctrl","used Datasets"]
include_Kerem=["Head","Tail"]
exclude=["not_used","not used","notUsed","cat-2_tdc-1_tph-1"]
recording_type="deltaFOverF_bc"
simple=False
save_as='csv'
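The include/exclude keywords act as filters on the file paths found under each directory. The helper below is a hypothetical re-implementation (`find_recordings` is not part of `utils`), assuming a path is kept when it contains at least one include keyword and none of the exclude keywords; the actual matching rules live in `wbstruct_to_dicts.py` and may differ.

```python
import os
from pathlib import Path


def find_recordings(root, target_file, include, exclude):
    """Hypothetical path search: walk `root` and return every `target_file`
    whose full path contains at least one `include` keyword (if any are
    given) and none of the `exclude` keywords. Assumed semantics."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if target_file not in filenames:
            continue
        path = str(Path(dirpath) / target_file)
        # drop the path as soon as any exclude keyword appears in it
        if any(word in path for word in exclude):
            continue
        # keep it only if at least one include keyword appears
        if include and not any(word in path for word in include):
            continue
        matches.append(path)
    return sorted(matches)
```

For example, `find_recordings(directory_paths[1], target_file, include_Kerem, exclude)` would list Kerem's candidate files without loading them, a cheap way to check the keywords given how long loading can take.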

In [3]:
# load and save Rebecca's data
datasets_rebecca = wbstruct_dictioniaries.get_datasets_dict(directory_paths[0], target_file, include_Rebecca, exclude, recording_type, simple)

Searching for paths
Found 56 paths


Loading Files: 100%|██████████| 56/56 [01:51<00:00,  1.99s/it]


In [47]:
# load and save Kerem's data
datasets_kerem = wbstruct_dictioniaries.get_datasets_dict(directory_paths[1], target_file, include_Kerem, exclude, recording_type)

Searching for paths
Found 12 paths


Loading Files: 100%|██████████| 12/12 [31:44<00:00, 158.73s/it]


The dictionary returned by the function is also saved as a pickle file named "datasets.pkl". To load this file, run the following cell:

In [9]:
# load Kerem's dictionary
with open('datasets.pkl', 'rb') as f:
    datasets_kerem = dill.load(f)
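If both calls write to the same `datasets.pkl`, the second one overwrites the first. A minimal sketch for keeping separate copies, using the hypothetical helpers `save_datasets` and `load_datasets`: it uses `dill`, which mirrors the `pickle` `dump`/`load` interface, and falls back to the standard `pickle` module when `dill` is not installed.

```python
try:
    import dill as serializer  # dill extends pickle to more object types
except ImportError:
    import pickle as serializer  # fallback: enough for plain dicts, lists and arrays


def save_datasets(datasets, path):
    """Serialize a datasets dictionary to `path` (hypothetical helper)."""
    with open(path, "wb") as f:
        serializer.dump(datasets, f)


def load_datasets(path):
    """Load a dictionary written by save_datasets (hypothetical helper)."""
    with open(path, "rb") as f:
        return serializer.load(f)
```

For example, `save_datasets(datasets_rebecca, "datasets_rebecca.pkl")` keeps Rebecca's dictionary under its own name.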

Now we can use the dictionaries as they are or convert them to pandas dataframes.

### Convert dictionaries to dataframes
The function **get_dataframes()** below assumes that the field "ID1" contains all neuron IDs (objects that were not identified have the ID "None").

In [65]:
dataframes_kerem = wbstruct_dataframes.get_dataframes(datasets_kerem, recording_type, save_as='hdf5')

Dataframes stored as hdf5 files in the directory 'hdf5_files'


In [51]:
print("Available Recordings:",list(dataframes_kerem.keys()))
dataframes_kerem['Dataset1_20190125_ZIM1428_Ctrl_w2'].head()

Available Recordings: ['Dataset1_20190125_ZIM1428_Ctrl_w2', 'Dataset2_20180207_TKU862_Ctrl_w6', 'Dataset3_20180112_TKU761_Ctrl_w1', 'Dataset4_20170927_ZIM1428_Ctrl_w7', 'Dataset5_20181217_ZIM1428_w2', 'Dataset6_20190315_ZIM1428_1xHis_w1']


| | neuron_000 | neuron_001 | neuron_002 | neuron_003 | neuron_004 | neuron_005 | neuron_006 | neuron_007 | neuron_008 | neuron_009 | ... | DA09 | VA12 | neuron_146 | neuron_147 | VA11 | VD11 | AS10 | DA07 | DB07 | VB11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.393794 | 0.307361 | 0.040831 | 0.039502 | 0.10237 | 0.015577 | 0.165189 | 0.009465 | 0.179284 | 0.033344 | ... | 0.078039 | 0.199771 | 0.133681 | 0.013904 | 0.093948 | 0.129712 | 0.328577 | 0.211715 | 0.115133 | 0.084223 |
| 1 | 0.431958 | 0.338261 | 0.032891 | 0.055835 | 0.114245 | 0.016791 | 0.147697 | 0.0 | 0.183983 | 0.043461 | ... | 0.046031 | 0.198323 | 0.155385 | 0.009219 | 0.095156 | 0.116418 | 0.335416 | 0.198499 | 0.132506 | 0.065558 |
| 2 | 0.475861 | 0.39207 | 0.012653 | 0.058784 | 0.084672 | 0.022044 | 0.163992 | 0.006254 | 0.154195 | 0.021439 | ... | 0.078579 | 0.193963 | 0.180774 | 0.02064 | 0.063496 | 0.113001 | 0.324969 | 0.226145 | 0.087854 | 0.079178 |
| 3 | 0.469389 | 0.382305 | 0.020624 | 0.074705 | 0.087241 | 0.0 | 0.16551 | 0.02922 | 0.156195 | 0.02273 | ... | 0.069116 | 0.188425 | 0.181836 | 4.9e-05 | 0.083021 | 0.111012 | 0.316187 | 0.210039 | 0.114409 | 0.071539 |
| 4 | 0.515858 | 0.396868 | 0.033309 | 0.05437 | 0.108983 | 0.021315 | 0.152534 | 0.028071 | 0.142395 | 0.0 | ... | 0.061056 | 0.184018 | 0.188677 | 0.027912 | 0.072947 | 0.136352 | 0.323424 | 0.216971 | 0.108904 | 0.092562 |
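The column names above follow a pattern: identified neurons appear under their ID (e.g. `DA09`, `VA12`), while unidentified ones appear as `neuron_` plus a zero-padded index. The sketch below reproduces that naming from a trace matrix and an ID list, then keeps only the identified neurons. `traces_to_dataframe` and `identified_only` are hypothetical helpers, and the naming rule is inferred from the output above rather than taken from `get_dataframes()`.

```python
import pandas as pd


def traces_to_dataframe(traces, ids):
    """Build a (frames x neurons) dataframe. `ids` holds one entry per neuron;
    None or "None" marks an unidentified neuron, which gets a placeholder
    name of the form neuron_000, neuron_001, ..."""
    columns = [
        f"neuron_{i:03d}" if neuron_id in (None, "None") else neuron_id
        for i, neuron_id in enumerate(ids)
    ]
    return pd.DataFrame(traces, columns=columns)


def identified_only(df):
    """Drop the placeholder columns and keep only identified neurons."""
    return df.loc[:, ~df.columns.str.startswith("neuron_")]
```

Applied to the converted data, `identified_only(dataframes_kerem['Dataset1_20190125_ZIM1428_Ctrl_w2'])` would keep columns such as `DA09` and `VA12` and drop the `neuron_XXX` ones.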


### If the IDs are in separate Excel sheets
If the neuron IDs are stored in separate Excel sheets, we have to merge the collected IDs with the datasets from the dictionary to construct the final dataframes.
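One way to picture the merge: start from placeholder column names and rename the columns whose neuron index has an ID in the Excel sheet. In this sketch `apply_ids` is hypothetical, and `id_map` (neuron index to ID) is an assumed intermediate form; the layout of the real Excel files and what `get_dataframes_from_excel()` does internally are not shown here.

```python
import pandas as pd


def apply_ids(df, id_map):
    """Rename placeholder columns using `id_map`, a dict from 0-based neuron
    index to neuron ID (assumed intermediate form). Neurons without an entry
    keep their neuron_XXX name."""
    renames = {f"neuron_{i:03d}": neuron_id for i, neuron_id in id_map.items()}
    return df.rename(columns=renames)
```

pandas allows duplicate column names, so an ID assigned to two neurons would silently yield two columns with the same name; checking `apply_ids(df, id_map).columns.is_unique` afterwards is a cheap safeguard.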

In [5]:
# directory containing the Excel files with information about the recordings, including the neuron IDs
directory_path_ID = "Y:\\lisc\\project\\neurobiology\\zimmer\\Rebecca\\Analyses"
target_ID = ".xlsx"
include_ID = ["Analyses_"]
exclude_ID = ["._","cat-2_tdc-1_tph-1"]

In [6]:
dictofIDs_og = wbstruct_dictioniaries.get_IDs_dict(directory_path_ID, target_ID, include_ID, exclude_ID)

Searching for paths
Found 3 paths


Loading Files: 100%|██████████| 3/3 [00:03<00:00,  1.21s/it]


In [10]:
dataframes_rebecca=wbstruct_dataframes.get_dataframes_from_excel(datasets_rebecca, dictofIDs_og, recording_type, save_as='csv')

Dataframes stored as csv files in the directory 'csv_files'
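To reuse the saved files in a later session without rerunning the conversion, they can be read back with pandas. This sketch assumes one CSV per recording in `csv_files` (the file naming is an assumption, and `load_csv_dataframes` is a hypothetical helper). Passing `index_col=0` matters: when a CSV written with its row index is read back without it, pandas turns the index into an extra column named `Unnamed: 0`.

```python
from pathlib import Path

import pandas as pd


def load_csv_dataframes(directory="csv_files"):
    """Read every CSV in `directory` into a dict keyed by file name stem.
    index_col=0 restores the saved row index instead of adding an
    'Unnamed: 0' column."""
    return {
        path.stem: pd.read_csv(path, index_col=0)
        for path in sorted(Path(directory).glob("*.csv"))
    }
```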


In [57]:
print("Available Recordings:",list(dataframes_rebecca.keys()))
dataframes_rebecca['20200629_w1'].head()

Available Recordings: ['20200629_w1', '20200708_w3', '20200714_w3', '20200716_w2', '20200724_w3', '20200724_w4', '20200826_w3', '20200826_w6', '20200922_w1', '20200930_w4', '20201007_w4', '20210112_w3', '20210113_w1', '20210126_w2', '20210203_w2', '20210323_w6', '20210324_w5', '20210330_w5', '20210414_w7']


| | neuron_000 | neuron_001 | neuron_002 | neuron_003 | neuron_004 | neuron_005 | neuron_006 | neuron_007 | neuron_008 | neuron_009 | ... | VA12 | neuron_185 | DA09 | neuron_187 | neuron_188 | neuron_189 | VA11 | neuron_191 | DA07 | DB07 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.19013 | 0.160321 | 0.219987 | 0.226076 | 0.173091 | 0.098439 | 0.21637 | 0.23961 | 0.191948 | 0.168094 | ... | 0.345903 | 0.248828 | 0.468808 | 0.34825 | 0.363683 | 0.117297 | 0.258854 | 1.105442 | 0.24541 | 0.259243 |
| 1 | 0.165069 | 0.170646 | 0.198396 | 0.240675 | 0.132144 | 0.07338 | 0.21093 | 0.198651 | 0.167014 | 0.1376 | ... | 0.29201 | 0.26399 | 0.430236 | 0.312688 | 0.325351 | 0.117811 | 0.195009 | 1.103618 | 0.285437 | 0.246301 |
| 2 | 0.140882 | 0.192148 | 0.224722 | 0.24321 | 0.182661 | 0.101228 | 0.226743 | 0.199701 | 0.181488 | 0.131127 | ... | 0.345514 | 0.265548 | 0.463654 | 0.338033 | 0.386599 | 0.098495 | 0.196194 | 1.105227 | 0.267343 | 0.265042 |
| 3 | 0.130964 | 0.175004 | 0.121349 | 0.184547 | 0.12173 | 0.073926 | 0.172044 | 0.156691 | 0.126577 | 0.075437 | ... | 0.265186 | 0.184721 | 0.395756 | 0.271826 | 0.326238 | 0.079108 | 0.170831 | 0.986962 | 0.232933 | 0.207334 |
| 4 | 0.179889 | 0.130667 | 0.17324 | 0.129928 | 0.148944 | 0.065578 | 0.167654 | 0.189717 | 0.156886 | 0.131135 | ... | 0.273023 | 0.233412 | 0.398289 | 0.275193 | 0.333255 | 0.050726 | 0.165638 | 1.01569 | 0.218799 | 0.230359 |
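With several recordings converted, a common next step is comparing one identified neuron across them. `collect_neuron` is a hypothetical helper that gathers that neuron's column from every dataframe that contains it:

```python
import pandas as pd


def collect_neuron(dataframes, neuron_id):
    """Return one column per recording that contains `neuron_id`.
    pd.concat aligns the traces on the frame index, so shorter recordings
    are padded with NaN."""
    traces = {
        name: df[neuron_id]
        for name, df in dataframes.items()
        if neuron_id in df.columns
    }
    if not traces:
        return pd.DataFrame()  # neuron not identified in any recording
    return pd.concat(traces, axis=1)
```

For example, `collect_neuron(dataframes_rebecca, 'DA07')` would return one column per recording in which DA07 was identified, keyed by the recording name.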
