# Tutorial 2 - Data organization inside Imaging+

In [1]:
import imagingplus as ip


imported imagingplus successfully
	version: 0.2-beta



**This notebook demonstrates the basics of how data is organized inside of a trial object.**




## Importing data analysis objects
The first step in using the data analysis objects is to import the previously created objects. In general, it is advised to import the high-level `Experiment` object first, and then use the `.load_trial` method to load an individual `-Trial` object (here, a `TwoPhotonImagingTrial`).


In [24]:
# import experiment obj and TwoPhotonImagingTrial object
expobj = ip.import_obj(pkl_path='/mnt/qnap_share/Data/imagingplus-example/RL109_analysis.pkl')
print(expobj)
trialobj = expobj.load_trial(trialID='t-013')


|- Loaded imagingplus.Experiment object (expID: RL109) ... 

imagingplus Experiment object (last saved: Sun Oct 23 13:04:11 2022), expID: RL109
file path: /mnt/qnap_share/Data/imagingplus-example/RL109_analysis.pkl

trials in Experiment object:
	t-005: awake spont. 2p imaging + LFP
	t-006: awake spont. 2p imaging + LFP
	t-013: all optical trial with LFP


|- Loaded TwoPhotonImagingTrial.alloptical experimental trial object ... 



## paq data
The `paq` sub-module is used to retrieve and store data from a .paq file for each trial. This temporal data is saved in `trialobj.tmdata`:

In [25]:
# show all attributes saved in `trialobj.tmdata`
print(trialobj.tmdata.data)

          frame_clock  x_galvo_uncaging  slm2packio  markpoints2packio  packio2slm  packio2markpoints  pycontrol_rsync   voltage  stim_start_times
0            0.005525         -1.167153    3.338084           0.005854   -0.001710           0.000264         1.320265  0.055181             False
1            0.005854         -1.166824    3.339728           0.007827    0.000264          -0.000065         1.237395  0.053866             False
2            0.005854         -1.165509    3.336768           0.007169   -0.000394           0.000592         1.158142  0.056168             False
3            0.005525         -1.166495    3.334138           0.006183   -0.000065           0.000264         1.084809  0.054524             False
4            0.005854         -1.166824    3.331178           0.007169    0.002565           0.000592         1.018710  0.055510             False
...               ...               ...         ...                ...         ...                ...              ...

We can see that the 'frame_clock' paq channel was used as the primary channel for retrieving imaging frame timestamps synchronized to the paq clock. A number of other channels are associated with this .paq file, but their data is not saved in this object, to save space. Data from any of the other channels can be stored directly in the `.paq` object using the `.storePaqChannel()` method. In this case, the 'voltage' paq channel was stored in its entirety under `.paq.voltage`.
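
To illustrate what "retrieving imaging frame timestamps" from a frame clock channel can mean in practice, here is a minimal sketch on synthetic data. It is not the imagingplus implementation: the helper `frame_onsets`, the 2.5 V threshold, and the pulse layout are all assumptions made for the example.

```python
import numpy as np

def frame_onsets(frame_clock, threshold=2.5):
    """Return sample indices where the signal crosses `threshold` upward.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    above = np.asarray(frame_clock) > threshold
    # a rising edge is a sample above threshold whose predecessor is not
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1

# synthetic paq-like recording: one 5 V pulse (10 samples wide) every 100 samples
n_samples = 1000
frame_clock = np.zeros(n_samples)
for start in range(50, n_samples, 100):
    frame_clock[start:start + 10] = 5.0

# stand-in for any other paq channel recorded on the same clock
voltage = np.linspace(0.0, 1.0, n_samples)

onsets = frame_onsets(frame_clock)
voltage_per_frame = voltage[onsets]  # one value per imaging frame

print(onsets[:3])               # first three frame onsets, in paq samples
print(voltage_per_frame.shape)  # one entry per detected frame
```

Sampling every channel at the detected onsets gives one value per imaging frame, which is the kind of frame-aligned table a trial can carry alongside its imaging data.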

## Suite2p data

Suite2p is the primary Ca2+ imaging library that is integrated into the analysis pipeline. The dedicated `suite2p` submodule handles access to Suite2p functionality, as well as the data imported from Suite2p processing. Suite2p-related data and methods are accessed using `trialobj.Suite2p`.

Some example functionality is shown below. Refer to the reference documentation for more extensive information.

In [26]:
trialobj.Suite2p

Suite2p Results (trial level) Object, 16368 key_frames x 640 s2p ROIs

In [27]:
# the ROIs x raw data:
trialobj.Suite2p.imdata

# NOTE: we recommend working with the data that is stored in the anndata table (`trialobj.data`) 
# for your processing/analysis work.

array([[352.13678 , 411.9472  , 280.92416 , ..., 401.3014  , 515.2566  ,
        541.41565 ],
       [192.22421 , 395.29306 , 330.7496  , ..., 257.25806 , 285.31506 ,
        126.660484],
       [336.64996 , 539.26746 , 219.30368 , ..., 423.15295 , 433.1515  ,
        220.52742 ],
       ...,
       [308.56497 , 303.55536 , 413.3554  , ..., 482.61044 , 386.2576  ,
        283.1643  ],
       [133.96815 , 122.96908 ,  84.63106 , ..., 109.2256  , 187.91866 ,
        159.50813 ],
       [252.49574 , 240.2455  , 273.2785  , ..., 181.601   , 229.0061  ,
        278.74188 ]], dtype=float32)

In [28]:
# meta-information about ROIs from the Suite2p stat file:
trialobj.Suite2p.stat

array([{'ypix': array([102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 104,
       104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105,
       105, 105, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107,
       107, 107, 107, 107, 108, 108, 108, 108, 108]), 'xpix': array([457, 458, 459, 460, 461, 456, 457, 458, 459, 460, 461, 462, 455,
       456, 457, 458, 459, 460, 461, 462, 455, 456, 457, 458, 459, 460,
       461, 462, 455, 456, 457, 458, 459, 460, 461, 462, 456, 457, 458,
       459, 460, 461, 462, 457, 458, 459, 460, 461]), 'lam': array([0.00638468, 0.00895854, 0.01136301, 0.01110086, 0.00705759,
       0.01243491, 0.02065784, 0.02732179, 0.03081292, 0.02857405,
       0.01981334, 0.00643057, 0.01076856, 0.02522881, 0.03536875,
       0.03719155, 0.03523667, 0.03170755, 0.02582202, 0.01195719,
       0.01138049, 0.02936709, 0.0388124 , 0.03670786, 0.03174627,
       0.02781939, 0.0247726 , 0.01484124, 0.0064307 , 0.02178741,
       0.03363076, 0.035843

In [29]:
# meta-information about the Suite2p run:
trialobj.Suite2p.output_ops

{'suite2p_version': '0.9.3',
 'look_one_level_down': False,
 'fast_disk': '/mnt/sandbox/pshah/suite2p_tmp/suite2p/plane0',
 'delete_bin': True,
 'mesoscan': False,
 'bruker': False,
 'h5py': [],
 'h5py_key': 'data',
 'save_path0': '/home/pshah/mnt/qnap/Data/2020-12-19',
 'save_folder': '/home/pshah/mnt/qnap/Analysis/2020-12-19/suite2p/alloptical-2p-1x-alltrials',
 'subfolders': [],
 'move_bin': False,
 'nplanes': 1,
 'nchannels': 1,
 'functional_chan': 1,
 'tau': 1.26,
 'fs': 30.0,
 'force_sktiff': False,
 'frames_include': -1,
 'multiplane_parallel': False,
 'preclassify': 0.0,
 'save_mat': True,
 'save_NWB': False,
 'combined': True,
 'aspect': 1.0,
 'do_bidiphase': False,
 'bidiphase': 0,
 'bidi_corrected': True,
 'do_registration': True,
 'two_step_registration': False,
 'keep_movie_raw': False,
 'nimg_init': 200,
 'batch_size': 2000,
 'maxregshift': 0.1,
 'align_by_chan': 1,
 'reg_tif': True,
 'reg_tif_chan2': False,
 'subpixel': 10,
 'smooth_sigma_time': 0,
 'smooth_sigma': 1.15,
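
As a quick sanity check, the frame rate stored in `output_ops` can be combined with the frame count reported by `trialobj.Suite2p` to estimate the length of the recording. The sketch below copies both values from the outputs above and assumes that `fs` is the per-plane frame rate in Hz and that acquisition was continuous.

```python
# values copied from the outputs shown above
ops = {'fs': 30.0, 'nplanes': 1}
n_frames = 16368  # from the Suite2p summary: 16368 frames x 640 ROIs

# assumption: `fs` is the per-plane frame rate in Hz, acquisition was continuous
duration_s = n_frames / ops['fs']
print(f"{duration_s:.1f} s (~{duration_s / 60:.1f} min)")  # 545.6 s (~9.1 min)
```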

## Annotated Data

The AnnData library provides the primary format used to store imaging data in an efficient, multi-functional way. The AnnData object is created using the `anndata` sub-module and can be accessed using `trialobj.data`. By default, `trialobj.data` is a data array generated from Suite2p-processed data.
For guidance on AnnData objects, visit: https://anndata.readthedocs.io/en/latest/index.html.


The AnnData object is built around the raw Flu matrix of each `trialobj`. In keeping with AnnData conventions, the data structure is organized as *n* observations (obs) x *m* variables (var), where observations are Suite2p ROIs and variables are imaging frame timepoints.


In [30]:
print(trialobj.data)  # this is the anndata object for this trial

Annotated Data of n_obs × n_vars = 640 × 16368 
available attributes: 
	.X (primary datamatrix) of .data_label: 
		|- suite2p raw - neuropil corrected
	.obs (obs metadata): 
		|- 'ypix', 'xpix', 'lam', 'footprint', 'mrs', 'mrs0', 'compact', 'med', 'npix', 'overlap', 'radius', 'aspect_ratio', 'npix_norm', 'skew', 'std'
	.var (vars metadata): 
		|- 'frame_clock', 'x_galvo_uncaging', 'slm2packio', 'markpoints2packio', 'packio2slm', 'packio2markpoints', 'pycontrol_rsync', 'voltage'
	.obsm: 
		|- 'ypix', 'xpix'


### storage of Flu data

The raw data is stored in `.X`

In [31]:
print(trialobj.data.X)

print('shape: ', trialobj.data.X.shape)

[[352.13678  411.9472   280.92416  ... 401.3014   515.2566   541.41565 ]
 [192.22421  395.29306  330.7496   ... 257.25806  285.31506  126.660484]
 [336.64996  539.26746  219.30368  ... 423.15295  433.1515   220.52742 ]
 ...
 [308.56497  303.55536  413.3554   ... 482.61044  386.2576   283.1643  ]
 [133.96815  122.96908   84.63106  ... 109.2256   187.91866  159.50813 ]
 [252.49574  240.2455   273.2785   ... 181.601    229.0061   278.74188 ]]
shape:  (640, 16368)


Processed data is added to `trialobj.data` under a unique key in `layers`.

In [34]:
trialobj.data.layers

# NOTE: we haven't added any layers to this dataset yet.

Layers with keys: 
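
As an illustration of what a processed layer might contain, the sketch below computes a simple dF/F on a synthetic ROIs x frames matrix using NumPy only. The 10th-percentile baseline is one common choice and is an assumption here, not necessarily the normalization that imagingplus applies.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-in for the raw Flu matrix: 5 ROIs x 200 frames
X = rng.uniform(100, 500, size=(5, 200)).astype(np.float32)

# per-ROI baseline: 10th percentile across frames (assumed, illustrative choice)
f0 = np.percentile(X, 10, axis=1, keepdims=True)
dff = (X - f0) / f0

# a layer must have exactly the same shape as .X
print(X.shape, dff.shape)

# With a standard AnnData object, such a matrix would typically be stored as:
#   adata.layers['dFF'] = dff
# (assumes `trialobj.data` behaves like a regular AnnData object)
```

Because a layer shares the dimensions of `.X`, every layer stays aligned with the same `obs` and `var` annotations.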

The entire AnnData data object is built according to the dimensions of the original Flu data input.

### observations (Suite2p ROIs metadata and associated processing info)

The metadata for each Suite2p ROI, stored in Suite2p's stat.npy output, is added to `trialobj.data` under `obs` and `obsm` (1-D and >1-D observation annotations, respectively).

In [35]:
trialobj.data.obs

Unnamed: 0,ypix,xpix,lam,footprint,mrs,...,radius,aspect_ratio,npix_norm,skew,std
0,"[102, 102, 102, 102, 102, 103, 103, 103, 103, ...","[457, 458, 459, 460, 461, 456, 457, 458, 459, ...","[0.0063846777, 0.008958542, 0.011363007, 0.011...",1.0,0.909815,...,3.565604,1.051397,0.649175,3.016955,353.675049
1,"[46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 4...","[116, 117, 118, 119, 120, 121, 114, 115, 116, ...","[0.009095913, 0.014569374, 0.01832514, 0.01890...",1.0,0.912076,...,3.538468,1.074428,0.622126,3.784652,422.922577
2,"[18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 1...","[202, 203, 204, 205, 200, 201, 202, 203, 204, ...","[0.00545189, 0.006088022, 0.0062021483, 0.0052...",1.0,1.088559,...,4.124215,1.027475,0.919665,3.603348,342.368134
3,"[43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 4...","[352, 352, 352, 352, 353, 352, 353, 354, 351, ...","[0.0036495698, 0.0043396214, 0.0031816224, 0.0...",1.0,1.561322,...,8.133019,1.348522,1.325399,3.187822,357.666168
4,"[156, 156, 156, 156, 156, 157, 157, 157, 157, ...","[382, 383, 384, 385, 386, 380, 381, 382, 383, ...","[0.013304887, 0.02187323, 0.023734575, 0.01969...",1.0,0.869808,...,3.62042,1.139261,0.554504,2.59998,263.609039
...,...,...,...,...,...,...,...,...,...,...,...
1241,"[290, 291, 291, 291, 291, 291, 291, 291, 291, ...","[299, 298, 299, 300, 301, 302, 305, 306, 307, ...","[0.0029507184, 0.0032565512, 0.005071437, 0.00...",2.0,2.336259,...,11.46564,1.251471,2.447931,2.540658,94.641136
1242,"[354, 354, 355, 355, 355, 355, 355, 356, 356, ...","[309, 310, 308, 309, 310, 311, 312, 307, 308, ...","[0.00252066, 0.0021455055, 0.007094776, 0.0078...",2.0,2.386548,...,10.831544,1.129317,2.934812,2.229956,79.882561
1246,"[15, 15, 16, 16, 16, 17, 17, 17, 17, 17, 17, 1...","[488, 489, 486, 487, 489, 486, 487, 488, 489, ...","[0.010669279, 0.007242187, 0.013514522, 0.0124...",2.0,2.238235,...,13.789857,1.460301,1.636462,4.38241,55.825489
1250,"[472, 472, 472, 473, 473, 473, 473, 473, 473, ...","[55, 56, 67, 55, 56, 57, 63, 64, 65, 66, 67, 6...","[0.0023643558, 0.0034383552, 0.0021977199, 0.0...",2.0,2.079421,...,10.794925,1.331867,2.583175,1.233372,64.879417


In [36]:
trialobj.data.obsm

AxisArrays with keys: ypix, xpix

The `.obsm` attribute includes the ypix and xpix outputs for each Suite2p ROI, which represent the pixel locations of the ROI mask.

In [37]:
print('ypix:', trialobj.data.obsm['ypix'][:5], '\n\nxpix: \t', trialobj.data.obsm['xpix'][:5])

ypix: [array([102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 104,
       104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105,
       105, 105, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107,
       107, 107, 107, 107, 108, 108, 108, 108, 108])
 array([46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
       48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 49, 49, 50, 50, 50,
       50, 50, 50, 50, 51, 51, 51, 51, 51, 52, 52, 52])
 array([18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20,
       20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
       22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24,
       24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26])
 array([43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 48, 49, 49, 49, 49,
       49, 49, 50, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 52,
       52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 5
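
Since `ypix` and `xpix` are paired pixel coordinates, they can be turned into a boolean image mask with NumPy fancy indexing. The sketch below uses the first few coordinates of ROI 0 from the output above; the helper `roi_mask` and the 512 x 512 frame size are assumptions for illustration.

```python
import numpy as np

def roi_mask(ypix, xpix, frame_shape):
    """Boolean image that is True at the ROI's pixels (hypothetical helper)."""
    mask = np.zeros(frame_shape, dtype=bool)
    mask[ypix, xpix] = True  # paired (row, column) indexing
    return mask

# first seven pixel coordinates of ROI 0, copied from the output above
ypix = np.array([102, 102, 102, 102, 102, 103, 103])
xpix = np.array([457, 458, 459, 460, 461, 456, 457])

# frame size is an assumption; use the real imaging frame dimensions in practice
mask = roi_mask(ypix, xpix, frame_shape=(512, 512))
print(mask.sum())  # 7 pixels set
```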

### variables (temporal synchronization of paq channels and imaging)

The temporal synchronization data of the experiment, collected in the .paq output, is added to the variable annotations under `var`. These variables are aligned to the imaging frame clock timings. The total number of variables equals the number of imaging frames in the original Flu data input.

In [38]:
trialobj.data.var

Unnamed: 0,frame_clock,x_galvo_uncaging,slm2packio,markpoints2packio,packio2slm,packio2markpoints,pycontrol_rsync,voltage
139577,4.972792,-1.165180,3.329534,0.007827,0.000264,0.000592,0.017035,-0.116807
140252,4.974765,-1.164851,3.325917,0.007827,-0.000394,-0.000394,0.019666,-0.196060
140925,4.971806,-1.165180,3.337755,0.008156,-0.000065,0.000592,0.021310,-0.210858
141595,4.970819,-1.164851,3.331507,0.007827,0.000592,0.000264,0.021310,-0.204939
142267,4.975094,-1.166166,3.340715,0.006841,0.000264,0.001250,0.021639,-0.234206
...,...,...,...,...,...,...,...,...
11136216,4.974436,-1.165837,3.337426,0.006841,-0.000065,0.002565,0.021639,2.423554
11136886,4.975751,-1.164851,3.334138,0.012102,-0.000065,-0.000065,0.017035,2.473211
11137559,4.960953,-1.164851,3.337097,0.008485,0.000264,0.000921,0.019994,2.473211
11138232,4.971477,-1.167153,3.319340,0.006183,-0.000065,-0.000065,0.005525,2.466963
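
Because each row of `var` corresponds to one column of the data matrix, a condition on a paq channel can be used to select imaging frames. The sketch below shows the idea with plain NumPy and pandas on made-up values; the index labels and voltages are illustrative, not taken from the trial.

```python
import numpy as np
import pandas as pd

n_rois, n_frames = 4, 6
# toy data matrix: ROIs x frames
X = np.arange(n_rois * n_frames, dtype=float).reshape(n_rois, n_frames)

# one row per frame, indexed by paq sample number (values are made up)
var = pd.DataFrame(
    {'voltage': [-0.1, -0.2, 0.5, 2.4, 2.5, -0.3]},
    index=[1000, 1670, 2340, 3010, 3680, 4350],
)

# boolean mask over frames, then column-wise selection of the data matrix
keep = (var['voltage'] > 0).to_numpy()
X_sel = X[:, keep]
print(X_sel.shape)  # (4, 3)
```

If `trialobj.data` behaves like a standard AnnData object, the same selection is usually written as `adata[:, adata.var['voltage'] > 0]`.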


### Creating or Modifying AnnData arrays of trialobj

There are a number of helper functions to create anndata arrays or modify existing anndata arrays.

In [39]:
# creating new anndata object. This is identical to the base AnnData library.
# the example below is from the Getting Started Tutorial for AnnData:

# any given anndata object is created from constituent data arrays.


# 1) Primary data matrix
import numpy as np
import pandas as pd

n_rois, n_frames = 10, 10000
X = np.random.random((n_rois, n_frames))  # create random data matrix

df = pd.DataFrame(X, columns=range(n_frames), index=np.arange(n_rois, dtype=int).astype(str))
df  # show the dataframe

Unnamed: 0,0,1,2,3,4,...,9995,9996,9997,9998,9999
0,0.760647,0.110865,0.013329,0.935462,0.540991,...,0.473592,0.3687,0.200333,0.580795,0.397233
1,0.971282,0.03678,0.315767,0.216254,0.759866,...,0.982392,0.137152,0.935467,0.262601,0.867271
2,0.227014,0.16365,0.851788,0.527026,0.444399,...,0.996837,0.685767,0.307295,0.282857,0.553372
3,0.054008,0.280003,0.651034,0.93338,0.087784,...,0.899018,0.389161,0.996816,0.67248,0.860496
4,0.474105,0.987455,0.986814,0.977501,0.322008,...,0.571053,0.896681,0.143181,0.967625,0.332282
5,0.473564,0.460252,0.806648,0.583812,0.419692,...,0.22674,0.973925,0.974531,0.315231,0.431784
6,0.891266,0.953494,0.12049,0.178379,0.894506,...,0.089454,0.563263,0.897826,0.295088,0.194042
7,0.82937,0.295575,0.703529,0.639097,0.606885,...,0.408049,0.901431,0.242319,0.211567,0.089038
8,0.27185,0.935636,0.1208,0.487115,0.638711,...,0.700857,0.31419,0.338174,0.822562,0.709709
9,0.486074,0.555504,0.725748,0.992619,0.433843,...,0.968497,0.225324,0.896016,0.659566,0.66078


In [40]:
#2) Observations matrix

obs_meta = pd.DataFrame({
    'cell_type': np.random.choice(['exc', 'int'], n_rois),
},
    index=np.arange(n_rois, dtype=int).astype(str),    # these are the same IDs of observations as above!
)
obs_meta


Unnamed: 0,cell_type
0,int
1,exc
2,exc
3,exc
4,exc
5,int
6,int
7,int
8,exc
9,exc


In [41]:
#3) Variables matrix


var_meta = pd.DataFrame({
    'exp_group': np.random.choice(['A','B', 'C'], n_frames),
},
    index=np.arange(n_frames, dtype=int).astype(str),    # these are the same IDs of variables (frames) as the columns above!
)
var_meta


Unnamed: 0,exp_group
0,A
1,C
2,B
3,C
4,B
...,...
9995,B
9996,C
9997,C
9998,A


In [43]:
#4) Creating a new anndata attribute for the trialobj

import imagingplus.processing.anndata as ad  # from the processing module, import anndata submodule

trialobj.new_anndata = ad.AnnotatedData(X=df,obs=obs_meta, var=var_meta)

print(trialobj.new_anndata)

Created AnnData object: 
	Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 10 × 10000
Annotated Data of n_obs × n_vars = 10 × 10000 
available attributes: 
	.X (primary datamatrix)
	.obs (obs metadata): 
		|- 'cell_type'
	.var (vars metadata): 
		|- 'exp_group'


In [44]:
# adding an 'obs' to existing anndata object

# np.random.random_integers is deprecated; np.random.randint excludes the upper
# bound, so 513 is used to keep the original inclusive range of 0..512
cell_loc_x = np.random.randint(0, 513, n_rois)
cell_loc_y = np.random.randint(0, 513, n_rois)


trialobj.new_anndata.add_obs(obs_name='cell_loc_x', values=cell_loc_x)
trialobj.new_anndata.add_obs(obs_name='cell_loc_y', values=cell_loc_y)

print(trialobj.new_anndata)



Annotated Data of n_obs × n_vars = 10 × 10000 
available attributes: 
	.X (primary datamatrix)
	.obs (obs metadata): 
		|- 'cell_type', 'cell_loc_x', 'cell_loc_y'
	.var (vars metadata): 
		|- 'exp_group'


In [45]:
# deleting an 'obs' from an existing anndata object
# uses the pop method

trialobj.new_anndata.del_obs('cell_type')
print(trialobj.new_anndata)

Annotated Data of n_obs × n_vars = 10 × 10000 
available attributes: 
	.X (primary datamatrix)
	.obs (obs metadata): 
		|- 'cell_loc_x', 'cell_loc_y'
	.var (vars metadata): 
		|- 'exp_group'


*Note: adding and deleting a 'var' on an existing anndata object can be done in exactly the same manner as demonstrated above for 'obs', using the .add_var() and .del_var() methods of the anndata object.*
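
For intuition, the sketch below shows the equivalent operations on a plain pandas DataFrame shaped like the `var` table. That `.add_var()` and `.del_var()` reduce to a column assignment and a `pop` is an assumption based on the "uses the pop method" comment above; the column name `frame_time_s` and the 30 Hz rate are made up for the example.

```python
import numpy as np
import pandas as pd

n_frames = 8
var_meta = pd.DataFrame(
    {'exp_group': np.random.choice(['A', 'B', 'C'], n_frames)},
    index=np.arange(n_frames).astype(str),  # one row per frame
)

# "add": a new per-frame annotation; its length must equal the number of frames
var_meta['frame_time_s'] = np.arange(n_frames) / 30.0  # 30 Hz is illustrative

# "delete": pop removes the column and returns it
removed = var_meta.pop('exp_group')

print(list(var_meta.columns))  # ['frame_time_s']
```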