# Tutorial 6 - Annotated Data Module

AnnData is the primary data structure used to store imaging data in an efficient, multi-functional format. The AnnData object is created by the `anndata` sub-module and is accessed through `trialobj.data`. By default, `trialobj.data` is built from Suite2p-processed data.
For full guidance on AnnData objects, see https://anndata.readthedocs.io/en/latest/index.html.


The AnnData object is built around the raw Flu matrix of each `trialobj`. In keeping with AnnData conventions, the data are organized as *n* observations (obs) x *m* variables (var), where observations are Suite2p ROIs and variables are imaging frame timepoints.
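
To make this layout concrete, the sketch below builds a toy matrix with the same orientation using only NumPy and pandas. The sizes, the `is_cell` and `time_s` columns, and the 30 Hz frame rate are assumptions chosen for illustration; none of them are taken from `trialobj`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

n_rois, n_frames = 4, 6          # toy sizes (assumed)
fps = 30.0                       # assumed frame rate, for illustration only

# Primary matrix: one row per ROI (observation), one column per frame (variable)
X = rng.random((n_rois, n_frames)).astype(np.float32)

# Per-ROI annotations: one row for each row of X
obs = pd.DataFrame({'is_cell': [True, True, False, True]},
                   index=[str(i) for i in range(n_rois)])

# Per-frame annotations: one row for each column of X
var = pd.DataFrame({'time_s': np.arange(n_frames) / fps},
                   index=[str(j) for j in range(n_frames)])

# The annotations must line up with the two axes of X
assert X.shape == (len(obs), len(var))

# Row i of X is the full trace of ROI i; column j is every ROI at frame j
roi_trace = X[2, :]
frame_snapshot = X[:, 3]
print(roi_trace.shape, frame_snapshot.shape)
```

Keeping ROIs on the rows means that every per-ROI annotation is a column of `obs`, and every per-frame annotation is a column of `var`.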


In [1]:
import imagingplus as ip


imported imagingplus successfully
	version: 0.2-beta



In [2]:
expobj: ip.Experiment = ip.import_obj(pkl_path='/mnt/qnap_share/Data/imagingplus-example/RL109_analysis.pkl')
print(f'Trials in expobj: {expobj.trialIDs}')
trialobj = expobj.load_trial(trialID=expobj.trialIDs[2])


|- Loaded imagingplus.Experiment object (expID: RL109)

Trials in expobj: ['t-005', 't-006', 't-013']

|- Loaded TwoPhotonImagingTrial.alloptical experimental trial object ... 



In [3]:
trialobj.data  # this is the anndata object for this trial

Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 640 × 16368

## storage of Flu data

The raw Flu data is stored in `.X`.

In [4]:
print(trialobj.data.X)

print('shape: ', trialobj.data.X.shape)

[[352.13678  411.9472   280.92416  ... 401.3014   515.2566   541.41565 ]
 [192.22421  395.29306  330.7496   ... 257.25806  285.31506  126.660484]
 [336.64996  539.26746  219.30368  ... 423.15295  433.1515   220.52742 ]
 ...
 [308.56497  303.55536  413.3554   ... 482.61044  386.2576   283.1643  ]
 [133.96815  122.96908   84.63106  ... 109.2256   187.91866  159.50813 ]
 [252.49574  240.2455   273.2785   ... 181.601    229.0061   278.74188 ]]
shape:  (640, 16368)


Processed data is added to `trialobj.data` as a layer, each stored under a unique key in `.layers`.

In [5]:
trialobj.data.layers

Layers with keys: 

In [6]:
# Let's add dFF processing of the raw calcium signals as a new layer:

from imagingplus.processing.imaging import normalize_dff

dff_arr = normalize_dff(arr=trialobj.data.X, normalize_pct=50)

trialobj.data.add_layer(layer_name='dFF', data=dff_arr)
print(trialobj.data.layers)

Cell 16: contains nan
      Mean of the sub-threshold for this cell: nan
Cell 410: contains nan
      Mean of the sub-threshold for this cell: nan
Add new dFF layer. 
	Layers in object: Layers with keys: dFF
Layers with keys: dFF
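
For intuition about what a percentile-based dFF normalization does, here is a minimal NumPy sketch. It is an assumption about the general approach, not the implementation of `normalize_dff`. The "Mean of the sub-threshold" messages in the output suggest a baseline computed from low-valued samples, so the hypothetical `dff_percentile` helper takes, for each ROI, the mean of all samples at or below the given percentile as the baseline F0 and returns (F - F0) / F0 in percent.

```python
import numpy as np

def dff_percentile(arr, pct=50):
    """Hypothetical helper: percent dF/F with a per-ROI sub-percentile baseline."""
    arr = np.asarray(arr, dtype=float)
    # Per-ROI threshold at the requested percentile (NaNs ignored)
    thresh = np.nanpercentile(arr, pct, axis=1, keepdims=True)
    # Keep only samples at or below the threshold, mask the rest as NaN
    sub = np.where(arr <= thresh, arr, np.nan)
    # Baseline F0: mean of the sub-threshold samples for each ROI
    f0 = np.nanmean(sub, axis=1, keepdims=True)
    return (arr - f0) / f0 * 100.0

# Toy data: 2 ROIs x 4 frames, each with a flat baseline and one transient
toy = np.array([[100., 100., 100., 200.],
                [ 50.,  50., 100.,  50.]])
toy_dff = dff_percentile(toy, pct=50)
print(toy_dff)
```

In this toy case the baseline samples map to 0 % and each transient, which doubles the baseline, maps to 100 %.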


In [7]:
print(trialobj.data.layers['dFF'])

print('shape: ', trialobj.data.layers['dFF'].shape)


[[  3.0254273   20.524294   -17.809402   ...  17.409626    50.74975
   58.40316   ]
 [-25.098614    54.028458    28.878689   ...   0.24223907  11.17483
  -50.645935  ]
 [ 17.842701    88.76798    -23.2338     ...  48.122658    51.6226
  -22.805437  ]
 ...
 [ 27.989037    25.911108    71.45485    ... 100.18099     60.215004
   17.453144  ]
 [ 14.804955     5.379218   -27.474817   ...  -6.3983517   61.038216
   36.69162   ]
 [ 42.591587    35.67352     54.32821    ...   2.5552905   29.326313
   57.41353   ]]
shape:  (640, 16368)
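
The warnings printed when the dFF layer was added flag ROIs whose traces contain NaN. A quick way to locate such rows in any ROIs x frames array is sketched below on a toy array. With real data the same calls could be applied to `trialobj.data.layers['dFF']`, assuming the layer behaves as a NumPy array.

```python
import numpy as np

toy_layer = np.array([[1.0, 2.0, 3.0],
                      [np.nan, 5.0, 6.0],
                      [7.0, 8.0, 9.0],
                      [10.0, np.nan, np.nan]])

# True for every ROI (row) that has at least one NaN sample
has_nan = np.isnan(toy_layer).any(axis=1)
nan_rois = np.flatnonzero(has_nan)
print('ROIs containing NaN:', nan_rois)

# Row-wise statistics that skip NaN samples instead of propagating them
safe_mean = np.nanmean(toy_layer, axis=1)
print('NaN-aware mean per ROI:', safe_mean)
```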


The remaining parts of the AnnData object are sized to match the dimensions of the original Flu data input.

## observations (Suite2p ROIs metadata and associated processing info)

For instance, the metadata for each Suite2p ROI, stored in Suite2p’s stat.npy output, is added to `trialobj.data` under `obs` and `obsm` (1-D and >1-D observation annotations, respectively).

In [8]:
trialobj.data.obs

Unnamed: 0,ypix,xpix,lam,footprint,mrs,...,radius,aspect_ratio,npix_norm,skew,std
0,"[102, 102, 102, 102, 102, 103, 103, 103, 103, ...","[457, 458, 459, 460, 461, 456, 457, 458, 459, ...","[0.0063846777, 0.008958542, 0.011363007, 0.011...",1.0,0.909815,...,3.565604,1.051397,0.649175,3.016955,353.675049
1,"[46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 4...","[116, 117, 118, 119, 120, 121, 114, 115, 116, ...","[0.009095913, 0.014569374, 0.01832514, 0.01890...",1.0,0.912076,...,3.538468,1.074428,0.622126,3.784652,422.922577
2,"[18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 1...","[202, 203, 204, 205, 200, 201, 202, 203, 204, ...","[0.00545189, 0.006088022, 0.0062021483, 0.0052...",1.0,1.088559,...,4.124215,1.027475,0.919665,3.603348,342.368134
3,"[43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 4...","[352, 352, 352, 352, 353, 352, 353, 354, 351, ...","[0.0036495698, 0.0043396214, 0.0031816224, 0.0...",1.0,1.561322,...,8.133019,1.348522,1.325399,3.187822,357.666168
4,"[156, 156, 156, 156, 156, 157, 157, 157, 157, ...","[382, 383, 384, 385, 386, 380, 381, 382, 383, ...","[0.013304887, 0.02187323, 0.023734575, 0.01969...",1.0,0.869808,...,3.62042,1.139261,0.554504,2.59998,263.609039
...,...,...,...,...,...,...,...,...,...,...,...
1241,"[290, 291, 291, 291, 291, 291, 291, 291, 291, ...","[299, 298, 299, 300, 301, 302, 305, 306, 307, ...","[0.0029507184, 0.0032565512, 0.005071437, 0.00...",2.0,2.336259,...,11.46564,1.251471,2.447931,2.540658,94.641136
1242,"[354, 354, 355, 355, 355, 355, 355, 356, 356, ...","[309, 310, 308, 309, 310, 311, 312, 307, 308, ...","[0.00252066, 0.0021455055, 0.007094776, 0.0078...",2.0,2.386548,...,10.831544,1.129317,2.934812,2.229956,79.882561
1246,"[15, 15, 16, 16, 16, 17, 17, 17, 17, 17, 17, 1...","[488, 489, 486, 487, 489, 486, 487, 488, 489, ...","[0.010669279, 0.007242187, 0.013514522, 0.0124...",2.0,2.238235,...,13.789857,1.460301,1.636462,4.38241,55.825489
1250,"[472, 472, 472, 473, 473, 473, 473, 473, 473, ...","[55, 56, 67, 55, 56, 57, 63, 64, 65, 66, 67, 6...","[0.0023643558, 0.0034383552, 0.0021977199, 0.0...",2.0,2.079421,...,10.794925,1.331867,2.583175,1.233372,64.879417
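
Because `obs` has one row per ROI, a boolean mask computed from its columns can be used to select the matching rows of the data matrix. The sketch below uses a toy table with the `radius` and `skew` columns seen above; the values and thresholds are made up for illustration.

```python
import numpy as np
import pandas as pd

toy_obs = pd.DataFrame({'radius': [3.6, 3.5, 8.1, 11.5],
                        'skew':   [3.0, 3.8, 3.2, 2.5]},
                       index=['0', '1', '3', '1241'])
toy_X = np.arange(12, dtype=float).reshape(4, 3)   # 4 ROIs x 3 frames

# Boolean mask over ROIs built from per-ROI metadata (thresholds are arbitrary)
keep = ((toy_obs['radius'] < 5) & (toy_obs['skew'] > 2.8)).to_numpy()

selected_ids = toy_obs.index[keep]   # ROI labels that pass the filter
selected_X = toy_X[keep, :]          # corresponding traces
print(list(selected_ids), selected_X.shape)
```

AnnData objects generally accept the same kind of boolean mask on their first axis, so a comparable selection should be possible on the object itself. This has not been verified for `trialobj.data` specifically.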


In [9]:
trialobj.data.obsm

AxisArrays with keys: ypix, xpix

The `.obsm` attribute includes the `ypix` and `xpix` outputs for each Suite2p ROI, which give the pixel locations of the ROI mask.

In [10]:
print('ypix:', trialobj.data.obsm['ypix'][:5], '\n\nxpix: \t', trialobj.data.obsm['xpix'][:5])

ypix: [array([102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 104,
       104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105,
       105, 105, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107,
       107, 107, 107, 107, 108, 108, 108, 108, 108])
 array([46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
       48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 49, 49, 50, 50, 50,
       50, 50, 50, 50, 51, 51, 51, 51, 51, 52, 52, 52])
 array([18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20,
       20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
       22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24,
       24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26])
 array([43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 48, 49, 49, 49, 49,
       49, 49, 50, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 52,
       52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 5
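
Pixel lists of this kind can be turned into a binary mask image with NumPy fancy indexing. The sketch below uses a short toy ROI and assumes a 512 x 512 field of view. The real frame size is not stated in this tutorial, so adjust `frame_shape` as needed.

```python
import numpy as np

frame_shape = (512, 512)   # assumed field of view in pixels (height, width)

# Toy pixel coordinates for one ROI, in the same format as ypix / xpix
ypix = np.array([102, 102, 102, 103, 103, 103])
xpix = np.array([457, 458, 459, 457, 458, 459])

# Binary mask: True at every pixel that belongs to the ROI
mask = np.zeros(frame_shape, dtype=bool)
mask[ypix, xpix] = True

# Simple centroid estimate (unweighted mean of the pixel coordinates)
centroid_yx = (ypix.mean(), xpix.mean())
print(mask.sum(), centroid_yx)
```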

## variables (temporal synchronization of paq channels and imaging)

The temporal synchronization data of the experiment, collected in the .paq output, is added to the variable annotations under `var`. These variables are aligned to the imaging frame clock. The total number of variables equals the number of imaging frames in the original Flu data input.

In [11]:
trialobj.data.var

Unnamed: 0,frame_clock,x_galvo_uncaging,slm2packio,markpoints2packio,packio2slm,packio2markpoints,pycontrol_rsync,voltage
139577,4.972792,-1.165180,3.329534,0.007827,0.000264,0.000592,0.017035,-0.116807
140252,4.974765,-1.164851,3.325917,0.007827,-0.000394,-0.000394,0.019666,-0.196060
140925,4.971806,-1.165180,3.337755,0.008156,-0.000065,0.000592,0.021310,-0.210858
141595,4.970819,-1.164851,3.331507,0.007827,0.000592,0.000264,0.021310,-0.204939
142267,4.975094,-1.166166,3.340715,0.006841,0.000264,0.001250,0.021639,-0.234206
...,...,...,...,...,...,...,...,...
11136216,4.974436,-1.165837,3.337426,0.006841,-0.000065,0.002565,0.021639,2.423554
11136886,4.975751,-1.164851,3.334138,0.012102,-0.000065,-0.000065,0.017035,2.473211
11137559,4.960953,-1.164851,3.337097,0.008485,0.000264,0.000921,0.019994,2.473211
11138232,4.971477,-1.167153,3.319340,0.006183,-0.000065,-0.000065,0.005525,2.466963
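
The index of `var` looks like the sample position of each frame-clock pulse in the .paq recording, with each column holding the channel value at that sample. Under that assumption, the alignment can be sketched as below. The signals and pulse positions are toy values, and this is not the imagingplus implementation.

```python
import numpy as np
import pandas as pd

n_samples = 1000
# Toy continuous recordings, one value per acquisition sample
channels = {
    'voltage': np.linspace(-0.2, 2.5, n_samples),
    'packio2slm': np.zeros(n_samples),
}

# Assumed sample indices at which each imaging frame started
frame_samples = np.array([100, 300, 500, 700, 900])

# One row per imaging frame: every channel read out at the frame's sample index
toy_var = pd.DataFrame({name: sig[frame_samples] for name, sig in channels.items()},
                       index=frame_samples)
print(toy_var)
```

Reading every channel at the same frame-locked samples gives each imaging frame one value per channel, which is what lets `var` line up with the columns of `.X`.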


### Creating or Modifying AnnData arrays of trialobj

There are a number of helper functions to create anndata arrays or modify existing anndata arrays.

In [12]:
# creating new anndata object. This is identical to the base AnnData library.
# the example below is from the Getting Started Tutorial for AnnData:

# any given anndata object is created from constituent data arrays.


# 1) Primary data matrix
import numpy as np
import pandas as pd

n_rois, n_frames = 10, 10000
X = np.random.random((n_rois, n_frames))  # create random data matrix

df = pd.DataFrame(X, columns=range(n_frames), index=np.arange(n_rois, dtype=int).astype(str))
df  # show the dataframe

Unnamed: 0,0,1,2,3,4,...,9995,9996,9997,9998,9999
0,0.08847,0.594978,0.267262,0.521285,0.42739,...,0.628226,0.49195,0.023748,0.910001,0.342909
1,0.421657,0.02494,0.345641,0.285778,0.339881,...,0.282903,0.818589,0.758343,0.068396,0.809684
2,0.642708,0.101187,0.787579,0.822067,0.329221,...,0.674419,0.082625,0.676742,0.711652,0.515747
3,0.741156,0.563763,0.390991,0.809422,0.62827,...,0.746982,0.588162,0.203452,0.662033,0.523288
4,0.266626,0.48464,0.430566,0.882055,0.785261,...,0.655115,0.442506,0.116492,0.861459,0.589859
5,0.695614,0.571977,0.633992,0.7064,0.355071,...,0.156109,0.22279,0.958219,0.484075,0.236766
6,0.313605,0.101705,0.08071,0.854698,0.220697,...,0.482442,0.171771,0.278977,0.321641,0.124504
7,0.700918,0.319251,0.173709,0.844428,0.99237,...,0.493461,0.930643,0.548558,0.948738,0.416265
8,0.670791,0.416993,0.405371,0.213854,0.712764,...,0.250209,0.956986,0.325717,0.696112,0.219828
9,0.064023,0.667027,0.198786,0.437727,0.811632,...,0.449985,0.016948,0.336893,0.156778,0.746549


In [13]:
#2) Observations matrix

obs_meta = pd.DataFrame({
    'cell_type': np.random.choice(['exc', 'int'], n_rois),
},
    index=np.arange(n_rois, dtype=int).astype(str),    # these are the same IDs of observations as above!
)
obs_meta


Unnamed: 0,cell_type
0,exc
1,int
2,int
3,int
4,exc
5,exc
6,int
7,exc
8,exc
9,int


In [14]:
#3) Variables matrix


var_meta = pd.DataFrame({
    'exp_group': np.random.choice(['A','B', 'C'], n_frames),
},
    index=np.arange(n_frames, dtype=int).astype(str),    # these are the IDs of the variables (frames), matching the columns of X above
)
var_meta


Unnamed: 0,exp_group
0,C
1,B
2,B
3,A
4,A
...,...
9995,A
9996,B
9997,B
9998,B


In [15]:
#4) Creating a new anndata attribute for the trialobj

import imagingplus.processing.anndata as ad  # from the processing module, import anndata submodule

trialobj.new_anndata = ad.AnnotatedData(X=df, obs=obs_meta, var=var_meta)

print(trialobj.new_anndata)

Created AnnData object: 
	Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 10 × 10000
Annotated Data of n_obs × n_vars = 10 × 10000 
available attributes: 
	.X (primary datamatrix)
	.obs (obs metadata): 
		|- 'cell_type'
	.var (vars metadata): 
		|- 'exp_group'
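
When the pieces are assembled by hand, mismatched shapes or index labels are a common source of errors. A small pre-flight check can catch these before the object is constructed. It is written here as a hypothetical `check_alignment` helper that is not part of imagingplus or anndata. Labels are compared as strings because the example above uses integer column labels for the data frame and string labels for `var`.

```python
import numpy as np
import pandas as pd

def check_alignment(X, obs, var):
    """Hypothetical helper: verify that obs and var match the axes of X."""
    if X.shape != (len(obs), len(var)):
        raise ValueError(f'X is {X.shape}, but obs/var imply {(len(obs), len(var))}')
    if list(map(str, X.index)) != list(map(str, obs.index)):
        raise ValueError('row labels of X and the obs index differ')
    if list(map(str, X.columns)) != list(map(str, var.index)):
        raise ValueError('column labels of X and the var index differ')
    return True

n_r, n_f = 3, 5   # toy sizes
toy_df = pd.DataFrame(np.zeros((n_r, n_f)),
                      index=[str(i) for i in range(n_r)], columns=range(n_f))
toy_obs = pd.DataFrame({'cell_type': ['exc', 'int', 'exc']}, index=toy_df.index)
toy_var = pd.DataFrame({'exp_group': list('ABCAB')},
                       index=[str(j) for j in range(n_f)])

ok = check_alignment(toy_df, toy_obs, toy_var)

# A var table with a missing frame is rejected
try:
    check_alignment(toy_df, toy_obs, toy_var.iloc[:-1])
    rejected = False
except ValueError:
    rejected = True
print(ok, rejected)
```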


In [16]:
# adding an 'obs' to an existing anndata object

# randint's upper bound is exclusive, so 513 allows values up to 512
new_obs = pd.DataFrame({
    'cell_loc_x': np.random.randint(0, 513, n_rois),
    'cell_loc_y': np.random.randint(0, 513, n_rois),
},
    index=np.arange(n_rois, dtype=int).astype(str),    # these are the same IDs of observations as above!
)

cell_loc_x = np.random.randint(0, 513, n_rois)
cell_loc_y = np.random.randint(0, 513, n_rois)


trialobj.new_anndata.add_obs(obs_name='cell_loc_x', values=cell_loc_x)
trialobj.new_anndata.add_obs(obs_name='cell_loc_y', values=cell_loc_y)

print(trialobj.new_anndata)



Annotated Data of n_obs × n_vars = 10 × 10000 
available attributes: 
	.X (primary datamatrix)
	.obs (obs metadata): 
		|- 'cell_type', 'cell_loc_x', 'cell_loc_y'
	.var (vars metadata): 
		|- 'exp_group'


In [17]:
# deleting an 'obs' from an existing anndata object
# uses the pop method

trialobj.new_anndata.del_obs('cell_type')
print(trialobj.new_anndata)

Annotated Data of n_obs × n_vars = 10 × 10000 
available attributes: 
	.X (primary datamatrix)
	.obs (obs metadata): 
		|- 'cell_loc_x', 'cell_loc_y'
	.var (vars metadata): 
		|- 'exp_group'


*Note: adding a 'var' to, or deleting one from, an existing anndata object works in exactly the same way as shown above for 'obs', using the `.add_var()` and `.del_var()` methods of the anndata object.*
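
Conceptually, these helpers add or remove one column of the per-ROI or per-frame annotation table. In plain pandas terms, and without claiming this is how `.add_var()` and `.del_var()` are implemented, the equivalent operations on a `var`-style DataFrame look like this. The `stim_on` column is hypothetical.

```python
import numpy as np
import pandas as pd

n_frames = 8
toy_var = pd.DataFrame({'exp_group': list('ABCABCAB')},
                       index=[str(j) for j in range(n_frames)])

# Add a per-frame annotation: the values must have one entry per frame
stim_on = np.zeros(n_frames, dtype=bool)
stim_on[3:5] = True
toy_var['stim_on'] = stim_on

# Remove an annotation; pop returns the removed column
removed = toy_var.pop('exp_group')
print(list(toy_var.columns), removed.name)
```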