This script takes an `xarray` Dataset that was generated from the analysis of calcium data and writes it to disk in a form that `R` scripts can read and post-process. Those scripts test whether activity differs significantly between two or more groups while accounting for the ID of the mouse that generated each measurement. In other words, they run an ANOVA with a nested design. To our knowledge, this analysis is not readily available in Python, which is why it is done in R.

The R scripts, as well as this one, are currently tailored to Amit's FMR-WT data, but that should be easy to change.

In [80]:
import pathlib
import itertools
import pickle

import pandas as pd
import numpy as np
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt

from calcium_bflow_analysis.single_fov_analysis import filter_da
from calcium_bflow_analysis.dff_analysis_and_plotting import dff_analysis
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [81]:
foldername = pathlib.Path('/data/Amit_QNAP/Calcium_FXS/')
fname_glob = '*.nc'
full_fnames = list(foldername.glob(fname_glob))
full_fnames

[PosixPath('/data/Amit_QNAP/Calcium_FXS/data_of_day_1.nc')]

In [82]:
data = xr.open_dataset(full_fnames[0])

We now build a dictionary that divides the data into the different groupings: genotype, then epoch, then mouse ID. We pickle it to avoid rerunning everything every time. Beyond that, the dictionary is a helper variable that lets us construct specific dataframes containing only the relevant data.
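
To make the layout concrete, here is a minimal sketch of the same nesting with made-up values. The mouse IDs, the numbers, and the `select` helper are hypothetical and only illustrate how one epoch and one measure can be pulled out of such a dictionary; they are not part of the analysis package.

```python
# Toy stand-in for the nested dictionary: genotype -> epoch -> mouse ID -> measures.
# All IDs and numbers below are invented for illustration.
toy = {
    "FXS": {"spont": {}, "stim": {}, "all": {}},
    "WT": {"spont": {}, "stim": {}, "all": {}},
}
toy["FXS"]["spont"]["m1"] = {"mean_dff": [0.12, 0.08, 0.20]}
toy["WT"]["spont"]["m2"] = {"mean_dff": [0.05, 0.11]}


def select(d, epoch, measure):
    """Collect (genotype, mouse_id, value) rows for one epoch and one measure."""
    rows = []
    for geno, by_epoch in d.items():
        for mouse_id, measures in by_epoch[epoch].items():
            for value in measures.get(measure, []):
                rows.append((geno, mouse_id, value))
    return rows


rows = select(toy, "spont", "mean_dff")
print(rows)
```

Each row keeps the genotype and the mouse ID next to the value, which is the information a nested design needs.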

In [83]:
fxs_wt = {'FXS': {'spont': {}, 'stim': {}, 'all': {}}, 'WT': {'spont': {}, 'stim': {}, 'all': {}}}
epochs = ('spont', 'stim', 'all')

for mouse_id, ds in data.groupby('mouse_id'):
    for epoch in epochs:
        try:
            dff = filter_da(ds, epoch)
            if dff.shape[0] < 10:
                print(f"dF/F shape of mouse {mouse_id} in epoch {epoch} contained too few rows.")
                continue
            condition = str(ds.condition[0].values)
            mean_dff = dff_analysis.calc_mean_dff(dff)
            mean_spike_rate = dff_analysis.calc_mean_spike_num(dff, fps=ds.attrs['fps'], thresh=0.70)      
            mean_dff_no_bg = dff_analysis.calc_mean_dff_no_background(dff)
            mean_spike_rate_no_bg = dff_analysis.calc_mean_spike_num_no_background(dff, fps=ds.attrs['fps'], thresh=0.70)
            fxs_wt[condition][epoch][mouse_id] = {
                'mean_dff': mean_dff, 
                'mean_spike_rate': mean_spike_rate, 
                'mean_dff_no_bg': mean_dff_no_bg, 
                'mean_spike_rate_no_bg': mean_spike_rate_no_bg
            }
        except AssertionError:  # some mice don't have all epochs
            continue

dF/F shape of mouse 293 in epoch spont contained too few rows.
dF/F shape of mouse 293 in epoch stim contained too few rows.
dF/F shape of mouse 293 in epoch all contained too few rows.
dF/F shape of mouse 595 in epoch stim contained too few rows.
dF/F shape of mouse 596 in epoch stim contained too few rows.
dF/F shape of mouse 648 in epoch spont contained too few rows.
dF/F shape of mouse 648 in epoch stim contained too few rows.
dF/F shape of mouse 648 in epoch all contained too few rows.
dF/F shape of mouse 650 in epoch spont contained too few rows.
dF/F shape of mouse 650 in epoch stim contained too few rows.
dF/F shape of mouse 650 in epoch all contained too few rows.


In [74]:
print(f"FXS mice: {fxs_wt['FXS']['all'].keys()}")
print(f"WT mice: {fxs_wt['WT']['all'].keys()}")

FXS mice: dict_keys(['517', '518', '609', '614', '647'])
WT mice: dict_keys(['595', '596', '615', '640', '674'])


In [84]:
fname = full_fnames[0]  # the single .nc file found above
with open(fname.with_suffix('.p'), 'wb') as f:
    pickle.dump(fxs_wt, f)

In [85]:
df_list = []
for geno, genodata in fxs_wt.items():
    for epoch, epochdata in genodata.items():
        for mid, midata in epochdata.items():
            for measure, measurement in midata.items():
                df_list.append(pd.DataFrame({
                    'Epoch': epoch,
                    'Genotype': geno,
                    'MouseID': mid,
                    'Measure': measure,
                    'Value': measurement,
                }))
                
df = pd.concat(df_list, ignore_index=True)
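
The cell above relies on `pd.DataFrame` repeating scalar labels along an array-like column, so each entry of a measure becomes one row in a long-format table. The sketch below shows that behaviour on invented numbers; it assumes each measure is a 1D array (presumably one value per cell), which the source does not state explicitly.

```python
import numpy as np
import pandas as pd

# Invented values standing in for one measure of one mouse.
values_m1 = np.array([0.1, 0.2, 0.3])
values_m2 = np.array([0.4, 0.5])

# Scalars ('spont', 'WT', ...) are repeated to match the length of 'Value'.
block_m1 = pd.DataFrame(
    {"Epoch": "spont", "Genotype": "WT", "MouseID": "m1",
     "Measure": "mean_dff", "Value": values_m1}
)
block_m2 = pd.DataFrame(
    {"Epoch": "spont", "Genotype": "FXS", "MouseID": "m2",
     "Measure": "mean_dff", "Value": values_m2}
)

# ignore_index=True gives the stacked table one continuous 0..n-1 index.
toy_df = pd.concat([block_m1, block_m2], ignore_index=True)
print(toy_df)
```

Note that if every column were a scalar, `pd.DataFrame` would raise a `ValueError` unless an index were passed, so this pattern only works when at least one column is array-like.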

In [86]:
measures = ('mean_dff', 'mean_spike_rate', 'mean_dff_no_bg', 'mean_spike_rate_no_bg')

for epoch in epochs:
    for measure in measures:
        cur_data = df.query(f'Epoch == "{epoch}" and Measure == "{measure}"')
        # One CSV per epoch/measure pair, keeping only genotype, mouse ID and value
        csv_name = fname.with_name(f'epoch_{epoch}_measure_{measure}.csv')
        cur_data.loc[:, ['Genotype', 'MouseID', 'Value']].to_csv(csv_name, index=False)
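
One detail worth checking before handing the files over: mouse IDs are stored as strings here, but they are written to CSV as bare digits, so a reader that infers types will likely load them as integers. The sketch below demonstrates this with pandas on an in-memory buffer and made-up values; the R scripts may need an equivalent step (for example, treating `MouseID` as a factor), which is an assumption about code not shown here.

```python
import io

import pandas as pd

# Made-up rows in the same three-column layout as the exported CSV files.
toy = pd.DataFrame(
    {"Genotype": ["FXS", "FXS", "WT"],
     "MouseID": ["517", "517", "595"],
     "Value": [0.1, 0.2, 0.3]}
)

buf = io.StringIO()
toy.to_csv(buf, index=False)

# Default type inference turns the digit-only IDs into integers.
buf.seek(0)
inferred = pd.read_csv(buf)

# Forcing a string dtype keeps the IDs as categorical labels.
buf.seek(0)
as_text = pd.read_csv(buf, dtype={"MouseID": str})

# Number of values contributed by each mouse within each genotype.
per_mouse = as_text.groupby(["Genotype", "MouseID"])["Value"].count()
print(inferred.dtypes)
print(per_mouse)
```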