# Can voltage clamp be used to predict current clamp parameters?
Corinne Teeter

### Objective 
Determine whether parameters found by fitting current clamp data can be predicted from parameters found by fitting voltage clamp data (using fits to the average first pulse trace). If so, do these relationships differ between cre lines or layers?

### Methods
a) Fit the average of the first pulse voltage clamp and current clamp data. <br>
b) Rate the quality of the fit by eye. <br>
c) Remove heteroscedasticity. <br>
d) Break into categories (i.e. inhibitory/excitatory, cre line, layer) and <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;test for statistical significance of linear regression. <br>
e) Test whether the intercepts and slopes of categories with significant regressions are statistically different. <br>

### Subquestions addressed in this analysis
What is the best way to get rid of heteroscedasticity? <br>
How good does the fit need to be to assess statistical differences? <br>

### Caveats
In many of the fits, the rise does not appear to be fit well by a single exponential. This throws off the location of the amplitude.

### Discussion/remaining questions
Think about why NRMSE, amplitude and rise time would be heteroscedastic. <br>
What do the different predictive relationships between different categories mean?

### Potential future directions
Use double exponentials in the fit to clean up the analysis.
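A minimal sketch of how a double exponential rise might be compared against a single exponential rise, using `scipy.optimize.curve_fit` on synthetic, noise-free data. The PSP shapes `single_exp_psp` and `double_exp_psp` are hypothetical models chosen for illustration; they are not the model used in the fitting scripts, and the time constants are arbitrary.

```python
import numpy as np
from scipy.optimize import curve_fit

def single_exp_psp(t, amp, rise_tau, decay_tau):
    """Hypothetical PSP shape: one exponential rise times one exponential decay."""
    return amp * (1 - np.exp(-t / rise_tau)) * np.exp(-t / decay_tau)

def double_exp_psp(t, amp1, rise_tau1, amp2, rise_tau2, decay_tau):
    """Hypothetical PSP shape whose rise is the sum of two exponential components."""
    rise = amp1 * (1 - np.exp(-t / rise_tau1)) + amp2 * (1 - np.exp(-t / rise_tau2))
    return rise * np.exp(-t / decay_tau)

# synthetic trace with a fast and a slow rise component (arbitrary units, t in ms)
t = np.linspace(0, 50, 500)
trace = double_exp_psp(t, 1.0, 1.0, 0.5, 5.0, 20.0)

# keep all parameters positive so the time constants stay physical
p_single, _ = curve_fit(single_exp_psp, t, trace, p0=[1.5, 2.0, 20.0],
                        bounds=(1e-3, np.inf), max_nfev=10000)
p_double, _ = curve_fit(double_exp_psp, t, trace, p0=[0.8, 1.5, 0.4, 4.0, 15.0],
                        bounds=(1e-3, np.inf), max_nfev=10000)

# sum of squared errors of each fit
sse_single = np.sum((single_exp_psp(t, *p_single) - trace) ** 2)
sse_double = np.sum((double_exp_psp(t, *p_double) - trace) ** 2)
print('single exponential SSE:', sse_single)
print('double exponential SSE:', sse_double)
```

The extra parameters also make the fit less constrained, so on real traces the improvement in error should be weighed against the added parameters.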


# Analysis

In [None]:
# Note that the .csv files loaded here have been processed through "extract_first_pulse_fit_data_from_DB.py"
# and "catagorize_goodness_of_fit_by_eye.py".
# A fantastic explanation of how to interpret the input and output of linear regression
# on different categories in statsmodels (or R) is at:
# https://www.andrew.cmu.edu/user/achoulde/94842/lectures/lecture10/lecture10-94842.html
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
font={'size':22}
matplotlib.rc('font', **font)
import statsmodels
import statsmodels.stats.diagnostic  # provides het_white (White's test)
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [None]:
# load csv files
i_df=pd.read_csv('ML_connected_iclamp_2018_12_18.csv')
v_df=pd.read_csv('ML_connected_vclamp_2018_12_12.csv')
# format uid as a string with 3 decimals so the keys match exactly when merging
i_df['uid']=i_df['uid'].map(lambda uid: "%.3f" % float(uid))
v_df['uid']=v_df['uid'].map(lambda uid: "%.3f" % float(uid))

In [None]:
# get rid of wacky Unnamed columns if they exist
v_df=v_df[v_df.columns.drop(list(v_df.filter(regex='Unnamed')))]
i_df=i_df[i_df.columns.drop(list(i_df.filter(regex='Unnamed')))]
i_df.keys()

In [None]:
# Merge voltage and current clamp data frames on the shared connection metadata.
# suffixes must be an ordered tuple (not a set): '_i' marks current clamp columns, '_v' voltage clamp columns.
merged_df = pd.merge(i_df, v_df, on=['uid', 'pre_cell_id', 'post_cell_id', 
                                     'distance', 'acsf', 'post_cre', 'pre_cre',
                                     'boolean_connection', 'pre_layer', 
                                     'post_layer'], how='inner', suffixes=('_i', '_v'))
merged_df['uid']=merged_df['uid'].astype(str)

# If the merged data frame is as long as the smaller of the two input data frames,
# every connection in the smaller data frame found a match on all of the merge keys
# (assuming the keys are unique), i.e. the values being merged on agree between the two databases.
print(len(i_df))
print(len(v_df))
print(len(merged_df))
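Comparing lengths is an indirect check. A more direct sketch uses the `indicator=True` option of `pd.merge` with `how='outer'`: the added `_merge` column labels every row as `both`, `left_only` or `right_only`, so unmatched connections can be listed. The toy data frames `i_toy` and `v_toy` below are hypothetical stand-ins for `i_df` and `v_df`.

```python
import pandas as pd

# toy stand-ins for i_df and v_df (hypothetical values)
i_toy = pd.DataFrame({'uid': ['1.000', '2.000', '3.000'], 'amp': [0.1, 0.2, 0.3]})
v_toy = pd.DataFrame({'uid': ['1.000', '2.000'], 'amp': [-5.0, -7.0]})

check = pd.merge(i_toy, v_toy, on=['uid'], how='outer',
                 suffixes=('_i', '_v'), indicator=True)
# number of 'both', 'left_only' and 'right_only' rows
print(check['_merge'].value_counts())
# keys present in the current clamp data but missing from the voltage clamp data
print(check.loc[check['_merge'] == 'left_only', 'uid'].tolist())
```

On the real data, the same call with the key columns used above would show which connections the inner merge drops.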

# Data Sets: excellent versus well fit data
The quality of the fit was assessed by eye (see the images at the end of the document for examples). Using the 'well fit' data in addition to the 'excellent fit' data includes more connections; however, it is unclear whether this benefits the analysis, as it may also add noise. Throughout, results are shown for both the 'excellent fit' and the 'well fit' data sets. The decay tau values are not yet well fit, and many are hitting the fit boundaries. Here, the decay tau values hitting the boundary in voltage clamp are removed (`decay_tau_v < 0.49`); however, this does not resolve the issue, as decay tau is in general not well fit.

In [None]:
# 'excellent fit' data: excellent fits and well-defined data in both current and voltage clamp
excellent_df=merged_df[(merged_df['good_fit_i']=='excellent') & (merged_df['good_fit_v']=='excellent') & 
                       (merged_df['data_clarity_v']=='well') & (merged_df['data_clarity_i']=='well')].copy()
# 'well fit' data: excellent or good fits with well or ok data clarity in both clamp modes
good_df=merged_df[merged_df['good_fit_i'].isin(['excellent', 'good']) & 
                  merged_df['good_fit_v'].isin(['excellent', 'good']) & 
                  merged_df['data_clarity_v'].isin(['well', 'ok']) & 
                  merged_df['data_clarity_i'].isin(['well', 'ok'])].copy()


# Heteroscedasticity
Heteroscedasticity occurs when the variance of the regression errors changes systematically with the variables; here it can be seen by eye. I explore whether two data transforms (square root and log) remove it. White's Lagrange multiplier test for heteroscedasticity provides a p-value to assess whether the heteroscedasticity is statistically significant. Note that both transforms are applied to the absolute value, so negative numbers (for example, amplitudes) are made positive before transforming.
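A minimal sketch of White's test on synthetic data, assuming multiplicative noise (errors proportional to the size of the value), which is one situation where a log transform removes heteroscedasticity. `het_white` returns `(lm, lm_pvalue, fvalue, f_pvalue)`; the LM p-value and the F-test p-value are both shown. The helper `white_p` and the data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 300)
# multiplicative noise: the spread of y grows with x, so the raw data are heteroscedastic
y = 2.0 * x * np.exp(rng.normal(0, 0.3, x.size))
demo = pd.DataFrame({'x': x, 'y': y, 'log_x': np.log(x), 'log_y': np.log(y)})

def white_p(formula, data):
    """Fit OLS and return White's test (LM p-value, F-test p-value) for its residuals."""
    res = smf.ols(formula, data=data).fit()
    lm, lm_p, f, f_p = het_white(res.resid, res.model.exog)
    return lm_p, f_p

print('raw:', white_p('y ~ x', demo))
print('log:', white_p('log_y ~ log_x', demo))
```

On the log scale the noise in this example is additive with constant variance, so White's test should usually not reject there.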
### Code

In [None]:
def pos_sqrt(n):
    """Square root of the absolute value of n (NaN stays NaN)."""
    return np.sqrt(np.abs(n))

def pos_log(n):
    """Natural log of the absolute value of n (NaN stays NaN; 0 gives -inf)."""
    return np.log(np.abs(n))

def plot_hetero(df):
    """For each voltage clamp / current clamp parameter pair, plot the untransformed (left),
    square root (middle) and log (right) transformed data, each labeled with the p-value of
    White's test for heteroscedasticity. Returns a copy of df with the sqrt_ and log_ columns added."""
    df=df.copy()
    variables=(('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i'),
               ('latency_v', 'latency_i'),
               ('decay_tau_v', 'decay_tau_i'))
    transforms=(('', None), ('sqrt_', pos_sqrt), ('log_', pos_log))

    for var in variables:
        # add the transformed columns to the full DataFrame so they are returned
        for prefix, func in transforms:
            if func is not None:
                for col in var:
                    df[prefix+col]=df[col].apply(func)
        # remove decay tau values hitting the fit boundary in voltage clamp
        if var==('decay_tau_v', 'decay_tau_i'):
            plot_df=df[df['decay_tau_v']<.49]
        else:
            plot_df=df

        f=plt.figure(figsize=(20,6))
        for ii, (prefix, _) in enumerate(transforms):
            x, y=prefix+var[0], prefix+var[1]
            f.add_subplot(1, 3, ii+1)
            regression=smf.ols(formula='%s ~ %s' % (y, x), data=plot_df).fit()
            # het_white returns (lm, lm_pvalue, fvalue, f_pvalue); the F-test p-value is shown
            _, _, _, p_value=statsmodels.stats.diagnostic.het_white(regression.resid, regression.model.exog)
            sns.regplot(x=x, y=y, data=plot_df, fit_reg=True, color='k', label='p=%f' % p_value)
            plt.xlim([plot_df[x].min(), plot_df[x].max()])
            plt.ylim([plot_df[y].min(), plot_df[y].max()])
            plt.legend()
        plt.show()

    return df

## Heteroscedasticity in 'excellent fit' data set
### Results
Amplitude, rise time, and NRMSE require a log transform to remove heteroscedasticity (the White's test p-value is least significant after the log transform). Latency and decay tau do not require a transform. The following analysis, which tests the significance of the regressions within each category and the differences between categories, incorporates the necessary transforms.
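Because NRMSE, amplitude and rise time are regressed on the log scale, the fitted coefficients have a multiplicative interpretation. Writing $x$ for a voltage clamp parameter and $y$ for the matching current clamp parameter (absolute values, since `pos_log` takes the natural log of $|x|$), the fitted line is

$$\ln|y| = \beta_0 + \beta_1 \ln|x| \quad\Longleftrightarrow\quad |y| = e^{\beta_0}\,|x|^{\beta_1},$$

so a slope of $\beta_1 = 1$ means the two parameters are proportional, and, for categories that share a slope, a difference in intercept corresponds to a different scale factor $e^{\beta_0}$.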

In [None]:
excellent_df=plot_hetero(excellent_df)

## Heteroscedasticity in 'well fit' (larger) data set
### Results
As for the 'excellent fit' data set, NRMSE, rise time, and amplitude require log transforms to remove heteroscedasticity, and decay tau does not require a transform. The one difference is that latency shows a statistically significant heteroscedasticity p-value with no transform, a non-significant one with a square root transform, and a significant one again with a log transform. I am going to ignore this deviation from the 'excellent fit' data because the data do not look blatantly heteroscedastic, the p-value only just passes the 95% significance level, and a single value may have triggered the significant p-value.

In [None]:
good_df=plot_hetero(good_df)

# Code to test whether the data can be fit with a linear regression

In [None]:
def regression_significance(df_list, catagory_type, data_type='', 
                            variables=(('NRMSE_v', 'NRMSE_i'),
                                       ('amp_v', 'amp_i'),
                                       ('rise_time_v', 'rise_time_i'),
                                       ('latency_v', 'latency_i'),
                                       ('decay_tau_v', 'decay_tau_i'))):
    """Test whether the variables can be fit with regression lines, for all data and
    for each category separately. Each DataFrame in df_list is plotted in its own panel.
    The results are later used to choose the categories in the category DataFrames used
    for testing individual statistical significance.

    catagory_type: 'excitation', 'cre' or 'layer'
    data_type: '' (untransformed), 'sqrt_' or 'log_' column prefix
    """
    df_num=len(df_list)
    # shrink the font when several panels share one figure
    matplotlib.rc('font', size=22/df_num)

    if catagory_type=='excitation':
        col_name=('syn_excitation_v', 'syn_excitation_v')
        values=(('ex', 'ex', 'b'),
                ('in', 'in', 'r'))
    elif catagory_type=='cre':
        col_name=('pre_cre', 'post_cre')
        values=(('pvalb', 'pvalb', 'b'),
                ('rorb', 'rorb', 'r'),
                ('sim1', 'sim1', 'g'),
                ('tlx3','tlx3', 'm'),
                ('unknown', 'unknown', 'c'))
    elif catagory_type=='layer':
        col_name=('pre_layer', 'post_layer')
        values=(('2', '2','b'),
               ('2/3', '2/3', 'r'),
               ('3', '3', 'g'),
               ('4', '4', 'm'),
               ('5', '5', 'c'),
               ('6', '6', 'y'))
    else:
        raise ValueError("catagory_type must be 'excitation', 'cre' or 'layer'")

    fs=20.
    def plotting(var, df_list):
        x, y=data_type+var[0], data_type+var[1]
        formula='%s ~ %s' % (y, x)
        for ii, df in enumerate(df_list):
            plt.subplot(1, df_num, ii+1)
            # regression on all data in this DataFrame
            res=smf.ols(formula=formula, data=df).fit()
            sns.regplot(x=x, y=y, data=df, fit_reg=True, color='k',
                        label='all, n=%i, p_slope=%f' % (len(df), res.pvalues.iloc[1]))
            # separate regression for each category
            for value in values:
                plot_df=df[(df[col_name[0]]==value[0]) & (df[col_name[1]]==value[1])]
                if len(plot_df) < 3:
                    # too few points to estimate a slope and its p-value
                    continue
                res=smf.ols(formula=formula, data=plot_df).fit()
                sns.regplot(x=x, y=y, data=plot_df, fit_reg=True, color=value[2],
                            label='%s to %s, n=%i, slope=%f, p_slope=%f' %
                            (value[0], value[1], len(plot_df), res.params.iloc[1], res.pvalues.iloc[1]))
            plt.xlim([df[x].min(), df[x].max()])
            plt.ylim([df[y].min(), df[y].max()])
            plt.legend()

    for var in variables:
        plt.figure(figsize=(fs, fs/df_num))
        if var==('decay_tau_v', 'decay_tau_i'):
            # remove decay tau values hitting the fit boundary in voltage clamp
            plotting(var, [df[df['decay_tau_v']<.49] for df in df_list])
        else:
            plotting(var, df_list)
        plt.show()
        
        


# Code to test whether the linear regressions of different categories are statistically different
## To do: split this into 'excellent fit' (left) and 'well fit' (right) panels and add the significance to the plots
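`cat_sig_diff` compares three nested ordinary least squares models. With $x$ the (possibly transformed) voltage clamp parameter, $y$ the current clamp parameter, and $c$ indexing the category (the first category is the reference, so $\gamma_1 = \delta_1 = 0$):

$$
\begin{aligned}
\text{same line:} \quad & y = \beta_0 + \beta_1 x + \varepsilon \\
\text{different intercepts:} \quad & y = \beta_0 + \gamma_c + \beta_1 x + \varepsilon \\
\text{different intercepts and slopes:} \quad & y = \beta_0 + \gamma_c + (\beta_1 + \delta_c)\, x + \varepsilon
\end{aligned}
$$

`sm.stats.anova_lm` compares each reduced model with the next, larger model using an F-test,

$$F = \frac{(\mathrm{SSR}_{\text{reduced}} - \mathrm{SSR}_{\text{full}}) \,/\, (\mathrm{df}_{\text{reduced}} - \mathrm{df}_{\text{full}})}{\mathrm{SSR}_{\text{full}} \,/\, \mathrm{df}_{\text{full}}},$$

where $\mathrm{df}$ are residual degrees of freedom. A significant first comparison indicates different intercepts; a significant second comparison indicates different slopes.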

In [None]:
def cat_sig_diff(df, catagory_type, data_type='', variables=(('NRMSE_v', 'NRMSE_i'),
                                       ('amp_v', 'amp_i'),
                                       ('rise_time_v', 'rise_time_i'),
                                       ('latency_v', 'latency_i'),
                                       ('decay_tau_v', 'decay_tau_i'))):
    """
    Test whether the regression lines of different categories have significantly
    different intercepts and slopes, using nested-model F-tests (anova_lm).

    df: pandas DataFrame
        for 'cre' and 'layer' this should be processed by create_catagory_df
    catagory_type: string
        specifies which category to apply. Options are 'cre', 'layer', 'excitation'.
    data_type: string
        specifies which data transform you are using.
        options: ''       non transformed data
                 'sqrt_'  sqrt of data (negative values made positive before transform)
                 'log_'   log of data (negative values made positive before transform)
    Note: make sure the supplied df corresponds to the supplied catagory_type 
    """
    if catagory_type=='excitation':
        col_name='syn_excitation_v'
        print('TESTING EXCITATORY AND INHIBITORY')
    elif catagory_type=='cre':
        col_name='cre_catagory'
        print('TESTING CRE LINE CATEGORIES')
    elif catagory_type=='layer':
        col_name='layer_catagory'
        print('TESTING LAYER CATEGORIES')
    else:
        raise ValueError('category does not exist')

    for var in variables:
        x, y=data_type+var[0], data_type+var[1]
        # nested models must be fit to the same rows for anova_lm to be valid
        model_df=df.dropna(subset=[col_name, x, y])
        # all data lumped together: one intercept and one slope
        int_same_model=smf.ols(formula='%s ~ %s' % (y, x), data=model_df).fit()
        # each category is allowed an independent intercept (shared slope)
        int_diff_model=smf.ols(formula='%s ~ %s + %s' % (y, col_name, x), data=model_df).fit()
        # each category is allowed independent intercepts and slopes
        model=smf.ols(formula='%s ~ %s * %s' % (y, col_name, x), data=model_df).fit()

        print(var)
        # Note that the simpler (nested) model must go first in sm.stats.anova_lm!
        print('check if intercepts are different')
        print(sm.stats.anova_lm(int_same_model, int_diff_model))
        print('check if slopes are different')
        print(sm.stats.anova_lm(int_diff_model, model))
        print("\n")

        # plot each category with its own regression line
        plt.figure(figsize=(10,10))
        for value in model_df[col_name].unique():
            plot_df=model_df[model_df[col_name]==value]
            sns.regplot(x=x, y=y, data=plot_df, fit_reg=True,
                        label='%s, n=%i' % (value, len(plot_df)))
        plt.xlim([df[x].min(), df[x].max()])
        plt.ylim([df[y].min(), df[y].max()])
        plt.legend()
        plt.show()
        

        


# Code to reduce DataFrames to the categories to be tested

In [None]:
def create_catagory_df(df, catagory_type):
    '''To test whether the different categories are the same, we must make a new categorical
    variable. Data we don't want are removed from the data set (statsmodels might skip
    empty cells, but this has not been verified) and the remaining rows get a category
    value of the form "<pre>_to_<post>" in the column "<catagory_type>_catagory".'''
    
    # commented-out categories are excluded from the comparison
    if catagory_type=='cre':
        col_name=('pre_cre', 'post_cre')
        values=(('pvalb', 'pvalb'),
#                ('rorb', 'rorb'),
                ('sim1', 'sim1'),
#                ('tlx3','tlx3'),
                ('unknown', 'unknown'))
    elif catagory_type=='layer':
        col_name=('pre_layer', 'post_layer')
        values=(('2', '2'),
               ('2/3', '2/3'),
#               ('3', '3'),
               ('4', '4'),
               ('5', '5'))
#               ('6', '6'))
    else:
        raise ValueError('category does not exist')

    key=catagory_type+'_catagory'
    # for each category, take the subset of connections and label it in a new column
    cat_dfs=[]
    for value in values:
        cat_df=df[(df[col_name[0]]==value[0]) & (df[col_name[1]]==value[1])].copy()
        cat_df[key]=value[0]+'_to_'+value[1]
        cat_dfs.append(cat_df)
    # concatenate all categories into one DataFrame
    return pd.concat(cat_dfs, axis=0, ignore_index=True)

# Excitation

### Results:


## Significance of regression for excitation catagories
#### Results
All parameters except decay tau have statistically significant regression lines for the inhibitory and excitatory categories, for both the 'excellent fit' data (shown on the left) and the 'well fit' data (shown on the right).
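The plots above report the slope and its p-value in the legend. A minimal sketch of the same per-category test collected into a table, using `groupby` and named coefficient lookup (`res.params[x]`) instead of positional indexing. The function `per_category_slopes`, its `min_n` cutoff and the synthetic data are hypothetical; the column names only mirror the notebook's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def per_category_slopes(df, cat_col, x, y, min_n=3):
    """Fit y ~ x separately for each category and tabulate n, slope and slope p-value.
    Categories with fewer than min_n complete rows are skipped."""
    rows = []
    for value, group in df.dropna(subset=[x, y]).groupby(cat_col):
        if len(group) < min_n:
            continue
        res = smf.ols('%s ~ %s' % (y, x), data=group).fit()
        rows.append({cat_col: value, 'n': len(group),
                     'slope': res.params[x], 'p_slope': res.pvalues[x]})
    return pd.DataFrame(rows)

# synthetic example: two categories with different slopes plus one tiny category
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 60)
cat = np.repeat(['ex', 'in'], 30)
y = np.where(cat == 'ex', 2 * x, -1 * x) + rng.normal(0, 0.1, 60)
demo = pd.DataFrame({'syn_excitation_v': cat, 'log_amp_v': x, 'log_amp_i': y})
tiny = pd.DataFrame({'syn_excitation_v': ['?', '?'],
                     'log_amp_v': [0.1, 0.2], 'log_amp_i': [0.0, 0.1]})
demo = pd.concat([demo, tiny], ignore_index=True)

print(per_category_slopes(demo, 'syn_excitation_v', 'log_amp_v', 'log_amp_i'))
```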

In [None]:
regression_significance([excellent_df, good_df], 'excitation', data_type='log_', variables=(('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
regression_significance([excellent_df, good_df], 'excitation', data_type='', variables=(
               ('latency_v', 'latency_i'),
               ('decay_tau_v', 'decay_tau_i')))  

## Are excitatory and inhibitory linear regressions significantly different?

### Results 
NRMSE is not significantly different. <br>
Amplitude is significantly different. <br>
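Rise time is significantly different. <br>
The latency intercept is significantly different, but the slope is not. In other words, the excitatory and inhibitory regression lines are roughly parallel but offset: for a given voltage clamp latency, the predicted current clamp latency differs by an approximately constant amount between the two groups. What this means physiologically is not yet clear.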

### 'Excellent fit' data

In [None]:
cat_sig_diff(excellent_df, 'excitation', data_type='log_', variables=(('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
cat_sig_diff(excellent_df, 'excitation', data_type='', variables=(
               ('latency_v', 'latency_i'),))
# decay tau is not tested because its regression is not significant
#               ('decay_tau_v', 'decay_tau_i')))


### 'Well fit' data

In [None]:
cat_sig_diff(good_df, 'excitation', data_type='log_', variables=(('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
cat_sig_diff(good_df, 'excitation', data_type='', variables=(
               ('latency_v', 'latency_i'),))
# decay tau is not tested because its regression is not significant
#               ('decay_tau_v', 'decay_tau_i')))

# Cre lines

## Cre line groups available in the data
### Results
pvalb to pvalb, sim1 to sim1, and unknown to unknown have enough points in both the 'excellent fit' and 'well fit' data sets.

rorb to rorb and tlx3 to tlx3 are on the cusp, with 6 data points each in the 'excellent fit' data set; only a couple more are added in the 'well fit' data set. It is unclear whether these will be enough to reach significance.
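A minimal sketch for listing the pre/post groups that meet a minimum sample size, built on the same `groupby(...).size()` call used below. The threshold `min_n` is a hypothetical choice (the analysis does not fix one), and the toy counts are made up.

```python
import pandas as pd

def groups_with_min_n(df, cols, min_n):
    """Return the counts of the (pre, post) category pairs with at least min_n connections."""
    counts = df.groupby(list(cols)).size()
    return counts[counts >= min_n]

# toy connection table (hypothetical counts)
toy = pd.DataFrame({'pre_cre':  ['pvalb'] * 12 + ['rorb'] * 6 + ['sim1'] * 2,
                    'post_cre': ['pvalb'] * 12 + ['rorb'] * 6 + ['sim1'] * 2})
print(groups_with_min_n(toy, ('pre_cre', 'post_cre'), min_n=6))
```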

In [None]:
# look at what is in data sets
print('excellent fits')
print(excellent_df.groupby(['pre_cre', 'post_cre']).size())

print('\ngood fits')
print(good_df.groupby(['pre_cre', 'post_cre']).size())

## Significance of regression for cre-line categories
#### Results
NRMSE <br>
pvalb to pvalb, sim1 to sim1, and unknown to unknown have a statistically significant slope in both the 'excellent fit' and 'well fit' data. rorb to rorb is not statistically significant in either case, and tlx3 to tlx3 is significant in the 'excellent fit' data but not in the 'well fit' data.

In [None]:
#plot all the data
regression_significance([excellent_df, good_df], 'cre', data_type='log_', variables=(
               ('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
regression_significance([excellent_df, good_df], 'cre', data_type='', variables=(
               ('latency_v', 'latency_i'),
               ('decay_tau_v', 'decay_tau_i'))) 

## Create reduced category DataFrame for cre lines based on regression significance tests

In [None]:
cre_cat_excellent_df=create_catagory_df(excellent_df, 'cre')
print(cre_cat_excellent_df.groupby('cre_catagory').size())
cre_cat_good_df=create_catagory_df(good_df, 'cre')
print(cre_cat_good_df.groupby('cre_catagory').size())

## Are cre-lines with statistically significant regression lines significantly different from one another?

### 'Excellent fit' data

In [None]:
cat_sig_diff(cre_cat_excellent_df, 'cre', data_type='log_', variables=(('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
cat_sig_diff(cre_cat_excellent_df, 'cre', data_type='', variables=(
               ('latency_v', 'latency_i'),))

### 'Well fit' data

In [None]:
cat_sig_diff(cre_cat_good_df, 'cre', data_type='log_', variables=(('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
cat_sig_diff(cre_cat_good_df, 'cre', data_type='', variables=(
               ('latency_v', 'latency_i'),))

# Layer categories

## Layer groups available in the data
Adding the 'well fit' data may add significance to the smaller groups; for example, the number of layer 6 to layer 6 connections doubles.

In [None]:
# look at what is in data sets
print('excellent fits')
print(excellent_df.groupby(['pre_layer', 'post_layer']).size())

print('\ngood fits')
print(good_df.groupby(['pre_layer', 'post_layer']).size())

## Regression significance for different layer categories
### Results
'Excellent fit' data shown on the left and 'well fit' data on the right.  

In [None]:
#plot all the data
regression_significance([excellent_df, good_df], 'layer', data_type='log_', variables=(
               ('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
regression_significance([excellent_df, good_df], 'layer', data_type='', variables=(
               ('latency_v', 'latency_i'),
               ('decay_tau_v', 'decay_tau_i'))) 

## Create reduced category DataFrame for layers based on regression significance tests

In [None]:
layer_cat_excellent_df=create_catagory_df(excellent_df, 'layer')
print(layer_cat_excellent_df.groupby('layer_catagory').size())
layer_cat_good_df=create_catagory_df(good_df, 'layer')
print(layer_cat_good_df.groupby('layer_catagory').size())

## Are layers with statistically significant regression lines significantly different from one another?

### 'Excellent fit' data

In [None]:
cat_sig_diff(layer_cat_excellent_df, 'layer', data_type='log_', variables=(('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
cat_sig_diff(layer_cat_excellent_df, 'layer', data_type='', variables=(
               ('latency_v', 'latency_i'),))

### 'Well fit' data

In [None]:
cat_sig_diff(layer_cat_good_df, 'layer', data_type='log_', variables=(('NRMSE_v', 'NRMSE_i'),
               ('amp_v', 'amp_i'),
               ('rise_time_v', 'rise_time_i')))
cat_sig_diff(layer_cat_good_df, 'layer', data_type='', variables=(
               ('latency_v', 'latency_i'),))

# Step-by-step example of the category tests, using NRMSE in cre lines (all conditions are tested in the sections above)

In [None]:
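# categorical stats of the regressions of different cre lines; note that they match the plots
# use the 'excellent fit' category DataFrame (swap in cre_cat_good_df for the 'well fit' data)
cre_cat_df=cre_cat_excellent_df
model=smf.ols(formula='NRMSE_i ~ cre_catagory * NRMSE_v', data=cre_cat_df).fit()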
model.summary()

In [None]:
# check to see if the intercept is significantly different
int_same_model=smf.ols(formula='NRMSE_i ~ NRMSE_v', data=cre_cat_df).fit() 
int_diff_model=smf.ols(formula='NRMSE_i ~ cre_catagory + NRMSE_v', data=cre_cat_df).fit()
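# anova_lm expects the simpler (nested) model first; with the order reversed, the
# difference in degrees of freedom is negative and the F-test p-value is NaN
sm.stats.anova_lm(int_same_model, int_diff_model)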

In [None]:
# note: this does not exactly match the regression plot of all the data because only a subset of the data (the selected cre categories) is used here
int_same_model.summary()

In [None]:
# check to see if slopes are significantly different
sm.stats.anova_lm(int_diff_model, model)

# Show images of the voltage and current clamp best fits used in the analysis above

In [None]:
import matplotlib.image as mpimg
# show the current clamp (left) and voltage clamp (right) fit images for each 'excellent fit' connection
for _, row in excellent_df[['image_path_i', 'image_path_v']].iterrows():
    print(row.image_path_i)
    print(row.image_path_v)
    f=plt.figure(figsize=(20,10))
    for ii, path in enumerate((row.image_path_i, row.image_path_v)):
        f.add_subplot(1, 2, ii+1)
        plt.imshow(mpimg.imread(path))
        plt.axis('off')
    plt.show()
