# Description

In high-throughput mouse phenotyping, the thresholds found are usually aggregated on two levels: 
1. the thresholds for all stimuli of one individual are aggregated into a hearing curve,
2. hearing curves are aggregated to display mutant vs. control threshold means or medians.
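The two aggregation levels can be illustrated with a minimal pandas sketch on made-up toy data. It assumes a long-format table with one threshold per mouse and stimulus, with column names borrowed from the data set built further below. The median in level 2 is just one of the two options mentioned above.

```python
import pandas as pd

# Toy long-format data (made-up values): one threshold per mouse and stimulus
df = pd.DataFrame({
    'mouse_id':    ['m1', 'm1', 'm2', 'm2', 'c1', 'c1', 'c2', 'c2'],
    'cohort_type': ['mut'] * 4 + ['con'] * 4,
    'stimulation': ['click', '6'] * 4,
    'th_manual':   [50, 60, 55, 65, 30, 40, 35, 45],
})

# Level 1: one hearing curve per mouse (rows = mice, columns = stimuli)
hearing_curves = df.pivot_table(index=['cohort_type', 'mouse_id'],
                                columns='stimulation', values='th_manual')

# Level 2: aggregate the hearing curves per group (median per stimulus)
group_curves = hearing_curves.groupby(level='cohort_type').median()
print(group_curves)
```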

For all mice in both data sets ([GMC](https://www.mouseclinic.de/) and [ING](https://journals.plos.org/plosbiology)), thresholds were determined from the raw data with the NN and SLR methods.

In a first step, all thresholds, both those determined manually and those determined automatically, are combined into a single data set.<br/>
Hearing curves are then generated for all mice in the data set to compare the differences between the hearing curves of mutants and controls using the three methods (manual, NN, SLR).

To detect mutant mouse lines that exhibit potentially biologically meaningful changes in hearing, two statistical metrics are used:
1. effect size, which, descriptively speaking, measures the degree of overlap between the mutant and control group distributions of a stimulus-specific threshold. As a normal distribution cannot be assumed, **Cliff’s Delta** is used, which ranges from -1 to 1.
2. significance, using p-values from a **Wilcoxon rank sum test**, defined as the probability of obtaining a test statistic at least as extreme as the observed one, assuming that the mutant and control distributions are the same. 
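Both metrics can be sketched in plain Python. This is an illustrative implementation, not the code used for the analysis, and the function names are hypothetical. `cliffs_delta` counts, over all mutant/control pairs, how often the mutant threshold is larger minus how often it is smaller, divided by the number of pairs. `rank_sum_test` uses average ranks and the normal approximation without tie correction, which is the approach taken by `scipy.stats.ranksums`.

```python
import math

def cliffs_delta(mut, con):
    """Cliff's delta = P(mut > con) - P(mut < con) over all pairs; ranges from -1 to 1."""
    greater = sum(1 for m in mut for c in con if m > c)
    less = sum(1 for m in mut for c in con if m < c)
    return (greater - less) / (len(mut) * len(con))

def _average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(mut, con):
    """Two-sided Wilcoxon rank sum test via the normal approximation (no tie correction)."""
    n1, n2 = len(mut), len(con)
    ranks = _average_ranks(list(mut) + list(con))
    w = sum(ranks[:n1])                       # rank sum of the mutant group
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p
```

A positive delta means that mutant thresholds tend to be higher than control thresholds, i.e. hearing tends to be worse in the mutants.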

These two metrics are displayed using so-called **volcano plots**.
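A volcano plot places each mouse line (or line/stimulus pair) at x = effect size and y = -log10(p), so that lines with both a large effect and a small p-value end up in the upper corners. The sketch below assumes the metrics have already been computed (p must be greater than 0). The cut-offs `|delta| >= 0.5` and `p < 0.05` and the function name `volcano_plot` are hypothetical choices for illustration and are not taken from this notebook.

```python
import math
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

def volcano_plot(results, p_cutoff=0.05, delta_cutoff=0.5, ax=None):
    """results: dict mapping a line/gene name to (cliffs_delta, p_value).
    Returns the axes and the names that pass both (illustrative) cut-offs."""
    if ax is None:
        _, ax = plt.subplots()
    hits = []
    for name, (delta, p) in results.items():
        y = -math.log10(p)
        is_hit = p < p_cutoff and abs(delta) >= delta_cutoff
        ax.scatter(delta, y, color="tab:red" if is_hit else "grey")
        if is_hit:
            ax.annotate(name, (delta, y))
            hits.append(name)
    # Dashed guide lines for the cut-offs
    ax.axhline(-math.log10(p_cutoff), linestyle="dashed", color="black", lw=1)
    for x in (-delta_cutoff, delta_cutoff):
        ax.axvline(x, linestyle="dashed", color="black", lw=1)
    ax.set_xlim(-1, 1)
    ax.set_xlabel("Cliff's delta")
    ax.set_ylabel("-log10(p)")
    return ax, hits
```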

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline 

In [None]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# Load libraries

In [None]:
import os
import re
import warnings
import time

import pandas as pd
import numpy as np
import seaborn as sns

import matplotlib.pyplot as plt

from matplotlib.patches import Patch
from matplotlib.lines import Line2D

from matplotlib.backends.backend_pdf import PdfPages

os.environ["CUDA_VISIBLE_DEVICES"]=""

warnings.filterwarnings("ignore")

# Definitions

In [None]:
"""Set the path to the data files, for example '../data'"""
path2data = ''
"""Set the path to result data"""
path2results = ''

# Prepare the data set

### Load ING data

Load the ABR curves and the mouse phenotyping data provided by Ingham et al. as well as the NN-predicted and the SLR-estimated thresholds.<br/>
The files can be found under the path specified by _path2data_:

* _ING/ING_abr_curves.csv_, 
* _ING/ING_mouse_data.csv_,

or by _path2results_:

* NN GMC-ING predictions: _ING_data_GMCtrained_NN_predictions.csv_,
* NN ING-ING predictions: _ING_data_INGtrained_NN_predictions.csv_,
* SLR GMC-ING estimations: _ING_data_GMCcalibrated_SLR_estimations.csv_,
* SLR ING-ING estimations: _ING_data_INGcalibrated_SLR_estimations.csv_.


In [None]:
"""Load the ABR curves""" 
ING_data = pd.read_csv(os.path.join(path2data, 'ING', 'ING_abr_curves.csv'))

"""Load the threshold predictions made by neural networks trained with GMC-data (GMCtrained_NN)"""
ING_data_predictions1 = pd.read_csv(os.path.join(path2results, 'ING_data_GMCtrained_NN_predictions.csv'))

"""Load the threshold predictions made by neural networks trained with Ingham et al.-data (INGtrained_NN)"""
ING_data_predictions2 = pd.read_csv(os.path.join(path2results, 'ING_data_INGtrained_NN_predictions.csv'))

"""Load the threshold estimations made by the SLR method calibrated on GMC training data (GMCcalibrated_SLR)"""
ING_data_estimations1 = pd.read_csv(os.path.join(path2results, 'ING_data_GMCcalibrated_SLR_estimations.csv'))

"""Load the threshold estimations made by the SLR method calibrated on Ingham et al. training data (INGcalibrated_SLR)"""
ING_data_estimations2 = pd.read_csv(os.path.join(path2results, 'ING_data_INGcalibrated_SLR_estimations.csv'))

In [None]:
"""Load the mouse phenotyping data"""
ING_mouse_data = pd.read_csv(os.path.join(path2data, 'ING', 'ING_mouse_data.csv'))

### Load GMC data

Load the ABR curves and the mouse phenotyping data from the German Mouse Clinic as well as the NN-predicted and the SLR-estimated thresholds.<br/>
The files can be found under the path specified by _path2data_:

* _GMC/GMC_abr_curves.csv_,
* _GMC/GMC_mouse_data.csv_,

or by _path2results_:

* NN GMC-GMC predictions: _GMC_data_GMCtrained_NN_predictions.csv_,
* NN ING-GMC predictions: _GMC_data_INGtrained_NN_predictions.csv_,
* SLR GMC-GMC estimations: _GMC_data_GMCcalibrated_SLR_estimations.csv_,
* SLR ING-GMC estimations: _GMC_data_INGcalibrated_SLR_estimations.csv_.

In [None]:
"""Load the ABR curves"""
GMC_data = pd.read_csv(os.path.join(path2data, 'GMC', 'GMC_abr_curves.csv'))

"""Load the threshold predictions made by neural networks trained with GMC data (GMCtrained_NN)"""
GMC_data_predictions1 = pd.read_csv(os.path.join(path2results, 'GMC_data_GMCtrained_NN_predictions.csv'))

"""Load the threshold predictions made by neural networks trained with Ingham et al. data (INGtrained_NN)"""
GMC_data_predictions2 = pd.read_csv(os.path.join(path2results, 'GMC_data_INGtrained_NN_predictions.csv'))

"""Load the threshold estimations made by the SLR method calibrated on GMC training data (GMCcalibrated_SLR)"""
GMC_data_estimations1 = pd.read_csv(os.path.join(path2results, 'GMC_data_GMCcalibrated_SLR_estimations.csv'))

"""Load the threshold estimations made by the SLR method calibrated on Ingham et al. training data (INGcalibrated_SLR)"""
GMC_data_estimations2 = pd.read_csv(os.path.join(path2results, 'GMC_data_INGcalibrated_SLR_estimations.csv'))

In [None]:
"""Load the mouse phenotyping data"""
GMC_mouse_data = pd.read_csv(os.path.join(path2data, 'GMC', 'GMC_mouse_data.csv'))

"""Exclude mice from cohorts that are not finished yet"""
mouse_ids2exclude = np.load(os.path.join(path2data, 'GMC', 'GMC_mice_with_missing_ref_cohorts.npy'))
GMC_mouse_data = GMC_mouse_data[~GMC_mouse_data.mouse_id.isin(mouse_ids2exclude)]

In [None]:
"""Check that the reference cohort of every mutant is among the control cohorts; the last line lists the mutants for which this is not the case"""
cons = GMC_mouse_data[GMC_mouse_data.cohort_type == 'control']
muts = GMC_mouse_data[GMC_mouse_data.cohort_type == 'mutant']

print('Number of controls: %i' % cons.mouse_id.nunique())
print('Number of mutants: %i' % muts.mouse_id.nunique())
print('Number of rows: %i' % GMC_mouse_data.index.nunique())

muts[~muts.reference_cohort.isin(cons.cohort_id.unique())]

### Process data

The resulting data set should contain the following columns:<br/> 
* **mouse_id**, 
* **sex**, 
* **cohort_type** (mut|con), 
* **cohort_id** (GMC only),
* **reference_cohort** (GMC only),
* **gene**, 
* **exp_date** (yyyy-mm-dd), 
* **source** (GMC|ING), 
* **stimulation** (click|6|12|18|24|30), 
* **th_manual**, 
* **th_NN_GMCtrained**, 
* **th_NN_INGtrained**, 
* **th_SLR_GMCcalibrated**, 
* **th_SLR_INGcalibrated**
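The column list above can be turned into a quick sanity check. This is a hedged sketch with a hypothetical helper `check_schema`; the expected columns and allowed values are copied from the list above, and values are compared as strings so that `6` and `'6'` (e.g. after a CSV round trip) are treated alike.

```python
import pandas as pd

EXPECTED_COLUMNS = ['mouse_id', 'sex', 'cohort_type', 'gene', 'exp_date', 'source', 'stimulation',
                    'th_manual', 'th_NN_GMCtrained', 'th_NN_INGtrained',
                    'th_SLR_GMCcalibrated', 'th_SLR_INGcalibrated']
ALLOWED_VALUES = {'cohort_type': {'mut', 'con'},
                  'source': {'GMC', 'ING'},
                  'stimulation': {'click', '6', '12', '18', '24', '30'}}

def check_schema(df):
    """Return a list of problems; an empty list means the data set looks as expected."""
    problems = ['missing column: %s' % c for c in EXPECTED_COLUMNS if c not in df.columns]
    for col, allowed in ALLOWED_VALUES.items():
        if col in df.columns:
            # Ignore missing values, compare everything else as strings
            unexpected = set(df[col].dropna().astype(str)) - allowed
            if unexpected:
                problems.append('unexpected values in %s: %s' % (col, sorted(unexpected)))
    return problems
```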

In [None]:
mouse_columns = ['mouse_id', 'sex', 'cohort_type', 'cohort_id', 'reference_cohort', 'gene', 'exp_date']
exp_columns = ['stimulation', 'th_manual', 'th_NN_GMCtrained', 'th_NN_INGtrained', 'th_SLR_GMCcalibrated', 'th_SLR_INGcalibrated']

#### GMC data

In [None]:
"""Ensure that phenotyping data is available for all mice for which ABR curves have been measured"""
GMC_data = GMC_data[GMC_data.mouse_id.isin(GMC_mouse_data.mouse_id.unique())]
display(GMC_data.head(2))

In [None]:
"""Adding mouse phenotyping data for all mice with measured ABR curves"""
print(' Number of rows: %d / mice: %d' % (GMC_data.index.nunique(), GMC_data.mouse_id.nunique()))
GMC_merged = pd.merge(left=GMC_data[['mouse_id', 'frequency', 'threshold']].drop_duplicates(), 
                      right=GMC_mouse_data[['mouse_id', 'mouse_sex', 'cohort_type', 'cohort_id', 'reference_cohort', 'gene_symbol', 'exp_date']].drop_duplicates(), 
                      how='left', on='mouse_id')
print(' Number of rows after merging: %d / mice: %d' % (GMC_merged.index.nunique(), GMC_merged.mouse_id.nunique()))

GMC_merged.rename(columns={'mouse_sex': 'sex', 'gene_symbol': 'gene'}, inplace=True)
GMC_merged['source'] = 'GMC'
GMC_merged = GMC_merged[mouse_columns + ['source', 'frequency', 'threshold']]

display(GMC_merged.head(2))

In [None]:
"""Adding GMC trained NNs threshold predictions to data set"""
print(' Number of rows: %d' % GMC_merged.index.nunique())
GMC_merged = pd.merge(left=GMC_merged, 
                      right=GMC_data_predictions1[['mouse_id', 'frequency', 'threshold', 'nn_predicted_thr']], 
                      how='left', on=['mouse_id', 'frequency', 'threshold'])
print(' Number of rows after merging: %d' % GMC_merged.index.nunique())

GMC_merged.rename(columns={'nn_predicted_thr': 'th_NN_GMCtrained'}, inplace=True)
display(GMC_merged.head(2))

In [None]:
"""Adding ING trained NNs threshold predictions to data set"""
print(' Number of rows: %d' % GMC_merged.index.nunique())
GMC_merged = pd.merge(left=GMC_merged, 
                      right=GMC_data_predictions2[['mouse_id', 'frequency', 'threshold', 'nn_predicted_thr']], 
                      how='left', on=['mouse_id', 'frequency', 'threshold'])
print(' Number of rows after merging: %d' % GMC_merged.index.nunique())
GMC_merged.rename(columns={'nn_predicted_thr': 'th_NN_INGtrained'}, inplace=True)
display(GMC_merged.head(2))

In [None]:
"""Adding GMC calibrated SLR estimations to data set"""
print(' Number of rows: %d' % GMC_merged.index.nunique())
GMC_merged = pd.merge(left=GMC_merged, 
                      right=GMC_data_estimations1[['mouse_id', 'frequency', 'threshold', 'slr_estimated_thr']], 
                      how='left', on=['mouse_id', 'frequency', 'threshold'])
print(' Number of rows after merging: %d' % GMC_merged.index.nunique())
GMC_merged.rename(columns={'slr_estimated_thr': 'th_SLR_GMCcalibrated'}, inplace=True)
display(GMC_merged.head(2))

In [None]:
"""Adding ING calibrated SLR estimations to data set"""
print(' Number of rows: %d' % GMC_merged.index.nunique())
GMC_merged = pd.merge(left=GMC_merged, 
                      right=GMC_data_estimations2[['mouse_id', 'frequency', 'threshold', 'slr_estimated_thr']], 
                      how='left', on=['mouse_id', 'frequency', 'threshold'])
print(' Number of rows after merging: %d' % GMC_merged.index.nunique())
GMC_merged.rename(columns={'slr_estimated_thr': 'th_SLR_INGcalibrated'}, inplace=True)
display(GMC_merged.head(2))
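The four cells above repeat the same left merge. A reusable helper could look like the following sketch (the name `add_threshold_column` is hypothetical). It also guards against the row duplication that the printed row counts are meant to reveal: `validate='many_to_one'` makes pandas raise a `MergeError` if the prediction table contains a key combination more than once, and with unique right-hand keys a left merge keeps the number of rows unchanged.

```python
import pandas as pd

def add_threshold_column(base, other, value_col, new_name,
                         keys=('mouse_id', 'frequency', 'threshold')):
    """Left-merge one threshold column of `other` onto `base` and rename it."""
    keys = list(keys)
    merged = pd.merge(base, other[keys + [value_col]], how='left', on=keys,
                      validate='many_to_one')  # raises MergeError on duplicate keys in `other`
    return merged.rename(columns={value_col: new_name})
```

For example, `add_threshold_column(GMC_merged, GMC_data_predictions1, 'nn_predicted_thr', 'th_NN_GMCtrained')` would correspond to the first of the four cells.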

#### ING data

In [None]:
"""Add a new column for the date of the experiment"""
def to_exp_date(date):
    # Missing values, cells containing the text 'Test Date' and purely numeric entries are treated as unknown
    if pd.isna(date) or date == 'Test Date' or str(date).isdigit():
        return np.nan
    # Drop the time part of the timestamp
    return str(date).replace(' 00:00:00', '')

ING_mouse_data['exp_date'] = ING_mouse_data['Test Date'].apply(to_exp_date)

In [None]:
"""Adding mouse phenotyping data for all mice with measured ABR curves"""
print(' Number of rows: %d / mice: %d' % (ING_data.index.nunique(), ING_data.mouse_id.nunique()))
ING_merged = pd.merge(left=ING_data[['mouse_id', 'frequency', 'threshold']].drop_duplicates(), 
                      right=ING_mouse_data[['mouse_id', 'Gene', 'exp_date', 'cohort_type']].drop_duplicates(), 
                      how='left', on='mouse_id')
print(' Number of rows after merging: %d / mice: %d' % (ING_merged.index.nunique(), ING_merged.mouse_id.nunique()))

ING_merged.rename(columns={'Gene': 'gene'}, inplace=True)
ING_merged['source'] = 'ING'
# Sex and cohort information are left empty for the ING data
ING_merged['sex'] = np.nan
ING_merged['cohort_id'] = np.nan
ING_merged['reference_cohort'] = np.nan

ING_merged = ING_merged[mouse_columns + ['source', 'frequency', 'threshold']]

display(ING_merged.head(2))

In [None]:
"""Adding GMC trained NNs predictions to data set"""
print(' Number of rows: %d' % ING_merged.index.nunique())
ING_merged = pd.merge(left=ING_merged, 
                      right=ING_data_predictions1[['mouse_id', 'frequency', 'threshold', 'nn_predicted_thr']], 
                      how='left', on=['mouse_id', 'frequency', 'threshold'])
print(' Number of rows after merging: %d' % ING_merged.index.nunique())

ING_merged.rename(columns={'nn_predicted_thr': 'th_NN_GMCtrained'}, inplace=True)
display(ING_merged.head(2))

In [None]:
"""Adding ING trained NNs predictions to data set"""
print(' Number of rows: %d' % ING_merged.index.nunique())
ING_merged = pd.merge(left=ING_merged, 
                      right=ING_data_predictions2[['mouse_id', 'frequency', 'threshold', 'nn_predicted_thr']], 
                      how='left', on=['mouse_id', 'frequency', 'threshold'])
print(' Number of rows after merging: %d' % ING_merged.index.nunique())

ING_merged.rename(columns={'nn_predicted_thr': 'th_NN_INGtrained'}, inplace=True)
display(ING_merged.head(2))

In [None]:
"""Adding GMC calibrated SLR estimations to data set"""
print(' Number of rows: %d' % ING_merged.index.nunique())
ING_merged = pd.merge(left=ING_merged, 
                      right=ING_data_estimations1[['mouse_id', 'frequency', 'threshold', 'slr_estimated_thr']], 
                      how='left', on=['mouse_id', 'frequency', 'threshold'])
print(' Number of rows after merging: %d' % ING_merged.index.nunique())

ING_merged.rename(columns={'slr_estimated_thr': 'th_SLR_GMCcalibrated'}, inplace=True)
display(ING_merged.head(2))

In [None]:
"""Adding ING calibrated SLR estimations to data set"""
print(' Number of rows: %d' % ING_merged.index.nunique())
ING_merged = pd.merge(left=ING_merged, 
                      right=ING_data_estimations2[['mouse_id', 'frequency', 'threshold', 'slr_estimated_thr']], 
                      how='left', on=['mouse_id', 'frequency', 'threshold'])
print(' Number of rows after merging: %d' % ING_merged.index.nunique())

ING_merged.rename(columns={'slr_estimated_thr': 'th_SLR_INGcalibrated'}, inplace=True)
display(ING_merged.head(2))

# Save the data set

In [None]:
"""Merge GMC and ING data sets"""
merged_data = pd.concat([GMC_merged, ING_merged], ignore_index=True)
display(merged_data.head(5))

In [None]:
"""Derive the stimulation label: a frequency of 100 encodes the click stimulus, tone frequencies are converted from Hz to kHz"""
merged_data['stimulation'] = ['click' if freq == 100 else int(freq/1000) for freq in merged_data['frequency']]

"""Shorten the cohort types to 'con' and 'mut'"""
merged_data['cohort_type'] = merged_data['cohort_type'].replace({'control': 'con', 'mutant': 'mut'})
display(merged_data.head(5))

In [None]:
"""Clear the gene entry for all controls"""
merged_data.loc[merged_data.cohort_type == 'con', 'gene'] = np.nan
# merged_data = merged_data[merged_data.columns.drop('frequency')]
merged_data.rename(columns={'threshold': 'th_manual'}, inplace=True)
display(merged_data.head(5))

In [None]:
merged_data.to_csv(os.path.join(path2data, 'data4hearing_curves_analysis.csv'), index=False)

# Hearing curve analysis

## Load data

In [None]:
data = pd.read_csv(os.path.join(path2data, 'data4hearing_curves_analysis.csv'))
display(data.head(2))
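One detail worth keeping in mind after reloading: the `stimulation` column mixes `'click'` with integer kHz values, and a CSV round trip turns all of its values into strings. Sorting and grouping by stimulation is therefore lexicographic, and the plotting code below converts tick labels back with `float(...)`. The toy frame in this sketch is made up; passing `dtype={'stimulation': str}` to `pd.read_csv` would make the string type explicit.

```python
import io
import pandas as pd

# Toy frame mirroring the 'stimulation' column built above
df = pd.DataFrame({'stimulation': ['click', 6, 12, 30]})
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Every value is now a string, so the sort order is lexicographic
print(reloaded['stimulation'].tolist())
print(sorted(reloaded['stimulation'].unique()))
```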

## Plotting settings

In [None]:
plt.rcParams['figure.figsize'] = [20, 16]

colors = {'th_manual': '#004488'
          , 'th_manual_mut': '#004488'
          , 'th_manual_con': '#6699CC' 
          , 'th_manual_ref_con': '#332288'
          , 'th_manual_all_ref_cons': '#88CCEE'
          , 'th_NN_GMCtrained': '#225522'
          , 'th_NN_GMCtrained_mut': '#225522'
          , 'th_NN_GMCtrained_con': '#5AAE61'
          , 'th_NN_GMCtrained_ref_con': '#225555'
          , 'th_NN_GMCtrained_all_ref_cons': '#CCEEFF'
          , 'th_SLR_GMCcalibrated': '#B2182B'
          , 'th_SLR_GMCcalibrated_mut': '#B2182B'
          , 'th_SLR_GMCcalibrated_con': '#D6604D'
          , 'th_SLR_GMCcalibrated_ref_con': '#882255'
          , 'th_SLR_GMCcalibrated_all_ref_cons': '#FFCCCC'
          , 'th_NN_INGtrained': '#225522'
          , 'th_NN_INGtrained_mut': '#225522'
          , 'th_NN_INGtrained_con': '#5AAE61'
          , 'th_SLR_INGcalibrated': '#B2182B'
          , 'th_SLR_INGcalibrated_mut': '#B2182B'
          , 'th_SLR_INGcalibrated_con': '#D6604D'}

std_colors = {'th_manual': '#004488'
             , 'th_NN_GMCtrained': '#CCDDAA'
             , 'th_NN_INGtrained': '#FDDBC7'
             , 'th_SLR_GMCcalibrated': '#FFCCCC'
             , 'th_SLR_INGcalibrated': '#EEEEBB'}

markers = {'con': 'o'
           , 'ref_con': 's'
           , 'mut': '^'
           , 'th_manual': 'o'
           , 'th_NN_GMCtrained': '^'
           , 'th_SLR_GMCcalibrated': '*'
           , 'th_NN_INGtrained': '^'
           , 'th_SLR_INGcalibrated': '*'}

linestyles = {'mut': 'solid'
              , 'con': 'dashed'
              , 'ref_con': 'dashdot'
              , 'th_manual': 'solid'
              , 'th_NN_GMCtrained': 'dotted'
              , 'th_SLR_GMCcalibrated': 'dashdot'
              , 'th_NN_INGtrained': 'dotted'
              , 'th_SLR_INGcalibrated': 'dashdot'}

labels = {'th_manual': 'Manually assigned thresholds' 
          , 'th_NN_GMCtrained': 'Thresholds predicted by GMC trained NNs'
          , 'th_NN_INGtrained': 'Thresholds predicted by ING trained NNs'
          , 'th_SLR_GMCcalibrated': 'Thresholds estimated by GMC calibrated SLR'
          , 'th_SLR_INGcalibrated': 'Thresholds estimated by ING calibrated SLR'}

title_fontsize=20

"""Quantile calculation"""
def q1(x):
    return x.quantile(0.25)

def q3(x):
    return x.quantile(0.75)

def q(x, val):
    return x.quantile(val)
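The quantile helpers are shaped for `groupby(...).agg`, where pandas names the resulting columns after the functions (`q1`, `q3`). The following sketch on made-up control thresholds shows this, together with the kind of percentile band (`quantile` with two values, then `unstack`) that the median plots below draw for the controls. `q(x, val)` takes a second argument and would need e.g. a `lambda` to be used in `agg`.

```python
import pandas as pd

def q1(x):
    return x.quantile(0.25)

def q3(x):
    return x.quantile(0.75)

# Toy control thresholds (made-up values) for two stimuli
df = pd.DataFrame({'stimulation': ['click'] * 4 + ['6'] * 4,
                   'th_manual': [20, 30, 40, 50, 10, 20, 30, 40]})

# Median and interquartile range per stimulus
summary = df.groupby('stimulation')['th_manual'].agg(['median', q1, q3])

# 5th and 95th percentile per stimulus, one column per quantile
band = df.groupby('stimulation')['th_manual'].quantile([0.05, 0.95]).unstack()
print(summary)
print(band)
```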

In [None]:
def get_result_file_names(_gene, _formats, _data_source): 
    
    """
    Sets the path to the files in which the visualisation plots are to be saved.
    
    Parameters
    ----------
        _gene: string
            A given gene name indicating the name of the folder.
            
        _formats: list
            The file formats in which to save the plots.
            
        _data_source: string
            The source of the data: GMC or ING.
            
    Returns
    -------
        Two dicts keyed by file format: the output folder and the output file path (without extension).
    
    """
    
    # Make the gene name usable as a folder and file name
    gene_name = str(_gene).strip()
    for char in (' / ', '/', ' ', ':'):
        gene_name = gene_name.replace(char, '_')
    
    gene_dir = {}
    file = {}
    
    dir_name = 'hearing_curve_analysis' 
    
    for fmt in _formats:
        # <path2results>/hearing_curve_analysis/<fmt>_format/<data source>/<gene>, creating missing parent folders
        gene_dir[fmt] = os.path.join(path2results, dir_name, str(fmt)+'_format', _data_source, gene_name)
        os.makedirs(gene_dir[fmt], exist_ok=True)
        file[fmt] = os.path.join(gene_dir[fmt], gene_name + '_thr_median')
    
    return gene_dir, file

In [None]:
def plot_thresholds4gene(_gene, _th_cols, _data, _colors, _markers, _linestyles, _labels, 
                         _dodge=True, _fontsize=16, _markersize=7, _markerscale=1.5, 
                         _cohort_type=True, _estimator=np.mean, _ci='sd', _fst_quantile=0.05, _last_quantile=0.95, 
                         _mouse_id=None, 
                         _legend_outside=False, _figlegend=False, _xlabel=None, _ylabel=None, 
                         _fig=None, _ax=None):
    """
    Plots hearing curves for a given gene in the data set and returns the legend elements of the plot.
    """
    
    if _ax is not None and _fig is not None:
        fig = _fig
        ax = _ax
    else:
        fig, ax = plt.subplots()
        
    _data = _data.copy()
    
    for col in _th_cols:
        # Thresholds coded as 999 are plotted at 100 dB
        _data[col] = _data[col].replace(999, 100)
    
    # All threshold columns share the prefix 'th_'
    all_th_columns = [col for col in _data.columns if col.startswith('th_')]
    
    """create dataframe with the gene mutants and the corresponding controls"""
    gene_data = _data[_data.gene == _gene]
    if _cohort_type:
        if _data.source.unique().squeeze() == 'GMC':
            gene_data = pd.concat([gene_data, _data[_data.cohort_id.isin(gene_data.reference_cohort.unique())]])
            gene_data = gene_data.replace('con', 'ref_con')
        if _data.source.unique().squeeze() == 'ING' and 'pipeline' in _data.columns: 
            gene_data = pd.concat([gene_data, _data[(_data.cohort_type == 'con') & _data.pipeline.isin(gene_data.pipeline.unique())]])
        else:
            gene_data = pd.concat([gene_data, _data[_data.cohort_type == 'con']])
        gene_data = gene_data.reset_index(drop=True)
    
    """add threshold type column"""
    th_data = []
    for col in _th_cols:
        mouse_columns = list(gene_data.columns.drop(all_th_columns))
        mouse_columns.append(str(col)) 
        
        tmp_data = gene_data[mouse_columns]
        tmp_data = tmp_data.rename(columns={col: 'th_value'})
        
        for idx in tmp_data.index:
            stim = tmp_data.at[idx, 'stimulation']
            coh_type = tmp_data.at[idx, 'cohort_type']
            if stim == 'click': 
                if _cohort_type:
                    tmp_data.at[idx, 'th_type'] = col+'_con_click' if coh_type=='con' else col+'_ref_con_click' if coh_type=='ref_con' else col+'_mut_click'
                else:
                    tmp_data.at[idx, 'th_type'] = col+'_click'
            else:
                if _cohort_type:
                    tmp_data.at[idx, 'th_type'] = col+'_con_freq' if coh_type=='con' else col+'_ref_con_freq' if coh_type=='ref_con' else col + '_mut_freq'
                else:
                    tmp_data.at[idx, 'th_type'] = col+'_freq'
        th_data.append(tmp_data.copy())  
    gene_data1 = pd.concat(th_data)
    
    """create legend elements"""
    legend_elements = []
    hue_order = []
    palette = {}
    markers = []
    linestyles = []
    
    if _data.source.unique().squeeze() == 'GMC':
        cohort_types = ['mut', 'ref_con', 'con']
    else:
        cohort_types = ['mut', 'con']
    
    if _cohort_type: 
        for col in _th_cols: 
            for coh_type in cohort_types:   
                legend_elements.append(Line2D([0], [0], color=_colors[col + '_' + coh_type], marker=_markers[coh_type], linestyle=_linestyles[coh_type], 
                                              ms=_markersize, label='%s%s %s (n=%d)' % (col + ' - ' if len(_th_cols)>1 else '', 
                                                                                        'all controls' if coh_type=='con' else _gene+' reference controls' if coh_type=='ref_con' else _gene+' mutants', 
                                                                                        'mean' if _estimator==np.mean else 'median', gene_data[gene_data.cohort_type==coh_type].mouse_id.nunique())))
                for ho in [col + '_' + coh_type + '_click', col + '_' + coh_type + '_freq']:
                    hue_order.append(ho)
        
        for ho in hue_order:
            # Cohort type encoded in the hue level ('ref_con' must be tested before 'con')
            coh_type = 'mut' if 'mut' in ho else 'ref_con' if 'ref_con' in ho else 'con'
            for col in _th_cols:
                if col in ho:
                    palette[ho] = _colors[col + '_' + coh_type]
            # Exactly one marker and one line style per hue level, in hue_order
            markers.append(_markers[coh_type])
            linestyles.append(_linestyles[coh_type])
    else:
        for col in _th_cols: 
            if _mouse_id is None: 
                if len(_th_cols)>1:
                    legend_lbl = '%s - %s' % (col, 'mean' if _estimator==np.mean else 'median')
                else: 
                    legend_lbl = '%s (n=%d)' % ('mean' if _estimator==np.mean else 'median', gene_data.mouse_id.nunique())
            else: 
                legend_lbl = '%s' % col
            legend_elements.append(Line2D([0], [0], color=_colors[col+'_mut'], marker=_markers[col], 
                                          linestyle=_linestyles['mut'], ms=_markersize, 
                                          label=legend_lbl))
            for ho in [col + '_click', col + '_freq']:
                hue_order.append(ho)
        for ho in hue_order:
            for col in _th_cols:
                if col in ho:
                    palette[ho] = _colors[col+'_mut'] 
                    markers.append(_markers[col])
                    linestyles.append(_linestyles['mut'])
    
    
    if _mouse_id is None:                
        sns.pointplot(x='stimulation', y='th_value', data=gene_data1, 
                      hue='th_type', hue_order=hue_order, dodge=_dodge, legend=False, 
                      estimator=_estimator, ci=_ci, capsize=.1, errwidth=2, label=hue_order,
                      markers=markers, scale=_markerscale, linestyles=linestyles, palette=palette, 
                      ax=ax)
    elif _mouse_id in gene_data1.mouse_id.unique(): 
        sns.pointplot(x='stimulation', y='th_value', data=gene_data1[gene_data1.mouse_id == _mouse_id], 
                      hue='th_type', hue_order=hue_order, dodge=_dodge, legend=False, 
                      estimator=_estimator, ci=_ci, capsize=.1, errwidth=2, label=hue_order,
                      markers=markers, scale=_markerscale, linestyles=linestyles, palette=palette, 
                      ax=ax)
    
    _alpha = 0.1
    
    if _estimator == np.mean:
        for col in _th_cols:
            if _cohort_type:
                bounds = gene_data[gene_data.cohort_type=='con'].groupby(['stimulation', 'frequency'])[col].agg(['mean', 'std']).reset_index().sort_values(by='frequency')
            else: 
                bounds = gene_data.groupby(['stimulation', 'frequency'])[col].agg(['mean', 'std']).reset_index().sort_values(by='frequency')
            
            x = bounds.iloc[:,0]
            collection1 = ax.fill_between(np.arange(-0.15, 0.15, 0.01), 
                                              bounds.iloc[:,2][5] - bounds.iloc[:,3][5],
                                              bounds.iloc[:,2][5] + bounds.iloc[:,3][5], alpha=_alpha,
                                              facecolor=_colors[col])
            collection1.set_zorder(0)
            collection2 = ax.fill_between(x, 
                                              bounds.iloc[:,2] - bounds.iloc[:,3],
                                              bounds.iloc[:,2] + bounds.iloc[:,3], alpha=_alpha, label='std dev con', 
                                              where=[False if x[idx] == 'click' else True for idx in x.index],
                                              facecolor=_colors[col]) 
            collection2.set_zorder(0)
            
            if len(_th_cols)>1:
                legend_lbl="%s - %sstandard deviation" % (col, 'all controls ' if _cohort_type else '')
                if _mouse_id: 
                    legend_lbl = legend_lbl + ' (n=%d)' % (gene_data[gene_data.cohort_type == 'con'].mouse_id.nunique() if _cohort_type else gene_data.mouse_id.nunique())
            else:
                legend_lbl="%sstandard deviation (n=%d)" % ('all controls ' if _cohort_type else '', 
                                                                 gene_data[gene_data.cohort_type == 'con'].mouse_id.nunique() if _cohort_type else gene_data.mouse_id.nunique())
            legend_elements.append(
                Patch(facecolor=_colors[col], alpha=_alpha, edgecolor=_colors[col], 
                      label=legend_lbl))
    elif _estimator == np.median: 
        step = _dodge/(len(ax.get_lines())-1)
        st = -_dodge/2
        steps = []
        while st <= 0.05:
            steps.append(st)
            st += step
        j = 0
        for col in _th_cols:
            
            if _cohort_type: 
                bounds = gene_data[gene_data.cohort_type=='con'].groupby(['stimulation', 'frequency'])[col].quantile((_fst_quantile,_last_quantile)).unstack().reset_index().sort_values(by='frequency')
            else:
                bounds = gene_data.groupby(['stimulation', 'frequency'])[col].quantile((_fst_quantile,_last_quantile)).unstack().reset_index().sort_values(by='frequency')
            
            x = bounds.iloc[:,0]
            collection1 = ax.fill_between(np.arange(-0.15, 0.15, 0.01), 
                                          bounds.iloc[:,2][bounds.iloc[:,0].index[0]], bounds.iloc[:,3][bounds.iloc[:,0].index[0]], 
                                          alpha=_alpha, facecolor=_colors[col]) 
            collection1.set_zorder(0)
            collection2 = ax.fill_between(bounds.iloc[:,0], 
                                          bounds.iloc[:,2], bounds.iloc[:,3], 
                                          alpha=_alpha, 
                                          where=[False if x[idx] == 'click' else True for idx in x.index],
                                          facecolor=_colors[col]) 
            collection2.set_zorder(0)
            
            band_lbl = '[%g;%g] percentile range' % (_fst_quantile*100, _last_quantile*100)
            if len(_th_cols)>1:
                legend_lbl = "%s - %s%s" % (col, 'all controls ' if _cohort_type else '', band_lbl)
                if _mouse_id: 
                    legend_lbl = legend_lbl + ' (n=%d)' % (gene_data[gene_data.cohort_type == 'con'].mouse_id.nunique() if _cohort_type else gene_data.mouse_id.nunique())
            else:
                legend_lbl = "%s%s (n=%d)" % ('all controls ' if _cohort_type else '', band_lbl, 
                                               gene_data[gene_data.cohort_type == 'con'].mouse_id.nunique() if _cohort_type else gene_data.mouse_id.nunique())
            legend_elements.append(
                Patch(facecolor=_colors[col], alpha=_alpha, edgecolor=_colors[col], 
                      label=legend_lbl))
             
            if _cohort_type:   
                for coh_type in cohort_types: 
                    bounds1 = gene_data[gene_data.cohort_type==coh_type].groupby(['stimulation', 'frequency'])[col].quantile((0.25,0.75)).unstack().reset_index().sort_values(by='frequency')
                    i = 0
                    for idx in bounds1.index:
                        x = i+steps[j]
                        if i == 0: 
                            j += 1
                        ymin = bounds1.at[idx,0.25]
                        ymax = bounds1.at[idx,0.75]
                        vline = ax.vlines(x, ymin=ymin, ymax=ymax, color=_colors[col + '_' + coh_type], lw=1)
                        vline.set_zorder(0)
                        hline1 = ax.hlines(ymin, xmin=x-0.05, xmax=x+0.05, color=_colors[col + '_' + coh_type], lw=1)
                        hline2 = ax.hlines(ymax, xmin=x-0.05, xmax=x+0.05, color=_colors[col + '_' + coh_type], lw=1)
                        i += 1
                    j += 1
            else:
                if _mouse_id is None:
                    bounds1 = gene_data.groupby(['stimulation', 'frequency'])[col].quantile((0.25,0.75)).unstack().reset_index().sort_values(by='frequency')
                    i = 0
                    for idx in bounds1.index:
                        x = i+steps[j]
                        if i == 0: 
                            j += 1
                        ymin = bounds1.at[idx,0.25]
                        ymax = bounds1.at[idx,0.75]
                        vline = ax.vlines(x, ymin=ymin, ymax=ymax, color=_colors[col], lw=1)
                        vline.set_zorder(0)
                        hline1 = ax.hlines(ymin, xmin=x-0.05, xmax=x+0.05, color=_colors[col], lw=1)
                        hline2 = ax.hlines(ymax, xmin=x-0.05, xmax=x+0.05, color=_colors[col], lw=1)
                        i += 1
                    j += 1
    
    plt.setp(ax.lines,linewidth=1)
    if not _figlegend:
        if _legend_outside:
            ax.legend(handles=legend_elements, loc='lower left', bbox_to_anchor= (0.0, 1.01), ncol=2,
                      borderaxespad=0, frameon=False, fontsize=_fontsize)
        else: 
            ax.legend(handles=legend_elements, loc='upper left', frameon=False, fontsize=_fontsize-8)
    else:
        ax.legend(handles=[])
    
    ax.set_ylim(0, 120)   
    ax.set_ylabel(_ylabel, fontsize=_fontsize)
    ax.set_xlabel(_xlabel, fontsize=_fontsize, labelpad=20)
    xticklabels = []
    if ax.get_xticklabels():
        for lb in ax.get_xticklabels():
            txt = lb.get_text()
            if txt == 'click': 
                xticklabels.append(txt)
            else:
                xticklabels.append('%ikHz' % int(float(txt)))
    ax.set_xticklabels(xticklabels)
    ax.set_yticks([ytick for ytick in range(20,120,20)])
    ax.tick_params(axis='both', which='major', labelsize=_fontsize)
        
    return legend_elements
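The interquartile bars above come from a `groupby(...).quantile(...).unstack()` chain that turns the 25th and 75th percentiles into one column each. The toy frame below is made up for illustration. It shows the shape of `bounds1` and why the rows are re-sorted by `frequency` so that `click` (coded as 100 in this data) comes first:

```python
import pandas as pd

# Made-up thresholds (dB) for two stimuli, four mice each.
toy = pd.DataFrame({
    'stimulation': ['click'] * 4 + ['6'] * 4,
    'frequency':   [100] * 4 + [6000] * 4,
    'th_manual':   [30, 35, 40, 45, 50, 55, 60, 65],
})

# 25th/75th percentile per stimulus; unstack() moves the quantile level into columns 0.25 and 0.75.
bounds = (toy.groupby(['stimulation', 'frequency'])['th_manual']
             .quantile([0.25, 0.75])
             .unstack()
             .reset_index()
             .sort_values(by='frequency'))
print(bounds)
```

Groupby sorts the string keys (`'6'` before `'click'`), so the explicit `sort_values(by='frequency')` is what restores the stimulus order used on the x-axis.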

In [None]:
def plot_gene_thresholds2file(_gene, _data, _labels, 
                              _fontsize=40, _xlabel='stimulation', _ylabel='threshold [dB]', 
                              _file_output_only=False):
    """
    Plots hearing curves for a given gene in the data set, taking into account the manually assessed, NN predicted and SLR estimated thresholds. 
    """
    
    data_source = str(_data.source.unique().squeeze())
    
    muts = _data[_data.cohort_type == 'mut']
    
    cols = ['th_manual', 'th_NN_'+data_source+'trained', 'th_SLR_'+data_source+'calibrated']
    titles = {}
    for col in cols:
        titles[col] = '%s\n(%s)' % (_labels[col], col)
    
    text = ['A', 'B', 'C', 'D']
    
    gene = _gene
    dir_names, file_names = get_result_file_names(gene, ['pdf', 'jpg'], data_source)
    
    markersize = 10
    markerscale = 2
    
    with PdfPages(file_names['pdf']+'.pdf') as pdf:
        
        nrows = 2
        ncols = 2

        fig1, axs1 = plt.subplots(nrows, ncols, figsize=(ncols*15,nrows*13), sharey=True, sharex=True, constrained_layout=True) #(60,48)

        nrow = -1
        ncol = 0
        for idx,col in enumerate(cols):
    
            xlabel = _xlabel
            ylabel = _ylabel
    
            ncol = idx%ncols
    
            if ncol == 0:
                nrow += 1
            else:
                ylabel = None
    
            if nrow == 0:
                xlabel = None
    
            plot_thresholds4gene(gene, [col], _data, colors, markers, linestyles, _labels, 
                                 _estimator=np.median, _ci=None, _dodge=0.1, 
                                 _fontsize=_fontsize, _markersize=markersize, _markerscale=markerscale, 
                                 _xlabel=xlabel, _ylabel=ylabel, _fig=fig1, _ax=axs1[nrow,ncol])
            
            axs1[nrow, ncol].set_title(titles[col], fontsize=_fontsize+5, y=1.01)
            axs1[nrow, ncol].text(-0.3, 3., text[idx], fontsize=_fontsize+10, fontweight='bold')
            axs1[nrow, ncol].grid(zorder=0, color='lightgray')    
            
            ncol += 1
     
        plot_thresholds4gene(gene, cols, _data, colors, markers, linestyles, _labels, 
                             _cohort_type=False, _estimator=np.median, _ci=None, _dodge=0.1,
                             _fontsize=_fontsize, _markersize=markersize, _markerscale=markerscale, 
                             _xlabel=xlabel, _fig=fig1, _ax=axs1[nrow,ncol])
        
        axs1[nrow,ncol].set_title(
            'Method comparison for %s mutants (n=%d)' % (_gene, _data[(_data.gene==gene)&(_data.cohort_type=='mut')].mouse_id.nunique()), 
            fontsize=_fontsize+5, y=1.01)
        axs1[nrow, ncol].text(-0.3, 3., text[-1], fontsize=_fontsize+10, fontweight='bold')
        axs1[nrow, ncol].grid(zorder=0, color='lightgray')

#         fig1.suptitle(gene, y=0.99, fontsize=_fontsize+5, fontweight='bold')
        fig1.tight_layout()    
        
        pdf.savefig(fig1, bbox_inches = 'tight', papertype = 'letter')
        fig1.savefig(file_names['jpg']+'_gene.jpg')
        if _file_output_only:
            plt.close()
        
        ################################################################################################
        
        mice = _data[_data.gene == gene]['mouse_id']
        noofmice = mice.nunique()

        ncols = 4
        # ceiling division: one row per (possibly incomplete) group of ncols mice, at least one row
        nrows = max(1, (noofmice + ncols - 1) // ncols)

        fig2, axs2 = plt.subplots(nrows, ncols, figsize=(ncols*15,nrows*12), sharey=True, sharex=True)
        
        axes = []
        if nrows == 1:
            for ax in axs2:
                axes.append(ax)
            legend_bbox = (.035, 1.24)
        else: 
            for ax1 in axs2: 
                for ax2 in ax1:
                    axes.append(ax2)
            legend_bbox = (.035, 1.06)
         
        for idx,ax in enumerate(axes):
        
            xlabel = _xlabel
            ylabel = _ylabel
        
            if idx%ncols > 0:
                ylabel = None
                    
            if idx < nrows*ncols - ncols:
                xlabel = None 
                
            if idx < noofmice:
                mouseid = mice.unique()[idx]
                    
                legend_elements = plot_thresholds4gene(gene, cols, muts, colors, markers, linestyles, _labels, 
                                                       _cohort_type=False, _estimator=np.median, _ci=None, _dodge=0.1, _mouse_id=mouseid,
                                                       _markersize=markersize, _markerscale=markerscale, 
                                                       _fontsize=_fontsize, _figlegend=True, _legend_outside=True, 
                                                       _xlabel=xlabel, _ylabel=ylabel, _fig=fig2, _ax=ax)  
                ax.text(.01, 110, 'mouse_id = %s' % mouseid, fontsize=_fontsize-5)
    
            ax.set_ylabel(ylabel, fontsize=_fontsize-5)
            ax.set_xlabel(xlabel, fontsize=_fontsize-5, labelpad=20)
            ax.tick_params(axis='both', which='major', labelsize=_fontsize-5)
            ax.grid(zorder=0, color='lightgray')    
        
        fig2.legend(handles=legend_elements, loc='lower left', bbox_to_anchor=(.021, 1.01), ncol=2,
                    borderaxespad=0, fancybox=True, fontsize=_fontsize-5)
        fig2.subplots_adjust(top=0.9)
        
        plt.tight_layout()
        pdf.savefig(fig2, bbox_inches = 'tight', papertype = 'letter')
        fig2.savefig(file_names['jpg']+'_mice.jpg', bbox_inches = 'tight', papertype = 'letter')
        if _file_output_only:
            plt.close()
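The number of subplot rows for the per-mouse panels is a ceiling division of the number of mice by the number of columns (`ncols = 4` above). A minimal sketch; `grid_shape` is a hypothetical helper name, not part of the notebook:

```python
def grid_shape(n_panels: int, ncols: int = 4) -> tuple:
    """Rows and columns needed to show n_panels panels in a grid with ncols columns.

    Ceiling division keeps a partially filled last row; at least one row is
    returned so that plt.subplots always receives a valid shape.
    """
    nrows = max(1, (n_panels + ncols - 1) // ncols)
    return nrows, ncols


for n in (1, 4, 5, 6, 8, 9):
    print(n, grid_shape(n))
```

Six mice, for example, need two rows of four panels; the two unused axes in the last row simply stay empty.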

In [None]:
def plot_curves_per_mouse(_mouse_id, _data, _freq, _thr_cols, 
                          _fontsize, _fig, _ax, _colors, _linestyles, _markers, 
                          _legend_elements, _cohort_type='mut'):
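    """
    Plots the ABR curves at different sound levels for a given mouse identifier and stimulus frequency
    and marks the manually assessed, NN predicted and SLR estimated thresholds as horizontal lines.
    
    Parameters
    ----------
        _mouse_id: int (GMC) or string (ING)
            A given mouse identifier contained in _data.
            
        _data: pandas-data-frame
            It contains time series for ABR curves (columns 't0' to 't999') and the thresholds. 
            It must contain columns for frequency ('frequency') and sound level ('sound_level'). 
            
        _freq: int
            A given stimulus frequency.
            
        _thr_cols: list 
            The names of the columns with manually assessed, NN predicted and SLR estimated thresholds (in this order).
            
        _ax: matplotlib axes
            The axes to draw on (_fontsize, _fig and _cohort_type are currently not used).
            
        _colors, _linestyles, _markers: dict
            Plot styles, keyed by threshold column name.
            
        _legend_elements: dict
            Legend handles collected so far, keyed by threshold column name (None if not set yet).
            
    Returns
    -------
        The updated dictionary of legend elements.
    """

    data_cols = ['t' + str(i) for i in range(0, 1000)]
    
    df = _data[_data.mouse_id == _mouse_id]

    # Look up the thresholds to highlight them on the resulting plot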
    nn_thr = None
    slr_thr = None
    human_thr = None
    
    thr_manual = _thr_cols[0]
    thr_NN = _thr_cols[1]
    thr_SLR = _thr_cols[2]
    
    thr = df[df['frequency'] == _freq][thr_NN].unique()
    if len(thr) > 0:
        nn_thr = thr[0]
    thr = df[df['frequency'] == _freq][thr_SLR].unique()
    if len(thr) > 0:
        slr_thr = thr[0]
    thr = df[df['frequency'] == _freq][thr_manual].unique()
    if len(thr) > 0:
        human_thr = thr[0]
    
    # Plot the ABR curves, each shifted to its sound level
    data_range = range(0, 1000)    
        
    for sound_level in df.loc[df['frequency'] == _freq, 'sound_level']:
        _ax.plot(data_range, 
                 sound_level + 2.5 * df[(df['sound_level'] == sound_level) & (df['frequency'] == _freq)][data_cols].iloc[0],
                 linewidth=1.5, color='black')

    if human_thr and human_thr != 999:
        _ax.hlines(y=human_thr, 
                   xmin=data_range[0], xmax=data_range[-1], 
                   linewidth=2.5, linestyles=_linestyles[thr_manual], #+'_'+_cohort_type], 
                   color=_colors[thr_manual], zorder=100)
        _ax.scatter(0, human_thr, marker=_markers[thr_manual], s=200, c=_colors[thr_manual])
        line = Line2D([0], [0], color=_colors[thr_manual], marker=_markers[thr_manual], 
                      linestyle=_linestyles[thr_manual], ms=15, label=thr_manual)
        if _legend_elements[thr_manual] is None:
            _legend_elements[thr_manual] = line
    if nn_thr and nn_thr != 999:
        _ax.hlines(y=nn_thr,
                   xmin=data_range[0], xmax=data_range[-1],
                   linewidth=2.5, linestyles=_linestyles[thr_NN], #+'_'+_cohort_type], 
                   color=_colors[thr_NN], zorder=100)
        _ax.scatter(0, nn_thr, marker=_markers[thr_NN], s=200, c=_colors[thr_NN])
        line = Line2D([0], [0], color=_colors[thr_NN], marker=_markers[thr_NN], 
                      linestyle=_linestyles[thr_NN], ms=15, label=thr_NN)
        if _legend_elements[thr_NN] is None:
            _legend_elements[thr_NN] = line                       
    if slr_thr and slr_thr != 999:
        _ax.hlines(y=slr_thr,
                   xmin=data_range[0], xmax=data_range[-1],
                   linewidth=2.5, linestyles=_linestyles[thr_SLR], #+'_'+_cohort_type], 
                   color=_colors[thr_SLR], zorder=100)
        _ax.scatter(0, slr_thr, marker=_markers[thr_SLR], s=200, c=_colors[thr_SLR])
        line = Line2D([0], [0], color=_colors[thr_SLR], marker=_markers[thr_SLR], 
                      linestyle=_linestyles[thr_SLR], ms=15, label=thr_SLR)
        if _legend_elements[thr_SLR] is None:
            _legend_elements[thr_SLR] = line

    return _legend_elements 
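The three threshold look-ups above share one rule: take the first value for the stimulus and skip it if it is missing or equals the placeholder 999. A minimal sketch of that rule as a single helper; `lookup_threshold` and `NO_THRESHOLD` are hypothetical names. Unlike a plain truthiness test (`if thr and ...`), it also treats NaN as missing and keeps a threshold of 0 dB:

```python
import numpy as np
import pandas as pd

NO_THRESHOLD = 999  # placeholder value that the plotting code above does not draw


def lookup_threshold(df, freq, col):
    """Return the first threshold in column `col` for stimulus `freq`,
    or None if there is none, it is NaN, or it equals the 999 placeholder."""
    values = df.loc[df['frequency'] == freq, col].dropna().unique()
    if len(values) == 0 or values[0] == NO_THRESHOLD:
        return None
    return float(values[0])


toy = pd.DataFrame({'frequency': [100, 100, 6000, 12000],
                    'th_manual': [35, 35, 999, np.nan]})
print(lookup_threshold(toy, 100, 'th_manual'))
```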

In [None]:
def plot_mouse_thresholds(_mouse_id, _gene, _thr_data, _curves, _file_names, 
                          _colors, _linestyles, _markers, 
                          _figsize=(15,12), _fontsize=40, 
                          _xlabel='stimulation', _ylabel='threshold [dB]', _file_output_only=False): 
    """
    Plots, for a given mouse, (1) the hearing curve with the manually assessed, NN predicted and SLR estimated
    thresholds and (2) the ABR curves at different sound levels for each stimulus, and saves both figures.
    
    Parameters
    ----------
        _mouse_id: int or numeric string (GMC), string (ING)
            A given mouse identifier contained in _thr_data and _curves.
        
        _gene: string
            A given gene name contained in _thr_data.
            
        _thr_data: pandas-data-frame
            It contains mouse phenotyping and thresholding data.
        
        _curves: pandas-data-frame
            It contains time series for ABR curves. 
            It must contain columns for frequency ('frequency') and sound level ('sound_level'). 
            
        _file_names: dict
            The file name stems (keys 'pdf' and 'jpg') under which the resulting plots are saved.
    """
    
    data_source = str(_thr_data.source.unique().squeeze())
    cols = ['th_manual', 'th_NN_'+data_source+'trained', 'th_SLR_'+data_source+'calibrated']
    
    mouse_id = _mouse_id
    if data_source == 'GMC':
        mouse_id = int(_mouse_id)
    
    markersize = 10
    markerscale = 2
    
    with PdfPages(_file_names['pdf']+'.pdf') as pdf:
        
        fontsize = _fontsize
        
        fig1, ax1 = plt.subplots(figsize=_figsize) #(30,24)
        
        xlabel = _xlabel
        ylabel = _ylabel
        
        legend_elements = plot_thresholds4gene(_gene, cols, _thr_data, _colors, _markers, _linestyles, labels, 
                                               _cohort_type=False, _estimator=np.median, _ci=None, _dodge=0.1, _mouse_id=_mouse_id,
                                               _markersize=markersize, _markerscale=markerscale, 
                                               _fontsize=fontsize, _figlegend=True, _legend_outside=True, 
                                               _xlabel=xlabel, _ylabel=ylabel, _fig=fig1, _ax=ax1)  
        ax1.set_title(
            'mouse_id = %s' % mouse_id, fontsize=_fontsize+5, y=1.01)
#         ax1.text(.01, 110, 'mouse_id = %s' % _mouse_id, fontsize=fontsize)
        ax1.grid(zorder=0, color='lightgray')
        ax1.tick_params(axis='both', which='major', labelsize=_fontsize-5)
        ax1.legend(handles=legend_elements, loc='upper left', frameon=False, fontsize=_fontsize-5, ncol=1)
         
#         fig1.legend(handles=legend_elements, loc='lower left', bbox_to_anchor=(.035, 1.01), ncol=2,
#                     borderaxespad=0, fancybox=True, fontsize=_fontsize-5)
#         fig1.subplots_adjust(top=0.95)

        pdf.savefig(fig1, bbox_inches = 'tight', papertype = 'letter')
        fig1.savefig(_file_names['jpg']+'_thr_median.jpg', bbox_inches = 'tight', papertype = 'letter')
        if _file_output_only:
            plt.close()
        
        ##############################################################################################
        
        nrows = 3
        ncols = 2

        fig2, axs2 = plt.subplots(nrows, ncols, figsize=(ncols*15,nrows*12), sharey=True, sharex=True)
        
        axes = []
        for ax1 in axs2: 
            for ax2 in ax1:
                axes.append(ax2)
           
        legend_elements = {}
        for col in cols:
            legend_elements[col] = None
        for idx,ax in enumerate(axes):
            
            xlabel = 'timesteps [overall 10ms]'
            ylabel = _ylabel
            
            if idx%ncols > 0:
                ylabel = None
                    
            if idx < nrows*ncols - ncols:
                xlabel = None 
            
            freq = _curves.frequency.unique()[idx]
            legend_elements = plot_curves_per_mouse(mouse_id, _curves, freq, 
                                                    cols, fontsize, fig2, ax, _colors, _linestyles, _markers, 
                                                    legend_elements)
            
            ax.set_title('click' if freq == 100 else '%dkHz' % (freq/1000), fontsize=_fontsize+5, pad=15)
            ax.grid(zorder=0, color='lightgray')    
            ax.set_ylim(-10, 110) 
            ax.set_ylabel(ylabel, fontsize=_fontsize)
            ax.set_xlabel(xlabel, fontsize=_fontsize, labelpad=20)
            ax.tick_params(axis='both', which='major', labelsize=_fontsize-5)
        
        for key,val  in list(legend_elements.items()):
            if val is None:
                del legend_elements[key]

        fig2.legend(handles=[legend_elements[key] for key in legend_elements], loc='lower left', 
                    bbox_to_anchor=(.045, 1.0), ncol=3,
                    borderaxespad=0, fancybox=True, fontsize=_fontsize)
        fig2.subplots_adjust(top=0.9)
        
        plt.tight_layout()
        
        pdf.savefig(fig2, bbox_inches = 'tight', papertype = 'letter')
        fig2.savefig(_file_names['jpg']+'_curves.jpg', bbox_inches = 'tight', papertype = 'letter')
        if _file_output_only:
            plt.close()
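In the curves figure, every ABR trace of a stimulus is drawn at the height of its sound level (`sound_level + 2.5 * amplitude`, as in `plot_curves_per_mouse`), so the threshold lines can share the dB axis with the traces. A minimal sketch with synthetic traces; the sine-plus-noise signal and the 40 dB line are made up for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(1000)                      # 1000 samples per trace, like columns t0..t999
sound_levels = np.arange(0, 100, 10)     # made-up sound levels in dB
rng = np.random.default_rng(0)

fig, ax = plt.subplots(figsize=(6, 5))
for level in sound_levels:
    # synthetic trace whose amplitude grows with the sound level
    trace = (level / 100) * np.sin(t / 40) + 0.1 * rng.standard_normal(t.size)
    # shift the trace to its sound level and scale the amplitude by 2.5
    ax.plot(t, level + 2.5 * trace, color='black', linewidth=1)

ax.axhline(40, linestyle='--', color='tab:red')  # hypothetical threshold of 40 dB
ax.set_ylim(-10, 110)
ax.set_xlabel('timesteps [overall 10ms]')
ax.set_ylabel('sound level [dB]')
```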

## GMC data analysis

In [None]:
source = 'GMC'
dataGMC = data[data.source == source]
dataGMC = dataGMC.astype({"stimulation": str})
display(dataGMC.head(2))

curvesGMC = pd.read_csv(os.path.join(path2data, 'GMC', 'GMC_abr_curves.csv'))
display(curvesGMC.head(2))

### GMC data - visualisation results

In [None]:
genesGMC = dataGMC[dataGMC.cohort_type == 'mut'].gene.unique()

print('Number of genes:\t %d' % len(genesGMC))
print('Number of mutants:\t %d' % dataGMC[dataGMC.cohort_type == 'mut'].mouse_id.nunique())
print('Number of controls:\t %d' % dataGMC[dataGMC.cohort_type == 'con'].mouse_id.nunique())
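Because each mouse appears in several rows (one per stimulus), the counts above use `nunique` rather than the number of rows. The same counts can be obtained in one call with `groupby`; the toy frame is made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    'mouse_id':    [1, 1, 2, 3, 3, 4, 5],
    'cohort_type': ['mut', 'mut', 'mut', 'con', 'con', 'con', 'con'],
})

# number of distinct mice per cohort type
counts = toy.groupby('cohort_type')['mouse_id'].nunique()
print(counts)
```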

In [None]:
plot_gene_thresholds2file('Nacc1', dataGMC, _labels=labels, _fontsize=30, _file_output_only=False)

In [None]:
'''Plot hearing curves for all GMC genes and save plots to pdf/jpg files'''
start_time = time.time()
print('\nStart time: ', time.strftime("%H:%M:%S", time.gmtime(start_time)))

for idx,gene in enumerate(genesGMC):
    print('%d. %s' % (idx, gene))
    plot_gene_thresholds2file(gene, dataGMC, _labels=labels, _fontsize=30, _file_output_only=True)
    
elapsed_time = time.time() - start_time            
print('\nElapsed time: %s' % time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))

In [None]:
gene = 'Nacc1'
_gene_data = dataGMC[dataGMC.gene == gene].reset_index(drop=True)[['mouse_id', 'th_manual', 'th_NN_GMCtrained', 'th_SLR_GMCcalibrated', 'frequency']]
_gene_data = _gene_data.astype({'mouse_id': 'int64'})
_mice = _gene_data.mouse_id.unique()
_df = pd.merge(left=curvesGMC[curvesGMC.mouse_id.isin(_mice)].reset_index(drop=True), 
               right=_gene_data, how='left', on=['mouse_id', 'frequency'])
    
dir_names,file_names = get_result_file_names(gene, ['pdf', 'jpg'], 'GMC')
    
print('%d. %s (n=%d)' % (1,gene,_gene_data.mouse_id.nunique()))

mouseid = '30414506'
for dn in dir_names:
    file_names[dn] = os.path.join(dir_names[dn], mouseid)
    plot_mouse_thresholds(mouseid, gene, dataGMC, _df, file_names, colors, linestyles, markers, 
                          _file_output_only=False, _figsize=(25,18), _fontsize=30)

In [None]:
'''Plot hearing curves for all GMC mutant mice and save plots to pdf/jpg files'''
start_time = time.time()
print('\nStart time: ', time.strftime("%H:%M:%S", time.gmtime(start_time)))

for idx,gene in enumerate(genesGMC):
    
    _gene_data = dataGMC[dataGMC.gene == gene].reset_index(drop=True)[['mouse_id', 'th_manual', 'th_NN_GMCtrained', 'th_SLR_GMCcalibrated', 'frequency']]
    _gene_data = _gene_data.astype({'mouse_id': 'int64'})
    _mice = _gene_data.mouse_id.unique()
    _df = pd.merge(left=curvesGMC[curvesGMC.mouse_id.isin(_mice)].reset_index(drop=True), 
                   right=_gene_data, how='left', on=['mouse_id', 'frequency'])
    
    dir_names,file_names = get_result_file_names(gene, ['pdf', 'jpg'], 'GMC')
    
    print('%d. %s (n=%d)' % (idx,gene,_gene_data.mouse_id.nunique()))
    
    for mouseid in _mice:
        mouseid = str(mouseid)
        for dn in dir_names:
            file_names[dn] = os.path.join(dir_names[dn], mouseid)
        plot_mouse_thresholds(mouseid, gene, dataGMC, _df, file_names, colors, linestyles, markers, 
                              _file_output_only=True, _figsize=(25,18), _fontsize=30)

elapsed_time = time.time() - start_time 
print('\nElapsed time: %s' % time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))

## ING data analysis

In [None]:
source = 'ING'
dataING = data[data.source == source]
dataING = dataING.astype({"stimulation": str})
dataING = dataING.reset_index(drop=True)
display(dataING.head(2))

curvesING = pd.read_csv(os.path.join(path2data, 'ING', 'ING_abr_curves.csv'))
display(curvesING.head(2))

### ING data - visualisation results

In [None]:
genesING = dataING[dataING.cohort_type == 'mut'].gene.unique()

print('Number of genes:\t %d' % len(genesING))
print('Number of mutants:\t %d' % dataING[dataING.cohort_type == 'mut'].mouse_id.nunique())
print('Number of controls:\t %d' % dataING[dataING.cohort_type == 'con'].mouse_id.nunique())

In [None]:
dataING1 = pd.merge(left=dataING, right=ING_mouse_data[['mouse_id', 'cohort_type', 'Pipeline']].rename(columns={'Pipeline': 'pipeline'}), 
                    how='left', on=['mouse_id', 'cohort_type'])
display(dataING1.head(2))
print('Number of genes:\t %d' % len(genesING))
print('Number of mutants:\t %d' % dataING1[dataING1.cohort_type == 'mut'].mouse_id.nunique())
print('Number of controls:\t %d' % dataING1[dataING1.cohort_type == 'con'].mouse_id.nunique())

In [None]:
valid_genes = []
for idx,gene in enumerate(genesING):
    # skip missing gene names (NaN) and names containing blanks
    if pd.notna(gene) and ' ' not in gene:
        valid_genes.append(gene)
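Missing gene names are NaN, and NaN is the only value that is not equal to itself, so a NaN check can be written as `x == x` or, more explicitly, with `pd.notna`. A minimal sketch of the gene filter in loop form and in vectorised form; the name `'two words'` is a made-up example of an invalid entry:

```python
import numpy as np
import pandas as pd

genes = np.array(['Gm12253', np.nan, 'Nacc1', 'two words'], dtype=object)

# loop form: keep names that are present and contain no blank
valid = [g for g in genes if pd.notna(g) and ' ' not in g]

# vectorised form on a Series; na=False treats NaN as "does not contain a blank"
s = pd.Series(genes)
mask = s.notna() & ~s.str.contains(' ', na=False)
print(valid, list(s[mask]))
```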

In [None]:
dataING1[(dataING1.cohort_type == 'mut')&(dataING1.gene.isin(valid_genes))].mouse_id.nunique()

In [None]:
gene = 'Gm12253'
plot_gene_thresholds2file(gene, dataING1, _labels=labels, _fontsize=30, _file_output_only=False)

In [None]:
'''Plot hearing curves for all ING genes and save plots to pdf/jpg files'''
for idx,gene in enumerate(genesING):
    if idx >= 500:  # this cell processes the genes with index >= 500 in genesING
        if pd.notna(gene) and (' ' not in gene) and (dataING[dataING.gene == gene].stimulation.nunique() == 6):
            print('%d. %s\t(stimulations: %s)' % (idx, gene, dataING[dataING.gene == gene].stimulation.unique()))
            plot_gene_thresholds2file(gene, dataING1, _labels=labels, _fontsize=30, _file_output_only=True)

In [None]:
gene = 'Gm12253'
_gene_data = dataING1[dataING1.gene == gene].reset_index(drop=True)[['mouse_id', 'th_manual', 'th_NN_INGtrained', 'th_SLR_INGcalibrated', 'frequency']]
_mice = _gene_data.mouse_id.unique()
_df = pd.merge(left=curvesING[curvesING.mouse_id.isin(_mice)].reset_index(drop=True), 
               right=_gene_data, how='left', on=['mouse_id', 'frequency'])
    
dir_names,file_names = get_result_file_names(gene, ['pdf', 'jpg'], 'ING')
    
print('%d. %s (n=%d)' % (1,gene,_gene_data.mouse_id.nunique()))

mouseid = 'M02271106 ABR'
for dn in dir_names:
    file_names[dn] = os.path.join(dir_names[dn], mouseid.replace(' ', '_'))
plot_mouse_thresholds(mouseid, gene, dataING1, _df, file_names, colors, linestyles, markers, 
                      _file_output_only=False, _figsize=(25,18), _fontsize=30)

In [None]:
'''Plot hearing curves for all ING mutant mice and save plots to pdf/jpg files'''
start_time = time.time()
print('\nStart time: ', time.strftime("%H:%M:%S", time.gmtime(start_time)))

for idx,gene in enumerate(genesING):
    if idx < 500:  # this cell processes the genes with index < 500 in genesING
        if pd.notna(gene) and (' ' not in gene) and (dataING1[dataING1.gene == gene].stimulation.nunique() == 6):
            _gene_data = dataING1[dataING1.gene == gene].reset_index(drop=True)[['mouse_id', 'th_manual', 'th_NN_INGtrained', 'th_SLR_INGcalibrated', 'frequency']]
            _mice = _gene_data.mouse_id.unique()
            _df = pd.merge(left=curvesING[curvesING.mouse_id.isin(_mice)].reset_index(drop=True), 
                           right=_gene_data, how='left', on=['mouse_id', 'frequency'])
    
            dir_names,file_names = get_result_file_names(gene, ['pdf', 'jpg'], 'ING')
    
            print('%d. %s (n=%d)' % (idx,gene,_gene_data.mouse_id.nunique()))
    
            for mouseid in _mice:
                for dn in dir_names:
                    file_names[dn] = os.path.join(dir_names[dn], mouseid.replace(' ', '_'))
                plot_mouse_thresholds(mouseid, gene, dataING1, _df, file_names, colors, linestyles, markers, 
                                      _file_output_only=True, _figsize=(25,18), _fontsize=30)
                
elapsed_time = time.time() - start_time 
print('\nElapsed time: %s' % time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))
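ING mouse identifiers such as `'M02271106 ABR'` contain blanks, which the loop above replaces by underscores before building file names. A slightly more general sketch that also handles other characters that are awkward in paths; `mouse_file_stem` is a hypothetical helper, not part of the notebook:

```python
import re


def mouse_file_stem(mouse_id) -> str:
    """File-name stem for a mouse id (int for GMC, string for ING).

    Every run of characters other than letters, digits, '-' and '_' is replaced by '_'.
    """
    return re.sub(r'[^A-Za-z0-9_-]+', '_', str(mouse_id))


print(mouse_file_stem('M02271106 ABR'), mouse_file_stem(30414506))
```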

## Mann-Whitney-U-Test

Mutant mouse lines that exhibit potentially biologically meaningful changes in hearing are detected using **Cliff's Delta** as effect size and the p-value of a **Wilcoxon rank sum test** (equivalent to the Mann-Whitney U test), i.e. the probability of obtaining a test statistic at least as extreme as the observed one, assuming that mutant and control distributions are the same.
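Cliff's delta compares every mutant threshold with every control threshold: it is the share of pairs where the mutant value is larger minus the share where it is smaller, so it lies between -1 and 1. With mutants as the first sample, a positive value means mutants tend to have higher thresholds, i.e. poorer hearing. A minimal pairwise sketch with made-up thresholds; the Vargha-Delaney A is the rescaled value (d + 1) / 2:

```python
import numpy as np


def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(X < Y), estimated over all pairs (x_i, y_j)."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    # sign is +1 / 0 / -1 for larger / tied / smaller; its mean over all pairs is d
    return float(np.mean(np.sign(x - y)))


mutants = [60, 70, 80]    # made-up thresholds in dB
controls = [30, 40, 60]
d = cliffs_delta(mutants, controls)
vda = (d + 1) / 2
print(d, vda)
```

Here 8 of the 9 pairs have a larger mutant threshold and one pair is tied, giving d = 8/9.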

In [None]:
import pingouin as pg

from scipy.stats import mannwhitneyu

In [None]:
def mannwhitneyu_test(_data, _data_source, _mouse_sex=None, _gene=None, _output=False, _min_mice=3):
    
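    """
    Performs the Mann-Whitney U test (Wilcoxon rank sum test) of mutants vs. controls for each stimulus, either for a
    single gene (_gene) or for all mutant genes in the data set. A gene is only tested if both the mutant and the
    control group contain at least _min_mice mice. All thresholding types are considered (manually assessed,
    NN predicted and SLR estimated). Controls are matched via the reference cohort (GMC) or the pipeline (ING);
    _mouse_sex (e.g. 'f') restricts the GMC data to one sex.
    
    The effect size is quantified by Cliff's delta and the Vargha-Delaney A (VDA).
    Returns a dict mapping stimulus labels (e.g. 'click', '30kHz') to data frames with one row per gene.
    """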
    
    th_columns=['th_manual', 'th_NN_'+_data_source+'trained', 'th_SLR_'+_data_source+'calibrated']
    print(th_columns)
    
    # work on a copy so that the caller's data frame is not modified
    _data = _data.copy()
    if _data_source == 'GMC':
        _data = _data.astype({'mouse_id': 'int64'})  
    
    # replace the placeholder threshold 999 by 100 dB before testing
    for col in th_columns:
        _data[col] = _data[col].replace(999, 100)
    
    if _data_source == 'GMC' and _mouse_sex is not None: 
        _data = _data[_data.sex == _mouse_sex].reset_index(drop=True)

    if _gene is None:
        genes = _data[(_data.cohort_type == 'mut') & (_data.gene != 'wildtype')].gene.unique()
    else:
        genes = [_gene]
     
    stimul_df = {}
    for stimul in _data.stimulation.unique():
        
        key = str(int(float(stimul)))+'kHz' if stimul != 'click' else stimul
        
        stimul_df[key] = pd.DataFrame()
        
        stimul_data = _data[_data.stimulation == stimul].reset_index(drop=True)
        
        for idx,gene in enumerate(genes):
            
            stimul_df[key].at[idx, 'gene'] = gene
            
            muts = stimul_data[(stimul_data.gene == gene) 
                               & (stimul_data.cohort_type == 'mut')].reset_index(drop=True)
            if _data_source == 'GMC':
                cons = stimul_data[(stimul_data.cohort_id.isin(muts.reference_cohort.unique())) 
                                   & (stimul_data.cohort_type == 'con')].reset_index(drop=True)
            else: 
                cons = stimul_data[(stimul_data.pipeline.isin(muts.pipeline.unique())) 
                                   & (stimul_data.cohort_type == 'con')].reset_index(drop=True) 
            
            stimul_df[key].at[idx, 'mut_mice'] = int(muts.mouse_id.nunique())
            stimul_df[key].at[idx, 'con_mice'] = int(cons.mouse_id.nunique())
            
            if muts.mouse_id.nunique() >= _min_mice and cons.mouse_id.nunique() >= _min_mice:
                for th_col in th_columns:
                    
                    """ === use the pingouin statistical package === """
                    
#                    _res = pg.mwu(muts[th_col].values, cons[th_col].values)
#                     print('\n%s: %i. %s, %s\n %s\n%s\n' % (key, idx, gene, th_col, 
#                                                            res, mannwhitneyu(muts[th_col].values, cons[th_col].values)))
                    
#                    stimul_df[key].at[idx, key+'_'+th_col+'_pval'] = res['p-val'][0]
#                    stimul_df[key].at[idx, key+'_'+th_col+'_CLES'] = _res['CLES'][0]

                    """ === use scipy.stats === """   

#                     print(muts[th_col].values, cons[th_col].values)
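                    # two-sided test; res.statistic is the U statistic of the mutant sample (first argument)
                    res = mannwhitneyu(muts[th_col].values, cons[th_col].values, alternative='two-sided')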
#                     print('\n%s: %i. %s, %s\n%s\n%s\n' % (key, idx, gene, th_col, 
#                                                            res, pg.mwu(muts[th_col].values, cons[th_col].values)))
        
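                    """
                    effect size: Cliff's delta (d) = 2*U/(n_mut*n_con) - 1, with n the number of tested values
                    interpretation: small, >= 0.11; medium, >= 0.28; large, >= 0.43
                    (volcano_plot uses the cut-offs 0.147, 0.33 and 0.474)
                    """
                    Cliffs_d = 2*res.statistic/(len(muts[th_col])*len(cons[th_col])) - 1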
#                     print('Cliff\'s delta effect size: %f' % Cliffs_d)
        
                    VDA = (Cliffs_d + 1)/2
#                     print('Vargha-Delaney A: %f' % VDA)
                    
                    stimul_df[key].at[idx, key+'_'+th_col+'_pval'] = res.pvalue
                    stimul_df[key].at[idx, key+'_'+th_col+'_d'] = Cliffs_d
                    stimul_df[key].at[idx, key+'_'+th_col+'_VDA'] = VDA
    
    return stimul_df
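The function derives Cliff's delta from the U statistic instead of comparing all pairs. With `alternative='two-sided'`, `scipy.stats.mannwhitneyu` reports U for the first sample (ties count one half), and 2U/(n_mut * n_con) - 1 equals the pairwise definition. A minimal check on made-up thresholds:

```python
import numpy as np
from scipy.stats import mannwhitneyu

mut = np.array([60, 70, 80, 100, 100])   # made-up mutant thresholds in dB
con = np.array([30, 40, 60, 45, 35, 50]) # made-up control thresholds in dB

res = mannwhitneyu(mut, con, alternative='two-sided')

# effect size from U, as in mannwhitneyu_test
d_from_u = 2 * res.statistic / (len(mut) * len(con)) - 1
# effect size from all mutant/control pairs
d_pairwise = np.mean(np.sign(mut[:, None] - con[None, :]))
# Vargha-Delaney A = U / (n_mut * n_con)
vda = (d_from_u + 1) / 2
print(res.statistic, res.pvalue, d_from_u, d_pairwise, vda)
```

Here U = 29.5 (29 larger pairs plus one tie counted as one half), so both computations give d = 29/30.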

In [None]:
def volcano_plot(_data, _test_results, _log=False, _fontsize=30, _ylim=(-0.2, 5.2),
                 _cols=['th_manual', 'th_NN_GMCtrained', 'th_SLR_GMCcalibrated'], 
                 _coltitles = ['Manual', 'NN GMC-GMC', 'SLR GMC-GMC'],
                 _stimul=None):
    
    """
    Creates volcano plots based on the Mann-Whitney-U-Test results.
    """
    
    fontsize = _fontsize
    
    small = 0.147
    medium = 0.33
    large = 0.474
    
    colors = {'small down': '#92C5DE',
              'small up': '#F4A582', 
              'negligible': '#BBBBBB',
              'medium down': '#4393C3',
              'medium up': '#D6604D', 
              'large down': '#2166AC',
              'large up': '#B2182B'}
    
    x_ticks = [-1., -large, -medium, -small, small, medium, large, 1., 1.5]
    y_ticks = range(6)

    if _stimul is None:
        stimul = [st if st == 'click' else str(int(float(st)))+'kHz' for st in _data.stimulation.unique()] 
        noofcols = 3
        noofrows = len(stimul)
    else: 
        stimul = _stimul
        noofcols = 3
        noofrows = len(_stimul)
    
    if noofrows == 1:
        figsize = (15*noofcols, 14)
    else:
        figsize = (15*noofcols, 13*noofrows)
        
    cols = []
    for st in stimul:
        for col in _cols:
            cols.append(st+'_'+col)
    
    fig, axs = plt.subplots(noofrows, noofcols, sharey=True, sharex=True, figsize=figsize)
    
    legend_elements = [] 
    for label in colors: #['small down', 'small up', 'negligible', 'medium down', 'medium up', 'large down', 'large up']:
        legend_elements.append(Line2D([0], [0], marker='o', color='w', label=label, 
                                      markerfacecolor=colors[label], markersize=20))
    
    axs1 = []
    if noofrows == 1:
        for ax in axs:
            axs1.append(ax)
    else:
        for ax1 in axs:
            for ax in ax1:
                axs1.append(ax) 
    
    # index for iterating the axes
    i = 0
    # index for iterating the columns
    j = 0
    for ax in axs1:
        
        x = _test_results[cols[i]+'_d']
        xlabel = 'Cliff\'s delta'
        c = [colors['large down'] if _x<=-large else 
             colors['medium down'] if _x<=-medium else
             colors['small down'] if _x<=-small else 
             colors['large up'] if _x>=large else 
             colors['medium up'] if _x>=medium else 
             colors['small up'] if _x>=small else 
             colors['negligible'] for _x in x]
        
        ax.axvline(x=-large, ls='--', lw=2, color=colors['large down'])
        ax.axvline(x=-medium, ls='--', lw=1.5, color=colors['medium down'])
        ax.axvline(x=-small, ls='--', lw=1.5, color=colors['small down'])
        ax.axvline(x=small, ls='--', lw=1.5, color=colors['small up'])
        ax.axvline(x=medium, ls='--', lw=1.5, color=colors['medium up'])
        ax.axvline(x=large, ls='--', lw=2, color=colors['large up'])
            
        y = [-np.log10(pval) for pval in _test_results[cols[i]+'_pval']]
        
        scatter = ax.scatter(x, y, s=240, c=c)
        if i%noofcols == 0:
            ax.set_ylabel('-log10(p-val)', fontsize=fontsize)
        if i >= (noofrows-1)*noofcols:
            ax.set_xlabel(xlabel, fontsize=fontsize)
        
        ax.set_xlim(-1.2, 1.7)
        ax.set_ylim(_ylim)
        
        if i > 0 and i%noofcols == 0: 
            j+=1
        
        if i < noofcols: 
            ax.set_title('%s\n%s' % (_coltitles[i], stimul[j]), fontsize=fontsize+5, fontweight='bold', pad=15)#y=1.01)
        else: 
            ax.set_title(stimul[j], fontsize=fontsize+5, fontweight='bold', pad=15)#y=1.01)
        
        ax.tick_params(axis='both', which='major', labelsize=fontsize-5)
        ax.set_xticks(x_ticks)
        ax.set_xticklabels(x_ticks, rotation=40)
        
        ax.axhline(y=-np.log10(0.05), ls='--', lw=2, color='#555555')
        ax.text(x=1.10, y=-np.log10(0.05)+0.08, s='p-val=0.05', fontsize=fontsize-5)
        ax.grid(zorder=0, color='lightgray') 
        
        i+=1

    leg = fig.legend(handles=legend_elements, loc='lower left', bbox_to_anchor= (0.02, 1.0), ncol=3, 
                     borderaxespad=0, frameon=True, fontsize=fontsize, title='Effect', title_fontsize=fontsize+10)
    leg._legend_box.align = "left"
    plt.tight_layout()
    
    return fig
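
In `volcano_plot`, each point is coloured by bucketing its Cliff's delta against the `small`, `medium` and `large` cut-offs, which are defined elsewhere in the notebook. The sketch below pulls that bucketing into a standalone function. The default cut-offs (0.147, 0.33, 0.474) are an assumption: they are commonly used values and may differ from the ones the notebook actually uses. The name `effect_category` is hypothetical.

```python
def effect_category(delta, small=0.147, medium=0.33, large=0.474):
    """Map a Cliff's delta to the category labels used as keys of `colors`.

    The default cut-offs are assumed, commonly used values; the notebook
    defines its own `small`, `medium` and `large` elsewhere.
    Boundaries are inclusive, mirroring the `<=` / `>=` checks in volcano_plot.
    """
    if delta <= -large:
        return 'large down'
    if delta <= -medium:
        return 'medium down'
    if delta <= -small:
        return 'small down'
    if delta >= large:
        return 'large up'
    if delta >= medium:
        return 'medium up'
    if delta >= small:
        return 'small up'
    return 'negligible'


for d in (-0.8, -0.2, 0.0, 0.4, 0.9):
    print(d, effect_category(d))
```

Negative deltas are labelled "down" and positive deltas "up", which matches the colour keys in the plotting loop.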

### Mann-Whitney-U-Test - GMC data

#### Overall

In [None]:
"""Compute the test"""
resultsGMC = mannwhitneyu_test(dataGMC, 'GMC')
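
`mannwhitneyu_test` is defined elsewhere in the notebook and does not appear in this section. As an illustration of the two metrics it reports for one gene and one stimulus, here is a dependency-free sketch. Cliff's delta is computed over all mutant/control pairs. The two-sided rank-sum p-value uses the large-sample normal approximation of the U statistic with a tie correction. The function name `rank_sum_and_cliffs_delta` is hypothetical. Because this sketch applies no continuity correction, its p-values can differ slightly from `scipy.stats.mannwhitneyu`, which by default may use an exact distribution for small samples and a continuity correction otherwise.

```python
import math

import numpy as np


def rank_sum_and_cliffs_delta(mut, con):
    """Return (Cliff's delta, two-sided p-value) for mutant vs. control thresholds.

    Cliff's delta = P(mut > con) - P(mut < con) over all pairs, in [-1, 1].
    The p-value uses the normal approximation of the Mann-Whitney U statistic
    with a tie correction and no continuity correction (a simplification).
    """
    mut = np.asarray(mut, dtype=float)
    con = np.asarray(con, dtype=float)
    n1, n2 = len(mut), len(con)

    # Pairwise comparisons: +1 where mut > con, -1 where mut < con, 0 on ties
    signs = np.sign(mut[:, None] - con[None, :])
    delta = signs.sum() / (n1 * n2)

    # U statistic of the mutant sample: pairs won plus half of the tied pairs
    u = (signs > 0).sum() + 0.5 * (signs == 0).sum()

    # Tie-corrected variance of U under H0 (identical distributions)
    n = n1 + n2
    _, tie_counts = np.unique(np.concatenate([mut, con]), return_counts=True)
    tie_term = (tie_counts**3 - tie_counts).sum() / (n * (n - 1))
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    if var_u <= 0:  # all values identical: no evidence of a difference
        return delta, 1.0

    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of N(0, 1)
    return delta, p


# Hypothetical thresholds (dB SPL) for one stimulus
d, p = rank_sum_and_cliffs_delta([60, 65, 70, 75, 80], [40, 45, 50, 55, 60])
print(d, p)
```

Cliff's delta follows directly from U as `2U / (n1 * n2) - 1`, so both metrics can come from one pass over the pairwise comparisons.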

In [None]:
"""Merge the per-stimulus result tables into one pandas DataFrame"""
dfGMC = None
for key, _res in resultsGMC.items():
    # the first table seeds the merge; the others are joined on the shared key columns
    if dfGMC is None:
        dfGMC = _res
    else:
        dfGMC = pd.merge(left=dfGMC, right=_res, how='left', on=['gene', 'mut_mice', 'con_mice'])

dfGMC = dfGMC.astype({'mut_mice': 'int64', 'con_mice': 'int64'})
display(dfGMC.head(3))
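
The merge loop joins one result table per stimulus on the shared key columns `gene`, `mut_mice` and `con_mice`. The same step can also be written with `functools.reduce`. The toy frames and the column names `click_d`, `6kHz_pval` and so on are made up for illustration and need not match the columns that `mannwhitneyu_test` produces.

```python
from functools import reduce

import pandas as pd

keys = ['gene', 'mut_mice', 'con_mice']

# Hypothetical per-stimulus results: one frame per stimulus, same key columns
results = {
    'click': pd.DataFrame({'gene': ['A', 'B'], 'mut_mice': [8, 6], 'con_mice': [40, 40],
                           'click_d': [0.9, -0.1], 'click_pval': [0.001, 0.6]}),
    '6kHz': pd.DataFrame({'gene': ['A', 'B'], 'mut_mice': [8, 6], 'con_mice': [40, 40],
                          '6kHz_d': [0.5, 0.0], '6kHz_pval': [0.04, 0.9]}),
}

# Left-join every table onto the first one, in dict insertion order
merged = reduce(lambda left, right: pd.merge(left, right, how='left', on=keys),
                results.values())
merged = merged.astype({'mut_mice': 'int64', 'con_mice': 'int64'})
print(merged)
```

Since `reduce` always seeds with the first table, no special case for a particular stimulus key is needed.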

In [None]:
dfGMC.to_csv(os.path.join(path2results, 'volcano_plots', 'GMC_mannwhitneyu_results.csv'), index=False)

In [None]:
"""Volcano plots"""

sns.set_style("white")

fig = volcano_plot(dataGMC, dfGMC)
fig.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot.png'), bbox_inches = 'tight', papertype = 'letter')

fig_click = volcano_plot(dataGMC, dfGMC, _stimul=['click'])
fig_click.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_click.png'), bbox_inches = 'tight', papertype = 'letter')

fig_30 = volcano_plot(dataGMC, dfGMC, _stimul=['30kHz'])
fig_30.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_30kHz.png'), bbox_inches = 'tight', papertype = 'letter')

fig_click_30 = volcano_plot(dataGMC, dfGMC, _stimul=['click','30kHz'])
fig_click_30.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_click_30kHz.png'), bbox_inches = 'tight', papertype = 'letter')
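
A volcano plot puts Cliff's delta on the x-axis and `-log10(p)` on the y-axis. The dashed horizontal line marks `-log10(0.05)`, about 1.3. The candidate lines sit above that line and outside the vertical effect-size lines. The sketch below lists those points from a results table. The function name `flag_hits`, the column names and the default `min_abs_delta=0.474` (used here as an assumed "large" cut-off) are all hypothetical.

```python
import numpy as np
import pandas as pd


def flag_hits(results, d_col, p_col, alpha=0.05, min_abs_delta=0.474):
    """Return rows that a volcano plot places above the p = alpha line and
    outside the +/- min_abs_delta effect-size lines, most significant first."""
    out = results.assign(neg_log10_p=-np.log10(results[p_col]))
    is_hit = (out[p_col] < alpha) & (out[d_col].abs() >= min_abs_delta)
    return out.loc[is_hit].sort_values('neg_log10_p', ascending=False)


# Hypothetical results for one method and one stimulus
toy = pd.DataFrame({
    'gene': ['A', 'B', 'C', 'D'],
    'click_d': [0.85, -0.60, 0.70, 0.10],
    'click_pval': [0.0004, 0.01, 0.20, 0.001],
})
print(flag_hits(toy, 'click_d', 'click_pval'))
```

Requiring both criteria reflects why the two metrics are combined. A tiny p-value alone can come from a negligible but consistent shift, and a large delta alone can come from a very small mutant group.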

#### Females

In [None]:
"""Test computation for females"""
results_f = mannwhitneyu_test(dataGMC, _data_source='GMC', _mouse_sex='f')

In [None]:
"""Merge the per-stimulus result tables into one pandas DataFrame"""
df_f = None
for key, _res in results_f.items():
    # the first table seeds the merge; the others are joined on the shared key columns
    if df_f is None:
        df_f = _res
    else:
        df_f = pd.merge(left=df_f, right=_res, how='left', on=['gene', 'mut_mice', 'con_mice'])

df_f = df_f.astype({'mut_mice': 'int64', 'con_mice': 'int64'})
display(df_f.head(5))

In [None]:
df_f.to_csv(os.path.join(path2results, 'volcano_plots', '_GMC_mannwhitneyu_results_f.csv'), index=False)

In [None]:
"""Volcano plots"""

sns.set_style("white")

fig_f = volcano_plot(dataGMC, df_f)
fig_f.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_f.png'), bbox_inches = 'tight', papertype = 'letter')

fig_f_click = volcano_plot(dataGMC, df_f, _stimul=['click'])
fig_f_click.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_f_click.png'), bbox_inches = 'tight', papertype = 'letter')

fig_f_30 = volcano_plot(dataGMC, df_f, _stimul=['30kHz'])
fig_f_30.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_f_30kHz.png'), bbox_inches = 'tight', papertype = 'letter')

fig_f_click_30 = volcano_plot(dataGMC, df_f, _stimul=['click','30kHz'])
fig_f_click_30.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_f_click_30kHz.png'), bbox_inches = 'tight', papertype = 'letter')

#### Males

In [None]:
"""Test computation for males"""
results_m = mannwhitneyu_test(dataGMC, _data_source='GMC', _mouse_sex='m')

In [None]:
"""Merge the per-stimulus result tables into one pandas DataFrame"""
df_m = None
for key, _res in results_m.items():
    # the first table seeds the merge; the others are joined on the shared key columns
    if df_m is None:
        df_m = _res
    else:
        df_m = pd.merge(left=df_m, right=_res, how='left', on=['gene', 'mut_mice', 'con_mice'])

df_m = df_m.astype({'mut_mice': 'int64', 'con_mice': 'int64'})
display(df_m.head(5))

In [None]:
df_m.to_csv(os.path.join(path2results, 'volcano_plots', '_GMC_mannwhitneyu_results_m.csv'), index=False)

In [None]:
"""Volcano plots"""

sns.set_style("white")

fig_m = volcano_plot(dataGMC, df_m)
fig_m.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_m.png'), bbox_inches = 'tight', papertype = 'letter')

fig_m_click = volcano_plot(dataGMC, df_m, _stimul=['click'])
fig_m_click.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_m_click.png'), bbox_inches = 'tight', papertype = 'letter')

fig_m_30 = volcano_plot(dataGMC, df_m, _stimul=['30kHz'])
fig_m_30.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_m_30kHz.png'), bbox_inches = 'tight', papertype = 'letter')

fig_m_click_30 = volcano_plot(dataGMC, df_m, _stimul=['click','30kHz'])
fig_m_click_30.savefig(os.path.join(path2results, 'volcano_plots', 'GMC_volcano_plot_m_click_30kHz.png'), bbox_inches = 'tight', papertype = 'letter')

### Mann-Whitney-U-Test - ING data

#### Prepare data

In [None]:
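# Keep only mice measured with all six stimuli; mutants additionally need
# a gene annotation that is present and contains no space
valid_mice = []
for mouse_id in dataING1.mouse_id.unique():
    
    _df = dataING1.loc[dataING1.mouse_id == mouse_id]
    
    # .item() returns the single unique value as a Python scalar (and raises
    # if a mouse unexpectedly has more than one gene or cohort type)
    gene = _df.gene.unique().item()
    cohort_type = _df.cohort_type.unique().item()
    no_of_stimulations = _df.stimulation.nunique()
    
#     print(mouse_id, gene, cohort_type, no_of_stimulations)
    
    if no_of_stimulations == 6:
        if cohort_type == 'con':
            valid_mice.append(mouse_id)
        elif cohort_type == 'mut':
            # skip mutants with a missing gene (NaN) or a gene string containing a space
            if pd.notna(gene) and (' ' not in gene):
                valid_mice.append(mouse_id)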

dataING2 = dataING1[dataING1.mouse_id.isin(valid_mice)].reset_index(drop=True)
dataING2.head()
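
The per-mouse loop above can also be written as one `groupby` over `mouse_id`. The sketch assumes that `gene` and `cohort_type` are constant within each mouse, which the loop also relies on. It keeps mice with all six stimuli, keeping controls unconditionally and mutants only when their gene annotation is present and has no space. The function name `select_valid_mice`, the toy data and the stimulus labels are hypothetical.

```python
import pandas as pd


def select_valid_mice(df, n_stimuli=6):
    """Keep mice tested with all stimuli; mutants also need a gene without spaces."""
    per_mouse = df.groupby('mouse_id').agg(
        gene=('gene', 'first'),            # assumed constant per mouse
        cohort_type=('cohort_type', 'first'),
        n_stim=('stimulation', 'nunique'),
    )
    complete = per_mouse['n_stim'] == n_stimuli
    is_con = per_mouse['cohort_type'] == 'con'
    gene_ok = (per_mouse['gene'].notna()
               & ~per_mouse['gene'].astype(str).str.contains(' ', regex=False))
    is_mut = (per_mouse['cohort_type'] == 'mut') & gene_ok
    valid_ids = per_mouse[complete & (is_con | is_mut)].index
    return df[df['mouse_id'].isin(valid_ids)].reset_index(drop=True)


# Hypothetical toy data: (mouse_id, gene, cohort_type, number of stimuli)
stims = ['click', '6kHz', '12kHz', '18kHz', '24kHz', '30kHz']
rows = []
for mid, gene, cohort, n in [(1, None, 'con', 6), (2, 'Abc', 'mut', 6),
                             (3, None, 'mut', 6), (4, 'Abc Def', 'mut', 6),
                             (5, None, 'con', 5)]:
    rows += [{'mouse_id': mid, 'gene': gene, 'cohort_type': cohort, 'stimulation': s}
             for s in stims[:n]]
toy = pd.DataFrame(rows)
print(select_valid_mice(toy).mouse_id.unique())
```

Using a vectorised filter avoids slicing the frame once per mouse, which can matter for large cohorts.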

In [None]:
print('Number of genes:\t %d' % dataING2.gene.nunique())
print('Number of mutants:\t %d' % dataING2[dataING2.cohort_type == 'mut'].mouse_id.nunique())
print('Number of controls:\t %d' % dataING2[dataING2.cohort_type == 'con'].mouse_id.nunique())

#### Overall

In [None]:
"""Test computation for ING data"""
resultsING = mannwhitneyu_test(dataING2, 'ING')

In [None]:
"""Merge the per-stimulus result tables into one pandas DataFrame"""
dfING = None
for key, _res in resultsING.items():
    # the first table seeds the merge; the others are joined on the shared key columns
    if dfING is None:
        dfING = _res
    else:
        dfING = pd.merge(left=dfING, right=_res, how='left', on=['gene', 'mut_mice', 'con_mice'])

dfING.head()

In [None]:
dfING.to_csv(os.path.join(path2results, 'volcano_plots', 'ING_mannwhitneyu_results.csv'), index=False)

In [None]:
"""Volcano plots"""

sns.set_style("white")

ylim = (-0.2, 12.2)

fig = volcano_plot(dataING, dfING, _ylim=ylim,
                   _cols=['th_manual', 'th_NN_INGtrained', 'th_SLR_INGcalibrated'], 
                   _coltitles = ['Manual', 'NN ING-ING', 'SLR ING-ING'])
# fig.savefig(os.path.join(path2results, 'volcano_plots', 'ING_volcano_plot.png'), bbox_inches = 'tight', papertype = 'letter')

fig_click = volcano_plot(dataING, dfING, _ylim=ylim, 
                         _cols=['th_manual', 'th_NN_INGtrained', 'th_SLR_INGcalibrated'], 
                         _coltitles = ['Manual', 'NN ING-ING', 'SLR ING-ING'], 
                         _stimul=['click'])
# fig_click.savefig(os.path.join(path2results, 'volcano_plots', 'ING_volcano_plot_click.png'), bbox_inches = 'tight', papertype = 'letter')

fig_30 = volcano_plot(dataING, dfING, _ylim=ylim, 
                      _cols=['th_manual', 'th_NN_INGtrained', 'th_SLR_INGcalibrated'], 
                      _coltitles = ['Manual', 'NN ING-ING', 'SLR ING-ING'], 
                      _stimul=['30kHz'])
# fig_30.savefig(os.path.join(path2results, 'volcano_plots', 'ING_volcano_plot_30kHz.png'), bbox_inches = 'tight', papertype = 'letter')

fig_click_30 = volcano_plot(dataING, dfING, _ylim=ylim, 
                            _cols=['th_manual', 'th_NN_INGtrained', 'th_SLR_INGcalibrated'], 
                            _coltitles = ['Manual', 'NN ING-ING', 'SLR ING-ING'], 
                            _stimul=['click','30kHz'])
# fig_click_30.savefig(os.path.join(path2results, 'volcano_plots', 'ING_volcano_plot_click_30kHz.png'), bbox_inches = 'tight', papertype = 'letter')