# Table of Contents

- [The RAS pathway](#The-RAS-pathway)
- [Ras GF](#Ras-GF)
- [The WNT Pathway](#The-WNT-Pathway)

In this notebook, I will compare the *mdt-12* transcriptomes with the hypoxia, Dpy, Ras and Wnt transcriptional signatures. In principle, these signatures should be strong predictors of genetic interaction (for pathways) or of observable phenotypes (such as Dpy), provided they are present in the mutant data.
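
To make "present in the mutant data" concrete, here is a minimal sketch of one possible check. It is not the method used in this notebook. It assumes that a signature and a mutant's results can each be reduced to a mapping from gene ID to a fold-change estimate, and it asks how many genes the two share and what fraction of those change in the same direction. The helper `signature_concordance` and the gene IDs and values are hypothetical toy data.

```python
def signature_concordance(signature, mutant):
    """Compare a signature with a mutant's differentially expressed genes.

    Both arguments are dicts mapping a gene ID to a fold-change estimate;
    only the sign (direction of change) is used, and a value of exactly 0
    is treated as a decrease. Returns the number of shared genes and the
    fraction of them whose direction of change agrees.
    """
    shared = signature.keys() & mutant.keys()
    if not shared:
        return 0, float('nan')
    agree = sum(1 for g in shared if (signature[g] > 0) == (mutant[g] > 0))
    return len(shared), agree / len(shared)


# Hypothetical toy values: three shared genes, two changing in the same direction
signature = {'geneA': 1.2, 'geneB': -0.8, 'geneC': 0.5, 'geneD': 2.0}
mutant = {'geneA': 0.9, 'geneB': -1.1, 'geneC': -0.3, 'geneE': 1.5}
n_shared, frac = signature_concordance(signature, mutant)
print(n_shared, round(frac, 2))  # 3 0.67
```

A fraction near 1 (or near 0, for an inverted relationship) over many shared genes would hint that the signature is present; whether the overlap itself is larger than chance would still need a separate significance test.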

In [1]:
```python
import os

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import seaborn as sns
from matplotlib import rc
from scipy import stats

import tissue_enrichment_analysis as tea
import pretty_table as pretty

%matplotlib inline

# Render inline figures as high-resolution (retina) PNGs
%config InlineBackend.figure_formats = {'png', 'retina'}

# JB's favorite Seaborn settings for notebooks. The name `rc_params` avoids
# shadowing matplotlib's `rc` function imported above.
rc_params = {'lines.linewidth': 2,
             'axes.labelsize': 18,
             'axes.titlesize': 18}
sns.set_context('notebook', rc=rc_params)
# axes.facecolor is a style parameter, so it must go to set_style;
# set_context silently ignores non-context keys
sns.set_style('dark', {'axes.facecolor': '#DFDFE5'})

# Apply the LaTeX and font settings after Seaborn, because set_style
# resets the font family
rc('text', usetex=True)
rc('text.latex', preamble=r'\usepackage{cmbright}')
rc('font', **{'family': 'sans-serif', 'sans-serif': ['Helvetica']})

mpl.rcParams['xtick.labelsize'] = 16
mpl.rcParams['ytick.labelsize'] = 16
mpl.rcParams['legend.fontsize'] = 14
```

In [2]:
```python
q = 0.1
genmap = pd.read_csv('../sleuth/rna_seq_info.txt', sep=' ', comment='#')
tidy = pd.read_csv('../output/SI1_dpy_22_results.csv')
mediator = pd.read_csv('../input/complexes.csv')
```
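
The cell above sets `q = 0.1`, which is not used in this section. If it serves as a q-value threshold for calling genes differentially expressed, selecting those genes could look like the toy sketch below. The column name `qval` is an assumption (it is what sleuth's result tables use), and the gene IDs are hypothetical. The final `.unique()` collapses genes that appear in several rows, for example once per transcript.

```python
import pandas as pd

q = 0.1  # threshold, as in the cell above

# Hypothetical stand-in for the results table; 'qval' is an assumed column name
results = pd.DataFrame({
    'ens_gene': ['WBGene1', 'WBGene2', 'WBGene3', 'WBGene2'],
    'qval': [0.01, 0.5, 0.09, 0.02],
})

# Keep rows below the threshold, then collect the unique gene IDs
sig_genes = results.loc[results.qval < q, 'ens_gene'].unique()
print(sorted(sig_genes))  # ['WBGene1', 'WBGene2', 'WBGene3']
```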

In [3]:
```python
tissue = tea.fetch_dictionary('tissue')
phenotype = tea.fetch_dictionary('phenotype')
go = tea.fetch_dictionary('go')
dicts = {'tissue': tissue, 'phenotype': phenotype, 'go': go}
```

In [4]:
```python
# Filter each dictionary, keeping only genes that were detected
# (at any expression level) in this experiment
detected = tidy.ens_gene.unique()
dicts = {key: d[d.wbid.isin(detected)] for key, d in dicts.items()}
```

In [5]:
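```python
# Run every enrichment analysis (tissue, phenotype, GO) for each phenotypic
# class and store the results in a nested dictionary:
# analysis[phenotypic class][ontology] -> results DataFrame
analysis = {}
for phenoclass, group in tidy.groupby('phenotypic class'):
    frames = {}
    for k, d in dicts.items():
        df = tea.enrichment_analysis(group.ens_gene.unique(), d, show=False)
        frames[k] = df
    analysis[phenoclass] = frames
```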
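
TEA's authors describe its enrichment test as a hypergeometric test on each term, with Benjamini-Hochberg correction across terms. The sketch below is my own minimal reimplementation of that idea, not TEA's code. Instead of TEA's DataFrame layout, it represents each term as a plain set of annotated genes and uses an explicit background set, which plays the role of the filtered dictionaries above. The names `hypergeom_sf`, `bh_adjust` and `hypergeom_enrichment`, and the toy genes, are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    return sum(comb(n, i) * comb(M - n, N - i)
               for i in range(k, min(n, N) + 1)) / comb(M, N)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (Q values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the Q values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def hypergeom_enrichment(genes, annotations, background):
    """Test each term for over-representation among `genes`.

    annotations maps a term to its set of annotated genes; background is the
    set of all genes that could have been selected (e.g. detected genes).
    Returns {term: (observed, expected, q_value)}.
    """
    genes = set(genes) & background
    M, N = len(background), len(genes)
    terms = sorted(annotations)
    counts, pvals = [], []
    for term in terms:
        annotated = annotations[term] & background
        observed = len(annotated & genes)
        counts.append((observed, N * len(annotated) / M))
        pvals.append(hypergeom_sf(observed, M, len(annotated), N))
    qvals = bh_adjust(pvals)
    return {t: (o, e, qv) for t, (o, e), qv in zip(terms, counts, qvals)}


background = {f'g{i}' for i in range(1, 11)}
annotations = {'intestine': {'g1', 'g2', 'g3', 'g4'},
               'muscle': {'g5', 'g6', 'g7', 'g8'}}
result = hypergeom_enrichment({'g1', 'g2', 'g3', 'g9'}, annotations, background)
for term, (o, e, qv) in result.items():
    print(term, o, e, round(qv, 3))
# intestine 3 1.6 0.238
# muscle 0 1.6 1.0
```

Restricting the background to detected genes matters: a gene that could never appear in the results should not count toward the expected overlap.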

In [6]:
```python
# Pretty-print the results
for phenoclass, f in analysis.items():
    for k, d in f.items():
        # keep only significant results (q < 10^-3)
        d['logQ'] = -d['Q value'].apply(np.log10)
        sig = (d['Q value'] < 10**-3)
        if d[sig].shape[0] == 0:
            continue

        # trim the trailing ontology identifiers from the term names
        # for easier printing
        if k.lower() == 'tissue':
            d['minTerm'] = d.Term.str[:-13]
        if k.lower() == 'phenotype':
            d['minTerm'] = d.Term.str[:-20]
        if k.lower() == 'go':
            d['minTerm'] = d.Term.str[:-10]

        # subset the dataframe to significant terms with more than 2
        # observed genes per term
        tmp = d[sig & (d.Observed > 2)]

        if tmp.shape[0] == 0:
            continue

        print(phenoclass, k)
        print(tmp[['minTerm', 'logQ', 'Observed']].round(0))
        print('\n\n')
```
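
The fixed-length slices above assume every term ends in an identifier of exactly one length per ontology. If, for instance, GO terms end in a space plus `GO:` and seven digits (11 characters), the `[:-10]` slice would leave a trailing space. A minimal alternative sketch, assuming only that each term ends with whitespace followed by a `PREFIX:digits` identifier, strips the identifier with a single regular expression for all three ontologies. The term strings below are illustrative, with placeholder IDs, and are not read from TEA.

```python
import pandas as pd

# Illustrative term labels in an assumed "name PREFIX:digits" format
terms = pd.Series([
    'intestine WBbt:0000000',
    'avoids bacterial lawn WBPhenotype:0000000',
    'immune system process GO:0000000',
])

# Remove trailing whitespace plus a PREFIX:digits identifier
min_terms = terms.str.replace(r'\s+\w+:\d+$', '', regex=True)
print(min_terms.tolist())
# ['intestine', 'avoids bacterial lawn', 'immune system process']
```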

```
bx93 associated tissue
      minTerm  logQ  Observed
77  intestine   5.0       134

bx93 associated go
                            minTerm  logQ  Observed
59           immune system process    6.0        17
58  organic acid metabolic process    4.0        15
3      response to biotic stimulus    4.0        10

sy622 associated tissue
                 minTerm  logQ  Observed
33  cephalic sheath cell   5.0        31

sy622 associated go
                                 minTerm  logQ  Observed
71       organic acid metabolic process    6.0        37
72                immune system process    6.0        33
70  protein heterodimerization activity    4.0        12

sy622 specific tissue
               minTerm  logQ  Observed
246          intestine   8.0       649
64     muscular system   4.0       465
186  epithelial system   3.0       406

sy622 specific phenotype
                  minTerm  logQ  Observed
27  avoids bacterial lawn   3.0        62
```



sy622 specific go
             