# Introduction

We're trying to understand how the ERCC spike-ins (and genes) behave in our single-cell datasets.

# Index

* Spike Scatter plots
  * <a href="#Hs_purkinje-spike-scatter">Hs_purkinje spike scatter</a>
  * <a href="#Hs_asp_purkinje_UMB5294-spike-scatter">Hs_asp_purkinje_UMB5294 spike scatter</a>
  * <a href="#Mm_purkinje-spike-scatter">Mm_purkinje spike scatter</a>
  * <a href="#Mm_pyramidal-spike-scatter">Mm_pyramidal spike scatter</a>
* <a href="#Spike-Correlation-Histograms">Spike Correlation Histograms</a>
  * <a href="#Rafa-Spearman-ERCC-Spike-Correlation-Histograms">Rafa Spearman ERCC Spike Correlation Histograms</a>
    * <a href="#Hs_purkinje-ERCC-spike-Rafa-Spearman-histogram">Hs_purkinje ERCC spike Rafa Spearman histogram</a>
    * <a href="#Mm_purkinje-ERCC-spike-Rafa-Spearman-histogram">Mm_purkinje ERCC spike Rafa Spearman histogram</a>
    * <a href="#Mm_layer_V_pyramidal-ERCC-spike-Rafa-Spearman-histogram">Mm_layer_V_pyramidal ERCC spike Rafa Spearman histogram</a>
  * <a href="#Naive-Spearman-ERCC-Spike-Correlation-Histograms">Naive Spearman ERCC Spike Correlation Histograms</a>
    * <a href="#Hs_purkinje-ERCC-spike-naive-Spearman-histogram">Hs_purkinje ERCC spike naive Spearman histogram</a>
    * <a href="#Mm_purkinje-ERCC-spike-naive-Spearman-histogram">Mm_purkinje ERCC spike naive Spearman histogram</a>
    * <a href="#Mm_layer_V_pyramidal-ERCC-spike-naive-Spearman-histogram">Mm_layer_V_pyramidal ERCC spike naive Spearman histogram</a>
* <a href="#All-genes-correlation-histograms">All genes correlation histograms</a>
  * <a href="#Rafa-Spearman-Correlation-Histograms">Rafa Spearman Correlation Histograms</a>
  * <a href="#Naive-Spearman-Correlation-Histograms">Naive Spearman Correlation Histograms</a>
* <a href="#Genes-Detected">Genes Detected</a>
* <a href="#Hs_purkinje_poolsplit-Covariance">Hs_purkinje_poolsplit Covariance</a>
* <a href="#Hs_purkinje_poolsplit-shuffled-Covariance">Hs_purkinje_poolsplit Shuffled Covariance</a>
* <a href="#Hs_purkinje_poolsplit-shuffled-Genes-Detected">Hs_purkinje_poolsplit Shuffled Genes Detected</a>

In [1]:
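import collections
import os
import sys

import numpy
import pandas

# This notebook targets the bokeh 0.10 API; bokeh.charts and bokeh.io.hplot
# were deprecated and later removed from bokeh.
import bokeh
import bokeh.charts
import bokeh.io
import bokeh.resources
from bokeh.io import output_notebook
from bokeh.models import HoverTool
from bokeh.palettes import (Blues3, BrBG3, Greens3, Greys3, Greys4,
                            Oranges3, Purples3, Reds3)
from bokeh.plotting import figure, show, ColumnDataSource, output_file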

In [2]:
output_notebook()

In [3]:
LONG_RNA_SEQ = os.path.expanduser('~diane/proj/long-rna-seq-condor')
if LONG_RNA_SEQ not in sys.path:
    sys.path.append(LONG_RNA_SEQ)
from models import get_single_spike_cpc
from madqc import compute_all_vs_all_scores

In [4]:
def create_figure(xname, yname, extra_title='', **kwargs):
    hover = HoverTool(
        tooltips = [
            (xname, '@'+xname),
            (yname, '@'+yname),
            ('gene_id', '@gene_id'),
            ('library_id', '@library_id'),
            ('name', '@experiment_name'),
            # description is too long for my tooltip box
            #('description', '@description'),
            #('organism', '@organism'),
            #('biosample', '@biosample'),
            #('source', '@biosample_lab'),
            #('starting', '@starting'),
            #('age', '@age'),
            #('lab', '@lab'),
            #('rfa', '@rfa'),
        ]
    )    

    p = figure(
        title = "{} vs {} {}".format(xname, yname, extra_title),
        tools=['box_zoom', 'wheel_zoom', 'pan', hover, 'save', 'reset'],
        **kwargs
    )
    p.xaxis.axis_label = xname
    p.yaxis.axis_label = yname
    
    return p

def setdefault_style(**kwargs):
    extra = kwargs.copy()
    extra.setdefault('fill_alpha', 0.4)
    extra.setdefault('size', 7)
    extra.setdefault('line_color', 'black')
    extra.setdefault('line_alpha', 0.4)
    return extra



In [5]:
rsems = pandas.HDFStore('rsem-genes.h5', 'r')

In [6]:
library_to_name = {}
experiment_names = {}
for i, row in rsems['metadata'].iterrows():
    library_to_name[str(row.library_id)] = row.experiment_name
    experiment_names.setdefault(row.experiment_name, []).append(row.library_id)

In [7]:
for k in sorted(experiment_names.keys()):
    print(k, ','.join(experiment_names[k]))

Hs_asp_purkinje_UMB5294_poolsplit 13843,13844,13845,13846,13847,13848,13849,13850,13851,13852,13853,13854,13855,13856,13857,13858,13859,13860
Hs_asp_purkinje_UMB5294_single 13824,13825,13826,13827,13828,13829,13830,13831,13832,13833,13834,13835,13836,13837,13838,13839,13840,13841,13842
Hs_purkinje_poolsplit 13645,13646,13647,13648,13649,13650,13651,13652,13653,13654,13655,13656,13657,13658,13659,13660,13661,13662,13663,13664
Hs_purkinje_single 13625,13626,13627,13628,13629,13630,13631,13632,13633,13634,13635,13636,13637,13638,13639,13640,13641,13642,13643,13644
Mm_layer_V_pyramidal_poolsplit 15304,15305,15306,15307,15308,15309,15310,15311,15356,15357,15358,15359,15360,15361,15362
Mm_layer_V_pyramidal_single 15272,15273,15275,15276,15277,15278,15279,15280,15281,15282,15283,15284,15285,15286,15287,15352,15353,15354
Mm_purkinje_poolsplit 15288,15289,15290,15291,15292,15293,15294,15295,15296,15297,15298,15299,15300,15301,15302,15303
Mm_purkinje_single 15256,15257,15258,15259,15260,15261,15
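The loop in `In [6]` builds the two lookup tables by hand. A minimal sketch of the same grouping with `DataFrame.groupby`, assuming a metadata table with `library_id` and `experiment_name` columns like `rsems['metadata']` (the three rows below are made up for illustration):

```python
import pandas

# Hypothetical stand-in for rsems['metadata']; only the two columns used here.
metadata = pandas.DataFrame({
    'library_id': ['13645', '13646', '13625'],
    'experiment_name': ['Hs_purkinje_poolsplit', 'Hs_purkinje_poolsplit', 'Hs_purkinje_single'],
})

# library_id -> experiment_name
library_to_name = dict(zip(metadata['library_id'].astype(str), metadata['experiment_name']))

# experiment_name -> list of library IDs, in table order within each group
experiment_names = metadata.groupby('experiment_name')['library_id'].apply(list).to_dict()

for name in sorted(experiment_names):
    print(name, ','.join(experiment_names[name]))
```

`groupby` keeps rows in their original order within each group, so the printed ID lists follow the metadata order just as the `setdefault(...).append(...)` loop does.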

In [8]:
name_to_color = {
    'Hs_asp_purkinje_UMB5294_poolsplit': Purples3[0],
    'Hs_asp_purkinje_UMB5294_single': Purples3[1],
    
    'Hs_purkinje_single': Oranges3[0],
    'Hs_purkinje_poolsplit': Oranges3[1],
    
    'Mm_purkinje_single': Blues3[0],
    'Mm_purkinje_poolsplit': Blues3[1],
    
    'Mm_layer_V_pyramidal_single': Greens3[0],
    'Mm_layer_V_pyramidal_poolsplit': Greens3[1],
}

def make_linked_axes():
    hs_purkinje_x = bokeh.models.DataRange1d()
    hs_asp_purkinje_x = bokeh.models.DataRange1d()
    mm_purkinje_x = bokeh.models.DataRange1d()
    mm_pyramidal_x = bokeh.models.DataRange1d()

    hs_purkinje_y = bokeh.models.DataRange1d()
    hs_asp_purkinje_y = bokeh.models.DataRange1d()
    mm_purkinje_y = bokeh.models.DataRange1d()
    mm_pyramidal_y = bokeh.models.DataRange1d()

    name = {}
    name['y'] = {
        'Hs_asp_purkinje_UMB5294_poolsplit': hs_asp_purkinje_y,
        'Hs_asp_purkinje_UMB5294_single': hs_asp_purkinje_y,
    
        'Hs_purkinje_single': hs_purkinje_y,
        'Hs_purkinje_poolsplit': hs_purkinje_y,
    
        'Mm_purkinje_single': mm_purkinje_y,
        'Mm_purkinje_poolsplit': mm_purkinje_y,
    
        'Mm_layer_V_pyramidal_single': mm_pyramidal_y,
        'Mm_layer_V_pyramidal_poolsplit': mm_pyramidal_y,
    }

    name['x'] = {
        'Hs_asp_purkinje_UMB5294_poolsplit': hs_asp_purkinje_x,
        'Hs_asp_purkinje_UMB5294_single': hs_asp_purkinje_x,
    
        'Hs_purkinje_single': hs_purkinje_x,
        'Hs_purkinje_poolsplit': hs_purkinje_x,
    
        'Mm_purkinje_single': mm_purkinje_x,
        'Mm_purkinje_poolsplit': mm_purkinje_x,
    
        'Mm_layer_V_pyramidal_single': mm_pyramidal_x,
        'Mm_layer_V_pyramidal_poolsplit': mm_pyramidal_x,
    }
    return name

In [9]:
spike_cpcs = dict(get_single_spike_cpc())

def make_spike_colors(cpcs):
    colors = [
        Blues3[0],
        Greens3[0],
        Reds3[0],
        Purples3[0],
        Greys3[0],
        BrBG3[2],
        Greys4[3],
    ]
    spike_color_map = {}
    cpc_counter = collections.Counter()
    
    for name, cpc in cpcs.items():
        spike_color_map[name] = colors[cpc_counter[cpc]]
        cpc_counter[cpc] += 1
    
    return spike_color_map

spike_color_map = make_spike_colors(spike_cpcs)
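`make_spike_colors` gives each spike-in the next unused color among the spike-ins that share its concentration, so points stacked at the same copies-per-cell value can be told apart. The sketch below reproduces that idea with a placeholder palette of color names (`assign_spike_colors` is a hypothetical name). One deliberate difference: it wraps around with `%` instead of raising `IndexError` when more spike-ins share a concentration than there are colors.

```python
import collections

def assign_spike_colors(cpcs, palette):
    """Map spike-in name -> color, cycling through `palette` within each concentration."""
    counter = collections.Counter()
    color_map = {}
    for name, cpc in cpcs.items():
        # counter[cpc] = how many spike-ins with this concentration already have a color
        color_map[name] = palette[counter[cpc] % len(palette)]
        counter[cpc] += 1
    return color_map

# Concentrations (copies per cell) as listed for these spike-ins later in this notebook.
cpcs = {
    'gSpikein_ERCC-00044': 14.11,
    'gSpikein_ERCC-00095': 14.11,
    'gSpikein_ERCC-00131': 14.11,
    'gSpikein_ERCC-00022': 28.22,
}
colors = assign_spike_colors(cpcs, ['blue', 'green'])
print(colors)
```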

In [10]:
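def jitter_spike(name):
    """Return the spike-in's copies per cell plus a little uniform noise.

    The noise has total width min(10% of the concentration, 100), so spikes that
    share a concentration don't overplot; names without a known concentration get NaN.
    """
    cpc = spike_cpcs.get(name, None)
    if cpc:
        spread = min((0.1 * cpc), 100)
        cpc = cpc + (spread * (numpy.random.random() - 0.5))
        return cpc
    return numpy.nan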
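To see the size of the jitter, here is a standalone sketch of the same formula (`jitter` is a hypothetical helper that takes an explicit random generator so the result is reproducible). Low concentrations move by at most 5% either way, while the cap of 100 limits high concentrations to plus or minus 50:

```python
import numpy

def jitter(cpc, rng):
    """Add uniform noise of total width min(0.1 * cpc, 100), centred on cpc."""
    spread = min(0.1 * cpc, 100)
    return cpc + spread * (rng.random() - 0.5)

rng = numpy.random.default_rng(0)
low = [jitter(14.11, rng) for _ in range(1000)]
high = [jitter(3612.0, rng) for _ in range(1000)]
print(min(low), max(low))    # within 14.11 +/- ~0.71
print(min(high), max(high))  # capped: within 3612 +/- 50
```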

In [11]:
#pool_fpkm = create_figure('spike_cpc', 'FPKM', 'Pool Split')
#single_fpkm = create_figure('spike_cpc', 'FPKM', 'Single')

name_axes = make_linked_axes()
plots  = {name: create_figure('spike_cpc', 'FPKM', name, 
                              y_range=name_axes['y'][name],
                              x_range=name_axes['x'][name],
                             ) for name in name_to_color }
extra = setdefault_style(fill_alpha=0.6)
missing_spike_data = collections.Counter()

for key in [ x for x in rsems.keys() if x.startswith('/genes/')]:
    library_id = key[-5:]
    experiment_name = library_to_name[library_id]
    rsem = rsems[key]
    spike_filter = rsem['gene_id'].map(lambda x: x.startswith('gSpikein'))

    rsem['library_id'] = library_id
    rsem['experiment_name'] = experiment_name
    rsem['spike_cpc'] = rsem[spike_filter]['gene_id'].map(jitter_spike)
    rsem['spike_color'] = rsem[spike_filter]['gene_id'].map(lambda x: spike_color_map.get(x))
    
    for spike_name in rsem[spike_filter]['gene_id']:
        if spike_name not in spike_color_map:
            missing_spike_data[spike_name] += 1
        
    plots[experiment_name].circle(
        'spike_cpc', 'FPKM', 
        source=ColumnDataSource(rsem[spike_filter]),
        color=rsem[spike_filter]['spike_color'],            
        #legend=name,
        **extra)

missing_spike_data

Counter({'gSpikein_ERCC-00018': 142, 'gSpikein_ERCC-00128': 142, 'gSpikein_ERCC-00007': 142, 'gSpikein_ERCC-00023': 142})

In [12]:
def show_and_save_pairs(plots, left, right, title):
    p = bokeh.io.hplot(plots[left], plots[right])
    filename = os.path.join('/dev/shm', '{}.html'.format(title.lower().replace(' ', '_')))
    resources = bokeh.resources.Resources(
        mode='server',
        root_url='/~diane/bokeh/0.10.0/'
    )
    bokeh.io.save(obj=p, filename=filename, resources=resources, title=title)
    show(p)

# Hs_purkinje spike scatter

<a href="#Index">Back to index</a>

In [13]:
show_and_save_pairs(plots, 'Hs_purkinje_poolsplit', 'Hs_purkinje_single', 'HS_purkinje cpt vs FPKM')

# Hs_asp_purkinje_UMB5294 spike scatter

<a href="#Index">Back to index</a>

In [14]:
show_and_save_pairs(plots, 'Hs_asp_purkinje_UMB5294_poolsplit', 'Hs_asp_purkinje_UMB5294_single', 'HS_asp_purkinje_UMB5294 cpt vs FPKM')

# Mm_purkinje spike scatter

<a href="#Index">Back to index</a>

In [15]:
show_and_save_pairs(plots, 'Mm_purkinje_poolsplit', 'Mm_purkinje_single', 'Mm_purkinje cpt vs FPKM')

# Mm_pyramidal spike scatter

<a href="#Index">Back to index</a>

In [16]:
show_and_save_pairs(plots, 'Mm_layer_V_pyramidal_poolsplit', 'Mm_layer_V_pyramidal_single', 'Mm_pyramidal cpt vs FPKM')

In [17]:
gencode_store = pandas.HDFStore('gencode.vV19-tRNAs-ERCC.h5', 'r')

In [18]:
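query_type = 'gene'
contamination_genes = collections.OrderedDict()
for query_name in ['KRT5', 'KRT14', 'KRT17', 'KRT4', 'CRNN']:
    # In the where clause the left-hand names are columns; the right-hand names
    # (query_name, query_type) are looked up as local Python variables.
    gene_id = gencode_store.select(
        'v19_tRNAs_ERCC',
        where='gene_name == query_name & type == query_type')['gene_id'].values[0]
    contamination_genes[gene_id] = query_name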
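For comparison, the same lookup on an in-memory DataFrame can use boolean masks instead of a PyTables `where` string. This is a sketch: the `annotations` table and `lookup_gene_ids` helper are hypothetical (only the gene IDs come from the output below), and KRT4 is deliberately left out of the table to show that a missing name is skipped rather than raising `IndexError` as `.values[0]` would.

```python
import collections
import pandas

# Hypothetical slice of a GENCODE annotation table with the columns queried above.
annotations = pandas.DataFrame({
    'gene_name': ['KRT5', 'KRT5', 'KRT14', 'CRNN'],
    'type': ['gene', 'transcript', 'gene', 'gene'],
    'gene_id': ['ENSG00000186081.7', 'ENSG00000186081.7', 'ENSG00000186847.5', 'ENSG00000143536.7'],
})

def lookup_gene_ids(annotations, names, query_type='gene'):
    """Return an OrderedDict of gene_id -> gene_name for the first matching row of each name."""
    found = collections.OrderedDict()
    for query_name in names:
        mask = (annotations['gene_name'] == query_name) & (annotations['type'] == query_type)
        matches = annotations.loc[mask, 'gene_id']
        if matches.empty:
            continue  # name not annotated: skip it instead of failing
        found[matches.iloc[0]] = query_name
    return found

print(lookup_gene_ids(annotations, ['KRT5', 'KRT14', 'CRNN', 'KRT4']))
```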

In [19]:
contamination_genes

OrderedDict([('ENSG00000186081.7', 'KRT5'), ('ENSG00000186847.5', 'KRT14'), ('ENSG00000128422.11', 'KRT17'), ('ENSG00000170477.8', 'KRT4'), ('ENSG00000143536.7', 'CRNN')])

In [20]:
def get_gene_expression(rsems, replicates, gene_ids=None):
    results = {}
    for lib_id in replicates:
        lib = rsems['/genes/library_{}'.format(lib_id)].copy()
        lib.index = lib['gene_id']
        if gene_ids:
            lib = lib.loc[list(gene_ids)]
            
        results[str(lib_id)] = lib['FPKM']

    df =  pandas.DataFrame(results)
    #df['gene_id'] = df.index.map(lambda x: gene_ids[x])
    return df
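`get_gene_expression` selects rows with `lib.loc[list(gene_ids)]`, which in pandas 1.0 and later raises `KeyError` if any requested gene is missing from a library. A sketch of an alternative using `reindex`, which fills missing genes with NaN instead; the two per-library tables and the `expression_matrix` name are hypothetical:

```python
import pandas

# Two hypothetical RSEM gene tables, one per library (gene_id and FPKM columns only).
libraries = {
    '13645': pandas.DataFrame({'gene_id': ['g1', 'g2', 'g3'], 'FPKM': [0.0, 1.5, 3.0]}),
    '13646': pandas.DataFrame({'gene_id': ['g1', 'g2'], 'FPKM': [0.5, 2.0]}),
}

def expression_matrix(libraries, gene_ids):
    """Genes x libraries FPKM table; genes absent from a library become NaN."""
    columns = {
        lib_id: lib.set_index('gene_id')['FPKM'].reindex(gene_ids)
        for lib_id, lib in libraries.items()
    }
    return pandas.DataFrame(columns)

m = expression_matrix(libraries, ['g2', 'g3'])
print(m)
```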

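# Report contamination genes in human experiments

KRT5, KRT14, KRT17, KRT4 and CRNN are epithelial markers that should not be expressed in neurons, so nonzero FPKM for any of them flags possible contamination.

## Hs_purkinje_poolsplit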

In [21]:
df = get_gene_expression(rsems, experiment_names['Hs_purkinje_poolsplit'], contamination_genes)
df

| gene_id | 13645 | 13646 | 13647 | 13648 | 13649 | 13650 | 13651 | 13652 | 13653 | 13654 | 13655 | 13656 | 13657 | 13658 | 13659 | 13660 | 13661 | 13662 | 13663 | 13664 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ENSG00000186081.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 |
| ENSG00000186847.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 |
| ENSG00000128422.11 | 0.5 | 1.22 | 0.86 | 1.18 | 0 | 1.06 | 0 | 1.39 | 4.33 | 2.78 | 5.39 | 2.07 | 0 | 0 | 0 | 4.6 | 0 | 0 | 0 | 0 |
| ENSG00000170477.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 |
| ENSG00000143536.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 |


## Hs_purkinje_single

In [22]:
df = get_gene_expression(rsems, experiment_names['Hs_purkinje_single'], contamination_genes)
df

| gene_id | 13625 | 13626 | 13627 | 13628 | 13629 | 13630 | 13631 | 13632 | 13633 | 13634 | 13635 | 13636 | 13637 | 13638 | 13639 | 13640 | 13641 | 13642 | 13643 | 13644 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ENSG00000186081.7 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 |
| ENSG00000186847.5 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 |
| ENSG00000128422.11 | 3.39 | 0 | 0.76 | 6.26 | 5.67 | 4.86 | 1.07 | 1.66 | 0.61 | 1.72 | 0 | 0 | 3.81 | 0.41 | 4.55 | 1.19 | 0 | 0.17 | 3.31 | 3.05 |
| ENSG00000170477.8 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 |
| ENSG00000143536.7 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 |


## Hs_asp_purkinje_UMB5294 poolsplit

In [23]:
df = get_gene_expression(rsems, experiment_names['Hs_asp_purkinje_UMB5294_poolsplit'], contamination_genes)
df

| gene_id | 13843 | 13844 | 13845 | 13846 | 13847 | 13848 | 13849 | 13850 | 13851 | 13852 | 13853 | 13854 | 13855 | 13856 | 13857 | 13858 | 13859 | 13860 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ENSG00000186081.7 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000186847.5 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000128422.11 | 0 | 0 | 0 | 0 | 0.3 | 0 | 0 | 1.09 | 0 | 0 | 0 | 0 | 6.48 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000170477.8 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000143536.7 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 |


## Hs_asp_purkinje_UMB5294 single

In [24]:
df = get_gene_expression(rsems, experiment_names['Hs_asp_purkinje_UMB5294_single'], contamination_genes)
df

| gene_id | 13824 | 13825 | 13826 | 13827 | 13828 | 13829 | 13830 | 13831 | 13832 | 13833 | 13834 | 13835 | 13836 | 13837 | 13838 | 13839 | 13840 | 13841 | 13842 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ENSG00000186081.7 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000186847.5 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000128422.11 | 0 | 5.53 | 2.26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000170477.8 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000143536.7 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [25]:
def build_all_correlations_for_poolsplit_single(name):
    """Compute all-vs-all library scores over all genes for the poolsplit and single experiments."""
    with pandas.HDFStore(name + '_poolsplit_FPKM.h5', 'r') as poolsplit_store:
        poolsplit_fpkm = poolsplit_store['/quantifications']
    with pandas.HDFStore(name + '_single_FPKM.h5', 'r') as single_store:
        single_fpkm = single_store['/quantifications']
    poolsplit = compute_all_vs_all_scores(poolsplit_fpkm)
    single = compute_all_vs_all_scores(single_fpkm)
    return (poolsplit, single)

In [26]:
def build_spike_correlations_for_poolsplit_single(name):
    """Compute all-vs-all library scores over the spike-in rows only."""
    def isspike(x):
        return x.startswith('gSpike')

    with pandas.HDFStore(name + '_poolsplit_FPKM.h5', 'r') as poolsplit_store:
        poolsplit_fpkm = poolsplit_store['/quantifications']
    with pandas.HDFStore(name + '_single_FPKM.h5', 'r') as single_store:
        single_fpkm = single_store['/quantifications']
    poolsplit = compute_all_vs_all_scores(poolsplit_fpkm[poolsplit_fpkm.index.map(isspike)])
    single = compute_all_vs_all_scores(single_fpkm[single_fpkm.index.map(isspike)])
    return (poolsplit, single)
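The madqc `compute_all_vs_all_scores` helper isn't shown here, so we don't know exactly how it defines `rafa_spearman` or `naive_spearman`. As a point of reference only, and assuming "naive" means a plain rank correlation, a Spearman correlation between libraries restricted to spike-in rows can be computed with `DataFrame.corr(method='spearman')`. The FPKM values below are synthetic:

```python
import pandas

# Synthetic genes x libraries FPKM table: four spike-in rows and two gene rows.
fpkm = pandas.DataFrame(
    {
        'lib_a': [1806.0, 903.0, 112.88, 14.11, 5.0, 0.0],
        'lib_b': [1900.0, 850.0, 120.0, 20.0, 0.0, 7.0],
        'lib_c': [10.0, 900.0, 2000.0, 5.0, 3.0, 1.0],
    },
    index=['gSpikein_ERCC-00002', 'gSpikein_ERCC-00004', 'gSpikein_ERCC-00003',
           'gSpikein_ERCC-00131', 'ENSG_A', 'ENSG_B'],
)

# Keep only spike-in rows, mirroring isspike() above.
spikes = fpkm[fpkm.index.str.startswith('gSpike')]

# Rank correlation between every pair of libraries (columns).
naive_spearman = spikes.corr(method='spearman')
print(naive_spearman)
```

`lib_a` and `lib_b` rank the four spike-ins identically, so their correlation is 1; `lib_c` swaps the order, giving 1 - 6 * 8 / (4 * 15) = 0.2.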

In [27]:
def score_upper_triangular(df):
    """Return the above-diagonal entries of a square score matrix as a Series."""
    i, j = numpy.triu_indices(len(df), k=1)
    return pandas.Series(df.values[i, j])

In [28]:
def make_merged_scores(score_tuple, score_name):
    """Stack the above-diagonal scores of the poolsplit and single matrices into one long DataFrame."""
    poolsplit, single = score_tuple
    scores = []
    for name, df in [('poolsplit', poolsplit), ('single', single)]:
        matrix = df[score_name]
        for i, j in zip(*numpy.triu_indices(len(matrix), k=1)):
            scores.append((matrix.iloc[i, j], name))
    return pandas.DataFrame(scores, columns=[score_name, 'type'])
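Each pair of libraries appears once above the diagonal, so n libraries contribute n(n-1)/2 scores. A vectorized sketch of the same long-format merge (the `upper_triangle_long` name and the 3 x 3 matrices are hypothetical) indexes all above-diagonal cells at once instead of looping; the pair order matches the loop above:

```python
import numpy
import pandas

def upper_triangle_long(matrices, score_name):
    """Stack the above-diagonal entries of several square score matrices into long format."""
    frames = []
    for label, matrix in matrices.items():
        i, j = numpy.triu_indices(len(matrix), k=1)  # row-major pairs (0,1), (0,2), (1,2), ...
        frames.append(pandas.DataFrame({score_name: matrix.values[i, j], 'type': label}))
    return pandas.concat(frames, ignore_index=True)

# Hypothetical library-by-library score matrices.
poolsplit = pandas.DataFrame([[1.0, 0.9, 0.8], [0.9, 1.0, 0.7], [0.8, 0.7, 1.0]])
single = pandas.DataFrame([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
merged = upper_triangle_long({'poolsplit': poolsplit, 'single': single}, 'naive_spearman')
print(merged)
```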

In [29]:
hs_purkinje_all_scores = build_all_correlations_for_poolsplit_single('Hs_purkinje')
#hs_asp_purkinje = build_correlations_for_poolsplit_single('Hs_asp_purkinje_UMB')
mm_purkinje_all_scores = build_all_correlations_for_poolsplit_single('Mm_purkinje')
mm_pyramidal_all_scores = build_all_correlations_for_poolsplit_single('Mm_layer_V_pyramidal')

In [30]:
hs_purkinje_spike_scores = build_spike_correlations_for_poolsplit_single('Hs_purkinje')
#hs_asp_purkinje = build_correlations_for_poolsplit_single('Hs_asp_purkinje_UMB')
mm_purkinje_spike_scores = build_spike_correlations_for_poolsplit_single('Mm_purkinje')
mm_pyramidal_spike_scores = build_spike_correlations_for_poolsplit_single('Mm_layer_V_pyramidal')

# Spike Correlation Histograms

## Rafa Spearman ERCC Spike Correlation Histograms

### Hs_purkinje ERCC spike Rafa Spearman histogram

<a href="#Index">Back to index</a>

In [31]:
show(bokeh.charts.Histogram(make_merged_scores(hs_purkinje_spike_scores, 'rafa_spearman'), 
                            color='type', legend='top_left', title='Hs_purkinje ERCC Spikes Rafa Spearman'))


### Mm_purkinje ERCC spike Rafa Spearman histogram

<a href="#Index">Back to index</a>

In [32]:
show(bokeh.charts.Histogram(make_merged_scores(mm_purkinje_spike_scores, 'rafa_spearman'), 
                            color='type', legend='top_left', title='Mm_purkinje ERCC Spikes Rafa Spearman'))


### Mm_layer_V_pyramidal ERCC spike Rafa Spearman histogram

<a href="#Index">Back to index</a>

In [33]:
show(bokeh.charts.Histogram(make_merged_scores(mm_pyramidal_spike_scores, 'rafa_spearman'), 
                            color='type', legend='top_left', title='Mm_layer_V_pyramidal ERCC Spikes Rafa Spearman'))


## Naive Spearman ERCC Spike Correlation Histograms

### Hs_purkinje ERCC spike naive Spearman histogram

<a href="#Index">Back to index</a>

In [34]:
show(bokeh.charts.Histogram(make_merged_scores(hs_purkinje_spike_scores, 'naive_spearman'), 
                            color='type', legend='top_left', title='Hs_purkinje ERCC Spikes naive Spearman'))


### Mm_purkinje ERCC spike naive Spearman histogram

<a href="#Index">Back to index</a>

In [35]:
show(bokeh.charts.Histogram(make_merged_scores(mm_purkinje_spike_scores, 'naive_spearman'), 
                            color='type', legend='top_left', title='Mm_purkinje ERCC Spikes naive Spearman'))


### Mm_layer_V_pyramidal ERCC spike naive Spearman histogram

<a href="#Index">Back to index</a>

In [36]:
show(bokeh.charts.Histogram(make_merged_scores(mm_pyramidal_spike_scores, 'naive_spearman'), 
                            color='type', legend='top_left', title='Mm_layer_V_pyramidal ERCC Spikes naive Spearman'))


# All genes correlation histograms

## Rafa Spearman Correlation Histograms

<a href="#Index">Back to index</a>

In [37]:
show(bokeh.charts.Histogram(make_merged_scores(hs_purkinje_all_scores, 'rafa_spearman'), 
                            color='type', legend='top_left', title='Hs_purkinje all rafa spearman'))


In [38]:
show(bokeh.charts.Histogram(make_merged_scores(mm_purkinje_all_scores, 'rafa_spearman'), 
                            color='type', legend='top_left', title='Mm_purkinje all rafa spearman'))


In [39]:
show(bokeh.charts.Histogram(make_merged_scores(mm_pyramidal_all_scores, 'rafa_spearman'), 
                            color='type', legend='top_left', title='Mm_layer_V_pyramidal all rafa spearman'))


## Naive Spearman Correlation Histograms

<a href="#Index">Back to index</a>

In [40]:
show(bokeh.charts.Histogram(make_merged_scores(hs_purkinje_all_scores, 'naive_spearman'), 
                            color='type', legend='top_left', title='Hs_purkinje all naive Spearman'))


In [41]:
show(bokeh.charts.Histogram(make_merged_scores(mm_purkinje_all_scores, 'naive_spearman'), 
                            color='type', legend='top_left', title='Mm_purkinje all naive Spearman'))


In [42]:
show(bokeh.charts.Histogram(make_merged_scores(mm_pyramidal_all_scores, 'naive_spearman'), 
                            color='type', legend='top_left', title='Mm_layer_V_pyramidal all naive Spearman'))


# Genes Detected

<a href="#Index">Back to index</a>

In [43]:
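def genes_detected_histogram(experiment_name, threshold=0):
    """Histogram, over genes, of how many libraries detect each gene (FPKM > threshold)."""
    with pandas.HDFStore(experiment_name + '_FPKM.h5', 'r') as store:
        q = store['/quantifications']
    # q is genes x libraries, so counting along axis=1 gives, for each gene,
    # the number of libraries in which it was detected.
    genes_detected = q[q > threshold].count(axis=1)
    genes_detected.name = 'Genes Detected'
    return bokeh.charts.Histogram(genes_detected, title=experiment_name)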
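Because the quantification table is genes × libraries, `count(axis=1)` counts, for each gene, the libraries that detect it; counting down the columns (`axis=0`) would instead give the number of genes detected per library. A small synthetic example showing both:

```python
import pandas

# Synthetic genes x libraries FPKM table.
q = pandas.DataFrame(
    {'lib1': [0.0, 2.0, 5.0], 'lib2': [1.0, 0.0, 4.0], 'lib3': [0.0, 0.0, 3.0]},
    index=['geneA', 'geneB', 'geneC'],
)

threshold = 0
# q[q > threshold] turns non-detections into NaN, and count() skips NaN.
libraries_per_gene = q[q > threshold].count(axis=1)  # what genes_detected_histogram plots
genes_per_library = q[q > threshold].count(axis=0)   # the per-library alternative
print(libraries_per_gene.to_dict())
print(genes_per_library.to_dict())
```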

In [44]:
show(genes_detected_histogram('Hs_purkinje_poolsplit'))

In [45]:
show(genes_detected_histogram('Hs_purkinje_single'))

In [46]:
show(genes_detected_histogram('Mm_layer_V_pyramidal_poolsplit'))
show(genes_detected_histogram('Mm_layer_V_pyramidal_single'))

In [47]:
show(genes_detected_histogram('Mm_purkinje_poolsplit'))
show(genes_detected_histogram('Mm_purkinje_single'))

# Hs_purkinje_poolsplit Covariance

<a href="#Index">Back to index</a>

In [48]:
store = pandas.HDFStore('Hs_purkinje_poolsplit_FPKM.h5', 'r')
q = store['/quantifications']
store.close()

Compute the library-by-library covariance of the original (unshuffled) data:

In [49]:
q.cov()

| library | 13645 | 13646 | 13647 | 13648 | 13649 | 13650 | 13651 | 13652 | 13653 | 13654 | 13655 | 13656 | 13657 | 13658 | 13659 | 13660 | 13661 | 13662 | 13663 | 13664 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 13645 | 54592.017917 | 60896.788047 | 55670.680715 | 51849.86933 | 55283.802642 | 55109.557565 | 55528.154426 | 51953.000504 | 59992.981652 | 56009.737842 | 59517.216922 | 54562.463596 | 58542.462882 | 51475.429507 | 61575.265364 | 57816.337431 | 62800.503012 | 60392.437258 | 52912.524373 | 55585.135283 |
| 13646 | 60896.788047 | 68605.095655 | 62410.068441 | 58059.432175 | 61796.794864 | 61624.024835 | 62183.287207 | 58163.510672 | 67227.441827 | 62763.596471 | 66755.069859 | 61097.79511 | 65544.474304 | 57511.931721 | 68994.283674 | 64807.434883 | 70484.413908 | 67707.533498 | 59282.154699 | 62160.295368 |
| 13647 | 55670.680715 | 62410.068441 | 57419.229252 | 53234.760063 | 56445.219878 | 56404.30295 | 57077.403372 | 53156.111164 | 61525.560784 | 57458.148254 | 61036.211538 | 55905.218581 | 59955.830936 | 52713.075582 | 63099.696247 | 59212.449362 | 64532.680681 | 62027.009411 | 54507.113228 | 56760.743749 |
| 13648 | 51849.86933 | 58059.432175 | 53234.760063 | 50113.032733 | 52966.335491 | 52772.451292 | 53525.505954 | 49596.687906 | 57663.623365 | 54062.097551 | 57543.867748 | 52301.586885 | 56149.131868 | 49489.083525 | 59323.313774 | 55566.810261 | 60678.226502 | 58319.401967 | 50836.411234 | 52961.499913 |
| 13649 | 55283.802642 | 61796.794864 | 56445.219878 | 52966.335491 | 57373.453999 | 56342.343146 | 56779.707442 | 53130.834938 | 61415.339856 | 57695.860171 | 61477.378327 | 55700.285193 | 59880.220197 | 52983.553625 | 63134.42082 | 59347.904408 | 64513.087026 | 61903.195789 | 53542.093335 | 56973.780987 |
| 13650 | 55109.557565 | 61624.024835 | 56404.30295 | 52772.451292 | 56342.343146 | 56512.725654 | 56585.681166 | 52577.654066 | 61319.417643 | 57364.873152 | 60962.681142 | 55422.659852 | 59630.910057 | 52466.992254 | 63042.790181 | 59054.411512 | 64323.199434 | 61856.616848 | 53639.981752 | 56438.515435 |
| 13651 | 55528.154426 | 62183.287207 | 57077.403372 | 53525.505954 | 56779.707442 | 56585.681166 | 58425.620395 | 53119.607344 | 62274.925688 | 58461.151247 | 62482.105127 | 56138.215102 | 60599.971145 | 53131.432687 | 64251.195833 | 59954.453451 | 66157.772976 | 63175.461926 | 54775.776166 | 56664.326203 |
| 13652 | 51953.000504 | 58163.510672 | 53156.111164 | 49596.687906 | 53130.834938 | 52577.654066 | 53119.607344 | 50140.878856 | 57225.692293 | 53574.905835 | 56962.208279 | 52188.468687 | 55891.055512 | 49378.95989 | 58654.554642 | 55260.933899 | 59901.519594 | 57567.093483 | 50432.013542 | 53245.137195 |
| 13653 | 59992.981652 | 67227.441827 | 61525.560784 | 57663.623365 | 61415.339856 | 61319.417643 | 62274.925688 | 57225.692293 | 67876.494355 | 63241.521385 | 67506.158131 | 60665.733928 | 65388.617765 | 57275.682536 | 69573.506341 | 64935.381287 | 71342.316291 | 68304.406473 | 58729.541752 | 61394.291475 |
| 13654 | 56009.737842 | 62763.596471 | 57458.148254 | 54062.097551 | 57695.860171 | 57364.873152 | 58461.151247 | 53574.905835 | 63241.521385 | 59999.009472 | 63793.421452 | 56738.303854 | 61399.96079 | 53832.77346 | 65361.911262 | 60976.184343 | 67171.990724 | 64226.500189 | 54879.115925 | 57445.958058 |


# Hs_purkinje_poolsplit shuffled Covariance

<a href="#Index">Back to index</a>

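Make bad data by randomly permuting the FPKM values within two libraries (13648 and 13652). This breaks the pairing between genes and values in those libraries while leaving each library's set of values unchanged.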
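A synthetic sketch of why this works as a negative control: libraries that share an expression profile covary strongly, and permuting one library keeps its variance (the diagonal entry) but drives its covariance with every other library toward zero, which is the pattern in the shuffled table below. The data and library names here are made up, and correlation is printed instead of covariance so the scale is easy to read:

```python
import numpy
import pandas

rng = numpy.random.default_rng(0)

# Synthetic "libraries" that share a common expression profile plus small noise.
base = rng.normal(100, 20, size=5000)
q = pandas.DataFrame({f'lib{k}': base + rng.normal(0, 2, size=base.size) for k in range(3)})

qbad = q.copy()
# Permute one library: same values, but the gene-to-value pairing is lost.
qbad['lib1'] = rng.permutation(qbad['lib1'].values)

print(q.corr().round(3))
print(qbad.corr().round(3))
```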

In [50]:
qbad = q.copy()

In [51]:
qbad['13648'] = numpy.random.permutation(qbad['13648'])
qbad['13652'] = numpy.random.permutation(qbad['13652'])

In [52]:
qbad.cov()

| library | 13645 | 13646 | 13647 | 13648 | 13649 | 13650 | 13651 | 13652 | 13653 | 13654 | 13655 | 13656 | 13657 | 13658 | 13659 | 13660 | 13661 | 13662 | 13663 | 13664 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 13645 | 54592.017917 | 60896.788047 | 55670.680715 | -27.045642 | 55283.802642 | 55109.557565 | 55528.154426 | 40.766074 | 59992.981652 | 56009.737842 | 59517.216922 | 54562.463596 | 58542.462882 | 51475.429507 | 61575.265364 | 57816.337431 | 62800.503012 | 60392.437258 | 52912.524373 | 55585.135283 |
| 13646 | 60896.788047 | 68605.095655 | 62410.068441 | -31.625653 | 61796.794864 | 61624.024835 | 62183.287207 | 44.446824 | 67227.441827 | 62763.596471 | 66755.069859 | 61097.79511 | 65544.474304 | 57511.931721 | 68994.283674 | 64807.434883 | 70484.413908 | 67707.533498 | 59282.154699 | 62160.295368 |
| 13647 | 55670.680715 | 62410.068441 | 57419.229252 | -36.803209 | 56445.219878 | 56404.30295 | 57077.403372 | 28.150679 | 61525.560784 | 57458.148254 | 61036.211538 | 55905.218581 | 59955.830936 | 52713.075582 | 63099.696247 | 59212.449362 | 64532.680681 | 62027.009411 | 54507.113228 | 56760.743749 |
| 13648 | -27.045642 | -31.625653 | -36.803209 | 50113.032733 | -29.97978 | -32.252252 | -39.034575 | -77.323181 | -41.444333 | -14.252967 | -39.801261 | -29.227755 | -31.453468 | -14.165242 | -34.269352 | -31.009591 | -37.970513 | -17.641581 | -40.309045 | -26.447737 |
| 13649 | 55283.802642 | 61796.794864 | 56445.219878 | -29.97978 | 57373.453999 | 56342.343146 | 56779.707442 | 53.832495 | 61415.339856 | 57695.860171 | 61477.378327 | 55700.285193 | 59880.220197 | 52983.553625 | 63134.42082 | 59347.904408 | 64513.087026 | 61903.195789 | 53542.093335 | 56973.780987 |
| 13650 | 55109.557565 | 61624.024835 | 56404.30295 | -32.252252 | 56342.343146 | 56512.725654 | 56585.681166 | 29.171132 | 61319.417643 | 57364.873152 | 60962.681142 | 55422.659852 | 59630.910057 | 52466.992254 | 63042.790181 | 59054.411512 | 64323.199434 | 61856.616848 | 53639.981752 | 56438.515435 |
| 13651 | 55528.154426 | 62183.287207 | 57077.403372 | -39.034575 | 56779.707442 | 56585.681166 | 58425.620395 | 53.969147 | 62274.925688 | 58461.151247 | 62482.105127 | 56138.215102 | 60599.971145 | 53131.432687 | 64251.195833 | 59954.453451 | 66157.772976 | 63175.461926 | 54775.776166 | 56664.326203 |
| 13652 | 40.766074 | 44.446824 | 28.150679 | -77.323181 | 53.832495 | 29.171132 | 53.969147 | 50140.878856 | 44.827869 | 56.070124 | 49.528878 | 15.900914 | 56.236911 | 47.417225 | 51.412262 | 39.875613 | 41.08596 | 39.090537 | 32.712637 | 55.615983 |
| 13653 | 59992.981652 | 67227.441827 | 61525.560784 | -41.444333 | 61415.339856 | 61319.417643 | 62274.925688 | 44.827869 | 67876.494355 | 63241.521385 | 67506.158131 | 60665.733928 | 65388.617765 | 57275.682536 | 69573.506341 | 64935.381287 | 71342.316291 | 68304.406473 | 58729.541752 | 61394.291475 |
| 13654 | 56009.737842 | 62763.596471 | 57458.148254 | -14.252967 | 57695.860171 | 57364.873152 | 58461.151247 | 56.070124 | 63241.521385 | 59999.009472 | 63793.421452 | 56738.303854 | 61399.96079 | 53832.77346 | 65361.911262 | 60976.184343 | 67171.990724 | 64226.500189 | 54879.115925 | 57445.958058 |


# Hs_purkinje_poolsplit shuffled Genes Detected

<a href="#Index">Back to index</a>

How does the genes-detected metric behave on the shuffled data?
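Permuting values within a library keeps the same number of nonzero entries in that library, so a per-library genes-detected count cannot flag this kind of corruption. The per-gene count used here does change, because detections move onto different genes. A synthetic sketch (random data, hypothetical library names) checking both properties:

```python
import numpy
import pandas

rng = numpy.random.default_rng(1)

# Synthetic genes x libraries table where roughly half the entries are zero (undetected).
raw = numpy.maximum(rng.normal(5, 2, size=(200, 4)), 0)
detected = rng.random((200, 4)) < 0.5
q = pandas.DataFrame(raw * detected, columns=['lib0', 'lib1', 'lib2', 'lib3'])

qbad = q.copy()
qbad['lib1'] = rng.permutation(qbad['lib1'].values)

# Genes detected per library: unchanged, since a permuted column keeps its values.
per_library_before = (q > 0).sum(axis=0)
per_library_after = (qbad > 0).sum(axis=0)

# Libraries detecting each gene: changes, since detections move between genes.
per_gene_before = (q > 0).sum(axis=1)
per_gene_after = (qbad > 0).sum(axis=1)

print(per_library_before.equals(per_library_after))
print((per_gene_before != per_gene_after).sum(), 'genes changed their detection count')
```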

In [53]:
genes_detected = qbad[qbad > 0].count(axis=1)
genes_detected.name = 'Genes Detected'
show(bokeh.charts.Histogram(genes_detected))

In [54]:
spikes_detected = genes_detected[genes_detected.index.map(lambda x: x.startswith('gSpikein'))]

In [55]:
spikes_detected[spikes_detected > 18]

gene_id
gSpikein_ERCC-00060    19
gSpikein_ERCC-00096    20
gSpikein_ERCC-00108    20
gSpikein_ERCC-00130    19
gSpikein_ERCC-00131    19
gSpikein_ERCC-00145    20
Name: Genes Detected, dtype: int64

In [56]:
spike_concentration = get_single_spike_cpc()

In [57]:
names = []
for name in spikes_detected[spikes_detected  >= 18].index:
    names.append((name, spikes_detected[name], spike_concentration[name]))
sorted_names = sorted(names, key=lambda x: x[2])
print(len(sorted_names))
for n in sorted_names:
    print('{} {} {:>8.2f}'.format(*n))

22
gSpikein_ERCC-00044 18    14.11
gSpikein_ERCC-00095 18    14.11
gSpikein_ERCC-00131 19    14.11
gSpikein_ERCC-00022 18    28.22
gSpikein_ERCC-00060 19    28.22
gSpikein_ERCC-00076 18    28.22
gSpikein_ERCC-00092 18    28.22
gSpikein_ERCC-00042 18    56.44
gSpikein_ERCC-00043 18    56.44
gSpikein_ERCC-00111 18    56.44
gSpikein_ERCC-00003 18   112.88
gSpikein_ERCC-00009 18   112.88
gSpikein_ERCC-00108 20   112.88
gSpikein_ERCC-00145 20   112.88
gSpikein_ERCC-00136 18   225.75
gSpikein_ERCC-00046 18   451.50
gSpikein_ERCC-00113 18   451.50
gSpikein_ERCC-00004 18   903.00
gSpikein_ERCC-00002 18  1806.00
gSpikein_ERCC-00074 18  1806.00
gSpikein_ERCC-00096 20  1806.00
gSpikein_ERCC-00130 19  3612.00


In [58]:
len(spike_concentration[spike_concentration > 10])

26

In [59]:
q[q.index == 'gSpikein_ERCC-00116']

| gene_id | 13645 | 13646 | 13647 | 13648 | 13649 | 13650 | 13651 | 13652 | 13653 | 13654 | 13655 | 13656 | 13657 | 13658 | 13659 | 13660 | 13661 | 13662 | 13663 | 13664 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| gSpikein_ERCC-00116 | 5.08 | 4.69 | 0.12 | 7.13 | 15.59 | 0.32 | 0 | 13.14 | 6.17 | 0 | 8.02 | 5.3 | 0 | 10.89 | 15.77 | 0 | 12.16 | 15.23 | 5.2 | 0 |


In [60]:
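# Caution: (spike_concentration > 1000).index holds every spike-in ID, so all spike-ins are kept; for cpc > 1000 use q[q.index.isin(spike_concentration[spike_concentration > 1000].index)]
q[q.index.map(lambda x: x in (spike_concentration > 1000).index)]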

Unnamed: 0_level_0,13645,13646,13647,13648,13649,13650,13651,13652,13653,13654,13655,13656,13657,13658,13659,13660,13661,13662,13663,13664
gene_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
gSpikein_ERCC-00002,895.12,853.62,917.74,1251.12,1566.51,1347.99,1906.75,1062.25,1897.22,1899.75,2426.33,1076.80,1531.66,1771.68,1910.55,1627.40,2753.27,2097.75,1202.83,1046.50
gSpikein_ERCC-00003,28.15,70.10,60.51,98.80,90.54,165.73,181.10,71.58,118.10,123.05,175.37,94.35,111.86,126.40,153.44,80.76,249.63,112.32,87.18,89.09
gSpikein_ERCC-00004,421.71,381.82,356.88,574.31,803.20,537.84,892.14,462.32,889.64,898.49,1064.84,539.61,847.06,788.43,925.52,760.07,1381.76,1079.06,450.57,487.63
gSpikein_ERCC-00009,88.86,40.02,73.94,146.65,91.50,59.30,137.71,68.93,52.51,147.06,111.10,64.84,42.06,134.23,29.17,98.54,100.58,106.37,78.39,56.91
gSpikein_ERCC-00012,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,6.94,0.00
gSpikein_ERCC-00013,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
gSpikein_ERCC-00014,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.14,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
gSpikein_ERCC-00016,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
gSpikein_ERCC-00017,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
gSpikein_ERCC-00019,0.00,0.00,4.21,0.00,18.50,1.96,11.39,5.81,18.45,0.00,0.00,5.83,0.00,10.45,11.29,0.00,0.36,34.19,6.33,0.00
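The filter in In [60] hinges on a pandas detail: comparing a Series to a scalar returns a boolean Series with the *same* index, so taking `.index` of the mask gives back every label, not just the ones that passed. A minimal sketch with made-up concentrations (not the ERCC values above):

```python
import pandas as pd

# Toy spike concentrations; labels and values are made up for illustration.
conc = pd.Series({"ERCC-A": 5.0, "ERCC-B": 1500.0, "ERCC-C": 30000.0})

mask = conc > 1000
# The boolean mask keeps every label, so its index is the full index.
all_labels = mask.index
# Applying the mask first keeps only the labels above the threshold.
high_labels = conc[mask].index

print(list(all_labels))   # ['ERCC-A', 'ERCC-B', 'ERCC-C']
print(list(high_labels))  # ['ERCC-B', 'ERCC-C']
```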


In [61]:
# One single-column TPM table per library, keyed by library id (the last 5 characters of the key)
tpms = { k[-5:]: rsems[k][['gene_id', 'TPM']].set_index('gene_id') for k in list(rsems.keys()) if k.startswith('/genes') }

In [62]:
for k in tpms:
    tpms[k].columns = [k]

In [63]:
fulltable = pandas.concat(tpms.values(), axis=1)

In [64]:
# Keep only the spike-in rows
tpmspikes = fulltable[fulltable.index.map(lambda x: x.startswith('gSpikein'))]

In [65]:
# Fraction of each library's total TPM that comes from spike-ins
spikesfrac = tpmspikes.sum(axis=0) / fulltable.sum(axis=0)

In [66]:
spikesfrac[spikesfrac > .05]

13835    0.080269
13839    0.082819
13853    0.059240
13850    0.136319
13854    0.063183
13852    0.053719
13633    0.119541
13845    0.054122
13849    0.064380
13830    0.061038
dtype: float64
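The spike fraction is each library's summed spike-in TPM divided by its total TPM (spikes included), so a value of 0.08 means 8% of the library's signal is spike-in. A toy sketch with hypothetical libraries and values, not taken from this dataset:

```python
import pandas as pd

# Hypothetical TPM table: rows are genes/spikes, columns are libraries.
tpm = pd.DataFrame(
    {"lib1": [400000.0, 500000.0, 60000.0, 40000.0],
     "lib2": [450000.0, 530000.0, 15000.0, 5000.0]},
    index=["geneA", "geneB", "gSpikein_ERCC-X", "gSpikein_ERCC-Y"],
)

# Boolean array marking spike-in rows by their id prefix.
is_spike = tpm.index.str.startswith("gSpikein")
# Share of each library's total TPM contributed by spike-ins.
spike_frac = tpm[is_spike].sum(axis=0) / tpm.sum(axis=0)

# Libraries where spikes account for more than 5% of the signal.
flagged = spike_frac[spike_frac > 0.05]
print(flagged)  # only lib1 (0.1); lib2 is 0.02
```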

In [67]:
spikesfrac['15283']

0.03584150860196203

In [68]:
fpkms = { k[-5:]: rsems[k][['gene_id', 'FPKM']].set_index('gene_id') for k in list(rsems.keys()) if k.startswith('/genes') }

In [69]:
for k in fpkms:
    fpkms[k].columns = [k]
fullfpkms = pandas.concat(fpkms.values(), axis=1)

In [70]:
fpkmspikes = fullfpkms[fullfpkms.index.map(lambda x: x.startswith('gSpikein'))]

In [71]:
# Per-library mean spike FPKM, using only spikes detected (FPKM > 0) in every library
fpkmspikemean = fpkmspikes[fpkmspikes > 0].dropna(how='any').mean(axis=0)

In [72]:
fpkmspikemean[fpkmspikemean > 1000]

13857    2191.922667
13855    2138.768000
13844    1212.559333
13828    1413.066000
13835    3894.807333
13636    1474.187333
13837    1008.314000
13839    4141.052000
13843    1423.693333
13853    3494.657333
13848    1664.484000
13859    1413.497333
13661    1237.350667
13842    1102.703333
13850    8812.128000
13838    2104.490667
13846    1415.219333
13840    1730.453333
13851    1964.186667
13629    1435.518667
13655    1172.741333
13836    1228.704667
13854    3568.868667
13831    1338.320000
13856    1839.377333
13858    2103.948667
13834    1280.256000
13852    3219.029333
13824    1523.095333
13662    1033.822000
13832    1317.382667
15283    2202.813333
13626    1158.324667
13833    2040.344667
13633    6112.889333
13635    1332.896000
13845    3337.502000
13849    3679.216000
13860    2104.002000
13841    1179.164000
13637    1048.988000
13830    3273.722667
dtype: float64
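`fpkmspikes[fpkmspikes > 0].dropna(how='any')` turns zeros into NaN and then drops any spike that is missing in at least one library, so every library is averaged over the same set of spikes. A toy sketch with made-up values:

```python
import pandas as pd

# Made-up spike FPKMs for two libraries.
fpkm = pd.DataFrame(
    {"lib1": [100.0, 0.0, 20.0],
     "lib2": [300.0, 5.0, 40.0]},
    index=["gSpikein_A", "gSpikein_B", "gSpikein_C"],
)

# Zeros become NaN; gSpikein_B is undetected in lib1, so dropna removes that row entirely.
shared = fpkm[fpkm > 0].dropna(how="any")
# Column means over the shared spikes only: lib1 -> 60.0, lib2 -> 170.0
spike_mean = shared.mean(axis=0)
print(spike_mean)
```

Because a single undetected spike removes that row for every library, the set of shared spikes can shrink quickly as more libraries are added.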

In [73]:
hs_purkinje_poolsplit = get_gene_expression(rsems, experiment_names['Hs_purkinje_poolsplit'])

In [74]:
hs_purkinje_poolsplit_stable = compute_all_vs_all_scores(hs_purkinje_poolsplit)
hs_purkinje_poolsplit_stable['rafa_spearman']

Unnamed: 0,13645,13646,13647,13648,13649,13650,13651,13652,13653,13654,13655,13656,13657,13658,13659,13660,13661,13662,13663,13664
13645,,0.77774,0.762975,0.751179,0.730165,0.747416,0.70889,0.758559,0.727958,0.715513,0.681474,0.744802,0.727513,0.700966,0.708444,0.735445,0.683995,0.696985,0.734772,0.761446
13646,0.77774,,0.779696,0.75205,0.734792,0.752961,0.716179,0.762697,0.742698,0.713134,0.692597,0.761205,0.731197,0.713415,0.722162,0.741574,0.692212,0.700302,0.740642,0.77175
13647,0.762975,0.779696,,0.752918,0.722809,0.741193,0.717945,0.752549,0.729259,0.714934,0.682955,0.757721,0.730912,0.710068,0.719535,0.736208,0.68553,0.693936,0.73979,0.751817
13648,0.751179,0.75205,0.752918,,0.721738,0.728062,0.709073,0.748225,0.717127,0.710372,0.676382,0.74072,0.708798,0.70312,0.705224,0.720491,0.673362,0.686274,0.724567,0.743918
13649,0.730165,0.734792,0.722809,0.721738,,0.714646,0.683842,0.726255,0.704792,0.699151,0.670951,0.722042,0.698809,0.688029,0.692119,0.71902,0.663205,0.671067,0.699156,0.732924
13650,0.747416,0.752961,0.741193,0.728062,0.714646,,0.695098,0.739429,0.720378,0.698942,0.670203,0.735235,0.712604,0.693373,0.7051,0.717361,0.676734,0.691438,0.721417,0.745493
13651,0.70889,0.716179,0.717945,0.709073,0.683842,0.695098,,0.711097,0.693044,0.677343,0.664223,0.704856,0.693971,0.668435,0.681958,0.691535,0.66243,0.671454,0.694606,0.705987
13652,0.758559,0.762697,0.752549,0.748225,0.726255,0.739429,0.711097,,0.719249,0.709261,0.690597,0.747582,0.720802,0.703365,0.701886,0.738539,0.690006,0.692637,0.732679,0.748526
13653,0.727958,0.742698,0.729259,0.717127,0.704792,0.720378,0.693044,0.719249,,0.696884,0.67694,0.726695,0.704446,0.69166,0.707613,0.721635,0.676245,0.682451,0.705101,0.725875
13654,0.715513,0.713134,0.714934,0.710372,0.699151,0.698942,0.677343,0.709261,0.696884,,0.663537,0.713718,0.692166,0.671325,0.691326,0.700288,0.660798,0.670922,0.696362,0.710748


In [75]:
show(bokeh.charts.Histogram(score_upper_triangular(hs_purkinje_poolsplit_stable['rafa_spearman'])))
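`score_upper_triangular` is defined elsewhere in the notebook. The sketch below is a hypothetical stand-in, assuming it returns the strictly upper-triangular entries of the symmetric all-vs-all matrix, i.e. one value per library pair (n(n-1)/2 values, so 190 for 20 libraries). Binning with `np.histogram` keeps the sketch independent of `bokeh.charts`, which current Bokeh releases no longer ship.

```python
import numpy as np
import pandas as pd

def upper_triangle_values(scores: pd.DataFrame) -> np.ndarray:
    """Hypothetical stand-in for score_upper_triangular: entries above the
    diagonal of a square, symmetric score matrix (diagonal excluded)."""
    values = scores.to_numpy(dtype=float)
    rows, cols = np.triu_indices(values.shape[0], k=1)
    return values[rows, cols]

libs = ["a", "b", "c"]
scores = pd.DataFrame(
    [[np.nan, 0.78, 0.76],
     [0.78, np.nan, 0.75],
     [0.76, 0.75, np.nan]],
    index=libs, columns=libs,
)

pairs = upper_triangle_values(scores)       # array([0.78, 0.76, 0.75])
counts, edges = np.histogram(pairs, bins=3)  # counts per bin, plus the 4 bin edges
```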

In [76]:
hs_purkinje_poolsplit_plus_one = get_gene_expression(rsems, experiment_names['Hs_purkinje_poolsplit'] + ['13843'])
hs_purkinje_poolsplit_plus_one_stable = compute_all_vs_all_scores(hs_purkinje_poolsplit_plus_one)
hs_purkinje_poolsplit_plus_one_stable['rafa_spearman']

Unnamed: 0,13645,13646,13647,13648,13649,13650,13651,13652,13653,13654,...,13656,13657,13658,13659,13660,13661,13662,13663,13664,13843
13645,,0.77774,0.762975,0.751179,0.730165,0.747416,0.70889,0.758559,0.727958,0.715513,...,0.744802,0.727513,0.700966,0.708444,0.735445,0.683995,0.696985,0.734772,0.761446,0.629931
13646,0.77774,,0.779696,0.75205,0.734792,0.752961,0.716179,0.762697,0.742698,0.713134,...,0.761205,0.731197,0.713415,0.722162,0.741574,0.692212,0.700302,0.740642,0.77175,0.635474
13647,0.762975,0.779696,,0.752918,0.722809,0.741193,0.717945,0.752549,0.729259,0.714934,...,0.757721,0.730912,0.710068,0.719535,0.736208,0.68553,0.693936,0.73979,0.751817,0.631147
13648,0.751179,0.75205,0.752918,,0.721738,0.728062,0.709073,0.748225,0.717127,0.710372,...,0.74072,0.708798,0.70312,0.705224,0.720491,0.673362,0.686274,0.724567,0.743918,0.623006
13649,0.730165,0.734792,0.722809,0.721738,,0.714646,0.683842,0.726255,0.704792,0.699151,...,0.722042,0.698809,0.688029,0.692119,0.71902,0.663205,0.671067,0.699156,0.732924,0.611924
13650,0.747416,0.752961,0.741193,0.728062,0.714646,,0.695098,0.739429,0.720378,0.698942,...,0.735235,0.712604,0.693373,0.7051,0.717361,0.676734,0.691438,0.721417,0.745493,0.610524
13651,0.70889,0.716179,0.717945,0.709073,0.683842,0.695098,,0.711097,0.693044,0.677343,...,0.704856,0.693971,0.668435,0.681958,0.691535,0.66243,0.671454,0.694606,0.705987,0.617395
13652,0.758559,0.762697,0.752549,0.748225,0.726255,0.739429,0.711097,,0.719249,0.709261,...,0.747582,0.720802,0.703365,0.701886,0.738539,0.690006,0.692637,0.732679,0.748526,0.637006
13653,0.727958,0.742698,0.729259,0.717127,0.704792,0.720378,0.693044,0.719249,,0.696884,...,0.726695,0.704446,0.69166,0.707613,0.721635,0.676245,0.682451,0.705101,0.725875,0.626544
13654,0.715513,0.713134,0.714934,0.710372,0.699151,0.698942,0.677343,0.709261,0.696884,,...,0.713718,0.692166,0.671325,0.691326,0.700288,0.660798,0.670922,0.696362,0.710748,0.617351


In [77]:
show(bokeh.charts.Histogram(score_upper_triangular(hs_purkinje_poolsplit_plus_one_stable['rafa_spearman'])))

In [78]:
mm_purkinje_poolsplit = get_gene_expression(rsems, experiment_names['Mm_purkinje_poolsplit'])

In [79]:
mm_purkinje_poolsplit_stable0 = compute_all_vs_all_scores(mm_purkinje_poolsplit, Acutoff=0)
mm_purkinje_poolsplit_stable0['rafa_spearman']

Unnamed: 0,15288,15289,15290,15291,15292,15293,15294,15295,15296,15297,15298,15299,15300,15301,15302,15303
15288,,0.857643,0.853034,0.852507,0.86261,0.856134,0.861973,0.855723,0.846385,0.849049,0.85769,0.854998,0.85805,0.858046,0.850122,0.86148
15289,0.857643,,0.858954,0.854077,0.864353,0.85658,0.865344,0.859745,0.854381,0.849741,0.861873,0.863684,0.860608,0.864294,0.852327,0.864887
15290,0.853034,0.858954,,0.856862,0.861287,0.853999,0.859612,0.858209,0.850137,0.845645,0.856899,0.857466,0.856482,0.855483,0.853051,0.863179
15291,0.852507,0.854077,0.856862,,0.86245,0.854837,0.857329,0.859585,0.847307,0.847246,0.859123,0.855576,0.860294,0.856708,0.853525,0.859847
15292,0.86261,0.864353,0.861287,0.86245,,0.864648,0.868334,0.866317,0.856004,0.850628,0.866016,0.864377,0.86552,0.868474,0.857016,0.870775
15293,0.856134,0.85658,0.853999,0.854837,0.864648,,0.864279,0.857339,0.849304,0.849376,0.860458,0.856455,0.859718,0.861099,0.850568,0.86278
15294,0.861973,0.865344,0.859612,0.857329,0.868334,0.864279,,0.869216,0.857357,0.851257,0.866245,0.860355,0.864508,0.868817,0.860069,0.871428
15295,0.855723,0.859745,0.858209,0.859585,0.866317,0.857339,0.869216,,0.85117,0.847095,0.862774,0.859247,0.863797,0.863251,0.857715,0.864068
15296,0.846385,0.854381,0.850137,0.847307,0.856004,0.849304,0.857357,0.85117,,0.849503,0.856479,0.855809,0.856432,0.856696,0.848447,0.858267
15297,0.849049,0.849741,0.845645,0.847246,0.850628,0.849376,0.851257,0.847095,0.849503,,0.852106,0.85049,0.852504,0.850364,0.845491,0.851428


In [81]:
show(bokeh.charts.Histogram(score_upper_triangular(mm_purkinje_poolsplit_stable0['rafa_spearman'])))

In [82]:
mm_purkinje_poolsplit_stable5 = compute_all_vs_all_scores(mm_purkinje_poolsplit, Acutoff=5)
mm_purkinje_poolsplit_stable5['rafa_spearman']

Unnamed: 0,15288,15289,15290,15291,15292,15293,15294,15295,15296,15297,15298,15299,15300,15301,15302,15303
15288,,0.839156,0.847677,0.83654,0.845841,0.844433,0.841417,0.833837,0.845291,0.825461,0.846015,0.837892,0.841029,0.843636,0.844036,0.83956
15289,0.839156,,0.843063,0.830479,0.849805,0.847704,0.84252,0.842392,0.833934,0.817725,0.842829,0.84711,0.835154,0.849189,0.839981,0.846128
15290,0.847677,0.843063,,0.84843,0.851911,0.850613,0.835508,0.841103,0.840491,0.833992,0.838381,0.840028,0.8472,0.841789,0.849642,0.84351
15291,0.83654,0.830479,0.84843,,0.852086,0.846391,0.839257,0.833778,0.834166,0.820625,0.842061,0.841099,0.836803,0.836135,0.847462,0.841297
15292,0.845841,0.849805,0.851911,0.852086,,0.855359,0.845945,0.852973,0.846529,0.829317,0.848444,0.844746,0.847243,0.853856,0.852722,0.853224
15293,0.844433,0.847704,0.850613,0.846391,0.855359,,0.853906,0.853926,0.839521,0.835003,0.848322,0.84431,0.844495,0.849653,0.855213,0.855215
15294,0.841417,0.84252,0.835508,0.839257,0.845945,0.853906,,0.847377,0.843034,0.817044,0.845271,0.840707,0.843572,0.849073,0.844235,0.84917
15295,0.833837,0.842392,0.841103,0.833778,0.852973,0.853926,0.847377,,0.838322,0.82087,0.8422,0.843667,0.847999,0.848454,0.848976,0.848878
15296,0.845291,0.833934,0.840491,0.834166,0.846529,0.839521,0.843034,0.838322,,0.843438,0.841077,0.833635,0.83811,0.844334,0.844778,0.835774
15297,0.825461,0.817725,0.833992,0.820625,0.829317,0.835003,0.817044,0.82087,0.843438,,0.833472,0.822202,0.825183,0.821703,0.825417,0.825997
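`compute_all_vs_all_scores` and its `rafa_spearman` score are not defined in this section. Purely as a point of comparison, here is a naive pairwise Spearman sketch that *assumes* `Acutoff` acts as a minimum-expression filter (a gene counts for a pair only if it is above the cutoff in both libraries). The helper names and that reading of `Acutoff` are assumptions, not the notebook's method.

```python
import numpy as np
import pandas as pd

def naive_spearman(x: pd.Series, y: pd.Series, cutoff: float = 0.0) -> float:
    """Spearman correlation over genes above `cutoff` in both libraries (hypothetical)."""
    keep = (x > cutoff) & (y > cutoff)
    # Spearman's rho is the Pearson correlation of the (average-tie) ranks.
    rx = x[keep].rank()
    ry = y[keep].rank()
    return float(np.corrcoef(rx, ry)[0, 1])

def all_vs_all(expr: pd.DataFrame, cutoff: float = 0.0) -> pd.DataFrame:
    """Symmetric library-by-library matrix of naive_spearman, NaN on the diagonal."""
    libs = list(expr.columns)
    out = pd.DataFrame(np.nan, index=libs, columns=libs)
    for i, a in enumerate(libs):
        for b in libs[i + 1:]:
            rho = naive_spearman(expr[a], expr[b], cutoff)
            out.loc[a, b] = out.loc[b, a] = rho
    return out

# Toy expression table (genes x libraries); values are made up.
expr = pd.DataFrame(
    {"l1": [0.0, 2.0, 8.0, 30.0, 100.0],
     "l2": [1.0, 3.0, 6.0, 40.0, 90.0],
     "l3": [50.0, 40.0, 7.0, 9.0, 1.0]},
    index=[f"g{i}" for i in range(5)],
)
res = all_vs_all(expr, cutoff=0.0)   # l1 vs l2 -> 1.0, l1 vs l3 -> -0.8
res5 = all_vs_all(expr, cutoff=5.0)  # fewer genes survive the filter
```

Raising the cutoff changes which genes enter each pair, which is one plausible reason the cutoff-0 and cutoff-5 matrices above differ.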


In [83]:
mm_purkinje_poolsplit_plus_one = get_gene_expression(rsems, experiment_names['Mm_purkinje_poolsplit'] + ['15362'])

In [84]:
mm_purkinje_poolsplit_plus_one_stable0 = compute_all_vs_all_scores(mm_purkinje_poolsplit_plus_one, Acutoff=0)
mm_purkinje_poolsplit_plus_one_stable0['rafa_spearman']

Unnamed: 0,15288,15289,15290,15291,15292,15293,15294,15295,15296,15297,15298,15299,15300,15301,15302,15303,15362
15288,,0.857643,0.853034,0.852507,0.86261,0.856134,0.861973,0.855723,0.846385,0.849049,0.85769,0.854998,0.85805,0.858046,0.850122,0.86148,0.698674
15289,0.857643,,0.858954,0.854077,0.864353,0.85658,0.865344,0.859745,0.854381,0.849741,0.861873,0.863684,0.860608,0.864294,0.852327,0.864887,0.70746
15290,0.853034,0.858954,,0.856862,0.861287,0.853999,0.859612,0.858209,0.850137,0.845645,0.856899,0.857466,0.856482,0.855483,0.853051,0.863179,0.702936
15291,0.852507,0.854077,0.856862,,0.86245,0.854837,0.857329,0.859585,0.847307,0.847246,0.859123,0.855576,0.860294,0.856708,0.853525,0.859847,0.691867
15292,0.86261,0.864353,0.861287,0.86245,,0.864648,0.868334,0.866317,0.856004,0.850628,0.866016,0.864377,0.86552,0.868474,0.857016,0.870775,0.696219
15293,0.856134,0.85658,0.853999,0.854837,0.864648,,0.864279,0.857339,0.849304,0.849376,0.860458,0.856455,0.859718,0.861099,0.850568,0.86278,0.691964
15294,0.861973,0.865344,0.859612,0.857329,0.868334,0.864279,,0.869216,0.857357,0.851257,0.866245,0.860355,0.864508,0.868817,0.860069,0.871428,0.704171
15295,0.855723,0.859745,0.858209,0.859585,0.866317,0.857339,0.869216,,0.85117,0.847095,0.862774,0.859247,0.863797,0.863251,0.857715,0.864068,0.699161
15296,0.846385,0.854381,0.850137,0.847307,0.856004,0.849304,0.857357,0.85117,,0.849503,0.856479,0.855809,0.856432,0.856696,0.848447,0.858267,0.689513
15297,0.849049,0.849741,0.845645,0.847246,0.850628,0.849376,0.851257,0.847095,0.849503,,0.852106,0.85049,0.852504,0.850364,0.845491,0.851428,0.693064


In [85]:
show(bokeh.charts.Histogram(score_upper_triangular(mm_purkinje_poolsplit_plus_one_stable0['rafa_spearman'])))

In [86]:
mm_purkinje_poolsplit_plus_one_stable5 = compute_all_vs_all_scores(mm_purkinje_poolsplit_plus_one, Acutoff=5)
mm_purkinje_poolsplit_plus_one_stable5['rafa_spearman']

Unnamed: 0,15288,15289,15290,15291,15292,15293,15294,15295,15296,15297,15298,15299,15300,15301,15302,15303,15362
15288,,0.839156,0.847677,0.83654,0.845841,0.844433,0.841417,0.833837,0.845291,0.825461,0.846015,0.837892,0.841029,0.843636,0.844036,0.83956,0.586238
15289,0.839156,,0.843063,0.830479,0.849805,0.847704,0.84252,0.842392,0.833934,0.817725,0.842829,0.84711,0.835154,0.849189,0.839981,0.846128,0.581148
15290,0.847677,0.843063,,0.84843,0.851911,0.850613,0.835508,0.841103,0.840491,0.833992,0.838381,0.840028,0.8472,0.841789,0.849642,0.84351,0.592111
15291,0.83654,0.830479,0.84843,,0.852086,0.846391,0.839257,0.833778,0.834166,0.820625,0.842061,0.841099,0.836803,0.836135,0.847462,0.841297,0.570839
15292,0.845841,0.849805,0.851911,0.852086,,0.855359,0.845945,0.852973,0.846529,0.829317,0.848444,0.844746,0.847243,0.853856,0.852722,0.853224,0.579479
15293,0.844433,0.847704,0.850613,0.846391,0.855359,,0.853906,0.853926,0.839521,0.835003,0.848322,0.84431,0.844495,0.849653,0.855213,0.855215,0.58346
15294,0.841417,0.84252,0.835508,0.839257,0.845945,0.853906,,0.847377,0.843034,0.817044,0.845271,0.840707,0.843572,0.849073,0.844235,0.84917,0.572286
15295,0.833837,0.842392,0.841103,0.833778,0.852973,0.853926,0.847377,,0.838322,0.82087,0.8422,0.843667,0.847999,0.848454,0.848976,0.848878,0.58022
15296,0.845291,0.833934,0.840491,0.834166,0.846529,0.839521,0.843034,0.838322,,0.843438,0.841077,0.833635,0.83811,0.844334,0.844778,0.835774,0.577042
15297,0.825461,0.817725,0.833992,0.820625,0.829317,0.835003,0.817044,0.82087,0.843438,,0.833472,0.822202,0.825183,0.821703,0.825417,0.825997,0.569699


In [87]:
show(bokeh.charts.Histogram(score_upper_triangular(mm_purkinje_poolsplit_plus_one_stable5['rafa_spearman'])))
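In the plus-one tables the added library (13843 for human, 15362 for mouse) correlates noticeably lower with every pool library than the pool libraries do with each other. One hypothetical way to flag such a library automatically is to compare each library's mean correlation against the median across libraries; the function name and the 0.9 factor are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd

def flag_low_correlation(scores: pd.DataFrame, factor: float = 0.9) -> list:
    """Libraries whose mean correlation to the others is below
    `factor` times the median of those means (hypothetical rule)."""
    means = scores.mean(axis=1)  # NaN diagonal is skipped by default
    threshold = factor * means.median()
    return list(means[means < threshold].index)

# Toy matrix: four pool libraries at 0.85 with each other, one outsider at 0.58.
libs = ["p1", "p2", "p3", "p4", "x"]
vals = np.full((5, 5), 0.85)
vals[4, :] = 0.58
vals[:, 4] = 0.58
np.fill_diagonal(vals, np.nan)
scores = pd.DataFrame(vals, index=libs, columns=libs)

means = scores.mean(axis=1)             # pool libraries 0.7825, x 0.58
flagged = flag_low_correlation(scores)  # ['x']
```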

In [97]:
mm_purkinje_single = get_gene_expression(rsems, experiment_names['Mm_purkinje_single'])
mm_purkinje_single_score = compute_all_vs_all_scores(mm_purkinje_single)
show(bokeh.charts.Histogram(score_upper_triangular(mm_purkinje_single_score['rafa_spearman'])))

In [96]:
mm_purkinje_single_plus_one = get_gene_expression(rsems, experiment_names['Mm_purkinje_single'] + ['15272'])
mm_purkinje_single_plus_one_score = compute_all_vs_all_scores(mm_purkinje_single_plus_one)
show(bokeh.charts.Histogram(score_upper_triangular(mm_purkinje_single_plus_one_score['rafa_spearman'])))


- Compute the distribution of spike-to-spike correlations.
- Plot the spike distributions for the Gingeras data and ours in the same scatter-plot style.
- Compare how the spikes perform on our 10 ng samples versus our single cells.
- Color spikes with GC content > 0.45 (or so): is their variance higher?
  Work out what counts as "high GC".

- Also, are 171's mates expressed above 5 FPKM?

- QC

  - Make sure at least N spikes are detected in each tube.
  - Make sure at least N genes are detected in each tube.
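A sketch of the per-tube QC check from the list above, assuming "detected" means expression above a cutoff and spike-ins are the rows whose ids start with `gSpikein` (as in the tables above). N is still undecided, so both thresholds are parameters; `qc_detected` and `detect_cutoff` are hypothetical names.

```python
import pandas as pd

def qc_detected(expr: pd.DataFrame, min_spikes: int, min_genes: int,
                detect_cutoff: float = 0.0) -> pd.DataFrame:
    """Per-tube counts of detected spikes and genes, plus a pass/fail flag (hypothetical)."""
    is_spike = expr.index.str.startswith("gSpikein")
    detected = expr > detect_cutoff
    summary = pd.DataFrame({
        "spikes_detected": detected[is_spike].sum(axis=0),
        "genes_detected": detected[~is_spike].sum(axis=0),
    })
    summary["pass"] = ((summary["spikes_detected"] >= min_spikes)
                       & (summary["genes_detected"] >= min_genes))
    return summary

# Toy table: two tubes, two genes, three spikes (values made up).
expr = pd.DataFrame(
    {"t1": [5.0, 0.0, 3.0, 2.0, 0.0],
     "t2": [0.0, 0.0, 1.0, 0.0, 0.0]},
    index=["geneA", "geneB", "gSpikein_ERCC-1", "gSpikein_ERCC-2", "gSpikein_ERCC-3"],
)
summary = qc_detected(expr, min_spikes=2, min_genes=1)
print(summary)  # t1 passes (2 spikes, 1 gene); t2 fails (1 spike, 0 genes)
```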
