In [None]:
# setting up jupyter
%matplotlib inline
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
mpl.rcParams['savefig.dpi'] = 300

# Example 3 - defined <i>in vitro</i> differentiation of mESCs 
This example features a merged dataset of three samples, each with its own control. We want to confirm the differentiation into motor neurons and cardiomyocytes, and additionally check the transcriptional similarity of retinoic acid-treated cells. Along the way, we use <i>italic</i> labels, rename sample elements, handle multiple samples instances and adjust the minimum required detection threshold.

As always, we begin by initializing samples and targets. Since the input data for this experiment was gathered from different sources, the samples have separate controls. We therefore create three individual samples, each taking its expression data as a <i>pandas.DataFrame</i> slice:

In [None]:
from DPre import samples
import pandas as pd

# initiate multiple samples instances because each sample has a different control
ivd_expr = pd.read_csv('ivd_expression.tsv', sep='\t', index_col='ensg')
# cardiomyocytes
cm_sample = samples(expression = ivd_expr.loc[:, ['ivd ESCs (cardio)', 'ivd cardiomyocytes']],    # syntax: loc[all rows, specific columns]
                    ctrl = 'ivd ESCs (cardio)',
                    name = 'in vitro differentiated cardiomyocytes')
# motor neurons
mn_sample = samples(expression = ivd_expr.loc[:, ['ivd ESCs (mneu)', 'ivd motor neurons']], 
                    ctrl = 'ivd ESCs (mneu)',
                    name = 'in vitro differentiated motor neurons')
# retinoic acid treated
ra_sample = samples(expression = ivd_expr.loc[:, ['ivd ESCs (ra)', 'ivd retinoic acid']],
                    ctrl = 'ivd ESCs (ra)',
                    name = 'in vitro differentiated ESCs +retinoic acid')
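Each `.loc[:, [...]]` slice above pairs one control column with its differentiated column. Below is a minimal pandas sketch of the same pattern on a small made-up table. The gene IDs and values are hypothetical and only the column names mirror `ivd_expression.tsv`. It collects the three pairs in a dict, which is the form the slices would have before being passed to `samples()`:

```python
import pandas as pd

# Hypothetical toy expression table (made-up IDs and values); real data comes from ivd_expression.tsv
expr = pd.DataFrame(
    {
        "ivd ESCs (cardio)": [5.0, 0.1, 2.0],
        "ivd cardiomyocytes": [0.2, 7.5, 2.1],
        "ivd ESCs (mneu)": [4.8, 0.0, 1.9],
        "ivd motor neurons": [0.3, 0.2, 6.4],
        "ivd ESCs (ra)": [5.1, 0.1, 2.2],
        "ivd retinoic acid": [1.0, 0.4, 3.0],
    },
    index=pd.Index(["gene_A", "gene_B", "gene_C"], name="ensg"),
)

# control column -> differentiated column, one entry per separately controlled sample
pairs = {
    "ivd ESCs (cardio)": "ivd cardiomyocytes",
    "ivd ESCs (mneu)": "ivd motor neurons",
    "ivd ESCs (ra)": "ivd retinoic acid",
}

# each slice keeps all genes (rows) but only its own control + treated columns
slices = {ctrl: expr.loc[:, [ctrl, treated]] for ctrl, treated in pairs.items()}

for ctrl, df in slices.items():
    print(ctrl, "->", list(df.columns), df.shape)
```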


When initiating samples or targets, the number of detected genes is shown. In this data, the samples only list 10251 genes, which could result in low <b>proportional target marker gene detection</b> values. We can visualize those proportions with the <b><i>plot_detec_mgs_prop()</i></b> function:

In [None]:
from DPre import preset_targets

# initiate the targets to compare against
t = preset_targets('mouse')
hist = t.plot_detec_mgs_prop(samples = cm_sample,
                             filename = 'trg_mgs_detec_prop.png',
                             plt_show = True     
                            )
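Conceptually, the plotted proportion is the fraction of a target's marker genes that are also among the genes detected in the samples. The sketch below computes that ratio with Python sets. The marker lists are hypothetical, and DPre's actual marker-gene definitions (and any split into up- and down-regulated markers) are not reproduced here:

```python
# Hypothetical gene sets; DPre derives the real marker genes from its preset targets
detected_in_samples = {"g1", "g2", "g3", "g4", "g5"}
target_markers = {
    "cardiomyocytes": {"g1", "g2", "g3", "g9"},
    "blood mesoderm": {"g6", "g7", "g8", "g1"},
}

def detection_proportion(markers, detected):
    """Fraction of a target's marker genes found in the sample expression data."""
    return len(markers & detected) / len(markers)

props = {name: detection_proportion(m, detected_in_samples)
         for name, m in target_markers.items()}
print(props)  # {'cardiomyocytes': 0.75, 'blood mesoderm': 0.25}
```

With fewer detected genes in the samples, the intersection shrinks and the proportions drop. This is why a sample listing only ~10k genes can push some targets towards low values.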


The blood mesoderm targets in particular show very low detection values. The default threshold of 15% gave solid results in our testing. However, if too many targets are dropped, or if similarity scores seem off for targets with low detection, you can adjust this value in <i>config.DROP_TARGET_DETEC_THR</i>. For a temporary change, pass DROP_TARGET_DETEC_THR to the plotting function instead. Note that this only takes effect when it is passed to the first plot in the script. When using the command line interface, the constant must be changed manually in the config file.
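The threshold can be pictured as a simple filter on those detection proportions. The following pandas sketch uses made-up proportions and the 15% default. It is not DPre's internal code, and whether the boundary is inclusive (`>=`) is an assumption:

```python
import pandas as pd

DROP_TARGET_DETEC_THR = 0.15  # default threshold described in the text

# Hypothetical per-target detection proportions
props = pd.Series({"cardiomyocytes": 0.62, "motor neurons": 0.48,
                   "blood mesoderm": 0.09, "hepatocytes": 0.21})

kept = props[props >= DROP_TARGET_DETEC_THR]     # assumed inclusive boundary
dropped = props[props < DROP_TARGET_DETEC_THR]
print("kept:", list(kept.index))
print("dropped:", list(dropped.index))

# lowering the threshold keeps more low-detection targets
print(list(props[props >= 0.05].index))
```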

Next, we want to make the 'in vitro' in the <b>labels italic</b>. This can be achieved by using LaTeX and DPre's <b>element name setting</b> functionality (not implemented for the command line interface):

In [None]:
# make the `in vitro` in the labels italic
it_in_vit = r'$\mathit{in}$ $\mathit{vitro}$'     # TeX expression for an italic 'in vitro' string (raw string keeps the backslashes intact)
cm_sample.names = {'ivd cardiomyocytes': it_in_vit+' differentiated cardiomyocytes'}    # pass a mapping of old name -> new name
mn_sample.names = {'ivd motor neurons': it_in_vit+' differentiated motor neurons'}
ra_sample.names = {'ivd retinoic acid': 'ESCs +retinoic acid'}
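The italic labels rely on matplotlib's mathtext, which renders `$\mathit{...}$` segments as italic text. The self-contained sketch below uses plain pandas and matplotlib, independent of DPre. It applies the same old name -> new name mapping to DataFrame columns and renders one of the results as a plot title. The raw string `r'...'` stops Python from interpreting `\m` as an escape sequence:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

it_in_vit = r"$\mathit{in}$ $\mathit{vitro}$"  # mathtext: italic "in vitro"

df = pd.DataFrame({"ivd cardiomyocytes": [1.0, 2.0],
                   "ivd motor neurons": [0.5, 1.5]})
mapping = {"ivd cardiomyocytes": it_in_vit + " differentiated cardiomyocytes",
           "ivd motor neurons": it_in_vit + " differentiated motor neurons"}
df = df.rename(columns=mapping)  # old name -> new name

fig, ax = plt.subplots()
ax.set_title(df.columns[0])  # mathtext is parsed and rendered when the figure is drawn
fig.canvas.draw()
plt.close(fig)
print(list(df.columns))
```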

To identify the peak differentiation bias, we use the <i>ranked_similarity_barplot()</i> function on the three samples:

In [None]:
# draw the ranked similarity plot iteratively
for sam in (cm_sample, mn_sample, ra_sample):
    t.ranked_similarity_barplot(samples = sam, 
                                n_targets = 10, 
                                xlim_range = [-1.7, 1.7],    # ensure consistent ranges across plots
                                display_negative = True,
                                pivot = True,
                                targetlabels_size = .95,      # downscale the labels to save space
                                targetlabels_space = .6,
                                BP_TOP = .5,
                                BP_BARSPACE = .6,
                                BP_BOTTOM = .2,
                                title = sam.names[1],      # set the element name as the title
                                filename = 'ranked_sim.png',    # for .png output, DPre automatically makes each filename unique
                                plt_show = True,
                                )
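Fixing `xlim_range` matters because each sample would otherwise get its own axis scale, and bars would not be comparable between plots. The plain matplotlib sketch below shows the idea. It ranks hypothetical similarity scores, keeps the top targets and applies one shared x-range to every figure. The scores and target names are invented, and DPre's own ranking and layout logic is not reproduced:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

XLIM = (-1.7, 1.7)  # one shared range, like xlim_range above
N_TARGETS = 3       # analogous to n_targets

# Hypothetical similarity scores: sample -> (target -> score)
scores = {
    "cardiomyocytes": pd.Series({"heart": 1.2, "muscle": 0.6, "neurons": -0.4, "liver": 0.1}),
    "motor neurons": pd.Series({"heart": -0.3, "muscle": 0.2, "neurons": 1.5, "liver": -0.8}),
}

xlims = []
for name, s in scores.items():
    top = s.sort_values(ascending=False).head(N_TARGETS)  # highest-ranked targets
    fig, ax = plt.subplots(figsize=(4, 2))
    ax.barh(list(top.index[::-1]), top.values[::-1])  # best-ranked bar ends up on top
    ax.axvline(0, color="black", linewidth=0.8)       # separates positive from negative scores
    ax.set_xlim(*XLIM)                               # identical scale for every sample
    ax.set_title(name)
    xlims.append(ax.get_xlim())
    plt.close(fig)

print(xlims)
```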

or with the command line interface in three commands:

In [None]:
# copy and paste into your terminal
> python ../../dpre.py -pt "mouse" -se "ivd_expression.tsv" -c "ivd ESCs (cardio)" -ss "ivd cardiomyocytes"  -ss "ivd ESCs (cardio)"  -sn "in vitro differentiated cardiomyocytes" ranked_sim -nt 10 -x -1.7 -x 1.7 -din -pi -tas .95 -ta .6 -f "ranked_sim.png"

In [None]:
> python ../../dpre.py -pt "mouse" -se "ivd_expression.tsv" -c "ivd ESCs (mneu)" -ss "ivd motor neurons"  -ss "ivd ESCs (mneu)" -sn "in vitro differentiated motor neurons" ranked_sim -nt 10 -x -1.7 -x 1.7 -din -pi -tas .95 -ta .6 -f "ranked_sim.png"

In [None]:
> python ../../dpre.py -pt "mouse" -se "ivd_expression.tsv" -c "ivd ESCs (ra)" -ss "ivd retinoic acid" -ss "ivd ESCs (ra)" -sn "in vitro differentiated mESCs +retinoic acid" ranked_sim  -nt 10 -x -1.7 -x 1.7 -din -pi -tas .95 -ta .6 -f "ranked_sim.png"
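The three commands differ only in the control, the sample column and the name, so they can also be generated from one table. The sketch below uses only the flags shown above plus Python's `shlex.join` (Python 3.8+) for safe quoting. It prints the commands rather than running them, and the relative path to `dpre.py` is taken unchanged from the examples:

```python
import shlex

# (control, differentiated sample, name) for each separately controlled sample
runs = [
    ("ivd ESCs (cardio)", "ivd cardiomyocytes", "in vitro differentiated cardiomyocytes"),
    ("ivd ESCs (mneu)", "ivd motor neurons", "in vitro differentiated motor neurons"),
    ("ivd ESCs (ra)", "ivd retinoic acid", "in vitro differentiated mESCs +retinoic acid"),
]

commands = []
for ctrl, sample, name in runs:
    # same flags as the hand-written commands above
    args = ["python", "../../dpre.py", "-pt", "mouse", "-se", "ivd_expression.tsv",
            "-c", ctrl, "-ss", sample, "-ss", ctrl, "-sn", name,
            "ranked_sim", "-nt", "10", "-x", "-1.7", "-x", "1.7",
            "-din", "-pi", "-tas", ".95", "-ta", ".6", "-f", "ranked_sim.png"]
    commands.append(args)
    print(shlex.join(args))  # quoted, copy-pasteable command line
# to execute instead of printing: subprocess.run(args, check=True)
```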

Differential similarity has the advantage of revealing the <i>bias</i> in differentiation. Rather than judging whether a cell kept its original identity or transitioned into another cell type, it only indicates the direction of change relative to the control. However, for highly differentiated samples like those in this example, the <b>absolute similarity</b> can give valuable insight into transcriptional identity. Since absolute similarity doesn't require a control, we can merge the three samples into one samples instance and create a single summarizing similarity heatmap:

In [None]:
import pandas as pd
from DPre import samples

# make a samples instance with the 3 samples above to produce one heatmap
all_expr = ivd_expr.loc[:, ['ivd cardiomyocytes', 'ivd motor neurons', 'ivd retinoic acid']]      # exclude controls
all_expr.columns = [it_in_vit+' differentiated cardiomyocytes',     # rename the columns so the plot directly shows the final names
                    it_in_vit+' differentiated motor neurons', 
                    'ESCs +retinoic acid']                                   
all_sample = samples(expression = all_expr, 
                     name = it_in_vit + ' differentiated ESCs')
hm = t.target_similarity_heatmap(samples = all_sample,
                                 differential = False,             # on the command line, pass -a (absolute) instead
                                 pivot = True,
                                 # cluster_targets = True,         # optional: cluster the targets and draw a dendrogram on top
                                 heatmap_height = 1.6,        # make the heatmap taller
                                 heatmap_width = .13, 
                                 hide_targetlabels = True, 
                                 targetlabels_space = .45,
                                 samplelabels_space = 1,
                                 HM_TOP = .4,
                                 HM_RIGHT = .1,
                                 filename = 'abs_sim_hm.png',
                                 plt_show = True,
                                 )
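A toy numerical illustration on two marker genes makes the distinction between the two modes concrete. It is not DPre's exact scoring. Absolute similarity is modelled here as the negative Euclidean distance between sample and target. Differential similarity is modelled as how much closer the sample is to the target than its control was. Both definitions are assumptions for illustration:

```python
import numpy as np

# Hypothetical marker-gene expression (2 genes)
target = np.array([1.0, 1.0])
control = np.array([0.0, 0.0])
sample = np.array([0.5, 0.5])

def absolute_similarity(x, t):
    """Assumed toy score: higher means closer to the target; needs no control."""
    return -np.linalg.norm(x - t)

def differential_similarity(x, ctrl, t):
    """Assumed toy score: positive if the sample moved towards the target relative to its control."""
    return np.linalg.norm(ctrl - t) - np.linalg.norm(x - t)

print(round(absolute_similarity(sample, target), 4))               # -0.7071
print(round(differential_similarity(sample, control, target), 4))  # 0.7071
```

The sample sits halfway between its control and the target. The differential score is positive, meaning it moved towards the target, while the absolute score shows it is still some distance from the target. This mirrors why the absolute view is useful for highly differentiated samples.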

or on the command line:

In [None]:
# copy and paste into your terminal
> python ../../dpre.py -pt "mouse" -se "ivd_expression.tsv" -ss "ivd cardiomyocytes"  -ss "ivd motor neurons"  -ss "ivd retinoic acid" -sn "in vitro differentiated ESCs" target_sim -a -pi -hh 1.6 -hw .13 -hta -ta .45 -sa 1 -f "abs_sim_hm.png"
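To reproduce a similar layout outside DPre, the minimal matplotlib sketch below draws a heatmap from a samples × targets table of absolute similarities. All values and target names are invented. `pivot = True` is only mimicked by transposing the table, and whether this matches DPre's exact orientation is an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical absolute similarities: rows = samples, columns = targets
sim = pd.DataFrame(
    [[0.9, 0.1, 0.3, 0.2],
     [0.2, 0.8, 0.4, 0.1],
     [0.3, 0.5, 0.6, 0.2]],
    index=["cardiomyocytes", "motor neurons", "ESCs +retinoic acid"],
    columns=["heart", "neurons", "embryonic", "blood"],
)

pivoted = sim.T  # swap orientation: targets become rows, samples become columns
fig, ax = plt.subplots(figsize=(3, 3))
im = ax.imshow(pivoted.values, cmap="viridis", aspect="auto")
ax.set_xticks(range(pivoted.shape[1]))
ax.set_xticklabels(pivoted.columns, rotation=45, ha="right")
ax.set_yticks(range(pivoted.shape[0]))
ax.set_yticklabels(pivoted.index)
fig.colorbar(im, ax=ax, label="absolute similarity (toy)")
fig.tight_layout()
fig.canvas.draw()
plt.close(fig)
print(pivoted.shape)  # (4, 3)
```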