# POSE pipeline

(Single-modal) This tutorial demonstrates how to run the POSE pipeline on multiple myeloma (MM) RNA data restricted to the apoptosis pathway. The analysis consists of the following steps:

1. Upload data
2. Compute pairwise sample Wasserstein and Euclidean (WE) distances with respect to gene neighborhoods
3. Compute global pairwise sample (WE) distance matrices
4. Convert global distances to a single sample pairwise similarity matrix
5. (Future work - dpt distance and/or multi-scale diffusion based distances)
6. Extract pseudo-organization (i.e., ordering) of samples
7. Determine schema (i.e., branching)
8. Visualize schema
9. (Future work - further investigation into samples within different branches and differential analysis between branches)

- $d_W(i,j,v)$ - The Wasserstein distance of the 1-hop neighborhood around gene $v$ between sample $i$ and sample $j$
- $d_E(i,j,v)$ - The Euclidean distance of the 1-hop neighborhood around gene $v$ between sample $i$ and sample $j$
- $D_W(i,j) = |d_W(i,j,v)|$ - Net Wasserstein distance between sample $i$ and sample $j$, aggregated over all genes $v$
- $D_E(i,j) = |d_E(i,j,v)|$ - Net Euclidean distance between sample $i$ and sample $j$, aggregated over all genes $v$
- $K_W = e^{-\frac{\|D_W\|^2}{\sigma^2}}$ - Pairwise sample Wasserstein similarity matrix 
- $K_E = e^{-\frac{\|D_E\|^2}{\sigma^2}}$ - Pairwise sample Euclidean similarity matrix 
- (Multi-feature, multi-modal) $K = \frac{K_W + K_E}{2}$ - Fused pairwise sample similarity matrix
- $D = 1 - K$ - Fused pairwise sample distance matrix
- (Clustering) Determine branching according to lineage tracing algorithm using $D$ 
- (Visualizing) Pseudo-ordering of samples in branches according to distance from root node $r$, i.e., $D(r,:)$
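The chain of definitions above can be sketched on toy data with NumPy. This is only an illustration, not the ``netflow`` implementation: the per-gene distances are random placeholders, the net distance is assumed to be the Euclidean norm over genes, $\sigma$ is assumed to be the median off-diagonal distance, and the root index is chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 5, 4


def toy_gene_distances():
    """Random stand-in for d(i, j, v): symmetric in (i, j) with a zero diagonal."""
    d = rng.random((n_samples, n_samples, n_genes))
    d = (d + d.transpose(1, 0, 2)) / 2
    idx = np.arange(n_samples)
    d[idx, idx, :] = 0.0
    return d


def gaussian_similarity(D):
    """K = exp(-D^2 / sigma^2), applied entrywise.

    Assumption: sigma is the median off-diagonal distance.
    """
    sigma = np.median(D[np.triu_indices_from(D, k=1)])
    return np.exp(-(D ** 2) / sigma ** 2)


d_W = toy_gene_distances()  # placeholder for per-gene Wasserstein distances
d_E = toy_gene_distances()  # placeholder for per-gene Euclidean distances

# Net distances: aggregate over the gene axis (assumed here to be the Euclidean norm).
D_W = np.linalg.norm(d_W, axis=2)
D_E = np.linalg.norm(d_E, axis=2)

# Similarities, fusion by averaging, and conversion back to a distance.
K_W = gaussian_similarity(D_W)
K_E = gaussian_similarity(D_E)
K = (K_W + K_E) / 2
D = 1 - K

# Pseudo-ordering: sort samples by fused distance from a (hypothetical) root sample.
root = 0
order = np.argsort(D[root])
```

Because each kernel has ones on its diagonal, the fused distance $D$ has a zero diagonal and takes values in $[0, 1)$, so the root always comes first in its own ordering.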

First, import the necessary packages:

# Load libraries

In [1]:
import pathlib
import sys

import networkx as nx
import numpy as np
import pandas as pd

If ``netflow`` has not been installed, add the path to the library:

In [2]:
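# Prepend the directory that contains the netflow package to the import path.
# parents[3] reflects the author's directory layout; adjust the index as needed.
sys.path.insert(0, pathlib.Path('.').resolve().parents[3].as_posix())
# sys.path.insert(0, pathlib.Path('.').resolve().parents[0].as_posix())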

From the ``netflow`` package, we use the following classes:
 - The ``InfoNet`` class is used to compute 1-hop neighborhood distances
 - The ``Keeper`` class is used to store and manipulate data/results

In [3]:
import netflow as nf

In [4]:
# from netflow.keepers import keeper 

# Load data

In [5]:
RNA_FNAME = '/Users/renae12/Library/CloudStorage/OneDrive-MemorialSloanKetteringCancerCenter/!GDriveMigratedData/My Documents/MSKCC/data/multiple_myeloma_apoptosis_ASK/data/rna_hgnc_apop140_660.csv'
E_FNAME = '/Users/renae12/Library/CloudStorage/OneDrive-MemorialSloanKetteringCancerCenter/!GDriveMigratedData/My Documents/MSKCC/data/multiple_myeloma_apoptosis_ASK/data/E_apop140.csv'

In [6]:
X = pd.read_csv(RNA_FNAME, header=0, index_col=0)
print(X.shape)

(140, 669)


In [7]:
E = pd.read_csv(E_FNAME, header=0)
G = nx.from_pandas_edgelist(E)
print(G)

Graph with 140 nodes and 672 edges
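The data has 140 rows and the graph has 140 nodes, which suggests that the rows of ``X`` are genes and the columns are samples. Before going further, it can be worth confirming that the two sets of labels agree. The sketch below shows one way to do this on toy stand-ins; the names ``X_toy``, ``E_toy`` and ``G_toy`` and their values are hypothetical. Note that ``nx.from_pandas_edgelist`` looks for columns named ``source`` and ``target`` by default.

```python
import networkx as nx
import pandas as pd

# Toy stand-ins for X (assumed genes x samples) and E (edge list); values are illustrative.
X_toy = pd.DataFrame(
    [[1.0, 2.0], [0.5, 0.0], [3.0, 1.5]],
    index=["A", "B", "C"],
    columns=["s1", "s2"],
)
E_toy = pd.DataFrame({"source": ["A", "B"], "target": ["B", "C"]})
G_toy = nx.from_pandas_edgelist(E_toy)

# Genes present in the graph but absent from the data, and vice versa.
missing_in_data = set(G_toy.nodes) - set(X_toy.index)
missing_in_graph = set(X_toy.index) - set(G_toy.nodes)

print(f"{len(missing_in_data)} graph nodes lack data")
print(f"{len(missing_in_graph)} data rows lack a graph node")
```

If either set is non-empty, the data or the graph would need to be restricted to the shared genes before computing neighborhood distances.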


Upload data to the keeper:

In [8]:
# results = keeper.Keeper(data={'rna': X})

data_label = 'rna'
keeper = nf.Keeper(data={data_label: X})
keeper._check_observation_labels()
keeper._check_num_observations()

Add the PPI network to the misc data for storage and later reference.

In [9]:
graph_label = 'PPI'
keeper.add_misc(G, graph_label)

# Compute pairwise-sample 1-hop Wasserstein distances

In [10]:
inet = nf.InfoNet(keeper, graph_label, layer=data_label,
                  label='pw_sample_1hop_wass', outdir=None)

In [11]:
dhop = inet.compute_graph_distances(weight=None)
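With ``weight=None``, the graph distances are presumably unweighted hop counts, i.e., the number of edges on a shortest path between two genes. As a rough illustration of that quantity (not of the ``InfoNet`` internals or its return format, which are not shown here), the sketch below builds a hop-distance table for a small toy graph with NetworkX. The name ``dhop_toy`` is hypothetical.

```python
import networkx as nx
import pandas as pd

# Toy path graph A - B - C - D.
G_toy = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
nodes = list(G_toy.nodes)

# Unweighted shortest-path lengths (hop counts) between all pairs of nodes.
lengths = dict(nx.all_pairs_shortest_path_length(G_toy))
dhop_toy = pd.DataFrame(
    [[lengths[u][v] for v in nodes] for u in nodes],
    index=nodes,
    columns=nodes,
)
print(dhop_toy)
```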

In [12]:
wds = inet.multiple_pairwise_sample_neighborhood_wass_distance(graph_distances=dhop, 
                                                               include_self=False,
                                                               desc='Computing pairwise 1-hop distances',
                                                               profiles_desc='t0')

Computing pairwise 1-hop distances:  92%|████████████████████████████████████▊   | 117/127 [10:24<00:51,  5.15s/it]

MSG      node 124: MMRF_1021_1_BM_CD138pos - MMRF_1541_1_BM_CD138pos: Cannot    
         compute Wasserstein distance between a zero-profile, returning         
         Wasserstein distance = nan.                                            
MSG      node 124: MMRF_1029_1_BM_CD138pos - MMRF_1541_1_BM_CD138pos: Cannot    
         compute Wasserstein distance between a zero-profile, returning         
         Wasserstein distance = nan.                                            
MSG      node 124: MMRF_1391_1_BM_CD138pos - MMRF_1541_1_BM_CD138pos: Cannot    
         compute Wasserstein distance between a zero-profile, returning         
         Wasserstein distance = nan.                                            
MSG      node 124: MMRF_1269_1_BM_CD138pos - MMRF_1541_1_BM_CD138pos: Cannot    
         compute Wasserstein distance between a zero-profile, returning         
         Wasserstein distance = nan.                                            
MSG      node 124: MMRF_1030

Computing pairwise 1-hop distances: 100%|████████████████████████████████████████| 127/127 [11:16<00:00,  5.32s/it]
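To make the quantity being computed, and the ``nan`` messages above, more concrete, here is a minimal sketch of a 1-hop neighborhood Wasserstein distance for a single gene. It is not the ``netflow`` implementation. It assumes that each sample's values on the neighborhood are normalized to sum to one, that the ground cost is the hop distance in the full graph, and that the optimal transport problem is solved as a linear program with SciPy. The function name and the dictionary input format are hypothetical. Under these assumptions, a neighborhood whose values are all zero cannot be normalized, which is consistent with the zero-profile message in the log.

```python
import networkx as nx
import numpy as np
from scipy.optimize import linprog


def neighborhood_wasserstein(G, x_i, x_j, v, include_self=False):
    """Wasserstein-1 distance between two samples on the 1-hop neighborhood of v.

    x_i, x_j: dicts mapping node -> nonnegative value (hypothetical input format).
    """
    nbrs = list(G.neighbors(v))
    if include_self:
        nbrs = [v] + nbrs
    p = np.array([x_i[u] for u in nbrs], dtype=float)
    q = np.array([x_j[u] for u in nbrs], dtype=float)
    if p.sum() == 0 or q.sum() == 0:
        return np.nan  # a zero profile cannot be normalized to a distribution
    p, q = p / p.sum(), q / q.sum()
    n = len(nbrs)

    # Ground cost: hop distance between neighborhood nodes in the full graph.
    cost = np.array(
        [[nx.shortest_path_length(G, a, b) for b in nbrs] for a in nbrs],
        dtype=float,
    )

    # Transport plan T (n x n, flattened row-major): rows sum to p, columns to q.
    A_eq = np.zeros((2 * n, n * n))
    for k in range(n):
        A_eq[k, k * n:(k + 1) * n] = 1.0  # row k of T
        A_eq[n + k, k::n] = 1.0           # column k of T
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun


# Toy star graph: gene "v" with neighbors "a", "b", "c".
G_toy = nx.Graph([("v", "a"), ("v", "b"), ("v", "c")])
x_i = {"v": 1.0, "a": 1.0, "b": 0.0, "c": 0.0}
x_j = {"v": 1.0, "a": 0.0, "b": 1.0, "c": 0.0}
x_0 = {"v": 1.0, "a": 0.0, "b": 0.0, "c": 0.0}

w = neighborhood_wasserstein(G_toy, x_i, x_j, "v")      # all mass moves a -> b
w_nan = neighborhood_wasserstein(G_toy, x_i, x_0, "v")  # zero profile on the neighborhood
```

In the toy example, all of the mass has to travel from ``a`` to ``b``, which are two hops apart through ``v``, while the sample with no expression on the neighborhood yields ``nan``.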


# Compute pairwise-sample 1-hop Euclidean distances

# Compute pairwise-sample profile Wasserstein distances

# Compute pairwise-sample profile Euclidean distances