# Compute the Moment Propagation Embeddings

This notebook shows how to compute the moment propagation (MomProp) embeddings once the input data have been preprocessed.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from MomPropEmbeddings import data, utils, features

## Create the MomProp Embeddings

The MomProp embeddings are generated in two steps:
1. Generation of Moment Embeddings,
2. Generation of Propagation Embeddings.

Both steps depend on the following hyperparameters:
* `node_feature`: the node feature from which the embeddings are computed.
* `moments`: the moments used to represent a gene's neighborhood.
* `n_hops`: the values of k for which moments of the k-hop neighborhoods are generated.
* `n_steps`: the number of network propagation steps.
* `edge_weight`: the type of edge weights to use (`'std'` or `'none'`).
* `path_weight`: the type of path weights to use (`'mean'` or `'max'`).
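
Because `edge_weight` and `path_weight` each accept only two values, a typo can be caught before any file is loaded. The helper below is a minimal sketch; `check_weight_options` is a hypothetical name and is not part of the `MomPropEmbeddings` package.

```python
def check_weight_options(edge_weight, path_weight):
    """Raise ValueError if an option is outside the values listed above.

    Hypothetical helper for illustration; not part of MomPropEmbeddings.
    """
    if edge_weight not in ('std', 'none'):
        raise ValueError(f"edge_weight must be 'std' or 'none', got {edge_weight!r}")
    if path_weight not in ('mean', 'max'):
        raise ValueError(f"path_weight must be 'mean' or 'max', got {path_weight!r}")


# Passes silently for the values used in this notebook.
check_weight_options('std', 'max')
```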

In [2]:
node_feature = 'log_pvalue'
moments = ['mean', 'std', 'skew', 'kurt']
n_hops = [1, 2]
n_steps = 2
edge_weight = 'std'
path_weight = 'max'

In [3]:
# load the basegraph.
basegraph_pkl = './data/basegraph.pkl'
bg = utils.load_pickle(basegraph_pkl)

if edge_weight != 'none':
    # load the shortest path weights.
    weights = utils.load_pickle(f'./data/{edge_weight}_path_weights_{max(n_hops)}hop_{path_weight}.pkl')
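
The weights file name is assembled from three hyperparameters, so it helps to see which file the cell expects on disk. The snippet below only repeats the f-string from the cell above with the values set in this notebook; it does not touch the file system.

```python
# Values copied from the hyperparameter cell of this notebook.
n_hops = [1, 2]
edge_weight = 'std'
path_weight = 'max'

# Same f-string as in the loading cell.
weights_pkl = f'./data/{edge_weight}_path_weights_{max(n_hops)}hop_{path_weight}.pkl'
print(weights_pkl)  # ./data/std_path_weights_2hop_max.pkl
```

Changing `edge_weight`, `path_weight`, or the largest entry of `n_hops` therefore requires a matching weights file.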

## Generation of Moment Embeddings

This step computes the moment embeddings of each node's local neighborhood. These are the features that are propagated in the next step.

__Remark__: This step can take a few minutes for large data sets, so we recommend running it separately and pickling the resulting graph.
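
To make the idea of a neighborhood moment concrete, here is a minimal, unweighted sketch on a toy graph. It is not the implementation behind `features.weighted_attr_kx`, which also applies edge and path weights. The sketch assumes that a k-hop neighborhood contains every node within k hops of the source, excluding the source itself, and that the moments are population moments with excess kurtosis. The helper names `k_hop_neighbors` and `neighborhood_moments` are hypothetical.

```python
import math
from collections import deque


def k_hop_neighbors(adj, source, k):
    """Return the nodes within k hops of `source`, excluding `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return sorted(n for n in dist if n != source)


def neighborhood_moments(values):
    """Population mean, std, skewness and excess kurtosis of `values`."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        'mean': mean,
        'std': math.sqrt(m2),
        'skew': m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        'kurt': m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }


# Toy graph (adjacency lists) and a toy node feature.
adj = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A'], 'D': ['B']}
feature = {'A': 1.0, 'B': 2.0, 'C': 4.0, 'D': 6.0}

for k in (1, 2):
    nbrs = k_hop_neighbors(adj, 'A', k)
    print(k, nbrs, neighborhood_moments([feature[n] for n in nbrs]))
```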

In [4]:
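# Use the precomputed shortest-path weights unless edge weighting is disabled.
if edge_weight == 'none':
    weight_dict = None
else:
    weight_dict = weights.shortest_path_weights

# Add one node attribute per (moment, k) combination.
for m in moments:
    for k in n_hops:
        bg, _ = features.weighted_attr_kx(
            bg, node_feature, stat=m,
            k=k, weight=edge_weight, weight_dict=weight_dict
        )

# Save the graph with the moment embeddings.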
moment_embeddings_pkl = './data/moment_embeddings.pkl'
utils.write_pickle(bg, moment_embeddings_pkl)

In [5]:
print(f'Node attributes: {bg.vs.attributes()}')

Node attributes: ['name', 'pvalue', 'k1_neighbors', 'k1_stdWeights', 'k1_doremiWeights', 'log_pvalue', 'class_label', 'mean_k1_log_pvalue_stdWeight', 'mean_k2_log_pvalue_stdWeight', 'std_k1_log_pvalue_stdWeight', 'std_k2_log_pvalue_stdWeight', 'skew_k1_log_pvalue_stdWeight', 'skew_k2_log_pvalue_stdWeight', 'kurt_k1_log_pvalue_stdWeight', 'kurt_k2_log_pvalue_stdWeight']
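
The eight new attributes in the output follow a regular naming pattern. The sketch below rebuilds those names from the hyperparameters; the pattern is inferred from the printed output for `edge_weight = 'std'` and is an assumption, not a documented convention of the package.

```python
from itertools import product

# Values copied from the hyperparameter cell of this notebook.
node_feature = 'log_pvalue'
moments = ['mean', 'std', 'skew', 'kurt']
n_hops = [1, 2]
edge_weight = 'std'

# Naming pattern inferred from the printed node attributes (an assumption).
expected = [
    f'{m}_k{k}_{node_feature}_{edge_weight}Weight'
    for m, k in product(moments, n_hops)
]
print(expected)
```

Comparing `expected` against `bg.vs.attributes()` is a quick way to confirm that every (moment, k) combination was generated.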


## Generation of Propagation Embeddings

In [6]:
mp_data = data.MomProp(
    moment_embeddings_pkl, node_feature, n_steps, n_hops=max(n_hops),
    moments=moments, edge_weight=edge_weight, path_weight=path_weight
)

The `mp_data` object has the following attributes:

* `X`: the (n_samples, n_features) data matrix.
* `y`: the node labels.
* `features`: the names of the features.
* `samples`: the names of the samples.

In [7]:
print(f'Shape of data matrix: {mp_data.X.shape}')

Shape of data matrix: (100, 27)
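
The column count is consistent with one possible reading, which the source does not confirm: 9 base features (`log_pvalue` plus 4 moments for each of 2 hop sizes), kept once unpropagated and once per propagation step, giving 9 × (2 + 1) = 27. The sketch below illustrates that stacking scheme on a toy graph using plain neighbor averaging. It is a stand-in, not the propagation rule used by `data.MomProp`, and `propagate` is a hypothetical helper.

```python
def propagate(adj, features, n_steps):
    """Stack each node's features with n_steps rounds of neighbor averaging.

    Hypothetical helper for illustration; not the MomProp propagation rule.
    """
    current = {node: list(vec) for node, vec in features.items()}
    stacked = {node: list(vec) for node, vec in features.items()}
    for _ in range(n_steps):
        nxt = {}
        for node, nbrs in adj.items():
            if nbrs:
                # Average every feature over the node's direct neighbors.
                nxt[node] = [
                    sum(current[nb][j] for nb in nbrs) / len(nbrs)
                    for j in range(len(current[node]))
                ]
            else:
                nxt[node] = list(current[node])  # isolated node: keep as is
        current = nxt
        for node in stacked:
            stacked[node].extend(current[node])
    return stacked


# Toy path graph A - B - C with a single base feature per node.
adj = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B']}
base = {'A': [1.0], 'B': [2.0], 'C': [3.0]}

stacked = propagate(adj, base, n_steps=2)
print(stacked['A'])  # [1.0, 2.0, 2.0]
```

Each node ends up with `n_base_features * (n_steps + 1)` columns under this scheme.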


# Downstream analysis

The resulting data sets can be used for downstream analysis, such as the prediction of the class label in `mp_data.y`. In the paper, the samples were genes, and the features were the log-transformed MuSig $p$-values of the genes. We generated MomProp embeddings to predict the cancer status of the genes.
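
As a minimal illustration of such a prediction task, the sketch below fits a nearest-centroid classifier on toy data that stands in for `mp_data.X` and `mp_data.y`. The classifier, the data, and the helper names `fit_centroids` and `predict` are assumptions chosen to keep the example dependency-free; they are not the model used in the paper, and any classifier could be substituted.

```python
def fit_centroids(X, y):
    """Return the mean feature vector of each class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to `row` (squared distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Toy stand-ins for mp_data.X and mp_data.y, split by hand into train and test.
X_train = [[0.1, 0.2], [0.0, 0.4], [2.0, 1.8], [2.2, 2.1]]
y_train = [0, 0, 1, 1]
X_test = [[0.2, 0.1], [1.9, 2.0]]
y_test = [0, 1]

centroids = fit_centroids(X_train, y_train)
predictions = [predict(centroids, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```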