# Compute the Moment Propagation Embeddings

This notebook shows how to compute the moment propagation (MomProp) embeddings once the input data have been preprocessed.

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from MomPropEmbeddings import data, utils

Set the path to the pickle file containing the graph.

In [5]:
basegraph_pkl = './data/basegraph.pkl'

## Create the MomProp Embeddings

To generate MomProp embeddings, the following hyperparameters must be set:
1. `node_feature`: the node feature to use.
2. `moments`: the moments used to represent a gene's neighborhood. These were computed during preprocessing.
3. `n_hops`: the number of k-hop neighborhoods for which the moments were computed during preprocessing.
4. `n_steps`: the number of network propagation steps.
5. `edge_weight`: the type of edge weights to use ('std' or 'none').
6. `path_weight`: the type of path weights to use ('mean' or 'max').

In [10]:
node_feature = 'log_pvalue'
moments = ['mean', 'std', 'skew', 'kurt']
n_hops = 2
n_steps = 2
edge_weight = 'std'
path_weight = 'max'

mp_data = data.MomProp(
    basegraph_pkl, node_feature, n_steps, n_hops=n_hops,
    moments=moments, edge_weight=edge_weight, path_weight=path_weight
)
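To make the `moments` and `n_hops` hyperparameters more concrete, the following is a minimal sketch of how the moments of a node feature over a k-hop neighborhood could be computed. It is an illustration only and not the implementation used by `data.MomProp`: the helper names `k_hop_neighbors` and `neighborhood_moments`, the toy graph, and the choice of population (biased) estimators with excess kurtosis are all assumptions, and edge weights, path weights, and propagation steps are ignored.

```python
from collections import deque

import numpy as np


def k_hop_neighbors(adjacency, source, k):
    """Return all nodes within k hops of `source` (excluding it), found by BFS."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return sorted(n for n in distance if n != source)


def neighborhood_moments(values):
    """Mean, population std, skewness, and excess kurtosis of a 1-D sample."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()  # population estimator (ddof=0), an assumption
    if std == 0:
        return {'mean': mean, 'std': 0.0, 'skew': 0.0, 'kurt': 0.0}
    z = (values - mean) / std
    return {
        'mean': mean,
        'std': std,
        'skew': (z ** 3).mean(),
        'kurt': (z ** 4).mean() - 3.0,
    }


# Hypothetical toy graph: path a - b - c - d with an extra branch b - e.
adjacency = {
    'a': ['b'],
    'b': ['a', 'c', 'e'],
    'c': ['b', 'd'],
    'd': ['c'],
    'e': ['b'],
}
# Hypothetical node feature values (standing in for 'log_pvalue').
log_pvalue = {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 4.0, 'e': 5.0}

neighbors = k_hop_neighbors(adjacency, 'a', k=2)
stats = neighborhood_moments([log_pvalue[n] for n in neighbors])
print(neighbors)
print(stats)
```

Each node would get one such set of moments per neighborhood size, which is one plausible reading of why `moments` and `n_hops` together determine part of the feature dimension.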

The resulting `mp_data` object has the following attributes:

* `X`: the (n_samples, n_features) data matrix.
* `y`: the node labels.
* `features`: the names of the features.
* `samples`: the names of the samples.

In [17]:
print(f'Shape of data matrix: {mp_data.X.shape}')

Shape of data matrix: (100, 27)
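For inspection, it can be convenient to combine these attributes into a single labeled table. The sketch below uses a hypothetical stand-in object `demo` with random values in place of `mp_data`, and assumes that `features` and `samples` are sequences whose lengths match the columns and rows of `X`. With the real object, replace `demo` by `mp_data`.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in that mimics the four documented attributes of mp_data.
demo = SimpleNamespace(
    X=rng.normal(size=(100, 27)),
    y=rng.integers(0, 2, size=100),
    features=[f'feature_{j}' for j in range(27)],
    samples=[f'gene_{i}' for i in range(100)],
)

# One row per sample, one column per feature, plus the label as the last column.
embeddings = pd.DataFrame(demo.X, index=demo.samples, columns=demo.features)
embeddings['label'] = demo.y

print(embeddings.iloc[:5, :3])
```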


## Downstream Analysis

The resulting data set can be used for downstream analysis, such as predicting the class labels in `mp_data.y`. In the paper, the samples were genes, and the features were the log-transformed MuSig $p$-values of the genes. We generated MomProp embeddings to predict the cancer status of the genes.
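As a minimal sketch of such a downstream analysis, the example below cross-validates a classifier on an embedding matrix. The data are synthetic stand-ins for `mp_data.X` and `mp_data.y`, and the choice of scikit-learn, logistic regression, and ROC AUC is an assumption made for illustration; it is not necessarily the model or metric used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins; with real data use X = mp_data.X and y = mp_data.y.
X = rng.normal(size=(100, 27))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# Standardize the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation (stratified by default for classifiers), scored by ROC AUC.
scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print(f'Mean ROC AUC over 5 folds: {scores.mean():.3f}')
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics from being fit on the held-out fold.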