# NEExT

### Network Embedding Exploration Tool

NEExT is a tool for exploring and building graph embeddings. This tool allows for:
* Cleansing and standardizing a collection of graph data.
* Creating node and structural features for nodes in the graph collection.
* Creating embeddings for graphs.

### Installation Process
NEExT uses Python 3.x (currently tested using Python 3.11).
You can install NEExT using the following:
```console
pip install NEExT
```

### Graph Data Format
You can load data into NEExT from a few different formats. Currently, it supports:
* CSV files
* NetworkX objects (coming soon)

See below for examples of using different data formats.

#### Using CSV Files
Data can be categorized into the following groups:
* Edge File (captures which nodes are connected to which nodes)
* Node Graph Mapping (captures which node belongs to which graph)
* Graph Label Mapping [optional] (captures labels for each graph)
* Node Features [optional] (captures the features for each node)

Below we show examples of how each of the above files should be formatted:

##### Edge File:
|node_a|node_b|
|---|---|
|1|2|
|3|2|
|.|.|

##### Node Graph Mapping:
|node_id|graph_id|
|---|---|
|0|1|
|1|1|
|2|1|
|3|2|
|4|2|
|.|.|
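
Before loading, it can help to check that every node in the edge file has a graph assignment. The sketch below does this with the standard library only, using small inline strings in place of real files. It assumes the CSV files carry the header rows shown in the tables above; the helper `find_unmapped_nodes` is hypothetical and is not part of NEExT.

```python
import csv
import io

# Tiny inline stand-ins for the two required CSV files (illustrative data only).
EDGE_CSV = """node_a,node_b
0,1
1,2
3,4
"""

MAPPING_CSV = """node_id,graph_id
0,1
1,1
2,1
3,2
"""


def find_unmapped_nodes(edge_text, mapping_text):
    """Return the sorted list of edge endpoints that have no graph_id."""
    mapped = {row["node_id"] for row in csv.DictReader(io.StringIO(mapping_text))}
    endpoints = set()
    for row in csv.DictReader(io.StringIO(edge_text)):
        endpoints.update((row["node_a"], row["node_b"]))
    return sorted(endpoints - mapped)


unmapped = find_unmapped_nodes(EDGE_CSV, MAPPING_CSV)
print(unmapped)  # node "4" appears in an edge but not in the mapping
```

An empty result means every edge endpoint is covered by the node graph mapping.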

##### Graph Label Mapping:
|graph_id|graph_label|
|---|---|
|0|0|
|1|0|
|2|1|
|3|0|
|4|1|
|.|.|

##### Node Features:
|node_id|node_feat_0|node_feat_1|...|
|---|---|---|---|
|0|0.34| 3.2| .|
|1|0.1| 2.9| .|
|2|1.9| 1.3| .|
|3|0.0| 2.2| .|
|4|11.2| 12.3| .|
|.|.| .| .|

Note that NEExT cannot handle non-numerical features, so any feature engineering needed to make the node features numeric must be done by the end user.
Data standardization, however, is handled by NEExT.
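
One common way to turn a categorical node feature into numeric columns is one-hot encoding. The following is a minimal sketch using only the standard library; the `color` column and the helper `one_hot_encode` are hypothetical, chosen purely for illustration, and are not part of NEExT.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column untouched.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1.0 if row[column] == category else 0.0
        encoded.append(new_row)
    return encoded


# Hypothetical raw node features with one non-numeric column.
raw_features = [
    {"node_id": 0, "node_feat_0": 0.34, "color": "red"},
    {"node_id": 1, "node_feat_0": 0.10, "color": "blue"},
    {"node_id": 2, "node_feat_0": 1.90, "color": "red"},
]

numeric_features = one_hot_encode(raw_features, "color")
for row in numeric_features:
    print(row)
```

The encoded rows contain only numeric values and can then be written out as a node feature file.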






# NEExT Tutorial [Getting Started]

In this notebook, we showcase how to use NEExT to analyze graph embeddings.

In [4]:
from NEExT.NEExT import NEExT

The following are links to some graph data, which we will use in this tutorial.
Note that this dataset includes graph labels, which are optional when using NEExT. The datasets were generated using the ABCD framework (https://github.com/bkamins/ABCDGraphGenerator.jl).

## Loading Data

First we define the paths to the datasets. They are `csv` files, formatted as described in the README file.

In [5]:
edge_file = "https://raw.githubusercontent.com/elmspace/ugaf_experiments_data/main/abcd/xi_n/edge_file.csv"
graph_label_file = "https://raw.githubusercontent.com/elmspace/ugaf_experiments_data/main/abcd/xi_n/graph_label_mapping_file.csv"
node_graph_mapping_file = "https://raw.githubusercontent.com/elmspace/ugaf_experiments_data/main/abcd/xi_n/node_graph_mapping_file.csv"

Now we can instantiate a NEExT object.

In [6]:
nxt = NEExT(quiet_mode="on")

You can load data using the `load_data_from_csv` method:

In [7]:
nxt.load_data_from_csv(edge_file=edge_file, node_graph_mapping_file=node_graph_mapping_file, graph_label_file=graph_label_file)

## Building Features

You can now compute various features on nodes of the subgraphs in the graph collection loaded above.<br>
This can be done using the method `compute_graph_feature`. <br>
To get the list of available node features, you can use the function `get_list_of_graph_features`.

In [8]:
nxt.get_list_of_graph_features()

['lsme',
 'self_walk',
 'basic_expansion',
 'basic_node_features',
 'page_rank',
 'degree_centrality',
 'closeness_centrality',
 'load_centrality',
 'eigenvector_centrality']

These are the types of node features you can compute on every node of each graph in the graph collection. <br>
So for example, let's compute `page_rank`. We also need to define what the feature vector size should be.

In [9]:
nxt.compute_graph_feature(feat_name="page_rank", feat_vect_len=4)
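
For intuition about what a PageRank score measures, here is a minimal power-iteration sketch on a small undirected graph. It is not NEExT's implementation, and it does not show how NEExT turns a score into a vector of length `feat_vect_len`. The function name `pagerank` and the damping value of 0.85 are assumptions made for this illustration.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power-iteration PageRank for an undirected graph given as {node: [neighbors]}."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not adjacency[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for neighbor in adjacency[node]:
                # Each node shares its rank equally among its neighbors.
                new_rank[neighbor] += damping * rank[node] / len(adjacency[node])
        rank = new_rank
    return rank


# A star graph: node 0 is connected to nodes 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = pagerank(star)
print(scores)
```

The scores sum to 1, and the hub (node 0) receives a higher score than any of the leaves.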

To compute additional features, simply call the same function again and provide the feature vector length.<br>
Let's add degree centrality to the list of computed features.

In [10]:
nxt.compute_graph_feature(feat_name="degree_centrality", feat_vect_len=4)
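
As a reminder of what degree centrality captures, the sketch below uses the common normalization of dividing each node's degree by the number of other nodes (the convention used by NetworkX). Whether NEExT computes it in exactly this way is an assumption, and the function here is illustrative only.

```python
def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is connected to."""
    n = len(adjacency)
    if n <= 1:
        # With no other nodes to connect to, this sketch defines the value as 0.
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


# A path graph: 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
centrality = degree_centrality(path)
print(centrality)
```

The two inner nodes of the path touch two of the three other nodes, so they score higher than the endpoints.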

## Building Global Feature Object

Right now, we have 2 features computed on every node, for every graph. We can use these features to construct an overall pooled feature vector, which can be used to construct graph embeddings. <br>
To do this, we can pool the features using the `pool_graph_features` method.

In [11]:
nxt.pool_graph_features(pool_method="concat")

The overall feature (which we call the global feature) is a concatenated vector of whatever features you have computed on the graph. In this example, it is an 8-dimensional vector made up of `page_rank` and `degree_centrality`.<br>
You can access the global vector by using the `get_global_feature_vector` method.

In [12]:
nxt.get_global_feature_vector()

|Unnamed: 0|node_id|graph_id|feat_degree_centrality_0|feat_degree_centrality_1|feat_degree_centrality_2|feat_degree_centrality_3|feat_page_rank_0|feat_page_rank_1|feat_page_rank_2|feat_page_rank_3|
|---|---|---|---|---|---|---|---|---|---|---|
|0|0|0|4.094288|1.632019|1.723672|2.023497|4.014656|1.645432|1.825315|2.003575|
|1|1|0|2.682074|2.024244|1.689427|2.023497|2.651835|1.999918|1.745939|2.042548|
|2|2|0|2.682074|1.915292|1.578132|2.120736|2.672592|1.917080|1.696518|2.058271|
|3|3|0|1.975967|1.993115|2.082671|1.851304|1.968745|1.937933|2.028736|1.879435|
|4|4|0|1.975967|2.491178|1.355541|2.346133|1.940827|2.407239|1.384500|2.274468|
|...|...|...|...|...|...|...|...|...|...|...|
|7495|395|24|-0.853770|-1.468206|-1.059412|-0.963018|-0.829459|-1.399973|-1.030338|-0.951132|
|7496|396|24|-1.205938|-1.598620|-1.042336|-0.959924|-1.149447|-1.507187|-1.014597|-0.950959|
|7497|397|24|-1.205938|-1.598620|-0.936739|-1.010254|-1.155132|-1.514255|-0.920724|-0.995089|
|7498|398|24|-1.205938|-1.511677|-1.031103|-0.995703|-1.159577|-1.429980|-1.003237|-0.983561|
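
Conceptually, `concat` pooling places the per-node feature vectors side by side. The sketch below illustrates this idea with plain Python dictionaries, using rounded values from the first two rows of the output above. The helper `concat_features` is hypothetical and only illustrates the idea; it is not NEExT's code.

```python
def concat_features(*feature_tables):
    """Concatenate per-node feature vectors from several {node_id: vector} tables."""
    pooled = {}
    for node_id in feature_tables[0]:
        vector = []
        for table in feature_tables:
            vector.extend(table[node_id])
        pooled[node_id] = vector
    return pooled


# Two 4-dimensional features for nodes 0 and 1 (rounded values from the output above).
degree_centrality_feats = {0: [4.09, 1.63, 1.72, 2.02], 1: [2.68, 2.02, 1.69, 2.02]}
page_rank_feats = {0: [4.01, 1.65, 1.83, 2.00], 1: [2.65, 2.00, 1.75, 2.04]}

pooled = concat_features(degree_centrality_feats, page_rank_feats)
print(len(pooled[0]))  # 4 + 4 = 8 dimensions per node
```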


## Dimensionality Reduction

We may wish to reduce the number of dimensions of our data, which could help downstream tasks such as embedding generation or machine learning. This can be done using the `apply_dim_reduc_to_graph_feats` method.

In [13]:
nxt.apply_dim_reduc_to_graph_feats(dim_size=4, reducer_type="pca")
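
To illustrate what a PCA reduction from 8 to 4 dimensions does, here is a minimal sketch using NumPy's singular value decomposition on synthetic data. It assumes NumPy is installed, the helper `pca_reduce` is hypothetical, and NEExT's `pca` reducer may differ in details such as scaling or the solver used.

```python
import numpy as np


def pca_reduce(features, dim_size):
    """Project rows of `features` onto their top `dim_size` principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim_size].T


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 8))  # synthetic stand-in: 100 nodes, 8 features
reduced = pca_reduce(features, dim_size=4)
print(reduced.shape)  # (100, 4)
```

Each row keeps its position, so the reduced vectors can still be matched back to their nodes and graphs.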

If we take a look at the global feature vector, we can see that it has been updated to the new number of dimensions.

In [14]:
nxt.get_global_feature_vector()

|Unnamed: 0|node_id|graph_id|emb_0|emb_1|emb_2|emb_3|
|---|---|---|---|---|---|---|
|0|0|0|2.471714|3.577450|0.394070|0.779143|
|1|1|0|2.232913|1.420164|0.969629|0.912235|
|2|2|0|2.202837|1.494916|0.809437|1.537148|
|3|3|0|2.102230|0.403983|0.199739|-0.931054|
|4|4|0|2.164103|0.202613|2.194223|3.052554|
|...|...|...|...|...|...|...|
|7495|395|24|-1.150994|0.263208|-1.147955|0.737321|
|7496|396|24|-1.258108|-0.154415|-1.594372|0.813288|
|7497|397|24|-1.245211|-0.174141|-1.731441|0.256937|
|7498|398|24|-1.243488|-0.198692|-1.372653|0.566846|


You still have access to the pre-dimensionality reduction global vector by using the method `get_archived_global_feature_vector`.

In [15]:
nxt.get_archived_global_feature_vector()

|Unnamed: 0|node_id|graph_id|feat_degree_centrality_0|feat_degree_centrality_1|feat_degree_centrality_2|feat_degree_centrality_3|feat_page_rank_0|feat_page_rank_1|feat_page_rank_2|feat_page_rank_3|
|---|---|---|---|---|---|---|---|---|---|---|
|0|0|0|4.094288|1.632019|1.723672|2.023497|4.014656|1.645432|1.825315|2.003575|
|1|1|0|2.682074|2.024244|1.689427|2.023497|2.651835|1.999918|1.745939|2.042548|
|2|2|0|2.682074|1.915292|1.578132|2.120736|2.672592|1.917080|1.696518|2.058271|
|3|3|0|1.975967|1.993115|2.082671|1.851304|1.968745|1.937933|2.028736|1.879435|
|4|4|0|1.975967|2.491178|1.355541|2.346133|1.940827|2.407239|1.384500|2.274468|
|...|...|...|...|...|...|...|...|...|...|...|
|7495|395|24|-0.853770|-1.468206|-1.059412|-0.963018|-0.829459|-1.399973|-1.030338|-0.951132|
|7496|396|24|-1.205938|-1.598620|-1.042336|-0.959924|-1.149447|-1.507187|-1.014597|-0.950959|
|7497|397|24|-1.205938|-1.598620|-0.936739|-1.010254|-1.155132|-1.514255|-0.920724|-0.995089|
|7498|398|24|-1.205938|-1.511677|-1.031103|-0.995703|-1.159577|-1.429980|-1.003237|-0.983561|


## Building Graph Embeddings

The global feature vector methods shown above return a Pandas DataFrame containing the collection's features and how they map to the graphs and nodes. <br>
One thing to note is that the data is standardized across all graphs.
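
Standardizing across all graphs means that a feature's mean and spread are computed over every node in the collection rather than graph by graph. The sketch below shows this for a single feature using z-scores. The use of the population standard deviation and the helper `standardize` are assumptions made for illustration; the source does not state which variant NEExT applies.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# One feature observed on the nodes of two different graphs (made-up values).
graph_a = [0.10, 0.20, 0.30]
graph_b = [0.50, 0.90]

# Pool the nodes of all graphs first, then standardize once.
standardized = standardize(graph_a + graph_b)
print(standardized)
```

Because the statistics are shared, values stay comparable between graphs after standardization.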

We can use the features computed on the graphs to build graph embeddings. To see what graph embedding engines are available to use, we can use the `get_list_of_graph_embedding_engines` function.

In [16]:
nxt.get_list_of_graph_embedding_engines()

['approx_wasserstein', 'wasserstein', 'sinkhornvectorizer']

Now, let's build a 3-dimensional embedding for every graph in the graph collection using the Approximate Wasserstein embedding engine. This can be done by using the method `build_graph_embedding`.

In [17]:
nxt.build_graph_embedding(emb_dim_len=3, emb_engine="approx_wasserstein")
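
The tutorial does not describe how the Wasserstein engines work internally. For intuition only, the sketch below computes the Wasserstein-1 distance between two equal-size one-dimensional samples, which for this special case reduces to the mean absolute difference of the sorted values. Treating each graph as a bag of node feature values is an assumption made for this illustration, not a description of NEExT's engine.

```python
def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch only handles samples of equal size")
    # The optimal transport plan in 1-D matches the sorted values in order.
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Made-up node feature values for three small graphs.
graph_x = [0.0, 1.0, 2.0]
graph_y = [1.0, 2.0, 3.0]
graph_z = [2.0, 0.0, 1.0]  # same values as graph_x, different node order

print(wasserstein_1d(graph_x, graph_y))  # 1.0
print(wasserstein_1d(graph_x, graph_z))  # 0.0
```

The distance ignores node ordering and depends only on the distribution of values, which is why it is a natural way to compare graphs of different shapes.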

You can access the embedding results by using the method `get_graph_embeddings`.

In [18]:
nxt.get_graph_embeddings()

|Unnamed: 0|emb_0|emb_1|emb_2|graph_id|
|---|---|---|---|---|
|0|2.038486|1.463379|0.080776|0|
|1|0.874913|1.535265|0.47548|1|
|2|0.02195|0.849217|-0.418307|2|
|3|-0.72605|0.75047|-0.317739|3|
|4|-1.313531|0.656964|0.077666|4|
|5|2.033228|0.316619|-0.397285|5|
|6|0.855314|0.122401|0.094769|6|
|7|0.028675|0.177609|-0.180609|7|
|8|-0.708892|0.214789|-0.117356|8|
|9|-1.307526|0.197537|0.116854|9|


## Visualize Embeddings

You can use the built-in visualization function to gain quick insights into the performance of your embeddings. This can be done by using the method `visualize_graph_embedding`. If you have labels for your graphs (as is the case here), you can color the embedding distributions using the labels. By default, embeddings are not colored.