{{ message }}

# baglab / NetFUSES Public

NetFUSES - algorithm for combining networks when node identities are ambiguous

Switch branches/tags
Nothing to show

## Files

Failed to load latest commit information.
Type
Name
Commit time

# NetFUSES Network FUsion with SEmantic Similarity (NetFUSES) is a graph fusion algorithm for coalescing networks of distinct sets of nodes into a single network when node identity is unreliable or ambiguous. Graph fusion is a unique problem associated with combining nodes both across and inside of networks. NetFUSES combines nodes using a given similarity function, allowing researchers in different domains to use NetFUSES by introducing their own similarity measures.

## Example

Suppose we have three separate graphs: `G1`, `G2`, and `G3`. Associated with each node in these graphs is a written text entity or "sentence". We can capture the semantics of these natural language text by mapping each entity into a high-dimensional vector space, making a "sentence vector".

Here we can identify and fuse semantically-equivalent nodes across the three graphs using a cosine similarity function based on these sentence vectors:

```import netfuses as nf
import scipy.spatial

def cosine_similarity(u, v):
return 1 - scipy.spatial.distance.cosine(sentence_vectors[u], sentence_vectors[v])

G1, G2, G3 = IMPORT_KNOWLEDGE_GRAPHS(...)
sentence_vectors = IMPORT_SENTENCE_VECS(...)  # maps text entities to sentence embeddings

sentence_fuser = nf.NetworkFuser(cosine_similarity, threshold=0.95)

fused_sentences = sentence_fuser.fuse(G1, G2, G3)
G, node2fuseid = sentence_fuser.collapse(fused_sentences)```

NetFUSES lets us combine nodes based on the semantic information contained within these sentence vectors. Using these knowledge graphs and precomputed vectors, we can invoke NetFUSES to obtain a fused graph `G`, removing redundancies and giving a unified representation of the knowledge embedded within the separate entities.

## To Install

Change directories into the newly cloned repository with:

`cd NetFUSES`

Run:

`pip install --upgrade netfuses`

This will install both `netfuses` and its required dependency, `networkx==1.11`.

If pip is not installed on your machine or you would not like to use it, you can run the following command from inside of the NetFUSES directory:

`python setup.py install`

## Algorithm Overview

The NetFUSES algorithm consists of two stages - fusing and collapsing;

• Fusing draws analogs between nodes across the set of graphs to fuse
• Collapsing takes nodewise analog components and merges them into a single node by taking each member of the component and drawing edges between it and its neighbors. If the neighbor is already contained in the component, a self loop is drawn.

This version is implemented to support different similarity functions, manifested by a parameter to `NetworkFuser`, `simfn`. As described in the docstring, `simfn` takes as input two network nodes u and v, any other keyword arguments necessary.

### Algorithm Usage

NetFUSES is able to fuse a variety of networks. See below for an example.

Consider two directed networks of textual entities in which edges (i, j) exist if and only if there is evidence to show that i causes j. This network however is imperfect in that two nodes i and i' may be semantically identical based on some similarity. We'd like to get a more in depth, denser network structure to further understand causal relationships. Hence, we will apply NetFUSES to combine the two networks.

Here we can define different notions of similarity across text. In one case, we could map each textual entity to sentence embeddings and define similarity by cosine similarity between vector.

```sentence_vectors = IMPORT_SENTENCE_VECS(...)  # maps causal entities to sentence embeddings
import scipy.spatial

def cosine_similarity(u, v):
return 1 - scipy.spatial.distance.cosine(sentence_vectors[u], sentence_vectors[v])```

In another case, if we recognize that semantically identical nodes often times overlap in vocabulary and word placement (i.e, sweating and sweating bullets), we may utilize a version of the Levenshtein distance a method for determining how many moves it takes to change one string to another.

```from jellyfish import levenshtein_distance     # library for levenshtein distance

def levenshtein_similarity(u, v):
return 1 - (1/levenshtein_distance(u, v)```

Here we will apply NetFUSES to three causal attribution subgraphs centered around the word anxiety as shown above.

For the sake of the example we will assume these graphs are stored predefined as variables `wikidata`, `conceptnet`, and `iprnet`.

```import netfuses as nf

wikidata, conceptnet, iprnet = IMPORT_GRAPHS(...)

levenshtein_fuser = nf.NetworkFuser(levenshtein_similarity, threshold=0.95)
sentence_embedding_fuser = nf.NetworkFuser(cosine_similarity, threshold=0.95)

fused_lev = levenshtein_fuser.fuse(wikidata, conceptnet, iprnet)
collapsed_lev, node2fuseid_lev = levenshtein_fuser.collapse(fused_lev)

fused_sent = sentence_embedding_fuser.fuse(wikidata, conceptnet, iprnet)
collapsed_sent, node2fuseid_sent = sentence_embedding_fuser.collapse(fused_sent)

final_fused_lev = nf.convert_graph(collapsed_lev)
final_fused_sent = nf.convert_graph(collapsed_sent)```

## Citation Information

```@article{berenberg2018,
author    = {Daniel Berenberg and James P. Bagrow},
title     = {Inferring the size of the causal universe: features and fusion of causal attribution networks},
journal   = {CoRR},
volume    = {abs/1812.06038},
year      = {2018},
url       = {http://arxiv.org/abs/1812.06038},
archivePrefix = {arXiv},
eprint    = {1812.06038},
}```

NetFUSES - algorithm for combining networks when node identities are ambiguous