<a href="https://colab.research.google.com/github/tomasonjo/blogs/blob/master/GDS_Multigraph/GDS%20multigraph.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analyzing multigraphs in Neo4j Graph Data Science library
* Updated to GDS 2.0 version
* Link to original blog post: https://towardsdatascience.com/analyzing-multigraphs-in-neo4j-graph-data-science-library-35c9b6d20099

In [1]:
!pip install neo4j

Collecting neo4j
  Downloading neo4j-4.4.2.tar.gz (89 kB)
Building wheels for collected packages: neo4j
  Building wheel for neo4j (setup.py) ... done
  Created wheel for neo4j: filename=neo4j-4.4.2-py3-none-any.whl size=115365 sha256=6c51df58ee1c80464d849e1d3c9a4220ade9ca8512ba25124abaa9900d8f2143
  Stored in directory: /root/.cache/pip/wheels/10/d6/28/9502

I recommend you set up a [blank project in the Neo4j Sandbox environment](https://sandbox.neo4j.com/?usecase=blank-sandbox), but you can also use other environments.

In [2]:
# Define Neo4j connections
from neo4j import GraphDatabase
host = 'bolt://3.235.2.228:7687'
user = 'neo4j'
password = 'seats-drunks-carbon'
driver = GraphDatabase.driver(host,auth=(user, password))

def drop_graph(name):
    # Release a projected graph from memory; the name is passed as a query parameter
    with driver.session() as session:
        drop_graph_query = """
        CALL gds.graph.drop($name);
        """
        session.run(drop_graph_query, name=name)

In [3]:
# Import libraries
import pandas as pd

def read_query(query):
    with driver.session() as session:
        result = session.run(query)
        return pd.DataFrame([r.values() for r in result], columns=result.keys())

In [4]:
# Import the graph

import_query = """
CREATE (t:Entity{name:'Tomaz'}),
       (n:Entity{name:'Neo4j'})
CREATE (t)-[:LIKES{weight:1}]->(n),
       (t)-[:LOVES{weight:2}]->(n),
       (t)-[:PRESENTED_FOR{weight:0.5}]->(n),
       (t)-[:PRESENTED_FOR{weight:1.5}]->(n);
"""
read_query(import_query)

## Relationships without own identity

In the context of the GDS library, relationships without their own identity are relationships whose type we ignore when projecting the graph.

### Native projection

We will start with native projection examples. If we use the wildcard operator <code>*</code> to define the relationships we want to project, we ignore their type and bundle them all together. This can be understood as losing their own identity (type in the context of Neo4j).

#### Default aggregation strategy

In the first example, we will observe the default behavior of the graph projection process.

In [5]:
default_agg_strategy = """

CALL gds.graph.project('default_agg','*','*',
    {relationshipProperties: ['weight']})

"""

read_query(default_agg_strategy)

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,projectMillis
0,"{'__ALL__': {'label': '*', 'properties': {}}}","{'__ALL__': {'orientation': 'NATURAL', 'aggreg...",default_agg,2,4,80


The default aggregation strategy actually doesn't perform any aggregations and projects all the relationships from the stored graph to memory without any transformations. If we check the relationshipCount, we observe that four relationships have been projected. To double-check the projected graph, we can use the degree centrality.

In [6]:
default_agg_strategy_check = """

CALL gds.degree.stream('default_agg')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, 
       score AS degree
ORDER BY degree DESC

"""

read_query(default_agg_strategy_check)

Unnamed: 0,name,degree
0,Tomaz,4.0
1,Neo4j,0.0


As we expected, all four relationships have been projected. To have a reference for later, let's also calculate the weighted degree centrality. By adding the <code>relationshipWeightProperty</code> parameter, we indicate that we want to use the weighted variant of the algorithm.

In [7]:
default_agg_strategy_weight_check = """

CALL gds.degree.stream('default_agg', 
    {relationshipWeightProperty:'weight'})
YIELD nodeId, score 
RETURN gds.util.asNode(nodeId).name AS name,
       score AS weighted_degree ORDER BY weighted_degree DESC
"""

read_query(default_agg_strategy_weight_check)

Unnamed: 0,name,weighted_degree
0,Tomaz,5.0
1,Neo4j,0.0


The result is the sum of the weights of all the considered relationships (1 + 2 + 0.5 + 1.5 = 5). We have no further use for this projected graph, so remember to release it from memory.

In [8]:
drop_graph('default_agg')
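
The two degree results above are easy to verify by hand. The following is a minimal plain-Python sketch of the toy graph, assuming no aggregation takes place during projection. It does not call GDS; `RELATIONSHIPS` and `degree` are illustrative names that simply mirror the import query.

```python
# Toy graph from the import query: (source, target, type, weight).
# These names are illustrative and not part of the GDS API.
RELATIONSHIPS = [
    ("Tomaz", "Neo4j", "LIKES", 1.0),
    ("Tomaz", "Neo4j", "LOVES", 2.0),
    ("Tomaz", "Neo4j", "PRESENTED_FOR", 0.5),
    ("Tomaz", "Neo4j", "PRESENTED_FOR", 1.5),
]


def degree(node, relationships, weighted=False):
    """Out-degree of `node`; sums weights instead of counting when weighted."""
    outgoing = [w for source, _, _, w in relationships if source == node]
    return float(sum(outgoing)) if weighted else float(len(outgoing))


print(degree("Tomaz", RELATIONSHIPS))                 # 4.0
print(degree("Tomaz", RELATIONSHIPS, weighted=True))  # 5.0
print(degree("Neo4j", RELATIONSHIPS))                 # 0.0
```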

#### Single-graph strategy

Depending on the use case, we might want to reduce our multigraph to a single graph during the projection process. This can be easily achieved with the <code>aggregation</code> parameter. We have to use the configuration map variant for the relationship definition.

In [9]:
single_rel_graph = """
CALL gds.graph.project('single_rel_strategy','*', 
    {TYPE:{type:'*', aggregation:'SINGLE'}})

"""

read_query(single_rel_graph)

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,projectMillis
0,"{'__ALL__': {'label': '*', 'properties': {}}}","{'TYPE': {'orientation': 'NATURAL', 'aggregati...",single_rel_strategy,2,1,93


Looking at the <code>relationshipCount</code>, we notice that only a single relationship has been projected. We can double-check with the degree centrality:

In [10]:
single_rel_graph_check = """

CALL gds.degree.stream('single_rel_strategy')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name,
       score AS degree
ORDER BY degree DESC

"""

read_query(single_rel_graph_check)

Unnamed: 0,name,degree
0,Tomaz,1.0
1,Neo4j,0.0


In [11]:
drop_graph('single_rel_strategy')
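
One way to picture the effect of <code>SINGLE</code> without GDS: parallel relationships between the same ordered pair of nodes collapse into one. The sketch below is only a mental model under that assumption, and `collapse_single` is a hypothetical helper name, not a GDS function.

```python
# The four parallel Tomaz -> Neo4j relationships, with types ignored.
PAIRS = [("Tomaz", "Neo4j")] * 4


def collapse_single(pairs):
    """Keep one relationship per ordered (source, target) pair, first-seen order."""
    return list(dict.fromkeys(pairs))


single = collapse_single(PAIRS)
print(len(PAIRS), "->", len(single))  # 4 -> 1
print(single)                         # [('Tomaz', 'Neo4j')]
```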

#### Property aggregation strategies

We have looked at the unweighted multigraph so far. Now it is time to look at what happens when we are dealing with a weighted multigraph and want to reduce it to a single graph. We will consider three property aggregation strategies:

* MIN: minimum value of all weights is projected
* MAX: maximum value of all weights is projected
* SUM: the sum of all weights is projected
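
As a rough illustration, here is what each strategy yields for the four parallel weights in our example graph. This is plain Python rather than GDS code, and the `AGGREGATIONS` mapping is an illustrative assumption that follows the descriptions in the list above.

```python
# Weights of the four Tomaz -> Neo4j relationships from the import query.
WEIGHTS = [1.0, 2.0, 0.5, 1.5]

# Illustrative mapping from strategy name to a Python reducer.
AGGREGATIONS = {"MIN": min, "MAX": max, "SUM": sum}

for name, reducer in AGGREGATIONS.items():
    print(name, reducer(WEIGHTS))
# MIN 0.5
# MAX 2.0
# SUM 5.0
```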

In our next example, we will use the <code>MIN</code> property aggregation strategy to reduce a weighted multigraph to a single graph. By providing the property aggregation parameter, we indicate we want to reduce the stored graph to a single graph in the projection process.

In [12]:
min_agg_strategy = """

CALL gds.graph.project('min_aggregation','*','*',
    {relationshipProperties: {weight: {property: 'weight', aggregation: 'MIN'}}})

"""

read_query(min_agg_strategy)

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,projectMillis
0,"{'__ALL__': {'label': '*', 'properties': {}}}","{'__ALL__': {'orientation': 'NATURAL', 'aggreg...",min_aggregation,2,1,16


We can observe that the <code>relationshipCount</code> is 1, which means our multigraph has been successfully reduced to a single graph. To validate the <code>MIN</code> property aggregation, let's also calculate the weighted degree centrality.

In [13]:
min_agg_strategy_check = """

CALL gds.degree.stream('min_aggregation', 
    {relationshipWeightProperty:'weight'})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, 
       score AS weighted_degree
ORDER BY weighted_degree DESC

"""

read_query(min_agg_strategy_check)

Unnamed: 0,name,weighted_degree
0,Tomaz,0.5
1,Neo4j,0.0


As expected with the MIN property aggregation strategy, the single reduced weight is the smallest one. Now that we are finished with the example, don't forget to drop the projected graph.

In [14]:
drop_graph('min_aggregation')

### Cypher projection

Let's recreate the above examples with Cypher projection. To lose the identity of the relationships and bundle them all together, we avoid providing the type column in the return of the relationship statement.

#### Default aggregation strategy

Similarly to native projection, the default setting in Cypher projection is to project all the relationships without any transformation during the projection process.

In [15]:
cypher_default_agg = """

CALL gds.graph.project.cypher('cypher_default_strategy', 
    'MATCH (n:Entity) RETURN id(n) AS id', 
    'MATCH (n:Entity)-[r]->(m:Entity)
     RETURN id(n) AS source, id(m) AS target')

"""

read_query(cypher_default_agg)

Unnamed: 0,nodeQuery,relationshipQuery,graphName,nodeCount,relationshipCount,projectMillis
0,MATCH (n:Entity) RETURN id(n) AS id,MATCH (n:Entity)-[r]->(m:Entity)\n RETURN ...,cypher_default_strategy,2,4,74


By looking at the <code>relationshipCount</code>, we observe that all four relationships have been projected as intended. To verify the projected graph, we run the degree centrality.

In [16]:
cypher_default_agg_check = """

CALL gds.degree.stream('cypher_default_strategy')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name,
       score AS degree
ORDER BY degree DESC

"""

read_query(cypher_default_agg_check)

Unnamed: 0,name,degree
0,Tomaz,4.0
1,Neo4j,0.0


#### Single relationship strategy

With Cypher projection, we don't have access to relationship level aggregation strategies. This is not a problem, as it is easy to reduce the multigraph to a single graph using only the Cypher query language. We simply add <code>DISTINCT</code> to the return of the relationship statement.

In [17]:
cypher_single_agg = """

CALL gds.graph.project.cypher('cypher_single_strategy',
    'MATCH (n:Entity) RETURN id(n) AS id',
    'MATCH (n:Entity)-[r]->(m:Entity)
     RETURN DISTINCT id(n) AS source, id(m) AS target' )

"""

read_query(cypher_single_agg)

Unnamed: 0,nodeQuery,relationshipQuery,graphName,nodeCount,relationshipCount,projectMillis
0,MATCH (n:Entity) RETURN id(n) AS id,MATCH (n:Entity)-[r]->(m:Entity)\n RETURN ...,cypher_single_strategy,2,1,11


The relationship count is one, which means we have successfully reduced the multigraph. Remember to drop the projected graph.

In [18]:
drop_graph('cypher_single_strategy')

#### Property aggregation strategies

On the other hand, with Cypher projection, we do have access to property level aggregation strategies. We don't really "need" them, as we can accomplish all the transformations using only Cypher. To show what I mean, we can apply the minimum property aggregation strategy using plain Cypher:

In [19]:
cypher_min_agg = """

CALL gds.graph.project.cypher('cypher_min_strategy', 
    'MATCH (n:Entity) RETURN id(n) AS id', 
    'MATCH (n:Entity)-[r]->(m:Entity)
     RETURN id(n) AS source, id(m) AS target, min(r.weight) as weight' )

"""

read_query(cypher_min_agg)

Unnamed: 0,nodeQuery,relationshipQuery,graphName,nodeCount,relationshipCount,projectMillis
0,MATCH (n:Entity) RETURN id(n) AS id,MATCH (n:Entity)-[r]->(m:Entity)\n RETURN ...,cypher_min_strategy,2,1,66


The relationshipCount is 1, which confirms our successful multigraph reduction. Just to make sure, we can run the weighted centrality and validate results.

In [23]:
cypher_min_agg_check = """

CALL gds.degree.stream('cypher_min_strategy',
    {relationshipWeightProperty:'weight'})
YIELD nodeId, score 
RETURN gds.util.asNode(nodeId).name AS name,
       score AS weighted_degree
ORDER BY weighted_degree DESC

"""

read_query(cypher_min_agg_check)

Unnamed: 0,name,weighted_degree
0,Tomaz,0.5
1,Neo4j,0.0
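
The relationship statement returns a single row because of implicit grouping in Cypher: the non-aggregated columns (`source`, `target`) act as grouping keys for `min(r.weight)`. A rough plain-Python equivalent is sketched below, assuming the rows stand in for the matched relationships. The node ids 0 and 1 are placeholders, and `group_min` is a hypothetical helper.

```python
from collections import defaultdict

# (source id, target id, weight) rows, as the MATCH would produce them.
# Node ids 0 and 1 are placeholders for Tomaz and Neo4j.
ROWS = [(0, 1, 1.0), (0, 1, 2.0), (0, 1, 0.5), (0, 1, 1.5)]


def group_min(rows):
    """Group rows by (source, target) and keep the minimum weight per group."""
    groups = defaultdict(list)
    for source, target, weight in rows:
        groups[(source, target)].append(weight)
    return [(s, t, min(ws)) for (s, t), ws in groups.items()]


print(group_min(ROWS))  # [(0, 1, 0.5)]
```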


With everything in order, we can release both projected graphs from memory.

In [24]:
drop_graph('cypher_default_strategy')
drop_graph('cypher_min_strategy')

## Relationships with own identity

We also have the option to retain the type of relationships during the projection process. Among other things, this allows us to perform additional filtering when executing graph algorithms. However, we have to be careful, as projecting relationships with a preserved type is a bit different in the context of multigraphs.

### Native projection

It is simple to declare that we want to preserve the type of relationships with the native projection. All we have to do is specify which relationship types we want to consider and the GDS engine will automatically bundle relationships under the specific relationship type. Let's take a look at some examples to gain a better understanding.

#### Default aggregation strategy

From previous examples we already know that the default aggregation strategy does not perform any transformations. By defining the relationship types we indicate to the GDS library we want to retain their type after the projection process.

In [25]:
default_type = """

CALL gds.graph.project('type_default','*',
    ['PRESENTED_FOR','LIKES','LOVES'])

"""

read_query(default_type)

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,projectMillis
0,"{'__ALL__': {'label': '*', 'properties': {}}}","{'LOVES': {'orientation': 'NATURAL', 'aggregat...",type_default,2,4,69


As expected, the <code>relationshipCount</code> is 4.

In [26]:
drop_graph('type_default')

#### Single relationship strategy

Like before, we can reduce our unweighted multigraph to a single graph with the relationship level aggregation parameter. We have to provide the aggregation parameter for each relationship type separately.

In [27]:
type_single_agg = """

CALL gds.graph.project('type_single','*',
   {LIKES:{type:'LIKES',aggregation:'SINGLE'},
    LOVES:{type:'LOVES',aggregation:'SINGLE'},
    PRESENTED_FOR:{type:'PRESENTED_FOR',aggregation:'SINGLE'}})

"""

read_query(type_single_agg)

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,projectMillis
0,"{'__ALL__': {'label': '*', 'properties': {}}}","{'LOVES': {'orientation': 'NATURAL', 'aggregat...",type_single,2,3,75


Ok, so we reduced to a single graph, but the relationshipCount is 3. Why is it so? The multigraph reduction process works on the relationship type level and because we have three relationship types, a single relationship for each type has been projected. Let's calculate the degree centrality on the whole in-memory graph.

In [28]:
type_single_agg_check = """

CALL gds.degree.stream('type_single')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name,
       score AS degree
ORDER BY degree DESC

"""

read_query(type_single_agg_check)

Unnamed: 0,name,degree
0,Tomaz,3.0
1,Neo4j,0.0


As we explained, even though we have reduced each relationship type separately, we are still dealing with a multigraph as a whole. When running graph algorithms, pay close attention to three things: whether you are dealing with a multigraph, whether you have projected multiple relationship types or just one, and whether you have performed any transformations. All of these affect the algorithm results. We can now drop this graph.

In [29]:
drop_graph('type_single')


#### Property aggregation strategies

Property aggregation strategies are very similar to before when we were dealing with relationships without identity. The only change is that now the aggregations are grouped by the relationship type.

In [31]:
type_min_agg = """

CALL gds.graph.project('type_min','*',
    ['PRESENTED_FOR','LIKES','LOVES'], 
    {relationshipProperties: {weight: {property: 'weight',
                                       aggregation: 'MIN'}}})

"""

read_query(type_min_agg)

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,projectMillis
0,"{'__ALL__': {'label': '*', 'properties': {}}}","{'LOVES': {'orientation': 'NATURAL', 'aggregat...",type_min,2,3,113


Three relationships are projected because, as we have learned, aggregation happens at the relationship type level. We will double-check the results with the weighted degree.

In [32]:
type_min_agg_check = """

CALL gds.degree.stream('type_min',
    {relationshipWeightProperty:'weight'})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name,
       score AS weighted_degree
ORDER BY weighted_degree DESC

"""

read_query(type_min_agg_check)

Unnamed: 0,name,weighted_degree
0,Tomaz,3.5
1,Neo4j,0.0
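
The weighted degree of 3.5 follows from aggregating within each relationship type first: LIKES keeps 1, LOVES keeps 2, and PRESENTED_FOR keeps min(0.5, 1.5) = 0.5, which sum to 3.5. Here is a small plain-Python sketch of that grouping, using illustrative names that are not part of GDS:

```python
from collections import defaultdict

# (type, weight) for the four Tomaz -> Neo4j relationships.
TYPED_WEIGHTS = [
    ("LIKES", 1.0),
    ("LOVES", 2.0),
    ("PRESENTED_FOR", 0.5),
    ("PRESENTED_FOR", 1.5),
]


def min_per_type(typed_weights):
    """Apply MIN separately within each relationship type."""
    by_type = defaultdict(list)
    for rel_type, weight in typed_weights:
        by_type[rel_type].append(weight)
    return {rel_type: min(ws) for rel_type, ws in by_type.items()}


projected = min_per_type(TYPED_WEIGHTS)
print(projected)                # {'LIKES': 1.0, 'LOVES': 2.0, 'PRESENTED_FOR': 0.5}
print(len(projected))           # 3
print(sum(projected.values()))  # 3.5
```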


In [33]:
drop_graph('type_min')